CDN#
The amazee.io Content Delivery Network (CDN) is an opinionated implementation of Fastly's CDN.
How it works#
The amazee.io CDN sits in front of your web application and acts as both a caching proxy and a security defence layer. Generally, the more traffic your site gets, the more important a high cache hit rate becomes.
There is no need to manually specify the PoPs to route through; this is handled automatically by Anycast routing. In practice, this means you are routed to the healthy PoP that is closest to you in terms of latency.
Architecture#
Shielding is used to increase your cache hit rate. This means your request can pass through one or more Points of Presence (PoPs) before reaching the origin.
graph LR
A[User] --> B[Edge PoP];
B --> C[Shield PoP];
C ----> D[Origin];
The shield PoP is chosen for you based on the geographic location of your cluster.
Alerting for anomalous traffic#
amazee.io has an enterprise support subscription with Fastly, and access to their Mission Control service. This service will alert amazee.io to several things, including:
- Increase in HTTP 50x (including the start date, and which PoPs were impacted)
- Increase in HTTP 404s (including the start date, and which PoPs were impacted)
- Increase in overall requests
- Performance related issues impacting Fastly PoPs
amazee.io may raise a separate ticket with you if the alert is noteworthy, or if action is required to mitigate the threat.
How to configure Drupal for cache purging#
See the Lagoon documentation.
Debugging#
You can debug what the CDN is doing by reviewing the response HTTP headers. There is also an additional debug HTTP request header that can be sent, if you wish to see even deeper statistics.
Here is a helper function for your shell:
function curlf() { curl -sLIXGET -H 'Fastly-Debug:1' "$@" | grep -iE 'X-Cache|Cache-Control|Set-Cookie|X-Varnish|X-Hits|Vary|Fastly-Debug|X-Served|surrogate-control|surrogate-key'; }
And an example of running it:
$ curlf https://www.example-site-fastly.com
cache-control: max-age=601, public, s-maxage=2764800
surrogate-control: max-age=2764800, public, stale-while-revalidate=3600, stale-if-error=3600
fastly-debug-path: (D cache-wlg10427-WLG 1612906144) (F cache-wlg10426-WLG 1612906141) (D cache-fra19179-FRA 1612906141) (F cache-fra19122-FRA 1612906141)
fastly-debug-ttl: (H cache-wlg10427-WLG - - 3) (M cache-fra19179-FRA - - 0)
fastly-debug-digest: 1118d9fefc8a514ca49d49cb6ece04649e1acf1663398212650bb462ba84c381
x-served-by: cache-fra19179-FRA, cache-wlg10427-WLG
x-cache: MISS, HIT
x-cache-hits: 0, 1
vary: Cookie, Accept-Encoding
From the above headers we can see that:
- The HTML page is cacheable
- Browsers will cache the page for 601 seconds
- Fastly will cache the page for 32 days (2,764,800 seconds)
- Tiered caching is in effect (edge PoP in Wellington, and shield PoP in Frankfurt)
- The HTML page was a cache hit at the edge PoP
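As a rough illustration of how these values can be read, the sketch below assumes that the last entry in x-cache belongs to the edge PoP, as in the example response above. Both function names are hypothetical helpers and are not part of any Fastly tooling.

```shell
# Hypothetical helper: print the cache status reported by the edge PoP.
# Assumes the last comma-separated entry in X-Cache is the edge
# (shield first, edge last), as in the example response.
edge_cache_status() (
  last=${1##*,}               # keep the text after the final comma
  printf '%s\n' "${last# }"   # drop the single leading space
)

# Hypothetical helper: convert a TTL in seconds to whole days.
ttl_days() {
  printf '%s\n' "$(( $1 / 86400 ))"
}

edge_cache_status "MISS, HIT"   # prints HIT
ttl_days 2764800                # prints 32
```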
Alternatively, you can use a site like Caching Score, which will send the debug header on your behalf and analyse the response for you.
Server timing#
We also send Server Timing headers, so that your browser is able to display this timing information as well.
The header can be duplicated due to shielding.
Fastly Controller#
There is an application running in all amazee.io clusters that is responsible for syncing domains and TLS certificates up to Fastly. The controller determines this mapping from an environment variable.
See the cloud documentation on Fastly Service configuration.
Common issues we see#
If you are moving to amazee.io from an existing relationship with Fastly (e.g. Pantheon, or Platform.sh), then there is an extra step needed in order for us to 'claim' the domain.
Please reach out to our support team well before go live, and they will be able to walk you through how to verify the domain with a TXT record in DNS.
Cache tags#
It is possible for your application to send cache tags in an HTTP response header called Surrogate-Key. See the Fastly documentation on surrogate keys.
In practice, this gives you the flexibility to purge certain parts of your application without purging the entire cache. For example, you could attach a tag called news to all news articles on your site.
Drupal has an advanced caching system that is able to leverage cache tags.
At amazee.io, we set a cache tag of the domain name being requested automatically. This makes it very convenient to purge just a single site from your Fastly service and leave the others alone.
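To make the tagging idea concrete, here is a minimal sketch that checks whether a Surrogate-Key header value contains a given tag. It relies on the header value being a space-separated list of keys. The function name and the example tags are made up for illustration.

```shell
# Hypothetical helper: succeed if a Surrogate-Key header value contains a tag.
# The value is treated as a space-separated list, so only whole keys match.
has_surrogate_key() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example header value; the tags are invented for this sketch.
keys="news node-42 www.example.com"
has_surrogate_key "$keys" "news" && echo "tagged with news"
has_surrogate_key "$keys" "new"  || echo "partial names do not match"
```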
Purging#
If you ever want to remove a specific object from the cache manually, there are a few ways to do this. All purges require authentication, so you will need access to a Fastly API token.
You can purge just a single URL:
curl -Ssi -XPURGE -H 'Fastly-Soft-Purge:1' -H "Fastly-Key:$FASTLY_API_TOKEN" https://www.example.com/subpage
You can also purge by cache tag:
curl -XPOST -H 'Fastly-Soft-Purge:1' -H "Fastly-Key:$FASTLY_API_TOKEN" https://api.fastly.com/service/$FASTLY_API_SERVICE/purge/<surrogatekey>
You are also able to drop the entire cache for the service, but this should be used extremely sparingly (as multiple websites can run through a single Fastly service).
You can also use the Fastly CLI which helps to make this a little nicer.
What is soft purging?
Soft purging only marks the object as expired in the CDN, but does not delete it. This means it can still be used later in the event the origin is unhealthy (e.g. during a deployment). In general, you should prefer soft purging. For more reading, see this blog post on soft purging.
How to purge a domain in a post-deploy task in Lagoon#
Sometimes you will want to purge your assets from the CDN after a deployment. This is straightforward given a few environment variables:
- run:
    name: Clear CDN cache
    command: wget -qO- --header="Fastly-Soft-Purge:1" --header="Fastly-Key:$FASTLY_KEY" --post-data="" "https://api.fastly.com/service/${LAGOON_FASTLY_SERVICE_ID%:*}/purge/${LAGOON_ROUTE#*://}"
    service: cli
    shell: bash
Why use wget?
wget is used here as this is installed in the Lagoon base images by default. If you have curl installed, then by all means use that.
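The two parameter expansions in the command above can be tried on their own. The values below are hypothetical and only illustrate the shape of the result. In particular, the `<service id>:<flag>` form of LAGOON_FASTLY_SERVICE_ID is an assumption based on the `%:*` expansion, which removes everything from the last colon onwards.

```shell
# Hypothetical values for illustration only; in a real Lagoon environment
# these variables are provided for you.
LAGOON_FASTLY_SERVICE_ID="abc123:true"   # assumed "<service id>:<flag>" shape
LAGOON_ROUTE="https://www.example.com"

service_id=${LAGOON_FASTLY_SERVICE_ID%:*}   # strip the shortest ":..." suffix
domain=${LAGOON_ROUTE#*://}                 # strip the "scheme://" prefix

echo "https://api.fastly.com/service/${service_id}/purge/${domain}"
# prints https://api.fastly.com/service/abc123/purge/www.example.com
```

The resulting URL purges by cache tag, using the domain name as the tag.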
Optimising the cache hit rate#
There are several things we do to increase the cache hit rate at the CDN.
Cookie stripping#
We strip cookies that we believe are not useful for applications, and try to leave behind session cookies. If you want a cookie to always be sent to origin, and not to be stripped, then you can make the cookie name match one of the following regex patterns:
wordpress_[a-z0-9_]+
[a-z0-9_]+_session
S?SESS[a-z0-9]+
[A-Z]+_UUID
There are also a few exceptions in place for Simple SAML, CraftCMS, Laravel, Matomo, Ghost, Next.js and OAuth2.
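If you want to check a cookie name against the four patterns listed above before deploying, a sketch along these lines may help. It assumes the patterns are matched against the whole cookie name and are case sensitive, which the list itself does not state. It also does not cover the additional framework exceptions. The function name is hypothetical.

```shell
# Hypothetical helper: succeed if a cookie name matches one of the documented
# patterns. Full-string, case-sensitive matching is an assumption here.
is_kept_cookie() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eqx \
    -e 'wordpress_[a-z0-9_]+' \
    -e '[a-z0-9_]+_session' \
    -e 'S?SESS[a-z0-9]+' \
    -e '[A-Z]+_UUID'
}

for name in wordpress_logged_in_abc laravel_session SESSabc123 APP_UUID _ga; do
  if is_kept_cookie "$name"; then
    echo "$name: matches, would be sent to origin"
  else
    echo "$name: no match for these patterns"
  fi
done
```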
Warning
Having any cookie that is not stripped will force a cache miss at the CDN, so be careful about when your application sets one of these cookies. Ideally, for example, the application only sets a session cookie after a user has successfully logged in.
Query parameter stripping#
We also remove query parameters from the URI, to increase the chances of a cache hit at the CDN:
utm_*
fbclid
dclid
gclid
wbraid
_ga
_gl
gclsrc
These query parameters are commonly used in marketing (e.g. paid advertising), and often contain unique values for every user and click. They are usually only read by JavaScript on the page, so there is no need for the backend application to be aware they exist.
The original URI is sent in a request header X-Original-Url in case you must see these query parameters in your backend application.
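The effect of this stripping can be approximated locally with the sketch below. It only illustrates the behaviour described above and is not the CDN's actual implementation. It ignores URL fragments and percent-encoding, and the function name is hypothetical.

```shell
# Hypothetical illustration of query parameter stripping.
# The body runs in a subshell so IFS and "set -f" do not leak to the caller.
strip_marketing_params() (
  url=$1
  case $url in
    *\?*) ;;                               # has a query string, carry on
    *) printf '%s\n' "$url"; exit 0 ;;     # nothing to strip
  esac
  base=${url%%\?*}    # everything before the first "?"
  query=${url#*\?}    # everything after the first "?"
  kept=
  set -f              # disable globbing while splitting the query
  IFS='&'
  for pair in $query; do
    case ${pair%%=*} in
      utm_*|fbclid|dclid|gclid|wbraid|_ga|_gl|gclsrc) continue ;;
    esac
    kept="${kept:+$kept&}$pair"
  done
  printf '%s\n' "$base${kept:+?$kept}"
)

strip_marketing_params 'https://www.example.com/news?utm_source=x&page=2&gclid=abc'
# prints https://www.example.com/news?page=2
```

Two visitors arriving with different gclid values therefore map to the same URL, and the second request can be served from cache.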
Basic filtering#
As a part of the CDN, we do block a number of requests that are from known bad bots, and other requests that are of no value to any application. On average this is around 15% of all requests, but it varies.
For example, we block any request to /Autodiscover.xml, which Microsoft Excel would otherwise send as POST requests to your origin.
Is this a fully fledged WAF offering?
This is not a fully fledged WAF, just basic filtering of junk requests. If you require an actual WAF, we have a separate product for that - the amazee.io Advanced WAF.
Custom block rules#
We are able to block all manner of requests with the amazee.io CDN, especially if they are impacting your application (e.g. a bot is out of control crawling). We are able to block by any request attribute, but common ones are:
- Client IP/CIDR - good for blocking small targeted attacks and crawls
- Country - useful for things like embargoes, and blocking traffic from countries that have no legitimate business visiting the site
- User-agent - useful for matching bots, e.g. Bytespider and ClaudeBot.
- ASN - useful to prevent wider classes of attacks, e.g. blocking Alibaba AS45102, which represents > 3 million individual IP addresses.
Our support team will be able to review the logs to ensure the best possible block is put in place.
Logging#
With a dedicated CDN service, we are able to forward all your CDN request logs to an endpoint of your choosing. Many of the standard logging services are supported out of the box. Please raise a ticket with our support team if you would like to do this.
Commonly requested endpoints to send logs to:
- ElasticSearch (or OpenSearch)
- New Relic
- Sumologic
- Logz.io
We also send the CDN logs to our own ElasticSearch (mainly for billing purposes); however, these logs are not customer facing.