With everything a Content Delivery Network (CDN) does, it can certainly seem like a massive and confusing piece of internet technology. However, when it comes to caching content (one of its most important duties), a CDN turns out to be not so different from most people: it gets pretty much all the information it needs from headers.
Think of the way you cruise a news website. You see a headline that reads Canadian Man Caught Smuggling 38 Turtles in Pants. Do you click on the article? Of course not. All the information you need is in the header. A man had 38 turtles in his pants!
Similarly (in a way), all the information a CDN needs in order to cache content is in the cache headers, and caching is how a CDN improves the user experience on a website and saves the website owner money on bandwidth. Keep reading for everything you need to know about content caching, how cache headers work, different types of cache headers, and how a CDN comes into play.
CDNs and caching, in general
A CDN is a global network of servers designed to deliver your website content to your users as quickly as possible. In addition to redirecting users to the server closest to them, CDNs rely on caching to get that content delivered fast. When a user requests a web page or content from your site, it is retrieved for them from the closest proxy cache server where it is stored. This eliminates the time it would take for the request to reach the origin server and for the server to send back the requested content.
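To make the hit-or-miss idea concrete, here is a minimal sketch of a proxy cache in Python. It is a toy model, not how any particular CDN is built: the `EdgeCache` class, its fixed `ttl_seconds`, and the `fetch_from_origin` callback are all hypothetical names made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy proxy cache: serves a stored copy until its time-to-live runs out."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],  # hypothetical origin lookup
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, stored_at)
        self.origin_requests = 0  # how many times we had to go back to the origin

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # cache hit: no trip to the origin server
        # Cache miss or expired copy: fetch from the origin and store the result.
        self.origin_requests += 1
        body = self._fetch(path)
        self._store[path] = (body, now)
        return body


if __name__ == "__main__":
    cache = EdgeCache(lambda path: f"content of {path}".encode(), ttl_seconds=60)
    cache.get("/index.html")  # miss: fetched from the origin
    cache.get("/index.html")  # hit: served from the cache
    print(cache.origin_requests)  # 1
```

In this sketch every file gets the same lifetime. A real CDN decides per response, which is where the cache headers come in.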
The most easily cacheable content is what’s referred to as static content: content on your website that is not expected to change over time and does not differ depending on who your users are. So a homepage that looks the same for everyone who visits it is largely made up of static content, while a member-only page that a user logs into would be made up of dynamic files, as that page would have to be generated on the fly based on information in the database relating to that specific user.
All CDNs cache static content, and advanced CDNs can also cache dynamic content for the period in which that dynamic content is expected to remain unchanged. And how does a CDN know how to treat all of your website’s content files with regard to caching?
It’s all in the headers
HTTP cache headers are used by web developers not only to identify cacheable content, but also to set rules for caching that content, such as how long it can be cached before it needs to be fetched again from the origin server. Here are the most relevant cache headers when it comes to your web content.
- Expires: This one is pretty straightforward. It sets the date and time in the future at which the content will “expire,” after which the CDN has to fetch the content from the origin server again. The aim is to make sure that your content, though cached, stays as fresh as possible.
- Cache-Control: This header has largely replaced the Expires header thanks to its flexibility and wider scope. Cache-Control dictates how long content can be reused before it has to be re-fetched from the origin server (max-age), and it also lets you mark content as public or private (whether shared caches such as a CDN may store it, or only the user’s own browser), as no-cache (the content may be stored but must be revalidated with the origin server before being delivered to a user), or as no-store (sensitive data like banking information must not be stored in any cache), among other instructions.
- ETag: An ETag header gives each version of a piece of content a unique identifier, which helps eliminate unnecessary content refreshment. When your content expires according to your Expires or Cache-Control headers, the proxy server can send the ETag it has stored back to the origin server (in an If-None-Match request header) to check whether the content has changed. If it has remained the same, the origin server answers 304 Not Modified without resending the content, and the cached copy stays in use.
- Vary: This header is used to manage multiple versions of the same content, for instance uncompressed files stored alongside compressed versions of the same files. Some browsers can’t handle compressed content, so a header such as Vary: Accept-Encoding tells the cache to pick a version based on what each request says it accepts, which gets those browsers the uncompressed format. Though the Vary header can be an effective tool, not every browser handles it correctly, and every request header it names multiplies the number of versions a cache has to keep, so it needs to be used judiciously.
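- Surrogate: Surrogate headers, such as Surrogate-Control, are addressed to surrogates (reverse proxies and CDNs that serve content with the authority of the origin server) rather than to browsers. They provide you with better control over cache policies by letting you set a policy for the CDN that is separate from the one browsers follow. Support varies from one CDN to another.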
- Pragma: Pragma headers were previously used to set caching instructions for browsers. Like Expires headers, Pragma headers have largely been replaced by Cache-Control.
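To see how a cache might act on a couple of these headers, here is a short Python sketch under stated assumptions. It is a simplification, not a complete implementation of the HTTP caching rules: it reads only a handful of Cache-Control directives (including s-maxage, the shared-cache counterpart of max-age), assumes header names are keyed exactly as "Cache-Control", and the helper names `parse_cache_control`, `shared_cache_ttl`, `make_etag`, and `origin_respond` are hypothetical. The hash-based ETag is one possible scheme; origin servers are free to generate ETags however they like.

```python
import hashlib
from typing import Dict, Optional, Tuple


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_ttl(headers: Dict[str, str]) -> int:
    """Seconds a shared cache (such as a CDN) may reuse a response.

    Returns 0 when the copy must not be served without going back to the origin.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc or "private" in cc:
        return 0
    # For shared caches, s-maxage takes priority over max-age.
    for name in ("s-maxage", "max-age"):
        raw = cc.get(name)
        if raw is not None:
            try:
                return max(int(raw), 0)
            except ValueError:
                return 0  # malformed value: treat the response as already stale
    return 0


def make_etag(body: bytes) -> str:
    """Hypothetical ETag scheme: a quoted, truncated hash of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Answer a revalidation request: 304 with no body if the tag still matches."""
    if if_none_match == make_etag(body):
        return 304, b""
    return 200, body


if __name__ == "__main__":
    print(shared_cache_ttl({"Cache-Control": "public, max-age=3600"}))  # 3600
    stored_tag = make_etag(b"<html>home</html>")
    print(origin_respond(b"<html>home</html>", stored_tag)[0])  # 304
```

The point of the 304 path is that an expired copy does not always mean a full re-download: if the tag still matches, only a small status response crosses the wire and the cache keeps serving what it already has.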
Getting your headers in the game
If tagging your content files with cache headers sounds like something you frankly haven’t done, don’t worry: if you’re using a high-quality CDN, you may not need to. Advanced, modern CDNs have intelligent cache control. As CDN provider Imperva Incapsula says in its guide to CDN caching, intelligent cache control is predictive through learning. Advanced CDNs are developing learning-based processes for monitoring, categorizing and caching a wider range of content to help eliminate the hands-on part of the caching process, saving you time and upping your caching efficiency.
So not only is a CDN responsible for serving up your content to your users as quickly as possible, improving your website user experience and removing the burden from your origin server in order to cut down on bandwidth costs, but it can also essentially take care of your caching strategy for you. That is largely thanks to those all-important headers, which handily convey more important information than even the best “Turtles in Pants” headline ever could.