Configuring Downstream Caches
Standard Configuration
Note: This feature is currently experimental. Options and configuration described here are subject to change in future releases. Please subscribe to the announcements mailing list to keep yourself informed of updates to this feature.
By default PageSpeed serves HTML files with Cache-Control: no-cache, max-age=0 so that changes to the HTML and its resources are sent fresh on each request. The HTML can be cached, however, if you:
- Set up a PURGE handler in your cache.
- Tell PageSpeed the URL for the PURGE handler.
- Have the cache set the PS-CapabilityList header so PageSpeed will emit HTML that can be sent to any browser.
- Have the cache occasionally pass through requests to the origin with the PS-ShouldBeacon header set.
For example, if you're running a cache on port 80 that reverse proxies to your site on port 8080, then you'd need to tell PageSpeed to send its PURGE requests to port 80:
- Apache:
ModPagespeedDownstreamCachePurgeLocationPrefix http://localhost:80
- Nginx:
pagespeed DownstreamCachePurgeLocationPrefix http://localhost:80;
You also need to give PageSpeed a key so it can allow the cache to request rebeaconing without allowing external entities to do so:
- Apache:
ModPagespeedDownstreamCacheRebeaconingKey "<your-secret-key>"
- Nginx:
pagespeed DownstreamCacheRebeaconingKey "<your-secret-key>";
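The role of the rebeaconing key can be illustrated with a minimal Python sketch. This is not PageSpeed's implementation; the function name `should_honor_beacon_request` and the `CONFIGURED_KEY` constant are hypothetical, and the sketch only assumes what the text above states: a request asks for rebeaconing through the PS-ShouldBeacon header, and only a request carrying the configured key should be honored.

```python
import hmac

# Hypothetical stand-in for the value set via DownstreamCacheRebeaconingKey.
CONFIGURED_KEY = "<your-secret-key>"


def should_honor_beacon_request(headers: dict) -> bool:
    """Return True only if PS-ShouldBeacon carries the configured key."""
    supplied = headers.get("PS-ShouldBeacon", "")
    # Constant-time comparison, so the key is not leaked through timing.
    return hmac.compare_digest(supplied.encode(), CONFIGURED_KEY.encode())


# A request forwarded by the cache with the key is honored ...
print(should_honor_beacon_request({"PS-ShouldBeacon": "<your-secret-key>"}))
# ... while a guessed value or a missing header is not.
print(should_honor_beacon_request({"PS-ShouldBeacon": "guess"}))
print(should_honor_beacon_request({}))
```

This is also why the cache configurations strip any incoming PS-ShouldBeacon header before deciding whether to set it themselves.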
These are the only changes you need to make to the PageSpeed configuration file, but before you restart you also need to make some changes to your cache configuration. These vary by cache; below are configurations for Varnish 3.x, Varnish 4.x, and Nginx's proxy_cache:
- Varnish 3.x:
# std.random() below comes from the std vmod; make sure your VCL contains
# "import std;".

acl purge {
  # If PageSpeed isn't running on the same server as your cache, list the IP(s)
  # of the PageSpeed machine(s) here.
  "127.0.0.1";
}

sub vcl_recv {
  # Tell PageSpeed not to use optimizations specific to this request.
  set req.http.PS-CapabilityList = "fully general optimizations only";
  # Don't allow external entities to force beaconing.
  remove req.http.PS-ShouldBeacon;
  # Authenticate the purge request by IP.
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    return (lookup);
  }
}

# Mark HTML as uncacheable. If we can't send them purge requests they can't
# cache our html.
sub vcl_fetch {
  if (beresp.http.Content-Type ~ "text/html") {
    remove beresp.http.Cache-Control;
    set beresp.http.Cache-Control = "no-cache, max-age=0";
  }
  return (deliver);
}

sub vcl_hit {
  # Make purging happen in response to a PURGE request. This happens
  # automatically in Varnish 4.x so we don't need it there.
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
  # 5% of the time ignore that we got a cache hit and send the request to the
  # backend anyway for instrumentation.
  if (std.random(0, 100) < 5) {
    set req.http.PS-ShouldBeacon = "<your-secret-key>";
    return (pass);
  }
}

sub vcl_miss {
  # Make purging happen in response to a PURGE request. This happens
  # automatically in Varnish 4.x so we don't need it there.
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
  # Instrument 25% of cache misses.
  if (std.random(0, 100) < 25) {
    set req.http.PS-ShouldBeacon = "<your-secret-key>";
    return (pass);
  }
}
- Varnish 4.x:
# std.random() below comes from the std vmod; make sure your VCL contains
# "import std;".

acl purge {
  # If PageSpeed isn't running on the same server as your cache, list the IP(s)
  # of the PageSpeed machine(s) here.
  "127.0.0.1";
}

sub vcl_recv {
  # Tell PageSpeed not to use optimizations specific to this request.
  set req.http.PS-CapabilityList = "fully general optimizations only";
  # Don't allow external entities to force beaconing.
  unset req.http.PS-ShouldBeacon;
  # Authenticate the purge request by IP.
  if (req.method == "PURGE") {
    if (!client.ip ~ purge) {
      return (synth(405, "Not allowed."));
    }
    return (purge);
  }
}

# Mark HTML as uncacheable. If we can't send them purge requests they can't
# cache our html.
sub vcl_backend_response {
  if (beresp.http.Content-Type ~ "text/html") {
    unset beresp.http.Cache-Control;
    set beresp.http.Cache-Control = "no-cache, max-age=0";
  }
  return (deliver);
}

sub vcl_hit {
  # 5% of the time ignore that we got a cache hit and send the request to the
  # backend anyway for instrumentation.
  if (std.random(0, 100) < 5) {
    set req.http.PS-ShouldBeacon = "<your-secret-key>";
    return (pass);
  }
}

sub vcl_miss {
  # Instrument 25% of cache misses.
  if (std.random(0, 100) < 25) {
    set req.http.PS-ShouldBeacon = "<your-secret-key>";
    return (pass);
  }
}
- Nginx proxy_cache:
http {
  # Define a mapping used to mark HTML as uncacheable.
  map $upstream_http_content_type $new_cache_control_header_val {
    default $upstream_http_cache_control;
    "~*text/html" "no-cache, max-age=0";
  }

  server {
    # PageSpeed's beacon dependent filters need the cache to let some requests
    # through to the backend. This code below depends on the ngx_set_misc
    # module and randomly passes 5% of traffic to the backend for rebeaconing.
    set $should_beacon_header_val "";
    set_random $rand 0 100;
    if ($rand ~* "^[0-4]$") {
      set $should_beacon_header_val "<your-secret-key>";
      set $bypass_cache 1;
    }

    location / {
      # existing proxy_pass
      # existing proxy_cache
      # existing proxy_cache_key

      # What servers should we accept PURGE requests from? If PageSpeed isn't
      # running on the same server as your cache, list the IP(s) of the
      # PageSpeed machine(s) here.
      #
      # This requires rebuilding with the ngx_cache_purge module:
      # https://github.com/FRiCKLE/ngx_cache_purge
      proxy_cache_purge PURGE from 127.0.0.1;

      # Mark HTML as uncacheable. If we can't send them purge requests they
      # can't cache our html. Uses the map defined above.
      proxy_hide_header Cache-Control;
      add_header Cache-Control $new_cache_control_header_val;

      # Tell PageSpeed not to use optimizations specific to this request.
      proxy_set_header PS-CapabilityList "fully general optimizations only";

      # See discussion of rebeaconing above.
      proxy_cache_bypass $bypass_cache;
      proxy_hide_header PS-ShouldBeacon;
      proxy_set_header PS-ShouldBeacon $should_beacon_header_val;
    }
  }
}
When running with downstream caching, all resources referenced from the HTML are cache-extended as usual, so resources that need a short cache lifetime can be served stale. If that is a problem, either Disallow those resources, so PageSpeed doesn't inline or cache-extend them, or decrease the cache lifetime on your HTML.
Additional Options
The configuration above should be a good fit for most sites, but PageSpeed's downstream caching is highly configurable with many options that allow you to tweak it for your particular setup.
Beaconing
Several filters such as inline_images, inline_preview_images, lazyload_images and prioritize_critical_css depend extensively on client beacons to determine critical images and CSS. When such filters are enabled, pages periodically have beaconing JavaScript inserted as part of the rewriting process.
The standard configuration passes through 5% of cache hits to the backend with a PS-ShouldBeacon header set, so that these filters can continue to receive the beacons they need.
If you have a high traffic site, 5% is probably a larger share than you need for PageSpeed to receive sufficient beacons. In that case you can decrease the percentage of traffic to pass through. For example, here's how you'd decrease it to 2%:
- Varnish 3.x or 4.x:
    - if (std.random(0, 100) < 5) {
    + if (std.random(0, 100) < 2) {
- Nginx proxy_cache:
    - if ($rand ~* "^[0-4]$") {
    + if ($rand ~* "^[01]$") {
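For the Nginx variant, the pass-through share is set by how many values of $rand the regular expression matches. The Python sketch below counts those matches. It assumes that `set_random $rand 0 100` yields integers from 0 to 100 inclusive (101 possible values); the helper name `bypass_fraction` is hypothetical.

```python
import re


def bypass_fraction(pattern: str, low: int = 0, high: int = 100) -> float:
    """Fraction of integer values in [low, high] that match the pattern."""
    values = range(low, high + 1)
    # Nginx's ~* operator is a case-insensitive regex match.
    hits = sum(1 for v in values if re.search(pattern, str(v), re.IGNORECASE))
    return hits / len(values)


print(f"{bypass_fraction('^[0-4]$'):.2%}")  # 5 of 101 values, about 4.95%
print(f"{bypass_fraction('^[01]$'):.2%}")   # 2 of 101 values, about 1.98%
```

Under that assumption the effective rates are slightly below the nominal 5% and 2%, which is close enough for sampling purposes.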
Alternatively, you may be willing to give up the benefit of the beaconing-dependent filters in exchange for never intentionally bypassing the cache. If so, you should turn off beaconing and beacon-dependent filters in PageSpeed:
- Apache:
ModPagespeedCriticalImagesBeaconEnabled false
ModPagespeedDisableFilters prioritize_critical_css
- Nginx:
pagespeed CriticalImagesBeaconEnabled false;
pagespeed DisableFilters prioritize_critical_css;
Additionally you should remove the proxy config that handles beaconing:
- Varnish 3.x:
    - remove req.http.PS-ShouldBeacon;
    ...
    - if (std.random(0, 100) < 5) {
    -   set req.http.PS-ShouldBeacon = "<your-secret-key>";
    -   return (pass);
    - }
    ...
    - if (std.random(0, 100) < 25) {
    -   set req.http.PS-ShouldBeacon = "<your-secret-key>";
    -   return (pass);
    - }
- Varnish 4.x:
    - unset req.http.PS-ShouldBeacon;
    ...
    - sub vcl_hit {
    -   if (std.random(0, 100) < 5) {
    -     set req.http.PS-ShouldBeacon = "<your-secret-key>";
    -     return (pass);
    -   }
    - }
    - sub vcl_miss {
    -   if (std.random(0, 100) < 25) {
    -     set req.http.PS-ShouldBeacon = "<your-secret-key>";
    -     return (pass);
    -   }
    - }
- Nginx proxy_cache:
    - set $should_beacon_header_val "";
    - set_random $rand 0 100;
    - if ($rand ~* "^[0-4]$") {
    -   set $should_beacon_header_val "<your-secret-key>";
    -   set $bypass_cache 1;
    - }
    ...
    - proxy_cache_bypass $bypass_cache;
    - proxy_hide_header PS-ShouldBeacon;
    - proxy_set_header PS-ShouldBeacon $should_beacon_header_val;
PageSpeed Resources
Because PageSpeed already caches its optimized resources, you may want to exclude them from caching by the downstream cache. If so, you can set:
- Varnish 3.x and 4.x:
    + if (req.url ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") {
    +   return (pass);
    + }
- Nginx proxy_cache:
    + if ($uri ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") {
    +   set $bypass_cache "1";
    + }
If you have enabled URL signing, change the 10 in the regexp to 20 to account for the additional characters in the hash.
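Before deploying the pattern, it can help to try it against a few URLs. The Python sketch below applies the same regular expression, in its 10-character and 20-character forms, to made-up sample URLs. The URLs, hashes, and the helper name `is_pagespeed_resource` are hypothetical and only illustrate the shape the regexp expects.

```python
import re

# Same pattern as in the cache configuration; 10-character hash segment.
UNSIGNED = re.compile(r"\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+")
# Variant for URL signing, with 20 characters in the hash segment.
SIGNED = re.compile(r"\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{20}\.[^.]+")


def is_pagespeed_resource(url: str, signed: bool = False) -> bool:
    """True if the URL looks like a PageSpeed-rewritten resource."""
    pattern = SIGNED if signed else UNSIGNED
    return pattern.search(url) is not None


# Hypothetical sample URLs, for illustration only.
plain = "/images/logo.png"
rewritten = "/images/logo.png.pagespeed.ce.AbCdEfGhIj.png"
rewritten_signed = "/images/logo.png.pagespeed.ce.0123456789abcdefghij.png"

print(is_pagespeed_resource(plain))                          # False
print(is_pagespeed_resource(rewritten))                      # True
print(is_pagespeed_resource(rewritten_signed))               # False
print(is_pagespeed_resource(rewritten_signed, signed=True))  # True
```

The third case shows why the count has to change with URL signing: the 10-character pattern no longer matches once the hash segment is longer.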
PS-CapabilityList
Typically PageSpeed will produce different HTML for different browsers. For example, when responding to a request that has Accept: image/webp, PageSpeed knows the requesting browser supports WebP and so it can send these images, while if the Accept header doesn't mention WebP then it will send JPEG or PNG. To suppress this behavior, the standard configuration above sets a header:
PS-CapabilityList: fully general optimizations only
This header can also be used to tell PageSpeed to make specific optimizations. There are six capabilities PageSpeed can take advantage of that aren't supported in all browsers, and it gives them each a code:
Capability | Code |
---|---|
Inline Images | ii |
Lazyload Images | ll |
WebP Images | jw |
Lossless WebP Images | ws |
Animated WebP Images | wa |
Defer Javascript | dj |
For example, you could include whether the Accept header includes image/webp in your cache key, and then for the fraction of traffic that claimed WebP support send:
PS-CapabilityList: jw:
Every page would go through to your origin twice and be cached twice, once processed with WebP support and once without.
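The WebP split can be sketched in Python as a function that derives both the cache key and the header value from a request. This is only an illustration of the idea: the function name `cache_variant` is hypothetical, and sending "fully general optimizations only" for requests without WebP support is an assumption carried over from the standard configuration.

```python
def cache_variant(url: str, accept_header: str):
    """Return (cache_key, PS-CapabilityList value) for a request."""
    supports_webp = "image/webp" in accept_header.lower()
    # Including the WebP flag in the key keeps the two variants apart.
    cache_key = (url, supports_webp)
    # Assumption: requests without WebP support get the fully general value.
    capability = "jw:" if supports_webp else "fully general optimizations only"
    return cache_key, capability


print(cache_variant("/index.html", "text/html,image/webp,*/*;q=0.8"))
print(cache_variant("/index.html", "text/html,*/*;q=0.8"))
```

The two calls produce different cache keys for the same URL, which is what causes each page to be fetched and stored twice.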
You can combine multiple capabilities together with a comma. For example, if you decided to make a cache fragment for Chrome 30+, which supports all of these, for that fragment you would send:
PS-CapabilityList: ll,ii,dj,jw,ws:
For Firefox 4+, which supports all of these but WebP, you would send:
PS-CapabilityList: ll,ii,dj:
To use this header properly, however, you have to know which capabilities are supported by which browsers in the version of PageSpeed you're using and craft regular expressions to match exactly those ones. This is very difficult to do in general because it involves duplicating the code in user_agent_matcher.cc as regexes, but a simple division is:
- Chrome 32+:
ll,ii,dj,jw,wa,ws
- Firefox 4+, Safari, IE10 (but not IE11):
ll,ii,dj
- Everything else:
fully general optimizations only
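As a rough Python sketch of that simple division: the function below maps a User-Agent string to one of the three values listed above. The regular expressions are simplistic assumptions, not a port of user_agent_matcher.cc, and the function name `capability_list` is hypothetical. Real traffic contains browsers that embed other browsers' tokens, so treat this as a starting point only.

```python
import re

GENERAL = "fully general optimizations only"


def capability_list(user_agent: str) -> str:
    """Pick a capability list using the simple three-way division."""
    chrome = re.search(r"Chrome/(\d+)", user_agent)
    if chrome and int(chrome.group(1)) >= 32:
        return "ll,ii,dj,jw,wa,ws"
    firefox = re.search(r"Firefox/(\d+)", user_agent)
    if firefox and int(firefox.group(1)) >= 4:
        return "ll,ii,dj"
    # IE10 announces itself as "MSIE 10.0"; IE11 dropped the MSIE token.
    if "MSIE 10." in user_agent:
        return "ll,ii,dj"
    # Chrome also sends a Safari token, so exclude anything matched as Chrome.
    if "Safari/" in user_agent and not chrome:
        return "ll,ii,dj"
    return GENERAL


chrome_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36")
ie11_ua = "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"
print(capability_list(chrome_ua))
print(capability_list(ie11_ua))
```

Whatever mapping you use, the same classification has to feed the cache key, otherwise HTML optimized for one browser group could be served to another.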
Purging with GET
If you're integrating PageSpeed with a cache that doesn't support PURGE requests but does support purging in response to a prefixed GET request, PageSpeed can support that. You would configure your cache to treat a GET to /purge/foo/bar as a request to purge /foo/bar and configure PageSpeed as:
- Apache:
ModPagespeedDownstreamCachePurgeLocationPrefix http://CACHE-HOST:PORT/purge
ModPagespeedDownstreamCachePurgeMethod GET
- Nginx:
pagespeed DownstreamCachePurgeLocationPrefix http://CACHE-HOST:PORT/purge;
pagespeed DownstreamCachePurgeMethod GET;
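How the prefix and method combine can be sketched in Python. The sketch assumes, based on the /purge/foo/bar example above, that the page's path is appended to the configured prefix; the function name `purge_request` and the host `cache.example` are hypothetical.

```python
def purge_request(prefix: str, path: str, method: str = "PURGE"):
    """Return the (method, URL) pair used to purge a page at the given path."""
    # Assumption: the page path is appended to the configured prefix.
    return method, prefix.rstrip("/") + path


# Standard configuration: PURGE sent to the cache on port 80.
print(purge_request("http://localhost:80", "/foo/bar"))
# Prefixed GET purging against a hypothetical cache host.
print(purge_request("http://cache.example:80/purge", "/foo/bar", method="GET"))
```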
Purge Threshold
Whenever PageSpeed serves an HTML response that is not fully optimized, it continues rewriting in the background. When it finishes, if the HTML it served was less than 95% optimized, it sends a purge request to the downstream cache. The next request to come in will bypass the cache and come back to PageSpeed, which can then serve the now more highly optimized page.
To change the point at which PageSpeed considers a page done and stops optimizing, set a different value for this threshold. For example, to lower it to 80%, so that PageSpeed is satisfied with a page that is only 80% optimized, you would set:
- Apache:
ModPagespeedDownstreamCacheRewrittenPercentageThreshold 80
- Nginx:
pagespeed DownstreamCacheRewrittenPercentageThreshold 80;
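The threshold rule described above amounts to a single comparison. A minimal Python sketch follows; the function name `needs_purge` is hypothetical, and how PageSpeed actually computes the optimized percentage is not covered here.

```python
def needs_purge(percent_rewritten: float, threshold: int = 95) -> bool:
    """True if the served HTML was optimized less than the threshold."""
    return percent_rewritten < threshold


print(needs_purge(90))                # True with the default threshold of 95
print(needs_purge(90, threshold=80))  # False once the threshold is 80
```

Lowering the threshold therefore trades some optimization for fewer purges and fewer trips back to the origin.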
Script Variables
Note: Nginx-only
Note: New feature as of 1.10.33.0
In ngx_pagespeed, DownstreamCachePurgeLocationPrefix, DownstreamCachePurgeMethod, and DownstreamCacheRewrittenPercentageThreshold support script variables, so it's possible to set them on a per-request basis. Turn this on with:
http {
  pagespeed ProcessScriptVariables on;
  ...
}
You can then use script variables in arguments for these commands:
pagespeed DownstreamCachePurgeLocationPrefix "$purge_location";
pagespeed DownstreamCachePurgeMethod "$cache_purge_method";
pagespeed DownstreamCacheRewrittenPercentageThreshold "$rewrite_threshold";
For more details on script variables, including how to handle dollar signs, see Script Variable Support.
Implementation Details
To support downstream caching PageSpeed sends a purge request to the caching layer whenever it identifies an opportunity for more rewriting to be done on content that was just served. Such opportunities could arise because of, say, the resources now becoming available in the PageSpeed cache or an image compression operation completing. The cache purge forces the next request for the HTML file to come all the way to the backend PageSpeed server and obtain better rewritten content, which is then stored in the cache. This interaction between the PageSpeed server and the downstream caching layer is depicted in the diagram given below.
In the interaction depicted above, note that the partially optimized HTML will be served from the cache until a purge request gets sent by the PageSpeed server. It is recommended to set up PageSpeed and the downstream caching layer servers on a one-to-one basis so that the purges can be sent to the correct downstream server.
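The sequence can also be followed in a toy Python simulation. The classes below are hypothetical stand-ins, not PageSpeed or cache code: `Origin` serves partially optimized HTML on the first request and sends a purge once its background work finishes, and `DownstreamCache` stores whatever the origin returns until it is purged.

```python
class DownstreamCache:
    """Toy cache: stores origin responses by path until purged."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path):
        if path not in self.store:
            self.store[path] = self.origin.serve(path, self)
        return self.store[path]

    def purge(self, path):
        self.store.pop(path, None)


class Origin:
    """Toy PageSpeed-like origin: partial output first, full output later."""

    def __init__(self):
        self.finished = set()
        self.pending = []  # (cache, path) pairs awaiting a purge

    def serve(self, path, cache):
        if path in self.finished:
            return "fully optimized"
        self.pending.append((cache, path))
        return "partially optimized"

    def finish_background_rewrites(self):
        # Once rewriting completes, purge each page that was served early.
        for cache, path in self.pending:
            self.finished.add(path)
            cache.purge(path)
        self.pending.clear()


origin = Origin()
cache = DownstreamCache(origin)
first = cache.get("/index.html")    # miss: origin serves partial HTML
second = cache.get("/index.html")   # hit: partial HTML from the cache
origin.finish_background_rewrites() # origin purges the cached entry
third = cache.get("/index.html")    # miss again: better HTML is cached
print(first, second, third, sep=" | ")
```

The purge goes to the specific cache object that made the request, which mirrors the one-to-one recommendation: with several caches in front of one origin, the origin would need to know which of them holds the stale copy.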