The Memory-Mapped Cache: Zero-Copy Serving Between nginx and the Worker
In mod_pagespeed 1.x, a request for /styles/site.css or /photo.jpg paid for its own optimization. The web server ran the rewrite in-flight, on the request thread, and only then sent bytes to the client. The work was cached afterwards, but the architecture meant transformation latency sat on the critical path of real requests, in every web server process, every time the cache was cold.
ModPageSpeed 2.0 takes that work off the request path entirely. The optimizing process and the serving process are separate, and they meet at a single memory-mapped cache: one disk file that nginx and the worker both map into their respective address spaces. When a request hits a cached variant, nginx hands the client a buffer that points straight at the mapped pages. No copy, no re-parse. The bytes the worker wrote are the bytes nginx serves.
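A minimal sketch of the property the shared cache relies on, under stated assumptions: two `MAP_SHARED` mappings of the same file see each other's writes because both are backed by the same page-cache pages. To stay self-contained, it maps one temporary file twice inside a single process rather than in nginx and a worker. The function names and the file path are hypothetical, and this is not Cyclone's code.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstdlib>
#include <cstring>
#include <string>
#include <cassert>

// Hypothetical helper: map `len` bytes of `fd` read-write with MAP_SHARED.
static char* MapShared(int fd, size_t len) {
  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  return p == MAP_FAILED ? nullptr : static_cast<char*>(p);
}

// One mapping plays the worker (writer), the other plays nginx (reader).
// Returns what the reader mapping sees after the writer stores `payload`.
std::string WriteThenReadThroughSeparateMappings(const std::string& payload) {
  char path[] = "/tmp/cyclone-demo-XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return {};
  unlink(path);  // the file stays alive until the fd and mappings are gone
  const size_t len = 4096;
  if (ftruncate(fd, static_cast<off_t>(len)) != 0) {
    close(fd);
    return {};
  }
  char* writer = MapShared(fd, len);
  char* reader = MapShared(fd, len);  // a second, independent mapping
  std::string seen;
  if (writer != nullptr && reader != nullptr && payload.size() < len) {
    std::memcpy(writer, payload.data(), payload.size());  // "worker" writes a variant
    // No read(), no flush, no re-open: the bytes are already visible here.
    // They are copied into a string only so the caller can inspect them.
    seen.assign(reader, payload.size());
  }
  if (writer != nullptr) munmap(writer, len);
  if (reader != nullptr) munmap(reader, len);
  close(fd);
  return seen;
}
```

Across processes the behavior is the same as long as each side maps the file with `MAP_SHARED`; with `MAP_PRIVATE`, the writer would get copy-on-write pages that the reader never sees.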
Two processes, one memory-mapped cache
The 2.0 runtime is three cooperating pieces, and the cache is the one they all share:
- A thin C++ nginx module (the interceptor) that classifies requests, serves cached variants, and proxies misses to the origin.
- A standalone factory worker process running a libuv event loop. It receives notifications from nginx, reads original content from cache, optimizes it, and writes variant alternates back.
- Cyclone, a memory-mapped disk cache shared between the two.
Cyclone stores everything in one memory-mapped volume file rather than a directory tree. Both nginx and the worker map that same file. The consequence is the part that matters for serving: writes from either process are immediately visible to the other, because both open the cache with multi-process sharing enabled. The worker finishes encoding a WebP variant, writes it into the mapped region, and the next nginx request can read it without any handoff, flush, or re-open. There is no message that ships content between the two processes. The IPC notification carries identifying metadata — the URL, hostname and scheme, a content type, and the 32-bit capability mask — not the bytes. The content lives in the cache; the socket only points at it.
This is a deliberate inversion of the 1.x data flow. In 1.x, the rewritten bytes existed inside the web server’s request context and had to be threaded back into the response. In 2.0, the bytes are written once to a shared region and read in place by whoever needs them.
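The point that the socket carries identity rather than content can be made concrete with a hypothetical message layout. The `Notification` struct, the length-prefixed little-endian encoding, and the `Encode`/`Decode` names are assumptions for illustration; the source only says which fields the notification contains.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>
#include <cassert>

// Hypothetical nginx -> worker notification: it identifies the cached
// original and the client's capabilities; it never carries the body.
struct Notification {
  std::string url;
  std::string hostname;
  std::string scheme;
  std::string content_type;
  uint32_t capability_mask = 0;
};

namespace {
void PutU32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}
void PutStr(std::vector<uint8_t>& out, const std::string& s) {
  PutU32(out, static_cast<uint32_t>(s.size()));  // 4-byte length prefix
  out.insert(out.end(), s.begin(), s.end());
}
bool GetU32(const std::vector<uint8_t>& in, size_t& pos, uint32_t& v) {
  if (in.size() - pos < 4) return false;
  v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
  pos += 4;
  return true;
}
bool GetStr(const std::vector<uint8_t>& in, size_t& pos, std::string& s) {
  uint32_t len = 0;
  if (!GetU32(in, pos, len) || in.size() - pos < len) return false;
  s.assign(reinterpret_cast<const char*>(in.data()) + pos, len);
  pos += len;
  return true;
}
}  // namespace

std::vector<uint8_t> Encode(const Notification& n) {
  std::vector<uint8_t> out;
  PutStr(out, n.url);
  PutStr(out, n.hostname);
  PutStr(out, n.scheme);
  PutStr(out, n.content_type);
  PutU32(out, n.capability_mask);
  return out;
}

// Rejects truncated or oversized messages instead of guessing.
std::optional<Notification> Decode(const std::vector<uint8_t>& in) {
  Notification n;
  size_t pos = 0;
  if (!GetStr(in, pos, n.url) || !GetStr(in, pos, n.hostname) ||
      !GetStr(in, pos, n.scheme) || !GetStr(in, pos, n.content_type) ||
      !GetU32(in, pos, n.capability_mask) || pos != in.size()) {
    return std::nullopt;
  }
  return n;
}
```

However large /photo.jpg is, the message here stays a few dozen bytes; the worker uses those fields to find the original in the mapped cache.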
What “zero-copy” actually means on a cache hit
The interceptor’s serving path is short. On a request, it classifies the client into a CapabilityMask from the request headers, composes a CacheKey from the URL, hostname and scheme, and asks the cache for the best alternate for that mask. The 32-bit mask encodes the dimensions a variant can differ on: image format, viewport class, pixel density, Save-Data, and transfer encoding. Because a single URL can carry many alternates, the selector scores every stored alternate against the mask, and falls back to the original if no optimized variant exists yet.
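The classify-then-select step can be sketched as follows. The bit positions, the header matching (plain substring checks), and the scoring rule (an alternate is eligible only if the client has every capability it requires, and the most specific eligible alternate wins) are all assumptions; the source states which dimensions the 32-bit mask covers, not how they are laid out or weighted. Viewport class and pixel density are left out to keep it short.

```cpp
#include <bitset>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>
#include <cassert>

// Hypothetical bit layout, not the real CapabilityMask layout.
enum : uint32_t {
  kWebP = 1u << 0,
  kAvif = 1u << 1,
  kSaveData = 1u << 8,
  kGzip = 1u << 12,
  kBrotli = 1u << 13,
};

using Headers = std::map<std::string, std::string>;  // lowercase header names

static bool HeaderContains(const Headers& h, const std::string& name,
                           const std::string& token) {
  auto it = h.find(name);
  return it != h.end() && it->second.find(token) != std::string::npos;
}

// Request headers -> capability mask.
uint32_t ClassifyRequest(const Headers& h) {
  uint32_t mask = 0;
  if (HeaderContains(h, "accept", "image/webp")) mask |= kWebP;
  if (HeaderContains(h, "accept", "image/avif")) mask |= kAvif;
  if (HeaderContains(h, "save-data", "on")) mask |= kSaveData;
  if (HeaderContains(h, "accept-encoding", "gzip")) mask |= kGzip;
  if (HeaderContains(h, "accept-encoding", "br")) mask |= kBrotli;
  return mask;
}

// A stored alternate and the capabilities it requires from the client.
struct Alternate {
  uint32_t mask;
  std::string label;
};

// Scores every alternate in one pass; nullopt means "serve the original".
std::optional<size_t> SelectAlternate(const std::vector<Alternate>& alts,
                                      uint32_t client) {
  std::optional<size_t> best;
  size_t best_score = 0;
  for (size_t i = 0; i < alts.size(); ++i) {
    if ((alts[i].mask & ~client) != 0) continue;  // needs something the client lacks
    size_t score = std::bitset<32>(alts[i].mask).count();  // more specific wins
    if (!best || score > best_score) {
      best = i;
      best_score = score;
    }
  }
  return best;
}
```

An empty result is the fallback case from the paragraph above: no optimized variant fits yet, so the original is served.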
When a lookup succeeds, the interceptor does not allocate a response buffer and memcpy the cached entry into it. It constructs an ngx_buf_t that points directly at the mmap’d data, sets Content-Type from the variant’s stored metadata and Content-Length from the size of the mapped content, and registers a cleanup handler to release the cache read handle when the request completes. The kernel already has those pages mapped; nginx is just describing a window into them.
That removes two costs from the hit path that 1.x could not avoid. There is no transformation — the optimization happened earlier, in the worker, off this request. And there is no copy of the cached bytes from a cache buffer into a response buffer, because the response buffer is the cache. A hit is bookkeeping plus a pointer.
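To show what "a window into the pages" means without pulling in nginx headers, here is a standalone analogue. `ResponseBuf` mirrors a few real `ngx_buf_t` fields (`pos`, `last`, `memory`, `last_buf`), and the cleanup list stands in for handlers registered with `ngx_pool_cleanup_add`. `ReadHandle`, `Request`, and `ServeHit` are hypothetical, and how the interceptor actually wires these pieces together is not shown in the source.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>
#include <cassert>

// Hypothetical cache read handle: pins a mapped entry while a response uses it.
struct ReadHandle {
  const uint8_t* data = nullptr;
  size_t size = 0;
  std::string content_type;       // from the variant's stored metadata
  std::function<void()> release;  // unpins the entry
};

// Simplified analogue of ngx_buf_t: a view [pos, last) over memory it does not own.
struct ResponseBuf {
  const uint8_t* pos = nullptr;
  const uint8_t* last = nullptr;
  bool memory = false;    // read-only memory: never modified or freed by the server
  bool last_buf = false;  // final buffer of the response
};

// Simplified analogue of a request and its pool cleanup handlers.
struct Request {
  std::string content_type;
  size_t content_length = 0;
  std::vector<std::function<void()>> cleanups;
  void Finish() {  // request completed: run cleanups once
    for (auto& fn : cleanups) fn();
    cleanups.clear();
  }
};

// Builds the hit response without copying a single body byte.
ResponseBuf ServeHit(Request& r, ReadHandle h) {
  ResponseBuf b;
  b.pos = h.data;  // points straight at the mapped bytes
  b.last = h.data + h.size;
  b.memory = true;
  b.last_buf = true;
  r.content_type = h.content_type;
  r.content_length = h.size;  // size of the mapped content
  if (h.release) r.cleanups.push_back(std::move(h.release));
  return b;
}
```

The thing to notice is that `b.pos` is the address of the mapped bytes themselves, and the read handle stays pinned until `Finish` runs the cleanup.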
The first request for a cold URL still gets the original response, marked X-PageSpeed: MISS. The worker optimizes asynchronously, and subsequent requests get X-PageSpeed: HIT with no optimization work on the request path. The transformation latency that 1.x charged to requests whenever the cache was cold, 2.0 charges once, to a background process, and never to the client.
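The cold-to-warm sequence can be modeled in a few lines. This is a hypothetical single-process model: maps stand in for the shared cache, a queue stands in for the fire-and-forget notification, and `DrainWorkerQueue` stands in for the worker process. The `optimized(...)` body is a placeholder for real encoding.

```cpp
#include <deque>
#include <map>
#include <string>
#include <cassert>

struct Response {
  std::string x_pagespeed;
  std::string body;
};

class ColdWarmModel {
 public:
  // Interceptor side: serve a variant if one exists, otherwise the original.
  Response Handle(const std::string& url, const std::string& origin_body) {
    auto it = variants_.find(url);
    if (it != variants_.end()) return {"HIT", it->second};
    // Record the original once and notify the worker; never wait for it.
    if (originals_.emplace(url, origin_body).second) pending_.push_back(url);
    return {"MISS", origin_body};
  }

  // Worker side: optimize everything that was announced, off the request path.
  void DrainWorkerQueue() {
    while (!pending_.empty()) {
      std::string url = pending_.front();
      pending_.pop_front();
      variants_[url] = "optimized(" + originals_[url] + ")";
    }
  }

 private:
  std::map<std::string, std::string> originals_;
  std::map<std::string, std::string> variants_;
  std::deque<std::string> pending_;
};
```

Until the worker has run, repeated requests keep getting the original with MISS; once it has, the same URL answers HIT from the stored variant.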
Variants, metadata, and why a directory cache could not do this
The 1.x cache stored each rewritten resource as a separate entry in a directory hierarchy. That model is fine for “one input, one output,” but it does not express “one URL, several alternates chosen per request.” 2.0 needs the latter: the same /photo.jpg may resolve to a WebP variant for a browser that advertises it, or an optimized JPEG as the fallback, each potentially at different viewport sizes and pixel densities. If a browser advertises a newer format and the cache holds a matching alternate, the negotiation picks that alternate.
Cyclone keeps all of those as alternates of one URL, and each alternate carries its own metadata inside the cache: the capability mask, content type, the origin’s cache-control fields, the SSIMULACRA2 perceptual score, content class, ETag, and Last-Modified. That metadata is what lets nginx serve a variant correctly without re-deriving anything: it reads the stored Content-Type rather than sniffing the bytes, and the origin’s cache-control fields are already recorded next to them. The image format negotiation that 2.0 does at request time (reading Accept, serving the best available bytes from the original URL, no .webp extensions or JavaScript detection) only works because the alternates and their metadata sit side by side in the mapped file, addressable by key.
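A sketch of how stored metadata can drive a hit response. The `AlternateMeta` fields follow the list above, but their types, the `HeadersFromMeta` and `FindHeader` helpers, and the choice to emit the stored cache-control, ETag, and Last-Modified on the variant as-is are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>
#include <cassert>

// Hypothetical per-alternate metadata record.
struct AlternateMeta {
  uint32_t capability_mask = 0;
  std::string content_type;
  std::string cache_control;   // origin's cache-control, stored verbatim
  double ssimulacra2 = 0.0;    // perceptual score of this variant
  std::string content_class;   // illustrative values, e.g. "photo"
  std::string etag;
  std::string last_modified;
};

using HeaderList = std::vector<std::pair<std::string, std::string>>;

// Hit-response headers built only from stored metadata: no sniffing, no re-parse.
HeaderList HeadersFromMeta(const AlternateMeta& m, size_t body_size) {
  HeaderList h{
      {"Content-Type", m.content_type},
      {"Content-Length", std::to_string(body_size)},
      {"X-PageSpeed", "HIT"},
  };
  if (!m.cache_control.empty()) h.emplace_back("Cache-Control", m.cache_control);
  if (!m.etag.empty()) h.emplace_back("ETag", m.etag);
  if (!m.last_modified.empty()) h.emplace_back("Last-Modified", m.last_modified);
  // ssimulacra2 and content_class are not sent to the client in this sketch.
  return h;
}

// Returns the first value for `name`, or an empty string if absent.
std::string FindHeader(const HeaderList& h, const std::string& name) {
  for (const auto& [key, value] : h) {
    if (key == name) return value;
  }
  return {};
}
```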
The Cyclone format is not compatible with the 1.x file cache, and there is no migration path. The cache starts cold and warms as traffic flows through it — which is consistent with the rest of the model, since the first request through any URL is the one that records the original and triggers the worker.
Related
- Migrating from mod_pagespeed 1.x to 2.0
- Fire-and-forget worker IPC
- Why I rebuilt mod_pagespeed
- Reduce TTFB at the server layer
- Run ModPageSpeed with Docker Compose
- How async rewriting works
- Cache modes
If you want to see the shared cache in action, the fastest path is the two-container Docker Compose stack: an nginx interceptor and a factory worker mounting the same volume for the cache file and the worker socket. Grab the build from /download/, pull the images, and run them; the cache modes documentation explains how the worker and nginx coordinate around that one mapped file. Production use needs a commercial license, but the software never locks you out: unlicensed deployments keep optimizing and serving, and simply report that they are unlicensed. So you can stand up the full nginx-plus-worker topology, watch a request go MISS then HIT, and confirm the zero-copy path works on your own traffic before you decide anything.
mod_pagespeed and PageSpeed are trademarks of Google LLC; We-Amp B.V. is not affiliated with, endorsed by, or sponsored by Google, and maintains the open-source mod_pagespeed project independently.