A slow Time to First Byte is the kind of audit finding that sends people straight to a CDN. Sometimes that is the right move. More often the first byte is slow for reasons a CDN does not touch, and you can fix most of them in your nginx config before you add a vendor to the request path. This post covers those server-layer changes, and where the server layer stops helping.

What TTFB is, and how PSI flags it

TTFB is the time from the start of the request to the first byte of the response arriving at the client. It covers any redirects, DNS resolution, the TCP handshake, the TLS handshake, the request travelling to the server, the server producing a response, and the first byte travelling back.

TTFB is not itself a Core Web Vital. The three are LCP, INP, and CLS. But TTFB sits upstream of LCP: the browser cannot start rendering before the first byte lands, so your LCP floor is your TTFB. If TTFB is 800 ms, nothing paints before 800 ms no matter how small your hero image is.

Two thresholds matter, and people mix them up:

The lab audit. PageSpeed Insights and Lighthouse run a “Reduce initial server response time” audit. It flags red above 600 ms, measured under Lighthouse’s throttled lab conditions. That is the warning that prompts most people to read this kind of post.
The field metric. web.dev treats real-user TTFB as good at 0.8 s or less and poor above 1.8 s, at the 75th percentile across the trailing 28 days. This is the number Chrome’s CrUX dataset and Search Console report, and it is the one that actually correlates with your LCP in the wild. The 2025 Web Almanac found only 44% of mobile pages hit “good” TTFB, so this is not a rare problem.

Fix the field number and the lab audit follows.

Split the number before you spend money

A single TTFB figure hides three different problems with three different owners. Pull them apart first:

curl -s -o /dev/null \
  -w 'dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer}\n' \
  'https://<host>/'

Origin compute is roughly time_starttransfer minus time_appconnect. That gap is one network round trip for the request plus the server-think time: your application, your database, your template render. Hit an uncached URL to see it.
Network is the connection phases: DNS, TCP, TLS. A CDN’s edge PoPs shorten these for distant users, which is the legitimate case for reaching for one.
Cache is the difference between a cached and an uncached request. Hit the same URL twice and compare.

Run the timing against both a cached and an uncached path. If the cached request is still slow, the cost is transport, and transport lives in your nginx config. If the uncached request is slow but the cached one is fast, your origin compute is the long pole, and no amount of caching headers fixes a query that takes 900 ms to run.
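The same split can be read from the server side. A minimal sketch, assuming nginx is acting as a reverse proxy: the upstream timing variables record how long the backend took to accept the connection and to send its response header, and $upstream_cache_status shows whether the cache answered. The format name ttfb_split and the log path are illustrative, not part of any existing setup.

```nginx
# http {} context. "ttfb_split" and the log path are illustrative names.
log_format ttfb_split '$remote_addr "$request" $status '
                      'cache=$upstream_cache_status '
                      'req=$request_time '
                      'up_connect=$upstream_connect_time '
                      'up_header=$upstream_header_time '
                      'up_total=$upstream_response_time';

access_log /var/log/nginx/ttfb.log ttfb_split;
```

up_header is the closest server-side stand-in for origin compute. On a cache hit the upstream fields log as "-", so a slow request with cache=HIT points at transport rather than the backend.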

The server-layer levers

These are the changes that live entirely in nginx and cost nothing per request.

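Keep-alive to the upstream. By default nginx opens a fresh connection to your backend for every request. An upstream block with keepalive 32;, plus proxy_http_version 1.1; and proxy_set_header Connection ""; in the proxied location, reuses connections, which removes a TCP (and possibly TLS) handshake from the server-think portion of every dynamic response. Without the cleared Connection header nginx still sends "Connection: close" and the backend drops the connection after each response. This is one of the most commonly forgotten settings on reverse-proxy setups.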
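A minimal sketch of that upstream keep-alive setup. The upstream name app_backend and the address 127.0.0.1:8080 are placeholders for illustration; substitute your own backend.

```nginx
# "app_backend" and 127.0.0.1:8080 are placeholder values.
upstream app_backend {
    server 127.0.0.1:8080;
    keepalive 32;                        # idle connections cached per worker process
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;          # upstream keep-alive needs HTTP/1.1
        proxy_set_header Connection "";  # clear the default "Connection: close"
        proxy_set_header Host $host;
    }
}
```

The keepalive value caps idle connections per worker, not total connections, so it does not need to match your peak concurrency.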

HTTP/2 or HTTP/3. On HTTP/1.1, the client serializes requests across a handful of connections and the document can queue behind earlier assets. HTTP/2 multiplexes them over one connection. Use the current directive form (listen 443 ssl; plus http2 on;) rather than the deprecated listen ... http2. HTTP/3 over QUIC removes head-of-line blocking at the transport level and adds 0-RTT resumption for returning visitors; nginx has shipped QUIC since 1.25.0, and it carried over into the 1.26 and later stable branches (the current stable branch is 1.28.x). Enable it with a quic listener and an Alt-Svc header advertising h3. None of this helps the first byte of a single isolated request much, but it cuts the wait when the document competes with other in-flight requests, which is the real-world case.
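A sketch of the listener setup described above, assuming an nginx build that includes the HTTP/3 module (--with-http_v3_module) and a firewall that allows UDP 443. The server name and certificate paths are placeholders.

```nginx
server {
    listen 443 ssl;              # TCP listener: HTTP/1.1 and HTTP/2
    listen 443 quic reuseport;   # UDP listener: HTTP/3 over QUIC
    http2 on;                    # directive form, nginx 1.25.1 and later

    server_name example.com;                                  # placeholder
    ssl_certificate     /etc/ssl/example.com/fullchain.pem;   # placeholder
    ssl_certificate_key /etc/ssl/example.com/privkey.pem;     # placeholder
    ssl_protocols TLSv1.2 TLSv1.3;                            # QUIC requires TLS 1.3

    # Advertise HTTP/3 on UDP 443 so clients can upgrade on a later request.
    add_header Alt-Svc 'h3=":443"; ma=86400';
}
```

Browsers typically make their first request over TCP and switch to HTTP/3 only after seeing the Alt-Svc header, so the benefit shows up for repeat requests rather than the very first one.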

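TLS session resumption. A full TLS handshake costs two round trips on TLS 1.2 and one on TLS 1.3, paid on every cold connection. ssl_session_cache shared:SSL:10m; with ssl_session_tickets on; (tickets are on by default) lets returning clients resume a previous session instead: on TLS 1.2 that saves a round trip, and on TLS 1.3 it skips the certificate exchange and signature verification. For users far from your origin, this is often a larger TTFB win than anything you do to the application.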
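A sketch of the resumption settings. The one-hour timeout is an assumed value for illustration; the nginx default is five minutes.

```nginx
# http {} or server {} context
ssl_protocols       TLSv1.2 TLSv1.3;
ssl_session_cache   shared:SSL:10m;  # shared by all workers; about 4000 sessions per MB
ssl_session_timeout 1h;              # assumed value; the default is 5m
ssl_session_tickets on;              # the default, shown here for clarity
```

A shared cache matters because a returning client may land on a different worker process than the one that handled its first handshake.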

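Pre-compressed assets. gzip on; compresses on the fly, burning 5–30 ms of CPU per response and, because that work runs inside the worker process, delaying other requests on the same worker under load. Pre-compress your static assets at build time and serve them with gzip_static on; and brotli_static on;, which hands nginx a .gz or .br file straight off disk. gzip_static needs the gzip_static module compiled in (most distribution packages include it), and brotli_static comes from the third-party ngx_brotli module rather than stock nginx. Brotli at a high level produces smaller files than gzip, and because you compressed once at build time you can afford the higher level.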
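A sketch of serving pre-compressed files, assuming the build step has already written a .gz and .br file next to each asset and that ngx_brotli is loaded. The /assets/ path and the root directory are placeholders.

```nginx
location /assets/ {                # placeholder path
    root /var/www/site;            # placeholder document root
    gzip_static   on;              # serve app.css.gz when the client accepts gzip
    brotli_static on;              # serve app.css.br; requires third-party ngx_brotli
    gzip_vary     on;              # emit "Vary: Accept-Encoding" for caches in front
}
```

If a pre-compressed file is missing, nginx falls back to the uncompressed original, so the setting is safe to enable before every asset has been processed.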

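A sane upstream / FastCGI cache. For dynamic responses that stay fresh for even a few seconds, proxy_cache or fastcgi_cache can take the slow origin render off the request path. proxy_cache_use_stale updating; lets nginx serve the stale copy while an entry is being refreshed, and proxy_cache_background_update on; moves that refresh into a background subrequest so that no client waits on it. The FastCGI equivalents are fastcgi_cache_use_stale and fastcgi_cache_background_update. A short s-maxage with stale-while-revalidate does the same thing for any CDN sitting in front.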
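A sketch of a short-lived proxy cache with background refresh. The zone name app_cache, the cache path, the sizes, the five-second freshness window, and the backend address are all assumptions for illustration.

```nginx
# http {} context. Zone name, path, and sizes are illustrative.
proxy_cache_path /var/cache/nginx/app levels=1:2 keys_zone=app_cache:10m
                 max_size=1g inactive=10m use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;           # placeholder backend
        proxy_cache app_cache;
        proxy_cache_valid 200 5s;                   # assumed freshness window
        proxy_cache_use_stale updating error timeout;
        proxy_cache_background_update on;           # refresh stale entries off the request path
        proxy_cache_lock on;                        # collapse concurrent misses into one fetch
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

Cache-Control and Expires headers from the backend take priority over proxy_cache_valid, and responses that set cookies are not cached by default, so check X-Cache-Status before assuming the cache is doing anything.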

Serve optimized variants from cache, don’t recompute them

There is a quieter TTFB cost most people never measure: doing optimization work in the request path. If your stack re-encodes an image, minifies CSS, or inlines critical CSS on the fly per request, that work lands inside TTFB for every uncached response.

This is the part ModPageSpeed 2.0 is built around. Optimization happens out of the request path. On a cache miss, nginx serves the original immediately and sends a notification to the worker; the worker reads the original, optimizes it, and writes the result into the Cyclone cache. The next matching request is a cache hit served via zero-copy mmap, with no re-encoding on the request path.

The cache is variant-aware, which matters for TTFB specifically. A naive cache that keys only on the URL either serves the wrong format to some clients or recomputes per client. Cyclone stores multiple variants under one URL key (WebP, AVIF, and the optimized original, across viewport, density, and Save-Data), each tagged with the client capability it matches. A request gets the best-fit variant from cache with no recomputation, and falls back to the closest available variant or the original if its exact match is not warm yet. The optimization cost is paid once, asynchronously, instead of landing inside TTFB on every request. This is the same mechanism behind the LCP work in our nginx LCP guide; the TTFB benefit comes from the same design.

What the server layer does not fix

None of the levers above make a slow application fast.

If an uncached request over a kept-alive upstream and a resumed TLS session still shows 700 ms between time_appconnect and time_starttransfer, the time is going into your code or your database. An N+1 query, a cold cache in your ORM, a synchronous call to a third-party API, a render that walks a 5,000-item collection: a reverse proxy cannot speed any of that up. It can cache the result so you pay the cost less often, but the uncached path stays as slow as the backend makes it. Profile the application and fix the query first, then judge whether the residual network distance is worth a CDN.

A CDN’s job is the network phase: shortening DNS, TCP, and TLS for users far from your origin, and absorbing traffic. It does not fix origin compute, and it does not give you variant-aware optimization unless you pay for that feature separately. A working order: split the number, fix the transport in nginx, move optimization out of the request path, fix the application if the uncached path is still slow, and reach for the CDN last for the network distance it addresses.

How to reduce TTFB: server-layer wins before you reach for a CDN

