Browser Analysis

How ModPageSpeed 2.0 uses headless Chrome to extract critical CSS, detect LCP, and validate optimizations.

ModPageSpeed 2.0 can use headless Chrome to analyze pages with real browser rendering instead of relying solely on heuristics. Browser analysis extracts critical CSS from actual CSS Coverage data, detects the true Largest Contentful Paint element, measures image dimensions, and validates that optimizations do not cause visual regressions.

Browser analysis is strictly additive. Every failure falls back to the heuristic path. Pages still get optimized — they just use the faster, less precise heuristic pipeline instead.

Enabling Browser Analysis

Browser analysis is off by default. Enable it with the --enable-browser-analysis flag and ensure Chrome (or chrome-headless-shell) is available in the container:

factory_worker \
  --cache-path /data/cache.vol \
  --enable-browser-analysis \
  --chrome-binary /usr/bin/chrome-headless-shell

The Docker release images (modpagespeed/worker) ship with Chromium pre-installed. No additional setup required.

Architecture

Worker (libuv event loop)
  |
  +-- BrowserAnalysisManager
        |
        +-- AnalysisQueue        -- bounded priority queue with dedup
        |
        +-- ChromeProcess        -- spawn/recycle/RSS monitor
        |     |
        |     +-- CdpClient      -- JSON-RPC over pipe (FD 3/4)
        |
        +-- BrowserCssExtractor  -- CSS Coverage API -> critical CSS
        +-- PageAnalyzer         -- LCP, fold, CLS, image dims
        +-- UnusedCssRemover     -- dead rule removal
        +-- VisualRegressionGate -- PNG pixel diff validation
        +-- FontGlyphScanner     -- code point scanning + @font-face
        +-- ScriptCoverageAnalyzer -- Profiler coverage + deferral

BrowserAnalysisManager owns the Chrome lifecycle, analysis queue, and the CDP pipeline. It runs on the main libuv event loop (where CDP must operate). Worker thread pool threads enqueue analysis requests via uv_async_send().

CDP Pipe Transport

Chrome DevTools Protocol communication happens over --remote-debugging-pipe (file descriptors 3 and 4), not over a WebSocket. Messages are null-byte delimited JSON-RPC. This avoids the overhead and port management of the WebSocket debugging protocol.
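The framing is simple enough to sketch. Below is a minimal Python illustration of the null-delimited transport; the helper names are hypothetical (the real implementation is C++ on libuv), but the envelope fields (`id`, `method`, `params`) match the DevTools Protocol:

```python
import json

def encode_cdp(command_id, method, params=None):
    # Frame one CDP command for the pipe transport: JSON envelope + NUL byte.
    msg = {"id": command_id, "method": method, "params": params or {}}
    return json.dumps(msg).encode("utf-8") + b"\0"

def decode_cdp(buffer):
    # Split a read buffer on NUL delimiters. The last segment is an
    # unconsumed partial frame still waiting for more bytes.
    *frames, tail = buffer.split(b"\0")
    return [json.loads(f) for f in frames if f], tail
```

A read loop would append pipe data to a buffer, call `decode_cdp`, dispatch the complete messages, and keep the tail for the next read.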

Design decisions:

  • Per-command uv_timer_t timeout (default 30s)
  • Large CDP messages (>64KB) parsed off the event loop via uv_queue_work()
  • CancelAll() on pipe EOF resolves all pending callbacks

How It Works

  1. The worker thread runs HtmlScanner::Scan() to extract page structure
  2. TemplateDetector::HashStructure() computes an FNV-1a hash of the DOM structure, identifying the page template
  3. LookupProfile() checks the cache for an existing OptimizationProfile for this template hash
  4. Profile found: browser-validated critical CSS and LCP data are used instead of heuristics
  5. No profile: EnqueueAnalysis() sends the request to the main event loop via uv_async_send()
  6. DrainQueue() dequeues items and runs the analysis pipeline across three viewports: Mobile (375x667), Tablet (768x1024), Desktop (1440x900)
  7. The resulting OptimizationProfile is stored in the cache with SentinelId::kBrowserProfile
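Step 2's template hash can be sketched as follows. FNV-1a itself is standard; the exact structure string fed into it is an implementation detail, so this sketch assumes a joined tag path:

```python
FNV_OFFSET = 0xcbf29ce484222325  # 64-bit FNV-1a offset basis
FNV_PRIME = 0x100000001b3        # 64-bit FNV prime

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

def hash_structure(tag_path):
    # Hash only the element structure, not text content, so pages sharing
    # a template (e.g. all product pages) map to the same profile.
    return fnv1a_64("/".join(tag_path).encode("utf-8"))
```

Because the hash ignores text content, one browser analysis run can serve every page rendered from the same template.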

CSS Cache Inlining

Before passing HTML to Chrome, the worker resolves <link rel="stylesheet"> tags against the Cyclone cache and injects <style> blocks into the HTML. This enables Chrome’s CSS Coverage API to compute real coverage percentages instead of returning 0% for external stylesheets.

Guards prevent abuse: 50 stylesheet cap, 2MB per-stylesheet cap, 10MB total HTML cap.
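The inlining step with its three guards can be sketched like this. The regex and cache interface are illustrative (and assume `rel` precedes `href` in the tag); only the caps come from the text above:

```python
import re

MAX_SHEETS = 50
MAX_SHEET_BYTES = 2 * 1024 * 1024   # 2MB per-stylesheet cap
MAX_HTML_BYTES = 10 * 1024 * 1024   # 10MB total HTML cap

# Simplified: assumes rel="stylesheet" appears before href in the tag.
LINK_RE = re.compile(r'<link[^>]*rel=["\']stylesheet["\'][^>]*href=["\']([^"\']+)["\'][^>]*>')

def inline_stylesheets(html: str, cache: dict) -> str:
    inlined = 0
    def replace(match):
        nonlocal inlined
        css = cache.get(match.group(1))
        if css is None or inlined >= MAX_SHEETS or len(css) > MAX_SHEET_BYTES:
            return match.group(0)  # leave the <link> untouched
        inlined += 1
        return "<style>" + css + "</style>"
    result = LINK_RE.sub(replace, html)
    # If inlining would blow past the total cap, keep the original HTML.
    return result if len(result) <= MAX_HTML_BYTES else html
```

Stylesheets missing from the cache are simply left as `<link>` tags; Coverage then reports 0% for those, which is why the troubleshooting section below suggests checking `css_inlining_stylesheets_cached`.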

Analysis Components

BrowserCssExtractor -- Uses Chrome’s CSS Coverage API to identify which CSS rules are actually used on each viewport. Produces per-viewport critical CSS.
PageAnalyzer -- Detects the real LCP element, measures fold position, computes CLS, and reads rendered image dimensions.
UnusedCssRemover -- Takes Coverage data and removes dead rules from stylesheets.
VisualRegressionGate -- Captures before/after screenshots and compares them pixel-by-pixel. Blocks optimizations that cause visible regressions.
FontGlyphScanner -- Scans the DOM with TreeWalker for code points used on the page. Maps them to @font-face declarations for future subsetting.
ScriptCoverageAnalyzer -- Uses Chrome’s Profiler domain to measure JS code coverage. Identifies scripts safe to defer.

Script Coverage Analysis

When browser analysis is enabled, the ScriptCoverageAnalyzer component uses Chrome’s Profiler domain to measure JavaScript code coverage. This identifies scripts that are safe to defer, improving page load performance by reducing parser-blocking JavaScript.

How It Works

  1. The analyzer loads the page with JavaScript enabled (Profiler + Coverage APIs)
  2. Each external script’s coverage is measured during page load
  3. Scripts are classified into deferral categories based on coverage data and execution timing

Deferral Categories

kSafeToDefer -- Script has low main-thread impact; safe to add defer
kCandidateForAsync -- Script is independent; could use async instead
kAlreadyAsync -- Script already has async or defer attribute
kKeepSynchronous -- Script must execute synchronously (DOM-dependent, inline handlers)
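A sketch of how a classifier might map page signals onto these four categories. The signal names and their ordering are assumptions for illustration, not the actual implementation:

```python
from enum import Enum

class Deferral(Enum):
    SAFE_TO_DEFER = "kSafeToDefer"
    CANDIDATE_FOR_ASYNC = "kCandidateForAsync"
    ALREADY_ASYNC = "kAlreadyAsync"
    KEEP_SYNCHRONOUS = "kKeepSynchronous"

def classify(script: dict) -> Deferral:
    # Hypothetical per-script signals gathered from Profiler coverage
    # and DOM inspection during the analysis pass.
    if script.get("async") or script.get("defer"):
        return Deferral.ALREADY_ASYNC
    if script.get("writes_dom_during_parse") or script.get("inline_handlers"):
        return Deferral.KEEP_SYNCHRONOUS
    if script.get("independent"):  # nothing else depends on its execution order
        return Deferral.CANDIDATE_FOR_ASYNC
    return Deferral.SAFE_TO_DEFER
```

The key ordering property is that safety checks (already deferred, DOM-dependent) run before any coverage-based decision, so a script is never deferred when deferral could change behavior.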

SSRF Defense

Script analysis enables JavaScript execution in Chrome (required for accurate coverage measurement). The other three SSRF defense layers remain active: network offline mode, Fetch interception, and DNS-level blocking. Chrome cannot make outbound connections even with JavaScript enabled.

Configuration

--no-browser-script-analysis -- Disable script coverage analysis (enabled by default)

Script analysis results feed into the optimization policy engine, which decides whether to enable script deferral for each URL template.

Optimization Policy

The optimization policy engine computes per-template decisions about optional HTML transforms based on browser analysis data. It runs after profile generation and stores the policy alongside the optimization profile in cache.

Policy Fields

async_css_enabled -- Enable async loading for render-blocking stylesheets; set when average CSS coverage < 50%
script_deferral_enabled -- Enable the defer attribute on safe scripts; set when deferrable scripts are detected
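The two conditions translate directly into code. A minimal sketch, assuming coverage is expressed as a 0-1 ratio per analyzed viewport:

```python
def compute_policy(viewport_css_coverage, deferrable_script_count):
    # Average CSS coverage across the analyzed viewports
    # (mobile / tablet / desktop in the pipeline above).
    avg = sum(viewport_css_coverage) / len(viewport_css_coverage)
    return {
        "async_css_enabled": avg < 0.50,
        "script_deferral_enabled": deferrable_script_count > 0,
    }
```

The resulting dict would be stored alongside the optimization profile, so the decision is made once per template rather than per request.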

Stats Counters

policy.computed -- Total optimization policies computed
policy.async_css_enabled -- Times async CSS was enabled by policy
policy.script_deferral_enabled -- Times script deferral was enabled by policy

These counters appear in /v1/stats JSON, /v1/metrics Prometheus output, the management socket STATS command, and the web console metrics page.

Chrome Process Management

Lifecycle

  1. ChromeProcess::Start() spawns Chrome with headless flags and pipe transport
  2. CDP commands flow through CdpClient for page analysis
  3. After each page, IncrementPageCount() checks the recycle threshold
  4. At the recycle threshold, Stop() sends SIGTERM (then SIGKILL after 5s)
  5. A fresh Chrome process starts for the next batch
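The recycle bookkeeping in steps 3 and 4 reduces to a counter check. A minimal sketch (the method name mirrors the text above; everything else is assumed):

```python
class ChromeProcessCounter:
    # Hypothetical helper tracking pages served by one Chrome instance.
    def __init__(self, recycle_interval: int = 100):
        self.recycle_interval = recycle_interval
        self.page_count = 0

    def increment_page_count(self) -> bool:
        # Returns True when the process should be stopped and replaced.
        self.page_count += 1
        return self.page_count >= self.recycle_interval
```

In the real worker, a True result triggers Stop() (SIGTERM, then SIGKILL after 5s) and a fresh spawn before the next dequeued item.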

Launch Flags

Chrome is spawned with strict isolation flags:

  • --headless=new — new headless mode
  • --remote-debugging-pipe — FD 3/4 pipe transport
  • --disable-gpu — no GPU required
  • --no-sandbox — required in containers (Chrome must run with minimal container privileges)
  • --host-resolver-rules="MAP * ~NOTFOUND" — DNS-level SSRF block
  • --disable-dev-shm-usage — avoids /dev/shm exhaustion in containers
  • Various isolation flags (--disable-extensions, --disable-background-networking, --no-first-run, etc.)

RSS Monitoring

On Linux, the worker reads /proc/pid/status VmRSS every 5 seconds. When Chrome exceeds --chrome-max-memory (default 512MB), the worker stops it and starts a fresh instance. This prevents memory leaks from accumulating across hundreds of pages.
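Extracting VmRSS from /proc/<pid>/status is a simple text parse; a sketch of the check (the field layout follows the standard proc(5) format, the helper names are illustrative):

```python
def parse_vmrss_kb(status_text: str):
    # /proc/<pid>/status reports RSS as e.g. "VmRSS:\t  524288 kB".
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # process exited, or kernel without the field

def over_limit(status_text: str, max_mb: int = 512) -> bool:
    rss_kb = parse_vmrss_kb(status_text)
    return rss_kb is not None and rss_kb > max_mb * 1024
```

An `over_limit` result of True is what would trigger the stop-and-respawn described above.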

SSRF Defense (4 Layers)

Browser analysis operates on cached content, not live network requests. Four layers prevent Chrome from making any outbound connections:

  1. Network.emulateNetworkConditions({offline: true}) — blocks all network
  2. Fetch.enable + Fetch.requestPaused — intercept and fail all requests
  3. Emulation.setScriptExecutionDisabled({value: true}) — no JS execution (CSS extractor and visual regression gate)
  4. --host-resolver-rules="MAP * ~NOTFOUND" — Chrome-level DNS block

Font Glyph Scanner and Script Coverage Analyzer enable JavaScript (they need it for accurate analysis) but still enforce the other three layers.
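The three CDP-level layers can be expressed as a short command list sent when a session starts. The protocol method names below are real; the helper and parameter bundling are illustrative:

```python
def ssrf_lockdown_commands(disable_js: bool = True):
    # Layer 4 (--host-resolver-rules) is a launch flag, not a CDP command.
    cmds = [
        ("Network.enable", {}),
        ("Network.emulateNetworkConditions",
         {"offline": True, "latency": 0,
          "downloadThroughput": -1, "uploadThroughput": -1}),
        # Intercept every request so it can be failed if anything slips through.
        ("Fetch.enable", {"patterns": [{"urlPattern": "*"}]}),
    ]
    if disable_js:  # skipped by FontGlyphScanner and ScriptCoverageAnalyzer
        cmds.append(("Emulation.setScriptExecutionDisabled", {"value": True}))
    return cmds
```

On each `Fetch.requestPaused` event, the client would respond with `Fetch.failRequest`, so even a paused request never reaches the network.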

Configuration Flags

--enable-browser-analysis -- Enable browser analysis pipeline (default: off)
--chrome-binary -- Path to Chrome binary (default: /usr/bin/chrome-headless-shell)
--chrome-recycle-interval -- Pages per Chrome instance before restart (default: 100)
--chrome-page-timeout -- Per-page analysis timeout in ms (default: 60000)
--chrome-max-memory -- Max Chrome RSS in MB before forced restart (default: 512)
--chrome-startup-timeout -- Chrome startup timeout in ms (default: 10000)
--browser-queue-size -- Max queued analysis requests (default: 1000)
--browser-profile-ttl -- Profile cache lifetime in seconds (default: 86400, i.e. 24h)
--no-browser-critical-css -- Disable browser-based critical CSS (enabled by default)
--no-browser-lazy-loading -- Disable browser-based lazy-load decisions (enabled by default)
--no-browser-lcp-preload -- Disable browser-based LCP detection (enabled by default)
--no-browser-image-sizing -- Disable browser-based image dimensions (enabled by default)
--no-browser-script-analysis -- Disable script coverage analysis (enabled by default)

All flags are also hot-reloadable via PATCH /v1/config from the web console.

Monitoring

Stats Counters

Browser analysis stats appear in the management socket STATS and BROWSER-STATUS commands, and in the web console dashboard:

browser.profiles_generated -- Templates analyzed and cached
browser.profiles_used -- Cache hits on existing profiles
browser.analysis_errors -- Failures (timeout, Chrome crash, etc.)
browser.chrome_crashes -- Chrome process crashes
browser.queue_depth -- Current queue size
browser.scripts_analyzed -- Scripts evaluated by browser analysis
browser.scripts_deferrable -- Scripts identified as safe to defer
browser.css_inlining_attempted -- CSS inlining attempts
browser.css_inlining_stylesheets_cached -- Stylesheets found in cache
browser.css_inlining_bytes_inlined -- Total CSS bytes injected

Management Socket

The BROWSER-STATUS command on the management socket returns detailed JSON including Chrome state, queue contents, and per-profile statistics:

echo "BROWSER-STATUS" | socat - UNIX-CONNECT:/data/pagespeed.sock.mgmt

Error Handling

Every failure falls back to the heuristic path:

Chrome binary not found -- Heuristic only, no retry
Chrome fails to start -- Retry after 2 seconds
Chrome crashes mid-analysis -- Cancel current item, restart Chrome after 2s
Analysis timeout -- Skip item, process next in queue
Cache read failure -- Skip item
Queue full -- Head-drop oldest item
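The "queue full" head-drop, combined with the dedup mentioned in the architecture diagram, can be sketched as a small keyed FIFO. This is illustrative only; the real AnalysisQueue is a C++ bounded priority queue:

```python
from collections import OrderedDict

class AnalysisQueueSketch:
    # Bounded queue keyed by template hash: duplicate requests are ignored,
    # and when the queue is full the oldest entry is dropped from the head.
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.items = OrderedDict()  # template_hash -> request

    def enqueue(self, template_hash, request) -> bool:
        if template_hash in self.items:
            return False  # dedup: analysis for this template already pending
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # head-drop the oldest item
        self.items[template_hash] = request
        return True

    def dequeue(self):
        return self.items.popitem(last=False) if self.items else None
```

Head-dropping favors recent requests, which matters because a long-queued template may already have been superseded by fresher traffic.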

The worker logs all browser analysis errors at the warning level. Monitor them in the debug console (/logs) or via the management socket.

Troubleshooting

Chrome not available (503 errors in waterfall/diff)

The web console’s waterfall viewer and visual diff features return 503 when Chrome is not running. Check:

  1. Is --enable-browser-analysis set?
  2. Does the Chrome binary exist at the configured path?
  3. In Docker: is the worker image the full variant (not the minimal image)?

High chrome_crashes count

Frequent Chrome crashes usually indicate memory pressure:

  • Lower --chrome-recycle-interval to restart Chrome more often
  • Lower --chrome-max-memory to catch leaks earlier
  • Check container memory limits — Chrome needs at least 256MB headroom

Profiles not being generated

If profiles_generated stays at zero while traffic flows:

  1. Check queue_depth — if it stays at 0, analysis requests are not being enqueued. Verify --enable-browser-analysis is set.
  2. Check analysis_errors — errors during analysis prevent profile creation.
  3. Check css_inlining_stylesheets_cached — if external CSS is not yet cached, the worker waits for it before running browser analysis.

Visual Regression Gate false positives

The visual regression gate disables JavaScript (SSRF defense). Pages that rely on CSS-in-JS frameworks (styled-components, Emotion, etc.) will show differences because their styles are injected by JavaScript. This is a known limitation. The heuristic path optimizes these pages correctly.
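A minimal sketch of a pixel-by-pixel gate of this kind. The tolerance and threshold values are illustrative, not the shipped defaults:

```python
def pixel_diff_ratio(before, after, channel_tolerance=2):
    # Fraction of pixels whose RGB channels differ by more than a small
    # tolerance (ignores anti-aliasing noise). Frames are same-size lists
    # of (r, g, b) tuples decoded from the before/after screenshots.
    assert len(before) == len(after)
    changed = sum(
        1 for a, b in zip(before, after)
        if any(abs(x - y) > channel_tolerance for x, y in zip(a, b))
    )
    return changed / len(before)

def gate_passes(before, after, max_ratio=0.001) -> bool:
    return pixel_diff_ratio(before, after) <= max_ratio
```

With CSS-in-JS pages, the "after" frame renders without JS-injected styles, so the diff ratio exceeds any reasonable threshold and the gate correctly (if conservatively) blocks the optimization.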
