Browser Analysis

How ModPageSpeed 2.0 uses headless Chrome to extract critical CSS, detect LCP, and validate optimizations.

ModPageSpeed 2.0 can use headless Chrome to analyze pages with real browser rendering instead of relying solely on heuristics. Browser analysis extracts critical CSS from actual CSS Coverage data, detects the true Largest Contentful Paint element, measures image dimensions, and validates that optimizations do not cause visual regressions.

Browser analysis is strictly additive. Every failure falls back to the heuristic path. Pages still get optimized — they just use the faster, less precise heuristic pipeline instead.

Enabling Browser Analysis

Browser analysis is off by default. Enable it with the --enable-browser-analysis flag and ensure Chrome (or chrome-headless-shell) is available in the container:

factory_worker \
  --cache-path /data/cache.vol \
  --enable-browser-analysis \
  --chrome-binary /usr/bin/chrome-headless-shell

The Docker release images (modpagespeed/worker) ship with Chromium pre-installed. No additional setup required.

Architecture

Worker (libuv event loop)
  |
  +-- BrowserAnalysisManager
        |
        +-- AnalysisQueue        -- bounded priority queue with dedup
        |
        +-- ChromeProcess        -- spawn/recycle/RSS monitor
        |     |
        |     +-- CdpClient      -- JSON-RPC over pipe (FD 3/4)
        |
        +-- BrowserCssExtractor  -- CSS Coverage API -> critical CSS
        +-- PageAnalyzer         -- LCP, fold, CLS, image dims
        +-- UnusedCssRemover     -- dead rule removal
        +-- VisualRegressionGate -- PNG pixel diff validation
        +-- FontGlyphScanner     -- code point scanning + @font-face
        +-- ScriptCoverageAnalyzer -- Profiler coverage + deferral

BrowserAnalysisManager owns the Chrome lifecycle, analysis queue, and the CDP pipeline. It runs on the main libuv event loop (where CDP must operate). Worker thread pool threads enqueue analysis requests via uv_async_send().

CDP Pipe Transport

Chrome DevTools Protocol communication happens over --remote-debugging-pipe (file descriptors 3 and 4), not over a WebSocket. Messages are null-byte delimited JSON-RPC. This avoids the overhead and port management of the WebSocket debugging protocol.
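The framing is simple enough to sketch. Below is a minimal Python illustration of the null-delimited transport; the helper names are hypothetical (the real implementation is C++ on libuv), but the envelope fields (`id`, `method`, `params`) match the DevTools Protocol:

```python
import json

def encode_cdp(command_id, method, params=None):
    # Frame one CDP command for the pipe transport: JSON envelope + NUL byte.
    msg = {"id": command_id, "method": method, "params": params or {}}
    return json.dumps(msg).encode("utf-8") + b"\0"

def decode_cdp(buffer):
    # Split a read buffer on NUL delimiters. The last segment is an
    # unconsumed partial frame still waiting for more bytes.
    *frames, tail = buffer.split(b"\0")
    return [json.loads(f) for f in frames if f], tail
```

A read loop would append pipe data to a buffer, call `decode_cdp`, dispatch the complete messages, and keep the tail for the next read.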

Design decisions:

  • Per-command uv_timer_t timeout (default 30s)
  • Large CDP messages (>64KB) parsed off the event loop via uv_queue_work()
  • CancelAll() on pipe EOF resolves all pending callbacks

How It Works

  1. The worker thread runs HtmlScanner::Scan() to extract page structure
  2. TemplateDetector::HashStructure() computes an FNV-1a hash of the DOM structure, identifying the page template
  3. LookupProfile() checks the cache for an existing OptimizationProfile for this template hash
  4. Profile found: browser-validated critical CSS and LCP data are used instead of heuristics
  5. No profile: EnqueueAnalysis() sends the request to the main event loop via uv_async_send()
  6. DrainQueue() dequeues items and runs the analysis pipeline across three viewports: Mobile (375x667), Tablet (768x1024), Desktop (1440x900)
  7. The resulting OptimizationProfile is stored in the cache with SentinelId::kBrowserProfile
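Step 2's template hash can be sketched as follows. FNV-1a itself is standard; the exact structure string fed into it is an implementation detail, so this sketch assumes a joined tag path:

```python
FNV_OFFSET = 0xcbf29ce484222325  # 64-bit FNV-1a offset basis
FNV_PRIME = 0x100000001b3        # 64-bit FNV prime

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

def hash_structure(tag_path):
    # Hash only the element structure, not text content, so pages sharing
    # a template (e.g. all product pages) map to the same profile.
    return fnv1a_64("/".join(tag_path).encode("utf-8"))
```

Because the hash ignores text content, one browser analysis run can serve every page rendered from the same template.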

CSS Cache Inlining

Before passing HTML to Chrome, the worker resolves <link rel="stylesheet"> tags against the Cyclone cache and injects <style> blocks into the HTML. This enables Chrome’s CSS Coverage API to compute real coverage percentages instead of returning 0% for external stylesheets.

Guards prevent abuse: 50 stylesheet cap, 2MB per-stylesheet cap, 10MB total HTML cap.
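The inlining step with its three guards can be sketched like this. The regex and cache interface are illustrative (and assume `rel` precedes `href` in the tag); only the caps come from the text above:

```python
import re

MAX_SHEETS = 50
MAX_SHEET_BYTES = 2 * 1024 * 1024   # 2MB per-stylesheet cap
MAX_HTML_BYTES = 10 * 1024 * 1024   # 10MB total HTML cap

# Simplified: assumes rel="stylesheet" appears before href in the tag.
LINK_RE = re.compile(r'<link[^>]*rel=["\']stylesheet["\'][^>]*href=["\']([^"\']+)["\'][^>]*>')

def inline_stylesheets(html: str, cache: dict) -> str:
    inlined = 0
    def replace(match):
        nonlocal inlined
        css = cache.get(match.group(1))
        if css is None or inlined >= MAX_SHEETS or len(css) > MAX_SHEET_BYTES:
            return match.group(0)  # leave the <link> untouched
        inlined += 1
        return "<style>" + css + "</style>"
    result = LINK_RE.sub(replace, html)
    # If inlining would blow past the total cap, keep the original HTML.
    return result if len(result) <= MAX_HTML_BYTES else html
```

Stylesheets missing from the cache are simply left as `<link>` tags; Coverage then reports 0% for those, which is why the troubleshooting section below suggests checking `css_inlining_stylesheets_cached`.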

Analysis Components

BrowserCssExtractor -- Uses Chrome’s CSS Coverage API to identify which CSS rules are actually used on each viewport. Produces per-viewport critical CSS.
PageAnalyzer -- Detects the real LCP element, measures fold position, computes CLS, and reads rendered image dimensions.
UnusedCssRemover -- Takes Coverage data and removes dead rules from stylesheets.
VisualRegressionGate -- Captures before/after screenshots and compares them pixel-by-pixel. Blocks optimizations that cause visible regressions.
FontGlyphScanner -- Scans the DOM with TreeWalker for code points used on the page. Maps them to @font-face declarations for future subsetting.
ScriptCoverageAnalyzer -- Uses Chrome’s Profiler domain to measure JS code coverage. Identifies scripts safe to defer.

Script Coverage Analysis

When browser analysis is enabled, the ScriptCoverageAnalyzer component uses Chrome’s Profiler domain to measure JavaScript code coverage. This identifies scripts that are safe to defer, improving page load performance by reducing parser-blocking JavaScript.

How It Works

  1. The analyzer loads the page with JavaScript enabled (Profiler + Coverage APIs)
  2. Each external script’s coverage is measured during page load
  3. Scripts are classified into deferral categories based on coverage data and execution timing

Deferral Categories

kSafeToDefer -- Script has low main-thread impact; safe to add defer
kCandidateForAsync -- Script is independent; could use async instead
kAlreadyAsync -- Script already has async or defer attribute
kKeepSynchronous -- Script must execute synchronously (DOM-dependent, inline handlers)
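A sketch of how a classifier might map page signals onto these four categories. The signal names and their ordering are assumptions for illustration, not the actual implementation:

```python
from enum import Enum

class Deferral(Enum):
    SAFE_TO_DEFER = "kSafeToDefer"
    CANDIDATE_FOR_ASYNC = "kCandidateForAsync"
    ALREADY_ASYNC = "kAlreadyAsync"
    KEEP_SYNCHRONOUS = "kKeepSynchronous"

def classify(script: dict) -> Deferral:
    # Hypothetical per-script signals gathered from Profiler coverage
    # and DOM inspection during the analysis pass.
    if script.get("async") or script.get("defer"):
        return Deferral.ALREADY_ASYNC
    if script.get("writes_dom_during_parse") or script.get("inline_handlers"):
        return Deferral.KEEP_SYNCHRONOUS
    if script.get("independent"):  # nothing else depends on its execution order
        return Deferral.CANDIDATE_FOR_ASYNC
    return Deferral.SAFE_TO_DEFER
```

The key ordering property is that safety checks (already deferred, DOM-dependent) run before any coverage-based decision, so a script is never deferred when deferral could change behavior.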

SSRF Defense

Script analysis enables JavaScript execution in Chrome (required for accurate coverage measurement). The other three SSRF defense layers remain active: network offline mode, Fetch interception, and DNS-level blocking. Chrome cannot make outbound connections even with JavaScript enabled.

Configuration

--no-browser-script-analysis -- Disable script coverage analysis (enabled by default)

Script analysis results feed into the optimization policy engine, which decides whether to enable script deferral for each URL template.

Optimization Policy

The optimization policy engine computes per-template decisions about optional HTML transforms based on browser analysis data. It runs after profile generation and stores the policy alongside the optimization profile in cache.

Policy Fields

async_css_enabled -- Enable async loading for render-blocking stylesheets; set when average CSS coverage < 50%
script_deferral_enabled -- Enable the defer attribute on safe scripts; set when deferrable scripts are detected
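The two conditions translate directly into code. A minimal sketch, assuming coverage is expressed as a 0-1 ratio per analyzed viewport:

```python
def compute_policy(viewport_css_coverage, deferrable_script_count):
    # Average CSS coverage across the analyzed viewports
    # (mobile / tablet / desktop in the pipeline above).
    avg = sum(viewport_css_coverage) / len(viewport_css_coverage)
    return {
        "async_css_enabled": avg < 0.50,
        "script_deferral_enabled": deferrable_script_count > 0,
    }
```

The resulting dict would be stored alongside the optimization profile, so the decision is made once per template rather than per request.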

Stats Counters

policy.computed -- Total optimization policies computed
policy.async_css_enabled -- Times async CSS was enabled by policy
policy.script_deferral_enabled -- Times script deferral was enabled by policy

These counters appear in /v1/stats JSON, /v1/metrics Prometheus output, the management socket STATS command, and the web console metrics page.

Chrome Process Management

Lifecycle

  1. ChromeProcess::Start() spawns Chrome with headless flags and pipe transport
  2. CDP commands flow through CdpClient for page analysis
  3. After each page, IncrementPageCount() checks the recycle threshold
  4. At the recycle threshold, Stop() sends SIGTERM (then SIGKILL after 5s)
  5. A fresh Chrome process starts for the next batch
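The recycle bookkeeping in steps 3 and 4 reduces to a counter check. A minimal sketch (the method name mirrors the text above; everything else is assumed):

```python
class ChromeProcessCounter:
    # Hypothetical helper tracking pages served by one Chrome instance.
    def __init__(self, recycle_interval: int = 100):
        self.recycle_interval = recycle_interval
        self.page_count = 0

    def increment_page_count(self) -> bool:
        # Returns True when the process should be stopped and replaced.
        self.page_count += 1
        return self.page_count >= self.recycle_interval
```

In the real worker, a True result triggers Stop() (SIGTERM, then SIGKILL after 5s) and a fresh spawn before the next dequeued item.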

Launch Flags

Chrome is spawned with strict isolation flags:

  • --headless=new — new headless mode
  • --remote-debugging-pipe — FD 3/4 pipe transport
  • --disable-gpu — no GPU required
  • --no-sandbox — required in containers (Chrome must run with minimal container privileges)
  • --host-resolver-rules="MAP * ~NOTFOUND" — DNS-level SSRF block
  • --disable-dev-shm-usage — avoids /dev/shm exhaustion in containers
  • Various isolation flags (--disable-extensions, --disable-background-networking, --no-first-run, etc.)

RSS Monitoring

On Linux, the worker reads /proc/pid/status VmRSS every 5 seconds. When Chrome exceeds --chrome-max-memory (default 512MB), the worker stops it and starts a fresh instance. This prevents memory leaks from accumulating across hundreds of pages.
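Extracting VmRSS from /proc/<pid>/status is a simple text parse; a sketch of the check (the field layout follows the standard proc(5) format, the helper names are illustrative):

```python
def parse_vmrss_kb(status_text: str):
    # /proc/<pid>/status reports RSS as e.g. "VmRSS:\t  524288 kB".
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # process exited, or kernel without the field

def over_limit(status_text: str, max_mb: int = 512) -> bool:
    rss_kb = parse_vmrss_kb(status_text)
    return rss_kb is not None and rss_kb > max_mb * 1024
```

An `over_limit` result of True is what would trigger the stop-and-respawn described above.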

SSRF Defense (4 Layers)

Browser analysis operates on cached content, not live network requests. Four layers prevent Chrome from making any outbound connections:

  1. Network.emulateNetworkConditions({offline: true}) — blocks all network
  2. Fetch.enable + Fetch.requestPaused — intercept and fail all requests
  3. Emulation.setScriptExecutionDisabled({value: true}) — no JS execution (CSS extractor and visual regression gate)
  4. --host-resolver-rules="MAP * ~NOTFOUND" — Chrome-level DNS block

Font Glyph Scanner and Script Coverage Analyzer enable JavaScript (they need it for accurate analysis) but still enforce the other three layers.
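The three CDP-level layers can be expressed as a short command list sent when a session starts. The protocol method names below are real; the helper and parameter bundling are illustrative:

```python
def ssrf_lockdown_commands(disable_js: bool = True):
    # Layer 4 (--host-resolver-rules) is a launch flag, not a CDP command.
    cmds = [
        ("Network.enable", {}),
        ("Network.emulateNetworkConditions",
         {"offline": True, "latency": 0,
          "downloadThroughput": -1, "uploadThroughput": -1}),
        # Intercept every request so it can be failed if anything slips through.
        ("Fetch.enable", {"patterns": [{"urlPattern": "*"}]}),
    ]
    if disable_js:  # skipped by FontGlyphScanner and ScriptCoverageAnalyzer
        cmds.append(("Emulation.setScriptExecutionDisabled", {"value": True}))
    return cmds
```

On each `Fetch.requestPaused` event, the client would respond with `Fetch.failRequest`, so even a paused request never reaches the network.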

Configuration Flags

--enable-browser-analysis -- Enable browser analysis pipeline (default: off)
--chrome-binary -- Path to Chrome binary (default: /usr/bin/chrome-headless-shell)
--chrome-recycle-interval -- Pages per Chrome instance before restart (default: 100)
--chrome-page-timeout -- Per-page analysis timeout in ms (default: 60000)
--chrome-max-memory -- Max Chrome RSS in MB before forced restart (default: 512)
--chrome-startup-timeout -- Chrome startup timeout in ms (default: 10000)
--browser-queue-size -- Max queued analysis requests (default: 1000)
--browser-profile-ttl -- Profile cache lifetime in seconds (default: 86400, i.e. 24h)
--no-browser-critical-css -- Disable browser-based critical CSS (enabled by default)
--no-browser-lazy-loading -- Disable browser-based lazy-load decisions (enabled by default)
--no-browser-lcp-preload -- Disable browser-based LCP detection (enabled by default)
--no-browser-image-sizing -- Disable browser-based image dimensions (enabled by default)
--no-browser-script-analysis -- Disable script coverage analysis (enabled by default)

All flags are also hot-reloadable via PATCH /v1/config from the web console.

Monitoring

Stats Counters

Browser analysis stats appear in the management socket STATS and BROWSER-STATUS commands, and in the web console dashboard:

browser.profiles_generated -- Templates analyzed and cached
browser.profiles_used -- Cache hits on existing profiles
browser.analysis_errors -- Failures (timeout, Chrome crash, etc.)
browser.chrome_crashes -- Chrome process crashes
browser.queue_depth -- Current queue size
browser.scripts_analyzed -- Scripts evaluated by browser analysis
browser.scripts_deferrable -- Scripts identified as safe to defer
browser.css_inlining_attempted -- CSS inlining attempts
browser.css_inlining_stylesheets_cached -- Stylesheets found in cache
browser.css_inlining_bytes_inlined -- Total CSS bytes injected

Management Socket

The BROWSER-STATUS command on the management socket returns detailed JSON including Chrome state, queue contents, and per-profile statistics:

echo "BROWSER-STATUS" | socat - UNIX-CONNECT:/data/pagespeed.sock.mgmt

Error Handling

Every failure falls back to the heuristic path:

Chrome binary not found -- Heuristic only, no retry
Chrome fails to start -- Retry after 2 seconds
Chrome crashes mid-analysis -- Cancel current item, restart Chrome after 2s
Analysis timeout -- Skip item, process next in queue
Cache read failure -- Skip item
Queue full -- Head-drop oldest item
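The "queue full" head-drop, combined with the dedup mentioned in the architecture diagram, can be sketched as a small keyed FIFO. This is illustrative only; the real AnalysisQueue is a C++ bounded priority queue:

```python
from collections import OrderedDict

class AnalysisQueueSketch:
    # Bounded queue keyed by template hash: duplicate requests are ignored,
    # and when the queue is full the oldest entry is dropped from the head.
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.items = OrderedDict()  # template_hash -> request

    def enqueue(self, template_hash, request) -> bool:
        if template_hash in self.items:
            return False  # dedup: analysis for this template already pending
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # head-drop the oldest item
        self.items[template_hash] = request
        return True

    def dequeue(self):
        return self.items.popitem(last=False) if self.items else None
```

Head-dropping favors recent requests, which matters because a long-queued template may already have been superseded by fresher traffic.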

The worker logs all browser analysis errors at the warning level. Monitor them in the debug console (/logs) or via the management socket.

Troubleshooting

Chrome not available (503 errors in waterfall/diff)

The web console’s waterfall viewer and visual diff features return 503 when Chrome is not running. Check:

  1. Is --enable-browser-analysis set?
  2. Does the Chrome binary exist at the configured path?
  3. In Docker: is the worker image the full variant (not the minimal image)?

High chrome_crashes count

Frequent Chrome crashes usually indicate memory pressure:

  • Lower --chrome-recycle-interval to restart Chrome more often
  • Lower --chrome-max-memory to catch leaks earlier
  • Check container memory limits — Chrome needs at least 256MB headroom

Profiles not being generated

If profiles_generated stays at zero while traffic flows:

  1. Check queue_depth — if it stays at 0, analysis requests are not being enqueued. Verify --enable-browser-analysis is set.
  2. Check analysis_errors — errors during analysis prevent profile creation.
  3. Check css_inlining_stylesheets_cached — if external CSS is not yet cached, the worker waits for it before running browser analysis.

Visual Regression Gate false positives

The visual regression gate disables JavaScript (SSRF defense). Pages that rely on CSS-in-JS frameworks (styled-components, Emotion, etc.) will show differences because their styles are injected by JavaScript. This is a known limitation. The heuristic path optimizes these pages correctly.
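A minimal sketch of a pixel-by-pixel gate of this kind. The tolerance and threshold values are illustrative, not the shipped defaults:

```python
def pixel_diff_ratio(before, after, channel_tolerance=2):
    # Fraction of pixels whose RGB channels differ by more than a small
    # tolerance (ignores anti-aliasing noise). Frames are same-size lists
    # of (r, g, b) tuples decoded from the before/after screenshots.
    assert len(before) == len(after)
    changed = sum(
        1 for a, b in zip(before, after)
        if any(abs(x - y) > channel_tolerance for x, y in zip(a, b))
    )
    return changed / len(before)

def gate_passes(before, after, max_ratio=0.001) -> bool:
    return pixel_diff_ratio(before, after) <= max_ratio
```

With CSS-in-JS pages, the "after" frame renders without JS-injected styles, so the diff ratio exceeds any reasonable threshold and the gate correctly (if conservatively) blocks the optimization.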
