MarshallOfSound

#51815: build: consume Electron-generated PGO profiles in release builds

Merged
Created: May 31, 2026, 1:14:26 PM
Merged: Jun 1, 2026, 11:39:38 PM
4 comments
Target: main

Description of Change

Wires Electron release builds to consume Electron-generated PGO profiles instead of Chrome's published ones. This is the consumption side of the PGO pipeline (generation side: #51812 — the two PRs are independent and can merge in either order).

The problem

Release builds apply Chrome's published PGO profile (chrome_pgo_phase = 2). PGO profiles match functions by symbol name + control-flow hash, so every function that differs from Chrome (all of Node.js, patched Chromium files, V8 built with Node's flags, and the Electron shell) gets no optimization guidance and is laid out as cold. The same applies to V8's builtins profile: Chrome's profile is rejected for Electron's promise/async builtins (RunMicrotasks, AsyncFunctionAwait, FulfillPromise, …) because Node's integration changes their codegen.

This isn't theoretical. Shipping Electron 44 has a 2.1× regression in crypto.randomBytes vs Electron 42 (829K → 390K ops/s) caused by exactly this: a BoringSSL patch changed function hashes, so Chrome's profile silently stopped covering those functions. The Electron profile recovers it completely (839K ops/s).
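
To make the failure mode concrete, here is a toy model of the matching rule described above (symbol name + control-flow hash). It is only a sketch: the record layout, the function names and the hash values are made up for illustration, and LLVM's real profile matching is more involved than a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProfileRecord:
    name: str      # symbol name recorded when the profile was trained
    cfg_hash: int  # control-flow hash recorded when the profile was trained
    count: int     # execution count observed during training


def lookup(profile: Dict[str, ProfileRecord], name: str, cfg_hash: int) -> Optional[int]:
    """Return the recorded count, or None when the function gets no guidance."""
    record = profile.get(name)
    if record is None or record.cfg_hash != cfg_hash:
        # Unknown symbol or stale hash: no guidance, so the function is
        # treated as cold even if it is hot at runtime.
        return None
    return record.count


# Hypothetical profile trained on an unpatched build.
profile = {
    "patched_crypto_fn": ProfileRecord("patched_crypto_fn", cfg_hash=0x1111, count=5_000_000),
}

unpatched = lookup(profile, "patched_crypto_fn", 0x1111)  # same code as in training
patched = lookup(profile, "patched_crypto_fn", 0x2222)    # a patch changed the control flow
node_only = lookup(profile, "node_only_fn", 0x3333)       # symbol absent from the profile

print(unpatched, patched, node_only)
```

The second and third lookups return nothing, which is the silent part: the build still succeeds, the function just loses its guidance.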

How it works

build/pgo_profiles/<target>.pgo.txt     state files name the profile per target (only these change on updates)
        |
        v  gclient hooks (DEPS) run script/pgo/download-profiles.py
build/pgo_profiles/electron-<target>-<version>.profdata     downloaded under their versioned names
        |
        v  Chromium's standard chrome_pgo_phase = 2 machinery
-fprofile-use + every other flag upstream      a small Chromium patch teaches its profile
maintains (warning suppressions, ext-TSP       resolution to read Electron's state files,
block layout, future additions) applied        including per-arch Linux profiles which
to Electron's profiles unchanged               upstream does not have
  • Chromium's PGO configuration stays authoritative: the patch only redirects which profile the standard chrome_pgo_phase = 2 flow resolves; all compiler/linker flags come from upstream's config and track it automatically. This addresses the cflags-drift concern raised in review: the parallel-config approach had in fact already drifted, missing -enable-ext-tsp-block-placement
  • A single release.gn import works on every platform/arch: chrome_pgo_phase already defaults to 2 for official builds, so the import only wires the V8 builtins profile
  • Chrome's published profiles are no longer downloaded (checkout_pgo_profiles = False); an explicitly set pgo_data_path still takes precedence as an escape hatch back to them
  • 32-bit targets (win-x86, linux-arm) get the C++ profile but no builtins PGO (Electron only generates a 64-bit builtins profile)
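
The resolution order above can be sketched roughly as follows. This is not the Chromium patch or download-profiles.py: resolve_profile is a hypothetical helper, and it assumes each state file contains just the versioned profile file name on one line. The version string in the demo is a placeholder.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_profile(profile_dir: Path, target: str,
                    pgo_data_path: Optional[str] = None) -> Path:
    """Pick the .profdata file a build for `target` would consume."""
    if pgo_data_path:
        # An explicitly set path takes precedence (the escape hatch).
        return Path(pgo_data_path)
    state_file = profile_dir / f"{target}.pgo.txt"
    # ASSUMPTION: the state file holds only the versioned profile file name.
    profile_name = state_file.read_text(encoding="utf-8").strip()
    return profile_dir / profile_name


with tempfile.TemporaryDirectory() as tmp:
    profile_dir = Path(tmp)
    # Placeholder state file; "example" stands in for a real version string.
    (profile_dir / "linux-arm.pgo.txt").write_text(
        "electron-linux-arm-example.profdata\n", encoding="utf-8")

    resolved_name = resolve_profile(profile_dir, "linux-arm").name
    overridden = resolve_profile(profile_dir, "linux-arm",
                                 pgo_data_path="/tmp/other.profdata")

print(resolved_name)
print(overridden)
```

Because only the state file names the profile, a profile update is a one-line change per target.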

Benchmark results

37 statistically significant improvements, 0 regressions, 1 tie across 38 app-operation benchmarks — overall geomean +19.5% vs the shipping nightly. Full data below.

All benchmark records

1. Headline: shipping nightly vs this work (conclusive, statistically controlled)

Methodology: official v44.0.0-nightly.20260529 vs an identical-source build with Electron PGO profiles + ThinLTO¹, Linux x64, 38 benchmarks × 5 interleaved rounds per build (A,B,A,B,… to cancel thermal/cache drift), idle machine, Welch's t-test at p < 0.05.
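
For readers unfamiliar with the method, here is a minimal sketch of the interleaving and of Welch's t statistic using only the standard library. The sample values are made up and this is not the harness used for the numbers in this PR. Turning t and the degrees of freedom into a p-value needs the Student t distribution (for example scipy.stats.ttest_ind with equal_var=False), which is left out to keep the sketch dependency-free.

```python
import math
import statistics


def interleaved_order(builds, rounds):
    """Return A,B,A,B,... so slow thermal/cache drift hits both builds equally."""
    return [build for _ in range(rounds) for build in builds]


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for b vs a."""
    # Per-sample squared standard errors, using sample variances (n - 1).
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


order = interleaved_order(["A", "B"], rounds=5)

# Made-up ops/s samples, one value per round.
nightly = [100.0, 102.0, 98.0, 101.0, 99.0]          # build A
with_profiles = [120.0, 118.0, 122.0, 121.0, 119.0]  # build B

t, df = welch_t(nightly, with_profiles)
print(order)
print(t, df)
```

With these made-up samples t is 20 with 8 degrees of freedom, far beyond the two-sided 5% critical value of about 2.31 for 8 degrees of freedom.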

| Area | Geomean | Largest wins |
|---|---|---|
| contextBridge | +28.1% | deep nested objects +54.6%, small objects +49.6%, arrays +48.3%, callbacks +40.9% |
| Networking | +18.8% | POST echo +39.5%, response.json() +36.6%, fetch 1KB +28.9%, XHR +27.7% |
| IPC | +10.9% | structured clone (Map/Set/Date) +16.1%, 1MB payloads +14.7%, 5MB throughput +13.0% |

¹ The comparison build also includes ThinLTO --lto-O2 (#51669/#51809), since that is the configuration releases will ship with once both land. For PGO's isolated contribution see section 2.

Full 38-test table

| Test | Nightly | With profiles | Δ | Significant |
|---|---|---|---|---|
| bridge: void call (no args) | 1,823,713 | 2,292,331 | +25.7% | yes |
| bridge: echo number | 1,449,493 | 1,880,806 | +29.8% | yes |
| bridge: echo 1KB string | 1,446,386 | 1,884,338 | +30.3% | yes |
| bridge: echo small object | 369,786 | 553,153 | +49.6% | yes |
| bridge: echo deep nested object | 103,109 | 159,361 | +54.6% | yes |
| bridge: echo 500-elem array | 1,133 | 1,681 | +48.3% | yes |
| bridge: echo 64KB typed array | 15,362 | 15,966 | +3.9% | yes |
| bridge: echo 1MB typed array | 637 | 650 | +2.0% | tie (memcpy-bound) |
| bridge: transform object | 342,173 | 506,416 | +48.0% | yes |
| bridge: callback round-trip | 144,697 | 203,861 | +40.9% | yes |
| bridge: exposed property access | 7,419,663 | 8,031,797 | +8.3% | yes |
| bridge: async function (promise) | 159,229 | 202,957 | +27.5% | yes |
| bridge: invoke round-trip (small) | 10,603 | 11,920 | +12.4% | yes |
| ipc: invoke small object | 11,529 | 12,373 | +7.3% | yes |
| ipc: invoke 10KB JSON object | 3,070 | 3,478 | +13.3% | yes |
| ipc: invoke 64KB typed array | 2,276 | 2,592 | +13.9% | yes |
| ipc: invoke 1MB typed array | 210 | 241 | +14.7% | yes |
| ipc: invoke structured clone (Map/Set/Date) | 4,702 | 5,462 | +16.1% | yes |
| ipc: invoke 5MB throughput (MB/s) | 223 | 252 | +13.0% | yes |
| ipc: invoke 20MB throughput (MB/s) | 227 | 254 | +11.7% | yes |
| ipc: one-way send throughput (msgs/s) | 23,392 | 23,747 | +1.5% | yes |
| ipc: webContents.send round-trip | 12,250 | 13,646 | +11.4% | yes |
| ipc: MessagePort round-trip | 14,033 | 15,239 | +8.6% | yes |
| ipc: executeJavaScript round-trip | 11,795 | 12,844 | +8.9% | yes |
| net: fetch 1KB round-trip | 761 | 981 | +28.9% | yes |
| net: fetch 64KB round-trip | 735 | 907 | +23.4% | yes |
| net: fetch 1MB round-trip | 279 | 317 | +13.7% | yes |
| net: fetch 8MB throughput (MB/s) | 524 | 554 | +5.9% | yes |
| net: 4x parallel 64KB fetches | 237 | 288 | +21.4% | yes |
| net: POST echo 2KB | 821 | 1,146 | +39.5% | yes |
| net: POST echo 256KB | 287 | 347 | +21.1% | yes |
| net: fetch + response.json() (10KB) | 727 | 993 | +36.6% | yes |
| net: fetch + response.text() (64KB) | 665 | 840 | +26.3% | yes |
| net: XHR 64KB round-trip | 691 | 883 | +27.7% | yes |
| net: WebSocket echo 1KB | 9,955 | 10,643 | +6.9% | yes |
| net: WebSocket echo 64KB | 1,574 | 1,685 | +7.1% | yes |
| net: node https.get 64KB | 4,003 | 4,355 | +8.8% | yes |
| net: node https.get 1MB | 358 | 372 | +4.0% | yes |

All values ops/s unless noted. The single non-win (1MB typed array marshaling) is dominated by raw memcpy, which PGO cannot accelerate.
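
As a sanity check on the headline number, the per-area geomeans can be combined into the overall geomean by weighting each area by its test count. A small sketch, with the counts taken from the 38-test table (13 bridge, 14 net, 11 ipc) and the rounded per-area geomeans from the summary above, so the result is only accurate to about a tenth of a percent:

```python
import math


def geomean(ratios):
    """Geometric mean of speedup ratios (1.25 means +25%)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def combine(groups):
    """Combine (test count, geomean ratio) pairs into one overall geomean."""
    total = sum(count for count, _ in groups)
    return math.exp(sum(count * math.log(ratio) for count, ratio in groups) / total)


areas = [
    (13, 1.281),  # contextBridge, +28.1%
    (14, 1.188),  # Networking, +18.8%
    (11, 1.109),  # IPC, +10.9%
]

overall = combine(areas)
print(f"overall geomean: {(overall - 1) * 100:+.1f}%")
```

A geometric mean is used rather than an arithmetic one because the inputs are ratios: a 2× win and a 2× loss should cancel out, and geomean([2.0, 0.5]) is 1.0.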

2. PGO's isolated contribution (Chrome profile vs Electron profile, same build otherwise)

| Benchmark | Improvement |
|---|---|
| Speedometer 3.1 (Linux x64) | +9.5% |
| crypto.randomBytes | +118% (regression recovery: 390K → 839K ops/s, matching Electron 42's 829K) |
| Startup → app ready (macOS M1) | −13% (45.2ms → 39.2ms) |

3. The training-coverage story (why profiles must cover app workloads)

A profile trained only on browser benchmarks pessimizes code those benchmarks never run (PGO marks uncovered functions cold). Measured cost on a benchmark-only profile, and recovery after adding Electron-specific training (main-process Node.js, contextBridge/IPC marshaling, networking over TLS — see #51812):

| Path | Benchmark-only profile | Enriched profile |
|---|---|---|
| Node Buffer ops (macOS arm64) | −63% vs unprofiled | recovered |
| contextBridge calls | baseline | +23–27% further |
| Large-payload IPC | baseline | +24% further |
| Geomean of 22 app operations | baseline | +8% further |

4. V8 builtins profile coverage

| Builtins | Chrome's published profile | Electron's profile |
|---|---|---|
| Promise/async builtins (RunMicrotasks, AsyncFunctionAwait, FulfillPromise, PromiseConstructor, …) | Rejected (hash mismatch from Node's codegen flags) | Covered (113–306 block hints each) |
| All other builtins | Covered | Covered |

5. Cumulative Speedometer 3.1 progression (Linux x64, containerized; same source, same V8)

| Configuration | Score | Step |
|---|---|---|
| Shipping configuration (Chrome profile, default LTO) | 22.08 | baseline |
| + ThinLTO --lto-O2 (#51669 / #51809) | 24.99 | +13.2% |
| + Electron C++ PGO profile (this PR) | 27.37 | +9.5% |
| + Electron V8 builtins profile (this PR) | ~27.8 | +1.6% |

With the full optimization stack, Electron is as fast as — and in some cases faster than — Chrome on the same workloads.
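
The steps in the progression table compound multiplicatively rather than adding up. A quick check using the rounded step percentages from the table, so the result is approximate:

```python
baseline = 22.08  # shipping configuration score

# (step, fractional gain over the previous row), as listed in the table.
steps = [
    ("ThinLTO --lto-O2", 0.132),
    ("Electron C++ PGO profile", 0.095),
    ("Electron V8 builtins profile", 0.016),
]

score = baseline
for name, gain in steps:
    score *= 1 + gain
    print(f"+ {name}: {score:.2f}")

total_gain = score / baseline - 1
summed_gain = sum(gain for _, gain in steps)
print(f"compounded: {total_gain:+.1%}, naive sum: {summed_gain:+.1%}")
```

The compounded total comes out slightly above the naive sum of the three steps because each gain applies to an already faster baseline.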

6. Real-hardware corroboration (macOS, Apple Silicon)

ThinLTO-only numbers from #51669/#51809 testing — included to show container results translate to real hardware (PGO stacks on top of these):

| Hardware | Speedometer 3.1 | Gain |
|---|---|---|
| M1 | 26.8 → 31.7 | +18.3% |
| M5 | 56.6 → 65.5 | +15.7% |

Relationship to other PRs

  • #51812 — generates and publishes the profiles this PR consumes (independent; either merge order works — the profiles referenced by this PR's state files are already published)
  • #51669 / #51809 — ThinLTO link-time optimization (independent; benefits multiply)

Checklist

  • I have built and tested this change
  • I have filled out the PR description
  • I have reviewed and verified the changes
  • npm test passes
  • PR release notes describe the change in a way relevant to app developers

Release Notes

Notes: Improved runtime performance.

Backports

  • 42-x-y: merged in #51828 (Jun 1, 2026, 6:04:48 PM); released in v42.3.1 (Jun 1, 2026, 11:25:56 PM)
  • 43-x-y: in flight, #51829 is waiting to be merged
