largediff  ·  an architecture brief

An architecture brief · 2026

Reviewing a diff the size of a small city.

largediff renders pull requests with a quarter‑million lines of code without ever putting them in the browser. The trick isn't faster JavaScript — it's a backend that keeps the truth, and a wire that only carries what changes.

200,000 lines · in the demo diff
5.3k nodes · live DOM, any scroll
1.7 KB · wire bytes per push
34 ms · largest contentful paint

The premise

Most rich UIs ship the data to the user. We don't.

The browser is an excellent renderer and a hostile state store. The instant a diff with two hundred thousand lines lands as a JSON blob, you've shipped twenty megabytes, spent thirty seconds parsing it, and handed the page to a framework that now has to re‑render the universe every time the user scrolls.

largediff inverts that. The backend is the single source of truth. It owns the diff, the parse tree, the collapse state, the highlight ranges. The frontend is a projection: a few signals, a few thousand DOM nodes, no parsers, no virtual‑DOM accountant. When something changes on the server, a short SSE message lands and the page morphs.

The page isn't an app that fetches data. It's a long, live view onto a diff that lives somewhere else.

The window

The browser holds a couple of files. The server holds the rest.

Every row in the diff has an absolute pixel position the moment the file list is known, stored in a flat pixelOffsets[] array on the server. The scroller's height is set to the total height of that backing tape of rows, and exactly the files whose pixels currently intersect the viewport get rendered into the DOM as <section data-file> subtrees. Scroll, and the server picks a new file range. Sections leaving the window come out by id; arriving ones go in by id; the sections in the middle never move.

An earlier draft of this page sliced row‑by‑row and proudly reported 76 live rows in the DOM. The number was real but the architecture was brittle: every scroll re‑stamped top:Npx on every row, sticky chrome handoffs needed JavaScript, and the morph payload was a flat sea of identical‑looking <div>s. File‑windowed sections cost a few hundred extra rows in the live DOM and bought back: pure‑CSS sticky, atomic jumps, and a morph that's byte‑identical when you scroll within a file — so most scrolls send zero patches.

Figure: the backing tape (200,000 rows across 500 files; pixelOffsets[] is a flat array and rowAtPixel(y) a binary search over it) beside the viewport, whose live DOM holds two file sections (e.g. app/registry.py, @@ hunk 3 @@, rows 128 to 133) morphed with stable ids. Scrolling 1 px makes the server re-slice the window, and SSE pushes a patch.
The tape never ships. The window does.

At any scroll position, the live DOM under #ds-window holds about two file sections — the one the viewport is parked in, plus its nearest neighbour caught in the overscan buffer. The other 498 files exist only as offsets in a Uint32Array on the server. Scrolling to file one hundred and ninety costs the same as scrolling to file ten — a binary search, a slice, a brotli‑flushed SSE frame. And when the slice fingerprint hasn't changed (the common case for within‑file scrolls of an inline file), the server sends nothing at all.
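A minimal sketch of the tape lookup, assuming pixelOffsets[] is a prefix-sum array with one extra trailing entry holding the tape's total height, and a hypothetical fileStartRow[] array (first row of each file, plus a trailing sentinel equal to the row count). Only rowAtPixel and pixelOffsets come from the brief; their signatures here, and the helpers buildPixelOffsets, lastAtOrBelow, fileAtRow, and fileWindow, are illustrative.

```typescript
// Prefix sums: offsets[i] is the top pixel of row i; offsets[rowCount] is the tape's total height.
function buildPixelOffsets(rowHeights: ArrayLike<number>): Uint32Array {
  const offsets = new Uint32Array(rowHeights.length + 1);
  for (let i = 0; i < rowHeights.length; i++) {
    offsets[i + 1] = offsets[i] + rowHeights[i];
  }
  return offsets;
}

// Largest index i in [0, count) with sorted[i] <= value; 0 if value precedes everything.
function lastAtOrBelow(sorted: ArrayLike<number>, count: number, value: number): number {
  let lo = 0;
  let hi = count - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >>> 1; // round up so the range always shrinks
    if (sorted[mid] <= value) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// The row whose pixel band contains y, clamped to the first/last row.
function rowAtPixel(pixelOffsets: Uint32Array, y: number): number {
  return lastAtOrBelow(pixelOffsets, pixelOffsets.length - 1, y);
}

// The file that owns a row; fileStartRow has fileCount + 1 entries.
function fileAtRow(fileStartRow: ArrayLike<number>, row: number): number {
  return lastAtOrBelow(fileStartRow, fileStartRow.length - 1, row);
}

// Inclusive range of files whose pixels intersect the viewport plus an overscan margin.
function fileWindow(
  pixelOffsets: Uint32Array,
  fileStartRow: ArrayLike<number>,
  scrollTop: number,
  viewportHeight: number,
  overscanPx: number,
): [number, number] {
  const top = Math.max(0, scrollTop - overscanPx);
  const bottom = Math.max(top, scrollTop + viewportHeight + overscanPx - 1);
  return [
    fileAtRow(fileStartRow, rowAtPixel(pixelOffsets, top)),
    fileAtRow(fileStartRow, rowAtPixel(pixelOffsets, bottom)),
  ];
}
```

Both lookups are O(log n) in the number of rows or files, which is what makes a jump to file 190 cost the same as a jump to file 10.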

The pipe

One stream stays warm. Everything else is a 204.

The browser opens a single long‑lived GET /sessions/:sid/stream and never closes it. Every interaction — scroll, jump, collapse, toggle — is a tiny POST that returns 204 No Content. The actual response flows down the open stream as an already‑compressed brotli frame. The brotli encoder is kept warm per session, so each subsequent push compresses against a shared dictionary of prior frames. After thirty pushes, the average wire cost is ~1.7 KB for ~22 KB of decoded HTML and JSON.

Figure: the browser (Datastar signals, CSS Highlight API) fires commands as POST /view and gets 204 No Content back; the server (Bun.serve with a warm brotli encoder, DiffStore with tree-sitter, and a ReviewSession holding { view, settings, perFile }) lets the stream answer with a brotli-compressed SSE patch under 2 KB.
Commands and queries don't share a request. They share a session.

The separation matters. A scroll command and an SSE response are no longer racing each other for one HTTP cycle; the response is whatever the next projection happens to be, with whatever batched state changes landed in between. Coalescing is built in.
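The command/query split and its built-in coalescing can be sketched in a few lines. This assumes a hypothetical session shape (the brief names ReviewSession { view, settings, perFile } but not its fields) and a synchronous flush that the real server would call whenever the SSE writer is ready to write; handleCommand, flush, and the dirty flag are illustrative names.

```typescript
// Hypothetical slice of the session: just enough view state to show coalescing.
type ViewState = { scrollTop: number; activeFileId: number };
type Session = { view: ViewState; dirty: boolean; frames: string[] };

// A command only mutates state and marks the session dirty;
// the POST itself answers with an empty 204.
function handleCommand(session: Session, mutate: (view: ViewState) => void): Response {
  mutate(session.view);
  session.dirty = true;
  return new Response(null, { status: 204 });
}

// The stream side: however many commands landed since the last write,
// exactly one projection of the latest state goes out.
function flush(session: Session, render: (view: ViewState) => string): boolean {
  if (!session.dirty) return false;
  session.dirty = false;
  session.frames.push(render(session.view));
  return true;
}
```

Three commands that arrive between two writes produce a single frame carrying only the final state; an idle flush writes nothing.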

The fat morph

One render path. One payload. The whole UI, every push.

For most of the project's life the wire carried three kinds of events per push: a morph for the diff window, a signals patch for active-file metadata and imperative scrolls, and a second morph for the sidebar slice. Each had its own emit gate, its own skip‑when‑unchanged fingerprint, and its own contract with the client. It was efficient on the wire. It was also a steady source of bugs the moment you stopped staring at it.

The version live today replaces those three events with one:

// every command handler ends here
writer.send("datastar-patch-elements", [
  "selector #app",
  "mode inner",
  // one "elements" data line per line of rendered HTML
  ...renderAppInner(session, deps).split("\n")
    .map(line => "elements " + line),
].join("\n"));

Every push re‑renders the entire content tree from session state. The browser receives the whole inner HTML of #app — topbar, sidebar, scroller, and every visible row — and Idiomorph reconciles it against the live DOM. Stable ids (#scroller, #ds-window, #file-tree, #layout, #topbar, plus #f-<fid> per section and #file-row-<fid> per row) let the morph match in place; scroll position, focus, and one‑shot data-init survive across pushes.
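For reference, this is roughly what one push looks like on the wire. The sketch assumes writer.send(event, payload) frames its payload the standard Server-Sent Events way: an event: line, one data: line per payload line, then a blank line. formatSseEvent and patchAppInner are hypothetical helpers; the selector, mode, and elements line keys are the ones the handler above already emits.

```typescript
// One SSE event: "event:" line, a "data:" line per payload line, blank line to terminate.
function formatSseEvent(event: string, payload: string): string {
  const dataLines = payload.split("\n").map((line) => `data: ${line}`);
  return `event: ${event}\n${dataLines.join("\n")}\n\n`;
}

// The fat morph: replace the inner HTML of #app with a fresh render.
function patchAppInner(html: string): string {
  const payload = [
    "selector #app",
    "mode inner",
    ...html.split("\n").map((line) => "elements " + line),
  ].join("\n");
  return formatSseEvent("datastar-patch-elements", payload);
}
```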

Imperative actions — scroll to a file, scroll to the top, recenter the sidebar — can't be expressed as DOM changes; they have to call scrollTo. They ride inside the morph as a transient command element:

// first child of #app's inner morph, only when needed
<div id="cmd-42"
     data-signals='{"navEpoch":3}'
     data-init="window.__largediffJump?.(2007388)">
</div>

The id carries a monotonic counter (cmdSeq) that bumps on every push that needs to fire imperatives, so Idiomorph treats the element as new and Datastar re‑runs both of its attributes, data-signals and data-init. data-init calls the scroll function directly, with no signal gate, so a target pixel of 0 (back to top, cmd+Up) works the same as any other value. data-signals carries navEpoch back to the client so the next /view POST echoes the current epoch and the server can drop stale scroll events from the prior position.
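A minimal sketch of both halves, server side. It assumes a hypothetical NavState holding cmdSeq, navEpoch, and a pending jump target, with navEpoch assumed to bump on each imperative scroll; renderCommand and acceptView are illustrative names. Only the element shape, cmdSeq, navEpoch, and window.__largediffJump come from the brief.

```typescript
// Hypothetical per-session navigation state; null means "no imperative this push".
type NavState = { cmdSeq: number; navEpoch: number; pendingJumpPx: number | null };

// Returns the transient command element, or "" when nothing is pending.
// A fresh id per push makes the morph insert a new node, so data-init runs again.
function renderCommand(s: NavState): string {
  if (s.pendingJumpPx === null) return "";
  s.cmdSeq += 1;
  s.navEpoch += 1;
  const px = Math.max(0, Math.round(s.pendingJumpPx)); // 0 is a valid target
  s.pendingJumpPx = null;
  const signals = JSON.stringify({ navEpoch: s.navEpoch });
  return `<div id="cmd-${s.cmdSeq}" data-signals='${signals}' data-init="window.__largediffJump?.(${px})"></div>`;
}

// /view POSTs echo the epoch they were sent under; anything older is stale.
function acceptView(s: NavState, echoedEpoch: number): boolean {
  return echoedEpoch === s.navEpoch;
}
```

Because "no command" is represented by null rather than by a falsy pixel value, back-to-top needs no special case.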

A class of bugs disappeared when three events became one.

The split protocol had at least four failure modes you could only see at the seams:

  • State drift between fingerprints. Moving the active class from one sidebar row to another doesn't change the rendered HTML's byte length, so a length‑keyed fingerprint silently treated it as unchanged. The sidebar's morph got skipped, the highlight stuck to the old row, and the user blamed Datastar. With one fingerprint that includes activeFileId, this can't happen.
  • The imperative‑signal‑can't‑be‑zero trap. A data-effect="$jumpToPx > 0" gate can't carry jumpToPx = 0 as a target. Back‑to‑top, cmd+Up, and any /jump to file 0 had to be special‑cased. With data-init firing on a fresh‑id element, there is no gate; the expression just runs.
  • Sequencing across events. Diff morph → signals → sidebar morph means the client sees three intermediate DOM states per push. Idiomorph applies each event in its own task, and the browser can paint between them. Users sometimes saw the sidebar update before the diff caught up, or scroll happen before the target section was mounted. One morph applies the entire next state atomically.
  • Routing every new feature. Adding a new piece of state — "show file count," "swap a chrome mode," "mark a file reviewed" — meant deciding which event carries it. Sometimes the answer was "two of them." Now the answer is always: change the session state, the next morph carries the difference.
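The first failure mode in that list is easy to reproduce. The sketch below uses 32-bit FNV-1a over the rendered HTML as a stand-in fingerprint; the brief doesn't say which hash the server uses, only that the fingerprint covers activeFileId, which hashing the full render does implicitly. fnv1a, Gate, and shouldPush are hypothetical names, and the hash runs over UTF-16 code units rather than UTF-8 bytes, which is fine for change detection.

```typescript
// 32-bit FNV-1a over UTF-16 code units.
function fnv1a(text: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept unsigned 32-bit
  }
  return h;
}

type Gate = { lastFp: number | null };

// Skip the push only when the content, not merely its length, is unchanged.
function shouldPush(gate: Gate, html: string): boolean {
  const fp = fnv1a(html);
  if (fp === gate.lastFp) return false;
  gate.lastFp = fp;
  return true;
}
```

Moving class="active" from one row to the other leaves the byte length identical, so a length-keyed gate would skip the push; the content hash does not.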

The UI is a pure function of session state. There is no way to make the rendered tree disagree with the server's model, because there is nowhere else for it to come from. This is immediate‑mode rendering with a single projection: the same pattern a game loop uses to redraw the frame from world state, retargeted at a DOM.

The trade-off, and why it stopped hurting.

The cost is real: every push ships the entire #app inner instead of just the slice that changed. The wire eats it without complaint: a warm brotli encoder per session compresses against the prior frame's dictionary, so the topbar, the sticky chrome, and the unchanged sidebar rows cost almost nothing per push. The topbar's "wire" chip shows the running ratio; on a typical session it sits around 94% saved.

The harder cost was on the client side. We measured it. It wasn't the wire; it was the browser's style work on the ~3,000 token spans of any newly‑mounted file section. Re‑enabling content-visibility: auto on .file-section deferred the per‑span style work to the moment a section enters the viewport, and the scroll‑time style recalc dropped from ~95 ms per push to something the trace no longer flags at all. The forced reflow inside Datastar's morph went the same way: the ~395 ms aggregated over five scrolls no longer shows up in Chrome's performance insights.

Larger payload, smaller mental model, equal latency. The architecture trades a thing the server is good at (rendering bytes) for a thing programmers are bad at (keeping three concurrent views of state in sync).

Tokens in the DOM

Tokens render where the spec wants them: in the DOM.

The first version of largediff tried to be clever about syntax highlighting. Per‑token <span>s are the classic place a diff view collapses, so we kept row text as a plain text node and used the CSS Custom Highlight API instead. The server shipped [fileId, lineIdx, startCol, endCol] tuples; the client built one Highlight per kind and registered it with CSS.highlights.set("ds-keyword", …). Six kinds, ~1,500 ranges, no <span> allocation. In Blink it ran in about two milliseconds.

In WebKit it stalled the paint pipeline for 200–800 ms after every jump.

The engines implement the API differently. Reading the WebKit source, Highlight::repaintRange() walks each node a range intersects and calls repaint() per renderer; Blink batches the same work in its PrePaint pass. With ~1,500 ranges per push, the per-renderer walk adds up. It's a reasonable difference between two engines; it's just not a difference we can paper over from userland.

Looking at production: Monaco, CodeMirror 6, GitHub's PR diff view, Sourcegraph, Zed — none use CSS.highlights for syntax tokens. They all use per‑token spans (or, in Zed's case, GPU glyphs). The API is designed for cross‑token annotations like find-in-page hits and blame ribbons, not for token coloring at scale. We were choosing a path that nobody who renders code at scale chooses.

So now we render what everyone else renders. The morph payload carries the spans inline:

// files.ts: what the server emits inside each row's .text span
<span class="text">
  <span class="kw">const</span> <span class="fn">parse</span> = (<span class="typ">input</span>: <span class="typ">string</span>) => {
</span>

Class names compress beautifully under brotli (the dictionary picks them up on the second push), the wire cost ends up similar to the CSS-Highlight version, and both engines paint the page the same way because we're letting the browser do what it already does well: text rendering. The 200–800 ms post-jump stall is gone.
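A sketch of the server side of that emit, assuming tokens arrive as sorted, non-overlapping [start, end) column ranges tagged with a class name. The Token shape, escapeHtml, and renderRowText are hypothetical; the kw/fn/typ classes are the ones in the example above. Text between tokens stays a plain text node, and every slice is HTML-escaped before it enters the morph.

```typescript
// Hypothetical token shape: a column range plus the CSS class to apply.
type Token = { start: number; end: number; kind: string };

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;") // must run first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap each token in a span; gaps between tokens are emitted as escaped text.
function renderRowText(text: string, tokens: Token[]): string {
  let out = "";
  let col = 0;
  for (const t of tokens) {
    if (t.start > col) out += escapeHtml(text.slice(col, t.start));
    out += `<span class="${t.kind}">${escapeHtml(text.slice(t.start, t.end))}</span>`;
    col = t.end;
  }
  out += escapeHtml(text.slice(col));
  return `<span class="text">${out}</span>`;
}
```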

By the numbers

What it actually costs to render the impossible.

Profiled against the demo seed — five hundred files, two hundred thousand lines, GitHub‑style chrome. Measurements come from Chrome DevTools performance traces taken through the DevTools MCP, and from performance.now() checkpoints the client forwards over a diagnostic POST /log endpoint so Safari and Chrome can be measured side‑by‑side.

  • Largest Contentful Paint (cold load, populated diff): 34 ms, desktop
  • Cumulative Layout Shift (first 10 s of session life): 0.00, from 7.72 in v0
  • Wire bytes per push (warm brotli, average over 30 scrolls): ~1.7 KB; ~0 KB when the slice is unchanged
  • Total DOM elements (populated page, any scroll): ~5,290, down from 7,814
  • Client state for the sidebar (file metadata embedded in the page): 0 bytes, down from a 50 kB JSON blob
  • Forced reflow on jump (click → first paint of the new file region): ~118 ms, down from ~211 ms
  • Click → first paint after jump (sidebar click to scrolled content visible): ~50 ms, Chrome and Safari
  • Style recalc per scroll-driven morph (previously 3K–4K elements): ~0 ms, down from ~95 ms (content‑visibility)
  • Forced reflow over 5 scrolls (aggregate over a wheel fling): not flagged, down from ~395 ms
  • SSE events per push (previously diff morph + signals + sidebar morph): 1, down from 3 (fat morph)
  • Sidebar IntersectionObservers (previously one per file row): 0, down from 500

The arc

Nine corrections, in the order they hurt.

None of the architecture above arrived in one draft. Each move below started as a measurement that didn't match the intuition.

01 · Sections, not rows

Per-row pixel windowing made every scroll re-stamp top:Npx on every row.

Sticky chrome handoffs needed JavaScript, jumps had a race between the morph and the scrollTo, and the morph payload was a flat sea of identical <div>s. Switching to per-file <section>s with position: sticky chrome cost a few hundred extra rows in the live DOM, but bought: pure-CSS sticky handoff, atomic jumps, and a morph that's byte-identical for in-file scrolls. Most scrolls now send zero patches.

02 · scrollend

Datastar's __throttle is leading-edge only. The trailing position can vanish.

A scrollbar drag followed by a release sometimes left the viewport on a y where no section was mounted — the final scroll never triggered a /view POST because the throttle window was still open. The fix was two extra words in the shell: data-on:scrollend="…", which fires once after the scroll settles and always carries the final position to the server.
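A toy model of why the trailing position vanishes. This is not Datastar's implementation: leadingThrottle is a hypothetical stand-in with the same leading-edge-only behavior, driven by explicit timestamps so the outcome is deterministic.

```typescript
// Fires on the first call of each window; every later call inside the window is dropped.
function leadingThrottle<T>(windowMs: number, send: (value: T) => void) {
  let windowStart = -Infinity;
  return (value: T, nowMs: number): void => {
    if (nowMs - windowStart >= windowMs) {
      windowStart = nowMs;
      send(value);
    }
  };
}

// A fling: four scroll events inside one 100 ms window, settling at y = 900.
const sentToServer: number[] = [];
const onScroll = leadingThrottle<number>(100, (y) => sentToServer.push(y));
for (const [t, y] of [[0, 0], [30, 300], [60, 600], [90, 900]]) onScroll(y, t);
// Only y = 0 reached the server; the final y = 900 was dropped.

// scrollend fires once after the scroll settles, unthrottled, with the final position.
const onScrollEnd = (y: number): void => {
  sentToServer.push(y);
};
onScrollEnd(900);
```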

03 · The pixel signal

Sticky scrollIntoView is a coin flip.

Anchoring a jump to a sticky .file-card-header worked in Chrome and intermittently mis-landed in Safari by 30 to 70 pixels — the engine sometimes computed the chrome's natural position before laying out the just-mounted section. We replaced the anchor with a jumpToPx signal: the server already knows the exact pixel, so it pushes a number and a body-level data-effect calls scroller.scrollTo({top: $jumpToPx}). Element position never enters the equation.

04 · Tokens in the DOM

The CSS Custom Highlight API isn't built for per-token coloring at scale.

Detailed in the section above. We replaced the API with per-token <span>s in the morph payload — what every production code surface (Monaco, CodeMirror, GitHub, Sourcegraph) actually uses. The wire cost barely moved (brotli eats the class names) and the post-jump stall went from 200–800 ms to zero in WebKit.

05 · The crowded sidebar

500 file rows in the DOM. 500 IntersectionObservers.

A performance trace flagged the sidebar as the largest single subtree on the page — the file tree mounted every row up front and attached an IntersectionObserver per row for hover‑prewarm. We virtualized: the server emits file metadata as a JSON blob, the client mounts ~25 rows in the visible window, and prewarm fires once per row at mount time. DOM dropped from 7,814 to 5,290 elements. Cold LCP improved from 65 ms to 57 ms.

06 · The sidebar joins the server

The client virtualizer was an outlier in an otherwise server-driven architecture.

The first virtualizer shipped all file metadata as a ~50 kB JSON blob to the browser and reimplemented the file-window pattern client-side. It worked, but it violated the project principle that state lives on the server. It also had a subtle mobile bug: each scroll re-rendered the row subtree with innerHTML, so a tap that landed during a small momentum scroll could be delivered on a replaced DOM node. We moved the sidebar onto the same shape as the diff window: POST /sidebar ships scroll telemetry, pushProjection emits a slice morph onto #file-tree .file-rows, the active class is server-baked, and stable id="file-row-{fid}" attributes let Idiomorph preserve DOM identity across morphs. Drawer-open re-centering became another small command: POST /sidebar/recenter sets the server's sidebarScrollTop to put the active file at the list's upper third, and a sidebarJumpToPx signal moves the client's scroll position to match. Cold LCP dropped another 23 ms (the page no longer carries the JSON); jump-time forced reflow dropped from ~211 ms to ~118 ms (no client-side sidebar follow-up).
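The recenter math and the slice are small enough to sketch, assuming fixed-height sidebar rows. recenterScrollTop, sidebarSlice, rowHeight, viewportHeight, and overscanRows are hypothetical; the brief only says the active file lands in the list's upper third and that about 25 rows are in the visible window.

```typescript
// Scroll offset that puts the active row's top one third of the way down the list, clamped.
function recenterScrollTop(activeIndex: number, rowHeight: number, viewportHeight: number, rowCount: number): number {
  const target = activeIndex * rowHeight - viewportHeight / 3;
  const maxScroll = Math.max(0, rowCount * rowHeight - viewportHeight);
  return Math.round(Math.min(maxScroll, Math.max(0, target)));
}

// Inclusive range of rows to render for a given sidebar scroll position.
function sidebarSlice(
  scrollTop: number,
  rowHeight: number,
  viewportHeight: number,
  rowCount: number,
  overscanRows: number,
): [number, number] {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscanRows);
  const last = Math.min(rowCount - 1, Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1 + overscanRows);
  return [first, last];
}
```

With 28 px rows and a 600 px list, recentering on file 100 yields a scrollTop of 2600 and a slice of rows 90 to 116, each of which the server would render with its id="file-row-{fid}".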

07 · The fat morph

Three SSE event types, three skip-when-unchanged fingerprints, three places to plumb a new feature.

The split protocol (#ds-window morph, signals patch, #file-tree .file-rows morph) earned its keep when each piece had different cadence and cost. It also meant every new feature was a routing decision (which event carries this?), and the per-area fingerprints could disagree, letting an active-class toggle slip through when the byte length of its area happened not to change. We collapsed the three into a single fat morph (selector #app, mode inner) that re-renders the whole UI from state on every push. Imperative scrolls (jumpToPx, back‑to‑top, sidebar recenter) ride inside the morph as a transient <div id="cmd-N" data-init="…"> with a bumped sequence number, so Idiomorph treats it as a fresh node and Datastar re‑fires the binding. No $jumpToPx > 0 gate, no special-cased scroll-to-zero. One mental model.

08 · Initial paint, inlined

The first paint shouldn't depend on the SSE stream's first chunk.

Browsers buffer the first body chunk of a streaming response on different schedules. Letting the first‑paint experience hinge on when that chunk arrives was the wrong shape: there's no header, no response setting, and no encoding choice that makes it dependable. The fix is structural: render the initial diff slice and the initial sidebar slice directly into the HTML response, plus an inline <script> that sets scrollTop on both scrollers before first paint. The SSE stream attaches in parallel, and the fingerprint gate skips its first push because the rendered state already matches the HTML the browser just parsed.
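A sketch of the inlined first paint, assuming #app wraps both scrollers and that #file-tree is the sidebar's scrolling element (the brief names the ids but not which element scrolls). renderInitialPage, needsPush, and lastPushedHtml are hypothetical; the gate compares whole strings instead of a fingerprint to keep the example small.

```typescript
type ShellSession = { lastPushedHtml: string | null };

// Inline the current projection and restore both scroll positions from a plain script.
function renderInitialPage(
  session: ShellSession,
  appInner: string,
  diffScrollTop: number,
  sidebarScrollTop: number,
): string {
  const diffTop = Math.max(0, Math.floor(diffScrollTop));
  const sideTop = Math.max(0, Math.floor(sidebarScrollTop));
  // The browser will already show this state, so the first SSE push can be skipped.
  session.lastPushedHtml = appInner;
  return [
    "<!doctype html>",
    `<div id="app">${appInner}</div>`,
    // Parser-blocking inline script: it runs as soon as the parser reaches it,
    // after both scrollers exist in the document.
    "<script>" +
      `document.getElementById("scroller").scrollTop = ${diffTop};` +
      `document.getElementById("file-tree").scrollTop = ${sideTop};` +
      "</script>",
  ].join("\n");
}

// Skip a push when the render matches what the browser already has.
function needsPush(session: ShellSession, appInner: string): boolean {
  if (appInner === session.lastPushedHtml) return false;
  session.lastPushedHtml = appInner;
  return true;
}
```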

09 · content-visibility, revisited

Each scroll-driven morph was styling ~3,000 token spans the user couldn't see.

Per-scroll style recalc was 75–110 ms on 3K‑4K elements, and forced reflow inside Datastar's morph added another ~70 ms. The cause: a file section's ~3,000 syntax-token spans get styled the moment the section enters the DOM, even if it's nowhere near the viewport. We tried content-visibility: auto on .file-section in May 2026; Safari occasionally failed to activate a section's subtree after a jump and the diffview went blank. The file-windowed architecture changed the calculus: a section's subtree no longer gets reconstructed on each scroll (Idiomorph matches #f-<fid>, only morphs row content), so the activation-vs-morph race no longer exists. We turned content-visibility back on. The Chrome performance trace now flags neither DOMSize nor ForcedReflow during scroll; the browser styles a section only when the viewport enters it.

Other things we learned the hard way

The small ones that kept biting.

A · Stable ids

Give every row a stable id, or pay for it in layout shift.

Datastar's morph matches new children to existing ones by id, falling back to positional matching when the ids are absent. Without an id, every row's style="top:Npx" got rewritten on every push. CLS dropped from 7.72 to 0.00 the moment rows got id="r-N".
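A toy count of what the positional fallback costs when a per-row window shifts by one row. The model is deliberately simplified (Row, rowWindow, and the two counters are hypothetical, and a real morph compares more than one attribute), but it shows the shape of the problem: positional pairing rewrites every row's top, id pairing touches only the row that entered.

```typescript
type Row = { id: string; top: number };

// A per-row window: `count` rows starting at `first`, each absolutely positioned.
function rowWindow(first: number, count: number, rowHeight: number): Row[] {
  return Array.from({ length: count }, (_, i) => ({ id: `r-${first + i}`, top: (first + i) * rowHeight }));
}

// Pair old and new children by index: any pair whose top differs gets rewritten.
function positionalRewrites(oldRows: Row[], newRows: Row[]): number {
  let rewrites = 0;
  newRows.forEach((row, i) => {
    const old = oldRows[i];
    if (old === undefined || old.top !== row.top) rewrites++;
  });
  return rewrites;
}

// Pair by id: only rows that are new (or actually moved) get touched.
function idRewrites(oldRows: Row[], newRows: Row[]): number {
  const byId = new Map(oldRows.map((row) => [row.id, row] as const));
  let rewrites = 0;
  for (const row of newRows) {
    const old = byId.get(row.id);
    if (old === undefined || old.top !== row.top) rewrites++;
  }
  return rewrites;
}
```

Scrolling a ten-row window down by one row costs ten rewrites positionally and one with ids.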

B · No inline scripts in morphs

iOS WebKit doesn't run them reliably.

Idiomorph-injected <script> tags are spec‑permitted to skip execution; some engines run them anyway, iOS WebKit doesn't. Imperative actions ride through a signal change (or a data-init on a fresh-id element) and are picked up by a body-level data-effect, never as an inline <script> inside a morph's HTML.

C · Sticky outer height stays stable

A child that shrinks the sticky shifts the page below.

A child rule that changed under @container scroll-state(stuck) shrank the sticky element from 72 px to 40 px, which — combined with a negative margin-bottom — shifted the entire diff upward by 32 px the moment the bar pinned. Fixing the outer height and absolutely positioning the bar inside removed the jump entirely.