Time-Travel Debugging Meets LLMs: Record/Replay Architectures That Supercharge Code Debugging AI
Debugging AIs are only as good as the context they see. Logs and stack traces help, but when you want a language model to replicate, reason about, and fix a bug with precision, you need deterministic replay. Time-travel debugging has been around for years, but fusing it with LLMs demands a deliberate architecture: capture the right signals, replay with high fidelity, and shape the trace so the model draws causal conclusions instead of hallucinating.
This article presents a practical blueprint for feeding deterministic replays to LLM debuggers. It dives into capturing syscalls, network, heap snapshots, and symbols; balancing overhead versus fidelity; local versus cloud deployments; privacy tradeoffs; and a production-ready pipeline that can ship in real teams. If you have used rr, WinDbg TTD, Pernosco, or PANDA, consider this a bridge from those proven techniques to LLM-centric workflows.
TL;DR
- Deterministic record/replay provides the missing substrate for reliable LLM debugging.
- Capture what matters: syscalls, file snapshots, network I/O, time and randomness sources, process tree, symbols, and periodic heap snapshots.
- Offer fidelity tiers that balance overhead: rapid (syscalls), balanced (syscalls + network + symbols + periodic heap diffs), forensic (full memory record).
- Use container sandboxing and manifest-driven packaging so replays are reproducible locally and in CI/cloud.
- Provide an LLM replay adapter that summarizes, embeds, and streams trace windows on demand.
- Bake in privacy: redact PII, hash secrets, and allow client-side policy before any upload.
Why LLM Debuggers Need Deterministic Replay
LLMs excel at synthesizing patterns across large contexts, but they struggle with the non-determinism and partial observability of real systems. Common pitfalls when you feed a model only logs and a stack trace:
- Missing causal events: The read that consumed an empty buffer; the DNS timeout before a fallback; the signal that interrupted a critical section.
- Non-deterministic races: A heisenbug that appears one run in twenty, caused by a specific timing interleaving.
- JIT and runtime variance: A JIT emits different code between runs, causing divergent behavior under identical inputs.
Deterministic replay addresses these by giving the model a frozen world:
- Every syscall, signal, and scheduling decision is recorded and replayed.
- All sources of non-determinism (time, randomness, network) are virtualized.
- Memory state can be sampled (periodically or fully) to reconstruct object graphs.
The result is an LLM that can step forward and backward in time, uncover why a branch was taken, and propose fixes based on provable evidence instead of speculation.
Requirements for Replay Quality
To make replays useful for LLM debugging, capture layers must satisfy:
- Completeness for causality: Capture all external inputs and internal sources of entropy that can change control flow.
- Stability across environments: Replays should run on developer laptops, CI, or a cloud sandbox.
- Efficient storage and transport: Compress and chunk traces; make them seekable.
- Rich symbolization: Map addresses to function names, files, and lines; preserve inlined frames and template/generics instantiations.
- Query-friendly structure: Provide indexes by thread, file descriptor, object identity, and time.
Concretely, you want to capture:
- Syscalls: open, read, write, mmap, futex, epoll, ioctl, clone/spawn/exec, signals, and exit codes.
- Filesystem: snapshots of files read and written, including metadata; overlays for container file systems.
- Network: inbound and outbound payloads, timing, DNS, TLS session keys if available.
- Time and randomness: wall clock, monotonic clock, getrandom, /dev/urandom, PRNG seeds.
- Process tree and environment: arguments, environment variables, working directory, locales, CPU features.
- Symbols and debug info: DWARF/PDB, symbol tables, source line mappings, inlined frames, build IDs.
- Heap snapshots: periodic memory pages plus allocation metadata; incremental diffs to bound cost.
- Thread scheduling: context switches, runnable/blocked states, mutex/futex waits and wakes.
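As a concrete illustration, the capture layers above might map onto a single event record like the following. The field names here are hypothetical, not a standard schema:

```python
# Illustrative trace-event record covering the capture layers above.
# Field names and the kind/name vocabulary are assumptions, not a spec.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceEvent:
    ts_ns: int                          # monotonic timestamp in nanoseconds
    pid: int
    tid: int
    kind: str                           # "syscall" | "net" | "sched" | "heap" | "signal"
    name: str                           # e.g. "read", "sched_switch", "SIGSEGV"
    args: tuple = ()                    # raw arguments (fds, addresses, sizes)
    ret: Optional[int] = None           # return value / negative errno for syscalls
    payload_ref: Optional[str] = None   # CAS reference for captured bytes

# A read on fd 7 that returned EAGAIN (-11), as a negative errno
ev = TraceEvent(ts_ns=1_243_000_000, pid=1234, tid=1235,
                kind="syscall", name="read", args=(7, 4096), ret=-11)
assert ev.kind == "syscall" and ev.ret == -11
```

Keeping events this uniform makes the per-thread and per-fd indexes described later straightforward to build.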
A Practical Architecture: Record, Package, Replay, Reason
The architecture breaks down into four major subsystems, with clear contracts between them:
- Recorder: captures events and state with minimal perturbation.
- Packager: normalizes, compresses, and indexes the capture as a content-addressable artifact.
- Replayer: instantiates the process world in a sandbox and enforces deterministic inputs.
- LLM Adapter: exposes a queryable, summarized view for the model with just-in-time trace streaming.
1) Recorder: Multi-layer capture
- User-space interposition:
- LD_PRELOAD on Linux or DYLD_INSERT_LIBRARIES on macOS to intercept libc calls: open, read, write, clock_gettime, getrandom, socket, connect, accept, send, recv, poll, epoll_wait.
- On Windows, DLL injection or API hooking for Win32 calls; leverage ETW providers and Time Travel Debugging where available.
- Kernel-level capture:
- eBPF uprobes/kprobes and tracepoints for sys_enter/sys_exit, sched_switch, net events.
- ptrace/seccomp-bpf for forced trapping of target syscalls to virtualize results.
- Linux auditd as a fallback for coarse syscall trails.
- Network funnel:
- Transparent proxy (TPROXY + iptables) to intercept and record TCP/UDP payloads per connection.
- Optional TLS key logging (SSLKEYLOGFILE for OpenSSL/boringssl) to decrypt captures, or terminate TLS at a local sidecar in dev.
- Filesystem snapshotting:
- OverlayFS to capture all writes.
- For reads, track content hashes and store any file bytes read during the session (content-addressable to deduplicate across runs).
- Time and randomness virtualization:
- Intercept clock_gettime and related syscalls; return recorded values.
- Intercept getrandom and /dev/urandom reads; record and replay byte streams.
- Process and thread tree:
- Capture clone, fork, exec, prctl; record affinity, env, and argv.
- Record scheduling events (sched_switch) for concurrency reasoning.
- Heap and object snapshots:
- Language-specific hooks for high-value runtimes:
- C/C++: jemalloc/tcmalloc profiling, glibc malloc hooks; periodic /proc/pid/mem page sampling.
- Go: runtime tracing, heap profiles, goroutine dumps.
- JVM: JFR (Java Flight Recorder) events; AsyncGetCallTrace; JVMTI heap iteration snapshots.
- Python: tracemalloc, sys.setprofile frames, object graph dumps.
- Node.js: heap snapshots via v8 inspector protocol, async hooks for promise/job queues.
- Symbol capture:
- Capture binaries, shared libraries, and build IDs; collect DWARF/PDB bundles and source maps (for JS/TS).
The recorder should be togglable at runtime via env vars or a small CLI to minimize friction.
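To make the time/randomness virtualization idea concrete, here is a minimal record/replay sketch for an entropy source: on record, real bytes are logged alongside the trace; on replay, the same bytes are served back, so control flow that depends on them is deterministic. The class and method names are illustrative:

```python
# Sketch: record-and-replay virtualization of a randomness source.
# Names are illustrative; a real recorder would persist the log with the trace.
import os

class RandomSource:
    def __init__(self, mode, log=None):
        self.mode = mode             # "record" or "replay"
        self.log = log if log is not None else []
        self._pos = 0

    def getrandom(self, n):
        if self.mode == "record":
            data = os.urandom(n)     # real entropy on the recording run
            self.log.append(data)    # persisted alongside the trace
            return data
        data = self.log[self._pos]   # replay the recorded byte stream
        self._pos += 1
        return data

rec = RandomSource("record")
first = rec.getrandom(16)
rep = RandomSource("replay", log=rec.log)
assert rep.getrandom(16) == first    # identical bytes on replay
```

The same pattern applies to `clock_gettime`: record the returned timespec values in order, then serve them back verbatim during replay.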
2) Packager: Manifests, chunking, compression, and indexing
Trace artifacts can be large. The packager makes them manageable and portable:
- Manifest schema:
- Process tree with pids/tids, parent relations.
- Event stream segments (by time or event count) with offsets and indexes.
- Data blobs: file contents, network payloads, memory pages, TLS keys.
- Symbol bundles keyed by build-id.
- Chunking strategy:
- Segment by time slice (e.g., 1–5 seconds) or by event count (e.g., 100k events).
- Memory snapshots as base + delta pages (copy-on-write page hashes; content-addressed).
- Compression and dedup:
- zstd with long-distance mode for text-heavy payloads; dictionary training across frequent headers.
- Content-addressable storage (CAS) to reuse identical libraries, file blobs, and snapshot pages.
- Indexes:
- Per-thread and per-fd event indexes for fast seek.
- Symbolized callsite cache to rapidly map PCs to source lines.
- Object ID maps for sockets, files, mutexes.
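The content-addressable storage and page-dedup ideas above can be sketched in a few lines: identical file blobs or memory pages hash to the same key, so repeated runs and unchanged snapshot pages cost nothing extra. This is an illustrative toy, not the packager's actual implementation:

```python
# Sketch of content-addressable blob storage with dedup by digest.
# Illustrative only; a real CAS would back this with object storage.
import hashlib

class CAS:
    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = "sha256:" + hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)   # dedup: store once per digest
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]

cas = CAS()
page = b"\x00" * 4096                      # a zeroed memory page
k1 = cas.put(page)
k2 = cas.put(page)                         # same content yields the same key
assert k1 == k2 and len(cas.blobs) == 1
```

Memory snapshots then reduce to a base set of page keys plus, per delta, only the keys whose page contents changed.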
3) Replayer: Sandbox and determinism enforcement
Replays should run in an isolated, reproducible environment:
- Sandbox runtime:
- Linux: nsjail or firecracker microVM; tie cgroups and namespaces (pid, net, mount, user, uts) to isolate effects.
- macOS: sandbox-exec profiles; Windows: containers or Hyper-V isolation.
- Determinism controls:
- ptrace/seccomp gate for syscalls; serve recorded results and payloads.
- Virtualize time and randomness using the recorded streams.
- Schedule determinism: enforce recorded order or deterministic interleavings for concurrency; incorporate rr-style techniques where possible.
- Artifact provisioning:
- Mount file overlays; map CAS blobs to expected paths.
- Preload recorded libraries to ensure symbol addresses align; pin the CPU feature set if needed.
- Debugger bridges:
- Provide gdbserver/lldbserver endpoints; enable step-back if supported.
- Allow high-level queries (e.g., show me writes to fd 7 between t=1.2s and t=1.3s).
4) LLM Adapter: From raw traces to model-ready context
Dumping a 5GB trace into a prompt is counterproductive. The adapter abstracts:
- Summarizers:
- Construct a timeline narrative: key syscalls, errors, race points, perf stalls.
- Build per-thread and per-resource summaries: file lifecycles, socket transactions.
- Heap deltas highlighting growing structures and leaked objects.
- Temporal retrieval:
- Embed short narrative segments and code spans; use a vector store keyed by time and resource.
- On demand, stream precise trace windows (events and disassembly) for a function or time range.
- Schema translation:
- Provide a compact JSON/CBOR schema for events so the LLM can reason structurally.
- Offer code+trace parity: map each callsite in source to its runtime events.
- Safety filters:
- Redact PII before leaving the developer’s machine; treat secrets and memory strings with hashing or FPE.
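The summarizer idea can be sketched as a budgeted pass that keeps only the events a model needs first (errors, signals) and renders them as a timeline narrative. The event shape and selection heuristics here are illustrative assumptions:

```python
# Sketch: collapse raw events into a short timeline narrative,
# the first context the model sees. Heuristics are illustrative.
def narrate(events, max_lines=5):
    lines = []
    for ev in events:
        # keep only error returns and signals for the first pass
        if ev.get("ret", 0) < 0 or ev.get("kind") == "signal":
            lines.append(f"t={ev['ts_ns'] / 1e9:.3f}s: {ev['name']} "
                         f"(ret={ev.get('ret')}) on tid {ev['tid']}")
        if len(lines) == max_lines:
            break
    return "\n".join(lines)

events = [
    {"ts_ns": 1_200_000_000, "name": "read", "ret": 4096, "tid": 7, "kind": "syscall"},
    {"ts_ns": 1_243_000_000, "name": "read", "ret": -11, "tid": 7, "kind": "syscall"},
]
assert "ret=-11" in narrate(events)   # only the failing read survives
```

Successful events stay retrievable via the temporal-retrieval path; the narrative just decides what the model sees unprompted.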
Overhead vs Fidelity: Pick the Right Capture Tier
Capturing everything is ideal, but not always practical. You need deployment modes that progressively add fidelity at increasing cost. Historical data points provide guidance:
- rr (Mozilla) reports 1.2x–2x slowdown on many workloads for syscall + schedule record/replay in userland with hardware support (e.g., performance counters for nondet events). CPU-bound code can approach 1.0x–1.3x; I/O-heavy or multithreaded may hit 2x–5x in worst cases.
- eBPF-based syscall tracing can add 1%–10% overhead depending on filters and volume.
- Network PCAP capture overhead is typically modest (<5%) if offloaded with kernel TPROXY and zero-copy ring buffers.
- Heap snapshot overhead is highly variable: periodic page hashing and diffs can add 5%–30% depending on cadence; language-integrated profilers can be cheaper but less complete.
A practical tiering strategy:
- Rapid (baseline):
- Capture syscalls, time/randomness, process tree, minimal symbols.
- Network capture optional; no heap snapshots.
- Target overhead: 5%–20%.
- Use for local dev reproducibility and CI smoke failures.
- Balanced (recommended default):
- Adds network payloads, file content snapshots on demand, JIT/runtime summaries (JFR, Go runtime, tracemalloc), periodic heap deltas (e.g., every 250ms or 10k allocations, whichever first), scheduler events.
- Target overhead: 15%–50%.
- Use for intermittent test failures, performance anomalies, memory leaks.
- Forensic (deep dive):
- Full memory record (incremental pages), all syscalls, full network, symbol bundle with source maps, deterministic scheduling.
- Target overhead: 50%–200%.
- Use for flaky concurrency bugs, security incidents, or release blockers.
Key idea: make the tier selectable by environment variable and allow escalation mid-run (e.g., switch to forensic when an invariant breaks). The recorder can start in rapid mode and, upon detecting a suspicious event, increase fidelity for the next N seconds.
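The escalation logic can be sketched as a small state machine: start in rapid mode, bump to forensic when a suspicious event fires, and hold the higher tier for a fixed window. The tier names match the article; the trigger predicate and window are illustrative:

```python
# Sketch of mid-run tier escalation; the trigger predicate (negative
# return codes, invariant flags) and the 10 s hold are assumptions.
class Recorder:
    def __init__(self):
        self.tier = "rapid"
        self.escalated_until_ns = 0

    def on_event(self, ev, now_ns):
        # escalate on invariant breaks or error returns, hold for 10 s
        if ev.get("invariant_broken") or ev.get("ret", 0) < 0:
            self.tier = "forensic"
            self.escalated_until_ns = now_ns + 10_000_000_000
        elif now_ns > self.escalated_until_ns:
            self.tier = "rapid"      # decay back once the window passes

r = Recorder()
r.on_event({"name": "read", "ret": -11}, now_ns=1_000)
assert r.tier == "forensic"          # EAGAIN triggered escalation
r.on_event({"name": "read", "ret": 0}, now_ns=20_000_000_000)
assert r.tier == "rapid"             # window elapsed, fidelity decayed
```

A production recorder would also need to flush buffered low-fidelity context when it escalates, so the seconds leading up to the trigger are not lost.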
Language- and Runtime-specific Notes
- Native C/C++:
- rr remains the gold standard for user-space record/replay of multithreaded native code on Linux. You can integrate rr traces directly and augment with network/file payload capture.
- For malloc events, prefer jemalloc profiling with epoch capture; store allocation sites via frame-pointer unwinding.
- Go:
- Use runtime/trace and runtime metrics; capture goroutine states and preemptions.
- Delve (dlv) can interface for symbolization and step debugging during replay.
- JVM:
- JFR provides low-overhead event streams; combine with JVMTI for heap object sampling.
- Stabilize JIT by pinning flags (e.g., disable tiered compilation for replays or capture JIT logs to rehydrate codecache mapping).
- Python:
- tracemalloc for allocation stacks; sys.setprofile for function call events; line profiling if needed.
- Virtualenv and package versions must be frozen in the manifest.
- Node.js:
- v8 heap snapshots and async hooks; map Promises and event loop ticks to a timeline.
- Capture source maps for TS to JS mapping.
- Windows:
- WinDbg Time Travel Debugging (TTD) captures user-space record/replay; ETW for system events; ProcMon for coarse I/O if needed.
- Containers and microVMs:
- Run services under a fixed base image with overlay; mount snapshot-friendly volumes.
- gVisor or kata can provide additional syscall mediation for exactness.
Local vs Cloud: Where Should Replays Live?
There is no one-size-fits-all answer; adopt a hybrid approach.
- Local-first (developer laptops):
- Pros: privacy by default; low latency; easy iteration; works offline.
- Cons: limited storage; heterogeneous environments; weaker isolation; less discoverability across teams.
- Strategy: default to Rapid or Balanced tier; keep traces ephemeral or prunable; provide a local LLM (small model) for immediate triage.
- Cloud-backed:
- Pros: durable storage, fleet-level search, better isolation, heavy compute for symbolization and summarization, consistent replayer images.
- Cons: privacy concerns; egress costs; permissions complexity.
- Strategy: client-side redaction; opt-in upload; strict RBAC; encryption at rest and in transit; object storage with lifecycle policies; autoscaled summarizer workers.
A healthy pipeline lets devs promote a local trace to the cloud when collaboration or persistence is needed. Attach the trace to a CI run or an issue, then allow others to replay deterministically in a sandbox.
Privacy and Compliance: Make Safe-by-Design the Default
Debug traces are a magnet for secrets and PII. Treat privacy as a first-class dimension of the design.
- Redaction policies:
- Before upload, scan memory strings and payloads with pattern detectors (e.g., secret scanners, regex rules for tokens, emails, phone numbers) and redact or tokenize.
- Allow per-project dictionaries of known sensitive keys and domains.
- Hashing and tokenization:
- Replace sensitive values with salted hashes or format-preserving encrypted tokens so relational structure is preserved while content is hidden.
- Maintain a local-only mapping so a dev can unredact if needed.
- Scoped capture:
- Enable path-based and domain-based allow/deny lists for files and sockets.
- Suppress memory pages belonging to marked libraries or regions.
- Governance:
- Immutable audit logs on who accessed which trace.
- Data retention and auto-expiry; per-tenant encryption keys.
The LLM adapter should run a final content filter before responding to a model query, ensuring no sensitive text is left in the stream unless the request is explicitly permitted.
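The redaction-with-tokenization approach can be sketched as pattern matching plus salted hashing: matched values are replaced with short digests so relational structure survives while content does not. The patterns below are illustrative, far from exhaustive, and the salt handling is an assumption:

```python
# Sketch of client-side redaction: regex detectors plus salted hashing.
# Patterns are illustrative examples, not a complete secret scanner.
import hashlib, re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
TOKEN = re.compile(r"(?:api|secret|token)[_-]?key=\S+", re.I)

def redact(text: str, salt: bytes) -> str:
    def tokenize(m):
        # salted hash: same value maps to the same token within a project
        digest = hashlib.sha256(salt + m.group(0).encode()).hexdigest()[:12]
        return f"<redacted:{digest}>"
    for pat in (EMAIL, TOKEN):
        text = pat.sub(tokenize, text)
    return text

out = redact("user=alice@example.com api_key=abc123", b"per-project-salt")
assert "alice@example.com" not in out and "abc123" not in out
assert out.count("<redacted:") == 2    # structure preserved, content hidden
```

Because the hash is deterministic per salt, the model can still correlate repeated occurrences of the same redacted value across the trace.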
A Production-Ready Pipeline: End-to-End Blueprint
Below is a concrete, minimal design you can implement incrementally.
CLI and Environment
- devtrace record --cmd './bin/test flaky_test'
- devtrace pack --out trace.tarc
- devtrace replay --trace trace.tarc -- cmd or test case
- devtrace llm --trace trace.tarc --ask 'why did fd 7 return EAGAIN?'
Environment toggles:
- DEVTRACE_MODE=rapid|balanced|forensic
- DEVTRACE_REDACT=on
- DEVTRACE_UPLOAD=ask|always|never
- DEVTRACE_HEAP_SNAPSHOT_INTERVAL=250ms
Event Manifest Schema (simplified)
```json
{
  "manifest_version": 1,
  "host": { "os": "linux", "arch": "x86_64", "kernel": "6.6.1" },
  "processes": [
    {
      "pid": 1234,
      "ppid": 1,
      "argv": ["./bin/test"],
      "env": ["FOO=1"],
      "build_ids": ["abc123..."]
    }
  ],
  "segments": [
    {
      "id": "seg-0001",
      "start_ns": 0,
      "end_ns": 1500000000,
      "events": "cas://sha256:...",
      "index": "cas://sha256:..."
    }
  ],
  "blobs": {
    "files": { "/etc/hosts@sha256:...": "cas://sha256:..." },
    "net": { "conn-42@seg-0001": "cas://sha256:..." },
    "mem": {
      "pid-1234@seg-0001@base": "cas://sha256:...",
      "pid-1234@seg-0001@delta-1": "cas://sha256:..."
    },
    "symbols": { "buildid-abc123": "cas://sha256:..." }
  },
  "indexes": { "by_thread": "cas://sha256:...", "by_fd": "cas://sha256:..." }
}
```
Minimal LD_PRELOAD Interposer (Linux)
```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/random.h>
#include <unistd.h>
#include <time.h>

static ssize_t (*real_read)(int, void *, size_t);
static ssize_t (*real_write)(int, const void *, size_t);
static ssize_t (*real_getrandom)(void *, size_t, unsigned int);
static int (*real_clock_gettime)(clockid_t, struct timespec *);

__attribute__((constructor)) static void init(void) {
    real_read = dlsym(RTLD_NEXT, "read");
    real_write = dlsym(RTLD_NEXT, "write");
    real_getrandom = dlsym(RTLD_NEXT, "getrandom");
    real_clock_gettime = dlsym(RTLD_NEXT, "clock_gettime");
}

ssize_t read(int fd, void *buf, size_t count) {
    ssize_t r = real_read(fd, buf, count);
    /* emit_event("read", fd, buf, r); */
    return r;
}

ssize_t write(int fd, const void *buf, size_t count) {
    /* emit_event("write", fd, buf, count); */
    return real_write(fd, buf, count);
}

ssize_t getrandom(void *buf, size_t buflen, unsigned int flags) {
    ssize_t r = real_getrandom(buf, buflen, flags);
    /* record bytes, serve on replay */
    return r;
}

int clock_gettime(clockid_t clk, struct timespec *ts) {
    int r = real_clock_gettime(clk, ts);
    /* record and clamp during replay */
    return r;
}
```
Note: for correctness across threads and signals, prefer seccomp-bpf and ptrace gating for syscalls over pure interposition; the interposer is useful for portability and unit tests.
eBPF Outline for Syscall Capture
```c
// BPF pseudo-code (BCC style): attach to raw syscall tracepoints
struct event_t {
    u64 ts;
    u32 pid;
    u32 tid;
    u16 sys_nr;
    s64 ret;
    u64 args[6];
};

BPF_PERF_OUTPUT(events);

TRACEPOINT_PROBE(raw_syscalls, sys_enter) {
    struct event_t e = {};
    e.ts = bpf_ktime_get_ns();
    e.pid = bpf_get_current_pid_tgid() >> 32;
    e.tid = (u32)bpf_get_current_pid_tgid();
    e.sys_nr = args->id;
    // read args from regs depending on arch
    events.perf_submit(args, &e, sizeof(e));
    return 0;
}

TRACEPOINT_PROBE(raw_syscalls, sys_exit) {
    // capture return value and error
    return 0;
}
```
Transparent Network Sidecar (iptables + TPROXY)
```bash
# mark outbound TCP and redirect it to a local proxy on 127.0.0.1:15000
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
iptables -t mangle -A OUTPUT -p tcp -j MARK --set-mark 1
iptables -t mangle -A PREROUTING -p tcp -m mark --mark 1 \
  -j TPROXY --on-port 15000 --tproxy-mark 0x1/0x1
```
The sidecar records payloads and metadata; during replay, it replays from cassette files instead of touching the network.
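The cassette mechanism can be sketched as recorded (sent, received) exchanges keyed per connection and served back in order during replay, with a divergence check if the replayed run sends different bytes. The class name and file format are illustrative:

```python
# Sketch of a network "cassette": recorded exchanges are replayed in
# order per connection; no real network is touched. Names are illustrative.
class Cassette:
    def __init__(self):
        self.exchanges = {}            # conn_id -> list of (sent, received)

    def record(self, conn_id, sent, received):
        self.exchanges.setdefault(conn_id, []).append((sent, received))

    def replay(self, conn_id, sent):
        expected, received = self.exchanges[conn_id].pop(0)
        if sent != expected:           # the replayed run diverged from the recording
            raise RuntimeError(f"replay divergence on {conn_id}")
        return received

c = Cassette()
c.record("conn-42", b"GET / HTTP/1.1\r\n\r\n", b"HTTP/1.1 200 OK\r\n\r\n")
assert c.replay("conn-42", b"GET / HTTP/1.1\r\n\r\n").startswith(b"HTTP/1.1 200")
```

The divergence check matters: it turns a silently wrong replay into an explicit signal that the program under test behaved differently this run.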
Replay Sandbox with nsjail
```bash
nsjail \
  --mode o \
  --chroot /replay/rootfs \
  --cwd /work \
  --bindmount_ro /replay/overlays:/replay/overlays \
  --env DEVTRACE_REPLAY=1 \
  -- ./bin/test --seed 42
```
A small ptrace broker feeds syscalls with recorded results and enforces monotonic clocks.
LLM Query Adapter (Python sketch)
```python
from devtrace import Trace, summarize, query_window
from llm import chat

trace = Trace.open('trace.tarc')
summary = summarize(trace, budget_tokens=2000)

resp = chat([
    {'role': 'system', 'content': 'You are a debugging assistant.'},
    {'role': 'user', 'content': 'Here is the failure summary:\n' + summary},
    {'role': 'user', 'content': 'Why did fd 7 return EAGAIN at t=1.243s? '
                                'Provide the causal chain.'},
])

if resp.needs_more_context:
    win = query_window(trace, start_ns=1_200_000_000, end_ns=1_260_000_000,
                       filters={'fd': 7})
    resp2 = chat([
        {'role': 'assistant', 'content': resp.partial},
        {'role': 'user', 'content': 'Additional events:\n' + win.to_markdown()},
    ])
    print(resp2.text)
else:
    print(resp.text)
```
The adapter limits initial context to a narrative summary and expands on demand with targeted windows.
Example Walkthrough: A Flaky Race in Production
Scenario: A Go service occasionally returns HTTP 500 on a POST endpoint. Tests pass locally, but CI shows failures twice a day. You enable Balanced capture in CI for the failing test and export the trace.
What the LLM sees and does:
- Timeline summary:
- t=0.12s: goroutine 57 accepts connection on 0.0.0.0:8080.
- t=0.17s: reads 1.2KB request body; traces show Content-Type and JSON payload.
- t=0.21s: attempts to write to a channel ch that may be closed.
- t=0.22s: runtime sched_switch reveals goroutine 57 blocked; goroutine 12 closes ch due to context timeout.
- t=0.23s: write returns EPIPE; handler maps to 500.
- Causal chain (from syscalls + Go runtime events):
- Context deadline exceeded triggers cleanup path; ch close occurs concurrently.
- A race exists between handler enqueue and cleanup.
- Code mapping:
- Symbolized stack points to handler.go:142; channel write at handler.go:144.
- Inlined functions resolved via DWARF; go build info captured.
- Fix recommendation:
- Swap unguarded send with select on ctx.Done(); check for closed channel.
- Or replace channel with buffered queue plus atomic closed flag.
- Validation:
- The adapter replays deterministically with both interleavings and shows the fix avoids the crash.
This is the power of time-travel fed to an LLM: precise interleavings, grounded recommendations, and verifiable outcomes.
Cost Model and Sizing
- Storage per minute:
- Rapid: 1–10 MB/min for syscall streams, depending on I/O volume.
- Balanced: 20–200 MB/min including network and periodic heap pages.
- Forensic: 200 MB–2 GB/min with full memory deltas.
- CPU overhead:
- Dominated by compression and symbolization. Parallelize post-processing; keep recorder light.
- Retention and pruning:
- Store segment-level CAS blobs; dedupe across runs via content hashes.
- Keep only failing segments and the 10s leading up to failure; drop the rest.
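The failure-window pruning policy can be sketched directly against the manifest's segment fields: keep only segments that overlap the 10 s leading up to the failure timestamp. The segment dicts mirror the manifest's `start_ns`/`end_ns`; the policy itself is an illustrative default:

```python
# Sketch of failure-window pruning over manifest segments.
# The 10 s window is the article's default; tune per workload.
def prune(segments, failure_ns, window_ns=10_000_000_000):
    keep_from = failure_ns - window_ns
    return [s for s in segments
            if s["end_ns"] >= keep_from and s["start_ns"] <= failure_ns]

segments = [
    {"id": "seg-0", "start_ns": 0,              "end_ns": 4_000_000_000},
    {"id": "seg-1", "start_ns": 4_000_000_000,  "end_ns": 12_000_000_000},
    {"id": "seg-2", "start_ns": 12_000_000_000, "end_ns": 18_000_000_000},
]
kept = prune(segments, failure_ns=15_000_000_000)
assert [s["id"] for s in kept] == ["seg-1", "seg-2"]   # seg-0 is dropped
```

Because blobs are content-addressed, dropping a segment only deletes data no surviving segment references.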
Integrations and Tooling
- CI/CD:
- GitHub Actions: run failing jobs under devtrace record; upload artifacts on failure.
- Store in S3 or GCS with 7–30 day retention; attach links to PRs.
- IDEs:
- Extensions for VS Code/JetBrains to open a trace, navigate timeline, and ask the LLM questions within the editor.
- Observability:
- Bridge OpenTelemetry traces to replay segments: link span IDs to syscall and network events.
- Use OTel to trigger escalations: when a span error occurs, bump capture tier.
Evaluation: Measuring Impact Beyond Gut Feel
Adopt objective metrics to track whether LLM + replay actually helps:
- Reproduction rate: fraction of failures deterministically reproduced locally/CI.
- Time-to-root-cause: median minutes from failure to cause explanation.
- Suggestion acceptance: percentage of LLM fix suggestions that are merged.
- Flake suppression: decrease in flaky test reruns over time.
- Performance overhead: monitor p95 build/test time deltas with capture enabled.
Datasets to bootstrap evaluation:
- BUGSWARM: real-world fail-pass pairs for CI builds; wrap with record/replay to create a gold corpus for LLMs.
- Synthetic suites: inject races, timeouts, and resource contention into microservices; verify the model detects and fixes them under replay.
Limitations and Open Problems
- GPU and accelerator determinism:
- Non-deterministic kernels and driver scheduling hinder exact replay; capture tensors and inputs at boundaries as a compromise.
- JIT nondeterminism at scale:
- Rehydrating JIT code caches consistently is non-trivial; pin flags or map PCs via symbolic debug rather than raw addresses.
- Long-running services:
- Traces can grow without bound; use event-triggered windows and adaptive sampling; checkpoint at scenario boundaries.
- Cross-host distributed replay:
- For multi-service bugs, you need correlated captures across nodes; clock skew and network ordering must be normalized.
- Legal and compliance constraints:
- Jurisdictional limits on code and data movement can complicate cloud pipelines; keep a local-only path viable.
Opinionated Recommendations
- Default to Balanced capture in CI for flaky suites; Rapid for local dev; Forensic only on demand.
- Always virtualize time and randomness; the marginal effort pays off heavily in determinism.
- Treat network capture as first-class; many bugs are at the boundaries.
- Keep symbol stores rigorous: build IDs, DWARF/PDBs, and source maps must be part of the artifact process.
- Push summarization to the edge; never upload raw memory unless absolutely necessary.
- Design your LLM adapter for progressive disclosure: start with narratives, fetch raw evidence only when needed.
References and Further Reading
- rr: lightweight record and replay for Linux user-space (https://rr-project.org/)
- Pernosco: time-travel debugging as a SaaS over rr traces (https://pernos.co/)
- WinDbg Time Travel Debugging (TTD) docs (https://learn.microsoft.com/windows-hardware/drivers/debugger/time-travel-debugging-overview)
- PANDA: Platform for Architecture-Neutral Dynamic Analysis via QEMU record/replay (https://panda.re/)
- eBPF Tracing (BCC, bpftrace) (https://github.com/iovisor/bcc) (https://github.com/iovisor/bpftrace)
- Java Flight Recorder (JFR) (https://openjdk.org/projects/jmc/)
- Go runtime tracing (https://pkg.go.dev/runtime/trace)
- OpenTelemetry (https://opentelemetry.io/)
Closing
Time-travel debugging is the durable substrate that makes LLM debuggers trustworthy. With a pragmatic record/replay pipeline, you can hand models deterministic, richly symbolized slices of program execution. The result is less speculation, more causality, and faster, more reliable fixes. Build the pipeline once; let every developer and every model stand on it.
