Trace-Driven Code Debugging AI: eBPF Snapshots, Time Travel, and the End of “Can’t Repro”
Modern teams spend a disproportionate amount of time chasing bugs they can’t reproduce. The bug happened in production, on a specific kernel, inside a specific container image, during a specific traffic pattern. You have partial logs and a vague Grafana bump. You suspect a race, and you face a week of log-enabled redeploys while you pray it reproduces again.
There’s a better path: treat runtime behavior as data. Capture deterministic, redacted traces of the code paths that matter, then replay them offline as many times as needed—across debuggers, profilers, fuzzers, and code debugging AI—until the fix is obvious, testable, and safe.
This article describes a practical architecture for trace-driven debugging using:
- eBPF for low-overhead, selective runtime capture without code changes.
- Record/replay and time-travel debuggers for deterministic reproduction and root cause analysis.
- On-host redaction to ship useful traces without leaking secrets or PII.
- An AI loop that consumes traces and symbols, proposes fixes, and verifies them against the same deterministically replayed scenario.
We’ll cover concrete capture points, Linux specifics, language runtimes (Go/Java/Node/CPython), pitfalls that sink naive deployments, and a rollout plan that doesn’t degrade SLOs or privacy posture.
Executive Summary
- The reproducibility gap is largely an observability gap. Logs are lossy and biased; traces can be precise and deterministic.
- eBPF lets you capture per-request causality, syscall inputs/outputs, network payload boundaries, timing, and critical user-level probes—with overhead that’s usually lower than broad log sampling.
- Deterministic replay means recording sources of nondeterminism (time, randomness, scheduling, I/O) and feeding them back to the program in a sandbox, often with a time-travel debugger on top.
- Redaction belongs at the source: structure and hash/tokenize sensitive payloads before they leave the host. Avoid log spam and risky writes; keep data minimization the default.
- An AI assistant works best when fed structured traces, symbolized stack frames, and a stable reproduction harness. It can then propose minimal patches, generate unit/integration tests from the trace, and verify fixes deterministically.
Why “Can’t Repro” Is Still a Thing
- Production differs from staging in hardware (NUMA, CPU instructions), kernel quirks, microlatencies, clock behavior, scheduler priorities, cgroup limits, and real traffic shape.
- Logs approximate truth, but not deterministically. They miss preemption points, kTLS offload details, NIC coalescing, socket buffers, and libc-level peculiarities.
- Adding logs to prod means extra I/O and tail-at-scale issues, plus PII risk. It also shifts timing, sometimes masking the very race you hope to expose.
- Without determinism, every bug becomes a statistical event you chase with correlation rather than a concrete sequence you can replay.
Determinism wins because it converts “try to trigger again” into “load the last 20 seconds of the failing request and step backward.”
What Is a Deterministic Trace?
A deterministic trace contains enough information to recreate a target execution path in a controlled environment. The minimum viable set typically includes:
- Inputs and nondeterminism:
- Time reads (clock_gettime), random bytes (getrandom/urandom), PIDs/TIDs, environment.
- Syscall inputs/outputs, including network I/O boundaries and file offsets.
- Thread scheduling and synchronization events (mutex lock/unlock points) or a schedule log to replay the same interleaving.
- Program state anchors:
- A snapshot of memory/registers at key points (e.g., onset of crash, watchpoint trigger) or a checkpoint from which forward/backward execution is faithful.
- Symbolized call stacks, code versions/build IDs, and module load addresses.
- Causality context:
- Request/trace IDs across layers, cgroup/container identity, kernel version, CPU core, and NUMA node.
With that, you can drive either:
- A full record/replay engine (rr-style) to step exactly through the same execution.
- A partial replay harness that feeds the same syscalls and inputs to a recompiled binary, combined with time-travel debugging over a snapshot.
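As a toy illustration of the idea, here is a record/replay of two nondeterminism sources (time and randomness) in Python. The `Recorder`/`Replayer` names are illustrative, not part of any tool discussed here; the point is that once every nondeterministic read is logged, the same execution can be driven again from the log alone:

```python
import random
import time

class Recorder:
    """Wraps nondeterministic reads and logs their values in order."""
    def __init__(self):
        self.log = []

    def now(self):
        t = time.time()
        self.log.append(("time", t))
        return t

    def rand_bytes(self, n):
        b = random.getrandbits(8 * n).to_bytes(n, "little")
        self.log.append(("rand", b))
        return b

class Replayer:
    """Feeds the recorded values back in exactly the same order."""
    def __init__(self, log):
        self.queue = list(log)

    def now(self):
        kind, val = self.queue.pop(0)
        assert kind == "time"
        return val

    def rand_bytes(self, n):
        kind, val = self.queue.pop(0)
        assert kind == "rand" and len(val) == n
        return val

def handler(env):
    # A "request handler" whose output depends on time and randomness.
    return f"{env.now():.6f}-{env.rand_bytes(4).hex()}"

rec = Recorder()
original = handler(rec)
replayed = handler(Replayer(rec.log))
assert original == replayed  # identical execution, driven from the log
```

Real replayers do the same thing at the syscall boundary instead of a wrapper API, but the shape of the contract is identical.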
eBPF: Capture Without Code Changes
Extended BPF (eBPF) runs verified programs in the kernel, making it possible to attach to kernel or user-space events (kprobes, tracepoints, uprobes, USDT) with low overhead and strong safety guarantees.
What you should capture:
- Syscalls: open/close, read/write, send/recv, connect/accept, mmap/munmap, futex, clone, execve, timerfd, epoll.
- Network boundaries: per send/recv, TCP retransmits, TLS termination handoffs (kTLS), socket addresses, sizes.
- File and storage: offsets and lengths, not payloads by default. Capture content selectively for files known to carry structured configs.
- Threading and scheduling: futex wait/wake, context switches (sched:sched_switch), CPU core, runnable queue length.
- Time and randomness: gettimeofday/clock_gettime, vDSO calls, getrandom.
- User-level events: uprobes on functions of interest; USDT probes in runtimes (Go, Java JVMTI/JFR, Python C-API hooks, Node’s USDT probes).
Key properties:
- Selectivity: Attach to the code you care about (using build IDs and symbol names), and gate capture with per-request budget and sampling.
- Low overhead: Modern ring buffers, CO-RE (Compile Once, Run Everywhere) BPF, and per-CPU maps typically keep overhead in the low single digits of CPU percent for selective tracing.
- Safety: BPF verifier prevents unsafe memory access; your code should include tight bounds and minimal per-event work.
eBPF Skeleton Example (libbpf + CO-RE)
The following illustration shows a BPF program that captures syscall boundaries and a user-level probe on a Go function, with inline redaction of obvious secrets. It uses a ring buffer to stream events to user space.
```c
// trace.bpf.c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>

struct event {
    u64 ts;
    u32 pid;
    u32 tid;
    u32 type;        // 1=sys_enter, 2=sys_exit, 3=uprobe
    u32 sys_nr;
    u64 arg0;
    u64 arg1;
    u64 arg2;
    u64 ret;         // for exit
    u64 req_id_hash; // request id, hashed
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);
} rb SEC(".maps");

// Simple FNV-1a 64-bit hash for request IDs
static __always_inline u64 fnv1a64(u64 x)
{
    u64 h = 1469598103934665603ULL;
#pragma unroll
    for (int i = 0; i < 8; i++) {
        u8 b = (x >> (i * 8)) & 0xff;
        h ^= b;
        h *= 1099511628211ULL;
    }
    return h;
}

// Redaction helper: never emit raw pointers/payloads; keep only lengths
static __always_inline void redact(u64 *a0, u64 *a1)
{
    // Example: arg0 points to a user buffer and arg1 is its length; drop the pointer
    *a0 = 0;
    (void)a1;
}

SEC("tracepoint/raw_syscalls/sys_enter")
int on_sys_enter(struct trace_event_raw_sys_enter *ctx)
{
    struct event *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
        return 0;
    e->ts = bpf_ktime_get_ns();
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->tid = (u32)bpf_get_current_pid_tgid();
    e->type = 1;
    e->sys_nr = ctx->id;
    e->arg0 = ctx->args[0];
    e->arg1 = ctx->args[1];
    e->arg2 = ctx->args[2];
    // Example inline redaction for write/send payloads
    if (ctx->id == __NR_write || ctx->id == __NR_sendto || ctx->id == __NR_sendmsg)
        redact(&e->arg0, &e->arg1);
    // Derive a per-request hash from task context (e.g., cgroup id or TLS);
    // placeholder: use tid for the demo
    e->req_id_hash = fnv1a64(e->tid);
    bpf_ringbuf_submit(e, 0);
    return 0;
}

SEC("tracepoint/raw_syscalls/sys_exit")
int on_sys_exit(struct trace_event_raw_sys_exit *ctx)
{
    struct event *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
        return 0;
    e->ts = bpf_ktime_get_ns();
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->tid = (u32)bpf_get_current_pid_tgid();
    e->type = 2;
    e->sys_nr = ctx->id;
    e->ret = ctx->ret;
    e->req_id_hash = fnv1a64(e->tid);
    bpf_ringbuf_submit(e, 0);
    return 0;
}

// Uprobe on a user function, resolved via ELF symbol + build-id in user space
SEC("uprobe/target_fn")
int BPF_KPROBE(on_target_fn)
{
    struct event *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
        return 0;
    e->ts = bpf_ktime_get_ns();
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->tid = (u32)bpf_get_current_pid_tgid();
    e->type = 3;
    e->sys_nr = 0;
    e->req_id_hash = fnv1a64(e->tid);
    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "Dual BSD/GPL";
```
The user-space loader correlates ring-buffer events with per-request IDs (e.g., from HTTP headers or traceparent), enriches with symbolization, and applies deeper redaction policies before persistence.
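As a sketch of that correlation step, grouping ring-buffer events into per-request timelines can look like the following; the `correlate` helper and field names are illustrative, matching the event schema used elsewhere in this article:

```python
from collections import defaultdict

def correlate(events):
    """Group raw ring-buffer events into per-request timelines by hashed request ID."""
    timelines = defaultdict(list)
    for ev in events:
        timelines[ev["req_id_hash"]].append(ev)
    for tl in timelines.values():
        # Per-CPU ring buffers arrive unordered relative to each other; merge by time.
        tl.sort(key=lambda e: e["ts_ns"])
    return dict(timelines)
```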
bpftrace One-Liners for Incident Response
For quick one-off captures during an incident, bpftrace is a great complement:
```bash
# Timestamp context switches, print Go GC mark-assist USDT hits,
# and count reads per thread for one service
bpftrace -e '
tracepoint:sched:sched_switch { @ts[pid, tid] = nsecs; }
usdt:/path/to/binary:go:gc:markassist { printf("markassist %d %d\n", pid, tid); }
kprobe:__x64_sys_read /comm == "myservice"/ { @reads[tid] = count(); }
'
```
Snapshots and Time Travel
eBPF captures causality at the boundary; snapshots give you stateful checkpoints to time-travel.
Options for snapshots and replay:
- Process-level checkpoint/restore: CRIU can checkpoint a process tree, sockets, and memory mappings. It is finicky but effective for some workloads. Useful for pausing near the failure and cloning the environment for replay.
- Coredumps plus symbolized unwind tables: capture core on failure and combine with eBPF timeline to align where a crash occurs and what inputs preceded it.
- VM-level snapshots: With KVM/QEMU or cloud hypervisor snapshots, you can freeze the entire machine state and attach Intel PT/ARM CoreSight to capture control flow. Overkill for most microservices, invaluable for kernel or perf-sensitive debugging.
- Record/replay debuggers:
- rr (Mozilla): record/replay with schedule logs; great fidelity, but it serializes execution onto a single core, so overhead can be substantial for parallel workloads, and it doesn’t cover all syscalls/hardware features.
- Undo/LiveRecorder or UDB: commercial time-travel solutions with multicore support.
- Pernosco (on top of rr): cloud time-travel UI for deep root cause analysis.
A pragmatic approach: default to eBPF event streams with small “micro-snapshots” (e.g., perf-based user stacks, heap samples, and register windows around faults), and escalate to CRIU/VM snapshots only when necessary.
Redaction and Privacy: Do It On-Host
If you’re going to capture network and syscall edges, you must minimize risk. Principles:
- Redact at source. eBPF code and user-space agent enforce rules before persistence.
- Default: never capture full user payloads. Capture lengths, offsets, hashes, and protocol metadata. Opt-in structured payload capture for specific content-types with field-level redaction (e.g., mask emails, tokens, card PANs).
- Hash or tokenize identifiers in a keyed, rotating scheme to enable correlation without reversibility. Prefer HMAC-SHA256 with daily keys stored in KMS; rotate aggressively.
- Secret detection: use on-host detectors (entropy checks, token patterns, Bloom filters of known key prefixes) to spot high-entropy blobs, bearer tokens, or JWTs, and replace them with tokens.
- DSAR/compliance support: keep per-tenant encryption keys to selectively purge or decrypt.
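The high-entropy detection mentioned above can be as simple as a Shannon-entropy threshold over token-shaped substrings. This sketch is illustrative; the regex and threshold would need tuning against real traffic before deployment:

```python
import math
import re

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of s."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

# Candidate token shapes: long runs of base64/URL-safe characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_\-\.=]{20,}")

def redact_secrets(text: str, threshold: float = 4.0) -> str:
    """Replace high-entropy substrings (likely keys/JWTs) with a placeholder."""
    def repl(m):
        return "[REDACTED]" if shannon_entropy(m.group()) >= threshold else m.group()
    return TOKEN_RE.sub(repl, text)
```

Low-entropy strings (repeated padding, plain words) pass through; dense random material gets tokenized before it leaves the host.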
Example redaction pipeline (user space):
```python
# redactor.py
import hmac
import hashlib
import time

ROTATION_PERIOD = 24 * 3600  # rotate HMAC keys daily

class Hasher:
    def __init__(self, kms):
        self.kms = kms
        self.load_key()

    def load_key(self):
        epoch = int(time.time()) // ROTATION_PERIOD
        self.key = self.kms.get_key(f"trace-hmac-{epoch}")

    def hmac_id(self, val: bytes) -> str:
        return hmac.new(self.key, val, hashlib.sha256).hexdigest()

# Example rule set
RULES = [
    {"field": "http.request.headers.authorization", "action": "drop"},
    {"field": "http.request.cookies", "action": "hash"},
    {"field": "db.query.params", "action": "mask"},
]
```
Architecture: From Host to AI
A reference architecture to operationalize trace-driven debugging:
- On-host tracer (Daemonset/Agent):
- eBPF programs attach to selected tracepoints, kprobes, and uprobes.
- User-space agent pulls from ring buffers, resolves symbols (build-id + debuginfo), correlates to request IDs, applies redaction, buffers locally with backpressure.
- Implements per-request budgets (event and byte caps), sampling, and kill-switch.
- Relay and storage:
- Streams structured events over gRPC with mTLS to a collector.
- Writes to columnar storage (Parquet) partitioned by service, build-id, and time; cold storage in object store with lifecycle policies.
- Builds secondary indexes (trace-id, PID/TID, syscall range, error codes).
- Replayer and sandbox:
- Converts event streams into a replay harness:
- Stub time and random.
- Feed recorded syscalls via seccomp user-notify or LD_PRELOAD interposition.
- Inject network payload boundaries with recorded timings.
- Apply scheduling decisions (e.g., recorded futex order) using a scheduler shim.
- Optional: CRIU/VM snapshot restore for deep fidelity sessions.
- AI orchestration:
- Summarizes trace into a model-consumable timeline with call stacks, diffs, and memory maps.
- Prompts a debugging model with: failing timeline, symbolized code, known bugs, and constraints.
- Generates patch candidates and unit tests derived from the trace; verifies by replaying in CI.
Data model for events:
- Common fields: ts_ns, pid, tid, cpu, service, version/build-id, container/cgroup, trace_id_hash.
- Event types: syscall_enter/exit, net_tx/rx, uprobe/uretprobe, sched_switch, futex_wait/wake, fault/signal, alloc/free sample, gc events.
- Payloads are schematic (sizes, offsets, hashes) with content redaction metadata.
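As a sketch, the common fields above map naturally onto a compact record type; the exact types and enum values here are illustrative:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class EventType(IntEnum):
    SYSCALL_ENTER = 1
    SYSCALL_EXIT = 2
    NET_TX = 3
    NET_RX = 4
    UPROBE = 5
    SCHED_SWITCH = 6
    FUTEX_WAIT = 7
    FUTEX_WAKE = 8

@dataclass(frozen=True)
class TraceEvent:
    ts_ns: int
    pid: int
    tid: int
    cpu: int
    service: str
    build_id: str
    cgroup: str
    trace_id_hash: str
    type: EventType
    # Payloads are schematic: sizes/offsets/hashes, never raw bytes.
    size: Optional[int] = None
    offset: Optional[int] = None
    payload_hash: Optional[str] = None
    redacted: bool = True  # deny-by-default: content is redacted unless proven safe
```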
Language Runtime Notes
- Go:
- Use USDT probes or uprobes on runtime functions (e.g., net/http Serve, runtime.casgstatus, runtime.schedule) to derive request boundaries and contention hot spots.
- Go’s goroutine scheduler introduces nondeterminism; record futex interactions and poller wakeups. Replay by serializing scheduler decisions for the failing request.
- Java:
- JVMTI/JFR events can provide safepoints, GC cycles, class loading. Combine with uprobes on JNI bridges and syscalls.
- For request correlation, derive trace IDs in a Servlet filter (or equivalent middleware) and propagate them to the agent.
- Node.js:
- USDT probes exist for HTTP parser, GC, and event loop. Syscalls capture covers the rest.
- Time and randomness often come via V8; intercept both or force seeds during replay.
- CPython:
- C-API tracing and a uprobe on _PyEval_EvalFrameDefault (CPython ≥ 3.6) for hotspots; watch GIL contention via futex.
Deterministic Replay Techniques
The aim is to eliminate nondeterminism in the target execution:
- Time and randomness: LD_PRELOAD interceptors for clock_gettime/getrandom or seccomp user-notify to broker syscalls back to the replayer with recorded return values.
- Syscalls: Provide recorded return values and errno; for reads, deliver recorded bytes; for writes, discard or compare.
- Scheduling:
- Record futex wait/wake events with causal order; during replay, control thread run order to reproduce interleavings.
- Pin threads to cores to avoid cross-core timing drift.
- Signals and faults: Deliver signals with recorded timing; on page faults, seed memory from snapshot or recorded content.
- I/O: For sockets, reconstruct stream boundaries. If TLS is in-app, record ciphertext or plaintext at the boundary where you can reproduce behavior without leaking secrets.
This can be done in user space for most services. Use heavier-weight rr/Undo when necessary.
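To make the scheduling-control idea concrete, here is a minimal scheduler shim in Python: each thread blocks at every preemption point until the recorded interleaving says it may run. The `ReplayScheduler` name and API are illustrative; a real shim would hook futex or runtime yield points rather than explicit calls:

```python
import threading

class ReplayScheduler:
    """Serializes threads through a recorded sequence of turns."""
    def __init__(self, schedule):
        self.schedule = list(schedule)  # recorded interleaving, e.g. ["A", "B", "B", "A"]
        self.idx = 0
        self.cond = threading.Condition()

    def run_turn(self, name, action):
        """Block until the schedule says it's `name`'s turn, then run `action`."""
        with self.cond:
            while self.schedule[self.idx] != name:
                self.cond.wait()
            action()  # the step runs while holding the turn
            self.idx += 1
            self.cond.notify_all()

# Reproduce one specific interleaving of two workers appending to a list.
order = []
sched = ReplayScheduler(["A", "B", "B", "A"])

def worker(name, steps):
    for _ in range(steps):
        sched.run_turn(name, lambda: order.append(name))

t1 = threading.Thread(target=worker, args=("A", 2))
t2 = threading.Thread(target=worker, args=("B", 2))
t1.start(); t2.start(); t1.join(); t2.join()
assert order == ["A", "B", "B", "A"]  # exactly the recorded interleaving
```

Run the same program with a different recorded schedule and you reproduce a different interleaving, which is precisely what you need to replay a race deterministically.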
Example: A Flaky Use-After-Free in a Go Service
Symptoms: Under load, a Go microservice occasionally crashes with SIGSEGV in a hot path. Staging never reproduces it.
Approach:
- Enable eBPF capture on the service with:
- syscalls (read, write, epoll_wait, futex), sched_switch, and uprobes on the suspected function and runtime.casgstatus.
- Redaction rules set to drop payloads; capture sizes only.
- Per-request budget: 5,000 events, sampling at 1% of requests.
- A crash occurs. The agent takes a minidump (core) plus the last 200 ms of trace for the crashing goroutine and related futex interactions.
- The replayer sets up a sandbox:
- Stubs time/random.
- Replays recorded futex ordering to reproduce the race.
- Uses the core to prime memory maps; starts from a checkpoint just prior to the failure.
- Time-travel debugging shows a map pointer freed in a different goroutine due to a missed reference in a rarely-taken error branch.
- AI assistant reads the trace + code, proposes a fix: move ref decrement after all early returns, add test harness that drives the same interleaving via a scheduler stub.
- CI runs the replayed scenario on the patch. It no longer crashes; the test becomes part of the suite. Bug class closed.
Total impact: No ad-hoc logging, minimal extra prod overhead, deterministic root cause, repeatable test.
Performance, Overhead, and Backpressure
Targets:
- On-host overhead under typical selective capture: 1–3% CPU for the instrumented service host, more if you add heavy user stack unwinding or capture payload bytes.
- Avoid perf cliffs:
- Use ring buffers instead of perf buffers where possible.
- Pre-allocate and reuse user-space buffers.
- Limit stack unwinding frequency; sample stacks, don’t capture on every event.
- Gate heavy uprobes with per-request event quotas.
- Backpressure: When the collector is slow or the host is under pressure, drop low-value events first, keep causality (e.g., keep syscall order/retvals, drop verbose net events). Maintain counters for dropped events in the trace header for honesty.
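The drop-lowest-value-first policy can be sketched as a bounded buffer with per-class priorities; the class names, capacities, and eviction rule below are illustrative:

```python
from collections import deque

# Priority classes: lower number = more valuable. Causality-bearing events survive longest.
PRIORITY = {"syscall": 0, "sched": 1, "net_verbose": 2}

class BackpressureBuffer:
    """Bounded event buffer that sheds the least valuable events first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queues = {cls: deque() for cls in PRIORITY}
        self.dropped = {cls: 0 for cls in PRIORITY}  # honesty counters for the trace header

    def __len__(self):
        return sum(len(q) for q in self.queues.values())

    def push(self, cls, event):
        if len(self) < self.capacity:
            self.queues[cls].append(event)
            return
        # Full: find the least valuable class currently buffered.
        victim = max((c for c in PRIORITY if self.queues[c]), key=PRIORITY.get)
        if PRIORITY[cls] >= PRIORITY[victim]:
            self.dropped[cls] += 1        # incoming event is no more valuable: drop it
            return
        self.queues[victim].popleft()     # evict oldest low-value event, admit incoming
        self.dropped[victim] += 1
        self.queues[cls].append(event)
```

The drop counters travel with the trace so consumers know exactly what was lost and why.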
Pitfalls and How to Avoid Them
- Ring buffer overruns: Use larger per-CPU buffers, backpressure, and priority channels. Drop non-essential events first.
- Symbol resolution drift: Always attach uprobes using build IDs, not file paths, and collect build ID to symbol map alongside traces.
- Kernel/version divergence: Use CO-RE and BTF to keep programs compatible across kernels. Maintain a tiny compatibility test suite in CI for your kernels.
- Container boundaries: Attribute events to the right container/cgroup; embed cgroup id in event headers; map to Kubernetes metadata on the collector.
- JIT and managed languages: JITed code addresses change; rely on runtime probes (USDT/JFR) and function boundaries that the runtime exports.
- Privacy violations: Redaction must be enforced on-host with deny-by-default policies, and configuration must be code-reviewed. Maintain a policy pack for common PII patterns.
- RR/Undo overhead: Use rr sparingly and primarily in deep-dive sessions. Default to lightweight replayers for the common case.
- Clock domain confusion: Use monotonic clocks (ktime), not wall time, in traces. Record NTP adjustments if wall time matters.
- Multithreaded replay fidelity: Reproducing scheduling requires capturing futex orders and occasionally blocking syscalls to match causal order. Start with single-request determinism; widen scope as needed.
Rollout Plan: From Pilot to Default-On
- Pilot in non-critical services:
- Enable syscall, sched_switch, and a couple of language-specific probes.
- Measure overhead at 0%, 0.1%, 1% sampling. Record event-loss metrics.
- Define SLAs and budgets:
- Per-node CPU budget (e.g., <2% additional CPU), per-request event cap, max trace size.
- Kill-switch and dynamic sampling via feature flags.
- Integrate redaction policies:
- Write tests that feed sensitive payloads and assert they are dropped/masked.
- Validate that only hashed IDs traverse the wire.
- Storage and retention:
- Partition and lifecycle rules; 7 days hot (indexed Parquet), 90 days cold (compressed object store), encrypted per tenant.
- Metadata index with per-trace checksum and event loss counters.
- Replay service in CI:
- Convert traces into deterministic tests; run against mainline nightly.
- Gate merges for flaky tests by requiring deterministic pass under replay.
- AI assistant gated rollout:
- Start with summarization and root cause hints only.
- Allow suggested patches behind a review workflow that includes replay tests.
- Expand coverage:
- Add user-level probes where you see high-value blind spots.
- Increase sampling during incidents automatically (alert-driven burst sampling) with budget caps.
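Alert-driven burst sampling under a hard budget can be modeled as a token bucket: the refill rate rises during an incident, but bucket capacity still caps the burst. The rates and API here are illustrative:

```python
class BurstSampler:
    """Token-bucket sampler: steady baseline rate, hard-capped bursts during incidents."""
    def __init__(self, baseline_per_s: float, burst_capacity: float):
        self.baseline = baseline_per_s
        self.rate = baseline_per_s
        self.capacity = burst_capacity
        self.tokens = burst_capacity
        self.last_ts = 0.0

    def set_incident_mode(self, enabled: bool, incident_per_s: float = 100.0):
        # Alert-driven: raise the refill rate; capacity still caps any burst.
        self.rate = incident_per_s if enabled else self.baseline

    def should_sample(self, now_s: float) -> bool:
        elapsed = max(0.0, now_s - self.last_ts)
        self.last_ts = now_s
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```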
Feeding Traces to AI—Effectively
Large language models excel when the input includes:
- A compact, structured narrative:
- Ordered list of events leading to failure with timestamps and causal links.
- Symbolized stacks with source file/line and commit hashes.
- Environment diffs vs. staging.
- Problem focus:
- State invariants that were violated (e.g., refcount negative, double free, unexpected errno on non-blocking I/O).
- Observed data races and lock orders from the trace.
- Constraints:
- Non-functional requirements (latency budget, memory footprint).
- Redaction rules (never suggest logging secrets), and safe patch scopes.
Workflow:
- Summarizer reduces a 50k-event trace to a 1–2k token narrative with key spans flagged.
- The debugger LLM consumes the summary + code context and suggests hypotheses and probes to add.
- Generate a patch and a trace-derived unit/integration test that recreates the failure.
- Run replay in CI; if green, proceed to human review.
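The summarization step can start from simple heuristics: collapse runs of repetitive events, always surface anomalies. This sketch is illustrative, not a production summarizer; the event dicts follow the schema used elsewhere in this article:

```python
def summarize(events, keep_types=("fault", "signal", "uprobe"), max_lines=20):
    """Collapse runs of identical event types; anomalies and errors always surface."""
    lines, run_type, run_count = [], None, 0

    def flush():
        if run_type is not None and run_count > 0:
            lines.append(f"{run_count}x {run_type}")

    for ev in events:
        t = ev["type"]
        if t in keep_types or ev.get("ret", 0) < 0:  # errors always surface
            flush()
            run_type, run_count = None, 0
            lines.append(f"ts={ev['ts_ns']} {t} {ev.get('detail', '')}".rstrip())
        elif t == run_type:
            run_count += 1
        else:
            flush()
            run_type, run_count = t, 1
    flush()
    return lines[:max_lines]
```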
Guardrails:
- Never grant write access to prod via AI; all actions go through CI/CD and replay tests.
- Keep model prompts and outputs as artifacts for auditability.
Storage, Indexing, and Cost Control
- Columnar storage (Parquet) yields high compression and efficient slicing by time/PID/trace-id.
- Use zstd compression with dictionaries learned from your schemas.
- Maintain a tiny hot index (RocksDB/ClickHouse) mapping trace-id -> object offsets.
- Compact event payloads: prefer enums over strings, delta-encode timestamps, bit-pack flags.
- Retention tiers: hot (indexed 7–14 days), warm (object store with partial index), cold (tape or glacier). Keep a one-click rehydrate path.
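Delta-encoding timestamps, as suggested above, means storing the first value plus small differences, which pack and compress far better than raw 64-bit nanosecond values. A minimal sketch:

```python
def delta_encode(ts_list):
    """[t0, t1, t2, ...] -> [t0, t1-t0, t2-t1, ...]"""
    out, prev = [], 0
    for t in ts_list:
        out.append(t - prev)
        prev = t
    return out

def delta_decode(deltas):
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

ts = [1_700_000_000_000, 1_700_000_000_120, 1_700_000_000_125]
enc = delta_encode(ts)
assert enc[1:] == [120, 5]      # small ints varint-pack and compress well
assert delta_decode(enc) == ts  # lossless round trip
```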
Tooling You Can Leverage Today
- eBPF-based:
- Cilium Tetragon: security and tracing with policies; good base for syscall/futex visibility.
- Pixie: high-level application visibility; can guide where to add deeper probes.
- Parca/Parca Agent: continuous profiling; integrate profiles with traces.
- bpftrace: incident response and prototyping.
- Falco and eCapture: security-oriented syscall and TLS capture hooks; they show practical patterns for filtering.
- Record/replay:
- rr + Pernosco: great for deep-dive, single-process, high-fidelity analysis.
- Undo LiveRecorder/UDB: commercial time-travel with multicore support.
- CRIU: process snapshots; requires kernel support and careful testing.
- Debugging and symbolization:
- elfutils, libdw, BTF, build-id machinery; store debuginfo artifacts per build.
Security and Compliance Considerations
- Threat model: treat trace data as sensitive, even if redacted. Encrypt in transit (mTLS) and at rest (per-tenant keys). Guard access with short-lived credentials and immutable audit logs.
- Data minimization: deny-by-default capture; explicit allowlists for structured fields; never capture full payloads by default.
- DSAR and right-to-be-forgotten: per-tenant keys allow cryptographic erasure.
- Policy as code: version control redaction and probe policies; include automated tests and approvals.
The Future: Hardware-Assisted Determinism
- Intel Processor Trace (PT) and ARM CoreSight provide low-overhead control-flow traces; combine with eBPF for data edges and with VM snapshots for surgical reproductions.
- kTLS and NIC offloads: capture at the right abstraction boundary to reflect what the app sees (plaintext) without holding secrets longer than needed.
- Portable WASI sandboxes for partial replays of components.
Conclusion: Make Reproduction Boring
The path out of “can’t repro” is to stop guessing and start recording. eBPF gives you safe, low-overhead visibility without source changes; record/replay turns one-off failures into deterministic harnesses; on-host redaction keeps data safe; and an AI loop built on top can propose and verify fixes quickly.
This isn’t magic. It’s careful engineering of capture, minimization, and deterministic replay—with the bonus that once you have the traces, tools and people get smarter together. The result: fewer log-spam redeploys, faster root cause analysis, and bugs that stay fixed because they’re backed by tests that recreate reality.
Appendix: Event Schema Sketch
```json
{
  "event_version": 1,
  "trace_id_hash": "e9b1...",
  "service": "catalog",
  "build_id": "ab12cd34",
  "container": "k8s://ns/pod",
  "events": [
    {"ts_ns": 1234567890, "type": "sys_enter", "nr": 0, "pid": 123, "tid": 123, "args": [3, 0, 0], "redacted": true},
    {"ts_ns": 1234567990, "type": "futex_wait", "addr": "0x7f...", "val": 2},
    {"ts_ns": 1234568990, "type": "uprobe", "symbol": "service.Handle", "file": "handler.go", "line": 214},
    {"ts_ns": 1234570990, "type": "fault", "signal": "SIGSEGV", "rip": "0x55...", "stack": ["..."], "core_id": "s3://.../core.gz"}
  ],
  "drops": {"events": 42, "reason": "backpressure"},
  "redaction_policy": "v3-hmac-rotating"
}
```
Appendix: Minimal LD_PRELOAD Shim for Time/Random
```c
// shim.c - preload into the replay sandbox
#define _GNU_SOURCE
#include <dlfcn.h>
#include <time.h>
#include <sys/random.h>
#include <stdint.h>
#include <stddef.h>

static ssize_t (*real_getrandom)(void *, size_t, unsigned int) = 0;
static int (*real_clock_gettime)(clockid_t, struct timespec *) = 0;

__attribute__((constructor))
static void init(void)
{
    real_getrandom = dlsym(RTLD_NEXT, "getrandom");
    real_clock_gettime = dlsym(RTLD_NEXT, "clock_gettime");
}

static ssize_t fetch_recorded_bytes(void *buf, size_t len)
{
    // Read from a FIFO/socket prepared by the replayer
    // ... omitted for brevity ...
    return (ssize_t)len;
}

ssize_t getrandom(void *buf, size_t buflen, unsigned int flags)
{
    (void)flags;
    return fetch_recorded_bytes(buf, buflen);
}

int clock_gettime(clockid_t clk_id, struct timespec *tp)
{
    // Read the recorded timespec from the replayer
    // ... omitted ...
    (void)clk_id;
    tp->tv_sec = 1700000000;
    tp->tv_nsec = 123456789;
    return 0;
}
```
With these building blocks, you can construct a deterministic, privacy-preserving, and AI-friendly debugging surface that turns elusive production glitches into repeatable, testable fixes.
