Beads Memory for AI Coding Agents: An Architecture that Automates PM in Developer Workflows
TL;DR: A small, git-backed issue tracker with explicit dependency edges beats markdown plans, vector-only memory, and brittle prompt chaining for long-horizon AI coding. Beads turns project management duties into a first-class, queryable memory substrate. Agents stop forgetting, stop hand-waving, and start shipping.
- What it is: A tiny, repo-local, git-backed database represented as JSONL with a CLI called `bd`.
- What it solves: LLM session amnesia, multi-agent coordination, and perma-lost TODOs discovered during execution.
- Why it works: A temporal dependency graph (“beads on a chain”) gives agents a stable, structured, long-horizon memory with query semantics.
- Why it matters: It automates the PM loop—triage, scoping, prioritization, handoffs—directly from inside the developer workflow.
This article explains the architecture, the design trade-offs, and why the beads-style, temporal-graph memory model feels like the missing primitive for agentic coding. It also includes practical snippets to integrate Beads with your agents today.
The Real Problem: Agents Don’t Plan Across Time
If you rely on Claude Code, Sourcegraph Amp, OpenAI’s assistants, MCP tool stacks, or homegrown dev agents, you’ve seen it:
- Great at single-session sprints, shaky at multi-day, multi-phase delivery.
- Markdown plans proliferate into six-deep Russian dolls, then rot.
- After compaction/restart, agents re-discover the same tasks and declare victory at phase 3-of-6 because that’s all they can “see.”
In practice, an agent’s memory is what’s on disk plus whatever fits in context right now. The moment you need nested workstreams, stacked blockers, or feature-to-bug-to-cleanup detours, the markdown-plan approach collapses. It’s the Memento problem: every morning the plan is new again.
The experiment that triggered Beads was simple: move the plan into an issue tracker and give agents a way to query “ready work.” Within minutes, the behavior shifted from meandering to disciplined: compute the ready set, pick a task, work it, record discovered work, repeat. No hero prompts, no brittle chains.
What Went Wrong With “Master Plans” and Heavy Orchestrators
Two instructive dead ends are worth calling out:
- Heavy orchestration for desktop dev tools. Systems like Temporal are remarkable for large-scale workflows, but for single-developer desktops or small swarms, they impose weight, operational surface area, and cognitive tax that dwarf the benefits. The orchestration became the product.
- Markdown master plans. A beautiful idea: hierarchical files under git, with agents expanding and updating as they go. In practice, the plan multiplied, fractured, conflicted, and became unqueryable. Agents cannot reliably interpret free-form text to compute a global dependency graph or an actionable queue. The plan turned into write-only memory.
The insight behind Beads is not “let’s do Jira.” It’s: modeling work as a temporal dependency graph inside the repo is the simplest possible memory construct that aligns with how LLMs actually operate.
Beads, Defined
Beads is a minimal issue tracker designed for agents:
- Storage: a JSONL file (or small set of files) in your repo, versioned by git. Each issue is an object, each update appends an event. Think of it as a tiny, versioned, append-only log.
- Schema: issues have IDs, titles, status, priority, labels, parent/child relationships (epics), and explicit dependency edges: blocks/blocked_by, and a crucial discovered_from edge.
- CLI: `bd` provides discovery, triage, linking, status changes, and queries in both human and JSON output modes.
- Distribution: it’s naturally distributed via git. Multiple agents can coordinate across machines/repos without a centralized server.
This gives agents four primitives they don’t get from markdown:
- a) Explicit, queryable dependencies,
- b) Ready-set computation (what’s unblocked and actionable),
- c) Durable session continuity (no re-prompting the entire plan),
- d) An audit trail aligned with the code’s version history.
Why “Temporal Graph” Memory?
Beads is a temporal graph because it encodes not just structural dependencies (A blocks B) but the causal sequence by which work was discovered and executed. The discovered_from relations let agents reconstruct how the work unfolded over time and preserve context that would otherwise be lost. This becomes a living narrative the agents can query, not a brittle plan that must be reread and reinterpreted.
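For illustration, reconstructing a `discovered_from` chain is only a few lines. This sketch assumes issue records are plain dicts carrying an optional `discovered_from` field; the field name matches the schema described here, but the record shape is hypothetical, not the bd internals:

```python
def discovery_chain(issues, issue_id):
    """Return the causal chain root -> ... -> issue_id via discovered_from links."""
    chain = []
    current = issue_id
    while current is not None:
        chain.append(current)
        current = issues[current].get("discovered_from")
    return list(reversed(chain))

# Illustrative data: bd-142 was discovered while working bd-90, and so on.
issues = {
    "bd-90": {},
    "bd-142": {"discovered_from": "bd-90"},
    "bd-200": {"discovered_from": "bd-142"},
}
```

Querying the chain for `bd-200` yields the full narrative of how that work came to exist, which is exactly the context a fresh session needs.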
In effect, Beads gives LLMs what they’re missing:
- Working memory across sessions (via queries),
- Long-term memory via versioned state in git,
- A programmatic plan that survives compaction and restarts.
Why It Outperforms Vector-Only Memory and Prompt Chaining
- Vector stores excel at information recall, not plan execution. Cosine similarity won’t tell you which task is unblocked or next-in-line, nor will it compute a topological order through dependencies.
- Prompt chaining treats the plan as ephemeral. Each link in the chain must be carefully managed. Drift accumulates and small missteps compound into dead plans or loops.
- Beads inverts this: the plan lives as data. Prompts are short and stable because agents query the plan instead of carrying it in context. The system becomes robust to resets and concurrent workers.
This doesn’t replace vectors. You still want RAG for code search/docs. But vectors store facts; Beads stores commitments. Facts help you decide what’s true; commitments help you decide what to do.
The Beads Schema (Practical Core)
A representative issue record might look like this:
```json
{
  "id": "bd-142",
  "title": "Refactor auth middleware to support service tokens",
  "status": "open",
  "priority": 0.72,
  "assignee": "agent/claude",
  "labels": ["auth", "refactor"],
  "parent_id": "bd-101",
  "blocks": ["bd-188"],
  "blocked_by": ["bd-91"],
  "discovered_from": "bd-90",
  "created_at": "2025-10-08T19:52:11Z",
  "updated_at": "2025-10-08T20:05:27Z",
  "events": [
    {"ts": "2025-10-08T19:52:11Z", "actor": "agent/claude", "type": "created"},
    {"ts": "2025-10-08T20:04:00Z", "actor": "agent/claude", "type": "linked", "edge": "blocked_by", "to": "bd-91"},
    {"ts": "2025-10-08T20:05:27Z", "actor": "agent/claude", "type": "status", "from": "open", "to": "in_progress"}
  ]
}
```
Key design choices:
- parent_id encodes epic/subtask structure without forcing a global tree rewrite.
- blocks/blocked_by edges represent ordering constraints.
- discovered_from preserves causal history: when an issue emerges while doing another, link it.
- events form an append-only audit trail for status and structure changes.
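Loading such a file is deliberately trivial. A minimal sketch, assuming one JSON object per line as in the record above (the real bd file layout may differ):

```python
import json

def load_issues(path):
    """Load a beads-style JSONL file into a dict keyed by issue id."""
    issues = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            rec = json.loads(line)
            issues[rec["id"]] = rec
    return issues
```

Because each issue is one line, git diffs and merges stay line-scoped and readable.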
The `bd` CLI exposes these as first-class operations. Even better, it prints machine-readable JSON so agents don’t need to parse prose.
The Ready Set: From Graph to Action
The engine behind Beads is trivial and powerful: compute the ready set from the dependency graph.
- A task is ready if status ∈ {open, todo} and it has no open blocked_by edges.
- Among ready tasks, prioritize by a scoring function (priority, recency, epic rank, discovered-from recency, assignee availability, etc.).
- Agents claim a task (status → in_progress), work it, and either:
- mark done and close, or
- discover more work, file it, link it, and possibly requeue the parent.
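The ready-set rule above can be sketched directly. A minimal illustration, assuming issues are plain dicts with the `status` and `blocked_by` fields from the schema example (the same semantics, not the bd internals):

```python
READY_STATUSES = {"open", "todo"}
OPEN_STATUSES = {"open", "todo", "in_progress"}

def ready_set(issues):
    """A task is ready if its status is open/todo and no blocker is still open."""
    ready = []
    for issue in issues.values():
        if issue["status"] not in READY_STATUSES:
            continue
        blockers = issue.get("blocked_by", [])
        if any(issues[b]["status"] in OPEN_STATUSES for b in blockers):
            continue
        ready.append(issue["id"])
    return ready
```

Note that the computation is local and cheap: no topological sort is needed just to answer "what can I start right now?"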
Simple pseudocode:
```python
# Pseudocode used by the agent at each session start
ready = bd.query_ready(json=True)  # returns list of tasks with metadata
work = select_best(ready)          # your scoring function
bd.update(work.id, status="in_progress", assignee=current_agent)
# ... implement change ...
if discovered:
    new_id = bd.new(title, labels=[...], discovered_from=work.id)
    bd.link(edge="blocked_by", src=work.id, dst=new_id)  # work now depends on the discovered task
bd.update(work.id, status="done")
```
This is why Beads feels like an external working memory. The agent no longer needs the entire plan in context. It queries the plan, takes a bead, and moves forward.
Installing the Habit: A Minimal Agent Integration
Add one instruction to your agent’s configuration (AGENTS.md or CLAUDE.md):
```md
- Always initialize and use Beads:
  1) Run: bd quickstart (once per repo)
  2) Start each session with: bd ready --json to get the next unblocked tasks
  3) When you discover new work, run: bd new and link it via discovered_from
  4) Update status as you go: open -> in_progress -> done
```
Give the agent permission to invoke these commands and interpret JSON output. That’s often enough to flip a project from “constant herding” to “self-propelling.”
Real CLI Examples
- Initialize:
```bash
bd quickstart
```
- Create an epic and a subtask:
```bash
bd new --title "Revamp test harness" --label testing --priority 0.9 --id bd-500
bd new --title "Migrate flaky e2e tests to Playwright" --parent bd-500 --label e2e
```
- Link and query ready work:
```bash
bd link --edge blocked_by --src bd-501 --dst bd-490  # bd-501 is blocked by bd-490
bd ready --json | jq '.[:5]'                         # top 5 ready items for the session
```
- Record discovery during execution:
```bash
bd new --title "Fix auth token refresh bug" --label bug --discovered-from bd-501
```
- Claim, work, complete:
```bash
bd update --id bd-501 --status in_progress --assignee agent/claude
# ... run changes, tests ...
bd update --id bd-501 --status done
```
Multi-Agent Coordination That Actually Works
Because the database is just git-tracked JSONL, concurrency looks like normal development:
- Each agent works on a branch; commits include both code and beads updates.
- If two agents create the same ID (rare with ULIDs/monotonic IDs), conflicts are resolved with a simple renumber and edge rewire (which an LLM can do reliably because the schema is explicit).
- Merge conflicts are line-level and semantic: the agent can reason about status transitions and choose the correct resolution.
To keep the system healthy:
- Use ULIDs for ids (or repo-prefixed IDs) to avoid collisions across repos.
- Add a TTL for in_progress; stale claims revert to open.
- Emit heartbeats as events; a pre-commit hook can warn on orphaned in_progress tasks.
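The TTL rule above can be sketched as a small sweep over in-progress tasks. Field names here (`claimed_at`, `assignee`) are illustrative, not part of the documented schema:

```python
import time

def revert_stale_claims(issues, ttl_seconds, now=None):
    """Revert in_progress tasks whose claim is older than ttl_seconds back to open."""
    now = time.time() if now is None else now
    reverted = []
    for issue in issues.values():
        if issue["status"] != "in_progress":
            continue
        if now - issue.get("claimed_at", now) > ttl_seconds:
            issue["status"] = "open"
            issue.pop("assignee", None)  # release the claim
            reverted.append(issue["id"])
    return reverted
```

Run this from a pre-commit hook or a periodic job; either way the claim lifecycle stays self-healing.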
Temporal-Graph Memory vs. GitHub Issues/Jira
Why not just use GitHub Issues or Jira?
- Latency and friction. Agents need to query and update at high frequency. Repo-local JSONL under git keeps the loop tight and offline-capable.
- Semantics. Beads bakes in edges (blocked_by, blocks, parent/child, discovered_from) and a JSON-first CLI for agents. No HTML forms, no rate limits, no formatting ambiguity.
- Versioned alongside code. It’s the same commit, same branch, same PR. The work memory is code-adjacent, not SaaS-adjacent.
This doesn’t preclude sync to cloud trackers for reporting. But the authoritative memory for agent planning should live where the agent lives: in the repo.
Automating PM, Not Just Planning
Beads quietly eats the PM loop:
- Discovery: agents create issues the moment they see them, linked via discovered_from to preserve context.
- Triage: simple rules (labels + priority + epic rank) drive the scoring function that selects the next bead.
- Scheduling: edges compute a ready set. No one needs to “remember” the Gantt chart.
- Handoffs: one agent closes a bead; the next bd ready picks the follow-on. Audit trails preserve who did what and when.
- Risk control: block merges on unresolved blockers; require all discovered-from chains to terminate cleanly for a feature to be “done.”
You can even wire CI to enforce this:
```bash
# In CI: block PR merge if any tasks discovered-from the PR's epic remain open
if bd query --epic "$EPIC" --open-only | grep -q "."; then
  echo "Open work remains; failing check."
  exit 1
fi
```
A Minimal Scheduling Algorithm That Works
You don’t need a fancy scheduler to get value. A pragmatic heuristic beats perfect optimization:
- Compute all ready tasks (no open blocked_by).
- Score = `w1*priority + w2*recent_discovery_bonus + w3*epic_rank + w4*(-age_penalty) + w5*label_fit(agent)`
- Pick the top N that match the agent’s capabilities; claim the first.
This keeps flow moving and surfaces fresh discoveries quickly while paying down old work in the background. It’s easy to tune, transparent to humans, and agents can explain the decision because it’s data-driven.
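A minimal version of this scorer, with illustrative feature names and default weights you would tune per repo:

```python
def score(task, agent_labels, weights=(1.0, 0.5, 0.3, 0.1, 0.4)):
    """Weighted sum over the heuristic's features; all fields default to 0."""
    w1, w2, w3, w4, w5 = weights
    label_fit = len(set(task.get("labels", [])) & agent_labels)
    return (
        w1 * task.get("priority", 0.0)
        + w2 * task.get("recent_discovery_bonus", 0.0)
        + w3 * task.get("epic_rank", 0.0)
        - w4 * task.get("age_days", 0.0)
        + w5 * label_fit
    )

def select_best(ready, agent_labels):
    """Claim candidate: the highest-scoring ready task for this agent."""
    return max(ready, key=lambda t: score(t, agent_labels))
```

Because the score is a transparent weighted sum, an agent (or a human reviewer) can explain exactly why a bead was picked.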
Cross-Repo Handoffs and Coordinated Swarms
For multi-repo work:
- Treat beads URIs as first-class: repo://org/name#bd-123.
- Allow edges across repos. A frontend bead can be blocked_by a backend bead in another repo.
- Sync via submodules or a shared beads registry repo (still JSONL + git), or mirror edges in each repo with canonical IDs.
In practice, even simple conventions work: prefix IDs with repo slug and keep the edges in the primary repo that “owns” the feature epic.
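Parsing the URI convention above takes one regex. The `repo://org/name#bd-123` scheme is this article's convention, not a bd built-in:

```python
import re

# Matches the article's cross-repo convention: repo://org/name#bd-123
URI_RE = re.compile(r"^repo://(?P<org>[^/]+)/(?P<name>[^#]+)#(?P<issue>bd-\d+)$")

def parse_bead_uri(uri):
    """Split a cross-repo bead URI into (org, repo, issue_id)."""
    m = URI_RE.match(uri)
    if not m:
        raise ValueError(f"not a bead URI: {uri}")
    return m.group("org"), m.group("name"), m.group("issue")
```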
From Vibe Coding to Sustainable Flow
The story behind Beads is instructive. Months of pushing markdown master plans gave way to 600+ decaying files and frequent agent amnesia. A short “what if we moved all known work to an issue tracker?” experiment immediately stabilized the loop. The switch from prose plans to an agent-native data model was the inflection.
Once agents operate on a temporal graph:
- They don’t quietly drop discovered work. It’s filed and linked.
- They don’t overclaim completion. Open blockers are visible.
- They don’t require babysitting between sessions. `bd ready --json` rehydrates context instantly.
Diagnostics and Metrics You Can Track Today
- Cycle time per bead: created → done
- Flow efficiency: active time / (active + blocked)
- WIP by epic or label: count of in_progress tasks
- Blocked ratio: blocked / total open
- Discovery rate: new issues per unit time; discovered_from chain lengths
- Ready queue depth: how much unblocked work exists right now
These become operational levers. For example, if discovery outruns closure, create a “stabilization” epic and bias the scoring function toward closing discovered-from chains.
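Cycle time, for example, falls straight out of the append-only events list shown in the schema example. A sketch, assuming the event shapes from that record:

```python
from datetime import datetime

def cycle_time_seconds(issue):
    """Seconds from the 'created' event to the status change to 'done', or None."""
    created = done = None
    for ev in issue["events"]:
        ts = datetime.fromisoformat(ev["ts"].replace("Z", "+00:00"))
        if ev["type"] == "created":
            created = ts
        elif ev["type"] == "status" and ev.get("to") == "done":
            done = ts
    if created is None or done is None:
        return None
    return (done - created).total_seconds()
```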
Guardrails and Pitfalls (and How to Handle Them)
- Destructive agent actions. Prevent accidental deletion of the beads file with filesystem permissions, `.gitattributes` locks, and pre-commit checks. Consider a cron job that snapshots beads separately.
- ID collisions. Use ULIDs or repo-prefixed IDs. On conflict, auto-renumber and rewrite edges in a single commit the agent can explain.
- Stale in_progress tasks. TTL with auto-revert to open; a background job can enforce this.
- Conflicting edits. Because events append, merging is usually safe. Teach the agent to resolve status transitions with a simple precedence lattice (done > in_progress > open, unless a blocker remains open).
- Privacy and secrets. Keep beads files free of secrets. Treat as code.
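The precedence lattice for merge resolution can be written down directly. A minimal sketch of the rule as stated: done > in_progress > open, demoted back to open if a blocker is still unresolved:

```python
PRECEDENCE = {"open": 0, "in_progress": 1, "done": 2}

def resolve_status(ours, theirs, has_open_blocker):
    """Pick the higher-precedence status, unless an open blocker vetoes 'done'."""
    winner = ours if PRECEDENCE[ours] >= PRECEDENCE[theirs] else theirs
    if winner == "done" and has_open_blocker:
        return "open"  # a task cannot be done while a blocker remains open
    return winner
```

An agent applying this during a merge can cite the lattice in its commit message, which keeps resolutions auditable.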
Suggested Importers and Interop
- Import TODO.md: parse headings as epics, bullets as tasks; infer blocked_by from “blocked on #ID” phrases; initialize priorities from tags.
- Mirror to GitHub/Jira: a one-way exporter for charts and stakeholder visibility.
- Backfill from commit history: heuristics to create beads discovered_from chains from PR descriptions and references.
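A TODO.md importer along these lines fits in a screenful. This sketch is hypothetical: headings become epics, bullets become tasks, and "blocked on bd-N" phrases become blocked_by edges (IDs are sequential here for simplicity; a real importer would use ULIDs):

```python
import re

def import_todo_md(text):
    """Turn a TODO.md into a flat list of bead-like dicts."""
    issues, next_id, current_epic = [], 1, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):  # heading -> epic
            current_epic = f"bd-{next_id}"
            issues.append({"id": current_epic, "title": line.lstrip("# "), "kind": "epic"})
            next_id += 1
        elif line.startswith("- "):  # bullet -> task under the current epic
            title = line[2:]
            blocked = re.findall(r"blocked on (bd-\d+)", title)
            issues.append({
                "id": f"bd-{next_id}",
                "title": title,
                "parent_id": current_epic,
                "blocked_by": blocked,
            })
            next_id += 1
    return issues
```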
Example: A Lightweight Agent Loop in Go
Below is a simplified loop that illustrates how an agent can operate with Beads as its external memory:
```go
// Pseudocode; assumes `bd` available on PATH
func nextReady() []Issue {
	out := run("bd", "ready", "--json")
	return parseIssues(out)
}

func claim(id, who string) {
	run("bd", "update", "--id", id, "--status", "in_progress", "--assignee", who)
}

func discovered(parentID, title string, labels []string) string {
	args := []string{"new", "--title", title, "--discovered-from", parentID}
	for _, l := range labels {
		args = append(args, "--label", l)
	}
	out := run("bd", args...)
	return parseNewID(out)
}

func complete(id string) {
	run("bd", "update", "--id", id, "--status", "done")
}

func main() {
	who := os.Getenv("AGENT_NAME")
	ready := nextReady()
	if len(ready) == 0 {
		fmt.Println("No ready work")
		return
	}
	task := selectBest(ready)
	claim(task.ID, who)
	// ... perform code changes/tests based on task ...
	// If we discovered follow-on work:
	//   newID := discovered(task.ID, "Fix flaky token refresh", []string{"bug", "auth"})
	//   run("bd", "link", "--edge", "blocked_by", "--src", task.ID, "--dst", newID)
	complete(task.ID)
}
```
The prompts can be tiny because the plan is data, not text.
Why This Feels Like a Primitive, Not a Tool
Beads is not a monolith—it’s a minimal contract:
- A small, explicit schema that encodes how work flows.
- A CLI that returns JSON so agents can reason without scraping prose.
- A storage model that piggybacks on the tools developers already use (git).
It’s the simplest thing that turns PM from an unstructured prompt problem into a structured data problem. And it does so in the same repo, with the same review and merge semantics we use for code.
Limitations and What’s Next
- Scaling to massive portfolios: JSONL in a single file will hit limits. Shard by epic, use content-addressed chunks, or adopt a CRDT log. The core abstraction survives.
- Rich queries: today’s `bd` queries are enough for agents; dashboards for humans may want a secondary index or SQLite mirror.
- Cross-repo edges: formalize repo-scoped IDs and URI edges; add "follow edges across repos" queries.
- Policy: encode org rules (“no open discovered-from under a done epic”) and let agents explain violations before merge.
None of these undermine the core thesis: temporal-graph memory inside the repo is the right substrate.
A Simple Protocol for Your Own Evaluation
Run this bakeoff in a real repo:
- Baseline (markdown + vectors):
  - For a medium feature (4–8 steps), measure total compactions/restarts, rediscovered work incidents, and human interventions.
- Beads:
  - Migrate the plan into beads, keep RAG for code/docs, add the 4-line AGENTS.md policy.
  - Metrics: issues created, discovered_from chain lengths, blocked time ratio, cycle time per bead, handoff count without human intervention.
Success looks like: fewer “where were we?” messages, fewer phantom completions, lower rework, and visibly coherent discovered-from chains in the audit log.
Opinionated Take
Vector memory and clever prompts made agent coding viable; a temporal-graph work memory makes it sustainable. Plans in prose were a romantic detour. The right data model for long-horizon agentic work is not a document—it’s a graph whose edges encode order, cause, and ownership, versioned with the code it changes.
Beads is small, but it reframes the problem: stop asking LLMs to remember a plan. Give them a plan they can query.
Getting Started Checklist
- Install the CLI and run `bd quickstart` in your repo.
- Add a short rule to AGENTS.md/CLAUDE.md instructing agents to:
  - query `bd ready --json` on session start,
  - claim before they code,
  - file discovered work with `discovered_from`,
  - and close the bead when done.
- Optionally, wire a CI check that fails merges if blockers remain.
- Migrate your TODO.md by asking your agent to import and link.
In most teams, this is a 30-minute change that yields outsized gains in continuity, coordination, and trust.
Beads turns agent planning into data. Once you see agents pick up a bead, do the work, log their discoveries, and hand off to the next ready bead—all without you pasting a plan back into the chat—you’ll wonder why we ever tried to store the plan in prompts.
