Beads Memory for AI Coding Agents: An Architecture that Automates PM in Developer Workflows
TL;DR: A small, git-backed issue tracker with explicit dependency edges beats markdown plans, vector-only memory, and brittle prompt chaining for long-horizon AI coding. Beads turns project management duties into a first-class, queryable memory substrate. Agents stop forgetting, stop hand-waving, and start shipping.
- What it is: A tiny, repo-local, git-backed database represented as JSONL with a CLI called `bd`.
- What it solves: LLM session amnesia, multi-agent coordination, and perma-lost TODOs discovered during execution.
- Why it works: A temporal dependency graph (“beads on a chain”) gives agents a stable, structured, long-horizon memory with query semantics.
- Why it matters: It automates the PM loop—triage, scoping, prioritization, handoffs—directly from inside the developer workflow.
This article explains the architecture, the design trade-offs, and why the beads-style, temporal-graph memory model feels like the missing primitive for agentic coding. It also includes practical snippets to integrate Beads with your agents today.
The Real Problem: Agents Don’t Plan Across Time
If you rely on Claude Code, Sourcegraph Amp, OpenAI’s assistants, MCP tool stacks, or homegrown dev agents, you’ve seen it:
- Great at single-session sprints, shaky at multi-day, multi-phase delivery.
- Markdown plans proliferate into six-deep Russian dolls, then rot.
- After compaction/restart, agents re-discover the same tasks and declare victory at phase 3-of-6 because that’s all they can “see.”
In practice, an agent’s memory is what’s on disk plus whatever fits in context right now. The moment you need nested workstreams, stacked blockers, or feature-to-bug-to-cleanup detours, the markdown-plan approach collapses. It’s the Memento problem: every morning the plan is new again.
The experiment that triggered Beads was simple: move the plan into an issue tracker and give agents a way to query “ready work.” Within minutes, the behavior shifted from meandering to disciplined: compute the ready set, pick a task, work it, record discovered work, repeat. No hero prompts, no brittle chains.
What Went Wrong With “Master Plans” and Heavy Orchestrators
Two instructive dead ends are worth calling out:
- Heavy orchestration for desktop dev tools. Systems like Temporal are remarkable for large-scale workflows, but for single-developer desktops or small swarms, they impose weight, operational surface area, and cognitive tax that dwarf the benefits. The orchestration became the product.
- Markdown master plans. A beautiful idea: hierarchical files under git, with agents expanding and updating as they go. In practice, the plan multiplied, fractured, conflicted, and became unqueryable. Agents cannot reliably interpret free-form text to compute a global dependency graph or an actionable queue. The plan turned into write-only memory.
The insight behind Beads is not “let’s do Jira.” It’s: modeling work as a temporal dependency graph inside the repo is the simplest possible memory construct that aligns with how LLMs actually operate.
Beads, Defined
Beads is a minimal issue tracker designed for agents:
- Storage: a JSONL file (or small set of files) in your repo, versioned by git. Each issue is an object, each update appends an event. Think of it as a tiny, versioned, append-only log.
- Schema: issues have IDs, titles, status, priority, labels, parent/child relationships (epics), and explicit dependency edges: blocks/blocked_by, and a crucial discovered_from edge.
- CLI: `bd` provides discovery, triage, linking, status changes, and queries in both human and JSON output modes.
- Distribution: it’s naturally distributed via git. Multiple agents can coordinate across machines/repos without a centralized server.
This gives agents four primitives they don’t get from markdown:
- a) Explicit, queryable dependencies,
- b) Ready-set computation (what’s unblocked and actionable),
- c) Durable session continuity (no re-prompting the entire plan),
- d) An audit trail aligned with the code’s version history.
Why “Temporal Graph” Memory?
Beads is a temporal graph because it encodes not just structural dependencies (A blocks B) but the causal sequence by which work was discovered and executed. The discovered_from relations let agents reconstruct how the work unfolded over time and preserve context that would otherwise be lost. This becomes a living narrative the agents can query, not a brittle plan that must be reread and reinterpreted.
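For illustration, reconstructing a `discovered_from` chain is only a few lines. This sketch assumes issue records are plain dicts carrying an optional `discovered_from` field; the field name matches the schema described here, but the record shape is hypothetical, not the bd internals:

```python
def discovery_chain(issues, issue_id):
    """Return the causal chain root -> ... -> issue_id via discovered_from links."""
    chain = []
    current = issue_id
    while current is not None:
        chain.append(current)
        current = issues[current].get("discovered_from")
    return list(reversed(chain))

# Illustrative data: bd-142 was discovered while working bd-90, and so on.
issues = {
    "bd-90": {},
    "bd-142": {"discovered_from": "bd-90"},
    "bd-200": {"discovered_from": "bd-142"},
}
```

Querying the chain for `bd-200` yields the full narrative of how that work came to exist, which is exactly the context a fresh session needs.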
In effect, Beads gives LLMs what they’re missing:
- Working memory across sessions (via queries),
- Long-term memory via versioned state in git,
- A programmatic plan that survives compaction and restarts.
Why It Outperforms Vector-Only Memory and Prompt Chaining
- Vector stores excel at information recall, not plan execution. Cosine similarity won’t tell you which task is unblocked or next-in-line, nor will it compute a topological order through dependencies.
- Prompt chaining treats the plan as ephemeral. Each link in the chain must be carefully managed. Drift accumulates and small missteps compound into dead plans or loops.
- Beads inverts this: the plan lives as data. Prompts are short and stable because agents query the plan instead of carrying it in context. The system becomes robust to resets and concurrent workers.
This doesn’t replace vectors. You still want RAG for code search/docs. But vectors store facts; Beads stores commitments. Facts help you decide what’s true; commitments help you decide what to do.
The Beads Schema (Practical Core)
A representative issue record might look like this:
```json
{
  "id": "bd-142",
  "title": "Refactor auth middleware to support service tokens",
  "status": "open",
  "priority": 0.72,
  "assignee": "agent/claude",
  "labels": ["auth", "refactor"],
  "parent_id": "bd-101",
  "blocks": ["bd-188"],
  "blocked_by": ["bd-91"],
  "discovered_from": "bd-90",
  "created_at": "2025-10-08T19:52:11Z",
  "updated_at": "2025-10-08T20:05:27Z",
  "events": [
    {"ts": "2025-10-08T19:52:11Z", "actor": "agent/claude", "type": "created"},
    {"ts": "2025-10-08T20:04:00Z", "actor": "agent/claude", "type": "linked", "edge": "blocked_by", "to": "bd-91"},
    {"ts": "2025-10-08T20:05:27Z", "actor": "agent/claude", "type": "status", "from": "open", "to": "in_progress"}
  ]
}
```
Key design choices:
- parent_id encodes epic/subtask structure without forcing a global tree rewrite.
- blocks/blocked_by edges represent ordering constraints.
- discovered_from preserves causal history: when an issue emerges while doing another, link it.
- events form an append-only audit trail for status and structure changes.
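Loading such a file is deliberately trivial. A minimal sketch, assuming one JSON object per line as in the record above (the real bd file layout may differ):

```python
import json

def load_issues(path):
    """Load a beads-style JSONL file into a dict keyed by issue id."""
    issues = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            rec = json.loads(line)
            issues[rec["id"]] = rec
    return issues
```

Because each issue is one line, git diffs and merges stay line-scoped and readable.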
The `bd` CLI exposes these as first-class operations. Even better, it prints machine-readable JSON so agents don’t need to parse prose.
The Ready Set: From Graph to Action
The engine behind Beads is trivial and powerful: compute the ready set from the dependency graph.
- A task is ready if status ∈ {open, todo} and it has no open blocked_by edges.
- Among ready tasks, prioritize by a scoring function (priority, recency, epic rank, discovered-from recency, assignee availability, etc.).
- Agents claim a task (status → in_progress), work it, and either:
- mark done and close, or
- discover more work, file it, link it, and possibly requeue the parent.
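The ready-set rule above can be sketched directly. A minimal illustration, assuming issues are plain dicts with the `status` and `blocked_by` fields from the schema example (the same semantics, not the bd internals):

```python
READY_STATUSES = {"open", "todo"}
OPEN_STATUSES = {"open", "todo", "in_progress"}

def ready_set(issues):
    """A task is ready if its status is open/todo and no blocker is still open."""
    ready = []
    for issue in issues.values():
        if issue["status"] not in READY_STATUSES:
            continue
        blockers = issue.get("blocked_by", [])
        if any(issues[b]["status"] in OPEN_STATUSES for b in blockers):
            continue
        ready.append(issue["id"])
    return ready
```

Note that the computation is local and cheap: no topological sort is needed just to answer "what can I start right now?"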
Simple pseudocode:
```python
# Pseudocode used by the agent at each session start
ready = bd.query_ready(json=True)  # returns list of tasks with metadata
work = select_best(ready)          # your scoring function
bd.update(work.id, status="in_progress", assignee=current_agent)
# ... implement change ...
if discovered:
    new_id = bd.new(title, labels=[...], discovered_from=work.id)
    bd.link(edge="blocked_by", src=work.id, dst=new_id)  # work now depends on the discovered task
bd.update(work.id, status="done")
```
This is why Beads feels like an external working memory. The agent no longer needs the entire plan in context. It queries the plan, takes a bead, and moves forward.
Installing the Habit: A Minimal Agent Integration
Add one instruction to your agent’s configuration (AGENTS.md or CLAUDE.md):
```md
- Always initialize and use Beads:
  1) Run: bd quickstart (once per repo)
  2) Start each session with: bd ready --json to get the next unblocked tasks
  3) When you discover new work, run: bd new and link it via discovered_from
  4) Update status as you go: open -> in_progress -> done
```
Give the agent permission to invoke these commands and interpret JSON output. That’s often enough to flip a project from “constant herding” to “self-propelling.”
Real CLI Examples
- Initialize:
```bash
bd quickstart
```
- Create an epic and a subtask:
```bash
bd new --title "Revamp test harness" --label testing --priority 0.9 --id bd-500
bd new --title "Migrate flaky e2e tests to Playwright" --parent bd-500 --label e2e
```
- Link and query ready work:
```bash
bd link --edge blocked_by --src bd-501 --dst bd-490  # bd-501 is blocked by bd-490
bd ready --json | jq '.[:5]'                         # top 5 ready items for the session
```
- Record discovery during execution:
```bash
bd new --title "Fix auth token refresh bug" --label bug --discovered-from bd-501
```
- Claim, work, complete:
```bash
bd update --id bd-501 --status in_progress --assignee agent/claude
# ... run changes, tests ...
bd update --id bd-501 --status done
```
Multi-Agent Coordination That Actually Works
Because the database is just git-tracked JSONL, concurrency looks like normal development:
- Each agent works on a branch; commits include both code and beads updates.
- If two agents create the same ID (rare with ULIDs/monotonic IDs), conflicts are resolved with a simple renumber and edge rewire (which an LLM can do reliably because the schema is explicit).
- Merge conflicts are line-level and semantic: the agent can reason about status transitions and choose the correct resolution.
To keep the system healthy:
- Use ULIDs for ids (or repo-prefixed IDs) to avoid collisions across repos.
- Add a TTL for in_progress; stale claims revert to open.
- Emit heartbeats as events; a pre-commit hook can warn on orphaned in_progress tasks.
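The TTL rule above can be sketched as a small sweep over in-progress tasks. Field names here (`claimed_at`, `assignee`) are illustrative, not part of the documented schema:

```python
import time

def revert_stale_claims(issues, ttl_seconds, now=None):
    """Revert in_progress tasks whose claim is older than ttl_seconds back to open."""
    now = time.time() if now is None else now
    reverted = []
    for issue in issues.values():
        if issue["status"] != "in_progress":
            continue
        if now - issue.get("claimed_at", now) > ttl_seconds:
            issue["status"] = "open"
            issue.pop("assignee", None)  # release the claim
            reverted.append(issue["id"])
    return reverted
```

Run this from a pre-commit hook or a periodic job; either way the claim lifecycle stays self-healing.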
Temporal-Graph Memory vs. GitHub Issues/Jira
Why not just use GitHub Issues or Jira?
- Latency and friction. Agents need to query and update at high frequency. Repo-local JSONL under git keeps the loop tight and offline-capable.
- Semantics. Beads bakes in edges (blocked_by, blocks, parent/child, discovered_from) and a JSON-first CLI for agents. No HTML forms, no rate limits, no formatting ambiguity.
- Versioned alongside code. It’s the same commit, same branch, same PR. The work memory is code-adjacent, not SaaS-adjacent.
This doesn’t preclude sync to cloud trackers for reporting. But the authoritative memory for agent planning should live where the agent lives: in the repo.
Automating PM, Not Just Planning
Beads quietly eats the PM loop:
- Discovery: agents create issues the moment they see them, linked via discovered_from to preserve context.
- Triage: simple rules (labels + priority + epic rank) drive the scoring function that selects the next bead.
- Scheduling: edges compute a ready set. No one needs to “remember” the Gantt chart.
- Handoffs: one agent closes a bead; the next bd ready picks the follow-on. Audit trails preserve who did what and when.
- Risk control: block merges on unresolved blockers; require all discovered-from chains to terminate cleanly for a feature to be “done.”
You can even wire CI to enforce this:
```bash
# In CI: block PR merge if any tasks discovered-from the PR's epic remain open
if bd query --epic "$EPIC" --open-only | grep -q "."; then
  echo "Open work remains; failing check."
  exit 1
fi
```
A Minimal Scheduling Algorithm That Works
You don’t need a fancy scheduler to get value. A pragmatic heuristic beats perfect optimization:
- Compute all ready tasks (no open blocked_by).
- Score = `w1*priority + w2*recent_discovery_bonus + w3*epic_rank + w4*(-age_penalty) + w5*label_fit(agent)`
- Pick the top N that match the agent’s capabilities; claim the first.
This keeps flow moving and surfaces fresh discoveries quickly while paying down old work in the background. It’s easy to tune, transparent to humans, and agents can explain the decision because it’s data-driven.
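A minimal version of this scorer, with illustrative feature names and default weights you would tune per repo:

```python
def score(task, agent_labels, weights=(1.0, 0.5, 0.3, 0.1, 0.4)):
    """Weighted sum over the heuristic's features; all fields default to 0."""
    w1, w2, w3, w4, w5 = weights
    label_fit = len(set(task.get("labels", [])) & agent_labels)
    return (
        w1 * task.get("priority", 0.0)
        + w2 * task.get("recent_discovery_bonus", 0.0)
        + w3 * task.get("epic_rank", 0.0)
        - w4 * task.get("age_days", 0.0)
        + w5 * label_fit
    )

def select_best(ready, agent_labels):
    """Claim candidate: the highest-scoring ready task for this agent."""
    return max(ready, key=lambda t: score(t, agent_labels))
```

Because the score is a transparent weighted sum, an agent (or a human reviewer) can explain exactly why a bead was picked.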
Cross-Repo Handoffs and Coordinated Swarms
For multi-repo work:
- Treat beads URIs as first-class: repo://org/name#bd-123.
- Allow edges across repos. A frontend bead can be blocked_by a backend bead in another repo.
- Sync via submodules or a shared beads registry repo (still JSONL + git), or mirror edges in each repo with canonical IDs.
In practice, even simple conventions work: prefix IDs with repo slug and keep the edges in the primary repo that “owns” the feature epic.
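Parsing the URI convention above takes one regex. The `repo://org/name#bd-123` scheme is this article's convention, not a bd built-in:

```python
import re

# Matches the article's cross-repo convention: repo://org/name#bd-123
URI_RE = re.compile(r"^repo://(?P<org>[^/]+)/(?P<name>[^#]+)#(?P<issue>bd-\d+)$")

def parse_bead_uri(uri):
    """Split a cross-repo bead URI into (org, repo, issue_id)."""
    m = URI_RE.match(uri)
    if not m:
        raise ValueError(f"not a bead URI: {uri}")
    return m.group("org"), m.group("name"), m.group("issue")
```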
From Vibe Coding to Sustainable Flow
The story behind Beads is instructive. Months of pushing markdown master plans gave way to 600+ decaying files and frequent agent amnesia. A short “what if we moved all known work to an issue tracker?” experiment immediately stabilized the loop. The switch from prose plans to an agent-native data model was the inflection.
Once agents operate on a temporal graph:
- They don’t quietly drop discovered work. It’s filed and linked.
- They don’t overclaim completion. Open blockers are visible.
- They don’t require babysitting between sessions. `bd ready --json` rehydrates context instantly.
Diagnostics and Metrics You Can Track Today
- Cycle time per bead: created → done
- Flow efficiency: active time / (active + blocked)
- WIP by epic or label: count of in_progress tasks
- Blocked ratio: blocked / total open
- Discovery rate: new issues per unit time; discovered_from chain lengths
- Ready queue depth: how much unblocked work exists right now
These become operational levers. For example, if discovery outruns closure, create a “stabilization” epic and bias the scoring function toward closing discovered-from chains.
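Cycle time, for example, falls straight out of the append-only events list shown in the schema example. A sketch, assuming the event shapes from that record:

```python
from datetime import datetime

def cycle_time_seconds(issue):
    """Seconds from the 'created' event to the status change to 'done', or None."""
    created = done = None
    for ev in issue["events"]:
        ts = datetime.fromisoformat(ev["ts"].replace("Z", "+00:00"))
        if ev["type"] == "created":
            created = ts
        elif ev["type"] == "status" and ev.get("to") == "done":
            done = ts
    if created is None or done is None:
        return None
    return (done - created).total_seconds()
```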
Guardrails and Pitfalls (and How to Handle Them)
- Destructive agent actions. Prevent accidental deletion of the beads file with filesystem permissions, `.gitattributes` locks, and pre-commit checks. Consider a cron job that snapshots beads separately.
- ID collisions. Use ULIDs or repo-prefixed IDs. On conflict, auto-renumber and rewrite edges in a single commit the agent can explain.
- Stale in_progress tasks. TTL with auto-revert to open; a background job can enforce this.
- Conflicting edits. Because events append, merging is usually safe. Teach the agent to resolve status transitions with a simple precedence lattice (done > in_progress > open, unless a blocker remains open).
- Privacy and secrets. Keep beads files free of secrets. Treat as code.
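The precedence lattice for merge resolution can be written down directly. A minimal sketch of the rule as stated: done > in_progress > open, demoted back to open if a blocker is still unresolved:

```python
PRECEDENCE = {"open": 0, "in_progress": 1, "done": 2}

def resolve_status(ours, theirs, has_open_blocker):
    """Pick the higher-precedence status, unless an open blocker vetoes 'done'."""
    winner = ours if PRECEDENCE[ours] >= PRECEDENCE[theirs] else theirs
    if winner == "done" and has_open_blocker:
        return "open"  # a task cannot be done while a blocker remains open
    return winner
```

An agent applying this during a merge can cite the lattice in its commit message, which keeps resolutions auditable.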
Suggested Importers and Interop
- Import TODO.md: parse headings as epics, bullets as tasks; infer blocked_by from “blocked on #ID” phrases; initialize priorities from tags.
- Mirror to GitHub/Jira: a one-way exporter for charts and stakeholder visibility.
- Backfill from commit history: heuristics to create beads discovered_from chains from PR descriptions and references.
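A TODO.md importer along these lines fits in a screenful. This sketch is hypothetical: headings become epics, bullets become tasks, and "blocked on bd-N" phrases become blocked_by edges (IDs are sequential here for simplicity; a real importer would use ULIDs):

```python
import re

def import_todo_md(text):
    """Turn a TODO.md into a flat list of bead-like dicts."""
    issues, next_id, current_epic = [], 1, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):  # heading -> epic
            current_epic = f"bd-{next_id}"
            issues.append({"id": current_epic, "title": line.lstrip("# "), "kind": "epic"})
            next_id += 1
        elif line.startswith("- "):  # bullet -> task under the current epic
            title = line[2:]
            blocked = re.findall(r"blocked on (bd-\d+)", title)
            issues.append({
                "id": f"bd-{next_id}",
                "title": title,
                "parent_id": current_epic,
                "blocked_by": blocked,
            })
            next_id += 1
    return issues
```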
Example: A Lightweight Agent Loop in Go
Below is a simplified loop that illustrates how an agent can operate with Beads as its external memory:
```go
// Pseudocode; assumes `bd` available on PATH
func nextReady() []Issue {
	out := run("bd", "ready", "--json")
	return parseIssues(out)
}

func claim(id, who string) {
	run("bd", "update", "--id", id, "--status", "in_progress", "--assignee", who)
}

func discovered(parentID, title string, labels []string) string {
	args := []string{"new", "--title", title, "--discovered-from", parentID}
	for _, l := range labels {
		args = append(args, "--label", l)
	}
	out := run("bd", args...)
	return parseNewID(out)
}

func complete(id string) {
	run("bd", "update", "--id", id, "--status", "done")
}

func main() {
	who := os.Getenv("AGENT_NAME")
	ready := nextReady()
	if len(ready) == 0 {
		fmt.Println("No ready work")
		return
	}
	task := selectBest(ready)
	claim(task.ID, who)
	// ... perform code changes/tests based on task ...
	// If we discovered follow-on work:
	//   newID := discovered(task.ID, "Fix flaky token refresh", []string{"bug", "auth"})
	//   run("bd", "link", "--edge", "blocked_by", "--src", task.ID, "--dst", newID)
	complete(task.ID)
}
```
The prompts can be tiny because the plan is data, not text.
Why This Feels Like a Primitive, Not a Tool
Beads is not a monolith—it’s a minimal contract:
- A small, explicit schema that encodes how work flows.
- A CLI that returns JSON so agents can reason without scraping prose.
- A storage model that piggybacks on the tools developers already use (git).
It’s the simplest thing that turns PM from an unstructured prompt problem into a structured data problem. And it does so in the same repo, with the same review and merge semantics we use for code.
Limitations and What’s Next
- Scaling to massive portfolios: JSONL in a single file will hit limits. Shard by epic, use content-addressed chunks, or adopt a CRDT log. The core abstraction survives.
- Rich queries: today’s `bd` queries are enough for agents; dashboards for humans may want a secondary index or SQLite mirror.
- Cross-repo edges: formalize repo-scoped IDs and URI edges; add "follow edges across repos" queries.
- Policy: encode org rules (“no open discovered-from under a done epic”) and let agents explain violations before merge.
None of these undermine the core thesis: temporal-graph memory inside the repo is the right substrate.
A Simple Protocol for Your Own Evaluation
Run this bakeoff in a real repo:
- Baseline (markdown + vectors):
  - For a medium feature (4–8 steps), measure total compactions/restarts, rediscovered work incidents, and human interventions.
- Beads:
  - Migrate the plan into beads, keep RAG for code/docs, add the 4-line AGENTS.md policy.
  - Metrics: issues created, discovered_from chain lengths, blocked time ratio, cycle time per bead, handoff count without human intervention.
Success looks like: fewer “where were we?” messages, fewer phantom completions, lower rework, and visibly coherent discovered-from chains in the audit log.
Opinionated Take
Vector memory and clever prompts made agent coding viable; a temporal-graph work memory makes it sustainable. Plans in prose were a romantic detour. The right data model for long-horizon agentic work is not a document—it’s a graph whose edges encode order, cause, and ownership, versioned with the code it changes.
Beads is small, but it reframes the problem: stop asking LLMs to remember a plan. Give them a plan they can query.
Getting Started Checklist
- Install the CLI and run `bd quickstart` in your repo.
- Add a short rule to AGENTS.md/CLAUDE.md instructing agents to:
  - query `bd ready --json` on session start,
  - claim before they code,
  - file discovered work with `discovered_from`,
  - and close the bead when done.
- Optionally, wire a CI check that fails merges if blockers remain.
- Migrate your TODO.md by asking your agent to import and link.
In most teams, this is a 30-minute change that yields outsized gains in continuity, coordination, and trust.
Beads turns agent planning into data. Once you see agents pick up a bead, do the work, log their discoveries, and hand off to the next ready bead—all without you pasting a plan back into the chat—you’ll wonder why we ever tried to store the plan in prompts.
