Designing APIs for LLM Agents in 2025: Least-Privilege Scopes, Dry-Run/Undo, and Idempotent Tool Calling

LLM agents are moving from toy demos to production systems. They file tickets, open PRs, schedule meetings, send emails, manipulate datasets, and perform financial operations. The risk profile is obvious: a single bad tool call can do real damage. The opportunity is equally obvious: when APIs are designed for agentic usage, you get faster development, safer automation, and a better developer experience for both humans and AI assistants.

This article proposes a concrete set of API and SDK patterns that make agent use safe by default and operationally tractable. The focus is on:

Capability-scoped tokens (least-privilege)
JSON Schema contracts for inputs/outputs
Dry-run/preview and two-phase commit
Idempotency keys and safe retries
Reversible operations and compensations (undo)
Rate limits that are agent-friendly
Human-in-the-loop (HITL) safeguards
Practical integration with OpenAI tool calling and MCP

I will be opinionated: if you are exposing a write-capable API that agents will use, you should implement idempotency, previews, and scope-constrained credentials before GA. These features are not nice-to-haves; they are the difference between “we tried agents and it was scary” and “agents improved our MTTR and velocity without waking SREs at 3am.”

1) Design Principles for Agent-Safe APIs

Fail safe, not silent: Prefer hard, explicit errors over implicit partial success. Agents iterate; they need crisp feedback.
Scopes over roles: Grant capabilities at the API surface via fine-grained scopes; roles are too coarse for autonomous workflows.
Two-phase for destructive operations: Make irreversible operations hard to trigger; use preview-then-apply or grace windows.
Idempotency by default: Every non-GET write path should be idempotent for at least the duration of your retry horizon.
Deterministic contracts: Provide JSON Schema with strict validation and predictable error objects (e.g., RFC 7807 Problem Details, since obsoleted by RFC 9457).
Observability and audit: Treat agents as first-class actors with traceability, quotas, and audit trails.
Human-in-the-loop when needed: Offer built-in approval flows for sensitive scopes or high-risk parameter combinations.

2) Capability-Scoped Tokens (Least-Privilege by Construction)

Agents should rarely hold a user's full token. Instead, issue capability-scoped, time-limited credentials that:

Express allowed operations and resource filters (e.g., only a project, folder, or dataset prefix)
Constrain rate, financial or side-effect limits (e.g., per-day cost ceiling)
Are bound to the client (DPoP/MTLS) to reduce replay risk
Expire quickly and require refresh via a controlled token exchange

Example: A JWT with explicit capability claims (illustrative only):

json
{
  "iss": "https://auth.example.com",
  "sub": "agent:cal-assistant-42",
  "aud": "api.example.com",
  "exp": 1735776000,
  "scp": [
    "calendar.events.read",
    "calendar.events.create",
    "files.write"
  ],
  "resource_filters": {
    "calendar_ids": ["cal_123"],
    "files_path_prefix": "/agents/cal-assistant-42/"
  },
  "constraints": {
    "max_events_per_hour": 20,
    "max_file_size_mb": 10,
    "financial_cap_usd": 0
  },
  "act": { "actor_type": "agent", "actor_id": "cal-assistant-42" },
  "cnf": { "jkt": "<JWK-thumbprint-for-DPoP>" }
}

Notes:

Use OAuth 2.0 Token Exchange (RFC 8693) or a service-to-service minting flow to derive these capabilities from a user's consented grants. OAuth 2.1 and RAR (Rich Authorization Requests, RFC 9396) are useful for encoding fine-grained intent.
Bind tokens to the client using DPoP or MTLS to limit exfiltration blast radius.
Clearly document your scope registry. Scopes should map to specific API methods and resource selectors.
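
To make the claims above concrete, here is a minimal sketch of a server-side check that maps one requested operation onto the token's scopes and resource filters. The claim shape mirrors the illustrative JWT payload; the function name, the reason strings, and the deny-by-default treatment of a missing filter are assumptions, not part of any standard.

```typescript
// Hypothetical claim shape, mirroring the illustrative JWT payload above.
interface CapabilityClaims {
  exp: number; // expiry, seconds since the epoch
  scp: string[];
  resource_filters: { calendar_ids?: string[]; files_path_prefix?: string };
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// Check one requested operation against scopes and resource filters.
// Assumption: a missing filter means "no access to that resource type".
function authorize(
  claims: CapabilityClaims,
  scope: string,
  resource: { calendarId?: string; filePath?: string },
  nowSeconds: number,
): Decision {
  if (nowSeconds >= claims.exp) return { allowed: false, reason: "token_expired" };
  if (!claims.scp.includes(scope)) return { allowed: false, reason: "missing_scope" };

  const filters = claims.resource_filters;
  if (resource.calendarId !== undefined) {
    const allowedCalendars = filters.calendar_ids ?? [];
    if (!allowedCalendars.includes(resource.calendarId)) {
      return { allowed: false, reason: "calendar_not_in_filter" };
    }
  }
  if (resource.filePath !== undefined) {
    const prefix = filters.files_path_prefix;
    // Reject ".." segments so a prefix match cannot be escaped by traversal.
    const traverses = resource.filePath.split("/").includes("..");
    if (prefix === undefined || traverses || !resource.filePath.startsWith(prefix)) {
      return { allowed: false, reason: "path_outside_prefix" };
    }
  }
  return { allowed: true };
}
```

Returning a structured reason rather than a bare boolean lets the API turn a denial into an actionable error that the agent can react to.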

SDK recommendation: Provide a "scope narrowing" helper that lets developers request the smallest set of capabilities for a specific task, so agents do not inherit broader human privileges.
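
One possible shape for such a helper is sketched below. The name narrowScopes and its fail-loudly behavior are assumptions for illustration; a real SDK would pass the narrowed list to its token exchange call.

```typescript
// Hypothetical helper: request only the scopes a task needs,
// and never more than the underlying grant already holds.
function narrowScopes(granted: readonly string[], needed: readonly string[]): string[] {
  const missing = needed.filter((scope) => !granted.includes(scope));
  if (missing.length > 0) {
    // Fail loudly instead of silently widening or dropping scopes.
    throw new Error(`Scopes not covered by the grant: ${missing.join(", ")}`);
  }
  // De-duplicate while preserving the caller's order.
  return Array.from(new Set(needed));
}
```

A calendar-only task would then call narrowScopes(userGrant, ["calendar.events.read"]) and exchange just that list for an agent token, even if the user's grant also covers files or payments.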

3) JSON Schema Contracts for Tools and APIs

LLM agents thrive on determinism. They will fill any input fields you expose, so your job is to expose only the right ones, with unambiguous types and constraints.

Provide JSON Schema for every tool input and for the primary response
Use enums, formats (e.g., date-time, email), minLength/maxLength, regex patterns, and oneOf/anyOf judiciously
Set additionalProperties: false to prevent "creative" keys from slipping through
Return RFC 7807 Problem Details for errors: consistent type, title, status, detail, and instance

Example JSON Schema for a scheduling tool:

json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://api.example.com/schemas/schedule_meeting.json",
  "title": "ScheduleMeetingRequest",
  "type": "object",
  "additionalProperties": false,
  "required": ["title", "start_time", "end_time", "attendees"],
  "properties": {
    "title": { "type": "string", "minLength": 3, "maxLength": 140 },
    "start_time": { "type": "string", "format": "date-time" },
    "end_time": { "type": "string", "format": "date-time" },
    "attendees": {
      "type": "array",
      "minItems": 1,
      "maxItems": 20,
      "items": { "type": "string", "format": "email" }
    },
    "location": { "type": "string", "maxLength": 200 },
    "agenda": { "type": "string", "maxLength": 2000 },
    "dry_run": { "type": "boolean", "default": true },
    "notification": { "type": "string", "enum": ["none", "email", "calendar"] }
  }
}

A matching error object (Problem Details):

json
{
  "type": "https://api.example.com/problems/validation-error",
  "title": "Invalid request",
  "status": 400,
  "detail": "end_time must be after start_time",
  "instance": "/v1/meetings/requests/req_789",
  "errors": [
    { "path": "/end_time", "message": "must be after start_time" }
  ]
}

Strictness is a feature. If you're using OpenAI tool calling or MCP, you can embed this schema directly to help the model reliably fill fields.
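
JSON Schema on its own cannot express a cross-field rule such as "end_time must be after start_time", so that check usually lives in handler code. The sketch below shows one way to produce the Problem Details object from the example above; the function name and the decision to report every failing field are assumptions.

```typescript
interface Problem {
  type: string;
  title: string;
  status: number;
  detail: string;
  errors: { path: string; message: string }[];
}

interface MeetingTimes {
  start_time: string;
  end_time: string;
}

// Returns null when the times are consistent, otherwise a Problem Details object.
function checkMeetingTimes(req: MeetingTimes): Problem | null {
  const start = Date.parse(req.start_time);
  const end = Date.parse(req.end_time);
  const errors: Problem["errors"] = [];
  if (Number.isNaN(start)) errors.push({ path: "/start_time", message: "must be a valid date-time" });
  if (Number.isNaN(end)) errors.push({ path: "/end_time", message: "must be a valid date-time" });
  // Only compare the two values once both have parsed.
  if (errors.length === 0 && end <= start) {
    errors.push({ path: "/end_time", message: "must be after start_time" });
  }
  if (errors.length === 0) return null;
  return {
    type: "https://api.example.com/problems/validation-error",
    title: "Invalid request",
    status: 400,
    // Drop the leading "/" of the JSON Pointer for the human-readable summary.
    detail: errors.map((e) => `${e.path.slice(1)} ${e.message}`).join("; "),
    errors,
  };
}
```

Keeping the per-field errors array alongside the summary gives the agent a machine-readable pointer to the field it should change on the next attempt.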

4) Dry-Run/Preview and Two-Phase Apply

A preview endpoint or a dry_run flag prevents accidental side effects while enabling agents to reason about intended changes.

Patterns:

POST /v1/resource?action=preview (or dry_run=true) returns a preview object with a preview_id, a diff/plan, and a cost/impact estimate
POST /v1/resource?action=apply&preview_id=... commits an existing preview, optionally after approval
Previews should expire, be immutable, and log the actor and inputs used to compute them

Example flow:

http
POST /v1/meetings
Authorization: Bearer eyJ...
Content-Type: application/json
Idempotency-Key: 2c7b613f-b8c0-4bb6-a30f-2e7fbeb8c5f5

{
  "title": "Design Review",
  "start_time": "2025-09-15T16:00:00Z",
  "end_time": "2025-09-15T17:00:00Z",
  "attendees": ["a@example.com", "b@example.com"],
  "dry_run": true
}

Response:

json
{
  "preview_id": "prev_9JkLwZ",
  "would_create": {
    "event": {
      "calendar_id": "cal_123",
      "title": "Design Review",
      "start_time": "2025-09-15T16:00:00Z",
      "end_time": "2025-09-15T17:00:00Z",
      "attendees": ["a@example.com", "b@example.com"]
    }
  },
  "conflicts": [
    {
      "attendee": "b@example.com",
      "conflict_event": {
        "id": "evt_456",
        "start_time": "2025-09-15T16:30:00Z",
        "end_time": "2025-09-15T17:30:00Z"
      }
    }
  ],
  "cost_estimate": { "api_calls": 1, "calendar_writes": 1 },
  "expires_at": "2025-09-15T15:00:00Z"
}

Commit:

http
POST /v1/meetings/apply
Authorization: Bearer eyJ...
Content-Type: application/json
Idempotency-Key: 2c7b613f-b8c0-4bb6-a30f-2e7fbeb8c5f5

{ "preview_id": "prev_9JkLwZ" }

This two-phase approach enables human approval, gives the agent a plan to reason about, and centralizes side-effect checks.

Implementation tips:

Previews should run all validation and conflict detection; the apply step should reference an immutable plan
Consider a plan hash in the commit request to detect drift between preview and apply
Surface policy blocks early (e.g., scheduling outside working hours) to prompt the agent to adjust inputs
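
The plan-hash tip can be sketched as follows, assuming a Node.js runtime. The canonicalization shown here (sorted object keys) is a simple stand-in for a proper canonical JSON scheme, and the function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so that equal plans always
// produce the same string regardless of property order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the canonical form, hex-encoded.
function planHash(plan: unknown): string {
  return createHash("sha256").update(canonicalize(plan)).digest("hex");
}

// Apply-time check: refuse to commit when the plan recomputed from
// current state no longer matches what the caller previewed.
function assertNoDrift(previewedHash: string, currentPlan: unknown): void {
  if (planHash(currentPlan) !== previewedHash) {
    throw new Error("plan_drift: state changed since preview; re-run the preview");
  }
}
```

The preview response would carry planHash(plan) next to preview_id, and the apply request would echo it back so the server can detect drift before committing.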

5) Idempotency Keys and Exactly-Once Semantics (Within a Window)

Agents retry. Networks flake. Without idempotency, you'll accidentally double-send emails, double-charge cards, or create duplicate events.

Accept an Idempotency-Key header on all non-GET operations
The key should scope to the request method + path + authenticated principal; store the first response for a TTL and return it for duplicates
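If a second request arrives with the same key but a different body (or a different preview_id), return a clear error with details (409 Conflict here; the IETF Idempotency-Key draft suggests 422 for a payload mismatch and 409 for a request that is still in flight)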

Example:

http
POST /v1/charges
Idempotency-Key: 9f9d1b0e-2e93-45f9-9b22-12a20e6c0f68
Content-Type: application/json

{ "amount": 500, "currency": "USD", "source": "tok_abc" }

Server stores the result keyed by (principal, method, path, idempotency_key) and returns the stored result for any repeat attempts.

Operational guidance:

TTL should exceed your maximum client retry horizon (e.g., 24-72 hours)
Persist a stable result object and status code; include a header like Idempotent-Replayed: true on replays
Log the key and the X-Request-Id for correlation
Document if your idempotency window is bounded by other resources (e.g., order ID uniqueness)
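
The storage rules above can be sketched with an in-memory map. This is a simplification under stated assumptions: a production store would need an atomic insert to handle concurrent duplicates, durable storage, and a canonical body fingerprint rather than JSON.stringify. The class and outcome names are hypothetical.

```typescript
interface StoredResponse {
  status: number;
  body: unknown;
}

interface Entry {
  fingerprint: string;
  response: StoredResponse;
  expiresAt: number;
}

type Outcome =
  | { kind: "executed"; response: StoredResponse }
  | { kind: "replayed"; response: StoredResponse }
  | { kind: "key_reused_with_different_body" };

class IdempotencyStore {
  private entries = new Map<string, Entry>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  handle(
    scope: { principal: string; method: string; path: string; key: string },
    body: unknown,
    execute: () => StoredResponse,
    nowMs: number,
  ): Outcome {
    // Key the record by (principal, method, path, idempotency key).
    const id = JSON.stringify([scope.principal, scope.method, scope.path, scope.key]);
    const fingerprint = JSON.stringify(body);
    const existing = this.entries.get(id);
    if (existing !== undefined && existing.expiresAt > nowMs) {
      if (existing.fingerprint !== fingerprint) {
        return { kind: "key_reused_with_different_body" };
      }
      // Duplicate within the TTL: return the stored result, run nothing.
      return { kind: "replayed", response: existing.response };
    }
    const response = execute();
    this.entries.set(id, { fingerprint, response, expiresAt: nowMs + this.ttlMs });
    return { kind: "executed", response };
  }
}
```

An HTTP layer would map "replayed" to the stored status code plus a replay header, and "key_reused_with_different_body" to the conflict error described earlier.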

References: Stripe popularized the Idempotency-Key pattern, and the IETF HTTPAPI working group has been specifying an Idempotency-Key HTTP header field as an Internet-Draft (not yet a finished standard). Adopt the pattern.

6) Reversible Operations and Compensations (Undo)

Not all operations can be undone, but many can be compensated if you plan for it.

Patterns:

Provide a dedicated undo endpoint for reversible ops with a window (e.g., cancel email send within 30s if unread)
For complex workflows, use a Saga-style approach: each step defines a compensating action
Return a reversal_token on creation, with expires_at, and make it cheap to call

Example response from a send-email API:

json
{
  "message_id": "msg_7weK9M",
  "status": "queued",
  "reversal_token": "rev_Z3VhcmQtaXQtY2xvc2U=",
  "reversal_expires_at": "2025-09-15T12:00:45Z"
}

Undo:

http
POST /v1/messages/undo
Content-Type: application/json

{ "reversal_token": "rev_Z3VhcmQtaXQtY2xvc2U=" }

Design notes:

Undo should be idempotent and safe to retry
Always indicate whether the reversal succeeded, failed, or was no longer possible (with a clear reason)
Where instant undo is impossible (e.g., settlement finalized), surface compensations (refund, correction entry)
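
A minimal sketch of these design notes, with an injected clock so the window is easy to test. The registry class, the result shapes, and the reason strings are assumptions for illustration; the compensating action is whatever the service defines for that operation.

```typescript
type UndoResult =
  | { status: "reversed" }
  | { status: "already_reversed" }
  | { status: "not_possible"; reason: "unknown_token" | "window_expired" };

interface Reversal {
  expiresAt: number;
  reversed: boolean;
  compensate: () => void;
}

class ReversalRegistry {
  private reversals = new Map<string, Reversal>();

  // Called when the original operation succeeds and hands out a token.
  register(token: string, expiresAt: number, compensate: () => void): void {
    this.reversals.set(token, { expiresAt, reversed: false, compensate });
  }

  // Safe to retry: a second call reports "already_reversed"
  // instead of running the compensation twice.
  undo(token: string, nowMs: number): UndoResult {
    const reversal = this.reversals.get(token);
    if (reversal === undefined) return { status: "not_possible", reason: "unknown_token" };
    if (reversal.reversed) return { status: "already_reversed" };
    if (nowMs >= reversal.expiresAt) return { status: "not_possible", reason: "window_expired" };
    reversal.compensate();
    reversal.reversed = true;
    return { status: "reversed" };
  }
}
```

Checking the reversed flag before the expiry means a retry that arrives after the window closes still reports the earlier success rather than a misleading failure.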

7) Rate Limits and Concurrency Controls Built for Agents

Agents can be bursty. Rate limits that work for humans may be hostile to automated planners. Offer:

Token-bucket limits by scope and resource owner; return 429 with Retry-After when depleted
Concurrency caps for side-effecting endpoints; return a clear X-Concurrency-Limit header
Transparent usage introspection endpoints so agents can plan (e.g., GET /v1/limits reports remaining budget per scope)

Headers worth adopting:

Retry-After: <seconds> on 429/503
X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (common practice)
X-Request-Id for trace correlation

Server-side considerations:

Distinguish between soft limits (burst) and hard limits (daily/monthly quotas)
Don't punish idempotent retries; if a request is a replay, return the cached result rather than charging the rate bucket twice
Offer a "low-latency rejection" path when the request cannot be honored due to limits so agents can adapt quickly
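
For illustration, a token bucket that also computes the Retry-After value might look like the sketch below. The class is hypothetical and single-process; a real deployment would keep one bucket per (scope, resource owner) in shared storage.

```typescript
type TakeResult = { allowed: true } | { allowed: false; retryAfterSeconds: number };

class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastRefillMs = nowMs;
  }

  tryTake(nowMs: number): TakeResult {
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(nowMs, this.lastRefillMs);

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true };
    }
    // Whole seconds until one full token is available again.
    const retryAfterSeconds = Math.ceil((1 - this.tokens) / this.refillPerSecond);
    return { allowed: false, retryAfterSeconds };
  }
}
```

To avoid charging replays, a server would consult its idempotency store first and only call tryTake for requests that will actually execute.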

8) Human-in-the-Loop Safeguards

HITL creates a safety valve for sensitive scopes and high-risk contexts.

Mechanisms:

Approval workflows: A creation request transitions from pending to approved/denied; agents can poll or receive webhooks
Consent escalation: If an agent tries to use a new capability, require an explicit user grant
Out-of-band confirmations: For a payment or data deletion, send a summary for human confirmation
"Policy-as-data": Express rules (time windows, recipient allowlists, cost ceilings) that the preview step can evaluate and return as structured reasons for hold/deny

Example preview with a policy hold:

json
{
  "preview_id": "prev_1abc2",
  "would_create": { "event": { "title": "Off-hours maintenance" } },
  "policy": {
    "status": "requires_approval",
    "rules_triggered": [
      {
        "id": "after_hours_block",
        "severity": "high",
        "message": "Events outside 08:00-18:00 require approval"
      }
    ],
    "approval": {
      "request_id": "apr_77KD",
      "approvers": ["teamlead@example.com"],
      "expires_at": "2025-09-15T15:00:00Z"
    }
  }
}
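
One way to produce the policy block of such a preview is to keep rules as plain data and evaluate them in a single function. The sketch below is hypothetical: the two rule kinds, the large_meeting rule, and the use of UTC hours are assumptions, and the approval sub-object is left to the approval system.

```typescript
interface ProposedEvent {
  start_time: string;
  attendees: string[];
}

// Rules are plain data, so they can be stored, reviewed, and versioned.
type Rule =
  | { id: string; severity: "low" | "high"; message: string; kind: "allowed_hours_utc"; fromHour: number; toHour: number }
  | { id: string; severity: "low" | "high"; message: string; kind: "max_attendees"; max: number };

interface PolicyResult {
  status: "allowed" | "requires_approval";
  rules_triggered: { id: string; severity: string; message: string }[];
}

function triggers(rule: Rule, event: ProposedEvent): boolean {
  switch (rule.kind) {
    case "allowed_hours_utc": {
      // Assumption: working hours are evaluated in UTC for simplicity.
      const hour = new Date(event.start_time).getUTCHours();
      return hour < rule.fromHour || hour >= rule.toHour;
    }
    case "max_attendees":
      return event.attendees.length > rule.max;
  }
}

function evaluatePolicy(event: ProposedEvent, rules: Rule[]): PolicyResult {
  const rules_triggered = rules
    .filter((rule) => triggers(rule, event))
    .map(({ id, severity, message }) => ({ id, severity, message }));
  return {
    status: rules_triggered.length > 0 ? "requires_approval" : "allowed",
    rules_triggered,
  };
}

const exampleRules: Rule[] = [
  { id: "after_hours_block", severity: "high", message: "Events outside 08:00-18:00 require approval", kind: "allowed_hours_utc", fromHour: 8, toHour: 18 },
  { id: "large_meeting", severity: "low", message: "Meetings with more than 10 attendees require approval", kind: "max_attendees", max: 10 },
];
```

Because the result is structured, the agent can read rules_triggered and adjust its inputs (for example, move the meeting into working hours) before asking a human at all.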

SDK design: Provide built-in helpers to block on approval with appropriate timeouts and to surface an actionable summary to the end user.

9) Observability, Audit, and Abuse-Resistance

Treat agents as named, auditable actors.

Emit structured audit logs with: actor id, scopes, request body hash, idempotency key, decision (allow/deny), and result
Provide per-agent analytics: success rate, mean and p95 latency, undo ratio, preview-to-apply ratio
Offer a "shadow mode" to simulate side effects and collect safety signals before enabling apply
Redact secrets and personal data at the edge; enable privacy-preserving logs (e.g., format-preserving tokenization)
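
As a rough sketch of the per-agent analytics, the function below derives the listed ratios from a flat list of call records. The record shape and the nearest-rank percentile method are assumptions; production systems would typically compute these in a metrics backend rather than in application code.

```typescript
interface CallRecord {
  kind: "preview" | "apply" | "undo";
  ok: boolean;
  latencyMs: number;
}

// Nearest-rank percentile; returns NaN for an empty sample.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(calls: CallRecord[]) {
  const count = (kind: CallRecord["kind"]) => calls.filter((c) => c.kind === kind).length;
  const applies = count("apply");
  return {
    successRate: calls.length > 0 ? calls.filter((c) => c.ok).length / calls.length : NaN,
    p95LatencyMs: percentile(calls.map((c) => c.latencyMs), 95),
    // Above 1 means the agent iterates on previews before committing.
    previewToApplyRatio: applies > 0 ? count("preview") / applies : NaN,
    // A rising undo ratio suggests the agent commits changes users reject.
    undoRatio: applies > 0 ? count("undo") / applies : NaN,
  };
}
```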

Abuse-resistance:

Bind tokens to client proof (DPoP/MTLS) and rotate keys frequently
Anomaly detection: Alert on sudden spike in destructive ops or policy holds
Envelope encryption for stored preview/plan artifacts

10) OpenAI Tool Calling: Contracts That Nudge the Model Toward Safety

OpenAI's tool/function calling works best when the tool schemas are strict and the descriptions encode expectations like "preview first" and idempotency.

Example tool definition (JSON) emphasizing dry-run and idempotency:

json
{
  "type": "function",
  "function": {
    "name": "schedule_meeting",
    "description": "Preview-then-apply scheduling. Always call with dry_run=true first to obtain a preview_id, then call with dry_run=false and preview_id to commit.",
    "parameters": {
      "type": "object",
      "additionalProperties": false,
      "required": ["title", "start_time", "end_time", "attendees", "dry_run"],
      "properties": {
        "title": { "type": "string", "minLength": 3 },
        "start_time": { "type": "string", "format": "date-time" },
        "end_time": { "type": "string", "format": "date-time" },
        "attendees": {
          "type": "array",
          "minItems": 1,
          "items": { "type": "string", "format": "email" }
        },
        "dry_run": { "type": "boolean" },
        "preview_id": { "type": "string", "description": "Required when dry_run=false" }
      }
    }
  }
}

Tool handler pseudocode (Node.js/TypeScript):

ts
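import { randomUUID } from 'node:crypto';

type ScheduleArgs = {
  title: string;
  start_time: string;
  end_time: string;
  attendees: string[];
  dry_run: boolean;
  preview_id?: string;
};

// Placeholders for application code; names and signatures are illustrative.
declare function findConflicts(args: ScheduleArgs): Promise<unknown[]>;
declare function currentRequestHeaders(): Record<string, string | undefined>;
declare function applyWithIdempotency(
  method: string,
  path: string,
  idempotencyKey: string,
  body: { preview_id: string }
): Promise<unknown>;

async function schedule_meeting(args: ScheduleArgs) {
  if (args.dry_run) {
    // Preview: compute the plan and conflicts without side effects.
    // A real handler would also persist the preview so apply can look it up.
    const { dry_run, preview_id, ...event } = args;
    return {
      preview_id: 'prev_' + randomUUID(),
      would_create: { event },
      conflicts: await findConflicts(args),
      expires_at: new Date(Date.now() + 15 * 60_000).toISOString()
    };
  }

  if (!args.preview_id) throw new Error('preview_id required to commit');
  // Validate preview exists and not expired; check plan hash if used
  // Assumes the helper returns Node-style, lower-cased header names.
  const idempotencyKey = currentRequestHeaders()['idempotency-key'];
  if (!idempotencyKey) throw new Error('Idempotency-Key header required to commit');
  return applyWithIdempotency('POST', '/v1/meetings/apply', idempotencyKey, { preview_id: args.preview_id });
}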

Runtime prompts should reiterate the safety behavior: "Always preview before applying. If conflicts exist, ask for alternates rather than forcing an apply." Describe acceptable ranges and costs; models follow clear, explicit constraints far more reliably than implied ones, but the server should still enforce them.
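
Prompts and schemas guide the model, but the handler should still check what it receives. In OpenAI's Chat Completions API the tool call's function.arguments field arrives as a JSON-encoded string, and the rule "preview_id is required when dry_run=false" is not expressed in the schema above. A hand-rolled sketch of that guard follows; the function name and error strings are assumptions, and a full implementation would run a JSON Schema validator as well.

```typescript
type Parsed =
  | { ok: true; args: Record<string, unknown> }
  | { ok: false; error: string };

const ALLOWED_KEYS = ["title", "start_time", "end_time", "attendees", "dry_run", "preview_id"];
const REQUIRED_KEYS = ["title", "start_time", "end_time", "attendees", "dry_run"];

function parseScheduleMeetingArgs(rawArguments: string): Parsed {
  let value: unknown;
  try {
    value = JSON.parse(rawArguments);
  } catch {
    return { ok: false, error: "arguments is not valid JSON" };
  }
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return { ok: false, error: "arguments must be a JSON object" };
  }
  const args = value as Record<string, unknown>;

  // Mirror additionalProperties: false.
  const unknownKeys = Object.keys(args).filter((key) => !ALLOWED_KEYS.includes(key));
  if (unknownKeys.length > 0) {
    return { ok: false, error: `unknown properties: ${unknownKeys.join(", ")}` };
  }
  const missing = REQUIRED_KEYS.filter((key) => !(key in args));
  if (missing.length > 0) {
    return { ok: false, error: `missing properties: ${missing.join(", ")}` };
  }
  if (typeof args["dry_run"] !== "boolean") {
    return { ok: false, error: "dry_run must be a boolean" };
  }
  // Conditional rule the schema does not capture.
  if (args["dry_run"] === false && typeof args["preview_id"] !== "string") {
    return { ok: false, error: "preview_id is required when dry_run=false" };
  }
  return { ok: true, args };
}
```

Returning the error string to the model as the tool result gives it a concrete instruction for its next attempt.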

11) MCP (Model Context Protocol): Tooling with Strong Schemas

MCP (Model Context Protocol) enables connecting models to tools with typed metadata. Register tools with explicit JSON Schema parameters and return types. The same dry-run, idempotency, and undo patterns apply.

Expose one tool for preview and one for apply (or a single tool with a dry_run flag)
Mark sensitive tools with a requires_approval capability so the MCP host can gate execution
Emit structured events back to the MCP host with policy holds, conflicts, and reversal tokens

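Example MCP tool registration snippet (conceptual; name, description, inputSchema, and outputSchema follow MCP's tool definition, while capabilities_required and safety are illustrative extensions that the protocol does not define):

json
{
  "name": "schedule_meeting",
  "description": "Preview-then-apply scheduling with conflict detection.",
  "inputSchema": { "$ref": "https://api.example.com/schemas/schedule_meeting.json" },
  "outputSchema": {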
    "type": "object",
    "properties": {
      "preview_id": { "type": "string" },
      "event_id": { "type": "string" },
      "conflicts": { "type": "array", "items": { "type": "object" } },
      "reversal_token": { "type": "string" }
    }
  },
  "capabilities_required": ["calendar.events.create"],
  "safety": {
    "requires_preview": true,
    "undo_window_seconds": 300
  }
}

12) SDK Design: Make the Safe Path the Easy Path

SDKs can encode the safety contract, so application developers don't have to remember it.

Default to dry_run=true; require an explicit commit() to apply
Require Idempotency-Key for non-GET calls; generate one automatically if missing and surface it
Wrap dangerous operations in helpers that require a reason string or a passing policy check before they proceed
Provide high-level flows that combine preview, HITL approval, and apply with timeouts and structured retries

Example fluent SDK (TypeScript):

ts
const client = new AgentSafeClient({ token: scopedToken });

const plan = await client.meetings
  .create({ title, start_time, end_time, attendees })
  .preview();

if (plan.conflicts.length) {
  // Ask the model to adjust or prompt the user
}

await client.meetings
  .apply(plan)
  .withIdempotency()
  .requireApprovalIf(plan.policy?.status === 'requires_approval');

The idea: encode the policy graph into the SDK so the most common path is also the safest path.
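
A smaller, non-fluent sketch of the same defaults is shown below. It only builds requests and leaves sending to a transport. The class name, the request shape, and the injected key factory are assumptions; in production the factory could be something like crypto.randomUUID, and a retry should reuse the key from the first attempt.

```typescript
interface PreparedRequest {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

class MeetingCall {
  private readonly input: Record<string, unknown>;
  private readonly newKey: () => string;

  constructor(input: Record<string, unknown>, newKey: () => string) {
    this.input = input;
    this.newKey = newKey;
  }

  // Safe default: a preview is always a dry run, even if the
  // caller's input tried to set dry_run to false.
  preview(): PreparedRequest {
    return {
      method: "POST",
      path: "/v1/meetings",
      headers: { "Idempotency-Key": this.newKey() },
      body: { ...this.input, dry_run: true },
    };
  }

  // A commit cannot be built without a preview_id, and it always
  // carries an idempotency key (generated when none is supplied).
  commit(previewId: string, idempotencyKey?: string): PreparedRequest {
    if (previewId.length === 0) throw new Error("commit requires a preview_id");
    return {
      method: "POST",
      path: "/v1/meetings/apply",
      headers: { "Idempotency-Key": idempotencyKey ?? this.newKey() },
      body: { preview_id: previewId },
    };
  }
}
```

Surfacing the generated key on the prepared request lets the caller log it and pass it back in on a retry.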

13) Testing and Simulation: Agent Fuzzing and Shadow Mode

Agents will compose tools in ways you did not foresee. Testing should include:

Contract tests for every tool schema with property-based generation (valid and invalid cases)
Idempotency tests that simulate retries, timeouts, and network failures
Saga/undo tests that verify compensations are offered and idempotent
Shadow mode: run the agent against a sandbox where apply is disabled and only previews are stored and analyzed
Telemetry assertions: ensure X-Request-Id, Idempotency-Key, and audit entries are present for each call

Automation idea: a "red team" agent that intentionally probes edge cases (large arrays, boundary times, limit exhaustion) and ensures the API returns deterministic, actionable errors.
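
A very small version of that probing can be written as boundary cases with hard-coded expectations, using the limits from the scheduling schema (title 3 to 140 characters, 1 to 20 attendees). The runner below is a sketch: accepts stands in for a call to the real API or validator, and all names are hypothetical.

```typescript
interface MeetingInput {
  title: string;
  attendees: string[];
}

interface BoundaryCase {
  label: string;
  input: MeetingInput;
  expectValid: boolean;
}

function boundaryCases(): BoundaryCase[] {
  const title = (n: number) => "x".repeat(n);
  const attendees = (n: number) => Array.from({ length: n }, (_, i) => `user${i}@example.com`);
  const cases: BoundaryCase[] = [];
  // Expectations are written out by hand, not derived from the validator.
  const titleCases: [number, boolean][] = [[2, false], [3, true], [140, true], [141, false]];
  for (const [n, expectValid] of titleCases) {
    cases.push({ label: `title length ${n}`, input: { title: title(n), attendees: attendees(1) }, expectValid });
  }
  const attendeeCases: [number, boolean][] = [[0, false], [1, true], [20, true], [21, false]];
  for (const [n, expectValid] of attendeeCases) {
    cases.push({ label: `attendee count ${n}`, input: { title: title(10), attendees: attendees(n) }, expectValid });
  }
  return cases;
}

// Returns the labels of cases where the system under test disagrees.
function runContractTest(cases: BoundaryCase[], accepts: (input: MeetingInput) => boolean): string[] {
  return cases.filter((c) => accepts(c.input) !== c.expectValid).map((c) => c.label);
}
```

An off-by-one in the server's attendee limit would show up as a single failing label, which is the kind of deterministic signal a fuzzing agent can report.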

14) Versioning, Deprecation, and Backward Compatibility

Agents are brittle to breaking changes.

Version your schemas with $id and your endpoints with /v1, /v2; never change meaning without a new version
Offer a capabilities discovery endpoint so agents can adapt (e.g., GET /v1/.well-known/agent-capabilities)
Deprecation policy: soft-deprecate with telemetry warnings and Deprecation headers; provide upgrade guides with mapping tables

15) End-to-End Example: Calendar + Docs Agent

Goal: The agent schedules a meeting and uploads an agenda doc.

Flow:

Agent discovers capabilities via .well-known/agent-capabilities and receives scoped token: calendar.events.create, files.write limited to /agents/123/ path.
Agent calls schedule_meeting with dry_run=true and receives preview_id plus a conflict for one attendee.
Agent proposes a new time and re-previews until conflicts clear.
Agent requests approval (policy requires manager approval for meetings > 10 attendees). The system sends an approval request; the agent polls or receives a webhook.
Upon approval, the agent calls apply with the prior preview_id and an Idempotency-Key. Server creates the event and returns a reversal_token valid for 5 minutes.
Agent uploads the agenda with Idempotency-Key and receives the file URL.
Rate limits constrain the burst of indexing requests; the API returns 429 with Retry-After: 2. The agent respects the header and retries.
A user flags an error; the agent calls undo with the reversal token, which succeeds because the window is still open.

Success criteria:

No duplicate events despite retries
Preview-to-apply ratio near 1.5 (some iterations expected)
All operations tied to the agent actor with audit trails
Human approved the high-risk step

16) Implementation Checklist

Security and auth

Capability-scoped tokens with resource filters and constraints
Token binding (DPoP/MTLS) and short expiry
OAuth Token Exchange or RAR for fine-grained grants

Contracts and validation

JSON Schema for inputs and outputs; additionalProperties: false
RFC 7807 Problem Details for errors
Strong format validators (date-time, email, uri)

Safety controls

Dry-run/preview with preview_id, plan hash, and expiry
Idempotency keys on all non-GET endpoints
Undo endpoints or compensating actions for reversible ops
Policy engine integration with structured reasons

Operations

Rate limits with clear headers and introspection endpoint
Concurrency caps for side-effect endpoints
Observability: request ids, audit logs, metric dashboards
Shadow mode and agent fuzz tests

SDK

Default to preview; explicit commit required
Automatic idempotency key generation and propagation
Approval helpers and safety wrappers
Typed models from schemas

17) Opinions That Will Save You Time

If your API can be called by an LLM, it will eventually be called with partial or malformed inputs. Invest in schemas and validation early; it pays for itself the first time the model improvises an extra field.
Dry-run is the single highest ROI feature for agent safety. It aligns human review, policy checks, and agent planning.
Idempotency keys reduce pages, duplicates, and user-visible weirdness. Add them before you ship write endpoints.
Reversal tokens are pragmatic. Even a 30-second window saves you from a class of embarrassing incidents.
Don't outsource safety to the model. Put safety in the protocol and the server. Models comply when the interface is clear.

18) Minimal Reference Snippets

HTTP Problem Details (RFC 7807):

json
{
  "type": "https://api.example.com/problems/rate-limit",
  "title": "Rate limit exceeded",
  "status": 429,
  "detail": "Burst limit of 10 requests per second exceeded for scope calendar.events.create",
  "instance": "urn:request:77d3b1",
  "retry_after": 2
}

Rate limit headers:

http
HTTP/1.1 429 Too Many Requests
Retry-After: 2
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1726500002
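
On the client side, an agent runtime can turn these headers into a wait time. The sketch below handles both forms of Retry-After that HTTP allows (delay in seconds, or an HTTP date) and falls back to capped exponential backoff. The function name, the lower-cased header map, and the 30-second cap are assumptions.

```typescript
// Returns how long to wait before retrying, or null when the
// response is not a retryable rate-limit or availability error.
function retryDelayMs(
  status: number,
  headers: Record<string, string>, // assumed to use lower-cased names
  nowMs: number,
  attempt: number, // 0 for the first retry
): number | null {
  if (status !== 429 && status !== 503) return null;

  const raw = headers["retry-after"];
  if (raw !== undefined) {
    const trimmed = raw.trim();
    // Form 1: a non-negative integer number of seconds.
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
    // Form 2: an HTTP date; wait until that moment.
    const at = Date.parse(trimmed);
    if (!Number.isNaN(at)) return Math.max(0, at - nowMs);
  }
  // No usable header: exponential backoff, capped at 30 seconds.
  return Math.min(30_000, 1000 * 2 ** attempt);
}
```

With the response shown above, retryDelayMs(429, { "retry-after": "2" }, Date.now(), 0) yields a two-second wait.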

Idempotent replay header on response:

http
HTTP/1.1 200 OK
Idempotent-Replayed: true
X-Request-Id: req_4kH2

19) Integration Tips for OpenAI Function Calling

Avoid ambiguous enums like "high/medium/low" without clear semantics; prefer numeric ranges and explicit units.
Use compact property names but include rich descriptions; the model reads descriptions.
State prohibitions explicitly in the tool description: "Do not send emails to more than 5 recipients without prior approval; preview will surface a policy hold." Models tend to follow strong, specific prohibitions, and the preview step catches the cases where they do not.
Return structured conflicts and alternatives; the model can choose among them.
For long-running operations, return a job id and a status polling tool; don't hold the call open.

Example prompt snippet guiding the model:

text
You are an operations-safe assistant. Follow this protocol:
1) Always call schedule_meeting with dry_run=true first.
2) If conflicts exist, propose alternatives or ask for guidance.
3) Only call apply with a preview_id and an Idempotency-Key.
4) If a policy hold is returned, request approval and wait for confirmation.
5) Use undo if a user cancels within the reversal window.

20) Closing Thoughts

Agent-safe APIs are not a speculative bet; they are a synthesis of well-understood distributed systems patterns applied to a new client: autonomous code that explores the space of possible actions. Least-privilege scopes fence in the blast radius. JSON Schema teaches the agent what is allowed. Dry-run and two-phase apply reduce risk while enabling planning. Idempotency and undo make retries safe. Rate limits, approvals, and audit trails match operational reality.

If you implement only three things before shipping to agents, make them idempotency keys, preview/apply, and capability-scoped tokens. Everything else builds on those foundations.