7 Agent Architecture Patterns That Actually Ship (With Code)

6 min read · 6 sources · explainer
├── "Claude Code's source reveals reusable production agent architecture patterns that every team independently converges on"
│  ├── top10.dev editorial (top10.dev) → read below

The editorial argues that studying Claude Code and similar production agent codebases reveals seven recurring structural patterns — mega-prompts as application logic, diff-based editing, thin orchestration loops — that teams keep reinventing independently. These aren't framework features but architecture decisions that break systems when skipped.

│  ├── lintsinghua (GitHub, 1543 pts) → read

Published a 420,000-character deep architectural analysis spanning 15 chapters, from the conversation loop to building your own agent harness. The depth and structure treat Claude Code as a canonical reference architecture worth systematic study, not just curiosity.

│  ├── tvytlx (GitHub, 4782 pts) → read

Created a dedicated deep-dive research report on Claude Code's source, framing it as worthy of serious architectural study. The project's popularity (4,700+ stars) suggests strong community demand for understanding the internal patterns as transferable engineering knowledge.

│  └── lr0.org (Dev Blog, 81 pts) → read

Wrote a technical blog post reading through the leaked source code, analyzing it as a practitioner studying production architecture decisions. The post treats the source as an educational artifact for understanding how commercial agents are actually built.

├── "The primary value is making Claude Code runnable locally — practical access matters more than architectural study"
│  ├── oboard (GitHub, 2559 pts) → read

Explicitly brands their project as 'Runnable ClaudeCode source code,' prioritizing executable access over analysis. The focus is on getting the tool working locally rather than extracting abstract architecture patterns from it.

│  └── NanmiCoder (GitHub, 4215 pts) → read

Positions their repo as a 'locally runnable version' of the leaked source, emphasizing practical usability. With over 4,200 stars, the demand signals that many developers want to run and modify the tool directly rather than just study its patterns.

└── "Raw source preservation and archival is itself valuable, regardless of analysis or runnability"
  └── sanbuphy (GitHub, 11069 pts) → read

Published the raw v2.1.88 source code with over 11,000 stars — by far the most popular repo in the set — with minimal framing beyond version-pinned archival. The massive engagement suggests the community values having access to the unmodified source as a primary artifact, before any analysis layer is applied.

Why these seven

Every team building AI agents in 2026 reinvents the same load-bearing patterns. We've studied multiple production agent codebases — open-source tools, leaked internals, and systems we run ourselves — and the convergence is striking. The same seven structural decisions keep appearing, regardless of whether the team uses TypeScript, Python, or Rust.

These aren't framework features. They're architecture patterns. Here's what they are, why they work, and what breaks when you skip them.

Pattern 1: The mega-prompt as application logic

The single most counterintuitive pattern in production agents: the system prompt encodes the majority of behavioral logic, not the application code.

Tool definitions, output formatting rules, safety constraints, editorial voice, file-handling strategies — all live in the prompt. The orchestration layer is comparatively thin: receive user input, call the model, execute requested tools, loop.

```
// Pseudocode: the core loop is ~30 lines
while (!done) {
  const response = await model.chat(messages, { tools, systemPrompt });
  for (const toolCall of response.toolCalls) {
    const result = await executeTool(toolCall);
    messages.push({ role: 'tool', content: result });
  }
  if (response.stopReason === 'end_turn') done = true;
}
```

The system prompt, meanwhile, runs to thousands of tokens and reads like an RFC. This inverts the traditional instinct to keep prompts short and write elaborate routers. If your agent's prompt is under 500 tokens and your orchestration code is over 500 lines, you've probably put behavior in the wrong place.

The failure mode this prevents: brittle conditional logic that breaks when the model's capabilities change between versions. Declarative prompts adapt; imperative routers don't.

Pattern 2: Diff-based file editing

Naive agents overwrite entire files. Production agents apply surgical edits.

The pattern: instead of generating a complete file, the agent produces a search-and-replace operation — an `old_string` to match and a `new_string` to substitute. The tool validates that `old_string` appears exactly once (preventing ambiguous edits), then applies the replacement.

```
// Tool definition shape
{
  name: 'edit',
  params: {
    file_path: string,
    old_string: string,   // must be unique in file
    new_string: string,
    replace_all?: boolean
  }
}
```

This single constraint — uniqueness validation on the match string — eliminates an entire class of agent errors. The model learns to include enough surrounding context to make matches unambiguous. When it fails, it fails loudly instead of silently corrupting a file.
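The validation itself is only a few lines. A minimal sketch, assuming plain-string matching (the function name and shape are illustrative, not Claude Code's actual implementation):

```javascript
// Apply a diff-based edit with uniqueness validation.
// Counting occurrences via split() avoids regex-escaping the needle.
function applyEdit(content, oldString, newString, replaceAll = false) {
  const count = content.split(oldString).length - 1;
  if (count === 0) {
    throw new Error('old_string not found in file');
  }
  if (count > 1 && !replaceAll) {
    // Ambiguous match: fail loudly instead of guessing which occurrence.
    throw new Error(`old_string matches ${count} times; add more context`);
  }
  return replaceAll
    ? content.split(oldString).join(newString)
    : content.replace(oldString, newString);
}
```

The loud failure on an ambiguous match is what trains the model to include more surrounding context on its next attempt.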

The failure mode this prevents: the agent confidently overwrites a 200-line file, losing the 180 lines it didn't need to touch. Every team that starts with whole-file writes migrates to diffs within a month.

Pattern 3: Context compaction under token pressure

Long-running agent sessions hit the context window ceiling. The naive solution — truncate from the front — destroys critical early context (the user's original request, key file contents). The production pattern: summarize and compact mid-conversation.

When remaining token budget drops below a threshold (typically 20-30% of the window), the agent triggers a compaction pass. An LLM call summarizes the conversation so far, preserving key decisions, file states, and the original task. The compacted summary replaces the full history.

```
if (remainingTokens < window * 0.2) {
  const summary = await model.chat([
    { role: 'system', content: 'Summarize this conversation preserving all key context...' },
    ...messages
  ]);
  messages = [systemPrompt, { role: 'assistant', content: summary }];
}
```

The failure mode this prevents: the agent "forgets" what it was doing 40 messages ago and starts contradicting earlier work, or simply crashes when the context overflows.

Pattern 4: Parallel tool dispatch

Sequential tool execution is the easiest bottleneck to miss. When the model requests multiple independent operations — say, reading three files — a production agent dispatches them concurrently.

```
// Independent calls fan out concurrently; hasDependency flags calls
// that must wait on an earlier call's output.
const results = await Promise.all(
  toolCalls
    .filter(tc => !hasDependency(tc, toolCalls))
    .map(tc => executeTool(tc))
);
// Dependent calls (filtered out above) then execute sequentially.
```

The dependency check matters. File reads can parallelize. But if one tool call creates a file and another reads it, sequential execution is required. Most implementations use a simple heuristic: read-only tools parallelize; write tools serialize.

The failure mode this prevents: an agent that takes 45 seconds to read 10 files when it could take 5. At interactive latencies, this is the difference between usable and abandoned.

Pattern 5: Zero-trust tool permissions

Every tool call is untrusted by default. The permission model typically defines three tiers:

1. Allow-listed tools — read-only operations that auto-execute (file reads, searches)
2. Prompt-required tools — destructive operations needing user confirmation (file writes, shell commands)
3. Forbidden tools — operations the agent can never perform in the current context

```
const PERMISSION_TIERS = {
  allow: ['read', 'grep', 'glob', 'web_search'],
  prompt: ['edit', 'write', 'bash'],
  deny: ['bash:rm -rf', 'bash:git push --force']
};
```

The critical nuance: permissions are checked on the resolved arguments, not just the tool name. A `bash` tool might be prompt-tier in general, but `bash("rm -rf /")` should be in the deny tier regardless of what the user approved. Pattern-matching on arguments catches the long tail of dangerous operations.
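A sketch of that argument-aware check, building a `tool:arguments` key so deny rules can match on what the tool will actually do (the regex patterns and key format are illustrative, not a real implementation):

```javascript
// Resolve a permission tier from the tool name AND its arguments.
// Deny patterns match on the resolved command, not just the tool name.
const TIERS = {
  allow: ['read', 'grep', 'glob', 'web_search'],
  prompt: ['edit', 'write', 'bash'],
  deny: [/^bash:rm\s+-rf/, /^bash:git\s+push\s+--force/]
};

function resolvePermission(toolName, args) {
  const key = `${toolName}:${args.command ?? ''}`;
  if (TIERS.deny.some(pattern => pattern.test(key))) return 'deny';
  if (TIERS.allow.includes(toolName)) return 'allow';
  if (TIERS.prompt.includes(toolName)) return 'prompt';
  return 'deny'; // unknown tools are denied by default (zero trust)
}
```

Note the order: deny patterns are checked first, so a dangerous `bash` invocation is blocked even though `bash` itself sits in the prompt tier.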

The failure mode this prevents: the agent autonomously runs `git push --force` to main because the model thought it was helpful. This is not hypothetical.

Pattern 6: Retry with circuit breakers

External API calls fail. The production pattern layers three mechanisms:

1. Immediate retry with exponential backoff (2-3 attempts)
2. Circuit breaker that trips after N consecutive failures, preventing a flood of doomed requests
3. Graceful degradation to an alternative provider
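The first mechanism is a small wrapper around the call itself. A sketch, with illustrative attempt counts and delays:

```javascript
// Retry with exponential backoff: 500ms, 1s, 2s, ... between attempts.
// Parameters are illustrative defaults.
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries: surface the error
      const delay = baseDelayMs * 2 ** i;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```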

```
class CircuitBreaker {
  constructor(threshold = 3, cooldownMs = 300000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.lastFailure = 0;
    this.state = 'CLOSED';
  }

  reset() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailure > this.cooldownMs) {
        this.state = 'HALF_OPEN'; // allow one test request
      } else {
        throw new Error('Circuit open');
      }
    }
    try {
      const result = await fn();
      this.reset();
      return result;
    } catch (err) {
      this.failures++;
      this.lastFailure = Date.now();
      if (this.failures >= this.threshold) this.state = 'OPEN';
      throw err;
    }
  }
}
```

The recent trending discourse around agents burning $50K in API costs underscores why this matters. An agent without circuit breakers is a runaway billing event waiting to happen. The breaker pattern caps your blast radius.

The failure mode this prevents: a rate-limited API returning 429s triggers 10,000 retries, each adding latency and cost, while the user stares at a spinner.

Pattern 7: Provider fallback chains

No single AI provider has 100% uptime. Production agents define an ordered chain of providers and cascade through them:

```
// Each entry exposes a uniform call(messages) interface so the
// fallback loop doesn't branch on provider identity.
const PROVIDERS = [
  { name: 'anthropic_api', call: (msgs) => anthropicClient.chat(msgs) },
  { name: 'cli_subprocess', call: (msgs) => cliFallback(msgs) },
  { name: 'openai_compat', call: (msgs) => openaiClient.chat(msgs) }
];

async function callModel(messages) {
  for (const provider of PROVIDERS) {
    try {
      return await provider.call(messages);
    } catch (err) {
      log.warn(`${provider.name} failed, trying next`, err);
    }
  }
  throw new Error('All providers exhausted');
}
```

The subtlety: different providers have different capabilities, token limits, and output formats. The fallback chain needs normalization at the boundary so downstream code doesn't branch on provider identity.
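That normalization layer can be sketched as a per-provider adapter that maps raw responses onto one internal shape (the field names roughly follow the Anthropic and OpenAI response schemas, but treat them as illustrative):

```javascript
// Normalize provider responses at the boundary so downstream code
// sees one shape: { text, stopReason }. Field mappings are illustrative.
function normalizeResponse(providerName, raw) {
  switch (providerName) {
    case 'anthropic_api':
      return {
        text: raw.content?.[0]?.text ?? '',
        stopReason: raw.stop_reason
      };
    case 'openai_compat':
      return {
        text: raw.choices?.[0]?.message?.content ?? '',
        stopReason: raw.choices?.[0]?.finish_reason
      };
    default:
      // e.g. a CLI subprocess that returns plain text
      return { text: String(raw), stopReason: 'end_turn' };
  }
}
```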

The failure mode this prevents: your agent goes down because one provider has an outage, even though three alternatives exist.

The meta-pattern

Zoom out and these seven patterns share a common insight: production agents are 20% intelligence and 80% error handling. The model call is one line. The retry logic, permission checks, context management, and graceful degradation are everything else.

If you're building an agent and the happy path works but failures cascade unpredictably, you're missing at least three of these patterns. Start with permissions (Pattern 5) and circuit breakers (Pattern 6) — those are the ones that prevent the headlines about agents going rogue or burning budgets.

The orchestration loop is boring. That's the point. Boring infrastructure is reliable infrastructure.

GitHub 11321 pts 19652 comments

sanbuphy/claude-code-source-code: Claude Code v2.1.88 Source Code

→ read on GitHub
GitHub 5123 pts 1556 comments

tvytlx/claude-code-deep-dive: Claude Code source code deep-dive research report

→ read on GitHub
GitHub 5023 pts 5575 comments

NanmiCoder/claude-code-haha: Claude Code leaked source - locally runnable version

→ read on GitHub
GitHub 2756 pts 3548 comments

oboard/claude-code-rev: Runnable ClaudeCode source code

→ read on GitHub
GitHub 2079 pts 524 comments

lintsinghua/claude-code-book: A 420,000-character teardown of the skeleton and nerves of an AI agent harness: a deep analysis of Claude Code's architecture, in 15 chapters from the conversation loop to building your own Agent Harness. Read online at:

→ read on GitHub
Devblogs 81 pts 44 comments

Reading leaked Claude Code source code

→ read on Devblogs
