7 Agent Architecture Patterns That Actually Ship (With Code)

6 min read · 6 sources · explainer
├── "Claude Code's source reveals reusable production agent architecture patterns that every team independently converges on"
│  ├── top10.dev editorial (top10.dev) → read below

The editorial argues that studying Claude Code and similar production agent codebases reveals seven recurring structural patterns — mega-prompts as application logic, diff-based editing, thin orchestration loops — that teams keep reinventing independently. These aren't framework features but architecture decisions that break systems when skipped.

│  ├── lintsinghua (GitHub, 1543 pts) → read

Published a 420,000-character deep architectural analysis spanning 15 chapters, from the conversation loop to building your own agent harness. The depth and structure treat Claude Code as a canonical reference architecture worth systematic study, not just curiosity.

│  ├── tvytlx (GitHub, 4782 pts) → read

Created a dedicated deep-dive research report on Claude Code's source, framing it as worthy of serious architectural study. The project's popularity (4,700+ stars) suggests strong community demand for understanding the internal patterns as transferable engineering knowledge.

│  └── lr0.org (Dev Blog, 81 pts) → read

Wrote a technical blog post reading through the leaked source code, analyzing it as a practitioner studying production architecture decisions. The post treats the source as an educational artifact for understanding how commercial agents are actually built.

├── "The primary value is making Claude Code runnable locally — practical access matters more than architectural study"
│  ├── oboard (GitHub, 2559 pts) → read

Explicitly brands their project as 'Runnable ClaudeCode source code,' prioritizing executable access over analysis. The focus is on getting the tool working locally rather than extracting abstract architecture patterns from it.

│  └── NanmiCoder (GitHub, 4215 pts) → read

Positions their repo as a 'locally runnable version' of the leaked source, emphasizing practical usability. With over 4,200 stars, the demand signals that many developers want to run and modify the tool directly rather than just study its patterns.

└── "Raw source preservation and archival is itself valuable, regardless of analysis or runnability"
  └── sanbuphy (GitHub, 11069 pts) → read

Published the raw v2.1.88 source code with over 11,000 stars — by far the most popular repo in the set — with minimal framing beyond version-pinned archival. The massive engagement suggests the community values having access to the unmodified source as a primary artifact, before any analysis layer is applied.

Why these seven

Every team building AI agents in 2026 reinvents the same load-bearing patterns. We've studied multiple production agent codebases — open-source tools, leaked internals, and systems we run ourselves — and the convergence is striking. The same seven structural decisions keep appearing, regardless of whether the team uses TypeScript, Python, or Rust.

These aren't framework features. They're architecture patterns. Here's what they are, why they work, and what breaks when you skip them.

Pattern 1: The mega-prompt as application logic

The single most counterintuitive pattern in production agents: the system prompt encodes the majority of behavioral logic, not the application code.

Tool definitions, output formatting rules, safety constraints, editorial voice, file-handling strategies — all live in the prompt. The orchestration layer is comparatively thin: receive user input, call the model, execute requested tools, loop.

```
// Pseudocode: the core loop is ~30 lines
while (!done) {
  const response = await model.chat(messages, { tools, systemPrompt });
  for (const toolCall of response.toolCalls) {
    const result = await executeTool(toolCall);
    messages.push({ role: 'tool', content: result });
  }
  if (response.stopReason === 'end_turn') done = true;
}
```

The system prompt, meanwhile, runs to thousands of tokens and reads like an RFC. This inverts the traditional instinct to keep prompts short and write elaborate routers. If your agent's prompt is under 500 tokens and your orchestration code is over 500 lines, you've probably put behavior in the wrong place.

The failure mode this prevents: brittle conditional logic that breaks when the model's capabilities change between versions. Declarative prompts adapt; imperative routers don't.

Pattern 2: Diff-based file editing

Naive agents overwrite entire files. Production agents apply surgical edits.

The pattern: instead of generating a complete file, the agent produces a search-and-replace operation — an `old_string` to match and a `new_string` to substitute. The tool validates that `old_string` appears exactly once (preventing ambiguous edits), then applies the replacement.

```
// Tool definition shape
{
  name: 'edit',
  params: {
    file_path: string,
    old_string: string,   // must be unique in file
    new_string: string,
    replace_all?: boolean
  }
}
```

This single constraint — uniqueness validation on the match string — eliminates an entire class of agent errors. The model learns to include enough surrounding context to make matches unambiguous. When it fails, it fails loudly instead of silently corrupting a file.
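The validation itself is only a few lines. A minimal sketch, assuming plain-string matching (the function name and shape are illustrative, not Claude Code's actual implementation):

```javascript
// Apply a diff-based edit with uniqueness validation.
// Counting occurrences via split() avoids regex-escaping the needle.
function applyEdit(content, oldString, newString, replaceAll = false) {
  const count = content.split(oldString).length - 1;
  if (count === 0) {
    throw new Error('old_string not found in file');
  }
  if (count > 1 && !replaceAll) {
    // Ambiguous match: fail loudly instead of guessing which occurrence.
    throw new Error(`old_string matches ${count} times; add more context`);
  }
  return replaceAll
    ? content.split(oldString).join(newString)
    : content.replace(oldString, newString);
}
```

The loud failure on an ambiguous match is what trains the model to include more surrounding context on its next attempt.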

The failure mode this prevents: the agent confidently overwrites a 200-line file, losing the 180 lines it didn't need to touch. Every team that starts with whole-file writes migrates to diffs within a month.

Pattern 3: Context compaction under token pressure

Long-running agent sessions hit the context window ceiling. The naive solution — truncate from the front — destroys critical early context (the user's original request, key file contents). The production pattern: summarize and compact mid-conversation.

When remaining token budget drops below a threshold (typically 20-30% of the window), the agent triggers a compaction pass. An LLM call summarizes the conversation so far, preserving key decisions, file states, and the original task. The compacted summary replaces the full history.

```
if (remainingTokens < window * 0.2) {
  const summary = await model.chat([
    { role: 'system', content: 'Summarize this conversation preserving all key context...' },
    ...messages
  ]);
  messages = [systemPrompt, { role: 'assistant', content: summary }];
}
```

The failure mode this prevents: the agent "forgets" what it was doing 40 messages ago and starts contradicting earlier work, or simply crashes when the context overflows.

Pattern 4: Parallel tool dispatch

Sequential tool execution is the easiest bottleneck to miss. When the model requests multiple independent operations — say, reading three files — a production agent dispatches them concurrently.

```
// Independent calls fan out concurrently; hasDependency flags calls
// that must wait on an earlier call's output.
const results = await Promise.all(
  toolCalls
    .filter(tc => !hasDependency(tc, toolCalls))
    .map(tc => executeTool(tc))
);
// Dependent calls (filtered out above) then execute sequentially.
```

The dependency check matters. File reads can parallelize. But if one tool call creates a file and another reads it, sequential execution is required. Most implementations use a simple heuristic: read-only tools parallelize; write tools serialize.

The failure mode this prevents: an agent that takes 45 seconds to read 10 files when it could take 5. At interactive latencies, this is the difference between usable and abandoned.

Pattern 5: Zero-trust tool permissions

Every tool call is untrusted by default. The permission model typically defines three tiers:

1. Allow-listed tools — read-only operations that auto-execute (file reads, searches)
2. Prompt-required tools — destructive operations needing user confirmation (file writes, shell commands)
3. Forbidden tools — operations the agent can never perform in the current context

```
const PERMISSION_TIERS = {
  allow: ['read', 'grep', 'glob', 'web_search'],
  prompt: ['edit', 'write', 'bash'],
  deny: ['bash:rm -rf', 'bash:git push --force']
};
```

The critical nuance: permissions are checked on the resolved arguments, not just the tool name. A `bash` tool might be prompt-tier in general, but `bash("rm -rf /")` should be in the deny tier regardless of what the user approved. Pattern-matching on arguments catches the long tail of dangerous operations.
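A sketch of that argument-aware check, building a `tool:arguments` key so deny rules can match on what the tool will actually do (the regex patterns and key format are illustrative, not a real implementation):

```javascript
// Resolve a permission tier from the tool name AND its arguments.
// Deny patterns match on the resolved command, not just the tool name.
const TIERS = {
  allow: ['read', 'grep', 'glob', 'web_search'],
  prompt: ['edit', 'write', 'bash'],
  deny: [/^bash:rm\s+-rf/, /^bash:git\s+push\s+--force/]
};

function resolvePermission(toolName, args) {
  const key = `${toolName}:${args.command ?? ''}`;
  if (TIERS.deny.some(pattern => pattern.test(key))) return 'deny';
  if (TIERS.allow.includes(toolName)) return 'allow';
  if (TIERS.prompt.includes(toolName)) return 'prompt';
  return 'deny'; // unknown tools are denied by default (zero trust)
}
```

Note the order: deny patterns are checked first, so a dangerous `bash` invocation is blocked even though `bash` itself sits in the prompt tier.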

The failure mode this prevents: the agent autonomously runs `git push --force` to main because the model thought it was helpful. This is not hypothetical.

Pattern 6: Retry with circuit breakers

External API calls fail. The production pattern layers three mechanisms:

1. Immediate retry with exponential backoff (2-3 attempts)
2. Circuit breaker that trips after N consecutive failures, preventing a flood of doomed requests
3. Graceful degradation to an alternative provider
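The first mechanism is a small wrapper around the call itself. A sketch, with illustrative attempt counts and delays:

```javascript
// Retry with exponential backoff: 500ms, 1s, 2s, ... between attempts.
// Parameters are illustrative defaults.
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries: surface the error
      const delay = baseDelayMs * 2 ** i;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```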

```
class CircuitBreaker {
  constructor(threshold = 3, cooldownMs = 300000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.lastFailure = 0;
    this.state = 'CLOSED';
  }

  reset() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailure > this.cooldownMs) {
        this.state = 'HALF_OPEN'; // allow one test request
      } else {
        throw new Error('Circuit open');
      }
    }
    try {
      const result = await fn();
      this.reset();
      return result;
    } catch (err) {
      this.failures++;
      this.lastFailure = Date.now();
      if (this.failures >= this.threshold) this.state = 'OPEN';
      throw err;
    }
  }
}
```

The recent trending discourse around agents burning $50K in API costs underscores why this matters. An agent without circuit breakers is a runaway billing event waiting to happen. The breaker pattern caps your blast radius.

The failure mode this prevents: a rate-limited API returning 429s triggers 10,000 retries, each adding latency and cost, while the user stares at a spinner.

Pattern 7: Provider fallback chains

No single AI provider has 100% uptime. Production agents define an ordered chain of providers and cascade through them:

```
// Each entry exposes a uniform call(messages) interface so the
// fallback loop doesn't branch on provider identity.
const PROVIDERS = [
  { name: 'anthropic_api', call: (msgs) => anthropicClient.chat(msgs) },
  { name: 'cli_subprocess', call: (msgs) => cliFallback(msgs) },
  { name: 'openai_compat', call: (msgs) => openaiClient.chat(msgs) }
];

async function callModel(messages) {
  for (const provider of PROVIDERS) {
    try {
      return await provider.call(messages);
    } catch (err) {
      log.warn(`${provider.name} failed, trying next`, err);
    }
  }
  throw new Error('All providers exhausted');
}
```

The subtlety: different providers have different capabilities, token limits, and output formats. The fallback chain needs normalization at the boundary so downstream code doesn't branch on provider identity.
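That normalization layer can be sketched as a per-provider adapter that maps raw responses onto one internal shape (the field names roughly follow the Anthropic and OpenAI response schemas, but treat them as illustrative):

```javascript
// Normalize provider responses at the boundary so downstream code
// sees one shape: { text, stopReason }. Field mappings are illustrative.
function normalizeResponse(providerName, raw) {
  switch (providerName) {
    case 'anthropic_api':
      return {
        text: raw.content?.[0]?.text ?? '',
        stopReason: raw.stop_reason
      };
    case 'openai_compat':
      return {
        text: raw.choices?.[0]?.message?.content ?? '',
        stopReason: raw.choices?.[0]?.finish_reason
      };
    default:
      // e.g. a CLI subprocess that returns plain text
      return { text: String(raw), stopReason: 'end_turn' };
  }
}
```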

The failure mode this prevents: your agent goes down because one provider has an outage, even though three alternatives exist.

The meta-pattern

Zoom out and these seven patterns share a common insight: production agents are 20% intelligence and 80% error handling. The model call is one line. The retry logic, permission checks, context management, and graceful degradation are everything else.

If you're building an agent and the happy path works but failures cascade unpredictably, you're missing at least three of these patterns. Start with permissions (Pattern 5) and circuit breakers (Pattern 6) — those are the ones that prevent the headlines about agents going rogue or burning budgets.

The orchestration loop is boring. That's the point. Boring infrastructure is reliable infrastructure.

GitHub 11321 pts 19652 comments

sanbuphy/claude-code-source-code: Claude Code v2.1.88 Source Code

→ read on GitHub
GitHub 5123 pts 1556 comments

tvytlx/claude-code-deep-dive: Claude Code source code deep-dive research report

→ read on GitHub
GitHub 5023 pts 5575 comments

NanmiCoder/claude-code-haha: Claude Code leaked source - locally runnable version

→ read on GitHub
GitHub 2756 pts 3548 comments

oboard/claude-code-rev: Runnable ClaudeCode source code

→ read on GitHub
GitHub 2079 pts 524 comments

lintsinghua/claude-code-book: A 420,000-character teardown of the skeleton and nerves of an AI agent harness: a deep analysis of Claude Code's architecture, in 15 chapters from the conversation loop to building your own Agent Harness. Read online at:

→ read on GitHub
Devblogs 81 pts 44 comments

Reading leaked Claude Code source code

→ read on Devblogs
