Your AI Agent Doesn't Need a Better Prompt. It Needs an If Statement.

4 min read 1 source clear_take
├── "Agent frameworks should use traditional programming control flow instead of prompt chains for orchestration"
│  └── Brian Suh (bearblog) → read

Suh argues that the agent frameworks succeeding in production use ordinary programming constructs — if statements, for loops, try/catch blocks, and state machines — to orchestrate small, focused LLM calls. He contends that encoding orchestration logic in natural language prompts is fundamentally backwards, and that LLMs should be reserved for what they're good at: classification, generation, and summarization.

├── "Production experience proves that minimizing LLM responsibilities improves agent reliability"
│  └── top10.dev editorial (top10.dev) → read below

The editorial synthesizes the pattern emerging from teams running agents in production over the past 18 months: the less you ask the LLM to do, the more reliably the system works. It contrasts a single mega-prompt approach to handling customer support tickets with decomposed, deterministic control flow, arguing the latter is far easier to debug and maintain.

└── "Dominant agent frameworks like LangChain and CrewAI are over-indexed on prompt-based abstractions"
  └── Brian Suh (bearblog) → read

Suh directly challenges the design philosophy of frameworks like LangChain, CrewAI, and AutoGen, which lean heavily on prompt chains where one LLM call feeds another with orchestration logic encoded in natural language. He argues this pattern creates systems that are difficult to debug and unreliable in production, representing a wrong turn in the agent ecosystem.

What happened

Brian Suh published a post titled "Agents need control flow, not more prompts" that struck a nerve with the developer community, racking up over 500 points on Hacker News. The core argument is deceptively simple: the agent frameworks winning in production aren't the ones with the most sophisticated prompting — they're the ones that use ordinary programming constructs to orchestrate small, focused LLM calls.

The post arrives at a moment when the AI agent ecosystem is bloated with frameworks that treat prompts as the primary abstraction layer. LangChain, CrewAI, AutoGen — the dominant patterns all lean heavily on prompt chains, where one LLM call's output feeds into another LLM call's input, with the orchestration logic itself encoded in natural language instructions. Suh's argument is that this is fundamentally backwards.

Instead, the post advocates for what amounts to a return to basics: use `if` statements, `for` loops, `try/catch` blocks, and state machines to handle the control flow of your agent. Use LLM calls sparingly, for the things LLMs are actually good at — classification, generation, summarization — and let deterministic code handle everything else.
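A rough sketch of that shape, in TypeScript: an explicit state machine owns the control flow, and the model is invoked only for the one step that is genuinely a language task. The `callLLM` helper and the states here are placeholders for illustration, not anything from the post.

```typescript
// Sketch only: deterministic code drives the state machine; the LLM is called
// once, for the step that actually needs language understanding.
type State = "FETCH" | "SUMMARIZE" | "DONE" | "FAILED";

// Placeholder for whatever LLM client you actually use.
declare function callLLM(prompt: string): Promise<string>;

async function summarizePage(url: string): Promise<string | null> {
  let state: State = "FETCH";
  let page = "";
  let summary: string | null = null;

  while (state !== "DONE" && state !== "FAILED") {
    switch (state) {
      case "FETCH":
        try {
          page = await (await fetch(url)).text(); // deterministic I/O
          state = "SUMMARIZE";
        } catch {
          state = "FAILED";                       // ordinary error handling, no prompt involved
        }
        break;
      case "SUMMARIZE":
        // The only LLM call: a narrow language task, not orchestration.
        summary = await callLLM(`Summarize in three bullet points:\n${page}`);
        state = "DONE";
        break;
    }
  }
  return summary;
}
```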

Why it matters

The timing of this post matters as much as the content. We're roughly 18 months into the "agent era" — long enough for early adopters to have built, deployed, and (crucially) maintained agent systems in production. The pattern that keeps emerging from teams with real production agents is the same: the less you ask the LLM to do, the more reliably the system works.

Consider the difference between two approaches to an agent that processes customer support tickets:

Prompt-heavy approach: A single mega-prompt instructs the LLM to read the ticket, classify it, decide if it needs escalation, draft a response, check the response for policy compliance, and format the output. The prompt is 2,000 tokens of instructions. When it fails — and it will — you're debugging natural language instructions, guessing which part of your 2,000-token prompt the model misinterpreted.

Control flow approach: Your code calls the LLM once to classify the ticket (5 categories, structured output). A `switch` statement routes to the appropriate handler. Another focused LLM call drafts a response given the classification and relevant templates. A deterministic function checks policy rules. Each piece is testable, loggable, and debuggable independently.
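A sketch of what that decomposition might look like, assuming a generic `callLLM` helper plus made-up category names, templates, and policy rules (none of this is from the original post):

```typescript
// Sketch only: callLLM, the categories, templates, and policy rules are placeholders.
declare function callLLM(prompt: string): Promise<string>;

const CATEGORIES = ["billing", "bug", "feature", "account", "other"] as const;
type Category = (typeof CATEGORIES)[number];

// Focused LLM call #1: classification with a validated, enum-like output.
async function classify(ticket: string): Promise<Category> {
  const label = (await callLLM(
    `Classify this ticket as one of: ${CATEGORIES.join(", ")}. Reply with the label only.\n\n${ticket}`
  )).trim();
  // Validate in code; fall back deterministically instead of re-prompting.
  return (CATEGORIES as readonly string[]).includes(label) ? (label as Category) : "other";
}

// Focused LLM call #2: drafting, given a template chosen by plain code.
async function draft(ticket: string, template: string): Promise<string> {
  const reply = await callLLM(`Draft a reply using this template:\n${template}\n\nTicket:\n${ticket}`);
  if (!passesPolicy(reply)) throw new Error("policy check failed"); // deterministic guard
  return reply;
}

async function handleTicket(ticket: string): Promise<string> {
  switch (await classify(ticket)) { // routing is an ordinary switch
    case "billing": return draft(ticket, BILLING_TEMPLATE);
    case "bug":     return draft(ticket, BUG_TEMPLATE);
    default:        return draft(ticket, GENERIC_TEMPLATE);
  }
}

// Ordinary, unit-testable code: no LLM in the enforcement path.
const passesPolicy = (reply: string) => reply.length < 2000 && !/guaranteed refund/i.test(reply);

const BILLING_TEMPLATE = "...", BUG_TEMPLATE = "...", GENERIC_TEMPLATE = "...";
```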

The second approach uses the same LLM, possibly the same total tokens, but it's fundamentally more maintainable. When something breaks, you get a stack trace pointing to a specific function, not a vague sense that "the prompt isn't working right."

This resonates because it maps to a lesson the software industry has learned repeatedly: abstractions that hide control flow create systems that are easy to build and hellish to debug. Enterprise service buses, complex ORM query builders, YAML-driven CI pipelines that outgrew their declarative model — the pattern is always the same. The abstraction works until it doesn't, and then you're fighting both the problem and the framework.

The Hacker News discussion reflects this hard-won experience. Practitioners at companies building agent systems report reaching the same conclusion from different starting points. Teams that started with heavy prompt chains are migrating to thinner LLM calls wrapped in normal code. Teams that started with code-first approaches never hit the same walls.

What this means for your stack

If you're building or evaluating an agent framework, the practical implications are clear:

Audit your prompt-to-code ratio. Look at your agent's architecture and count: how many decisions are being made by prompt instructions vs. by your code? Every decision that could be a deterministic code branch but is instead encoded in a prompt is a future debugging session you're prepaying for with your sanity. Classification outputs should be enums. Routing should be conditionals. Retry logic should be loops with backoff. The LLM should handle language tasks — understanding intent, generating text, extracting structured data — not workflow orchestration.
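Retry logic as an explicit loop with backoff, for example, might look like this sketch (again with a placeholder `callLLM`):

```typescript
// Retry as an explicit loop with exponential backoff: the policy lives in code,
// not in a prompt asking the model to "try again if unsure".
declare function callLLM(prompt: string): Promise<string>;

async function callWithRetry(prompt: string, maxAttempts = 3): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await callLLM(prompt);
    } catch (err) {
      if (attempt === maxAttempts) throw err;   // give up deterministically
      const delayMs = 500 * 2 ** (attempt - 1); // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("unreachable");
}
```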

Prefer small, typed LLM calls. Instead of asking the LLM to return a complex JSON object with nested decisions, make multiple calls that each return a simple, validated type. A classification call returns one of N labels. A generation call returns text. An extraction call returns a structured object against a schema. This makes each call independently testable and its failure mode predictable.
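One way to enforce that boundary, sketched here with the zod validation library and a hypothetical extraction schema:

```typescript
import { z } from "zod";

declare function callLLM(prompt: string): Promise<string>;

// Hypothetical extraction schema: the point is that the shape is validated in
// code, so a malformed model response fails loudly at this call site.
const OrderIssue = z.object({
  orderId: z.string(),
  issueType: z.enum(["damaged", "missing", "wrong_item"]),
  refundRequested: z.boolean(),
});
type OrderIssue = z.infer<typeof OrderIssue>;

async function extractOrderIssue(ticket: string): Promise<OrderIssue> {
  const raw = await callLLM(
    `Extract the order issue from this ticket as JSON with keys orderId, issueType, refundRequested:\n${ticket}`
  );
  // Throws with a precise error if the JSON is malformed or the schema doesn't match.
  return OrderIssue.parse(JSON.parse(raw));
}
```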

Treat frameworks with suspicion proportional to their abstraction level. The agent frameworks gaining traction with production teams — like Anthropic's own patterns in the Claude agent SDK, or the function-calling patterns in the OpenAI API — tend to be relatively thin orchestration layers. They give you structured tool use and message management, but leave control flow to your programming language. The frameworks that try to replace your `if` statements with prompt-based "reasoning" are the ones teams abandon after the prototype phase.
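That division of labor looks roughly like the sketch below: the layer underneath (whichever SDK it is) hands you structured tool-call requests, and your own code owns the dispatch. The `ModelTurn` shape and the `getModelTurn` helper are assumptions for illustration, not any particular SDK's actual API.

```typescript
// Generic shape of a thin orchestration layer: the model requests tools in a
// structured form; ordinary code decides what actually runs.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };

declare function getModelTurn(messages: string[]): Promise<ModelTurn>;
declare function searchDocs(query: string): Promise<string>;
declare function createTicket(summary: string): Promise<string>;

async function runTurn(messages: string[]): Promise<string> {
  const turn = await getModelTurn(messages);

  for (const call of turn.toolCalls) {
    switch (call.name) { // dispatch is your code, not a prompt
      case "search_docs":
        messages.push(await searchDocs(String(call.args.query)));
        break;
      case "create_ticket":
        messages.push(await createTicket(String(call.args.summary)));
        break;
      default:
        messages.push(`Unknown tool: ${call.name}`); // handled deterministically
    }
  }
  return turn.text;
}
```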

Invest in observability, not prompt tuning. When your control flow is in code, standard observability tools work: structured logging, distributed tracing, error tracking. You can see exactly which LLM call returned an unexpected result and what the downstream impact was. This is worth more than any prompt optimization.
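A minimal version of that is a named, logged wrapper around each LLM call site, something like this sketch:

```typescript
// Structured logging around every LLM call: standard observability tooling
// applies because the orchestration is ordinary code.
declare function callLLM(prompt: string): Promise<string>;

async function loggedLLMCall(callName: string, prompt: string): Promise<string> {
  const startedAt = Date.now();
  try {
    const output = await callLLM(prompt);
    console.log(JSON.stringify({ callName, ok: true, ms: Date.now() - startedAt, outputLength: output.length }));
    return output;
  } catch (err) {
    console.error(JSON.stringify({ callName, ok: false, ms: Date.now() - startedAt, error: String(err) }));
    throw err;
  }
}

// Usage: each call site gets a name, so logs and error trackers point at it directly.
// const label = await loggedLLMCall("classify_ticket", prompt);
```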

Looking ahead

The "control flow, not prompts" insight feels obvious in retrospect, which is usually a sign it's correct. As agent architectures mature, expect the successful patterns to look less like novel AI frameworks and more like well-structured applications that happen to call LLMs. The teams shipping reliable agents in 2026 aren't the ones with the cleverest prompts — they're the ones writing the most boring, readable, debuggable code around their LLM calls. The LLM is the engine, but your code is the chassis, the steering, and the brakes. Don't let the engine drive itself.

Hacker News 564 pts 281 comments

Agents need control flow, not more prompts

→ read on Hacker News
