Suh argues that most AI agent failures stem from the absence of real programming constructs — conditionals, loops, error handling, retry logic — between LLM calls. He contends that the industry's prompt-centric frameworks are using a probabilistic system to perform deterministic tasks, which is fundamentally wrong-headed.
The editorial endorses Suh's thesis, noting that using an LLM as a control plane makes every branch point probabilistic. It highlights that tasks like checking file existence, iterating over lists, and retrying failed API calls require deterministic control flow, not intelligence.
Suh directly targets frameworks like LangChain, CrewAI, and AutoGen for building elaborate abstractions that use the LLM itself as the orchestrator deciding which tool to call, when to loop, and when to stop. He characterizes this architectural pattern as 'engineering malpractice dressed up as innovation.'
The editorial extends Suh's argument by emphasizing the debugging nightmare that LLM-based orchestration creates. An if-statement either runs or it doesn't, but an LLM deciding whether to take an action is 'like asking a very smart person who sometimes mishears the question' — the failure modes are invisible and non-reproducible, making reliability engineering nearly impossible.
The HN response was described as 'overwhelmingly affirmative,' with practitioners sharing war stories of agent systems that became reliable only after ripping out prompt-based orchestration and replacing it with plain code. The 193-point score and 96 comments suggest strong community validation of the thesis from hands-on builders.
A blog post by developer Brian Suh — "Agents need control flow, not more prompts" — hit 193 points on Hacker News this week, crystallizing a frustration that's been simmering in the agent-building community for months. The core argument is deceptively simple: most AI agent failures aren't caused by bad prompts. They're caused by the absence of real programming constructs — conditionals, loops, error handling, retry logic — between LLM calls.
The post arrives at a moment when the agent ecosystem is drowning in prompt-centric frameworks. LangChain, CrewAI, AutoGen, and dozens of others have built elaborate abstractions for chaining LLM calls together, often using the LLM itself as the orchestrator that decides which tool to call next, when to loop, and when to stop. The thesis is blunt: this is engineering malpractice dressed up as innovation.
The Hacker News response was overwhelmingly affirmative, with practitioners sharing war stories of agent systems that became reliable only after they ripped out prompt-based orchestration and replaced it with plain code.
The agent gold rush has produced a peculiar architectural pattern: using a probabilistic system (an LLM) to perform deterministic tasks (routing, branching, iteration). When your agent needs to check if a file exists before editing it, that's an if-statement. When it needs to process every item in a list, that's a for-loop. When an API call fails, that's a try-catch with retry logic. None of these require intelligence. They require control flow.
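To make that concrete, here's a minimal sketch in Python; `llm_rewrite` is a hypothetical stand-in for whatever single model call the task actually needs, and everything around it is ordinary control flow:

```python
import time
from pathlib import Path

import requests  # any HTTP client would do


def llm_rewrite(text: str) -> str:
    """Placeholder for the one genuinely LLM-shaped step (e.g. a call to a model API)."""
    raise NotImplementedError


def edit_files(paths: list[str]) -> None:
    for path in paths:                              # iteration: a for-loop, not a prompt
        p = Path(path)
        if not p.exists():                          # existence check: an if-statement
            print(f"skipping missing file: {path}")
            continue
        p.write_text(llm_rewrite(p.read_text()))    # the only place the model is involved


def post_with_retry(url: str, payload: dict, attempts: int = 3) -> dict:
    for attempt in range(attempts):                 # retry: try/except plus backoff, not intelligence
        try:
            resp = requests.post(url, json=payload, timeout=10)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            time.sleep(2 ** attempt)                # exponential backoff
    raise RuntimeError(f"{url} failed after {attempts} attempts")
```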
The problem with using the LLM as your control plane is that you've made every branch point in your program probabilistic. An if-statement either runs or it doesn't — it's a coin with two sides, both visible. An LLM deciding whether to take an action is more like asking a very smart person who sometimes mishears the question. The failure modes are invisible and non-deterministic, which makes debugging a nightmare and reliability a prayer.
Consider the typical agent loop: the LLM receives a task, generates a plan, executes step one, observes the result, decides what to do next, and repeats until it declares itself done. Every one of those "decides" is an untyped, uncheckable branch point. There's no compiler catching your off-by-one. There's no type system ensuring your agent handles the error case. There's no stack trace when it silently takes the wrong branch. You've replaced structured programming with vibes.
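A rough sketch of that loop makes the problem visible; the `llm` and `tools` arguments are hypothetical, and every commented line is a branch that nothing checks:

```python
def run_agent(task: str, llm, tools: dict) -> str:
    """The 'LLM as orchestrator' pattern: every branch is free-form text."""
    history = [f"Task: {task}"]
    while True:
        decision = llm("\n".join(history) + "\nWhat next? Reply 'DONE' or '<tool> <args>'.")
        if decision.strip() == "DONE":                 # did it actually finish, or just say so?
            return "\n".join(history)
        tool_name, _, args = decision.partition(" ")
        if tool_name not in tools:                     # hallucinated tool name: no compiler caught this
            history.append(f"Unknown tool {tool_name!r}")
            continue
        result = tools[tool_name](args)
        history.append(f"{tool_name}({args!r}) -> {result}")  # and nothing bounds this loop
```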
This isn't a theoretical concern. Teams building production agent systems — customer support bots, code generation pipelines, data processing workflows — consistently report the same pattern: the system works 80% of the time on demos, then fails in bizarre, unreproducible ways in production. The fix is almost always the same: take the decision-making out of the LLM and put it in code. Let the LLM do what it's good at (understanding language, generating text, reasoning about ambiguous situations) and let your programming language do what *it's* good at (everything else).
The emerging consensus among practitioners who've shipped real agent systems looks something like this:
LLM as function, not as program. The LLM gets called with a specific, bounded task: "classify this customer message into one of these 5 categories," "generate a SQL query for this natural language question," "extract structured data from this document." The results flow back into regular code that handles routing, error cases, and iteration.
Deterministic skeleton, probabilistic muscles. Your program's control flow is written in Python, TypeScript, Go — whatever your team knows. The LLM calls are leaf nodes in that control flow, not the trunk. You can unit test the skeleton. You can add logging at every branch point. You can reproduce failures. (A minimal sketch of this split follows the list below.)
The 80/20 of agent reliability: about 80% of the "intelligence" in most agent workflows is just control flow that someone was too lazy to write as code. The remaining 20% — the genuinely ambiguous decisions that require language understanding — is where the LLM earns its keep.
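Concretely, a minimal sketch of that skeleton/muscle split might look like the following, where the classifier is hypothetical and injected as a plain function, so the skeleton can be unit tested with a one-line stub:

```python
from typing import Callable


# The skeleton: plain code you can test, log, and reproduce.
def handle_ticket(text: str, classify: Callable[[str], str]) -> str:
    category = classify(text)                 # the only probabilistic leaf node
    if category == "refund":
        return "routed to billing"
    if category == "bug":
        return "filed in the issue tracker"
    return "escalated to a human"             # default branch: explicit, not a silent wrong turn


# In tests, the "LLM" is a stub, so every branch is reachable deterministically.
def test_refunds_go_to_billing():
    assert handle_ticket("I want my money back", classify=lambda _: "refund") == "routed to billing"
```

Because the skeleton neither knows nor cares what produces the category string, you can swap models, cache responses, or replay production failures without touching the routing logic.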
This maps to what successful frameworks are starting to look like. Anthropic's own tool-use patterns in Claude, the function-calling paradigm in OpenAI's API, and lightweight orchestration libraries like Instructor and Marvin all share a philosophy: the LLM is a callable function with typed inputs and outputs, embedded in a program you control.
Contrast this with the "autonomous agent" pattern where you hand the LLM a goal and a set of tools and say "figure it out." That pattern is great for demos and terrible for production. The fully autonomous agent is the microservices-without-monitoring of the AI era: architecturally fashionable, operationally catastrophic.
If you're building agent-based systems today, here's the practical takeaway: audit every point in your pipeline where the LLM is making a routing or control decision. For each one, ask: "Could I replace this with an if-statement, a switch-case, or a lookup table?" If yes, do it. You'll get deterministic behavior, better debuggability, faster execution (no round-trip to the API), and lower cost.
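For instance (a hypothetical example, not taken from the post), a routing decision such as "which handler processes this event type?" is often just a lookup table with an explicit fallback:

```python
# Routing that needs no LLM: a dictionary lookup with an explicit fallback.
HANDLERS = {
    "payment_failed": "retry_payment",
    "subscription_cancelled": "send_winback_email",
    "invoice_overdue": "notify_billing_team",
}


def route_event(event_type: str) -> str:
    # .get() makes the unknown-event case an explicit branch rather than a silent misroute.
    return HANDLERS.get(event_type, "escalate_to_human")
```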
For the decisions that genuinely require LLM judgment, constrain the output. Use structured outputs (JSON mode, tool calls, enum-typed responses) so the LLM's response maps cleanly to a branch in your code. Treat every LLM call like a foreign function call that might return garbage: validate the output, handle the error case, and have a fallback.
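A minimal sketch of that validation layer, assuming the model has been asked to return JSON with an `intent` field (the field name and labels here are made up):

```python
import json
from enum import Enum


class Intent(Enum):
    CANCEL = "cancel"
    UPGRADE = "upgrade"
    SUPPORT = "support"


FALLBACK = Intent.SUPPORT  # what happens when the model returns garbage


def parse_intent(raw_llm_output: str) -> Intent:
    """Treat the LLM like a foreign function call that may return garbage."""
    try:
        value = json.loads(raw_llm_output).get("intent")
        return Intent(value)            # raises ValueError if it's not one of the allowed labels
    except (json.JSONDecodeError, ValueError, AttributeError):
        return FALLBACK                 # the error case is handled in code, not in a prompt
```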
The practical architecture looks like this: a state machine or workflow engine in your language of choice, with LLM calls at specific nodes where language understanding is actually needed. Libraries like Temporal or Prefect, or even a simple while-loop with match/case statements, will get you further than any agent framework that puts the LLM in the driver's seat.
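A minimal sketch of that shape, with `fetch`, `draft_reply`, `looks_risky`, `send`, and `escalate` as hypothetical callables supplied by the caller; the model is isolated to a single node, and every transition is plain code:

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()
    DRAFT = auto()
    REVIEW = auto()
    SEND = auto()
    ESCALATE = auto()
    DONE = auto()


def run_ticket(ticket_id, fetch, draft_reply, looks_risky, send, escalate) -> None:
    """Control flow is a plain state machine; the model appears only at the DRAFT node."""
    state, ticket, reply = State.FETCH, None, None
    while state is not State.DONE:             # a simple while-loop, loggable at every step
        match state:
            case State.FETCH:
                ticket = fetch(ticket_id)      # deterministic I/O
                state = State.DRAFT
            case State.DRAFT:
                reply = draft_reply(ticket)    # the one LLM call, a leaf node
                state = State.REVIEW
            case State.REVIEW:                 # a plain predicate picks the branch, not a prompt
                state = State.ESCALATE if looks_risky(reply) else State.SEND
            case State.SEND:
                send(ticket_id, reply)
                state = State.DONE
            case State.ESCALATE:
                escalate(ticket_id, reply)
                state = State.DONE
```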
This also has cost implications. Every unnecessary LLM call for a routing decision is latency and tokens you didn't need to spend. Teams that move control flow out of prompts typically report 3-5x reductions in token usage and proportional speedups, because it turns out most of your "agent reasoning" was just the LLM figuring out which if-branch to take.
The agent ecosystem is going through its own version of the JavaScript framework wars: an explosion of abstractions solving problems that don't need to be solved with new abstractions. The blog post's resonance — 193 points on HN is significant for an architectural opinion piece — suggests the community is ready to move past the "prompt everything" phase. The winners in the agent space will be the teams that treat LLMs as powerful but unreliable components embedded in reliable systems, not as replacements for the reliable systems themselves. The boring engineering always wins in the end.