Constraint Decay: Why Your Coding Agent Forgets the Rule...

What happened

A paper making the rounds on Hacker News this week — *Constraint Decay: The Fragility of LLM Agents in Back End Code Generation* — puts a name and a number on something most people running coding agents in production have felt but couldn't measure. The authors define constraint decay as the probability, per agent turn, that a constraint stated earlier in the session is violated in newly generated code, and they show it rises roughly monotonically with session length across every frontier model they tested.

The setup is deliberately mundane. The authors take a realistic backend brief — a small REST service with the kind of rules every senior engineer has written into a Notion doc a hundred times. Things like *all monetary fields are integer cents, never floats*. *Every endpoint that mutates state must emit a domain event*. *Tenant ID is required on every query against the orders table*. *No raw SQL outside the repository layer*. They then ask the agent to extend the service over 30-50 turns: add an endpoint, refactor a module, write tests, fix a bug, add a feature flag, and so on.

The agents do fine in the first ten turns. By turn 20, violations creep in: a float here, a missing tenant filter there. By turn 40, in the worst-performing configurations, more than half of new code violates at least one constraint that was stated explicitly in turn 1. The constraint is still sitting in the system prompt. The model can recite it if asked. It just stops applying it.

Why it matters

This is the failure mode that separates demoware from production agents, and naming it matters because it forces a more honest conversation about what these systems are actually doing. The popular framing — 'the agent forgot' — is wrong; the constraint is still in context and the model can quote it back verbatim. What decays is not memory but the *priority* the constraint holds against the gravitational pull of patterns the model has seen a million times in its training data.

Floats for money are everywhere on GitHub. Tenant-scoped queries are not. Domain events are rarer still. So the moment your invariant runs counter to the statistical mode of public code, every additional turn is another roll of the dice against you. The paper's regression analysis suggests that the more 'unusual' a constraint is — measured roughly by how often the opposing pattern appears in pretraining-scale corpora — the faster it decays. Constraints that match the statistical default (e.g., 'use async/await') barely decay at all. Constraints that fight the default decay fastest.

This lines up with what teams shipping Claude Code, Cursor agent mode, Devin, and the various open-source agent frameworks have been quietly reporting. The first hour of a coding session feels magical; the third hour feels like supervising a smart intern who keeps drifting back to bad habits no matter how many times you correct them. The paper's contribution is to show that this isn't anecdotal and isn't model-specific — it's a structural property of how transformer-based agents weight in-context instructions against pretraining priors.

A few caveats are worth surfacing before anyone over-indexes on the result. The benchmark is synthetic, the constraint set is hand-curated, and 'violation' is detected by a static analyzer the authors wrote, which has its own false-positive rate. The strongest configurations in the paper — agents with a constraint-checker tool wired into the loop — keep decay near zero across the full 50 turns, which suggests the problem is tractable, not fundamental. And the paper doesn't really test the obvious mitigation of summarizing and re-injecting constraints every N turns, which several practitioner threads under the HN post pointed out.

What this means for your stack

If you are running an agent against a real codebase, the immediate implication is that system prompts are not where invariants live. They are a hint, not a guarantee. Treat the system prompt the way you'd treat a comment in code: useful documentation, zero enforcement.

The enforcement layer has to be executable. In practice this means three things: (1) every business rule that matters becomes a lint rule, a type, or a test that runs on every agent turn; (2) the agent's tool loop includes a 'check constraints' step that surfaces violations back into context as concrete errors, not as reminders; and (3) the constraint file itself is re-read from disk each turn, so even if it falls out of the attention window, the checker still enforces it. Teams that have built this — Anthropic's own Claude Code project uses a variant, as do most of the serious agent harnesses — report decay rates that look much closer to the paper's instrumented configurations.

The second implication is about session length itself. If decay rises with turns, then the right move is often to *end the session*. Snapshot state, write a handoff note, start fresh. This feels wasteful — you lose the warm context — but the paper's data suggests you lose less than you think, because by turn 30 the model is already operating on a degraded version of that context anyway. The agent-orchestration frameworks worth using in 2026 will be the ones that make this kind of explicit session boundary cheap and natural, not the ones that brag about million-token windows.

Looking ahead

Expect 'constraint decay' to enter the vocabulary the way 'context rot' and 'lost in the middle' did before it — a shorthand for a real, measurable failure mode that vendors will now have to address head-on. The interesting question isn't whether the next generation of models reduces decay (they probably will, marginally) but whether the harness layer around them grows up fast enough to make it irrelevant. The teams treating their agent loop as a *control system* with feedback and invariants, rather than a chat interface with extra steps, are going to ship the agents that actually work past hour three.

Constraint Decay: Why Your Coding Agent Forgets the Rules at Hour 3

// tldr

// viewpoints

// deep dive

What happened

Why it matters

What this means for your stack

Looking ahead

// read from source

Constraint Decay: The Fragility of LLM Agents in Back End Code Generation

// community takes

Constraint Decay: Why Your Coding Agent Forgets the Rules at Hour 3

// tldr

// viewpoints

// deep dive

What happened

Why it matters

What this means for your stack

Looking ahead

// read from source

Constraint Decay: The Fragility of LLM Agents in Back End Code Generation

// community takes

// share this