nrehiew defines over-editing precisely — functionally correct output that structurally diverges beyond the minimal fix — and provides code and methodology to measure it across models. They argue this transforms simple reviews into archaeology expeditions and that the phenomenon affects all major AI coding tools including Cursor, Copilot, Claude Code, and Codex.
The research was submitted to Hacker News, where it garnered 343 points and 193 comments, indicating strong community resonance with the claim that LLMs systematically rewrite code beyond what is necessary for a fix.
nrehiew argues that review throughput, not coding speed, determines how fast teams ship. When a one-line bug fix becomes a 200-line diff with renamed variables and extracted helpers, the reviewer must reconstruct what actually changed and verify no regressions were introduced, turning a 30-second review into a 15-minute investigation.
The editorial synthesis highlights that when models add unrequested input validation, rename variables, and restructure control flow alongside the actual fix, the extra churn creates cover for subtle bugs. A reviewer scanning a large diff for what should be a one-line fix is far more likely to miss a regression hidden among cosmetic changes.
Beyond diagnosing the problem, nrehiew investigates whether models can be trained to be more faithful editors that produce structurally minimal fixes. This frames over-editing not as an inherent limitation of LLMs but as a training problem with a potential engineering solution.
A detailed investigation by researcher nrehiew has quantified something every developer using AI coding tools already suspected: when you ask an LLM to fix a one-line bug, it rewrites half the function. The post, which landed on Hacker News with 343 points and sparked intense practitioner debate, frames this as the "Over-Editing problem" — models producing outputs that are functionally correct but structurally divergent from the original code far beyond what the fix requires.
The definition is precise and useful: a model is over-editing if its output is functionally correct but structurally diverges from the original code more than the minimal fix requires. An off-by-one error becomes a rewritten loop. A wrong operator becomes a refactored function with new validation, renamed variables, and an extracted helper. The bug is fixed. The diff is enormous.
This isn't a niche academic concern. Every major AI coding tool — Cursor, GitHub Copilot, Claude Code, Codex — exhibits this behavior. The research includes code and methodology to measure the phenomenon across models, moving it from anecdote to data.
Code review is already the bottleneck in most engineering organizations. Studies consistently show that review throughput — not coding speed — determines how fast teams ship. When an AI rewrites an entire function to fix a single bug, it transforms a 30-second review into a 15-minute archaeology expedition. The reviewer has to reconstruct what actually changed, distinguish intentional fixes from cosmetic rewriting, and verify that the "improvements" didn't introduce regressions. This is exactly the wrong direction for a tool that's supposed to make developers faster.
The security implications deserve attention. When a model adds input validation you didn't ask for, renames variables, and restructures control flow alongside the actual fix, it creates cover for subtle bugs. A reviewer scanning a 200-line diff for a one-line fix is operating at reduced attention. The unchanged code that was rewritten — code that was working — now needs re-verification. Every unnecessary change is an opportunity for a defect to hide in plain sight.
The Hacker News discussion surfaced a genuine split in practitioner opinion. User hathawsh reported success training Claude Code out of over-editing behavior through project-specific skills: "When it makes a mistake like over-editing, I explain the mistake, it fixes it, and I ask it to record what it learned." This works, but it's a per-user, per-project workaround for what should be a model-level default.
User jstanley offered the opposite perspective: "I often find coding agents privileging the existing code when they could do a much better job if they changed it to suit the new requirement." This tension is real — sometimes the right fix is a refactor. The problem isn't that models change code; it's that they can't distinguish between a fix that requires structural changes and a fix that requires changing one character. The model lacks a theory of edit scope.
Perhaps the most pointed observation came from foo12bar, who noted that AI models often hide failures by catching exceptions and returning dummy values, burying the evidence in verbose logging: "The logs themselves are often over abbreviated and missing key data to successfully debug what is happening." Over-editing isn't just cosmetic — it's a symptom of models optimizing for apparent correctness rather than minimal, verifiable change.
Quantifying over-editing requires defining what a "minimal edit" looks like, which is harder than it sounds. The research proposes measuring structural divergence between the model's output and the minimal fix. This is a meaningful metric because it separates the question "did the model fix the bug?" from "did the model do only what was asked?"
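The post's exact metric isn't reproduced here, but a rough approximation of structural divergence is to compare the size of the model's edit against the size of a known minimal fix. A minimal sketch, assuming you have the original file, the model's output, and a reference minimal patch as strings (the function names are illustrative, not nrehiew's code):

```python
import difflib

def changed_lines(before: str, after: str) -> int:
    """Count lines added or removed between two versions of a file."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )

def over_edit_ratio(original: str, model_output: str, minimal_fix: str) -> float:
    """Model's edit size relative to the minimal edit size.

    A value near 1.0 means the model changed about as much as the minimal fix;
    larger values indicate structural divergence beyond what the fix required.
    """
    model_changes = changed_lines(original, model_output)
    minimal_changes = max(changed_lines(original, minimal_fix), 1)
    return model_changes / minimal_changes
```

A ratio like this captures the spirit of the metric, how much the model did beyond what was asked, while staying independent of whether the fix itself is correct.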
Current coding benchmarks like SWE-bench measure whether the fix works. They don't penalize a model for rewriting 50 lines when 1 line needed to change. This means we've been optimizing AI coding tools for correctness without penalizing unnecessary complexity — the evaluation framework itself encourages over-editing. A model that rewrites everything and passes tests scores the same as a model that makes the minimal surgical fix.
The research demonstrates that training with a minimal-edit objective — explicitly penalizing unnecessary changes — produces models that make smaller, more targeted edits without sacrificing fix accuracy. This is encouraging. It means over-editing is a training signal problem, not a fundamental limitation of the architecture.
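The training details aren't reproduced in this digest, but the shape of such an objective is easy to sketch: pair a correctness reward with a penalty that grows as the edit exceeds the minimal one. A purely illustrative sketch, with a made-up penalty weight rather than a value from the research:

```python
def minimal_edit_reward(passes_tests: bool, edit_ratio: float,
                        penalty_weight: float = 0.1) -> float:
    """Illustrative reward: credit for a passing fix, minus a penalty for
    changing more than the minimal fix did (edit_ratio > 1.0).

    penalty_weight is a hypothetical hyperparameter, not taken from the post.
    """
    correctness = 1.0 if passes_tests else 0.0
    excess = max(edit_ratio - 1.0, 0.0)
    return correctness - penalty_weight * excess
```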
If you're using AI coding tools in a team environment, over-editing is costing you review hours today. Three practical responses:
Constrain the edit scope in your prompts. Instead of "fix this bug," try "fix the off-by-one error on line 47 — change only what's necessary." Most coding agents respect scope constraints when explicitly stated. Claude Code's CLAUDE.md project files and Cursor's .cursorrules can encode this as a default instruction.
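As an example, a default instruction you might encode in CLAUDE.md or .cursorrules (the wording is illustrative, not an official template):

```
## Editing conventions
- Make the smallest change that fixes the reported issue.
- Do not rename variables, extract helpers, or reformat lines you were not asked to touch.
- If a larger refactor would genuinely help, propose it separately rather than bundling it into the fix.
```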
Use diff-aware review workflows. When reviewing AI-generated changes, filter for semantic changes versus cosmetic ones. Tools like `git diff --word-diff` or semantic diff tools can help separate meaningful changes from variable renames and reformatting. If your team uses AI coding tools regularly, consider adding a CI check that flags diffs exceeding a size threshold relative to the issue description.
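A minimal sketch of such a CI check, assuming the change size comes from `git diff --numstat` against the main branch and that a fixed line threshold stands in for "relative to the issue description" (the threshold and branch name are illustrative):

```python
import subprocess
import sys

MAX_CHANGED_LINES = 100  # illustrative threshold; tune per repository

def diff_size(base: str = "origin/main") -> int:
    """Sum of lines added and deleted versus the base branch, via git diff --numstat."""
    out = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        # Binary files report "-" for the added/deleted counts; skip them.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    size = diff_size()
    if size > MAX_CHANGED_LINES:
        print(f"Diff touches {size} lines (limit {MAX_CHANGED_LINES}); "
              "flag for extra scrutiny of AI-generated changes.")
        sys.exit(1)
```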
Watch for the exception-swallowing pattern. As foo12bar noted, models frequently mask failures with try/catch blocks that return plausible defaults. Audit AI-generated code specifically for new exception handlers, especially ones that log and continue rather than propagate errors. This is where over-editing crosses from annoying to dangerous.
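The shape to audit for looks roughly like this (illustrative Python; `fetch_profile` is a stand-in for any call that can fail):

```python
import logging

logger = logging.getLogger(__name__)

def fetch_profile(user_id: str) -> dict:
    """Stand-in for a real lookup that can fail."""
    raise ConnectionError("backend unavailable")

def load_user_profile(user_id: str) -> dict:
    # Anti-pattern to audit for: the failure is swallowed, a plausible default
    # is returned, and the only trace is a debug-level log line.
    try:
        return fetch_profile(user_id)
    except Exception as exc:
        logger.debug("profile lookup failed for %s: %s", user_id, exc)
        return {}

def load_user_profile_strict(user_id: str) -> dict:
    # Preferred: let the error propagate so the failure is visible to the
    # caller, to tests, and to monitoring.
    return fetch_profile(user_id)
```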
The broader architectural question is whether AI coding tools should default to minimal edits or maximal "improvement." The answer depends on context — a greenfield prototype benefits from aggressive refactoring, while a production hotfix demands surgical precision. Today's tools don't make this distinction. The user anonu captured the anxiety well: these agents "touch multiple files, run tests, do deployments, run smoke tests... and all of this gets abstracted away." The abstraction is the product, but the abstraction is also the risk.
The over-editing problem will likely get solved at the model layer within the next year. The research shows that minimal-edit training objectives work, and the major model providers have strong commercial incentives to fix this — enterprise adoption of coding agents depends on code review remaining tractable. Until then, treat AI-generated diffs the way you'd treat a junior developer's first PR: assume good intent, verify every line, and push back when the scope creeps beyond the ticket.