Study Finds Many SWE-bench-Passing AI PRs Would Be Rejected by Maintainers

March 12, 2026

// tldr

A METR analysis finds that AI-generated pull requests that pass the SWE-bench benchmark often would not survive real code review by project maintainers, raising hard questions about how we measure AI coding ability.
