Study Finds Many SWE-bench-Passing AI PRs Would Be Rejected by Maintainers


A METR analysis finds that AI-generated pull requests that pass the SWE-bench benchmark often would not survive real code review, raising hard questions about how we measure AI coding ability.
