Despite climbing SWE-bench scores, analysis suggests LLM-generated PRs aren't getting merged at higher rates in practice, raising hard questions about whether benchmarks reflect real-world developer utility.
Top 10 dev stories every morning at 8am UTC. AI-curated. Retro terminal HTML email.