Anthropic's Glasswing Uses AI to Formally Verify Critical Software

4 min read 1 source explainer
├── "LLMs can finally make formal verification economically viable for mainstream software"
│  └── Anthropic (anthropic.com) → read

Anthropic's core thesis with Project Glasswing is that formal verification has been too expensive and too specialized for decades, requiring efforts on the order of 20 person-years for roughly 10,000 lines of C (seL4). They argue that LLMs can close the gap between what we should verify and what we actually verify, making mathematical correctness proofs accessible beyond aerospace and defense.

├── "Current testing and static analysis tools are fundamentally insufficient for critical software correctness"
│  └── Anthropic (anthropic.com) → read

Anthropic argues that test suites, fuzzing, and static analysis — the safety net for millions of lines of production code — fundamentally cannot guarantee the absence of entire bug classes like buffer overflows, race conditions, or logic errors in state machines. Unlike testing, which can only show the absence of specific bugs on the inputs it exercises, formal verification mathematically proves correctness for all inputs.

└── "The targets — kernels, crypto libraries, TLS, compilers — represent the highest-leverage intervention points"
  └── Anthropic (anthropic.com) → read

By focusing on OS kernels, cryptographic libraries, TLS implementations, and compilers, Glasswing targets the foundational code that underpins everything else. A verified compiler or crypto library has cascading trust benefits for all software built on top of it, following the precedent set by projects like CompCert and HACL*.

What happened

Anthropic announced Project Glasswing, an initiative to apply AI — specifically large language models — to the formal verification of critical software. The project, detailed at [anthropic.com/glasswing](https://www.anthropic.com/glasswing), targets the kind of code that underpins everything else: operating system kernels, cryptographic libraries, TLS implementations, compilers, and core infrastructure software.

The announcement landed on Hacker News with over 1,400 points, making it one of the most discussed security-adjacent AI stories of 2026 so far. The name itself is a tell — the glasswing butterfly has transparent wings, a natural metaphor for the kind of transparency Anthropic is pitching: software whose correctness you can see through, all the way down.

Glasswing's core thesis is that formal verification has been too expensive and too specialized for decades, and that LLMs can finally close the gap between what we *should* verify and what we *actually* verify.

Why it matters

Formal verification is the gold standard of software correctness. Unlike testing, which can only show the absence of *specific* bugs on the inputs it exercises, formal verification mathematically proves that code satisfies its specification for *all* inputs. Projects like seL4 (a formally verified microkernel), CompCert (a verified C compiler), and the HACL* cryptographic library have demonstrated that verification works — but at staggering cost. seL4 required roughly 20 person-years of proof engineering for about 10,000 lines of C.
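The distinction is concrete inside a proof assistant. Here is a minimal sketch in Lean 4 (the function and theorem names are illustrative, not from Glasswing): an `#eval` checks one input the way a unit test would, while the theorem below it covers every integer at once.

```lean
-- A toy absolute-value function.
def myAbs (n : Int) : Int := if n < 0 then -n else n

-- A "test": checks exactly one input.
#eval myAbs (-3)

-- A proof: covers *all* inputs, for free, forever.
theorem myAbs_nonneg (n : Int) : 0 ≤ myAbs n := by
  unfold myAbs
  split <;> omega   -- case-split on the `if`, then linear arithmetic
```

No number of `#eval`-style checks can substitute for the theorem; the proof obligation quantifies over every `Int`, including the inputs nobody thought to test.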

The economics have never worked outside of aerospace, defense, and a handful of academic showcases — until now. The average engineering org ships millions of lines of code with test suites, fuzzing, and static analysis as their safety net. Those tools catch a lot, but they fundamentally cannot guarantee the absence of entire bug classes like buffer overflows, race conditions, or logic errors in state machines.

Meanwhile, AI code generation is accelerating the volume problem. GitHub reported in late 2025 that over 40% of new code in repositories with Copilot enabled was AI-generated. More code, generated faster, with the same verification bottleneck. Glasswing positions itself as the answer to a question the industry hasn't fully articulated yet: who verifies the code that AI writes?

The technical approach likely involves using LLMs to generate proof obligations, suggest lemmas, and fill in the tedious intermediate steps that make interactive theorem provers (Coq, Lean, Isabelle) so labor-intensive. This isn't entirely new — researchers at Google DeepMind (AlphaProof), Meta (via HyperTree Proof Search), and various academic groups have shown that LLMs can assist with mathematical theorem proving. But Glasswing appears to be the first major initiative specifically targeting *software* verification at scale, backed by a frontier AI lab with the compute and model capability to attempt it seriously.
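In practice, "filling in the tedious intermediate steps" usually means proposing helper lemmas that bridge a human-stated goal to known library facts. A hypothetical Lean 4 sketch of the shape of that workflow (the theorem and the suggested steps are illustrative; nothing here is from Glasswing itself):

```lean
-- The human states the goal; the two `have` steps are exactly the kind
-- of intermediate lemmas an LLM assistant might propose to unblock it.
theorem sum_le_double_max (a b : Nat) : a + b ≤ 2 * max a b := by
  have ha : a ≤ max a b := Nat.le_max_left a b    -- suggested step
  have hb : b ≤ max a b := Nat.le_max_right a b   -- suggested step
  omega                                           -- closes the gap
```

The proposals are cheap to generate and cheap to reject: if a suggested `have` doesn't type-check or doesn't advance the goal, the prover says so immediately.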

What makes this different from academic proof-of-concept work is Anthropic's stated focus on software that already exists and already matters — not toy examples, but production cryptographic code and kernel subsystems.

The Hacker News discussion surfaced predictable but important tensions. Optimists pointed to the potential for democratizing formal methods — if an AI can generate 80% of a proof, the remaining 20% of human expert effort becomes feasible for organizations that could never justify 100%. Skeptics raised the trust bootstrapping problem: if you use an AI to generate a proof, you still need a trusted proof checker (like Coq's kernel) to validate it. The AI doesn't need to be trusted; the checker does. This is a genuine architectural strength — the proof artifact is independently verifiable regardless of how it was produced.
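That architectural point is visible in any proof assistant. In the Lean 4 sketch below (names are illustrative), it makes no difference whether the proof term was typed by a human, suggested by an LLM, or emitted by a search procedure — the small trusted kernel re-checks it from first principles before the theorem is accepted.

```lean
-- The proof term `Nat.add_comm a b` could come from anywhere.
-- Lean's kernel verifies it independently of its origin; a wrong
-- term would simply fail to check. The generator needs no trust —
-- only the checker does.
theorem add_comm_demo (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

This is why "the LLM might hallucinate a proof" is not the failure mode to worry about: a hallucinated proof is rejected mechanically, at zero cost beyond wasted compute.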

Other commenters flagged a more subtle concern: specification correctness. Formal verification proves that code matches its spec, but if the spec is wrong, you get a perfectly verified implementation of the wrong thing. LLMs are notoriously plausible-sounding when wrong, and writing specs requires the kind of precise reasoning where confident errors are more dangerous than obvious ones.
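The spec-correctness trap is easy to demonstrate. A contrived Lean 4 example (all names are illustrative): a "sort" specification that demands sorted output but forgets to require that the output is a permutation of the input is satisfied, perfectly and provably, by a function that throws the data away.

```lean
-- A hand-rolled sortedness predicate on lists of naturals.
def Sorted : List Nat → Prop
  | [] => True
  | [_] => True
  | a :: b :: rest => a ≤ b ∧ Sorted (b :: rest)

-- An under-specified spec: output must be sorted — and nothing else.
def WeakSortSpec (sort : List Nat → List Nat) : Prop :=
  ∀ xs, Sorted (sort xs)

-- A useless "sort" that discards its input…
def badSort (_ : List Nat) : List Nat := []

-- …is nonetheless fully verified against the spec.
theorem badSort_meets_spec : WeakSortSpec badSort := by
  intro xs; trivial
```

The proof is airtight; the spec is the bug. This is why commenters insisted that specs for critical code need expert human review even when the proofs are machine-generated.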

What this means for your stack

For most practitioners, Glasswing won't change your Monday morning. You're not going to formally verify your Next.js app. But the downstream effects matter.

If you maintain critical libraries or infrastructure: Watch this space. If Glasswing (or competitors) can reduce the cost of verifying a cryptographic primitive from person-years to person-weeks, the expectation will shift. Today, formal verification is a nice-to-have that earns you a conference paper. In two years, it might be a checkbox on procurement questionnaires for security-sensitive dependencies.

If you consume open-source crypto, TLS, or kernel code: You may start seeing "AI-assisted formal verification" badges on libraries. The question will be whether the verification covered the properties you care about (memory safety? functional correctness? side-channel resistance?) and whether the spec was reviewed by domain experts, not just generated by an LLM.

The practical implication for most teams: the bar for what counts as "adequately verified" critical software is about to move up, and the tools to meet that bar are about to get cheaper. If you're choosing between two TLS libraries and one has machine-checked proofs of correctness, that's no longer an academic differentiator — it's a supply chain security decision.

If you're interested in the formal methods toolchain: Lean 4 and Coq are the ecosystems most likely to benefit from AI-assisted proving. If Glasswing generates proofs in Lean (increasingly the community favorite for new work), expect the Lean ecosystem — mathlib, tooling, IDE support — to get a significant boost in investment and attention.

Looking ahead

Glasswing represents a bet that the AI safety lab best known for worrying about AI risk can redirect that same technical rigor toward making *existing* software safer. It's a neat narrative inversion: the same models that raised concerns about AI-generated vulnerabilities are now being aimed at eliminating them in the code we already depend on. The 1,400-plus-point HN reception suggests the developer community sees the potential, even if the trust-but-verify instinct is (correctly) strong. The real test isn't the announcement — it's the first independently audited, AI-assisted formal proof of a production cryptographic library. That's when the economics of software verification actually change.

Hacker News 1470 pts 790 comments

Project Glasswing: Securing critical software for the AI era

→ read on Hacker News
