Anthropic's Project Glasswing Uses AI to Formally Verify Critical Code

4 min read · 1 source · explainer
├── "LLMs can dramatically reduce the cost of formal verification, making it viable beyond niche critical systems"
│  └── Anthropic (anthropic.com) → read

Anthropic's core thesis with Glasswing is that LLMs can handle tedious intermediate proof steps in formal verification, reducing the brutal cost ratio that currently limits it to avionics, medical devices, and cryptographic libraries. They frame this as making software correctness 'transparent and provable' for a much broader class of software.

├── "The industry focus should shift from AI writing code faster to AI proving code correct"
│  └── top10.dev editorial (top10.dev) → read below

The editorial argues that while the industry has spent three years focused on using LLMs to generate code faster, Anthropic is asking a fundamentally different and more important question — whether LLMs can help prove code correct. This reframing positions verification as the higher-value application of AI to software engineering.

└── "The Hacker News response signals this resonates beyond typical AI hype"
  └── Ryan5453 (Hacker News, 1046 pts) → read

The HN submission drew over 1,000 points and 481 comments, which the editorial notes places it in rare company for a security-focused announcement. This unusually strong engagement from the HN community — typically skeptical of AI marketing — suggests the formal verification angle struck a genuine chord with practitioners.

What happened

Anthropic has unveiled Project Glasswing, an initiative that applies large language models to the problem of formally verifying critical software. The project, announced via [anthropic.com/glasswing](https://www.anthropic.com/glasswing), landed on Hacker News with a score north of 1,000 — placing it in rare company for a security-focused announcement and signaling that it struck a chord well beyond the typical AI hype cycle.

The name itself is telling. The glasswing butterfly (*Greta oto*) has transparent wings — you can see through them. Anthropic is making a deliberate metaphor: the goal is software whose correctness properties are transparent, visible, and provable, not hidden behind layers of testing assumptions.

Glasswing sits at the intersection of two historically separate disciplines: AI-assisted code generation and formal methods. While the industry has spent the last three years focused on using LLMs to *write* code faster, Anthropic is asking a different question — can LLMs help us *prove* code correct?

Why it matters

Formal verification has always been the gold standard for software correctness. Tools like Coq, Isabelle, and Lean can mathematically prove that code satisfies its specification. The problem is that writing formal proofs is brutally expensive. The seL4 microkernel — the most famous formally verified OS kernel — required roughly 20 person-years of effort for about 10,000 lines of C. That's a cost ratio that only makes sense for the most critical systems: avionics, medical devices, cryptographic libraries.
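To make concrete what "mathematically prove that code satisfies its specification" means, here is a minimal, self-contained illustration in Lean 4 (one of the proof assistants named above). It is unrelated to Glasswing's actual tooling; the function and theorem are invented for illustration. The key property is that the Lean kernel accepts the file only if the proof is valid for *every* input, not just tested ones.

```lean
-- A toy function and a machine-checked specification in Lean 4.
def double (n : Nat) : Nat := n + n

-- Specification: doubling any natural number yields an even result.
-- The kernel accepts this proof only if every step is logically valid.
theorem double_is_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, by simp [double, Nat.two_mul]⟩
```

A unit test could only check `double 4 = 8` for particular inputs; the theorem holds for all of them, which is the guarantee testing can't provide.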

The core bet of Glasswing is that LLMs can dramatically reduce the cost of formal verification, making it economically viable for a much broader class of software. This isn't about replacing human proof engineers entirely — it's about having AI handle the tedious intermediate proof steps while humans focus on specifying *what* should be true.
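The division of labor described above has a natural shape in a proof assistant. In the hypothetical Lean 4 sketch below, a human writes the specification and leaves the proof obligation open with `sorry`; filling that hole is exactly the tedious step an LLM assistant would be asked to attempt, and the kernel rejects the file until it is actually proved.

```lean
-- Human-written specification; the proof is left as an open
-- obligation. `sorry` is Lean's placeholder for an unproved goal,
-- and the compiler flags the file as incomplete until it is filled.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  sorry
```

A proposed completion such as `exact Nat.add_comm a b` either closes the goal or is rejected outright; there is no partial credit, which is what keeps the human's specification authoritative.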

The timing matters. We're in an era where AI is generating an increasing share of production code. GitHub reported that Copilot-suggested code accounts for over 40% of new code in files where it's active. That's a lot of code that was never manually reasoned about line-by-line. The traditional safety net — code review, unit tests, integration tests — catches bugs, but it doesn't provide guarantees. As AI-generated code proliferates in critical systems, the gap between "tested" and "proven correct" becomes a liability that scales with adoption.

The Hacker News response (1,000+ points) is notable because the formal verification community is typically small and skeptical. That this resonated broadly suggests developers are feeling the tension between shipping AI-generated code faster and maintaining confidence in what they're shipping. The discussion likely includes both enthusiasm from the formal methods crowd (finally, someone with resources is investing here) and skepticism about whether LLMs — which are fundamentally probabilistic — can contribute meaningfully to a discipline built on mathematical certainty.

This is a valid tension. LLMs hallucinate. Formal proofs don't tolerate hallucination — a proof either checks or it doesn't. But that's actually what makes this pairing interesting: the proof assistant acts as a perfect verifier. The LLM proposes proof steps, and the proof checker accepts or rejects them deterministically. You get the creativity and pattern-matching of neural networks with the ironclad guarantees of formal logic — the LLM is the hypothesis generator, not the judge.

What this means for your stack

For most practitioners, the immediate impact is indirect but important. If Glasswing succeeds, the first beneficiaries will be the libraries and infrastructure you depend on — TLS implementations, cryptographic primitives, serialization formats, compiler backends. These are the layers where a single bug (Heartbleed, anyone?) can cascade across millions of systems. Formally verified versions of these components would meaningfully reduce the attack surface of everything built on top of them.

The more direct question is whether Glasswing's approach will eventually reach application-level code. Today, formal verification of business logic is largely impractical. But if AI reduces the cost by even an order of magnitude, you could see verification becoming a CI/CD step for critical paths — payment processing, authentication flows, data migration logic. The practical threshold isn't perfection; it's whether AI-assisted verification becomes cheap enough to run on the 5% of your codebase where bugs are catastrophic.
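What a verification gate on that critical 5% might look like as a CI step is sketched below. The module names and the `verify` function are invented for illustration; a real gate would invoke an actual AI-assisted verifier that emits a machine-checkable proof or a counterexample per module.

```python
# Hypothetical CI gate: verify only the modules where bugs are
# catastrophic, and fail the build if any of them can't be proved.

CRITICAL_PATHS = ["payments.charge", "auth.session", "migrate.ledger"]

def verify(module: str) -> bool:
    """Stand-in for an AI-assisted verifier. Here, a fixed set of
    'proven' modules simulates its pass/fail verdict."""
    proven = {"payments.charge", "auth.session", "migrate.ledger"}
    return module in proven

def ci_gate(modules: list[str]) -> int:
    """Return a shell-style exit code: 0 if every critical module
    verifies, 1 otherwise (which fails the pipeline)."""
    failures = [m for m in modules if not verify(m)]
    for m in failures:
        print(f"UNVERIFIED: {m}")
    return 1 if failures else 0

exit_code = ci_gate(CRITICAL_PATHS)  # 0: all critical paths verified
```

The design choice worth noting: the gate scopes verification to an explicit allowlist of paths rather than the whole codebase, which is what keeps the cost tractable even if per-module verification stays expensive.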

Teams working on safety-critical systems — fintech, healthcare, autonomous vehicles, infrastructure — should track Glasswing closely. If Anthropic open-sources tooling (their track record with Claude suggests they might), early adoption in verification pipelines could become a competitive advantage and a compliance differentiator.

For the broader developer population, the signal is strategic: the AI-for-code conversation is maturing past "write code faster" toward "write code you can trust." If your organization is evaluating AI coding tools purely on generation speed, you're optimizing the wrong metric. The next wave of value is in AI that helps you understand and verify what you've already shipped.

Looking ahead

Project Glasswing represents Anthropic making a long bet that the highest-value application of AI in software isn't writing more code — it's building justified confidence in the code that already exists. Whether this specific project delivers on that promise or not, the direction is almost certainly right. The industry is generating code faster than it can reason about, and something has to close that gap. Formal verification, supercharged by LLMs, is one of the few approaches that offers actual guarantees rather than probabilistic comfort. Keep an eye on what Glasswing ships as tooling — that's when this moves from research announcement to something you can actually use.

Hacker News · 1493 pts · 804 comments

Project Glasswing: Securing critical software for the AI era

→ read on Hacker News

ofjcihen · Hacker News

I’m sure the new model is a step above the old one but I can’t be the only person who’s getting tired of hearing about how every new iteration is going to spell doom/be a paradigm shift/change the entire tech industry etc. I would honestly go so far as to say the overhype is detrimental to

9cb14c1ec0 · Hacker News

Now, its very possible that this is Anthropic marketing puffery, but even if it is half true it still represents an incredible advancement in hunting vulnerabilities. It will be interesting to see where this goes. If its actually this good, and Apple and Google apply it to their mobile OS codebases,

redfloatplane · Hacker News

The system card for Claude Mythos (PDF): https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89... Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one] I'm s

jryio · Hacker News

Let's fast forward the clock. Does software security converge on a world with fewer vulnerabilities or more? I'm not sure it converges equally in all places. My understanding is that the pre-AI distribution of software quality (and vulnerabilities) will be massively exaggerated. More small

burntcaramel · Hacker News

Previously Anthropic subscribers got access to the latest AI but it seems like there’s a League of Software forming who have special privileges. To make or maintain critical software will you have to be inside the circle? Who gates access to the circle? Anthropic or existing circle members or some ot
