Everything in C is undefined behavior — and the compiler knows it

├── "C is effectively unusable as specified — every non-trivial program contains undefined behavior"
│  └── Thomas Habets (blog.habets.se) → read

Habets argues that the C standard contains over 200 distinct undefined behaviors in C17, making it practically impossible to write any non-trivial C program that is provably UB-free. He challenges readers to find such a program and notes that even the Linux kernel's Makefile (with flags like -fno-strict-aliasing) reads like a confession that the kernel itself cannot comply with the standard.

├── "The real problem is that modern optimizers treat UB as a precondition, not a warning"
│  └── top10.dev editorial (top10.dev) → read below

The editorial frames the practitioner-relevant point as a gap between the language programmers think they're writing and the language compilers actually compile. LLVM and GCC optimize on the assumption that UB never occurs, so they prune control-flow paths that could only execute after UB. The optimizer can delete code, rewrite logic, or replace branches with traps, turning what looks like a localized bug into arbitrary, non-local program rewrites.

└── "This is a long-standing, well-documented problem — not a new revelation"
  └── top10.dev editorial (top10.dev) → read below

The editorial situates Habets' post within a fifteen-year argument made by John Regehr, Chris Lattner, and others about UB-driven optimization. What makes Habets' framing land is not novelty but inversion: rather than telling programmers which patterns to avoid, he asks them to prove the negative — and nobody, including kernel maintainers, can.

What happened

Thomas Habets' blog post "Everything in C is undefined behavior" hit the Hacker News front page this week with 351 points and a comment thread that reads like a group therapy session for systems programmers. The thesis is uncomfortably simple: the C standard contains so many traps marked *undefined behavior* — over 200 distinct ones in C17, depending on how you count — that any program of meaningful size will hit at least one. And once you've hit one, the compiler is allowed to do anything: emit the code you wrote, delete it, replace it with `ud2`, or rewrite the surrounding logic as if the UB-triggering branch were unreachable.

Habets walks through the usual suspects (signed integer overflow, shifting by the width of the type, reading uninitialized memory, strict aliasing violations, modifying a string literal) and then keeps going. Pointer arithmetic across allocation boundaries is UB. Comparing pointers from different objects with `<` is UB. Calling `memcpy` with a null source pointer and a length of zero is UB. Even `INT_MIN % -1` is UB on two's-complement platforms, because the quotient `INT_MIN / -1` is not representable in `int`. He notes that LLVM and GCC assume none of these ever happen, and will happily prune entire control-flow paths that could only be reached through them.
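Several of the cases listed above can be guarded with explicit checks before the operation runs. The following is a minimal sketch, not code from Habets' post: the helper names (`checked_add`, `checked_rem`, `checked_shl`, `copy_bytes`) are hypothetical, and the shift check assumes `unsigned` has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Signed addition: test the bounds first, so the overflowing add never runs. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Remainder: reject division by zero and the INT_MIN % -1 case. */
bool checked_rem(int a, int b, int *out)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *out = a % b;
    return true;
}

/* Left shift: reject counts greater than or equal to the type's width.
 * Assumes unsigned has no padding bits (true on mainstream platforms). */
bool checked_shl(unsigned v, unsigned count, unsigned *out)
{
    if (count >= sizeof v * CHAR_BIT)
        return false;
    *out = v << count;   /* unsigned shift discards high bits: defined */
    return true;
}

/* memcpy wrapper: skip the call when n == 0, so a null pointer paired
 * with a zero length never reaches memcpy. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```

Each helper reports failure through its return value rather than performing the undefined operation, which leaves the caller to decide what an out-of-range input should mean.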

The post lands in the middle of a longer-running argument. Regehr, Lattner, and others have been making versions of this point for fifteen years. What makes Habets' framing land is the inversion: instead of "avoid these patterns," he asks you to find a non-trivial C program that provably contains zero UB. He can't. Neither can the commenters. Neither, importantly, can Linus Torvalds, whose `-fno-strict-aliasing -fno-delete-null-pointer-checks` flag list in the kernel Makefile reads like a confession.

Why it matters

The practitioner-relevant point is not "C is bad." The point is that the language you're compiling is not the language you think you're writing. Modern optimizers treat UB as a precondition, not a warning: if your code would only matter when UB is triggered, the optimizer is allowed to delete it. That's how the infamous CVE-2009-1897 happened — a null check in `tun_chr_poll` was removed by GCC because an earlier dereference "proved" the pointer was non-null. The kernel had a privilege escalation for months because of an optimization the standard explicitly permits.
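The shape of that bug can be sketched in a few lines. This is an illustration modeled on the description above, not the kernel source: the struct and function names (`tun_like`, `sock_like`, `poll_risky`, `poll_checked`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sock_like { int flags; };             /* hypothetical stand-in */
struct tun_like  { struct sock_like *sk; };  /* hypothetical stand-in */

/* Risky order: tun is dereferenced before it is checked. Because the
 * dereference is UB when tun == NULL, the compiler may assume tun is
 * non-null and remove the check that follows. */
int poll_risky(struct tun_like *tun)
{
    struct sock_like *sk = tun->sk;   /* UB if tun == NULL */
    if (!tun)                         /* may be optimized away */
        return -1;
    return sk->flags;
}

/* Safer order: validate every pointer before the first dereference,
 * so there is no earlier access for the optimizer to reason from. */
int poll_checked(struct tun_like *tun)
{
    if (!tun || !tun->sk)
        return -1;
    return tun->sk->flags;
}
```

Whether a given compiler actually drops the check in `poll_risky` depends on version and flags; the point is that the standard permits it, and `-fno-delete-null-pointer-checks` exists to switch that inference off.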

This is also why the Rust-vs-C debate keeps refusing to die. Rust's safety story isn't just about memory; it's about specification. The C standard is a contract between you and the compiler, and the compiler has the better lawyers. Every version of GCC and Clang since around 2010 has gotten more aggressive about exploiting UB for optimization, and there is no indication that trend is reversing. LLVM 19 (shipped late 2024) added new UB-exploiting passes around `freeze` and poison propagation that surprised even longtime contributors.

The community reactions in the HN thread split predictably. The C-defenders argue, correctly, that most UB is avoidable with discipline and modern tooling: the sanitizers (UBSan, ASan, MSan, TSan) running in CI. They point to OpenSSH, SQLite, and the Linux kernel as proof that disciplined C is shippable. The skeptics argue, also correctly, that "don't write bugs" has never been a working strategy at scale, and that the cost of bugs in C code (Heartbleed's out-of-bounds read is the textbook UB case, alongside Shellshock, Dirty COW, and the recent xz backdoor) has been borne by users, not vendors. Both camps are right; they're just optimizing for different costs.

The more interesting reaction came from the formal-methods corner. Projects like CompCert (a formally verified C compiler) and the K Framework's C semantics have spent years trying to nail down what a "reasonable" subset of C actually means. Their answer, roughly: a much smaller language than what `gcc -O2` accepts. MISRA C and the SEI CERT C Coding Standard exist because the actual C standard is, in practice, unusable as written without escape hatches.

What this means for your stack

If you ship C or C++ in production, three things should be non-negotiable in 2026. First, UBSan and ASan in CI on every PR, with `-fsanitize=undefined,address` at minimum. Clang also accepts `integer` in that list, which additionally flags unsigned wraparound: defined behavior, but often unintended. The runtime cost is real (2-3x slowdown is typical), but you only pay it in test. Google's OSS-Fuzz has caught thousands of UB bugs this way in projects whose maintainers swore their code was clean.
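As a rough illustration of what a sanitizer run buys you, consider a midpoint helper with a latent overflow. This is a sketch under assumptions: the function names and the file name in the build comment are hypothetical, and the safe variant assumes `UINT_MAX == 2 * INT_MAX + 1`, which holds on mainstream platforms.

```c
/* Possible CI build line (file name is hypothetical):
 *   cc -O1 -g -fsanitize=undefined,address -fno-sanitize-recover=all midpoint_test.c
 * With -fno-sanitize-recover=all, the first UB report aborts the test run. */
#include <assert.h>
#include <limits.h>

/* Latent bug: lo + hi overflows int for large inputs, which is UB.
 * It passes every test that uses small values and is reported by UBSan
 * only when a test actually supplies overflowing inputs. */
int midpoint_naive(int lo, int hi)
{
    return (lo + hi) / 2;
}

/* Defined for every lo <= hi: the distance is computed in unsigned
 * arithmetic, where wraparound is well defined, then halved and added
 * back to lo. The result never exceeds hi. */
int midpoint_safe(int lo, int hi)
{
    assert(lo <= hi);
    return lo + (int)(((unsigned)hi - (unsigned)lo) / 2);
}
```

The sanitizer reports only the UB that a test actually executes, which is why pairing it with fuzzing or boundary-value tests matters more than the flag itself.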

Second, hardening flags by default: `-D_FORTIFY_SOURCE=3`, `-fstack-protector-strong`, `-fstack-clash-protection`, `-fcf-protection=full`, `-Wl,-z,relro,-z,now`. Note that `_FORTIFY_SOURCE` only takes effect when optimization is enabled, and `-fcf-protection` is specific to x86 targets. These don't fix UB, but they convert a class of UB-triggered exploits into crashes. The Linux kernel hardening project and the BSDs have been pushing these for years; if your build system doesn't have them, you are downstream of someone else's risk tolerance.

Third, stop writing new C where you have a choice. CISA, the NSA, and the White House ONCD have all published statements in the past two years recommending memory-safe languages for new development; insurance carriers are starting to ask about it during cyber-policy renewals. If you're starting a greenfield systems project today and you choose C over Rust, Zig, or Go for performance reasons, you should have a benchmark, not a vibe. For existing C codebases, the realistic path is incremental: new modules in Rust with C FFI, gradually displacing the perimeter. The Linux kernel, Android, and Windows have all picked this path. The pattern works.

For library authors specifically: assume your callers are hostile to your invariants. Use `_Generic`, `static_assert`, and `[[nodiscard]]` (the last a C23 attribute) aggressively. Document UB preconditions in the API, not just the manpage. If your function dereferences a pointer, say so. If it requires aligned input, say so. The standard won't help you; your header file is the only contract that's actually read.
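One way those three tools might combine in a header is sketched below. Everything prefixed `lib_` or `LIB_` is hypothetical, and the macro falls back to a GCC/Clang attribute, or to nothing, when the compiler is not in C23 mode.

```c
#include <assert.h>   /* static_assert: a macro in C11, a keyword in C23 */
#include <stddef.h>
#include <stdint.h>

/* [[nodiscard]] is only standard from C23 on, so pick a fallback. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#  define LIB_NODISCARD [[nodiscard]]
#elif defined(__GNUC__)
#  define LIB_NODISCARD __attribute__((warn_unused_result))
#else
#  define LIB_NODISCARD
#endif

/* Turn a layout assumption into a compile-time failure. */
static_assert(sizeof(uint32_t) == 4, "lib_sum_u32 assumes a 4-byte uint32_t");

/*
 * lib_sum_u32: sums n values from vals into *out (wrapping modulo 2^32).
 * Preconditions, checked here and reported as -1 instead of becoming UB:
 *   - out must not be NULL
 *   - vals must not be NULL unless n == 0
 * Returns 0 on success.
 */
LIB_NODISCARD int lib_sum_u32(const uint32_t *vals, size_t n, uint32_t *out)
{
    if (!out || (!vals && n != 0))
        return -1;
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += vals[i];           /* unsigned wraparound is defined */
    *out = acc;
    return 0;
}

/* _Generic as a gate: any pointer type other than (const) uint32_t *
 * has no matching association and fails to compile, rather than being
 * silently converted. */
#define lib_sum(vals, n, out)                 \
    _Generic((vals),                          \
        const uint32_t *: lib_sum_u32,        \
        uint32_t *:       lib_sum_u32)((vals), (n), (out))
```

The design choice here is to spend a branch on each documented precondition and return an error code. A library that cannot afford the check can still keep the comment block, which at least makes the UB condition part of the written contract.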

Looking ahead

C isn't going anywhere: there's too much of it, too much tooling around it, and too many ABIs frozen to its semantics. But the era of treating C as a "portable assembly language" is over and has been since the optimizer caught up. The honest framing for 2026 and beyond: C is a high-level language with permissively specified semantics that the compiler exploits for performance. Treat it like that, instrument it like that, and budget for the bugs like that. Habets' post is uncomfortable not because it's wrong, but because anyone who's spent a weekend debugging a `-O2`-only crash already knows it's right.

Hacker News 450 pts 594 comments

muvlon · Hacker News

Yes there is tons of surprising and weird UB in C, but this article doesn't do a great job of showcasing it. It barely scratches the surface. Here's a way weirder example: volatile int x = 5; printf("%d in hex is 0x%x.\n", x, x); This is totally fine if x is just an int, but the v

beeforpork · Hacker News

The UB in unaligned pointers is even worse: an unaligned pointer in itself is UB, not only an access to it. So even implicit casting a void*v to an int*i (like 'i=v' in C or 'f(v)' when f() accepts an int*) is UB if the cast pointer is not aligned to int. It is important to unders

quelsolaar · Hacker News

The 5 stages of learning about UB in C:
- Denial: "I know what signed overflow does on my machine."
- Anger: "This compiler is trash! why doesn't it just do what I say!?"
- Bargaining: "I'm submitting this proposal to wg14 to fix C..."
- Depression: "Can you rely

greysphere · Hacker News

The examples aren't really undefined behavior. They are examples that could become UB based on input/circumstances. Which if you are going to be that generous, every function call is UB because it could exceed stack space. Which is basically true in any language (up to the equivalent def o

bestouff · Hacker News

The problem of UB is not really that it may crash in some architecture. The real problem is that the compiler expects UB code to NOT happen, so if you write UB code anyway the compiler (and especially the optimizer) is allowed to translate that to anything that's convenient for its happy path.
