Hague calls Ghuloum's paper 'the clearest piece of technical writing I've ever read' and argues that these two Scheme papers teach compiler construction better than the standard curriculum. His central claim is that starting with a 50-line compiler that handles one integer literal and incrementally growing it through 24 steps produces working compiler writers, while parsing-heavy textbooks produce stuck students.
Resurfacing Hague's 18-year-old post, the submitter endorses the view that the incremental/nanopass approach remains the best on-ramp to compiler writing; the 477 points it drew suggest broad agreement that mainstream compiler pedagogy still over-indexes on parsing theory.
The editorial argues that the Dragon Book and its descendants spend ~300 pages on LL/LR/LALR parsing before a single instruction is emitted, training compiler theorists rather than compiler writers. The evidence cited is completion rates: students following Ghuloum finish working native-code compilers, while Dragon Book readers stall in parser territory.
The piece acknowledges the standard objection that real compilers require proper parsing theory, SSA, dominance frontiers, and register allocation — all things LLVM genuinely depends on. This position holds that Ghuloum is fine for understanding, but insufficient preparation for working on production toolchains.
James Hague's 2008 blog post *"Want to Write a Compiler? Just Read These Two Papers"* hit the Hacker News front page again this week, pulling 477 points and hundreds of comments eighteen years after it was written. The two papers in question are Abdulaziz Ghuloum's "An Incremental Approach to Compiler Construction" (Scheme Workshop, 2006) and Sarkar, Waddell, and Dybvig's "A Nanopass Infrastructure for Compiler Education" (ICFP, 2004). Both came out of the Indiana University Scheme group under Kent Dybvig.
The post's staying power is not nostalgia. It is the fact that every generation of programmers rediscovers that the standard compiler curriculum is optimized for the wrong thing. The Dragon Book (Aho, Lam, Sethi, Ullman) spends roughly 300 pages on parsing — LL, LR, LALR, ambiguity resolution — before you emit a single instruction. Modern LLVM tutorials, Crafting Interpreters, and university courses mostly preserve this shape: lexer → parser → AST → IR → codegen, each stage a monolith.
Ghuloum's paper, which Hague calls "the clearest piece of technical writing I've ever read," inverts the entire pedagogy. His compiler starts by handling exactly one program: a single integer literal. The compiler is about 50 lines of Scheme. It emits x86 assembly. It works. Then you add booleans. Then characters. Then unary operations. Each of twenty-four incremental steps ends with a compiler that compiles a strictly larger language than the one before — and passes a growing test suite.
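To make the starting point concrete, here is a sketch of that step-1 compiler in Python rather than the paper's Scheme. The entry-point name `scheme_entry` follows the paper's convention; everything else (function name, exact assembly directives) is illustrative:

```python
# Minimal sketch of Ghuloum's step 1, ported from Scheme to Python.
# The entire "compiler" accepts one kind of program -- a single integer
# literal -- and emits x86 assembly that returns it from scheme_entry,
# which a small C runtime then calls and prints.

def compile_program(expr):
    if not isinstance(expr, int):
        raise ValueError("step 1 only compiles integer literals")
    return "\n".join([
        "    .text",
        "    .globl scheme_entry",
        "scheme_entry:",
        f"    movl ${expr}, %eax",  # return value goes in %eax
        "    ret",
        "",
    ])

print(compile_program(42))
```

That is the whole trick: the first version is trivially correct, and every later step only has to preserve that property while widening the accepted language.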
The Dragon Book trains compiler theorists. Ghuloum's paper trains compiler writers. The difference shows up in completion rates, not IQ.
The standard objection is that "real" compilers need real parsing theory, a proper IR, a register allocator, SSA form, dominance frontiers. This is true of LLVM. It is not true of the thing you need to build to understand how compilers work. Ghuloum gets you to a working native-code compiler for a non-trivial subset of Scheme — including closures, tail calls, and heap-allocated data — in roughly 100 pages of paper plus a few hundred lines of code you write yourself. By step 9 you have procedure calls. By step 14 you have heap allocation. By step 21 you have closures with proper capture semantics. Every intermediate version compiles and runs.
The nanopass paper solves a different problem: once your compiler grows past a few hundred lines, the traditional "three giant passes" architecture becomes unreadable. The nanopass answer is to decompose the compiler into many tiny passes — sometimes fifty or more — each doing exactly one transformation over a precisely specified intermediate language. Instead of one 2,000-line codegen file, you get thirty 80-line files, each with a single responsibility and a typed input/output grammar. Dybvig's production Chez Scheme compiler, now the backend for Racket, uses this architecture. It is one of the fastest Scheme implementations on earth, and also arguably the most readable industrial compiler in existence.
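The shape of a nanopass compiler can be sketched in a few lines. The toy expression language and pass names below are illustrative, not from the paper; the point is that each pass does exactly one transformation and the compiler is just their composition:

```python
# Nanopass-style sketch over a toy expression language:
#   expr ::= int | str (a variable name) | ("add", expr, expr) | ("mul", expr, expr)
# Each pass is a complete tree walk that performs one transformation.

def fold_constants(e):
    """One pass: evaluate operators whose children are both literals."""
    if isinstance(e, (int, str)):        # literals and variables pass through
        return e
    op, a, b = e
    a, b = fold_constants(a), fold_constants(b)
    if isinstance(a, int) and isinstance(b, int):
        return a + b if op == "add" else a * b
    return (op, a, b)

def strip_identities(e):
    """Another pass: rewrite x+0 -> x and x*1 -> x, and nothing else."""
    if isinstance(e, (int, str)):
        return e
    op, a, b = e
    a, b = strip_identities(a), strip_identities(b)
    if op == "add" and b == 0:
        return a
    if op == "mul" and b == 1:
        return a
    return (op, a, b)

def compile_expr(e, passes=(fold_constants, strip_identities)):
    for p in passes:                     # the compiler is a pass pipeline
        e = p(e)
    return e

print(compile_expr(("add", ("mul", "x", 1), ("mul", 2, 3))))
```

The real framework adds what this sketch omits — a declared grammar for each intermediate language, checked on every pass boundary — which is what makes a fifty-pass pipeline debuggable: a bad transformation fails at the pass that produced it, not thirty passes downstream.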
The HN thread this week pulled out the real reason these papers endure. A top comment, paraphrased: *"I've tried to write a compiler four times using the Dragon Book. I gave up four times. I finished Ghuloum in a week."* That is not a comment about the Dragon Book being bad — it is a well-respected reference and covers material Ghuloum deliberately omits. It is a comment about the difference between a textbook and a tutorial, and about which one most working engineers actually need.
A secondary thread worth noting: Ghuloum died young, in 2012, at 29. His paper was a chapter of a PhD thesis he never fully completed. Hague's post is partly responsible for keeping the work in circulation. That's a fragile thing for something this foundational — the canonical PDF still lives on a single Scheme workshop server.
If you have ever wanted to write a DSL, a query optimizer, a template engine, a macro system, or a config language with real semantics, you are writing a compiler and you should stop pretending otherwise. The skills transfer directly. Most of the production "DSLs" in modern codebases — GraphQL resolvers, Terraform HCL, dbt models, Prisma schemas, even complex Kubernetes controllers — are badly implemented compilers whose authors did not know they were writing one.
The concrete recommendation: block a weekend. Read Ghuloum's paper once end-to-end without coding. Then implement it in whatever language you use at work — the paper is in Scheme but the technique is language-agnostic, and people have ported it to Rust, OCaml, Haskell, Go, and Zig. You will emerge with a genuine mental model of how CPUs execute code, why tail calls matter, what a calling convention actually is, and why register allocation is hard. That model pays off every time you touch a profiler, read assembly in Compiler Explorer, or debug a weird performance cliff in hot code.
The nanopass paper is a second weekend, and the payoff is different. It teaches a software architecture lesson that generalizes far beyond compilers: when a pipeline becomes hard to reason about, the answer is usually more stages with tighter contracts, not fewer stages with bigger ones. The same principle is why modern data pipelines prefer small dbt models over monolithic SQL, and why well-factored microservices beat mega-services — though the latter comparison is contested and depends on operational overhead you don't have in an in-process compiler.
LLM-assisted coding makes compiler literacy more valuable, not less. The developers who will extract the most from code-generating models are the ones who can read the generated assembly, recognize when a supposed "optimization" broke aliasing rules, and intuit what the compiler actually did with that inner loop. That intuition comes from having written a compiler once, badly, end-to-end — which is exactly what these two papers let you do in a weekend instead of a semester. Hague's post is eighteen years old. The advice is better now than when he wrote it.