Stanford's computer science department published research under the banner "Don't YOLO your file system" — a deliberately provocative title for what is actually a rigorous look at how applications interact with file systems during failure scenarios. The work, hosted at jai.scs.stanford.edu, addresses a problem that most developers acknowledge in theory but ignore in practice: your application's file writes are not as durable as you think they are.
The core claim is straightforward. Applications routinely assume that when they call `write()` and it returns successfully, the data is safe. It isn't. Between your application's write call and the bits actually landing on persistent storage, there's a stack of buffers, caches, and reordering optimizations that can lose your data if the system crashes at the wrong moment. The gap between what developers *assume* file systems guarantee and what file systems *actually* guarantee is where data corruption lives.
This isn't a new observation — the University of Wisconsin's ALICE project found similar issues back in 2014 — but the Stanford work brings fresh tooling and a modern lens to a problem that has only gotten worse as applications have grown more complex and storage stacks have added more layers.
The uncomfortable reality is that most production software treats file I/O like a solved problem. Write the bytes, close the file, move on. But POSIX file system semantics are full of traps that even experienced systems programmers fall into.
Consider a common pattern: your application writes a config file by creating a temporary file, writing the new contents, and renaming it over the old one. This is the "safe atomic write" pattern taught in every systems programming course. Except it's not safe unless you `fsync()` the file before the rename *and* `fsync()` the parent directory after it — and almost nobody does both. Without those syncs, a crash can leave you with an empty file, a missing file, or the old contents, depending on which buffers the kernel decided to flush first.
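The full sequence is easy to get wrong, so here is a minimal sketch of the pattern in Python (the function name `atomic_write` is ours; the calls map one-to-one onto the POSIX `write`/`fsync`/`rename` sequence the text describes):

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Crash-safe replacement of `path` with `data`: the
    temp-file-then-rename pattern with both required syncs."""
    dirpath = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temp file in the SAME directory,
    # so the final rename never crosses a file-system boundary.
    fd, tmp = tempfile.mkstemp(dir=dirpath)
    try:
        os.write(fd, data)
        os.fsync(fd)  # sync 1: flush the file's data to stable storage
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomically replace the old file
    # Sync 2: fsync the parent directory so the rename itself
    # (a directory-metadata update) survives a crash.
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Skipping sync 1 risks renaming an empty temp file into place; skipping sync 2 risks the rename itself vanishing after a crash, leaving the old contents (or, on some file systems, nothing).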
The problem scales beyond config files. Databases, message queues, logging systems, container runtimes — anything that cares about data durability is affected. SQLite gets this right (it's been battle-tested for decades), but most application-level code does not. The research suggests that the majority of applications that claim to do "safe" file writes have at least one crash consistency bug.
What makes this particularly insidious is that these bugs are almost impossible to find through normal testing. Your test suite runs on a machine that doesn't crash mid-write. Your CI pipeline doesn't simulate power failures between `write()` and `fsync()`. The bug only manifests when production hardware loses power, a kernel panics, or an OOM killer strikes at exactly the wrong moment — and then your users lose data while your logs show nothing unusual.
The Hacker News discussion (159 points) reflects genuine practitioner anxiety — this is the kind of systems problem that senior engineers know exists but hope won't bite them. Comments split between those who've been burned by crash consistency bugs in production and those arguing that modern file systems (ext4 with `data=journal`, ZFS, btrfs) mitigate the worst cases. Both camps are right, which is exactly why this research matters.
File system crash consistency sits at the intersection of three guarantees that developers conflate:
Durability — the data survives power loss. This requires explicit `fsync()` or `fdatasync()` calls, and even then, the guarantee depends on the storage hardware actually honoring flush commands. Some SSDs lie about flushes for performance. Some cloud block stores add their own buffering layers.
Atomicity — the write is all-or-nothing. POSIX makes no atomicity guarantee for `write()` at all; in practice you get at most single-block atomicity (typically 4KB), and even that is a file-system and hardware property. Writing a 1MB JSON config file is not atomic no matter how you do it. The rename-over-temp-file pattern provides atomicity at the file level, but only if combined with proper syncing.
Ordering — writes happen in the order you issued them. File systems aggressively reorder writes for performance. ext4's default `data=ordered` mode only guarantees that data is written before metadata — it says nothing about the ordering of two different data writes. If your application depends on "file A is written before file B," you need explicit barriers.
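When an application does need "A before B" ordering, the only portable barrier is an explicit `fsync()` between the two writes. A minimal sketch (the function name is ours):

```python
import os

def write_in_order(path_a: str, path_b: str,
                   data_a: bytes, data_b: bytes) -> None:
    """Guarantee that path_a's contents reach disk before path_b's.
    The fsync after the first write is an explicit ordering barrier;
    without it the kernel may flush the two writes in either order."""
    fd = os.open(path_a, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data_a)
        os.fsync(fd)  # barrier: A is durable before we touch B
    finally:
        os.close(fd)
    fd = os.open(path_b, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data_b)
        os.fsync(fd)  # make B durable too before reporting success
    finally:
        os.close(fd)
```

This is exactly the structure of a write-ahead log: the log record (A) must be durable before the data it describes (B).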
The Stanford work provides tooling to systematically test applications against these failure modes. Rather than hoping your file I/O pattern is correct, you can inject simulated crashes at every possible interleaving point and verify that your application recovers to a consistent state. This is the file system equivalent of fuzzing — and like fuzzing, it tends to find bugs immediately.
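The Stanford tooling operates on real file-system state, but the core idea can be illustrated with a toy in-memory model: log each step of an update, replay every prefix (each prefix is one possible crash point), and check what state "recovery" would observe. Here the unsafe truncate-then-write update is the subject (all names are illustrative, not the paper's API):

```python
def naive_update_ops(new: bytes):
    """The unsafe in-place update: truncate the file, then write."""
    return [("truncate", b""), ("write", new)]

def replay(initial: bytes, ops, crash_after: int) -> bytes:
    """Apply the first `crash_after` operations, then 'crash'.
    The return value is the file content recovery would see."""
    state = initial
    for op, data in ops[:crash_after]:
        state = b"" if op == "truncate" else data
    return state

old, new = b"old-config", b"new-config"
ops = naive_update_ops(new)
# Enumerate every crash point, including "crashed before anything ran".
observed = {replay(old, ops, k) for k in range(len(ops) + 1)}
# A crash between truncate and write exposes an EMPTY file -- neither
# the old nor the new contents. That is the class of bug this style
# of exhaustive crash injection finds immediately.
assert b"" in observed
```

A real harness additionally has to model reordered and partially flushed writes, which is why the state space (and the bug count) grows so quickly.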
If you're running any application that stores state on disk — which is nearly all of them — this research has direct implications.
First, audit your critical write paths. Find every place your application writes data that must survive a crash. Check whether it calls `fsync()` on the file descriptor, `fsync()` on the parent directory (for creates and renames), and whether it uses the rename-over-temp-file pattern for atomic updates. Most applications will fail at least one of these checks.
Second, understand your storage stack. If you're on cloud infrastructure, your block store may provide stronger or weaker guarantees than local SSDs. AWS EBS, for example, provides durability at the block level but doesn't guarantee write ordering across blocks without explicit flushes. If you're running databases on cloud VMs and haven't verified the crash consistency behavior of your specific block store, you're running on assumptions.
Third, consider your file system choice. ext4 with `data=journal` mode provides stronger ordering guarantees at a performance cost. ZFS and btrfs provide copy-on-write semantics that eliminate many (but not all) crash consistency issues. If you're still on ext4 `data=ordered` (the default on most Linux distributions), you're relying on the weakest common configuration.
For most application developers, the practical advice is: use a database for structured state, use SQLite for local state, and use the rename-over-temp-file-with-fsync pattern for everything else. Stop hand-rolling file I/O for critical data paths.
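The SQLite option is low-friction in practice: Python's stdlib `sqlite3` module gives you journaled, atomically committed local state, so application code never hand-rolls the fsync choreography. A minimal sketch:

```python
import os
import sqlite3
import tempfile

# SQLite's journal/WAL handles fsync ordering and atomic commit
# internally; a crash mid-commit rolls back to the previous state.
path = os.path.join(tempfile.mkdtemp(), "state.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
with conn:  # transaction: committed atomically, or not at all
    conn.execute("INSERT INTO config VALUES (?, ?)", ("theme", "dark"))
conn.close()

# Reopen to confirm the committed row is really on disk.
conn = sqlite3.connect(path)
row = conn.execute(
    "SELECT value FROM config WHERE key = ?", ("theme",)
).fetchone()
assert row[0] == "dark"
```

One key-value table like this replaces most hand-rolled "write a JSON file" state, and inherits decades of crash testing for free.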
The "don't YOLO" framing is apt because it names the real problem: not ignorance, but willful optimism. Most developers know, in the abstract, that file systems have complex crash semantics — they just choose to believe their particular write pattern is fine. Stanford's contribution here isn't just the research findings; it's providing tools that replace faith with evidence. As applications grow more stateful and infrastructure grows more distributed, the cost of getting file I/O wrong only increases. The era of hoping your writes are durable needs to end.