- A forensic analysis of 119,866 API calls across two machines documents a consistent 17.1% overpayment on both Sonnet-4-6 and Opus-4-6 compared to the February baseline: $949.08 in excess charges on Sonnet and $1,581.80 on Opus over roughly five weeks, establishing that the TTL change constitutes a material, undisclosed cost increase.
- The change is uniquely damaging because of Anthropic's cache pricing asymmetry: cache writes cost 12.5× more than reads ($3.75 vs. $0.30 per million tokens on Sonnet). With a 5-minute TTL, developers pay the expensive write cost far more frequently, undermining the economic model that made prompt caching viable for interactive coding sessions.
- The submission's title emphasizes that Anthropic made the change 'silently'. There was no changelog entry, no developer blog post, and no API status update; developers discovered the TTL reduction only through unexpected bill spikes and exhausted quotas, which the author takes as evidence that Anthropic deliberately avoided disclosure.
- Documenting over $2,500 in excess charges from a two-machine personal setup over five weeks implies devastating team-level costs. The editorial extrapolates that a 20-developer team running Claude Code daily would face five-figure annual overcharges, turning what appeared to be competitive API pricing into a significant hidden expense.
On March 6, 2026, Anthropic began rolling out a change to their prompt caching infrastructure that reduced the default cache time-to-live (TTL) from 1 hour to 5 minutes. The rollout completed by March 8. There was no changelog entry, no developer blog post, and no API status update. Developers discovered the change the hard way — through ballooning bills and unexpectedly exhausted quotas.
The issue surfaced publicly via [GitHub issue #46829](https://github.com/anthropics/claude-code/issues/46829) on the Claude Code repository, where a user named cnighswonger posted a forensic analysis of 119,866 API calls across two machines. The data was extracted from local Claude Code session logs (`~/.claude/projects/*/**.jsonl`) and cross-referenced against Anthropic's official `rates.json` pricing file. The numbers are damning: a consistent 17.1% overpayment on both Sonnet-4-6 and Opus-4-6 models compared to the February baseline when 1-hour TTL was active.
In absolute terms, the analyzed workload showed $949.08 in excess charges on Sonnet and $1,581.80 on Opus — over a roughly five-week window. For a two-machine personal setup. Scale that to a team of 20 developers running Claude Code daily, and you're looking at five-figure annual overcharges that appeared with zero warning.
### The Economics of Cache Expiry
To understand why a TTL change hits so hard, you need to understand Anthropic's cache pricing asymmetry. When you write to the prompt cache (`cache_creation`), you pay 12.5× more per token than when you read from it (`cache_read`) — $3.75 per million tokens vs. $0.30 on Sonnet. The entire value proposition of prompt caching is: pay the write cost once, then amortize it over many cheap reads within the TTL window.
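To see how quickly a single write amortizes, here is a back-of-envelope sketch using the Sonnet rates quoted above. The $3.00 per million base input rate is an assumption (consistent with the 1.25× write multiplier discussed later), not a figure from the analysis itself:

```python
# Back-of-envelope amortization for prompt caching on Sonnet.
# WRITE and READ are the per-million-token rates quoted in the article;
# BASE ($3.00/MTok uncached input) is an assumed figure for comparison.
WRITE = 3.75   # $/MTok, cache_creation
READ = 0.30    # $/MTok, cache_read
BASE = 3.00    # $/MTok, uncached input (assumption)

def cached_cost(reuses: int, mtok: float = 1.0) -> float:
    """One cache write followed by `reuses` cache reads."""
    return (WRITE + reuses * READ) * mtok

def uncached_cost(reuses: int, mtok: float = 1.0) -> float:
    """The same context sent uncached on every request."""
    return (1 + reuses) * BASE * mtok

for n in (0, 1, 5, 20):
    print(n, round(cached_cost(n), 2), round(uncached_cost(n), 2))
```

Under these rates, caching loses money on a one-shot request ($3.75 vs. $3.00) but wins from the very first reuse ($4.05 vs. $6.00), which is why the value proposition collapses when the TTL expires before that reuse happens.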
With a 1-hour TTL, an interactive coding session naturally stays within the cache window. You write your system prompt, your CLAUDE.md context, your conversation history — and for the next hour, every follow-up request reads that cached context at the discounted rate. A developer might pause to review a diff, read documentation, or grab coffee. The cache survives.
With a 5-minute TTL, that same pause triggers a full cache expiry. The next request must re-upload the entire context at the write rate. The analyzed dataset showed 220 million tokens written to the 5-minute tier that generated 5.7 billion cache reads — evidence that this context was actively being reused and would have stayed warm under the old TTL. Instead, developers paid write-rate prices repeatedly for context the system already had.
The month-by-month waste percentages tell the story clearly: January 2026 (pre-optimization) saw 52.5% waste. February (1-hour TTL) dropped to just 1.1% — the system was working as designed. March jumped back to 25.9% waste after the TTL cut, and April settled at 14.8% as developers learned to work around the new constraints.
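The mechanics of that waste are easy to model: every pause longer than the TTL converts what would have been a cheap cache read into a full cache write. A rough sketch, using the Sonnet rates above and hypothetical session numbers:

```python
# Rough model of the extra cost a >5-minute pause incurs under the new
# TTL: the next request pays the cache-write rate for context that
# would have been a cache read under the old 1-hour TTL.
# Rates are the Sonnet figures from the article; the session numbers
# below are illustrative, not from the dataset.
WRITE, READ = 3.75, 0.30   # $/MTok

def pause_penalty(context_tokens: int, pauses: int) -> float:
    """Extra dollars paid because `pauses` cache expiries forced rewrites."""
    mtok = context_tokens / 1_000_000
    return pauses * mtok * (WRITE - READ)

# A 150k-token session context with six review/coffee pauses per day:
print(round(pause_penalty(150_000, 6), 2))
```

At roughly $3 per day per developer for this hypothetical workload, the five-figure team-level extrapolation above stops looking hyperbolic.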
### The Quota Squeeze
The cost impact is only half the story. Subscription-tier users on Claude Pro plans report hitting their 5-hour usage quotas for the first time despite what they describe as "moderate usage." This makes mechanical sense: cache creation tokens count at full rate toward quota limits, while cache reads are discounted. When the TTL forces more cache creations, quota burn accelerates even if the developer's actual productivity hasn't changed. You're paying more to do the same work.
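Anthropic does not publish the exact quota weighting, but the direction of the effect can be illustrated with assumed weights (full weight for cache writes, 10% for reads; both are placeholders, not documented values):

```python
# Illustrative quota-burn model. The quota weighting is NOT public;
# we ASSUME writes count at full weight and reads at 10% weight,
# purely to show why more cache expiries drain quota faster.
def quota_units(requests: int, context_tokens: int,
                warm_fraction: float, read_weight: float = 0.1) -> float:
    """warm_fraction: share of requests served from a still-live cache."""
    warm = requests * warm_fraction * context_tokens * read_weight
    cold = requests * (1 - warm_fraction) * context_tokens  # forced rewrite
    return warm + cold

ctx = 100_000
print(quota_units(40, ctx, warm_fraction=0.95))  # mostly warm (1-hour TTL era)
print(quota_units(40, ctx, warm_fraction=0.60))  # frequent expiry (5-minute TTL)
```

With these assumed weights, dropping from a 95% to a 60% warm-cache rate more than triples quota burn for the exact same forty requests.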
### Anthropic's Defense
Jarred Sumner from Anthropic responded in the thread with a nuanced but ultimately unsatisfying explanation. The key claims:
1. The change was intentional, part of ongoing cache optimization, not a bug.
2. 1-hour writes cost more than 5-minute writes (2× base input multiplier vs. 1.25×), so per-write savings exist.
3. Many requests are one-shot with no cache reuse within the hour, making the longer TTL wasteful from Anthropic's infrastructure perspective.
4. The client can select TTL based on reuse patterns, enabling per-request optimization.
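Taking claim 2 at face value, the break-even is easy to compute: the 1-hour tier only needs to save one expiry-forced rewrite per hour to come out ahead. A sketch using the quoted multipliers and an assumed $3.00/MTok base input rate:

```python
# Compare hourly cache-write cost under the two TTL tiers, using the
# multipliers Anthropic cited (2x base for 1-hour writes, 1.25x for
# 5-minute writes). BASE is an assumed Sonnet input rate.
BASE = 3.00  # $/MTok (assumption)

def hourly_write_cost(ttl_writes_per_hour: int, ttl: str) -> float:
    mult = {"1h": 2.0, "5m": 1.25}[ttl]
    n = 1 if ttl == "1h" else ttl_writes_per_hour  # 1h tier writes once
    return n * mult * BASE

for writes in (1, 2, 3):
    print(writes, hourly_write_cost(writes, "5m"), hourly_write_cost(writes, "1h"))
```

A single write per hour favors the 5-minute tier ($3.75 vs. $6.00), but from the second expiry-forced rewrite onward the 1-hour tier is cheaper ($7.50 vs. $6.00). For a pause-heavy interactive session, two or more expiries per hour is the norm, not the exception.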
The problem with this defense: it optimizes for Anthropic's infrastructure costs, not for the developer's bill. Yes, one-shot requests waste server-side cache memory. But interactive coding sessions — the primary use case for Claude Code — are definitionally not one-shot. They are long-running, context-heavy, and pause-heavy. The 5-minute TTL is a mismatch for the product's core workflow.
Sumner also noted that v2.1.90 fixed a client-side bug where sessions were stuck on 5-minute TTL even after quota exhaustion, suggesting the tooling itself wasn't properly adapting to the new regime.
### Immediate Actions
If you're running Claude Code or any Anthropic API integration with prompt caching, audit your costs now. Compare your February invoice (1-hour TTL baseline) against March and April. The cnighswonger analysis tool at `cnighswonger/claude-code-cache-fix` on GitHub can parse your local session logs to quantify the impact on your specific workload.
For Claude Code users specifically, the community has converged on several workarounds:
- Compact sessions before pauses. If you know you're stepping away for more than 5 minutes, run a session compact to reduce the context that will need to be re-cached.
- Front-load your CLAUDE.md. The most important context should appear first, ensuring that if cache creation is expensive, it's at least creating high-value cache entries.
- One task per session. Shorter, focused sessions reduce the probability of cache expiry mid-work.
- Structure work in sub-5-minute bursts. This is dystopian advice, but it's the mathematical reality of the pricing model.
### The Broader Pattern
This incident fits a growing pattern in the AI API market: silent pricing changes that shift costs to developers without clear communication. When OpenAI deprecated older model endpoints, they gave 6-month deprecation windows. When AWS changes reserved instance pricing, it's a blog post and a 90-day notice. Anthropic changed a core pricing parameter with no notice at all.
For teams building production systems on Anthropic's API, this is a trust signal to take seriously. Not because the change itself is unreasonable — caching infrastructure is genuinely expensive and TTL tradeoffs are real — but because the lack of communication means you cannot plan for pricing changes. Your cost models are only as stable as the last parameter Anthropic decided to quietly adjust.
If you're evaluating Anthropic vs. alternatives for cost-sensitive workloads, factor in pricing volatility risk, not just the current rate card. Build monitoring that alerts on cache hit ratio drops, and set up cost anomaly detection that would have caught this change within days, not weeks.
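Such a detector does not need to be sophisticated. Here is a toy version that alerts when the cache hit ratio drops sharply against a trailing baseline; the window size and threshold are arbitrary illustrations, and the daily totals are invented:

```python
# Toy cost-anomaly check of the kind suggested above: alert when the
# cache hit ratio (reads / (reads + writes)) falls well below a
# trailing average. Window and threshold are illustrative choices.
from collections import deque

class CacheRatioMonitor:
    def __init__(self, window: int = 7, drop_alert: float = 0.15):
        self.history = deque(maxlen=window)
        self.drop_alert = drop_alert

    def observe(self, reads: int, writes: int) -> bool:
        """Record one day's token totals; return True if an alert fires."""
        ratio = reads / max(reads + writes, 1)
        alert = bool(self.history) and \
            (sum(self.history) / len(self.history) - ratio) > self.drop_alert
        self.history.append(ratio)
        return alert

m = CacheRatioMonitor()
baseline_days = [(9_000_000, 200_000)] * 5   # healthy 1-hour-TTL era
ttl_cut_day = (5_000_000, 2_500_000)         # writes spike after the change
alerts = [m.observe(r, w) for r, w in baseline_days + [ttl_cut_day]]
print(alerts)  # only the final, write-heavy day should trip the alert
```

Fed with February-style numbers as a baseline, a monitor like this would have flagged the March shift within a day or two of the rollout.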
Anthropic has stated no reversion is planned. The 5-minute TTL is the new normal. The most likely evolution is smarter client-side TTL selection — Claude Code already attempts to choose TTL tiers based on expected reuse — but that puts the optimization burden on the client rather than the platform. Expect third-party tooling to fill the gap: session-aware cache warming, predictive compaction, and cost-optimized prompt structuring are all tractable problems. The developers who got burned by this change are already building the monitoring they wish they'd had. That's the silver lining — and it's thin.