- A forensic analysis of 119,866 API calls across two machines documents a consistent 17.1% overpayment on both Sonnet-4-6 and Opus-4-6 compared to the February baseline: $949.08 in excess charges on Sonnet and $1,581.80 on Opus over roughly five weeks, establishing that the TTL change constitutes a material, undisclosed cost increase.
- The change is uniquely damaging because of Anthropic's cache pricing asymmetry: cache writes cost 12.5× more than reads ($3.75 vs. $0.30 per million tokens on Sonnet). With a 5-minute TTL, developers pay the expensive write cost far more frequently, undermining the economic model that made prompt caching viable for interactive coding sessions.
- The submission's title emphasizes that Anthropic made the change 'silently'. There was no changelog entry, no developer blog post, and no API status update; developers discovered the TTL reduction only through unexpected bill spikes and exhausted quotas, which the author takes as evidence that Anthropic deliberately avoided disclosure.
- Documenting over $2,500 in excess charges from a two-machine personal setup over five weeks implies devastating team-level costs. The editorial extrapolates that a 20-developer team running Claude Code daily would face five-figure annual overcharges, turning what appeared to be competitive API pricing into a significant hidden expense.
On March 6, 2026, Anthropic began rolling out a change to their prompt caching infrastructure that reduced the default cache time-to-live (TTL) from 1 hour to 5 minutes. The rollout completed by March 8. There was no changelog entry, no developer blog post, and no API status update. Developers discovered the change the hard way — through ballooning bills and unexpectedly exhausted quotas.
The issue surfaced publicly via [GitHub issue #46829](https://github.com/anthropics/claude-code/issues/46829) on the Claude Code repository, where a user named cnighswonger posted a forensic analysis of 119,866 API calls across two machines. The data was extracted from local Claude Code session logs (`~/.claude/projects/*/**.jsonl`) and cross-referenced against Anthropic's official `rates.json` pricing file. The numbers are damning: a consistent 17.1% overpayment on both Sonnet-4-6 and Opus-4-6 models compared to the February baseline when 1-hour TTL was active.
In absolute terms, the analyzed workload showed $949.08 in excess charges on Sonnet and $1,581.80 on Opus — over a roughly five-week window. For a two-machine personal setup. Scale that to a team of 20 developers running Claude Code daily, and you're looking at five-figure annual overcharges that appeared with zero warning.
### The Economics of Cache Expiry
To understand why a TTL change hits so hard, you need to understand Anthropic's cache pricing asymmetry. When you write to the prompt cache (`cache_creation`), you pay 12.5× more per token than when you read from it (`cache_read`) — $3.75 per million tokens vs. $0.30 on Sonnet. The entire value proposition of prompt caching is: pay the write cost once, then amortize it over many cheap reads within the TTL window.
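To see how quickly a single write amortizes, here is a back-of-envelope sketch using the Sonnet rates quoted above. The $3.00 per million base input rate is an assumption (consistent with the 1.25× write multiplier discussed later), not a figure from the analysis itself:

```python
# Back-of-envelope amortization for prompt caching on Sonnet.
# WRITE and READ are the per-million-token rates quoted in the article;
# BASE ($3.00/MTok uncached input) is an assumed figure for comparison.
WRITE = 3.75   # $/MTok, cache_creation
READ = 0.30    # $/MTok, cache_read
BASE = 3.00    # $/MTok, uncached input (assumption)

def cached_cost(reuses: int, mtok: float = 1.0) -> float:
    """One cache write followed by `reuses` cache reads."""
    return (WRITE + reuses * READ) * mtok

def uncached_cost(reuses: int, mtok: float = 1.0) -> float:
    """The same context sent uncached on every request."""
    return (1 + reuses) * BASE * mtok

for n in (0, 1, 5, 20):
    print(n, round(cached_cost(n), 2), round(uncached_cost(n), 2))
```

Under these rates, caching loses money on a one-shot request ($3.75 vs. $3.00) but wins from the very first reuse ($4.05 vs. $6.00), which is why the value proposition collapses when the TTL expires before that reuse happens.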
With a 1-hour TTL, an interactive coding session naturally stays within the cache window. You write your system prompt, your CLAUDE.md context, your conversation history — and for the next hour, every follow-up request reads that cached context at the discounted rate. A developer might pause to review a diff, read documentation, or grab coffee. The cache survives.
With a 5-minute TTL, that same pause triggers a full cache expiry. The next request must re-upload the entire context at the write rate. The analyzed dataset showed 220 million tokens written to the 5-minute tier that generated 5.7 billion cache reads — evidence that this context was actively being reused and would have stayed warm under the old TTL. Instead, developers paid write-rate prices repeatedly for context the system already had.
The month-by-month waste percentages tell the story clearly: January 2026 (pre-optimization) saw 52.5% waste. February (1-hour TTL) dropped to just 1.1% — the system was working as designed. March jumped back to 25.9% waste after the TTL cut, and April settled at 14.8% as developers learned to work around the new constraints.
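The mechanics of that waste are easy to model: every pause longer than the TTL converts what would have been a cheap cache read into a full cache write. A rough sketch, using the Sonnet rates above and hypothetical session numbers:

```python
# Rough model of the extra cost a >5-minute pause incurs under the new
# TTL: the next request pays the cache-write rate for context that
# would have been a cache read under the old 1-hour TTL.
# Rates are the Sonnet figures from the article; the session numbers
# below are illustrative, not from the dataset.
WRITE, READ = 3.75, 0.30   # $/MTok

def pause_penalty(context_tokens: int, pauses: int) -> float:
    """Extra dollars paid because `pauses` cache expiries forced rewrites."""
    mtok = context_tokens / 1_000_000
    return pauses * mtok * (WRITE - READ)

# A 150k-token session context with six review/coffee pauses per day:
print(round(pause_penalty(150_000, 6), 2))
```

At roughly $3 per day per developer for this hypothetical workload, the five-figure team-level extrapolation above stops looking hyperbolic.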
### The Quota Squeeze
The cost impact is only half the story. Subscription-tier users on Claude Pro plans report hitting their 5-hour usage quotas for the first time despite what they describe as "moderate usage." This makes mechanical sense: cache creation tokens count at full rate toward quota limits, while cache reads are discounted. When the TTL forces more cache creations, quota burn accelerates even if the developer's actual productivity hasn't changed. You're paying more to do the same work.
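Anthropic does not publish the exact quota weighting, but the direction of the effect can be illustrated with assumed weights (full weight for cache writes, 10% for reads; both are placeholders, not documented values):

```python
# Illustrative quota-burn model. The quota weighting is NOT public;
# we ASSUME writes count at full weight and reads at 10% weight,
# purely to show why more cache expiries drain quota faster.
def quota_units(requests: int, context_tokens: int,
                warm_fraction: float, read_weight: float = 0.1) -> float:
    """warm_fraction: share of requests served from a still-live cache."""
    warm = requests * warm_fraction * context_tokens * read_weight
    cold = requests * (1 - warm_fraction) * context_tokens  # forced rewrite
    return warm + cold

ctx = 100_000
print(quota_units(40, ctx, warm_fraction=0.95))  # mostly warm (1-hour TTL era)
print(quota_units(40, ctx, warm_fraction=0.60))  # frequent expiry (5-minute TTL)
```

With these assumed weights, dropping from a 95% to a 60% warm-cache rate more than triples quota burn for the exact same forty requests.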
### Anthropic's Defense
Jarred Sumner from Anthropic responded in the thread with a nuanced but ultimately unsatisfying explanation. The key claims:
1. The change was intentional, part of ongoing cache optimization, not a bug.
2. 1-hour writes cost more than 5-minute writes (2× base input multiplier vs. 1.25×), so per-write savings exist.
3. Many requests are one-shot with no cache reuse within the hour, making the longer TTL wasteful from Anthropic's infrastructure perspective.
4. The client can select TTL based on reuse patterns, enabling per-request optimization.
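Taking claim 2 at face value, the break-even is easy to compute: the 1-hour tier only needs to save one expiry-forced rewrite per hour to come out ahead. A sketch using the quoted multipliers and an assumed $3.00/MTok base input rate:

```python
# Compare hourly cache-write cost under the two TTL tiers, using the
# multipliers Anthropic cited (2x base for 1-hour writes, 1.25x for
# 5-minute writes). BASE is an assumed Sonnet input rate.
BASE = 3.00  # $/MTok (assumption)

def hourly_write_cost(ttl_writes_per_hour: int, ttl: str) -> float:
    mult = {"1h": 2.0, "5m": 1.25}[ttl]
    n = 1 if ttl == "1h" else ttl_writes_per_hour  # 1h tier writes once
    return n * mult * BASE

for writes in (1, 2, 3):
    print(writes, hourly_write_cost(writes, "5m"), hourly_write_cost(writes, "1h"))
```

A single write per hour favors the 5-minute tier ($3.75 vs. $6.00), but from the second expiry-forced rewrite onward the 1-hour tier is cheaper ($7.50 vs. $6.00). For a pause-heavy interactive session, two or more expiries per hour is the norm, not the exception.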
The problem with this defense: it optimizes for Anthropic's infrastructure costs, not for the developer's bill. Yes, one-shot requests waste server-side cache memory. But interactive coding sessions — the primary use case for Claude Code — are definitionally not one-shot. They are long-running, context-heavy, and pause-heavy. The 5-minute TTL is a mismatch for the product's core workflow.
Sumner also noted that v2.1.90 fixed a client-side bug where sessions were stuck on 5-minute TTL even after quota exhaustion, suggesting the tooling itself wasn't properly adapting to the new regime.
### Immediate Actions
If you're running Claude Code or any Anthropic API integration with prompt caching, audit your costs now. Compare your February invoice (1-hour TTL baseline) against March and April. The cnighswonger analysis tool at `cnighswonger/claude-code-cache-fix` on GitHub can parse your local session logs to quantify the impact on your specific workload.
For Claude Code users specifically, the community has converged on several workarounds:
- Compact sessions before pauses. If you know you're stepping away for more than 5 minutes, run a session compact to reduce the context that will need to be re-cached.
- Front-load your CLAUDE.md. The most important context should appear first, ensuring that if cache creation is expensive, it's at least creating high-value cache entries.
- One task per session. Shorter, focused sessions reduce the probability of cache expiry mid-work.
- Structure work in sub-5-minute bursts. This is dystopian advice, but it's the mathematical reality of the pricing model.
### The Broader Pattern
This incident fits a growing pattern in the AI API market: silent pricing changes that shift costs to developers without clear communication. When OpenAI deprecated older model endpoints, they gave 6-month deprecation windows. When AWS changes reserved instance pricing, it's a blog post and a 90-day notice. Anthropic changed a core pricing parameter with no notice at all.
For teams building production systems on Anthropic's API, this is a trust signal to take seriously. Not because the change itself is unreasonable — caching infrastructure is genuinely expensive and TTL tradeoffs are real — but because the lack of communication means you cannot plan for pricing changes. Your cost models are only as stable as the last parameter Anthropic decided to quietly adjust.
If you're evaluating Anthropic vs. alternatives for cost-sensitive workloads, factor in pricing volatility risk, not just the current rate card. Build monitoring that alerts on cache hit ratio drops, and set up cost anomaly detection that would have caught this change within days, not weeks.
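Such a detector does not need to be sophisticated. Here is a toy version that alerts when the cache hit ratio drops sharply against a trailing baseline; the window size and threshold are arbitrary illustrations, and the daily totals are invented:

```python
# Toy cost-anomaly check of the kind suggested above: alert when the
# cache hit ratio (reads / (reads + writes)) falls well below a
# trailing average. Window and threshold are illustrative choices.
from collections import deque

class CacheRatioMonitor:
    def __init__(self, window: int = 7, drop_alert: float = 0.15):
        self.history = deque(maxlen=window)
        self.drop_alert = drop_alert

    def observe(self, reads: int, writes: int) -> bool:
        """Record one day's token totals; return True if an alert fires."""
        ratio = reads / max(reads + writes, 1)
        alert = bool(self.history) and \
            (sum(self.history) / len(self.history) - ratio) > self.drop_alert
        self.history.append(ratio)
        return alert

m = CacheRatioMonitor()
baseline_days = [(9_000_000, 200_000)] * 5   # healthy 1-hour-TTL era
ttl_cut_day = (5_000_000, 2_500_000)         # writes spike after the change
alerts = [m.observe(r, w) for r, w in baseline_days + [ttl_cut_day]]
print(alerts)  # only the final, write-heavy day should trip the alert
```

Fed with February-style numbers as a baseline, a monitor like this would have flagged the March shift within a day or two of the rollout.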
Anthropic has stated no reversion is planned. The 5-minute TTL is the new normal. The most likely evolution is smarter client-side TTL selection — Claude Code already attempts to choose TTL tiers based on expected reuse — but that puts the optimization burden on the client rather than the platform. Expect third-party tooling to fill the gap: session-aware cache warming, predictive compaction, and cost-optimized prompt structuring are all tractable problems. The developers who got burned by this change are already building the monitoring they wish they'd had. That's the silver lining — and it's thin.