Anthropic's Silent Cache TTL Cut: Optimization or Stealth Price Hike?

5 min read · 1 source · multiple viewpoints
├── "This is a stealth price increase disguised as a caching infrastructure change"
│  ├── cnighswonger (GitHub Issues) → read

Analyzed 119,866 API calls across two machines and two accounts, showing cache behavior shifted from 100% 1-hour tier on March 5 to 93% 5-minute tier by March 21. Calculated a 17.1% cost increase ($949 overpayment on Sonnet over three months, $1,582 projected for Opus), with March variance jumping from 1.1% to 25.9% after the change.

│  └── @lsdmtme (Hacker News, 363 pts)

Surfaced the GitHub issue to Hacker News, framing it as Anthropic 'silently downgrading' the cache TTL. The framing of the submission — emphasizing the lack of any changelog, blog post, or deprecation notice — positions the change as deliberately hidden from users.

├── "The change is intentional and Anthropic has no plans to revert it"
│  └── Anthropic (GitHub Issues) → read

Closed the issue as NOT_PLANNED, confirming the TTL reduction was a deliberate product decision rather than a bug or regression. Offered no public explanation for why the change was made or why it was not communicated to users beforehand.

├── "The lack of communication is worse than the change itself"
│  └── top10.dev editorial (top10.dev) → read below

The editorial emphasizes that the change came with 'no changelog entry, no blog post, no deprecation notice,' and that the community had to reverse-engineer the shift through JSONL session file analysis. This framing positions the communication failure as a breach of developer trust independent of the pricing impact.

└── "The 5-minute TTL fundamentally breaks caching economics for real developer workflows"
  └── cnighswonger (GitHub Issues) → read

Demonstrated that any pause longer than five minutes between prompts — a common pattern when developers read code, think, or context-switch — expires the cache and triggers expensive full cache writes on the next prompt. The 12.5× cost asymmetry between cache writes and reads means the shorter TTL disproportionately punishes normal, non-continuous usage patterns.

What happened

On March 6, 2026, Anthropic quietly changed how Claude Code handles prompt caching. The default cache time-to-live (TTL) shifted from 1 hour to 5 minutes for most requests — no changelog entry, no blog post, no deprecation notice. A developer named cnighswonger noticed the shift after analyzing 119,866 API calls across two machines and two accounts, tracking JSONL session files from `~/.claude/projects/`. The data was unambiguous: on March 5, 100% of cached tokens used the 1-hour tier. By March 8, 83% had moved to the 5-minute tier. By March 21, it was 93%.

The issue landed on GitHub as [#46829](https://github.com/anthropics/claude-code/issues/46829) with detailed cost breakdowns, phase-by-phase token analysis, and a tool others could use to audit their own sessions. It hit the front page of Hacker News with 363 points. Anthropic closed the issue as NOT_PLANNED, confirming the change was intentional.

The cost math

Prompt caching on Claude works in two tiers: a 5-minute ephemeral cache and a 1-hour extended cache. The pricing asymmetry is significant. Cache *writes* cost roughly 12.5× more than cache *reads* — $3.75/MTok vs $0.30/MTok for Sonnet 4.6, and $6.25/MTok vs $0.50/MTok for Opus 4.6. The 1-hour tier's writes are even more expensive (roughly 2× base input cost), but that higher upfront cost amortizes over more read hits because the cache sticks around longer.
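The asymmetry is easy to see with a worked example. Using the article's Sonnet 4.6 figures, and assuming a $3/MTok base input price (so the 1-hour write lands at roughly 2×, $6/MTok), an iterative session with long pauses flips which tier is cheaper:

```python
READ = 0.30      # $/MTok, cache read (article figure for Sonnet 4.6)
WRITE_5M = 3.75  # $/MTok, 5-minute cache write (article figure)
WRITE_1H = 6.00  # $/MTok, 1-hour write; assumes $3/MTok base input x2

def session_cost(context_mtok, writes, reads, write_rate):
    """Dollar cost of writing the same cached context `writes` times
    and reading it back `reads` times at the given write rate."""
    return context_mtok * (writes * write_rate + reads * READ)

# Six prompts over an hour, each separated by a >5-minute pause:
# the 5-minute tier rewrites the cache every time, the 1-hour tier once.
short_ttl = session_cost(0.1, writes=6, reads=0, write_rate=WRITE_5M)  # $2.25
long_ttl = session_cost(0.1, writes=1, reads=5, write_rate=WRITE_1H)   # $0.75
```

For a one-shot query the comparison inverts ($0.375 for one 5-minute write vs $0.60 for one 1-hour write), which is precisely the trade Anthropic's defense leans on.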

The issue author ran a counterfactual analysis: what would these same 119,866 calls have cost if the 1-hour TTL had remained in place? The answer, for Sonnet: $4,612 instead of $5,561 — a 17.1% cost increase, or $949 in absolute overpayment across three months. Projected to Opus pricing, the gap widens to $1,582. February, when 1-hour TTL was still the default, showed only 1.1% variance. March jumped to 25.9%.

The mechanism is straightforward. With a 5-minute TTL, any pause longer than five minutes between prompts expires the cache. The next prompt triggers a full cache write at the expensive rate instead of a cheap cache read. For developers who think for a few minutes between prompts — which is to say, most developers — this means dramatically more write operations.
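That mechanism can be stated in a few lines. This is a toy model of the expiry rule described above, not Anthropic's actual client logic: a prompt becomes an expensive write whenever the pause since the previous prompt exceeds the TTL.

```python
def classify_prompts(gaps_minutes, ttl_minutes=5):
    """Label each prompt in a session as a cheap cache 'read' or an
    expensive full cache 'write'. A prompt is a write whenever the
    pause since the previous prompt exceeds the cache TTL.
    Toy model of the mechanism, not Anthropic's client logic."""
    events = ["write"]  # the first prompt always populates the cache
    for gap in gaps_minutes:
        events.append("write" if gap > ttl_minutes else "read")
    return events

# Same session, two TTLs: pauses of 2, 8, and 3 minutes between prompts.
with_5m = classify_prompts([2, 8, 3], ttl_minutes=5)   # the 8-min pause forces a rewrite
with_1h = classify_prompts([2, 8, 3], ttl_minutes=60)  # every follow-up is a read
```

Under the 5-minute TTL the 8-minute pause turns a read into a write; under the 1-hour TTL the whole session after the first prompt runs on reads.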

Anthropic's defense

Jarred Sumner from the Claude Code team pushed back directly on the issue. His core argument: the cost analysis assumes all 5-minute writes would have become cheap reads under 1-hour TTL, which isn't true. Many requests are one-shot — you ask a question, get an answer, and never revisit that cached context. For those requests, paying the higher 1-hour write cost with no subsequent reads is pure waste.

Sumner stated: "The client picks per request based on the expected cache-reuse pattern; there is no single global default, by design." The March 6 change was part of an ongoing optimization where Claude Code's client-side logic selects the appropriate TTL tier based on whether it expects the content to be re-accessed. The claim is that across the full request mix — one-shot queries, iterative sessions, and everything in between — the new approach is net cheaper.

Anthropic also acknowledged a legitimate bug: in version 2.1.90, a client-side issue could cause sessions that had exhausted their subscription quota and moved to overages to get stuck on 5-minute TTL for the entire session, even when 1-hour would have been appropriate. That bug was fixed.

Why the community isn't buying it

The response on GitHub and HN was skeptical for several reasons. First, the silence. If this change genuinely saves users money, why not announce it? Pricing changes that benefit customers are marketing opportunities. Pricing changes that *hurt* customers are the ones that ship without changelogs.

Second, the quota impact. Multiple users reported hitting 5-hour quota limits for the first time in March 2026, directly coinciding with the TTL change. Cache creation tokens count toward quota at the full input rate, while cache reads are significantly cheaper — so more frequent cache writes burn through subscription quotas faster, regardless of dollar cost. For Pro plan users, this isn't an abstract pricing discussion; it's "my tool stopped working at 2pm."
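A back-of-the-envelope model shows why a lower cache hit rate burns quota so much faster. Both the weighting scheme and the `read_weight=0.1` value (chosen to mirror the ~12.5× read/write price gap) are illustrative assumptions, not Anthropic's published quota formula:

```python
def quota_burn(prompts, context_tokens, hit_rate, read_weight=0.1):
    """Tokens counted against a subscription quota for one session.
    Cache writes count at the full input rate; cache reads at a
    reduced weight. The weighting scheme and read_weight=0.1 are
    illustrative assumptions, not Anthropic's published formula."""
    writes = prompts * (1 - hit_rate)  # prompts that rebuild the cache
    reads = prompts * hit_rate         # prompts served from cache
    return context_tokens * (writes + reads * read_weight)

# Same 20-prompt session over a 100k-token context; only the hit rate changes.
before = quota_burn(20, 100_000, hit_rate=0.95)  # ~290,000 quota tokens
after = quota_burn(20, 100_000, hit_rate=0.50)   # ~1,100,000 quota tokens
```

Under these assumptions, dropping the hit rate from 95% to 50% multiplies quota consumption by roughly 3.8× for an identical session, which is consistent with users suddenly hitting 5-hour limits they had never touched before.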

As user EthanFrostpro put it: "This explains a lot about the quota burn rate increase people have been reporting. A 1h → 5min cache TTL change means cache_create operations happen 12x more frequently for the same session." Khalic-Lab was more blunt: "This change is terrible, need to compact every time I plan on stopping coding for 5 min."

Third, Anthropic's counterfactual is unfalsifiable. The claim that per-request TTL selection is "net cheaper across the request mix" requires access to Anthropic's aggregate data, which they haven't shared. The one developer who *did* publish granular data showed a 17% cost increase. Sample size of one, sure — but it's the only sample anyone can see.

The deeper trust problem

This incident fits a pattern that infrastructure providers should study carefully. When you operate a platform where pricing depends on opaque internal decisions — which cache tier to use, how to count tokens, when to expire context — silent changes to those decisions are functionally equivalent to price changes. The API prices per token didn't change. The behavior that determines *which* tokens you're charged for did.

It's the cloud billing equivalent of shrinkflation. The price on the label stays the same; the box gets smaller. Developers are sophisticated enough to notice, and they have the tools to prove it — cnighswonger's analysis parsed raw JSONL files to reconstruct exactly what happened. The era of "trust us, it's cheaper" without receipts is over when your users can audit every API call.

What this means for your stack

If you're running Claude Code or building on the Anthropic API with prompt caching, here's what to do:

Audit your cache behavior now. The tool at [cnighswonger/claude-code-cache-fix](https://github.com/cnighswonger/claude-code-cache-fix) can analyze your session files and break down your actual cache tier usage. Run it against your February and March data to see if your costs shifted.

Restructure your workflows for 5-minute windows. If you're on a subscription plan, the practical implication is that pauses longer than 5 minutes between prompts will trigger expensive cache rebuilds. Keep sessions focused: one task per session, avoid context-switching mid-session, and front-load critical context in your `CLAUDE.md` so cache writes target high-value content first.

Budget for the new reality. If your team was budgeting based on the old caching behavior, revise upward by 15-25% for iterative coding workflows. One-shot query patterns may indeed be cheaper; long-session patterns are definitively more expensive.

Watch issue #45756. Anthropic indicated that quota weighting for cache reads will be addressed separately in that tracking issue. If cache reads start counting less against quotas, that would meaningfully offset the TTL change's impact on subscription users.

Looking ahead

Anthropic is in an awkward position. They're simultaneously the AI model provider, the IDE tool builder (Claude Code), and the pricing authority — and a change that optimizes one persona's costs can hurt another's. The technical argument that per-request TTL selection is smarter than a blanket 1-hour default is probably correct in aggregate. But "correct in aggregate" doesn't help the developer staring at a quota wall at 2pm on a Tuesday. The fix isn't reverting the change; it's being transparent about it before your users have to reverse-engineer it from JSONL files.

Hacker News · 521 pts · 397 comments

Anthropic silently downgraded cache TTL from 1h → 5M on March 6th

→ read on Hacker News
sunaurus · Hacker News

Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes. I keep getting the sense that people feel like they have no idea if they are…

foofloobar · Hacker News

Claude Code and the subscription are now less useful than a few months ago. Claude Code and the service seem to pick up more and more issues as time goes by: more bugs, fast quota drain, reduced quota, poor model performance, cache invalidation problems, MCP related bugs, potential model quantization…

cassianoleal · Hacker News

The title should be changed. It makes it look like they upped the TTL from 1 h to 5 months. The SI symbol for minutes is "min", not "M". A compromise would be to use the OP notation "m".

albert_e · Hacker News

So a side effect of this is -- even at 1 hour caching -- ... If you run out of session quota too quickly and need to wait more than an hour to resume your work ... you are paying even more penalty just to resume your work -- a penalty you wouldn't have needed if session quota was not so restrictive in…

disillusioned · Hacker News

It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/ Seeing some things about how the effort selector isn't working as intended necessarily and the model is regressing in other ways: over-emphasizing how "difficult"…
