Anthropic's Silent Cache TTL Cut: Optimization or Stealth Price Hike?

5 min read · 1 source · multiple viewpoints
├── "This is a stealth price increase disguised as a caching infrastructure change"
│  ├── cnighswonger (GitHub Issues) → read

Analyzed 119,866 API calls across two machines and two accounts, showing cache behavior shifted from 100% 1-hour tier on March 5 to 93% 5-minute tier by March 21. Calculated a 17.1% cost increase ($949 overpayment on Sonnet over three months, $1,582 projected for Opus), with March variance jumping from 1.1% to 25.9% after the change.

│  └── @lsdmtme (Hacker News, 363 pts)

Surfaced the GitHub issue to Hacker News, framing it as Anthropic 'silently downgrading' the cache TTL. The framing of the submission — emphasizing the lack of any changelog, blog post, or deprecation notice — positions the change as deliberately hidden from users.

├── "The change is intentional and Anthropic has no plans to revert it"
│  └── Anthropic (GitHub Issues) → read

Closed the issue as NOT_PLANNED, confirming the TTL reduction was a deliberate product decision rather than a bug or regression. Offered no public explanation for why the change was made or why it was not communicated to users beforehand.

├── "The lack of communication is worse than the change itself"
│  └── top10.dev editorial (top10.dev) → read below

The editorial emphasizes that the change came with 'no changelog entry, no blog post, no deprecation notice,' and that the community had to reverse-engineer the shift through JSONL session file analysis. This framing positions the communication failure as a breach of developer trust independent of the pricing impact.

└── "The 5-minute TTL fundamentally breaks caching economics for real developer workflows"
  └── cnighswonger (GitHub Issues) → read

Demonstrated that any pause longer than five minutes between prompts — a common pattern when developers read code, think, or context-switch — expires the cache and triggers expensive full cache writes on the next prompt. The 12.5× cost asymmetry between cache writes and reads means the shorter TTL disproportionately punishes normal, non-continuous usage patterns.

What happened

On March 6, 2026, Anthropic quietly changed how Claude Code handles prompt caching. The default cache time-to-live (TTL) shifted from 1 hour to 5 minutes for most requests — no changelog entry, no blog post, no deprecation notice. A developer named cnighswonger noticed the shift after analyzing 119,866 API calls across two machines and two accounts, tracking JSONL session files from `~/.claude/projects/`. The data was unambiguous: on March 5, 100% of cached tokens used the 1-hour tier. By March 8, 83% had moved to the 5-minute tier. By March 21, it was 93%.

The issue landed on GitHub as [#46829](https://github.com/anthropics/claude-code/issues/46829) with detailed cost breakdowns, phase-by-phase token analysis, and a tool others could use to audit their own sessions. It hit the front page of Hacker News with 363 points. Anthropic closed the issue as NOT_PLANNED, confirming the change was intentional.

The cost math

Prompt caching on Claude works in two tiers: a 5-minute ephemeral cache and a 1-hour extended cache. The pricing asymmetry is significant. Cache *writes* cost roughly 12.5× more than cache *reads* — $3.75/MTok vs $0.30/MTok for Sonnet 4.6, and $6.25/MTok vs $0.50/MTok for Opus 4.6. The 1-hour tier's writes are even more expensive (roughly 2× base input cost), but that higher upfront cost amortizes over more read hits because the cache sticks around longer.
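The asymmetry is easy to see with a worked example. Using the article's Sonnet 4.6 figures, and assuming a $3/MTok base input price (so the 1-hour write lands at roughly 2×, $6/MTok), an iterative session with long pauses flips which tier is cheaper:

```python
READ = 0.30      # $/MTok, cache read (article figure for Sonnet 4.6)
WRITE_5M = 3.75  # $/MTok, 5-minute cache write (article figure)
WRITE_1H = 6.00  # $/MTok, 1-hour write; assumes $3/MTok base input x2

def session_cost(context_mtok, writes, reads, write_rate):
    """Dollar cost of writing the same cached context `writes` times
    and reading it back `reads` times at the given write rate."""
    return context_mtok * (writes * write_rate + reads * READ)

# Six prompts over an hour, each separated by a >5-minute pause:
# the 5-minute tier rewrites the cache every time, the 1-hour tier once.
short_ttl = session_cost(0.1, writes=6, reads=0, write_rate=WRITE_5M)  # $2.25
long_ttl = session_cost(0.1, writes=1, reads=5, write_rate=WRITE_1H)   # $0.75
```

For a one-shot query the comparison inverts ($0.375 for one 5-minute write vs $0.60 for one 1-hour write), which is precisely the trade Anthropic's defense leans on.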

The issue author ran a counterfactual analysis: what would these same 119,866 calls have cost if the 1-hour TTL had remained in place? The answer, for Sonnet: $4,612 instead of $5,561 — a 17.1% cost increase, or $949 in absolute overpayment across three months. Projected to Opus pricing, the gap widens to $1,582. February, when 1-hour TTL was still the default, showed only 1.1% variance. March jumped to 25.9%.

The mechanism is straightforward. With a 5-minute TTL, any pause longer than five minutes between prompts expires the cache. The next prompt triggers a full cache write at the expensive rate instead of a cheap cache read. For developers who think for a few minutes between prompts — which is to say, most developers — this means dramatically more write operations.
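That mechanism can be stated in a few lines. This is a toy model of the expiry rule described above, not Anthropic's actual client logic: a prompt becomes an expensive write whenever the pause since the previous prompt exceeds the TTL.

```python
def classify_prompts(gaps_minutes, ttl_minutes=5):
    """Label each prompt in a session as a cheap cache 'read' or an
    expensive full cache 'write'. A prompt is a write whenever the
    pause since the previous prompt exceeds the cache TTL.
    Toy model of the mechanism, not Anthropic's client logic."""
    events = ["write"]  # the first prompt always populates the cache
    for gap in gaps_minutes:
        events.append("write" if gap > ttl_minutes else "read")
    return events

# Same session, two TTLs: pauses of 2, 8, and 3 minutes between prompts.
with_5m = classify_prompts([2, 8, 3], ttl_minutes=5)   # the 8-min pause forces a rewrite
with_1h = classify_prompts([2, 8, 3], ttl_minutes=60)  # every follow-up is a read
```

Under the 5-minute TTL the 8-minute pause turns a read into a write; under the 1-hour TTL the whole session after the first prompt runs on reads.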

Anthropic's defense

Jarred Sumner from the Claude Code team pushed back directly on the issue. His core argument: the cost analysis assumes all 5-minute writes would have become cheap reads under 1-hour TTL, which isn't true. Many requests are one-shot — you ask a question, get an answer, and never revisit that cached context. For those requests, paying the higher 1-hour write cost with no subsequent reads is pure waste.

Sumner stated: "The client picks per request based on the expected cache-reuse pattern; there is no single global default, by design." The March 6 change was part of an ongoing optimization where Claude Code's client-side logic selects the appropriate TTL tier based on whether it expects the content to be re-accessed. The claim is that across the full request mix — one-shot queries, iterative sessions, and everything in between — the new approach is net cheaper.

Anthropic also acknowledged a legitimate bug: in version 2.1.90, a client-side issue could cause sessions that had exhausted their subscription quota and moved to overages to get stuck on 5-minute TTL for the entire session, even when 1-hour would have been appropriate. That bug was fixed.

Why the community isn't buying it

The response on GitHub and HN was skeptical for several reasons. First, the silence. If this change genuinely saves users money, why not announce it? Pricing changes that benefit customers are marketing opportunities. Pricing changes that *hurt* customers are the ones that ship without changelogs.

Second, the quota impact. Multiple users reported hitting 5-hour quota limits for the first time in March 2026, directly coinciding with the TTL change. Cache creation tokens count toward quota at the full input rate, while cache reads are significantly cheaper — so more frequent cache writes burn through subscription quotas faster, regardless of dollar cost. For Pro plan users, this isn't an abstract pricing discussion; it's "my tool stopped working at 2pm."
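A back-of-the-envelope model shows why a lower cache hit rate burns quota so much faster. Both the weighting scheme and the `read_weight=0.1` value (chosen to mirror the ~12.5× read/write price gap) are illustrative assumptions, not Anthropic's published quota formula:

```python
def quota_burn(prompts, context_tokens, hit_rate, read_weight=0.1):
    """Tokens counted against a subscription quota for one session.
    Cache writes count at the full input rate; cache reads at a
    reduced weight. The weighting scheme and read_weight=0.1 are
    illustrative assumptions, not Anthropic's published formula."""
    writes = prompts * (1 - hit_rate)  # prompts that rebuild the cache
    reads = prompts * hit_rate         # prompts served from cache
    return context_tokens * (writes + reads * read_weight)

# Same 20-prompt session over a 100k-token context; only the hit rate changes.
before = quota_burn(20, 100_000, hit_rate=0.95)  # ~290,000 quota tokens
after = quota_burn(20, 100_000, hit_rate=0.50)   # ~1,100,000 quota tokens
```

Under these assumptions, dropping the hit rate from 95% to 50% multiplies quota consumption by roughly 3.8× for an identical session, which is consistent with users suddenly hitting 5-hour limits they had never touched before.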

As user EthanFrostpro put it: "This explains a lot about the quota burn rate increase people have been reporting. A 1h → 5min cache TTL change means cache_create operations happen 12x more frequently for the same session." Khalic-Lab was more blunt: "This change is terrible, need to compact every time I plan on stopping coding for 5 min."

Third, Anthropic's counterfactual is unfalsifiable. The claim that per-request TTL selection is "net cheaper across the request mix" requires access to Anthropic's aggregate data, which they haven't shared. The one developer who *did* publish granular data showed a 17% cost increase. Sample size of one, sure — but it's the only sample anyone can see.

The deeper trust problem

This incident fits a pattern that infrastructure providers should study carefully. When you operate a platform where pricing depends on opaque internal decisions — which cache tier to use, how to count tokens, when to expire context — silent changes to those decisions are functionally equivalent to price changes. The API prices per token didn't change. The behavior that determines *which* tokens you're charged for did.

It's the cloud billing equivalent of shrinkflation. The price on the label stays the same; the box gets smaller. Developers are sophisticated enough to notice, and they have the tools to prove it — cnighswonger's analysis parsed raw JSONL files to reconstruct exactly what happened. The era of "trust us, it's cheaper" without receipts is over when your users can audit every API call.

What this means for your stack

If you're running Claude Code or building on the Anthropic API with prompt caching, here's what to do:

Audit your cache behavior now. The tool at [cnighswonger/claude-code-cache-fix](https://github.com/cnighswonger/claude-code-cache-fix) can analyze your session files and break down your actual cache tier usage. Run it against your February and March data to see if your costs shifted.

Restructure your workflows for 5-minute windows. If you're on a subscription plan, the practical implication is that pauses longer than 5 minutes between prompts will trigger expensive cache rebuilds. Keep sessions focused: one task per session, avoid context-switching mid-session, and front-load critical context in your `CLAUDE.md` so cache writes target high-value content first.

Budget for the new reality. If your team was budgeting based on the old caching behavior, revise upward by 15-25% for iterative coding workflows. One-shot query patterns may indeed be cheaper; long-session patterns are definitively more expensive.

Watch issue #45756. Anthropic indicated that quota weighting for cache reads will be addressed separately in that tracking issue. If cache reads start counting less against quotas, that would meaningfully offset the TTL change's impact on subscription users.

Looking ahead

Anthropic is in an awkward position. They're simultaneously the AI model provider, the IDE tool builder (Claude Code), and the pricing authority — and a change that optimizes one persona's costs can hurt another's. The technical argument that per-request TTL selection is smarter than a blanket 1-hour default is probably correct in aggregate. But "correct in aggregate" doesn't help the developer staring at a quota wall at 2pm on a Tuesday. The fix isn't reverting the change; it's being transparent about it before your users have to reverse-engineer it from JSONL files.

Hacker News · 521 pts · 397 comments

Anthropic silently downgraded cache TTL from 1h → 5M on March 6th

→ read on Hacker News
sunaurus · Hacker News

Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes. I keep getting the sense that people feel like they have no idea if they are…

foofloobar · Hacker News

Claude Code and the subscription are now less useful than a few months ago. Claude Code and the service seem to pick up more and more issues as time goes by: more bugs, fast quota drain, reduced quota, poor model performance, cache invalidation problems, MCP related bugs, potential model quantization…

cassianoleal · Hacker News

The title should be changed. It makes it look like they upped the TTL from 1 h to 5 months. The SI symbol for minutes is "min", not "M". A compromise would be to use the OP notation "m".

albert_e · Hacker News

So a side effect of this is -- even at 1 hour caching -- ... If you run out of session quota too quickly and need to wait more than an hour to resume your work ... you are paying even more penalty just to resume your work -- a penalty you wouldn't have needed if session quota was not so restrictive in…

disillusioned · Hacker News

It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/ Seeing some things about how the effort selector isn't working as intended necessarily and the model is regressing in other ways: over-emphasizing how "difficult"…
