Anthropic Quietly Cut Cache TTL by 12×. Developers Got the Bill.

5 min read · 1 source · clear_take
├── "Anthropic made an undisclosed change that materially increased costs, and the data proves it"
│  ├── cnighswonger (GitHub Issues) → read

Analyzed 119,866 API calls across two machines over four months, showing a clean transition from 100% 1-hour TTL to 93% 5-minute TTL starting March 6th. Computed $949–$1,582 in overpayment per model (17.1% waste) using Anthropic's own published rates, with no announcement, changelog, or documentation update accompanying the change.

│  └── @lsdmtme (Hacker News, 498 pts)

Surfaced the GitHub issue to the broader developer community, framing the change as a silent downgrade from 1-hour to 5-minute cache TTL. The post resonated widely, accumulating 498 points and 386 comments, indicating broad concern about the undisclosed nature of the pricing shift.

├── "The 5-minute TTL disproportionately punishes normal developer workflows"
│  └── cnighswonger (GitHub Issues) → read

Argues that real development involves natural pauses — reviewing PRs, reading docs, grabbing coffee — that easily exceed 5 minutes but rarely exceed 1 hour. The 12.5× cost multiplier between cache reads and writes means every routine break triggers a full context re-upload, turning normal work patterns into a billing penalty.

├── "The impact extends beyond cost to quota-limited subscription users"
│  └── cnighswonger (GitHub Issues) → read

Points out that Pro and subscription users are quota-limited, not just cost-billed, meaning the shorter TTL doesn't just inflate bills — it burns through usage quotas faster. This makes the change doubly harmful for users who chose subscriptions specifically to cap their spending.

└── "The lack of transparency is as damaging as the cost increase itself"
  └── top10.dev editorial (top10.dev) → read below

The editorial emphasizes that the transition happened with 'no announcement, no changelog entry, and no documentation update' over 16 days. This framing positions the silence as a trust violation separate from the financial impact — developers building production systems on Anthropic's APIs need to be able to rely on documented behavior.

What Happened

On March 6, 2026, Anthropic changed how Claude Code handles prompt caching — and told nobody. A developer who goes by cnighswonger noticed something odd in their billing: after 33 consecutive days of clean 1-hour cache TTL behavior across two independent machines, 5-minute TTL tokens suddenly reappeared. Within two days, 5-minute writes accounted for 83% of traffic. By March 21st, that number hit 93%.

The evidence came from parsing raw session files (`~/.claude/projects/**/*.jsonl`) — 119,866 API calls spanning January through April 2026. The data tells an unambiguous story: February saw zero 5-minute cache creation tokens. March 5th was the last clean 1-hour-only day. March 6th, the first 5-minute tokens appeared. March 8th, they were dominant.

The transition from 100% 1-hour TTL to 93% 5-minute TTL happened in exactly 16 days, with no announcement, no changelog entry, and no documentation update.

Why It Matters

The economics here are straightforward and painful. Cache writes cost $3.75–$6.25 per million tokens depending on the model. Cache reads cost $0.30–$0.50 per million tokens. That's a 12.5× cost multiplier every time your cache expires and context has to be re-uploaded.
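That multiplier is just arithmetic. A minimal sketch using the Sonnet-tier rates quoted above; the 150k-token context size is a hypothetical example:

```python
# Dollar cost of resuming a session with a warm vs. expired cache, using the
# Sonnet-tier rates quoted in this article ($3.75/MTok cache write,
# $0.30/MTok cache read). The 150k-token context is a hypothetical example.
WRITE_RATE = 3.75  # $ per million tokens, cache write
READ_RATE = 0.30   # $ per million tokens, cache read

def resume_cost(context_tokens: int, cache_warm: bool) -> float:
    """Cost of resuming: a cheap read if the cache is warm, a full
    re-upload at write rates if it has expired."""
    rate = READ_RATE if cache_warm else WRITE_RATE
    return context_tokens / 1_000_000 * rate

warm = resume_cost(150_000, cache_warm=True)   # cache alive: cheap read
cold = resume_cost(150_000, cache_warm=False)  # cache expired: full re-upload

print(f"${warm:.4f} vs ${cold:.4f}: {cold / warm:.1f}x")  # $0.0450 vs $0.5625: 12.5x
```

The 12.5× figure is simply the ratio of the two rates, so it holds at any context size.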

With a 1-hour TTL, a developer who pauses for 15 minutes to review a PR, grab coffee, or read documentation comes back to a warm cache. With a 5-minute TTL, that same pause means a full context re-upload at write rates. For a Claude Sonnet 4-6 user making 68,264 API calls in March, the difference was $719.09 — a 25.9% cost premium over what February's pricing model would have produced.

The total damage across four months and two models:

- Claude Sonnet 4-6: $949.08 overpaid (17.1% waste)
- Claude Opus 4-6: $1,581.80 overpaid (17.1% waste)

These aren't theoretical numbers. They're computed from actual API responses using Anthropic's own published rates.
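The methodology behind those figures, as the issue author describes it, can be sketched in a few lines. The assumption baked in here is that every 5-minute write would have been a cheap read under the old behavior; the token volume below is hypothetical:

```python
# Sketch of the overpayment methodology described in the issue: assume every
# 5-minute cache write would have been a cheap cache read under the old
# 1-hour behavior, and price the difference at the published Sonnet-tier
# rates. (The article notes below that this assumption overstates the damage.)
WRITE_RATE = 3.75  # $/MTok, 5-minute cache write
READ_RATE = 0.30   # $/MTok, cache read

def overpayment(five_minute_write_tokens: int) -> float:
    """Dollars overpaid under the all-writes-would-have-been-reads assumption."""
    return five_minute_write_tokens / 1_000_000 * (WRITE_RATE - READ_RATE)

# Hypothetical volume: 275M tokens of 5m writes works out to roughly $949.
print(round(overpayment(275_000_000), 2))
```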

But the cost story isn't even the worst part. Pro and subscription users are quota-limited, not just cost-limited — and cache creation tokens count at full rate. Users started hitting 5-hour quota exhaustion for the first time in March. No explanation was offered. The quota walls just appeared.

Anthropic's Defense

Jarred Sumner from Anthropic responded to the issue with a counterargument worth taking seriously: the change was intentional optimization, not a regression. The core claim is that Claude Code now selects TTL tier per-request based on expected cache-reuse patterns. For one-shot requests where context is used once and never revisited, a 1-hour TTL is actually more expensive — 1-hour writes cost roughly 2× base input, versus 1.25× for 5-minute writes.

In other words, Anthropic's position is that the old behavior (1-hour TTL for everything) was a blunt instrument, and the new behavior (per-request TTL selection) is smarter routing that reduces aggregate cost.
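Both sides of that argument reduce to simple arithmetic. A sketch in multiples of the base input-token rate, using the approximate multipliers above (5m write ≈ 1.25×, 1h write ≈ 2×, cache read ≈ 0.1×); the pause scenarios are hypothetical:

```python
# Both sides of the TTL argument, in multiples of the base input-token rate:
# 5m write ~= 1.25x, 1h write ~= 2x, cache read ~= 0.1x (consistent with the
# dollar rates quoted earlier). A sketch with hypothetical pause scenarios.
WRITE_5M, WRITE_1H, READ = 1.25, 2.0, 0.1

def relative_cost(ttl: str, pause_minutes: float) -> float:
    """Write context once, pause, then resume: total cost per cached token,
    as a multiple of the base input rate."""
    write = WRITE_5M if ttl == "5m" else WRITE_1H
    window = 5 if ttl == "5m" else 60  # minutes before the cache expires
    if pause_minutes <= window:
        return write + READ   # cache still warm: cheap read on resume
    return write + write      # cache expired: pay the write rate again

# One-shot request (no resume): the 5m tier is strictly cheaper (1.25x vs
# 2.0x), which is Anthropic's point. A 15-minute coffee break: the 1h tier
# wins, which is the issue author's point.
print(relative_cost("5m", 15))  # 2.5
print(relative_cost("1h", 15))  # 2.1
```

The crossover is exactly the workflow dispute: pauses under 5 minutes or truly one-shot contexts favor the new behavior, anything between 5 minutes and an hour favors the old one.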

There's a reasonable case that Anthropic is right about the economics in aggregate — and completely wrong about how they handled the rollout.

The issue author's cost tables do assume that all 5-minute writes would have been cheap cache reads under 1-hour TTL, which overstates the damage. Some of those writes genuinely were one-shot contexts. But even if you discount the numbers by 30-40%, you're still looking at hundreds of dollars in unexpected costs for a single moderate-usage developer.

Anthropic also acknowledged a real bug: v2.1.90 fixed a client-side issue where sessions that had exhausted quota could get stuck on 5-minute TTL indefinitely. That's a genuine fix. But it also means that for weeks, quota-exhausted sessions were being penalized by a bug that compounded the already-painful TTL change.

What This Means for Your Stack

If you're building on Claude's API, you need to audit your cache behavior now. The data extraction methodology is straightforward — parse your session JSONL files and look for `usage.cache_creation.ephemeral_5m_input_tokens` versus `ephemeral_1h_input_tokens` fields. The tool the issue author built (`cnighswonger/claude-code-cache-fix`) automates this analysis.
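A minimal sketch of that audit, assuming the field names reported in the issue; the exact JSONL nesting (top-level `usage` vs. `message.usage`) may differ across Claude Code versions, so verify against your own files:

```python
# Sketch of the cache-tier audit described above: walk Claude Code session
# logs and tally 5-minute vs. 1-hour cache-creation tokens. Field names and
# the JSONL layout follow the issue's description and are assumptions, not a
# documented schema.
import glob
import json
import os

FIELDS = ("ephemeral_5m_input_tokens", "ephemeral_1h_input_tokens")

def tally_cache_tiers(root: str = "~/.claude/projects") -> dict:
    totals = dict.fromkeys(FIELDS, 0)
    pattern = os.path.join(os.path.expanduser(root), "**", "*.jsonl")
    for path in glob.glob(pattern, recursive=True):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed or partial lines
                # Tolerate either nesting for the usage object.
                usage = record.get("usage") or record.get("message", {}).get("usage", {})
                creation = usage.get("cache_creation", {})
                for field in FIELDS:
                    totals[field] += creation.get(field, 0) or 0
    return totals

print(tally_cache_tiers())
```

A high 5-minute share relative to your February baseline is the signal the issue author's data showed.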

Practical mitigations until Anthropic offers a configurable TTL:

- Keep sessions focused. One task per session reduces the chance of cache expiration mid-workflow. If you're pausing for more than 5 minutes, expect a full context re-upload.
- Front-load your context. Put your highest-value context (system prompts, CLAUDE.md, project conventions) at the top of every session. If you're paying cache-write rates, spend them on content that earns its weight across the session.
- Monitor your token breakdown. Add cache-tier tracking to your API cost dashboards. The 5m/1h split is visible in API responses — there's no reason to fly blind.
- Budget for the new reality. If your February costs were your baseline, add 15-25% for the same usage patterns going forward.
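For the monitoring point, a minimal helper along these lines is enough to put on a dashboard. The field names are the ones cited in the issue and should be treated as assumptions:

```python
# Minimal cache-tier monitoring helper: given a `usage` object from an API
# response, report the 5-minute vs. 1-hour cache-write split. Field names
# follow the issue's description and are assumptions, not a documented spec.
def cache_tier_split(usage: dict) -> dict:
    creation = usage.get("cache_creation", {})
    five_m = creation.get("ephemeral_5m_input_tokens", 0)
    one_h = creation.get("ephemeral_1h_input_tokens", 0)
    total = five_m + one_h
    return {
        "5m_tokens": five_m,
        "1h_tokens": one_h,
        "5m_share": five_m / total if total else 0.0,
    }

# Example: a response that wrote 80k tokens at 5m TTL and 20k at 1h TTL.
split = cache_tier_split({"cache_creation": {
    "ephemeral_5m_input_tokens": 80_000,
    "ephemeral_1h_input_tokens": 20_000,
}})
print(split["5m_share"])  # 0.8
```

Alerting when the 5-minute share jumps above your historical baseline would have caught this change on day one.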

The deeper lesson: any cost-sensitive application built on a third-party API needs observability into billing-tier changes, because the provider's changelog won't always tell you.

The Trust Problem

Anthropic asked the community to trust that per-request TTL optimization is better for users in aggregate. Maybe it is. But "trust us, this is cheaper" is a hard sell when the change was invisible, the cost increase was measurable, and the first explanation came from a GitHub issue comment — not a blog post, not a changelog, not an email to affected accounts.

This isn't unique to Anthropic. AWS has a long history of pricing changes that technically improve aggregate economics while hurting specific usage patterns. Google Cloud has deprecated APIs with 90 days notice that broke production systems. The pattern is familiar: provider optimizes for the median user, power users absorb the variance.

The difference is that those providers usually *tell you*. A silent change to billing-critical infrastructure, discovered only through forensic log analysis, is a trust deficit that no amount of economic justification can fully cover.

Looking Ahead

Anthropic deferred the question of whether cache-read quota weighting will be disclosed (issue #45756), and declined to expose a user-configurable TTL setting. Both of those decisions deserve scrutiny. If the per-request optimization is genuinely better, making it transparent costs Anthropic nothing and buys significant goodwill. If it's not — well, that's exactly why users are asking for the knob.

The AI infrastructure market is moving fast enough that developers will tolerate a lot of rough edges. But they won't tolerate surprise invoices. Anthropic has a window to turn this into a trust-building moment: publish the TTL selection logic, expose the configuration, and commit to changelogs for billing-impacting changes. The alternative is that every future cost anomaly gets treated as adversarial until proven otherwise.

Hacker News · 504 pts · 392 comments

Anthropic silently downgraded cache TTL from 1h → 5M on March 6th

→ read on Hacker News
sunaurus · Hacker News

Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes. I keep getting the sense that people feel like they have no idea if they ar…

foofloobar · Hacker News

Claude Code and the subscription are now less useful than a few months ago. Claude Code and the service seem to pick up more and more issues as time goes by: more bugs, fast quota drain, reduced quota, poor model performance, cache invalidation problems, MCP related bugs, potential model quantizatio…

cassianoleal · Hacker News

The title should be changed. It makes it look like they upped the TTL from 1 h to 5 months. The SI symbol for minutes is "min", not "M". A compromise would be to use the OP notation "m".

albert_e · Hacker News

So a side effect of this is -- even at 1 hour caching -- ... If you run out of session quota too quickly and need to wait more than an hour to resume your work ... you are paying even more penalty just to resume your work -- a penalty you wouldn't have needed if session quota was not so restrictive in…

disillusioned · Hacker News

It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/ Seeing some things about how the effort selector isn't working as intended necessarily and the model is regressing in other ways: over-emphasizing how "difficult…
