Why Your Claude Code Max Quota Burns Out in 90 Minutes

5 min read · 1 source · explainer
├── "Background sessions silently draining quota is the core usability failure"
│  ├── @molu0219 (GitHub Issues) → read

Filed the original issue with detailed token accounting showing that three background sessions — left open but not actively used — racked up 691 API calls and 103.9 million cache_read tokens, consuming 78% of the post-reset quota. Demonstrates that the actual coding session (222 calls) was moderate, and the drain came from sessions users reasonably assumed were idle.

│  └── @cmaster11 (Hacker News, 692 pts) → view

Submitted the issue to Hacker News where it drew 692 points and 613 comments, signaling broad community recognition that the background session behavior is unexpected. The framing — 'moderate usage' exhausting a $100/month plan in 90 minutes — highlights the gap between advertised plan value and actual experience.

├── "Prompt cache expiry on stale sessions is expected behavior, not a bug — the problem is transparency"
│  ├── @bcherny (Claude Code team) (GitHub Issues) → view

Confirmed that when a session goes stale (idle for over an hour), the prompt cache expires, and resuming triggers a full cache miss — sending up to 966,000 tokens as new cache creation in a single call. Framed this as the inherent physics of prompt caching with large context windows rather than a defect, while acknowledging users aren't equipped to reason about these costs.

│  └── top10.dev editorial (top10.dev) → read below

Argues the story exposes a fundamental mismatch between how developers think LLM coding tools consume resources and how they actually do. The 1M context window is a headline feature, but each API call against a near-max context session is enormously expensive — a cost model users have no intuition for and no tooling to monitor.

├── "Anthropic's issue triage and community response was dismissive and tone-deaf"
│  └── @GitHub community (91 downvoters) (GitHub Issues) → view

The issue was auto-flagged as a duplicate and nearly closed, earning 91 downvote reactions on the duplicate detection comment — effectively a community revolt. This forced the issue back open for proper investigation, suggesting Anthropic's automated triage is inadequate for high-signal user reports about billing and quota behavior.

└── "Subscription pricing for LLM tools is fundamentally misaligned with token-based cost structures"
  └── top10.dev editorial (top10.dev) → read below

Notes that users on plans as expensive as Max 20x ($200/month) report similar quota exhaustion, indicating the problem scales across tiers. The flat subscription model sets an expectation of predictable usage, but the underlying token economics — especially cache misses on large context windows — create wildly unpredictable consumption that no tier of subscription adequately covers.

What happened

On April 9, a Claude Code user filed [issue #45756](https://github.com/anthropics/claude-code/issues/45756) documenting something that immediately resonated with the community: their Pro Max 5x plan — Anthropic's $100/month tier with Opus access — burned through its entire quota in 90 minutes of what they described as "moderate usage." The issue drew 119 upvote reactions and 22 comments within four days, with multiple users on plans as expensive as Max 20x ($200/month) reporting similar experiences.

The original reporter, @molu0219, provided granular token accounting. During a 1.5-hour evening session using claude-opus-4-6 on WSL2, their main session made just 222 API calls with a peak context of 182,302 tokens. That alone shouldn't drain a Max plan. But three background sessions — left open but not actively used — racked up 691 additional calls and 103.9 million cache_read tokens. Background sessions that users thought were idle consumed 78% of the post-reset quota window.
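
The arithmetic is stark even before any quota weighting. A quick sketch using the call counts as reported in the issue (the 78% figure is Anthropic-side quota accounting; this only looks at raw calls):

```python
# Call counts as reported in issue #45756.
main_calls = 222        # the actual coding session
background_calls = 691  # three sessions left open but untouched

idle_share = background_calls / (main_calls + background_calls)
print(f"{idle_share:.0%} of all API calls came from idle sessions")  # 76%
```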

The issue was initially auto-flagged as a duplicate and nearly closed, earning 91 downvote reactions on the duplicate detection comment — a community revolt that forced it back open for proper investigation.

Why it matters

This story matters because it exposes a fundamental mismatch between how developers *think* LLM coding tools consume resources and how they *actually* do.

### The 1M context window tax

Claude Code's 1M token context window is a headline feature. But each API call against a session at near-max context is enormously expensive. When a session goes stale (idle for over an hour), Claude Code's prompt cache expires, and resuming that session triggers a full cache miss — sending up to 966,000 tokens as new cache creation in a single call. That's not a bug; it's the physics of how prompt caching works with large context windows. But it's a physics that users aren't equipped to reason about.
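
To put rough numbers on that physics: using Anthropic's published API cache multipliers (cache reads at about 0.1x the base input rate, 1-hour cache writes at about 2x), a cold resume of a 966k-token session runs around 20x the price-equivalent of the same call against a warm cache. How those multipliers map onto subscription quota is exactly the opacity at issue, so treat this as an assumption-laden sketch:

```python
# Price-equivalent tokens for one call against a 966k-token context,
# using Anthropic's published API cache multipliers. Quota weighting
# inside the Max plans may differ -- that's the transparency gap.
CONTEXT = 966_000
CACHE_READ_MULT = 0.1      # warm call: read the context back from cache
CACHE_WRITE_1H_MULT = 2.0  # cold call: rewrite it all to the 1-hour cache

warm = CONTEXT * CACHE_READ_MULT
cold = CONTEXT * CACHE_WRITE_1H_MULT
print(f"cold resume ~ {cold / warm:.0f}x a warm call")  # 20x
```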

@bcherny from the Claude Code team confirmed this in a pinned response: "Prompt cache misses when using 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss."

### The ghost session problem

The more insidious finding is the background session drain. @molu0219's three idle sessions — named things like "token-analysis" and "career-ops" — weren't being actively used but continued making API calls. The token-analysis session alone made 296 calls consuming 57.6 million cache_read tokens. Developers who open multiple Claude Code terminals (a common workflow when juggling projects) are unknowingly multiplying their quota burn rate.

### Community forensics settle the cache debate

The original hypothesis was that cache_read tokens might be counted at full rate against quota rather than the expected discounted rate. This would have been a pricing scandal. A community member, @cnighswonger, built a custom interceptor (claude-code-cache-fix) and analyzed 1,500+ API calls across six quota reset windows to test three models: cache_read at 0.0x, 0.1x, and 1.0x. The statistical winner, with a coefficient of variation of just 34.4% across windows, was the 0.0x model — cache_read tokens don't meaningfully count toward quota at all.
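
The shape of that analysis is easy to reproduce. If quota per reset window is roughly constant, the correct cache_read multiplier is whichever one makes effective token totals roughly constant too, i.e. minimizes the coefficient of variation. A minimal sketch with made-up per-window tallies (the real dataset was 1,500+ intercepted calls):

```python
import statistics

# Hypothetical per-window token tallies, for illustration only.
# "other" = everything that plausibly counts; "cache_read" = the disputed bucket.
windows = [
    {"other": 2.00e6, "cache_read": 5e6},
    {"other": 2.10e6, "cache_read": 60e6},
    {"other": 1.90e6, "cache_read": 12e6},
    {"other": 2.05e6, "cache_read": 103e6},
    {"other": 1.95e6, "cache_read": 30e6},
    {"other": 2.00e6, "cache_read": 1e6},
]

def cv(mult):
    """Coefficient of variation of effective tokens per window under a multiplier."""
    totals = [w["other"] + mult * w["cache_read"] for w in windows]
    return statistics.stdev(totals) / statistics.mean(totals)

best = min([0.0, 0.1, 1.0], key=cv)
print(best)  # the 0.0x model wins: cache reads don't count toward quota
```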

This is actually consistent with Anthropic's published pricing. But it deepened the mystery: if cache reads are free, why does quota drain so fast? The answer came from the interaction between cache *creation* (which does count, at full rate) and cache *misses* (which force re-creation). Every time you resume a stale session with a large context, you're paying full price to rebuild the cache — and with a 1M context window, that single cache creation event can consume a meaningful fraction of your hourly quota.

### The pricing transparency gap

What makes this particularly frustrating for users is the opacity. Claude Code doesn't surface real-time quota consumption, doesn't warn when background sessions are burning tokens, and doesn't alert users when they're about to resume a session that will trigger an expensive cache miss. The token accounting in @molu0219's report required manual API call logging and spreadsheet analysis — tooling that Anthropic itself should be providing.
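
For now, that tooling is DIY. A minimal sketch of a per-session ledger, assuming you can intercept each response's `usage` object (field names follow the Messages API; the session name and numbers are hypothetical):

```python
from collections import Counter

ledger = Counter()

def record(session, usage):
    # Field names follow the Messages API usage object.
    for field in ("input_tokens", "output_tokens",
                  "cache_creation_input_tokens", "cache_read_input_tokens"):
        ledger[(session, field)] += usage.get(field, 0)

# Hypothetical calls from one idle session.
record("token-analysis", {"input_tokens": 1_200, "cache_read_input_tokens": 180_000})
record("token-analysis", {"input_tokens": 900, "cache_creation_input_tokens": 966_000})
print(ledger[("token-analysis", "cache_creation_input_tokens")])  # 966000
```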

The 119 upvotes on this issue aren't from people who don't understand token pricing. They're from paying customers on Anthropic's most expensive consumer plans who feel like they're flying blind.

What this means for your stack

Close your idle sessions. This is the single highest-leverage change. If you have multiple Claude Code terminals open across projects, each one is potentially making background API calls. Close what you're not actively using. The cost of `cd`-ing into a different project directory is trivial compared to the cost of a ghost session draining your quota.

Use `/clear` before resuming work after breaks. If you step away from a Claude Code session for more than an hour, run `/clear` before continuing. Yes, you lose context. But the alternative is a cache miss that sends your entire context window — potentially 966k tokens — as a fresh cache creation event. The `/clear` command is now the most cost-effective keystroke in Claude Code.

Consider whether you actually need 1M context. Anthropic is reportedly considering defaulting new sessions to 400k instead of 1M. You can get ahead of this by being intentional about context size. Most coding tasks don't need 1M tokens of context, and the quota math changes dramatically at smaller windows: a full miss re-sends the whole window, so a cache miss on a 400k context session is roughly 2.4x cheaper than one on a 966k session.
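
The 2.4x figure is just the ratio of context sizes, since a full miss re-sends the entire window:

```python
full_miss = 966_000    # tokens re-sent on a cold resume at near-max 1M context
capped_miss = 400_000  # the same event under a 400k default
print(f"{full_miss / capped_miss:.1f}x")  # 2.4x
```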

If you're evaluating AI coding tools for a team, factor in quota predictability alongside raw capability. A tool that's 20% less capable but has transparent, predictable resource consumption may deliver more net value than one that's brilliant for 90 minutes and then locked out.

Looking ahead

Anthropic's response — shipping UX nudges and exploring a context window default change — is the right direction but insufficient. The real fix is quota transparency: a real-time token budget dashboard, background session warnings, and cache miss cost previews before they happen. The community has already built third-party interceptors to get this visibility. When your paying users have to reverse-engineer your quota system with statistical analysis and custom proxies, the product has a legibility problem. The 1M context window is a powerful feature, but power without a gauge is just a way to run out of gas faster.

Hacker News · 715 pts · 635 comments

Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage

→ read on Hacker News
bcherny · Hacker News

Hey all, Boris from the Claude Code team here. We've been investigating these reports, and a few of the top issues we've found are: 1. Prompt cache misses when using 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss.

chandureddyvari · Hacker News

Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone. I ended up buying the $100…

SkyPuncher · Hacker News

I skimmed the issue. No wonder Anthropic closes these tickets out without much action. That's just a wall of AI garbage. Here's what I've done to mostly fix my usage issues: * Turn on max thinking on every session. It saves tokens overall because I'm not correcting it or having it waste energy on dead…

geeky4qwerty · Hacker News

I'm afraid the music may be slowly fading at this party, and the lights will soon be turned on. We may very well look back on the last couple years as the golden era of subsidized GenAI compute. For those not in the Google Gemini/Antigravity sphere, over the last month or so that community…

jameson · Hacker News

I'm noticing a fair amount of degradation in Claude's infrastructure recently, and it makes me wonder why they can't use Claude to identify or fix these issues in advance? It seems counterintuitive given Anthropic's message that Claude uncovered bugs in open source projects.* [*] https://…
