Why Your Claude Code Max Quota Burns Out in 90 Minutes

5 min read · 1 source · explainer
├── "Background sessions silently draining quota is the core usability failure"
│  ├── @molu0219 (GitHub Issues) → read

Filed the original issue with detailed token accounting showing that three background sessions — left open but not actively used — racked up 691 API calls and 103.9 million cache_read tokens, consuming 78% of the post-reset quota. Demonstrates that the actual coding session (222 calls) was moderate, and the drain came from sessions users reasonably assumed were idle.

│  └── @cmaster11 (Hacker News, 692 pts) → view

Submitted the issue to Hacker News where it drew 692 points and 613 comments, signaling broad community recognition that the background session behavior is unexpected. The framing — 'moderate usage' exhausting a $100/month plan in 90 minutes — highlights the gap between advertised plan value and actual experience.

├── "Prompt cache expiry on stale sessions is expected behavior, not a bug — the problem is transparency"
│  ├── @bcherny (Claude Code team) (GitHub Issues) → view

Confirmed that when a session goes stale (idle for over an hour), the prompt cache expires, and resuming triggers a full cache miss — sending up to 966,000 tokens as new cache creation in a single call. Framed this as the inherent physics of prompt caching with large context windows rather than a defect, while acknowledging users aren't equipped to reason about these costs.

│  └── top10.dev editorial (top10.dev) → read below

Argues the story exposes a fundamental mismatch between how developers think LLM coding tools consume resources and how they actually do. The 1M context window is a headline feature, but each API call against a near-max context session is enormously expensive — a cost model users have no intuition for and no tooling to monitor.

├── "Anthropic's issue triage and community response was dismissive and tone-deaf"
│  └── @GitHub community (91 downvoters) (GitHub Issues) → view

The issue was auto-flagged as a duplicate and nearly closed, earning 91 downvote reactions on the duplicate detection comment — effectively a community revolt. This forced the issue back open for proper investigation, suggesting Anthropic's automated triage is inadequate for high-signal user reports about billing and quota behavior.

└── "Subscription pricing for LLM tools is fundamentally misaligned with token-based cost structures"
  └── top10.dev editorial (top10.dev) → read below

Notes that users on plans as expensive as Max 20x ($200/month) report similar quota exhaustion, indicating the problem scales across tiers. The flat subscription model sets an expectation of predictable usage, but the underlying token economics — especially cache misses on large context windows — create wildly unpredictable consumption that no tier of subscription adequately covers.

What happened

On April 9, a Claude Code user filed [issue #45756](https://github.com/anthropics/claude-code/issues/45756) documenting something that immediately resonated with the community: their Pro Max 5x plan — Anthropic's $100/month tier with Opus access — burned through its entire quota in 90 minutes of what they described as "moderate usage." The issue drew 119 upvote reactions and 22 comments within four days, with multiple users on plans as expensive as Max 20x ($200/month) reporting similar experiences.

The original reporter, @molu0219, provided granular token accounting. During a 1.5-hour evening session using claude-opus-4-6 on WSL2, their main session made just 222 API calls with a peak context of 182,302 tokens. That alone shouldn't drain a Max plan. But three background sessions — left open but not actively used — racked up 691 additional calls and 103.9 million cache_read tokens. Background sessions that users thought were idle consumed 78% of the post-reset quota window.
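
The arithmetic is stark even before any quota weighting. A quick sketch using the call counts as reported in the issue (the 78% figure is Anthropic-side quota accounting; this only looks at raw calls):

```python
# Call counts as reported in issue #45756.
main_calls = 222        # the actual coding session
background_calls = 691  # three sessions left open but untouched

idle_share = background_calls / (main_calls + background_calls)
print(f"{idle_share:.0%} of all API calls came from idle sessions")  # 76%
```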

The issue was initially auto-flagged as a duplicate and nearly closed, earning 91 downvote reactions on the duplicate detection comment — a community revolt that forced it back open for proper investigation.

Why it matters

This story matters because it exposes a fundamental mismatch between how developers *think* LLM coding tools consume resources and how they *actually* do.

### The 1M context window tax

Claude Code's 1M token context window is a headline feature. But each API call against a session at near-max context is enormously expensive. When a session goes stale (idle for over an hour), Claude Code's prompt cache expires, and resuming that session triggers a full cache miss — sending up to 966,000 tokens as new cache creation in a single call. That's not a bug; it's the physics of how prompt caching works with large context windows. But it's a physics that users aren't equipped to reason about.
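
To put rough numbers on that physics: using Anthropic's published API cache multipliers (cache reads at about 0.1x the base input rate, 1-hour cache writes at about 2x), a cold resume of a 966k-token session runs around 20x the price-equivalent of the same call against a warm cache. How those multipliers map onto subscription quota is exactly the opacity at issue, so treat this as an assumption-laden sketch:

```python
# Price-equivalent tokens for one call against a 966k-token context,
# using Anthropic's published API cache multipliers. Quota weighting
# inside the Max plans may differ -- that's the transparency gap.
CONTEXT = 966_000
CACHE_READ_MULT = 0.1      # warm call: read the context back from cache
CACHE_WRITE_1H_MULT = 2.0  # cold call: rewrite it all to the 1-hour cache

warm = CONTEXT * CACHE_READ_MULT
cold = CONTEXT * CACHE_WRITE_1H_MULT
print(f"cold resume ~ {cold / warm:.0f}x a warm call")  # 20x
```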

@bcherny from the Claude Code team confirmed this in a pinned response: "Prompt cache misses when using 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss."

### The ghost session problem

The more insidious finding is the background session drain. @molu0219's three idle sessions — named things like "token-analysis" and "career-ops" — weren't being actively used but continued making API calls. The token-analysis session alone made 296 calls consuming 57.6 million cache_read tokens. Developers who open multiple Claude Code terminals (a common workflow when juggling projects) are unknowingly multiplying their quota burn rate.

### Community forensics settle the cache debate

The original hypothesis was that cache_read tokens might be counted at full rate against quota rather than the expected discounted rate. This would have been a pricing scandal. A community member, @cnighswonger, built a custom interceptor (claude-code-cache-fix) and analyzed 1,500+ API calls across six quota reset windows to test three models: cache_read at 0.0x, 0.1x, and 1.0x. The statistical winner, with a coefficient of variation of just 34.4% across windows, was the 0.0x model — cache_read tokens don't meaningfully count toward quota at all.
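
The shape of that analysis is easy to reproduce. If quota per reset window is roughly constant, the correct cache_read multiplier is whichever one makes effective token totals roughly constant too, i.e. minimizes the coefficient of variation. A minimal sketch with made-up per-window tallies (the real dataset was 1,500+ intercepted calls):

```python
import statistics

# Hypothetical per-window token tallies, for illustration only.
# "other" = everything that plausibly counts; "cache_read" = the disputed bucket.
windows = [
    {"other": 2.00e6, "cache_read": 5e6},
    {"other": 2.10e6, "cache_read": 60e6},
    {"other": 1.90e6, "cache_read": 12e6},
    {"other": 2.05e6, "cache_read": 103e6},
    {"other": 1.95e6, "cache_read": 30e6},
    {"other": 2.00e6, "cache_read": 1e6},
]

def cv(mult):
    """Coefficient of variation of effective tokens per window under a multiplier."""
    totals = [w["other"] + mult * w["cache_read"] for w in windows]
    return statistics.stdev(totals) / statistics.mean(totals)

best = min([0.0, 0.1, 1.0], key=cv)
print(best)  # the 0.0x model wins: cache reads don't count toward quota
```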

This is actually consistent with Anthropic's published pricing. But it deepened the mystery: if cache reads are free, why does quota drain so fast? The answer came from the interaction between cache *creation* (which does count, at full rate) and cache *misses* (which force re-creation). Every time you resume a stale session with a large context, you're paying full price to rebuild the cache — and with a 1M context window, that single cache creation event can consume a meaningful fraction of your hourly quota.

### The pricing transparency gap

What makes this particularly frustrating for users is the opacity. Claude Code doesn't surface real-time quota consumption, doesn't warn when background sessions are burning tokens, and doesn't alert users when they're about to resume a session that will trigger an expensive cache miss. The token accounting in @molu0219's report required manual API call logging and spreadsheet analysis — tooling that Anthropic itself should be providing.
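
For now, that tooling is DIY. A minimal sketch of a per-session ledger, assuming you can intercept each response's `usage` object (field names follow the Messages API; the session name and numbers are hypothetical):

```python
from collections import Counter

ledger = Counter()

def record(session, usage):
    # Field names follow the Messages API usage object.
    for field in ("input_tokens", "output_tokens",
                  "cache_creation_input_tokens", "cache_read_input_tokens"):
        ledger[(session, field)] += usage.get(field, 0)

# Hypothetical calls from one idle session.
record("token-analysis", {"input_tokens": 1_200, "cache_read_input_tokens": 180_000})
record("token-analysis", {"input_tokens": 900, "cache_creation_input_tokens": 966_000})
print(ledger[("token-analysis", "cache_creation_input_tokens")])  # 966000
```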

The 119 upvotes on this issue aren't from people who don't understand token pricing. They're from paying customers on Anthropic's most expensive consumer plans who feel like they're flying blind.

What this means for your stack

Close your idle sessions. This is the single highest-leverage change. If you have multiple Claude Code terminals open across projects, each one is potentially making background API calls. Close what you're not actively using. The cost of `cd`-ing into a different project directory is trivial compared to the cost of a ghost session draining your quota.

Use `/clear` before resuming work after breaks. If you step away from a Claude Code session for more than an hour, run `/clear` before continuing. Yes, you lose context. But the alternative is a cache miss that sends your entire context window — potentially 966k tokens — as a fresh cache creation event. The `/clear` command is now the most cost-effective keystroke in Claude Code.

Consider whether you actually need 1M context. Anthropic is reportedly considering defaulting new sessions to 400k instead of 1M. You can get ahead of this by being intentional about context size. Most coding tasks don't need 1M tokens of context, and the quota math changes dramatically at smaller windows: a full miss re-sends the whole window, so a cache miss on a 400k context session is roughly 2.4x cheaper than one on a 966k session.
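
The 2.4x figure is just the ratio of context sizes, since a full miss re-sends the entire window:

```python
full_miss = 966_000    # tokens re-sent on a cold resume at near-max 1M context
capped_miss = 400_000  # the same event under a 400k default
print(f"{full_miss / capped_miss:.1f}x")  # 2.4x
```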

If you're evaluating AI coding tools for a team, factor in quota predictability alongside raw capability. A tool that's 20% less capable but has transparent, predictable resource consumption may deliver more net value than one that's brilliant for 90 minutes and then locked out.

Looking ahead

Anthropic's response — shipping UX nudges and exploring a context window default change — is the right direction but insufficient. The real fix is quota transparency: a real-time token budget dashboard, background session warnings, and cache miss cost previews before they happen. The community has already built third-party interceptors to get this visibility. When your paying users have to reverse-engineer your quota system with statistical analysis and custom proxies, the product has a legibility problem. The 1M context window is a powerful feature, but power without a gauge is just a way to run out of gas faster.

Hacker News · 715 pts · 635 comments

Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage

→ read on Hacker News
bcherny · Hacker News

Hey all, Boris from the Claude Code team here. We've been investigating these reports, and a few of the top issues we've found are: 1. Prompt cache misses when using 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss.

chandureddyvari · Hacker News

Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone. I ended up buying the $100…

SkyPuncher · Hacker News

I skimmed the issue. No wonder Anthropic closes these tickets out without much action. That's just a wall of AI garbage. Here's what I've done to mostly fix my usage issues: * Turn on max thinking on every session. It saves tokens overall because I'm not correcting it or having it waste energy on dead…

geeky4qwerty · Hacker News

I'm afraid the music may be slowly fading at this party, and the lights will soon be turned on. We may very well look back on the last couple years as the golden era of subsidized GenAI compute. For those not in the Google Gemini/Antigravity sphere, over the last month or so that community…

jameson · Hacker News

I'm noticing a fair amount of degradation in Claude's infrastructure recently, and it makes me wonder why they can't use Claude to identify or fix these issues in advance? It seems counterintuitive given Anthropic's message that Claude uncovered bugs in open source projects.* [*] https://…
