Uber's COO Says the AI Token Bill No Longer Pencils Out

What happened

In a Business Insider piece picked up on Hacker News (153 points and climbing), Uber COO Andrew Macdonald said the quiet part out loud: the company is finding it harder to justify the money it spends on AI inference. The internal slang, per the article, is *tokenmaxxing* — the practice of throwing more context, more retries, more chain-of-thought, and more agentic loops at every problem because tokens are cheap until, suddenly, they aren't.

Macdonald didn't announce a freeze or a pullback. He flagged a discipline problem. Uber, a company whose entire business is built on second-by-second optimization of supply, demand, and pricing, is publicly saying its AI spend has outrun its ROI model. That admission, from an operator famous for squeezing basis points out of every ride, is the most honest signal the industry has had all year about where enterprise AI economics actually sit.

The HN thread reads exactly as you'd expect. Half the top comments are FinOps leads describing the same pattern at their own shops: Claude and GPT bills that 10x'd in six months with no corresponding revenue line, engineering teams that wrap every internal tool in an agent loop because it's the new resume item, and finance teams that don't yet have the vocabulary to push back.

Why it matters

For two years the working assumption inside most engineering orgs has been: tokens are getting cheaper faster than we can spend them, so don't optimize. That assumption is breaking on three fronts at once.

First, model prices stopped falling on the frontier. Sonnet 4.5 and GPT-5-class models cost roughly what their predecessors cost, and the cheap-tier models (Haiku, Mini, Flash) are now where the price war lives — but those aren't the models powering the agentic workflows ballooning the bill. The frontier tier is where the reasoning happens, and the frontier tier is flat or up.

Second, agentic patterns multiply token usage non-linearly. A single user request that used to be one 2,000-token call is now a planner, three tool calls, a critic, and a re-plan — easily 40,000 tokens, often more. Multiply that by a few million daily requests and you don't have a feature, you have a P&L event. Uber, with its volume, hits this wall first; everyone else hits it on a lag.
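To make that multiplication concrete, here is a back-of-the-envelope sketch. The per-request token counts come from the paragraph above; the request volume (3 million per day) and the blended price ($3 per million tokens) are illustrative assumptions, not Uber's figures or any vendor's actual rate.

```python
def daily_cost_usd(tokens_per_request: int,
                   requests_per_day: int,
                   usd_per_million_tokens: float) -> float:
    """Daily spend for one feature, given a blended per-token price."""
    total_tokens = tokens_per_request * requests_per_day
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumptions for illustration only.
REQUESTS_PER_DAY = 3_000_000   # "a few million daily requests"
BLENDED_USD_PER_MTOK = 3.00    # hypothetical blended input/output price

single_call = daily_cost_usd(2_000, REQUESTS_PER_DAY, BLENDED_USD_PER_MTOK)
agent_chain = daily_cost_usd(40_000, REQUESTS_PER_DAY, BLENDED_USD_PER_MTOK)

print(f"single call: ${single_call:,.0f}/day")   # single call: $18,000/day
print(f"agent chain: ${agent_chain:,.0f}/day")   # agent chain: $360,000/day
print(f"multiplier:  {agent_chain / single_call:.0f}x")  # multiplier:  20x
```

Under these assumptions the same feature goes from about $18,000 a day to about $360,000 a day without a single new user.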

Third, the unit economics conversation is finally happening at the COO level, not the staff-engineer level. When the person who owns operating margin starts naming the line item in interviews, internal budget reviews follow within a quarter. Expect Q3 earnings calls to be full of "AI efficiency" framing — which is the polite version of "we found out our agents were running for-loops."

The community reaction on HN is worth quoting in spirit: the top-voted replies aren't defending the spend, they're swapping techniques to bring it down. Prompt caching hit rates as a KPI. Per-feature token budgets enforced at the SDK layer. Replacing reasoning models with fine-tuned small models for the 80% of traffic that doesn't need them. The fact that the discourse has shifted from "which model is smartest" to "which model is cheapest for this exact subtask" is the real story.

And there's a structural irony here. Uber spent the 2010s being the company that taught a generation of operators to ruthlessly cost-account every variable input. It is now the company telling the industry that AI vendors have, for a brief moment, escaped that scrutiny. That moment is closing.

What this means for your stack

If you ship anything that calls an LLM in production, three things should change this quarter.

Instrument before you optimize. Most teams cannot answer the question "what does a single user session cost us in tokens?" Add per-request token accounting tagged by feature, user tier, and model. If you're on Anthropic, turn on prompt caching and track cache hit rate as a first-class metric. A 70%+ hit rate on system prompts is achievable, and cache reads are billed at roughly a tenth of the base input price, so the cached portion of a prompt costs about 90% less (cache writes carry a premium, so the savings only show up on prompts that are actually reused). If you're on OpenAI, the equivalent is the discounted cached-input rate, which is applied automatically to repeated prompt prefixes; same idea, same discipline.
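A minimal sketch of that accounting, in plain Python with no vendor SDK. The names `Usage` and `TokenLedger` and the example tags are hypothetical; the four counters are assumed to map onto whatever usage fields your provider returns with each response.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (assumed to come from the API response)."""
    uncached_input: int = 0  # input tokens billed at the full rate
    cache_write: int = 0     # input tokens written to the prompt cache
    cache_read: int = 0      # input tokens served from the prompt cache
    output: int = 0


class TokenLedger:
    """Accumulates usage per (feature, user_tier, model) tag."""

    def __init__(self) -> None:
        self._totals = defaultdict(Usage)

    def record(self, feature: str, user_tier: str, model: str, usage: Usage) -> None:
        total = self._totals[(feature, user_tier, model)]
        total.uncached_input += usage.uncached_input
        total.cache_write += usage.cache_write
        total.cache_read += usage.cache_read
        total.output += usage.output

    def totals(self, key: tuple) -> Usage:
        return self._totals.get(key, Usage())

    def cache_hit_rate(self, key: tuple) -> float:
        """Share of all input tokens that were served from cache."""
        t = self.totals(key)
        total_input = t.uncached_input + t.cache_write + t.cache_read
        return t.cache_read / total_input if total_input else 0.0


ledger = TokenLedger()
key = ("support_chat", "free", "frontier-model")  # hypothetical tags
ledger.record(*key, Usage(uncached_input=200, cache_write=1800, output=300))
ledger.record(*key, Usage(uncached_input=200, cache_read=1800, output=250))
print(f"cache hit rate: {ledger.cache_hit_rate(key):.0%}")
```

Keying the ledger on the full tag tuple keeps the per-feature and per-tier breakdowns cheap to compute later; in production the same counters would more likely be emitted as metrics than held in memory.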

Right-size by subtask, not by app. The default of "we standardized on Sonnet" or "we standardized on GPT-5" is the tokenmaxxing pattern Macdonald is complaining about. Classify your calls: which ones genuinely need frontier reasoning, and which ones are formatting, extraction, or classification that Haiku, Mini, or a fine-tuned 8B open model handles for a tenth of the cost? The teams winning on AI margin in 2026 aren't the ones with the smartest model. They're the ones with the most boring routing layer.
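A boring routing layer can be as small as a lookup table. The sketch below is one possible shape, not a prescribed design: the task categories follow the paragraph above, while the tier names `small-model` and `frontier-model` are placeholders for whatever models you actually deploy.

```python
from enum import Enum


class Task(Enum):
    FORMATTING = "formatting"
    EXTRACTION = "extraction"
    CLASSIFICATION = "classification"
    PLANNING = "planning"
    SUMMARIZATION = "summarization"  # deliberately left unmapped below


SMALL_TIER = "small-model"        # placeholder for a cheap or fine-tuned model
FRONTIER_TIER = "frontier-model"  # placeholder for a frontier reasoning model

ROUTES = {
    Task.FORMATTING: SMALL_TIER,
    Task.EXTRACTION: SMALL_TIER,
    Task.CLASSIFICATION: SMALL_TIER,
    Task.PLANNING: FRONTIER_TIER,
}


def route(task: Task) -> str:
    """Pick a model tier per subtask; unmapped tasks fall back to frontier."""
    return ROUTES.get(task, FRONTIER_TIER)


print(route(Task.EXTRACTION))     # small-model
print(route(Task.PLANNING))       # frontier-model
print(route(Task.SUMMARIZATION))  # frontier-model
```

Falling back to the frontier tier for unmapped tasks favors quality over cost; a stricter team might raise an error instead, so that every new call site has to be classified before it ships.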

Put a budget on the agent loop. Many agentic frameworks ship with loose iteration limits, often just a soft cap of 10 or 20 steps. In production that's a runaway train waiting to happen. Hard-cap tool calls per session, log every loop that hits the cap, and treat those logs as bug reports against your prompt design, not as "the agent is thinking hard."
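One way to enforce that cap is sketched below, independent of any particular framework. `run_agent`, `ToolBudgetExceeded`, and the `step` callable are hypothetical names: `step` is assumed to perform one tool call and return the final answer, or `None` if the agent wants to keep going.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("agent_budget")


class ToolBudgetExceeded(RuntimeError):
    """Raised when a session uses up its tool-call budget without finishing."""


def run_agent(step: Callable[[int], Optional[str]],
              max_tool_calls: int,
              session_id: str) -> str:
    """Run the loop under a hard cap; log and fail loudly when the cap is hit."""
    for call_index in range(max_tool_calls):
        answer = step(call_index)
        if answer is not None:
            return answer
    # Treat this as a bug report against the prompt design, not normal behavior.
    logger.warning("session %s hit the tool-call cap of %d",
                   session_id, max_tool_calls)
    raise ToolBudgetExceeded(
        f"session {session_id} exceeded {max_tool_calls} tool calls")


# Finishes on the third call (index 2), well under the cap of 5.
print(run_agent(lambda i: "done" if i == 2 else None,
                max_tool_calls=5, session_id="s1"))  # done
```

Raising an exception rather than returning a partial answer is a deliberate choice here: it forces the caller to decide what the user sees when the budget runs out, and it makes capped sessions easy to count.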

Looking ahead

The Macdonald quote isn't the top of the cycle — it's the moment the cycle becomes legible. Expect a wave of "AI cost transparency" tooling (think Datadog-for-tokens), expect model vendors to roll out more aggressive caching and batch discounts to keep the largest accounts from defecting, and expect open-weight models to claw back share specifically in the high-volume, low-reasoning slots where the math is most embarrassing. The companies that come out of 2026 with healthy AI margins will be the ones that, like Uber circa 2014 with driver supply, treated every token as a variable cost from day one — not the ones still measuring success in benchmark scores.
