Uber's COO Says the AI Token Bill No Longer Pencils Out

├── "Enterprise AI spending has outpaced ROI and needs financial discipline"
│  └── Andrew Macdonald (Business Insider)

Uber's COO publicly stated it is getting harder to justify the money spent on AI tokens, flagging that 'tokenmaxxing' — piling on context, retries, and agentic loops — has outrun the company's ROI model. Coming from an operator known for squeezing basis points out of every ride, this is a direct signal that enterprise AI economics are not penciling out at current usage patterns.

├── "The 'tokens are getting cheaper' assumption is breaking down"
│  └── top10.dev editorial (top10.dev) → read below

Argues the two-year working assumption that token prices fall faster than usage grows is no longer true on the frontier tier — Sonnet 4.5 and GPT-5-class models cost roughly what predecessors cost, with the price war confined to cheap-tier models that don't power the agentic workflows ballooning bills. Combined with non-linear token multiplication from agentic patterns, this inverts the 'don't optimize' default that engineering orgs have been operating under.

└── "Engineering culture and weak finance pushback are driving the overspend"
  └── @HN commenters (FinOps leads) (Hacker News, 153 pts)

Top comments describe the same pattern across many shops: Claude and GPT bills 10x'd in six months with no matching revenue, engineers wrapping every internal tool in an agent loop partly for resume value, and finance teams lacking the vocabulary to push back. The viewpoint frames the problem as organizational and cultural, not purely technical.

What happened

In a Business Insider piece picked up on Hacker News (153 points and climbing), Uber COO Andrew Macdonald said the quiet part out loud: the company is finding it harder to justify the money it spends on AI inference. The internal slang, per the article, is *tokenmaxxing* — the practice of throwing more context, more retries, more chain-of-thought, and more agentic loops at every problem because tokens are cheap until, suddenly, they aren't.

Macdonald didn't announce a freeze or a pullback. He flagged a discipline problem. Uber, a company whose entire business is built on second-by-second optimization of supply, demand, and pricing, is publicly saying its AI spend has outrun its ROI model. That admission, from an operator famous for squeezing basis points out of every ride, is the most honest signal the industry has had all year about where enterprise AI economics actually sit.

The HN thread reads exactly like you'd expect: half the top comments are FinOps leads describing the same pattern at their own shops — Claude and GPT bills that 10x'd in six months with no corresponding revenue line, engineering teams that wrap every internal tool in an agent loop because it's the new resume item, and finance teams who don't yet have the vocabulary to push back.

Why it matters

For two years the working assumption inside most engineering orgs has been: tokens are getting cheaper faster than we can spend them, so don't optimize. That assumption is breaking on three fronts at once.

First, model prices stopped falling on the frontier. Sonnet 4.5 and GPT-5-class models cost roughly what their predecessors cost, and the cheap-tier models (Haiku, Mini, Flash) are now where the price war lives — but those aren't the models powering the agentic workflows ballooning the bill. The frontier tier is where the reasoning happens, and the frontier tier is flat or up.

Second, agentic patterns multiply token usage non-linearly. A single user request that used to be one 2,000-token call is now a planner, three tool calls, a critic, and a re-plan — easily 40,000 tokens, often more. Multiply that by a few million daily requests and you don't have a feature, you have a P&L event. Uber, with its volume, hits this wall first; everyone else hits it on a lag.
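
The jump from 2,000 to roughly 40,000 tokens is easy to reproduce on paper. The sketch below is a back-of-the-envelope model, not a measurement: it assumes each of the six steps (planner, three tool calls, critic, re-plan) re-sends the whole transcript so far and appends about 1,500 tokens of its own. The function name and the 1,500-token figure are illustrative assumptions.

```python
def agent_loop_tokens(base_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Total tokens billed when every step re-sends the growing transcript."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                  # input: the whole transcript so far
        total += tokens_added_per_step    # output or tool result produced by this step
        context += tokens_added_per_step  # that output becomes input to the next step
    return total


single_call = 2_000
# Assumed shape: planner, three tool calls, critic, re-plan = 6 steps.
agentic = agent_loop_tokens(base_context=2_000, tokens_added_per_step=1_500, steps=6)
print(agentic, agentic / single_call)  # 43500 21.75
```

Under these assumptions the input side grows quadratically with the number of steps, which is why adding "just one more critic pass" costs more than the previous pass did.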

Third, the unit economics conversation is finally happening at the COO level, not the staff-engineer level. When the person who owns operating margin starts naming the line item in interviews, internal budget reviews follow within a quarter. Expect Q3 earnings calls to be full of "AI efficiency" framing — which is the polite version of "we found out our agents were running for-loops."

The community reaction on HN is worth quoting in spirit: the top-voted replies aren't defending the spend, they're swapping techniques to bring it down. Prompt caching hit rates as a KPI. Per-feature token budgets enforced at the SDK layer. Replacing reasoning models with fine-tuned small models for the 80% of traffic that doesn't need them. The fact that the discourse has shifted from "which model is smartest" to "which model is cheapest for this exact subtask" is the real story.
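
As a rough illustration of what a per-feature token budget might look like, here is a minimal in-process sketch. `FeatureBudget`, `BudgetExceeded`, and the feature names are hypothetical; a real deployment would need shared storage and a reset window, both of which are left out here.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push a feature past its token limit."""


class FeatureBudget:
    def __init__(self, limits: dict[str, int]) -> None:
        self._limits = dict(limits)
        self._spent = {name: 0 for name in limits}

    def charge(self, feature: str, tokens: int) -> int:
        """Reserve tokens for a call and return what is left for the feature."""
        if feature not in self._limits:
            raise KeyError(f"no budget configured for feature {feature!r}")
        if self._spent[feature] + tokens > self._limits[feature]:
            raise BudgetExceeded(
                f"{feature}: {self._spent[feature]} spent, "
                f"{tokens} requested, limit {self._limits[feature]}"
            )
        self._spent[feature] += tokens
        return self._limits[feature] - self._spent[feature]


budget = FeatureBudget({"support_chat": 10_000})  # hypothetical feature and limit
remaining = budget.charge("support_chat", 4_000)  # called before the model request
```

Charging before the request, using an estimated token count, means an over-budget call is refused rather than billed and regretted.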

And there's a structural irony here. Uber spent the 2010s being the company that taught a generation of operators to ruthlessly cost-account every variable input. It is now the company telling the industry that AI vendors have, for a brief moment, escaped that scrutiny. That moment is closing.

What this means for your stack

If you ship anything that calls an LLM in production, three things should change this quarter.

Instrument before you optimize. Most teams cannot answer the question "what does a single user session cost us in tokens?" Add per-request token accounting tagged by feature, user tier, and model. If you're on Anthropic, turn on prompt caching and track cache hit rate as a first-class metric — a 70%+ hit rate on system prompts is achievable and cuts input costs by 90% on cached portions. If you're on OpenAI, the equivalent is the cached input pricing tier; same idea, same discipline.
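
One way to sketch that accounting is below. It assumes you can read uncached input, cached input, and output token counts off each API response; the `Usage` fields, the `TokenLedger` class, and the model label are hypothetical names for illustration, not any vendor's SDK.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    feature: str
    user_tier: str
    model: str
    input_tokens: int         # input tokens billed at the normal rate
    cached_input_tokens: int  # input tokens served from the prompt cache
    output_tokens: int


class TokenLedger:
    def __init__(self) -> None:
        # Totals keyed by (feature, user_tier, model).
        self._totals = defaultdict(lambda: {"input": 0, "cached": 0, "output": 0})

    def record(self, u: Usage) -> None:
        bucket = self._totals[(u.feature, u.user_tier, u.model)]
        bucket["input"] += u.input_tokens
        bucket["cached"] += u.cached_input_tokens
        bucket["output"] += u.output_tokens

    def cache_hit_rate(self, feature: str) -> float:
        """Share of a feature's input tokens that came from the cache."""
        rows = [b for key, b in self._totals.items() if key[0] == feature]
        cached = sum(b["cached"] for b in rows)
        total = cached + sum(b["input"] for b in rows)
        return cached / total if total else 0.0


ledger = TokenLedger()
ledger.record(Usage("support_chat", "free", "frontier-model", 600, 1_400, 300))
print(ledger.cache_hit_rate("support_chat"))  # 0.7
```

Keying on feature, tier, and model is what lets you answer the per-session cost question later without re-instrumenting.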

Right-size by subtask, not by app. The default of "we standardized on Sonnet" or "we standardized on GPT-5" is the tokenmaxxing pattern Macdonald is complaining about. Classify your calls: which ones genuinely need frontier reasoning, and which ones are formatting, extraction, or classification that Haiku, Mini, or a fine-tuned 8B open model handles for a tenth the cost? The teams winning on AI margin in 2026 aren't the ones with the smartest model — they're the ones with the most boring routing layer.
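
A boring routing layer can be as small as a lookup table. The sketch below assumes you have already classified your calls into a handful of task types; the `Task` enum and the tier names are placeholders, so substitute whichever models your stack actually uses.

```python
from enum import Enum


class Task(Enum):
    CLASSIFICATION = "classification"
    EXTRACTION = "extraction"
    FORMATTING = "formatting"
    MULTI_STEP_REASONING = "multi_step_reasoning"


# Hypothetical tier labels, not real model identifiers.
SMALL_TIER = "small-model"
FRONTIER_TIER = "frontier-model"

ROUTES = {
    Task.CLASSIFICATION: SMALL_TIER,
    Task.EXTRACTION: SMALL_TIER,
    Task.FORMATTING: SMALL_TIER,
    Task.MULTI_STEP_REASONING: FRONTIER_TIER,
}


def pick_model(task: Task) -> str:
    """Return the cheapest tier trusted for this subtask."""
    # A task type added to the enum but not yet routed falls back to the
    # frontier tier, so new work starts correct and gets demoted later.
    return ROUTES.get(task, FRONTIER_TIER)


print(pick_model(Task.EXTRACTION))  # small-model
```

Defaulting unknown work to the expensive tier is a deliberate choice in this sketch: it keeps quality safe while the accounting data tells you which routes to demote.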

Put a budget on the agent loop. Many agentic frameworks either iterate without a hard limit or ship with a soft cap of 10 or 20 steps. In production that's a runaway train waiting to happen. Hard-cap tool calls per session, log every loop that hits the cap, and treat those logs as bug reports against your prompt design, not as "the agent is thinking hard."
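
A hard cap does not need framework support. The sketch below wraps a generic loop; `run_agent`, `StepLimitExceeded`, and the `step` callback (which returns `None` while the agent wants another turn) are assumptions made for illustration, not the API of any particular agent framework.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("agent_budget")


class StepLimitExceeded(RuntimeError):
    """Raised when a session uses up its step allowance without an answer."""


def run_agent(step: Callable[[int], Optional[str]], max_steps: int, session_id: str) -> str:
    """Call `step` until it returns a final answer or the hard cap is reached."""
    for i in range(max_steps):
        answer = step(i)  # hypothetical callback: None means "not finished yet"
        if answer is not None:
            return answer
    # Hitting the cap is logged as a defect to investigate, not as normal behavior.
    logger.warning("agent hit step cap: session=%s max_steps=%d", session_id, max_steps)
    raise StepLimitExceeded(f"session {session_id} exceeded {max_steps} steps")


# Example: a fake agent that finishes on its third turn.
result = run_agent(lambda i: "done" if i == 2 else None, max_steps=5, session_id="demo")
```

Raising instead of returning a partial answer forces the caller to decide what the user sees when the budget runs out.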

Looking ahead

The Macdonald quote isn't the top of the cycle — it's the moment the cycle becomes legible. Expect a wave of "AI cost transparency" tooling (think Datadog-for-tokens), expect model vendors to roll out more aggressive caching and batch discounts to keep the largest accounts from defecting, and expect open-weight models to claw back share specifically in the high-volume, low-reasoning slots where the math is most embarrassing. The companies that come out of 2026 with healthy AI margins will be the ones that, like Uber circa 2014 with driver supply, treated every token as a variable cost from day one — not the ones still measuring success in benchmark scores.

Hacker News 233 pts 304 comments

Uber's COO says it's getting harder to justify the money spent on AI tokenmaxxing

dmazzoni · Hacker News

I remember at Google at around 2007 - 2009, as Google was massively expanding its data centers, there was a lot of unused capacity, especially during off-hours. Any engineer could run as many jobs as they wanted at zero priority, which means the job would be first in line to be killed if a more impo

delichon · Hacker News

There is little new under the big fusion reactor in the sky. I just read a chapter in James Gleick's "The Information" about tokenmaxxing in the telegraphy industry. There used to be a big market for code books to reduce the per-character charges for sending telegrams. Compression was

FartyMcFarter · Hacker News

If any company announces that they use token consumption as an employee performance signal, for me that's close to a red flag to stay away from that company. No company with good engineering leadership should act like this is remotely a good idea.

mrkeen · Hacker News

I always used to wonder this about software stacks even prior to LLMs, but it seems more relevant now somehow: When will Uber (or your favourite company) be 'done'? They've been writing software for 16 years. They match drivers to passengers. More software isn't going to increase t

crorella · Hacker News

Tokenmaxxing makes no sense, it is akin to write extremely inefficient SQL / Spark Jobs, full of cartesian joins, ultra skewed datasets, etc, just for the sake of using as much compute / memory / IO as possible. This always happens when the metric becomes the goal, companies should nur
