Argues that the industry's 'models get cheaper forever' narrative ignores how agentic architectures invert the cost curve. As models get smarter, they get more autonomy, leading to more tool calls, longer context windows, and verification loops — pushing total tokens per useful output up faster than per-token prices come down.
Fortune's reporting frames this as especially damning because Microsoft is the lead vendor pushing Copilot and agent products — they sell the picks and shovels and are admitting the picks are heavier than the gold. For a non-trivial slice of enterprise workflows, per-task inference now exceeds the fully-loaded labor cost of a junior knowledge worker in low-cost geographies.
Submitted the Fortune story to Hacker News, where it gathered 89 points — signaling the developer community sees Microsoft's internal cost finding as a significant data point worth surfacing, given Microsoft's central position in the agentic AI rollout.
Highlights the technical mechanism: a single Copilot chat turn uses 4-8K tokens, but a multi-step agent task burns 200K-1M tokens because every tool call re-sends the full conversation history, tool schemas, and intermediate results. The cost problem isn't model pricing — it's that agentic loops fundamentally multiply token consumption per task.
Fortune reported on May 22 that Microsoft — the company most aggressively pushing Copilot and agent-based products — has run into a math problem with its own AI stack. Internally, the cost of running agentic workflows on enterprise tasks is, in several measured categories, exceeding what it would cost to pay a human employee to do the same work. The piece centers on token economics: agents don't make one inference call, they make dozens or hundreds per task, each one resending the growing context window.
The specifics matter. A single Copilot chat turn might consume 4-8K tokens. An agent executing a multi-step task — read email, summarize, draft reply, check calendar, schedule, confirm — can burn through 200K-1M tokens before it returns a final answer, because every tool call re-serializes the conversation history plus tool schemas plus intermediate results. Microsoft's finding is that for a non-trivial slice of enterprise workflows, the per-task inference bill has crossed the per-task fully-loaded labor cost of a junior knowledge worker in low-cost geographies.
This isn't a leaked memo or a whistleblower. It's the company that sells the picks and shovels acknowledging the picks are heavier than the gold.
The AI industry has been selling a very specific story for two years: models get cheaper on a Moore's-Law-like curve, agents replace humans, margins expand forever. The first half is true — GPT-4-class intelligence is ~95% cheaper than it was in 2023. The second half assumed token consumption per task would stay roughly constant. It hasn't. It's exploded.
Agentic architectures are the inverse of the cost-down story: as models get smarter, we give them more autonomy, which means more tool calls, longer context, more retries, and more verification loops — and total tokens per useful output go up faster than per-token prices come down. An Anthropic researcher put it bluntly on Twitter last month: "We made inference 100x cheaper and agents 1000x more expensive."
The Microsoft data is the first real public datapoint from an at-scale operator. It aligns with what every founder building agents already knows but rarely says out loud. Anyone who's watched a LangGraph trace knows the pattern: the agent makes a tool call, the model re-reads the entire conversation including the 40K-token system prompt, makes another call, re-reads everything plus the new tool output, repeat. Caching helps. It doesn't solve it.
The deeper issue is that human labor has a structural advantage agents don't: humans hold state in their heads for free. A junior analyst doesn't "re-read the brief" before every keystroke. An agent does. Until somebody cracks persistent, cheap working memory at the model level — and KV-cache reuse is a partial answer, not a complete one — agents will keep paying a tax that biological reasoners don't.
The community reaction on Hacker News (89 points, mostly skeptical-of-the-skeptics) split predictably. One camp: "of course, this is what we've been telling our CFOs." Another: "the curve will bend, give it 18 months." Both can be right. The curve will bend. The bend probably won't be fast enough to save the agent-replaces-everyone narrative on the timeline the 2025 funding rounds priced in.
If you're building anything agentic, your unit economics now depend more on context engineering than on which frontier model you pick. A few concrete moves that actually move the needle:
Aggressive context pruning. Most agent frameworks default to passing the full conversation history on every call. Don't. Summarize aggressively at every N steps, and feed the agent only the artifacts it provably needs for the next decision. A 90% reduction in context size is often a 90% reduction in cost with negligible accuracy loss.
Tiered model routing. Use a Haiku-class model for the planner and a Sonnet-class for the executor, not the other way around. Most of the token volume in an agent loop is mechanical routing decisions, not the actual work. Save the expensive model for the steps that need it.
Prompt caching, used correctly. Anthropic's cache cuts repeated context to 10% of cost, but only if you structure your prompts so the cacheable prefix is genuinely stable. Most teams put session-specific data in the system prompt and torch their cache hit rate without realizing it.
Replace tools with code where possible. Every tool call is a round-trip with full context resend. A deterministic function that the model writes once and reuses is dramatically cheaper than ten tool calls accomplishing the same thing. Anthropic's recent push toward code execution as a first-class agent primitive is partly cost-driven, not just capability-driven.
Measure cost per successful task, not cost per token. A cheaper model that needs three retries is more expensive than the right model first time. Your dashboard should show $/completion, not $/1K-tokens.
The Microsoft story is a milestone, not a verdict. Agent unit economics will improve — the question is whether they improve fast enough to justify the valuations being assigned to companies whose entire pitch is "we replace knowledge workers at scale." Watch for three things: per-token prices on Sonnet-class models (likely to drop another 50% in 12 months), context-window engineering primitives (Anthropic's memory tool, OpenAI's stateful Assistants, Google's implicit caching all converging here), and honest disclosure from operators about real total cost of ownership rather than the cherry-picked demo numbers. Until then, treat any pitch deck claiming sub-human cost per task with the same skepticism you'd apply to a self-driving demo in perfect weather: probably true under controlled conditions, probably not yet true in production.
The premise of this article is incorrect - MS isn't cancelling Claude code internal usage because of AI costs too much, they're cancelling it because GitHub copilot is the compete product and they want their employees to use their product.It's the same reason Teams got so much attenti
The 'tokenmaxxing' trend is probably the more inane ideas emanating out of this whole AI wave. It goes in the opposite direction of efficiency and productivity maximization. Yet, it has wide acceptance.
Literally nowhere in the article does Microsoft report AI is more expensive than paying human employees.
Burning tokens is as easy as throwing dollars in a furnace. Token usage is not a good measure of productivity. Problem is nobody has really been able to figure out how to gauge productive AI engagement. Are your developers maximizing productivity or are they burning tokens or resisting change.
Top 10 dev stories every morning at 8am UTC. AI-curated. Retro terminal HTML email.
The title seems misleading, and reading the article explains the reason more clearly. There's nonsense OKR's and objectives at these companies to burn as many tokens as possible. It turns out that when you make a metric out of token usage, it unsurprisingly ends up becoming extremely expen