Reports that Amazon employees are deliberately inflating AI token consumption through trivial prompts and unnecessary code rewrites to satisfy management tracking of Q Developer usage. The article frames this as a predictable outcome when usage metrics become part of performance conversations rather than measuring actual productivity gains.
Explicitly invokes Goodhart's Law to argue that Amazon is measuring the wrong thing entirely. The editorial distinguishes between measuring whether developers use AI tools versus whether AI tools make developers more productive, arguing the gap between these two questions is precisely where tokenmaxxing thrives.
Argues that AI assistant value varies dramatically across tasks, codebases, and developer skill levels. A senior engineer in a well-understood codebase may get marginal value while a junior developer exploring an unfamiliar API might find it transformative — blanket adoption mandates ignore this reality and punish those who have already optimized their workflows.
Frames tokenmaxxing not as employee misbehavior but as a rational response to top-down pressure. When Amazon leadership makes AI adoption a strategic priority and managers track usage frequency in performance conversations, employees predictably optimize for the metric being measured — running trivial prompts and generating unnecessary boilerplate to register consumption on dashboards.
Amazon employees have coined a term for a practice that's becoming endemic inside the company: "tokenmaxxing." The word — a mashup of AI token consumption and the internet's -maxxing suffix for obsessive optimization — describes the deliberate inflation of AI tool usage metrics to satisfy internal pressure to adopt AI coding assistants.
The pressure flows downhill. Amazon leadership has made AI adoption a strategic priority, and managers are tracking how frequently their teams use tools like Amazon Q Developer (the company's AI coding assistant, formerly CodeWhisperer). When usage metrics become part of performance conversations, employees respond rationally: they game the numbers. That means running trivial prompts, asking AI tools to rewrite code that doesn't need rewriting, and generating boilerplate through AI that could be typed faster by hand — all to register token consumption that shows up on dashboards.
The story, reported by Ars Technica and generating significant discussion on Hacker News (167+ points), captures a growing tension inside Big Tech: the gap between executive AI mandates and the messy reality of developer workflows.
This is a textbook case of Goodhart's Law: "when a measure becomes a target, it ceases to be a good measure." Amazon isn't measuring whether AI tools make developers more productive. It's measuring whether developers *use* AI tools. These are fundamentally different questions, and the gap between them is where tokenmaxxing lives.
The core problem is that AI coding tools deliver uneven value across different tasks, codebases, and developer skill levels. A senior engineer working in a well-understood codebase with strong conventions might get marginal value from an AI assistant. A junior developer exploring an unfamiliar API might find it transformative. Mandating uniform adoption ignores this reality and penalizes the engineers who've already optimized their workflows.
The practice also creates a false signal problem for leadership. If executives see rising token consumption across the org, they'll conclude their AI strategy is working. They'll double down on tooling investments, set more aggressive adoption targets, and cite internal metrics in earnings calls. Meanwhile, the actual productivity impact might be flat or even negative — developers are now spending time feeding the metrics machine instead of writing production code.
There's a deeper irony here. Amazon built its management culture on data-driven decision-making — the famous "mechanisms" and metrics that replace trust with measurement. Tokenmaxxing is what happens when that culture collides with a technology whose value resists simple quantification. You can measure tokens consumed, completions accepted, and lines of AI-generated code merged. You can't easily measure whether the developer *thought better* because the AI suggested an approach they hadn't considered, or whether they shipped a day earlier because autocomplete handled the boilerplate.
Amazon is not alone in this. Reports from multiple large tech companies suggest similar dynamics. Google has pushed Gemini integration across its development stack. Meta has mandated internal AI tool adoption. Microsoft, which owns GitHub Copilot, has every incentive to show wall-to-wall internal usage. The question is whether any of these companies are measuring the right thing.
The instinct to track AI adoption by usage volume comes from a reasonable place. Leadership needs *some* signal that expensive tool investments are reaching developers. But volume metrics create perverse incentives: they reward the developer who asks the AI to regenerate a function twelve times over the one who gets the right answer on the first prompt.
Better metrics exist, but they're harder to instrument:
- Completion acceptance rate: what percentage of AI suggestions do developers actually keep? A high acceptance rate suggests the tool is generating useful output. A low rate with high volume is a red flag.
- Time-to-merge for AI-assisted PRs: are PRs that include AI-generated code moving through review faster or slower?
- Developer satisfaction surveys: blunt but useful. Ask developers whether the tools help, and believe the answers.
- Opt-in vs. opt-out rates: if the tool is genuinely useful, you don't need to mandate it. Track organic adoption instead of forced compliance.
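To make the first of these concrete, here is a minimal sketch of how acceptance rate and raw volume can point in opposite directions. Everything in it is an assumption for illustration: the `TeamUsage` record, its field names, the sample numbers, and the thresholds are hypothetical and do not come from Amazon Q Developer or any real dashboard.

```python
from dataclasses import dataclass


@dataclass
class TeamUsage:
    """Hypothetical per-team usage record; all field names are assumptions."""
    team: str
    tokens: int        # total tokens consumed in the period
    suggestions: int   # AI suggestions shown to developers
    accepted: int      # suggestions developers actually kept


def acceptance_rate(u: TeamUsage) -> float:
    """Fraction of suggestions kept; 0.0 when nothing was suggested."""
    return u.accepted / u.suggestions if u.suggestions else 0.0


def high_volume_low_acceptance(records, min_tokens, max_rate):
    """Teams whose volume is high but whose acceptance rate is low.

    Both thresholds are illustrative and would need tuning per org.
    """
    return [u.team for u in records
            if u.tokens >= min_tokens and acceptance_rate(u) <= max_rate]


# Made-up sample data: team_a burns tokens but keeps little of the output.
records = [
    TeamUsage("team_a", tokens=900_000, suggestions=400, accepted=20),
    TeamUsage("team_b", tokens=120_000, suggestions=60, accepted=45),
    TeamUsage("team_c", tokens=0, suggestions=0, accepted=0),
]

top_by_volume = max(records, key=lambda u: u.tokens).team
top_by_rate = max(records, key=acceptance_rate).team
red_flags = high_volume_low_acceptance(records, min_tokens=500_000, max_rate=0.10)

print(top_by_volume, top_by_rate, red_flags)
```

In this made-up data, a volume dashboard ranks `team_a` first while an acceptance-rate view ranks `team_b` first and flags `team_a`. The same caveat applies here as to token counts: once acceptance rate becomes a target, it can be gamed too, so a sketch like this is better treated as a diagnostic than as a performance score.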
The uncomfortable truth for management is that the best AI adoption metric might be no metric at all. Make the tools available, make them good, remove friction from the onboarding experience, and let developers self-select. The engineers who find value will use them. The ones who don't will stop. Mandating usage doesn't change the tool's value — it just obscures the signal.
If you're a team lead or engineering manager, this is a cautionary tale about how you introduce AI tools. Rolling out Copilot, Cursor, or any AI assistant with usage targets attached will produce tokenmaxxing in your org too. The goal should be reducing friction — make the tool easy to try, easy to configure, and easy to ignore when it's not helping. Measure outcomes (velocity, bug rates, developer satisfaction), not inputs (tokens consumed).
If you're an individual contributor at a company with AI adoption mandates, the tokenmaxxing phenomenon is worth understanding even if you don't participate. Your org's AI metrics are probably inflated, which means leadership decisions based on those metrics — more tooling investment, workflow changes, hiring decisions — may be built on false signals. Be the person in the room who asks what the acceptance rate looks like, not just the consumption rate.
If you're building AI developer tools, this should inform your product metrics. Customers will ask you for usage dashboards to justify their spend. Give them usage numbers, but also give them quality signals — acceptance rates, time savings estimates, opt-in trends. Help your buyers avoid the Goodhart trap, or their developers will route around your tool the moment management stops watching.
Tokenmaxxing is a symptom, not the disease. The disease is the assumption that AI tool adoption is inherently good and that more usage equals more value. As AI coding tools mature and genuinely improve, the best evidence will be developers choosing to use them without being told to. Until then, every organization tracking AI adoption by volume is measuring its own credulity. The companies that figure out how to measure AI's *impact* instead of its *consumption* will be the ones that actually capture the productivity gains everyone is chasing.
I swear the industry is being Garry Tanned.
Senior management let go our localisation staff. Now they want us to use AI to translate. They still want manual review.
We use GitHub Copilot at work, we get a measly 300 requests with the budget to go over if necessary. Opus 4.7 or GPT 5.5 would eat all of
Saw a good joke on Twitter about it. Something like: "You spent $23, over the $20 food limit. Be more careful next time. You spent $600 on tokens, $200 more than the average. Congratulations!"
I work at Amazon (standard disclaimer: just sharing my own experience, not an official spokesperson, etc.) I can't say that this isn't happening, but at least the parts of the company I get visibility into, what the article describes isn't my experience. There is a lot of interest in u
It is damn fascinating to see just how many (big, serious) organizations are creating unnecessary internal strife over this. One of my favorite heuristics/quotes applies here: "no matter how good the strategy, occasionally consider the result." Want to know if AI is working for your org
The fact that management signed off on measuring AI use through token usage shows how incompetent management really is, including in allegedly technical companies like Amazon. Tokenmaxxing was an entirely expected and rational response. IOW You measure employees in stupid ways, you're going to