Is AI really more expensive than hiring people?

For many tasks as deployed today, yes. Fortune reported in May 2026 that Microsoft's internal reports show AI agents costing more than human employees for many tasks, and Nvidia VP Bryan Catanzaro said his team's compute costs are "far beyond the costs of the employees." The caveat: those numbers describe AI deployed under usage mandates without per-task cost measurement. Well-routed, well-measured AI on economically suitable tasks can be far cheaper than human labor. The variable is measurement, not the technology.

What is tokenmaxxing?

Running unnecessary AI tasks purely to inflate usage metrics. The term emerged when Amazon employees, ranked by an internal AI-usage leaderboard called KiroRank against a target of 80%+ of developers using AI weekly, began assigning agents meaningless work to climb the rankings, directly raising Amazon's compute costs. Amazon deprecated the leaderboard in late May 2026, with SVP Dave Treadwell telling staff not to use AI just for the sake of using AI.

Why did Amazon scrap its AI usage leaderboard?

Because the leaderboard was gamed exactly as Goodhart's law predicts: once usage became a target, employees maximized usage rather than value, running pointless agent tasks that inflated both their scores and the company's bill. In a metered system, gamed usage metrics convert directly into invoice line items, which makes usage-based KPIs uniquely expensive among vanity metrics.

What is the Tokenomics Foundation?

A Linux Foundation initiative announced June 3, 2026 to establish open standards, benchmarks, and best practices for AI infrastructure economics, in partnership with the FinOps Foundation. Initial supporters include Microsoft, Google Cloud, IBM, JPMorganChase, Oracle, Salesforce, SAP, ServiceNow, Accenture, KPMG, Booking.com, and Flexera. Its premise is that tokens have become the new unit of technology spend and need the same cost-management discipline cloud spend got from FinOps.

Will token prices keep falling?

Not uniformly: commodity tokens are getting cheaper while frontier tokens are not. Gartner projects inference on trillion-parameter models will cost roughly 90% less by 2030, but per-token prices leveled off after the 2023-2025 decline and new frontier models are pricing upward (Claude Fable 5 launched at double the price of Opus 4.8). Goldman Sachs projects token consumption growing 24x to 120 quadrillion tokens per month by 2030, so falling unit prices do not imply falling bills. Gartner's own analyst warns against confusing cheap commodity tokens with affordable frontier reasoning.

What should companies measure instead of AI usage?

Cost per task (tokens consumed times effective rates, per task, per model, per team), value per task (even crude proxies like time saved or tickets resolved), and the trend of that ratio over time. Usage volume rewards consumption; cost-per-outcome rewards efficiency. Instrumentation is incentive design: a dashboard showing tokens consumed produces tokenmaxxing, while one showing cost per resolved task produces optimization.

Tokenmaxxing: Microsoft Says AI Costs More Than Its People, Amazon Killed Its Usage Leaderboard, and the Adoption Era Just Ended

Author: UsageBox

TL;DR (June 2026): The adoption-at-all-costs era of enterprise AI ended in about three weeks. Fortune reported that Microsoft's internal reports show using AI agents is more expensive than paying human employees for many tasks, and Nvidia's VP of applied deep learning said the quiet part out loud: "For my team, the cost of compute is far beyond the costs of the employees." Amazon scrapped its internal AI usage leaderboard (KiroRank) after employees started "tokenmaxxing", running pointless agent tasks to climb the rankings while inflating the company's own compute bill. Sam Altman conceded token costs are becoming "an issue." And the Linux Foundation launched the Tokenomics Foundation, backed by Microsoft, Google Cloud, IBM, and JPMorganChase, to standardize AI cost management alongside the FinOps Foundation. The common thread: usage was the wrong metric all along. The right one is cost per outcome, and 2026 is the year everyone is forced to start measuring it.

For two years the enterprise AI playbook had one verb in it: adopt. Mandate the tools, set usage targets, rank the teams, celebrate the token counts. In May and June 2026 that playbook collapsed in public, at three of the biggest companies in the world, in three different ways, and the post-mortems all point at the same root cause. Nobody was measuring what the usage bought. Here is the reckoning in full, and the measurement discipline that replaces the leaderboard.

Act one: the bills arrive

Fortune's May 22 report put a headline on something platform teams had been whispering for months: Microsoft's internal cost reports show that for many tasks, running AI agents costs more than the humans they were meant to augment. The mechanics are not mysterious. An agentic workflow does not make one API call; it makes hundreds per task, each metered, each compounding, until the per-task total quietly crosses the per-task cost of the person. Microsoft, the company with perhaps the best visibility into enterprise AI economics anywhere, responded by canceling most of its direct Claude Code licenses after six months and consolidating engineers onto GitHub Copilot CLI, the same retreat we documented in the Tokenpocalypse.
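To make the mechanics concrete, here is a minimal sketch of the per-task arithmetic. Every number in it (call counts, token sizes, rates, the loaded hourly rate) is a hypothetical placeholder chosen for illustration, not a figure from Microsoft's reports or from any vendor's price list.

```python
# All figures below are hypothetical placeholders, not reported numbers.

def agent_cost_per_task(calls, input_tokens_per_call, output_tokens_per_call,
                        input_rate_per_mtok, output_rate_per_mtok):
    """Sum the metered cost of every API call an agent makes for one task."""
    input_cost = calls * input_tokens_per_call * input_rate_per_mtok / 1_000_000
    output_cost = calls * output_tokens_per_call * output_rate_per_mtok / 1_000_000
    return input_cost + output_cost


def human_cost_per_task(loaded_hourly_rate, minutes_per_task):
    """Loaded labor cost of a person doing the same task."""
    return loaded_hourly_rate * minutes_per_task / 60


agent = agent_cost_per_task(
    calls=300,                     # "hundreds per task"
    input_tokens_per_call=40_000,  # assumed: context is re-sent on each call
    output_tokens_per_call=1_000,
    input_rate_per_mtok=5.0,       # assumed USD per million input tokens
    output_rate_per_mtok=25.0,     # assumed USD per million output tokens
)
human = human_cost_per_task(loaded_hourly_rate=90.0, minutes_per_task=30)

print(f"agent ${agent:.2f} vs human ${human:.2f}")
```

With these assumed inputs the agent comes to $67.50 per task against $45.00 for the person. Almost all of the agent's cost is input tokens, which is why re-sent context across hundreds of calls is the first thing to look at.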

The supporting cast made it a chorus. Bryan Catanzaro, Nvidia's VP of applied deep learning, on his own team: "the cost of compute is far beyond the costs of the employees." Uber's CTO Praveen Neppalli Naga told The Information the company burned its entire 2026 AI coding-tools budget in four months. And on June 4, Sam Altman, the person who sells the tokens, admitted that token costs are becoming "an issue." When the vendor of the commodity concedes the commodity is straining its buyers, the debate about whether there is a cost problem is over.

Act two: tokenmaxxing, or Goodhart's law at compute prices

The detail that makes Uber's budget burn instructive rather than just embarrassing: the company had actively incentivized AI usage through internal leaderboards ranking teams by tool usage. Spend was not a side effect of the strategy. Spend was the scoreboard.

Amazon ran the same experiment and got the same result, faster and funnier. An employee-built dashboard called KiroRank, living inside Amazon's Kiro developer platform, ranked staff by AI usage against a corporate target of more than 80% of developers using AI weekly. Employees responded the way employees respond to every metric that determines status: they optimized it. Workers began assigning AI agents meaningless tasks purely to inflate their usage scores, a behavior the internet immediately christened tokenmaxxing, burning real compute dollars to climb a fake ladder. In late May, senior VP Dave Treadwell deprecated the dashboard ("not a formal or approved tool"), with the company line now stressing best practices over raw adoption. The plea that escaped the building: "please don't use AI just for the sake of using AI."

This is Goodhart's law, the oldest trap in management: when a measure becomes a target, it stops being a measure. But there is a twist that makes the AI version uniquely expensive. A gamed OKR usually wastes time. A gamed usage metric in a metered system converts the gaming directly into invoice line items. Every fake task tokenmaxxed onto a leaderboard was billed at real per-token rates. Amazon's employees were, in effect, running a denial-of-budget attack on their own employer, with the employer's enthusiastic encouragement.
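A small sketch shows how the same activity log produces opposite rankings depending on which number the leaderboard displays. The log entries, the engineer names, and the blended rate are all invented for illustration; nothing here reflects how KiroRank actually computed its scores.

```python
# Hypothetical log of (engineer, tokens_used, tasks_resolved). Invented data.
RATE_PER_MTOK = 10.0  # assumed blended USD rate per million tokens

log = [
    ("alice", 2_000_000, 20),
    ("bob", 50_000_000, 5),   # heavy usage, little output: the tokenmaxxing pattern
    ("carol", 8_000_000, 40),
]


def billed(tokens):
    """Dollars invoiced for a given token count at the assumed rate."""
    return tokens * RATE_PER_MTOK / 1_000_000


def cost_per_resolved(tokens, resolved):
    """Dollars per resolved task; infinite if nothing was resolved."""
    return billed(tokens) / resolved if resolved else float("inf")


# Usage leaderboard: most tokens first.
usage_ranking = [name for name, _, _ in
                 sorted(log, key=lambda r: r[1], reverse=True)]

# Outcome leaderboard: cheapest resolved task first.
outcome_ranking = [name for name, _, _ in
                   sorted(log, key=lambda r: cost_per_resolved(r[1], r[2]))]

print(usage_ranking)
print(outcome_ranking)
```

In this made-up log, bob tops the usage ranking with a $500 bill for five resolved tasks and lands last on the outcome ranking at $100 per task, while alice moves from last to first at $1 per task.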

Act three: the institutions move in

The clearest sign a cost category has come of age is that it gets a standards body. On June 3 the Linux Foundation announced the Tokenomics Foundation, dedicated to open standards, benchmarks, and best practices for AI infrastructure economics, operating in partnership with the FinOps Foundation and unveiled to practitioners at FinOps X in San Diego the following week. The supporter list reads like the buy side of the entire problem: Microsoft, Google Cloud, IBM, JPMorganChase, Oracle, Salesforce, SAP, ServiceNow, Accenture, KPMG, Booking.com, Flexera.

The foundation's framing matches the receipts above almost word for word. Tokens have become "the new unit of technology spend." Per-token prices fell hard through 2023-2025 but have leveled off, and new frontier models are pricing upward (Fable 5 at double the price of Opus being June's exhibit A), while Goldman Sachs projects token consumption multiplying 24x by 2030 to 120 quadrillion tokens per month. Gartner's counterweight, that inference on a trillion-parameter model should cost ~90% less by 2030, comes with its own analyst's warning not to "confuse the deflation of commodity tokens with the democratization of frontier reasoning." Cheap tokens get cheaper; the tokens you actually want for hard work stay expensive, a dynamic we mapped in the subsidy question.
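The arithmetic behind "falling unit prices do not imply falling bills" fits in a few lines. This is an illustration only: Gartner's figure concerns inference on trillion-parameter models and Goldman's concerns aggregate consumption, so applying both to a single hypothetical bill is an assumption made here to show the shape of the math, not a forecast.

```python
def bill_multiplier(unit_price_change, volume_multiplier):
    """Bill relative to today, given a fractional price change and a volume multiple."""
    return (1 + unit_price_change) * volume_multiplier


# Assumed for illustration: a 90% unit price drop applied to 24x the volume.
multiplier = bill_multiplier(unit_price_change=-0.90, volume_multiplier=24)

# Volume growth at which a 90% price drop leaves the bill unchanged.
break_even_volume = 1 / (1 - 0.90)

print(f"bill multiplier: {multiplier:.1f}x")
print(f"break-even volume growth: {break_even_volume:.0f}x")
```

Under those assumptions the bill is about 2.4 times today's, and any consumption growth beyond roughly 10x outruns a 90% price cut.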

Why "AI vs the salary line" is the wrong fight, and what the right one is

The viral framing, "AI is more expensive than people!", deserves one honest caveat: it is a statement about badly measured AI, not about AI. The Microsoft reports describe a world where agents were deployed under usage mandates, with no per-task cost visibility, on workloads nobody had triaged for economic fit. Of course that loses to a salary. The same agent, routed to the cheapest capable model, fed a trimmed context, cached where repetitive, and pointed only at tasks where automation has real leverage, can be absurdly cheaper than any human alternative. The difference between the two outcomes is not the model. It is the measurement.
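The three levers named above (cheapest capable model, trimmed context, caching) can be sketched as follows. The model names, capability tiers, and rates are hypothetical, `call_model` is a stand-in the caller supplies rather than any real SDK function, and keeping the last few messages is a deliberately crude substitute for real context trimming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int       # hypothetical 1-5 tier
    rate_per_mtok: float  # assumed blended USD per million tokens


# Hypothetical catalog; real names and prices would come from your vendors.
MODELS = [
    Model("small", 1, 0.5),
    Model("medium", 3, 3.0),
    Model("frontier", 5, 15.0),
]


def route(required_capability, models=MODELS):
    """Pick the cheapest model whose capability tier meets the requirement."""
    capable = [m for m in models if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable, key=lambda m: m.rate_per_mtok)


def trim_context(messages, max_messages):
    """Keep only the most recent messages."""
    if max_messages <= 0:
        return []
    return messages[-max_messages:]


_cache = {}


def cached_call(prompt, required_capability, call_model):
    """Serve a repeated prompt from cache; otherwise route it and call the model."""
    key = (prompt, required_capability)
    if key not in _cache:
        _cache[key] = call_model(route(required_capability), prompt)
    return _cache[key]
```

A task tagged as needing tier 2 would go to "medium" here, and a repeated prompt would be billed once. Deciding the required tier per task is the hard part, and it is exactly the triage step the mandated deployments skipped.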

Which is the actual lesson of all three acts: usage volume is a vanity metric with a price tag. The replacement is unit economics, and it requires exactly three numbers per workload:

Cost per task, measured rather than estimated: tokens consumed × effective rates, broken out per task, per model, and per team (the discipline from our cost-per-task benchmark). This is the number Microsoft's reports computed late, and the number a usage leaderboard cannot see at all.
Value per task, even crudely: minutes of human time saved, tickets resolved, PRs merged. Crude beats absent. Without it, the cost number has no denominator and every budget conversation is theology.
The ratio's trend, because both sides move: models get cheaper (Gartner), workloads get heavier (Goldman's 24x), tokenizers and verbosity drift, and frontier prices step upward. A ratio that clears the bar today can sink below it in a quarter without anyone changing a line of code.
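The three numbers above can be sketched in a few functions. The rate table, the usage records, and the choice of "minutes saved" as the value proxy are all assumptions for illustration; real rates would come from your invoices and real usage from your provider's metering.

```python
# Hypothetical rates in USD per million tokens.
RATES = {
    "small": {"input": 0.5, "output": 2.0},
    "frontier": {"input": 5.0, "output": 25.0},
}


def task_cost(usage):
    """Cost of one task. usage is a list of (model, input_tokens, output_tokens)."""
    return sum(
        (inp * RATES[model]["input"] + out * RATES[model]["output"]) / 1_000_000
        for model, inp, out in usage
    )


def cost_per_unit_value(cost, value):
    """Dollars spent per unit of value, e.g. per minute of human time saved."""
    if value <= 0:
        return float("inf")
    return cost / value


def trend(ratios):
    """Change from the first period to the last; positive means getting worse."""
    return ratios[-1] - ratios[0]


# One invented task that touched two models and saved an assumed 30 minutes.
usage = [("frontier", 200_000, 10_000), ("small", 1_000_000, 50_000)]
cost = task_cost(usage)
ratio = cost_per_unit_value(cost, value=30)

print(f"cost per task: ${cost:.2f}")
print(f"cost per minute saved: ${ratio:.4f}")
```

For this invented task the cost is $1.85. Tracking `ratio` per workload across periods and feeding the series to `trend` gives the third number.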

The operational playbook follows directly. Kill usage-based KPIs and leaderboards; they are tokenmaxxing factories. Gate agent workloads on a cost-per-outcome budget, not an adoption target, with the per-engineer caps and burn-rate alerts from the FinOps playbook. Route ruthlessly: the router pattern's 45-85% savings are precisely the gap between mandated usage and measured usage. And put the meter where the incentive is: if a dashboard shows tokens consumed, someone will maximize tokens consumed; if it shows cost per resolved task, someone will minimize it. Instrumentation is incentive design.
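A cost-per-outcome gate with a cap and a burn-rate alert might look like the sketch below. The class name, the thresholds, the warm-up allowance, and the straight-line pacing rule are all assumptions made for illustration, not a description of any particular FinOps tool.

```python
from dataclasses import dataclass


@dataclass
class OutcomeBudget:
    max_cost_per_outcome: float  # assumed policy: USD ceiling per resolved task
    monthly_cap: float           # assumed policy: USD cap per engineer or team
    spent: float = 0.0
    outcomes: int = 0

    def record(self, cost, resolved):
        """Log one finished task and whether it produced an outcome."""
        self.spent += cost
        if resolved:
            self.outcomes += 1

    def cost_per_outcome(self):
        return self.spent / self.outcomes if self.outcomes else float("inf")

    def allow_next_task(self, warmup_spend=10.0):
        """Block when the cap is hit or the ratio exceeds the ceiling."""
        if self.spent >= self.monthly_cap:
            return False
        if self.spent < warmup_spend:
            return True  # too little data to judge the ratio yet
        return self.cost_per_outcome() <= self.max_cost_per_outcome


def burn_rate_alert(spent, cap, day_of_month, days_in_month=30):
    """True when spend is running ahead of a straight-line pace to the cap."""
    return spent > cap * day_of_month / days_in_month
```

The warm-up allowance exists so that a workload is not blocked before it has had a chance to resolve anything. Where to set it, and whether to gate per engineer or per team, is a policy choice the sketch leaves open.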

The honest take

It is tempting to read June 2026 as the AI backlash arriving on the balance sheet. The truer reading is less dramatic and more useful: this is what the end of the pilot phase looks like. Pilots are judged by adoption; production is judged by unit economics, and three of the world's most sophisticated technology organizations just demonstrated, expensively and in public, what happens when you carry pilot-phase metrics into production-phase spend. The companies that come out of this ahead will not be the ones that use the most AI or the least. They will be the ones that can answer, per task and per team, what a unit of AI work costs and what it returns, the question a usage leaderboard was structurally incapable of asking. The Tokenomics Foundation exists because that answer is about to be demanded of everyone. Better to build the meter before the board asks.

Key Topics

• tokenmaxxing
• AI cost vs human cost
• Tokenomics Foundation
• AI unit economics
• usage metrics
• Goodhart's law
• Microsoft
• Amazon KiroRank
• AI FinOps
• June 2026

Next Steps

Measure cost per outcome, not usage volume, with UsageBox.
