The argument in one line: Sometime in 2026 the entire industry quietly agreed on a new unit of AI commerce, and it is not the subscription and it is not "unlimited." It is the spend cap. Uber capped its own engineers, vendors shipped budget controls, and "unlimited" got exposed as what it always was: a promise that lasted until the unit economics said otherwise. The open question is no longer whether to cap. It is where you set the cap and whether it enforces or just warns.
One Hacker News headline did more to clarify AI pricing in 2026 than a year of vendor blog posts: "Uber's $1,500/month AI limit is a useful signal for AI tool pricing." It hit 617 points and 760 comments in a day. The signal everyone read into it was simple. One of the most engineering-heavy companies in the world looked at unconstrained AI tooling, watched it consume an entire annual budget in four months, and responded not by negotiating a better flat rate but by capping spend per engineer.
That is the tell. When the buyers start imposing caps on themselves, the era of "just pay for a seat and go wild" is over. And it pairs with the older, angrier classic that keeps resurfacing in every pricing thread: "Stop selling unlimited when you mean until we change our minds," 358 points of accumulated grievance about a word the industry never meant literally. Put the two together and the conclusion writes itself. Unlimited is dead. The cap won. This is the argument for why that is not a loss, and what the winners will do with it.
"Unlimited" was always a forward bet, and the bet expired
Unlimited pricing only works when the marginal cost of a heavy user is small enough to be absorbed by the average. Gym memberships work because the people who never show up subsidize the people who do. Unlimited data plans worked because spectrum is shared and the heaviest users were rare. The model is a bet that the right tail stays thin.
AI inference breaks the bet because the right tail is not thin and not cheap. As we detailed in the breakdown of the Microsoft and Uber budget blowups, a single agent-mode user can out-consume a hundred light users, and the heaviest engineers are precisely the ones you most want using the tool. There is no average to hide behind when the top decile is two orders of magnitude above the median and growing. This is the same force that is ending flat-rate AI pricing across the board. "Unlimited" was flat rate with a more generous adjective, and it expired for the same reason.
So when a vendor sold unlimited and then walked it back, the cynicism was earned, but the surprise was not warranted. The walk-back was priced into the model from the start. "Until we change our minds" was always the asterisk. The only thing that changed in 2026 is that inference got cheap enough to adopt widely and expensive enough at scale that the asterisk came due all at once.
The cap is the replacement, and it is quietly everywhere
Watch what the serious players actually shipped, not what they said. GitHub moved Copilot to credits with budget controls. Cursor pushed existing users onto usage-based plans with overage. Codex realigned pricing to track API token usage instead of a flat per-message fee, and Zed moved its assistant to token-based billing. Uber, on the buyer side, set a hard per-engineer dollar ceiling. Different companies, different products, one converging primitive: a number, denominated in dollars or credits, that your usage draws down against and that someone decided in advance.
That primitive is the spend cap, and it is becoming the default unit of AI commerce the way the seat was the default unit of SaaS for fifteen years. The interesting design space is no longer subscription versus usage. It is the shape of the cap: who sets it, at what level, how it degrades when you hit it, and whether it is a wall or a warning sign painted to look like a wall.
The controversy: a cap is only honest if it enforces
Here is where most implementations are quietly dishonest, and where the next wave of backlash is already forming. A cap that only sends an alert at the limit is not a cap. It is a notification that you have already overspent. The difference between a budget that warns and a budget that blocks is the difference between a product a CFO trusts and a product that generates the "I had no idea" support ticket. We made this case at length in the companion piece on why metered AI billing is breaking developer trust: the anger is not about paying for usage, it is about caps that turned out to be suggestions.
An enforcing cap has to do something real when it is reached. It can hard-block until the next cycle, downgrade the user to a cheaper model, suppress the expensive agent mode while leaving completions on, or require an explicit override that creates an audit record. The mechanics of that are well understood; we cover them in hard spend caps and usage kill switches and the Copilot spending-cap admin guide. What is not yet common is treating the enforcing cap as a baseline expectation rather than an enterprise upsell. The vendors who get there first will own the trust the rest are bleeding.
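To make the distinction concrete, here is a minimal sketch of a cap that enforces rather than warns. Everything in it is hypothetical and for illustration only: the `Budget` class, the `Action` names, and the `override_by` parameter are assumptions of ours, not any vendor's API. The point it shows is that the decision happens before the cost is incurred, and that an override always leaves a record.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"                            # hard-block until the next cycle
    DOWNGRADE_MODEL = "downgrade_model"        # route to a cheaper model
    DISABLE_AGENT_MODE = "disable_agent_mode"  # keep completions, drop agent mode


@dataclass
class Budget:
    """Hypothetical per-user budget; names are illustrative assumptions."""
    cap_usd: float
    spent_usd: float = 0.0
    at_limit: Action = Action.BLOCK            # what the cap does when reached
    audit_log: list = field(default_factory=list)

    def check(self, request_cost_usd: float,
              override_by: Optional[str] = None) -> Action:
        """Decide what to do with a request BEFORE its cost is incurred."""
        if self.spent_usd + request_cost_usd <= self.cap_usd:
            self.spent_usd += request_cost_usd
            return Action.ALLOW
        if override_by is not None:
            # Explicit override: allowed, but it always creates an audit record.
            self.audit_log.append(
                {"by": override_by, "cost_usd": request_cost_usd,
                 "spent_before_usd": self.spent_usd}
            )
            self.spent_usd += request_cost_usd
            return Action.ALLOW
        # Over the cap with no override: enforce. Nothing is charged here;
        # the caller would meter whatever degraded path it takes instead.
        return self.at_limit


budget = Budget(cap_usd=10.0, at_limit=Action.DOWNGRADE_MODEL)
first = budget.check(6.0)                          # fits under the cap
second = budget.check(6.0)                         # would exceed it
third = budget.check(6.0, override_by="on-call")   # audited override
```

A warn-only cap is the same code with the final `return self.at_limit` replaced by a notification and `Action.ALLOW`, which is exactly why it does not deserve the name.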
How to set a cap people do not resent
The reason caps feel like punishment is that they are usually set by finance, in a spreadsheet, with no connection to the work. A cap that preserves both budget and goodwill follows a few rules. We go deeper in the FinOps playbook for capping AI cost per engineer, but the principles are short.
- Set it against observed data, not a guess. If your median engineer spends $500 and your top decile spends $2,000, a $1,500 cap touches the few without throttling the many. Cap blindly at the median and you have just throttled half your org.
- Degrade, do not just deny. At the limit, drop to a cheaper model or disable agent mode rather than hard-blocking the tool entirely. A degraded tool keeps people working; a dead one sends them to a manager to complain.
- Make the cap visible before it bites. A cap with a live meter alongside it is a budget. A cap with no meter is a trap. The cap and the dashboard are one feature: ship them together or not at all.
- Allow an audited override. The engineer mid-incident who needs to blow past the cap should be able to, with the override logged. Rigid caps with no escape hatch get routed around, usually by turning the whole control off.
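The first and third rules above are easy to check with a few lines of arithmetic. The sketch below is illustrative only: the spend figures are invented to match the example in the first bullet, and `cap_report` and `meter` are hypothetical helper names, not part of any product.

```python
import statistics


def cap_report(monthly_spend_usd, cap_usd):
    """Summarize how many engineers a proposed cap would actually touch."""
    over = [s for s in monthly_spend_usd if s > cap_usd]
    return {
        "median_usd": statistics.median(monthly_spend_usd),
        "engineers_over_cap": len(over),
        "share_over_cap": len(over) / len(monthly_spend_usd),
    }


def meter(spent_usd, cap_usd):
    """Fraction of the cap consumed, clamped to 1.0, for a live meter."""
    return min(spent_usd / cap_usd, 1.0)


# Invented sample: ten engineers, median $500, one heavy user at $2,000.
observed = [200, 300, 400, 500, 500, 500, 600, 800, 1200, 2000]

at_1500 = cap_report(observed, cap_usd=1500)   # touches the few
at_median = cap_report(observed, cap_usd=500)  # throttles a large share
halfway = meter(spent_usd=750, cap_usd=1500)
```

With this sample, a $1,500 cap touches one engineer in ten, while a cap at the $500 median touches four in ten. Run the same report against your own spend data before anyone picks a number in a spreadsheet.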
Why this is good news, not a retreat
It is tempting to read the death of unlimited as the industry getting stingier. It is closer to the opposite. Unlimited was never abundance; it was a teaser rate with a repricing event scheduled for whenever the vendor noticed. The spend cap, done honestly, is the first pricing primitive that tells the customer the truth up front: here is what this costs, here is your ceiling, here is what happens when you reach it, and here is the meter so you are never surprised. That is a more respectful contract than "unlimited, terms subject to change."
The winners of the next phase will not be the vendors who cling to a flat number they cannot sustain, and they will not be the ones who slap a usage meter on the invoice and call it transparency. They will be the ones who treat the cap as a product: data-driven defaults, graceful degradation, a live meter, and an enforcing limit that a buyer can trust. The cap won because it is the honest unit. The remaining work is making it feel like control instead of a leash.
That is the layer UsageBox is built for: enforcing budgets and kill switches, real-time per-user metering, and the attribution that turns a scary total into a managed line item. Unlimited is not coming back. A cap your customers do not resent is the thing worth building instead.