Does setting a budget on Google Cloud or the Gemini API stop spending?

No. Google’s own documentation states that setting a budget does not automatically cap your usage or spending. Budgets send alert emails and optional Pub/Sub messages, but they do not prevent further use or billing once a threshold is crossed. A genuine hard cap on Google Cloud has to be built yourself, typically by wiring a budget Pub/Sub notification to a Cloud Function that disables billing on the project. Even then, expect charges to keep reporting in for up to two days after you act.
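To make the Pub/Sub-to-Cloud-Function pattern concrete, here is a minimal Python sketch of the decision logic only. It assumes the notification's base64-encoded message data carries the `costAmount` and `budgetAmount` fields that Cloud Billing budget notifications include. The `disable_billing` callable is a hypothetical hook. In Google's documented sample, that step calls the Cloud Billing API's `projects.updateBillingInfo` with an empty `billingAccountName`, which detaches the project from billing and stops its paid services. Note that `costAmount` is subject to the same reporting lag, so this trips late, not in seconds.

```python
import base64
import json
from typing import Callable


def should_disable_billing(message_data_b64: str) -> bool:
    """Decode a budget notification and report whether cost has reached the budget."""
    payload = json.loads(base64.b64decode(message_data_b64).decode("utf-8"))
    # Assumed fields: budget notifications carry costAmount and budgetAmount.
    cost = float(payload["costAmount"])
    budget = float(payload["budgetAmount"])
    return cost >= budget


def handle_budget_notification(
    message_data_b64: str, disable_billing: Callable[[], None]
) -> bool:
    """Call the hypothetical disable_billing hook once the budget is reached.

    In a deployed Cloud Function, disable_billing would detach the project
    from its billing account via the Cloud Billing API.
    Returns True if the hook was called.
    """
    if should_disable_billing(message_data_b64):
        disable_billing()
        return True
    return False
```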

Can I set a hard spending limit on the OpenAI API?

Not anymore. OpenAI replaced its old hard spending cap with a soft monthly budget in late 2025. When you exceed that budget, API requests continue to be processed without interruption and you receive a notification rather than a cutoff. If you need a true stop, you have to enforce it yourself in front of the API with a real-time meter and a circuit breaker, because the platform will keep serving requests and billing you.

Which AI provider actually offers a real hard spend cap?

Anthropic is the closest of the three. It offers per-tier monthly spend limits plus custom per-workspace spend and rate limits, and per-user monthly spend limits on the Claude Code workspace. When you hit a workspace spend limit the API stops serving that workspace until the next cycle. It is the most cap-like native control among the major vendors, though you should still pair it with key hygiene and your own metering.

Why did the bill keep growing even after the team paused the API?

Billing reporting is not real time. Google documents that usage charges can take up to two days to be reported, and that usage accrued before you disable billing is still billed, including charges not yet visible in the transaction history. So when a team pauses at, say, $44k, already-incurred usage keeps landing and the total can climb well past that; in one widely reported case it reached about $128k. This lag is exactly why a meter in your own request path, which counts before the provider invoice does, is the only thing that can stop a runaway in seconds.

How does a usage metering layer stop a runaway agent or stolen key?

It keeps a live balance that every billable call decrements, scopes hard caps per key, customer, and environment so one credential can only burn its own slice, fires anomaly alerts on spend velocity within seconds, and trips a circuit breaker that revokes the key or refuses requests at your gateway. Because the decision runs against a meter you control rather than the provider’s lagged billing, it triggers fast. The honest caveat: it stops spend that flows through your application, so you still need provider-side caps and key rotation for a credential being called against the provider directly.

Hard Spend Caps and Usage Kill-Switches: Stopping a Leaked Key or Runaway Agent From Bankrupting You

Author: UsageBox

The short version: A leaked API key or a runaway agent can turn a $180 month into an $82,000 bill in 48 hours, and the providers most people assume will stop it usually will not. Google Cloud budgets are alerts, not caps. OpenAI removed its hard spending limit and left a notification behind. Anthropic is the outlier with real per-workspace spend limits. The only reliable defense is a layer that meters usage in real time, holds a live balance, fires anomaly alerts in seconds, and trips a circuit breaker that actually revokes access. This is what that layer looks like and where the provider controls fall short.

The story repeats often enough that it has become a genre. A small team's Google Cloud key is exposed, an attacker drives Gemini calls around the clock, and the owner wakes up to a bill that could end the company. One widely shared case on r/googlecloud was titled "$82,000 in 48 Hours from stolen Gemini API Key. My monthly Usage Is $180. Facing Bankruptcy." Another team, a small company in Japan, reported about $128,000 in unauthorized Gemini usage, and the charges kept climbing even after they paused the API.

The reactions in those threads are not about one stolen key. They are about a structural gap: people assume usage-based billing comes with a hard ceiling, and it almost never does. As one commenter put it, "Google do not allow spend caps... how on earth can they allow anyone to run up bills like this with no financial checks is beyond me." This is about closing that gap, including the parts your provider cannot help you with.

Why a budget alert is not a kill-switch

The most expensive misunderstanding in usage-based spend is treating an alert as a control. An alert tells a human something happened. A cap stops the thing from happening. The major providers draw that line in different places.

Provider	What the "limit" actually does	Hard cutoff available?
Google Cloud / Gemini API	Budgets send alert emails and Pub/Sub messages. Google's own docs state that "setting a budget does not automatically cap" usage or spending.	Not natively. You build it yourself with a Pub/Sub topic and a Cloud Function that disables billing on the project.
OpenAI API	A monthly budget is a soft threshold. Once exceeded, "API requests will continue to be processed without interruption." The old hard cap was removed in late 2025.	No. Notification only.
Anthropic / Claude API	Per-tier monthly spend limits, plus custom per-workspace spend and rate limits, and per-user limits on the Claude Code workspace. Hit the limit and the API stops until the next cycle.	Yes, the closest to a real native cap of the three.

Two of the three big AI vendors give you a doorbell, not a deadbolt. And even the deadbolt has a delay, which is the next trap.

The reporting-lag trap that turns a cap into a leak

Even when you act fast, billing data does not arrive in real time. Google documents that it "might take up to two days for usage charges in the project to be reported," and that after you disable billing, "usage charges that accrue prior to disabling billing... are billed," including charges not yet in the transaction history. That is exactly what the Japanese team hit: they paused at roughly $44k and watched it climb past $128k as already-incurred usage caught up.

The lesson is precise. A control that depends on the provider's billing pipeline to notice the overspend is always running minutes-to-days behind the attack. To stop a runaway in seconds you need a meter you own, in the request path, counting before the provider's invoice does. That is the difference between watching the bill and stopping the spend.

The four controls that actually contain a runaway

Containment is not one feature; it is a short stack of them, applied cheapest-first. None requires you to trust the provider's billing lag.

1. A real-time balance, not an end-of-month total

Every billable call decrements a live balance the moment it happens. This is the foundation: you cannot enforce a cap you only compute at invoice time. UsageBox is the metering and balance layer here, ingesting each event, holding the remaining budget per key, per customer, and per tenant, and exposing it for a decision before the next call goes out. This is the same real-time ledger described in the Usage API guide, pointed at cost control instead of invoicing.
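A minimal sketch of that live balance, assuming a single process and integer cents to avoid floating-point drift. `BalanceLedger` and its methods are hypothetical names for illustration, not UsageBox's API. A shared deployment would keep balances in a store with atomic decrements rather than a Python dict.

```python
import threading


class BalanceLedger:
    """Hypothetical in-memory ledger of live balances, in integer cents, keyed by scope."""

    def __init__(self) -> None:
        self._balances: dict[str, int] = {}
        self._lock = threading.Lock()

    def set_budget(self, scope: str, cents: int) -> None:
        with self._lock:
            self._balances[scope] = cents

    def remaining(self, scope: str) -> int:
        with self._lock:
            return self._balances.get(scope, 0)

    def try_debit(self, scope: str, cents: int) -> bool:
        """Atomically decrement; refuse and change nothing if it would go negative."""
        with self._lock:
            balance = self._balances.get(scope, 0)  # unknown scopes have no budget
            if cents > balance:
                return False
            self._balances[scope] = balance - cents
            return True

    def refund(self, scope: str, cents: int) -> None:
        """Return an over-estimate once the call's actual cost is known."""
        with self._lock:
            self._balances[scope] = self._balances.get(scope, 0) + cents
```

Debiting an estimated cost before the call leaves, then refunding the difference once actual usage is known, is what lets the balance refuse a call rather than merely record it.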

2. Hard caps scoped to the blast radius

One global monthly cap is too coarse. A stolen key should be able to burn only its own slice, not the whole account. Set caps per API key, per customer, and per environment, so a compromised production key trips long before it can touch the org-wide ceiling. Scoping caps tightly is also how multi-tenant platforms keep one bad actor from spending another tenant's budget, an extension of the isolation work in securing API keys for multi-tenant systems.
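Scoped caps can be sketched as an all-or-nothing check across every scope a request touches: the call is admitted only if it fits under the key, customer, and environment caps at once, and a refusal leaves no partial debits. `ScopedCaps` and the scope naming (`key:prod-1`, `customer:acme`, `env:prod`) are hypothetical, and unknown scopes fail closed.

```python
import threading


class ScopedCaps:
    """Hypothetical per-scope caps, in integer cents, checked together per request."""

    def __init__(self, caps: dict[str, int]) -> None:
        self._caps = dict(caps)
        self._spent: dict[str, int] = {scope: 0 for scope in caps}
        self._lock = threading.Lock()

    def authorize(self, scopes: list[str], cost_cents: int) -> bool:
        with self._lock:
            # First pass: every scope must have room, or nothing is debited.
            for scope in scopes:
                if scope not in self._caps:
                    return False  # fail closed on scopes with no configured cap
                if self._spent[scope] + cost_cents > self._caps[scope]:
                    return False
            # Second pass: debit all scopes together.
            for scope in scopes:
                self._spent[scope] += cost_cents
            return True

    def spent(self, scope: str) -> int:
        with self._lock:
            return self._spent.get(scope, 0)
```

Because the key cap is the tightest, a stolen production key exhausts its own slice long before the environment-wide cap is at risk.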

3. Anomaly and spike alerts measured in seconds

An $82k bill in 48 hours against a $180 monthly baseline is about 455 times a normal month's spend, packed into two days. That is not subtle, and a rate-of-change detector catches it in the first minutes if it is watching the live meter rather than the daily billing export. Alert on velocity (spend per minute against the trailing baseline), not just on absolute thresholds, so a slow drain and a fast burst both trip. The event-driven alerting pattern in the real-time usage alerting architecture is built for exactly this: invalidate on each event, collapse bursts, and fire without polling lag. This is the early-warning sibling of margin billing drift detection, except here the stakes are a fraud bill, not a slow margin leak.
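A velocity check can be sketched with a sliding window: compare spend in the last minute with the average per-minute spend over the trailing hour, and alert when it exceeds a chosen multiple. `VelocityDetector`, the 60-second window, the 5x multiple, and `floor_cents` are illustrative assumptions to tune, not recommended values. The floor keeps a near-zero history (a new key, a quiet night) from alerting on every call.

```python
from collections import deque


class VelocityDetector:
    """Hypothetical detector: last-window spend vs. a multiple of the trailing per-window average."""

    def __init__(self, window_s: float = 60.0, baseline_windows: int = 60,
                 multiple: float = 5.0, floor_cents: int = 100) -> None:
        self.window_s = window_s
        self.baseline_windows = baseline_windows
        self.multiple = multiple
        self.floor_cents = floor_cents  # minimum baseline per window
        self._events: deque = deque()  # (timestamp_s, cost_cents), assumed time-ordered

    def record(self, ts: float, cost_cents: int) -> bool:
        """Record one billable event; return True if spend velocity is anomalous."""
        self._events.append((ts, cost_cents))
        # Drop events older than the current window plus the baseline windows.
        horizon = ts - self.window_s * (self.baseline_windows + 1)
        while self._events and self._events[0][0] < horizon:
            self._events.popleft()
        current_start = ts - self.window_s
        current = sum(c for t, c in self._events if t > current_start)
        trailing = sum(c for t, c in self._events if t <= current_start)
        baseline = max(trailing / self.baseline_windows, self.floor_cents)
        return current > self.multiple * baseline
```

Feeding each metered event into `record` as it happens keeps detection on the live meter instead of the billing export. Summing the window on every event is linear in the number of events; a production version would keep running totals per time bucket.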

4. A circuit breaker that revokes, not just warns

The alert has to be able to pull the cord. When velocity or balance crosses the line, the breaker revokes the offending key, flips a feature flag, or returns a hard error at your gateway, before the next expensive call leaves your network. Because the decision runs against a meter you control, it triggers in seconds rather than waiting on the provider's two-day reporting window. The enforcement loop in the usage enforcement guide shows the mechanics: check the balance, flag a hard-stop policy, refuse the request.
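The check-balance, flag-hard-stop, refuse loop can be sketched as a gateway admission function. `CircuitBreaker`, the `anomalous` input (for example, the output of a velocity detector), and the 402/403 status choices are hypothetical. Revoking here only blocks the key at your own gateway; rotating it at the provider remains a separate step.

```python
from dataclasses import dataclass, field


@dataclass
class CircuitBreaker:
    """Hypothetical gateway breaker over live per-key balances in integer cents."""

    balances: dict[str, int]
    hard_stop: bool = True  # policy: refuse when the balance runs out, or only log
    revoked: set[str] = field(default_factory=set)
    log: list[str] = field(default_factory=list)

    def trip(self, key: str, reason: str) -> None:
        self.revoked.add(key)
        self.log.append(f"tripped {key}: {reason}")

    def admit(self, key: str, est_cost_cents: int, anomalous: bool = False) -> int:
        """Return 200 to forward the call, or 402/403 to refuse it at the gateway."""
        if key in self.revoked:
            return 403  # stays revoked until a human re-enables it
        if anomalous:
            self.trip(key, "velocity anomaly")
            return 403
        if self.balances.get(key, 0) < est_cost_cents:
            if self.hard_stop:
                self.trip(key, "balance exhausted")
                return 402
            self.log.append(f"over balance in soft mode: {key}")
        self.balances[key] = self.balances.get(key, 0) - est_cost_cents
        return 200
```

The refusal happens before the expensive call leaves your network, which is why it does not depend on the provider's reporting window.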

Where the metering layer ends and the provider begins

This is not a silver bullet, so here is the boundary. A meter in your request path stops spend that flows through your application. It cannot stop spend on a key that an attacker calls the provider with directly, outside your infrastructure, using a credential that leaked to a public repo. For that exposure you still need provider-side controls: Anthropic's workspace spend limits, a Google Cloud Pub/Sub-to-Cloud-Function killswitch that disables billing, key rotation, and not committing secrets in the first place. The honest architecture is both layers. Your meter contains the runaway agent and the leaked key used through your app in seconds; the provider cap and good key hygiene contain the key used against the provider directly. The metering layer shrinks the blast radius, but it sits next to provider caps rather than replacing them.

An incident runbook worth writing down before you need it

Detect: Velocity alert fires when spend-per-minute exceeds the trailing baseline by your multiple (a small multiple catches the slow drains too).
Contain: The circuit breaker revokes the implicated key and flips the tenant or environment into a refuse-and-log mode at the gateway.
Pull provider levers in parallel: Disable the API or project upstream, rotate the credential, and on Anthropic let the workspace spend limit hold the line.
Reconcile: Expect the provider total to keep climbing for hours as lagged usage reports in. Reconcile your meter against the eventual invoice and keep the timestamped trace for any dispute, the same audit discipline behind defending unexpected bills.
Tune: Lower the scoped cap for that key class and shorten the alert window. Almost every horror story shares one detail: the cap, the alert, or both were missing or set far too loose.
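The Reconcile step can be sketched as a diff between what your meter recorded and what the provider eventually invoiced, assuming you can export provider usage grouped by the same key or project identifiers you meter on. A key that appears only on the invoice is the signature of a credential used against the provider directly, outside your request path. `reconcile` and `Discrepancy` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    key: str
    metered_cents: int
    invoiced_cents: int

    @property
    def delta_cents(self) -> int:
        # Positive: the provider billed more than your meter saw.
        return self.invoiced_cents - self.metered_cents


def reconcile(metered: dict[str, int], invoiced: dict[str, int],
              tolerance_cents: int = 0) -> list[Discrepancy]:
    """List keys whose invoiced total differs from the metered total beyond tolerance."""
    out = []
    for key in sorted(set(metered) | set(invoiced)):
        m = metered.get(key, 0)
        i = invoiced.get(key, 0)
        if abs(i - m) > tolerance_cents:
            out.append(Discrepancy(key, m, i))
    return out
```

Keeping the timestamped meter trace alongside this diff is what gives you evidence for a dispute.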

The teams who avoid the front-page bill are not the ones who never get a key stolen. They are the ones whose meter noticed in the first two minutes and whose breaker had the authority to say no. The flat-rate era is ending and variable spend is now permanent, which is exactly why a cap you control beats a budget you only get notified about. We cover the budgeting side of that shift in budgeting when flat-rate plans disappear.

Key Topics

•spend caps
•hard limits
•kill-switch
•circuit breaker
•API key leak
•runaway agent
•anomaly alerts
•usage-based billing
•Gemini
•OpenAI
•Anthropic
