Building an open-source usage guard for Cloudflare Workers
If you’ve spent any time online, you’ve seen the posts:
- Unexpected Cloudflare bill, $2,000
- $45,000 overage from Vercel for Jmail
- collection of more of those horror stories
Most cloud providers don’t let you set hard spending caps while using their platform; they will usually let you set alerts, but nothing more. That means you could go bankrupt overnight because the recursive function you were so proud of had an edge case you didn’t think about. Or your project got more traffic than you ever expected and you didn’t optimize for it. Or someone found your dirty little secret and played with it.
I’ve been building on the Cloudflare ecosystem for a while, and I’m at the point where I want to open some projects to the public. I don’t want to take the risk of growing a huge bill overnight because of a bug or a malicious actor. No one is immune to that.
So I searched for solutions, found nothing fitting my needs, and built my own: cf-usage-guard.
What I needed
Three things that Cloudflare’s built-in spending alerts don’t give you:
Per-resource thresholds. I want to set a limit on KV writes independently from D1 rows or R2 mutations. A global “you’ve spent $X” alert is useless when you need to know which resource is burning money.
Dollar-denominated budget caps. I want to say “I don’t want to spend more than $10 a month in overage” or “cap AI neurons at $5/day.” Not percentages of some opaque included amount: actual dollars.
Automatic protection. CF gives us spending alerts, but what if I’m sleeping? Even if I see the notification, I’d have to rush to my computer and manually disable things. I want the system to stop itself.
How it works
cf-usage-guard is a TypeScript library you plug into your Workers. The architecture is straightforward:
- A cron trigger (e.g., every 5 minutes) polls the Cloudflare GraphQL Analytics API
- The guard evaluates usage against your configured thresholds for each resource
- When a threshold is crossed, it trips a circuit breaker and persists the state to KV
- Proxy wrappers around your env bindings check the breaker state before every operation. If tripped, the call is blocked
import { createUsageGuard, guardEnv } from "cf-usage-guard";

const guard = createUsageGuard({
  accountId: env.CF_ACCOUNT_ID,
  apiToken: env.CF_API_TOKEN,
  kvBinding: env.USAGE_GUARD_KV,
  thresholds: {
    "kv-writes": { trip: 80 },
    "r2-class-a": { maxOverageUsd: 5 },
    "ai-neurons": { maxOverageUsd: 2, granularity: "daily" },
  },
  budget: { maxUsd: 10, granularity: "monthly" },
});

// One line wraps all your bindings
const safeEnv = await guardEnv(env, guard);
// safeEnv.MY_KV, safeEnv.MY_DB, safeEnv.MY_BUCKET - all protected
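The evaluation step from the list above can be sketched roughly as follows. This is a minimal illustration, not the library's internals: `ThresholdConfig`, `UsageSample`, and `evaluateUsage` are hypothetical names, and the sketch assumes `trip` is a percentage of the included quota.

```typescript
// Hypothetical types for illustration; not cf-usage-guard's actual API.
type ResourceId = string;

interface ThresholdConfig {
  trip: number; // assumed: percentage of included quota at which to trip
}

interface UsageSample {
  used: number; // units consumed so far in the billing period
  included: number; // units included in the plan
}

// Returns the set of resources whose usage has reached its threshold.
function evaluateUsage(
  usage: Record<ResourceId, UsageSample>,
  thresholds: Record<ResourceId, ThresholdConfig>,
): Set<ResourceId> {
  const tripped = new Set<ResourceId>();
  for (const [id, cfg] of Object.entries(thresholds)) {
    const sample = usage[id];
    // No usage data for this resource: leave it untripped.
    if (!sample || sample.included <= 0) continue;
    const percentUsed = (sample.used / sample.included) * 100;
    if (percentUsed >= cfg.trip) tripped.add(id);
  }
  return tripped;
}
```

In a real Worker, the result of a function like this would be what gets persisted to KV on each cron run, so the proxies can read it cheaply on every request.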
17 resources tracked
The guard covers all 17 billable resource types across 10 Cloudflare services on the Workers paid plan: Workers requests and CPU time, KV (reads, writes, deletes, lists), D1, R2, Queues, Durable Objects, Workers AI neurons, Vectorize queries, Pages requests, and Stream minutes. Each resource has sensible defaults (trip at 90-95% of included quota) that you can override individually.
Three protection tiers
I designed three levels of protection that you can mix and match:
Passive mode
The lightest touch. The guard evaluates usage and sends alerts (Discord, Slack, or custom webhooks) when thresholds are crossed, but doesn’t block anything. Good for monitoring before you trust the system.
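As a rough sketch of what alert fan-out can look like: Discord webhooks accept a JSON body with a `content` field and Slack incoming webhooks accept one with a `text` field. Everything else here (`AlertSender`, `sendAlerts`, and the helper names) is hypothetical and only illustrates the idea that one failing channel should not stop the others.

```typescript
// Hypothetical helper names; only the payload shapes come from the
// Discord ({ content }) and Slack ({ text }) webhook formats.
type AlertSender = (message: string) => Promise<void>;

function discordBody(message: string): string {
  return JSON.stringify({ content: message });
}

function slackBody(message: string): string {
  return JSON.stringify({ text: message });
}

// Sends to every channel; a failure in one channel does not stop the others.
// Returns the names of the channels whose delivery failed.
async function sendAlerts(
  senders: Record<string, AlertSender>,
  message: string,
): Promise<string[]> {
  const names = Object.keys(senders);
  const results = await Promise.allSettled(
    names.map(async (name) => senders[name](message)),
  );
  return names.filter((_, i) => results[i].status === "rejected");
}
```

Each sender would typically be a small function that POSTs one of the bodies above to its webhook URL with `fetch`.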
Active mode
This is what most people want. The guardEnv() call wraps your KV, D1, R2, Queue, AI, and Vectorize bindings in proxies that check the breaker state before every operation. When a resource is tripped, calls to that resource throw (or silently skip, your choice) while other resources continue working.
// When kv-writes is tripped, this throws UsageGuardError
await safeEnv.MY_KV.put("key", "value");
// But kv-reads still works if it's under threshold
const val = await safeEnv.MY_KV.get("key");
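The proxy idea can be illustrated with a small self-contained sketch. It is not the library's implementation: `KvLike`, `guardKvLike`, and the method-to-resource map are hypothetical, and the `UsageGuardError` class here is a stand-in defined only for the example.

```typescript
// Stand-in error class for this sketch only.
class UsageGuardError extends Error {
  readonly resource: string;
  constructor(resource: string) {
    super(`Usage guard tripped for ${resource}`);
    this.name = "UsageGuardError";
    this.resource = resource;
  }
}

// Minimal stand-in for the KV methods used in this example.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Assumed mapping from method name to the billable resource it consumes.
const KV_METHOD_RESOURCE = new Map<string, string>([
  ["get", "kv-reads"],
  ["put", "kv-writes"],
]);

// Wraps a KV-like object so guarded methods check the breaker first.
function guardKvLike<T extends KvLike>(
  kv: T,
  isTripped: (resource: string) => boolean,
): T {
  return new Proxy(kv, {
    get(target, prop) {
      const value = Reflect.get(target, prop, target);
      const resource =
        typeof prop === "string" ? KV_METHOD_RESOURCE.get(prop) : undefined;
      // Pass through anything that is not a guarded method.
      if (typeof value !== "function" || resource === undefined) return value;
      return (...args: unknown[]) => {
        // Throws before the underlying call is made.
        if (isTripped(resource)) throw new UsageGuardError(resource);
        return value.apply(target, args);
      };
    },
  });
}
```

Because the check is keyed by resource rather than by binding, a tripped `kv-writes` breaker blocks `put` while `get` keeps working.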
The binding detection is automatic: it duck-types each binding by checking for characteristic methods (prepare/batch = D1, head/createMultipartUpload = R2, getWithMetadata = KV, etc.).
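A minimal sketch of that kind of duck-typing, using only the method names mentioned above. `detectBindingKind` and `BindingKind` are hypothetical names, and the real detection presumably covers more binding types than these three.

```typescript
type BindingKind = "d1" | "r2" | "kv" | "unknown";

// True when `value` is an object exposing every named method.
function hasMethods(value: unknown, names: string[]): boolean {
  if (typeof value !== "object" || value === null) return false;
  const obj = value as Record<string, unknown>;
  return names.every((name) => typeof obj[name] === "function");
}

// Classifies a binding by its characteristic methods.
function detectBindingKind(binding: unknown): BindingKind {
  if (hasMethods(binding, ["prepare", "batch"])) return "d1";
  if (hasMethods(binding, ["head", "createMultipartUpload"])) return "r2";
  if (hasMethods(binding, ["getWithMetadata"])) return "kv";
  return "unknown";
}
```

Plain values such as secrets and environment strings fall through to `"unknown"` and would be left unwrapped.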
Nuclear mode
For the truly paranoid: hook into onEvaluate and use the Cloudflare API to disable workers, delete routes, or take other drastic action when the guard trips.
Budget caps in dollars
Beyond percentage-based thresholds, you can set dollar caps:
{
  // Per-resource: "stop R2 mutations if overage exceeds $5"
  thresholds: {
    "r2-class-a": { maxOverageUsd: 5 },
  },
  // Global: "total overage across all resources must stay under $10"
  budget: { maxUsd: 10, granularity: "monthly", warn: 80 },
}
The guard calculates estimated overage using Cloudflare’s published per-unit pricing. When the global budget trips, it only blocks resources that are actually generating overage. The rest keep working.
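The arithmetic behind that can be sketched as follows. The included amounts and per-million prices below are illustrative placeholders (check Cloudflare's pricing page for current figures), and `overageUsd` and `resourcesToBlock` are hypothetical names, not the library's functions.

```typescript
interface Pricing {
  included: number; // units included before overage billing starts
  usdPerMillion: number; // overage price per million units
}

// Illustrative figures only; not authoritative pricing.
const EXAMPLE_PRICING: Record<string, Pricing> = {
  "kv-writes": { included: 1_000_000, usdPerMillion: 5.0 },
  "kv-reads": { included: 10_000_000, usdPerMillion: 0.5 },
  "r2-class-a": { included: 1_000_000, usdPerMillion: 4.5 },
};

// Estimated overage in dollars for one resource.
function overageUsd(used: number, pricing: Pricing): number {
  const billableUnits = Math.max(0, used - pricing.included);
  return (billableUnits / 1_000_000) * pricing.usdPerMillion;
}

// When total overage reaches the budget, block only the resources
// that are actually generating overage; otherwise block nothing.
function resourcesToBlock(
  usage: Record<string, number>,
  pricing: Record<string, Pricing>,
  maxUsd: number,
): string[] {
  const overages = Object.entries(usage)
    .filter(([id]) => id in pricing)
    .map(([id, used]) => ({ id, usd: overageUsd(used, pricing[id]) }));
  const total = overages.reduce((sum, o) => sum + o.usd, 0);
  if (total < maxUsd) return [];
  return overages.filter((o) => o.usd > 0).map((o) => o.id);
}
```

With these placeholder prices, 3 million KV writes and 2 million R2 Class A operations come to $10.00 + $4.50 of overage, which exceeds a $10 budget, so both get blocked while KV reads under their included amount keep working.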
Failure modes
I spent a lot of time thinking about what happens when things go wrong:
- CF API is down and you’re not tripped: stay not tripped (fail-open, don’t break your app because analytics is flaky)
- CF API is down and you are tripped: stay tripped (don’t unmask a real spike just because you can’t check)
- KV read fails: return not tripped (fail-open)
- State gets corrupted: a separate tripped:safety-net key in KV survives corruption and keeps you protected
- Alert delivery fails: caught and logged, other channels still fire, retry happens on next evaluate cycle
The philosophy is: never break your application because the guard itself failed. But if you’re in a tripped state, stay tripped until we can confirm it’s safe.
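Those rules boil down to a small state transition, sketched here under assumptions: the `Poll` type, the function names, the JSON shape of the stored state, and the `"1"` value for the safety-net key are all hypothetical and chosen only to make the logic concrete.

```typescript
// Result of one analytics poll.
type Poll =
  | { ok: true; overThreshold: boolean }
  | { ok: false }; // API unreachable or returned an error

// If the API is down, keep the previous state: an untripped guard stays
// open (fail-open) and a tripped guard stays tripped.
function nextTripped(previous: boolean, poll: Poll): boolean {
  if (!poll.ok) return previous;
  return poll.overThreshold;
}

// Decodes persisted state. `raw` is the main state record (null when
// missing or when the KV read failed); `safetyNet` is the separate key.
function decodeTripped(raw: string | null, safetyNet: string | null): boolean {
  // The safety-net key wins even if the main record is unreadable.
  if (safetyNet === "1") return true;
  if (raw === null) return false; // nothing readable: fail open
  try {
    const parsed = JSON.parse(raw) as { tripped?: unknown };
    return parsed.tripped === true;
  } catch {
    return false; // corrupted record and no safety net: fail open
  }
}
```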
The analytics lag caveat
I want to be honest about the main limitation: Cloudflare’s Analytics API has a lag of a few minutes. If a bug burns through $5,000 in 10 seconds, this guard won’t catch it in time.
But that’s not really the common scenario. The much more likely failure mode is a worker silently grinding through resources for hours or days: an inefficient query running in a loop, a retry storm, a crawler hammering your endpoints. That’s exactly what this catches.
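A back-of-the-envelope way to reason about this: in the worst case, spend keeps accruing for the analytics lag plus one full polling interval before the breaker can trip. The sketch below just does that multiplication. The function name is hypothetical, and the 3-minute lag in the example is an assumption standing in for "a few minutes".

```typescript
// Rough upper bound on spend that slips past the threshold before the
// guard can react, assuming a constant burn rate.
function worstCaseExposureUsd(
  burnUsdPerMinute: number,
  analyticsLagMinutes: number,
  pollIntervalMinutes: number,
): number {
  return burnUsdPerMinute * (analyticsLagMinutes + pollIntervalMinutes);
}

// Example: $0.50/minute of overage, an assumed 3-minute lag, 5-minute cron.
const exposure = worstCaseExposureUsd(0.5, 3, 5);
console.log(exposure.toFixed(2)); // prints "4.00"
```

A slow leak costs a few dollars before it is stopped, while a burst that spends thousands in seconds is over before the first poll sees it.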
Zero dependencies
The library has zero npm dependencies. It uses only Cloudflare Workers built-in APIs (fetch for GraphQL, KV for state persistence). This was a deliberate choice: fewer dependencies means fewer things that can break, and it keeps the supply chain attack surface minimal.
Try it out
It’s MIT licensed and available on npm and GitHub:
Feature requests and contributions are welcome. If you’ve built something similar or have a different approach to cost protection on Cloudflare, I’d love to hear about it.
To be clear, this project shouldn't need to exist. It's a best-effort attempt to reduce the odds of an unexpected overage, and it lets me sleep better at night. Until Cloudflare gives us proper spend caps, we'll have to protect ourselves however we can, and for now this is my answer.
What’s next
The guard covers the most common overage scenarios today, but there’s more I want to add:
- Cost projection: estimating where you’ll land at the end of the billing period based on current usage trends, with alerts like “at this rate, you’ll hit the limit on day X.”
- Daily/weekly usage summary reports sent to Discord or Slack, so you get a snapshot of your consumption without having to check dashboards.
- Storage monitoring: the CF Analytics API currently only exposes operation counts, not GB stored. R2 and Durable Objects storage monitoring will require separate API calls.
- More service coverage: Hyperdrive connections, Workers Workflows steps, Analytics Engine writes, and Email Routing.
- guardDO proxy: wrapping Durable Object namespaces the same way guardKV and guardD1 work today.
- Dashboard UI component: an embeddable usage chart you can drop into an admin page.
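For the cost projection item, the simplest possible approach is a linear extrapolation of usage so far. The sketch below is only an illustration of that idea, not a planned design: `projectUsage` and its fields are hypothetical, and it assumes usage grows at a constant daily rate.

```typescript
interface Projection {
  projectedTotal: number; // estimated usage at the end of the period
  limitDay: number | null; // day the limit is hit, or null if never
}

// Linear extrapolation from usage observed so far in the period.
function projectUsage(
  usedSoFar: number,
  elapsedDays: number,
  periodDays: number,
  limit: number,
): Projection {
  if (elapsedDays <= 0) return { projectedTotal: usedSoFar, limitDay: null };
  const perDay = usedSoFar / elapsedDays;
  const projectedTotal = perDay * periodDays;
  if (perDay <= 0) return { projectedTotal, limitDay: null };
  const day = Math.ceil(limit / perDay);
  return { projectedTotal, limitDay: day <= periodDays ? day : null };
}
```

For example, 400 units used after 10 days of a 30-day period projects to 1,200 units, which would cross a 1,000-unit limit on day 25.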