Reliability Glossary

What is an error budget?

An error budget is the allowable amount of unreliability over a window — 100% minus your SLO — that a team can "spend" before reliability work must take priority over new features.

Error budget, defined

An error budget is the direct consequence of choosing an SLO below 100%. If your SLO says 99.9% of requests should succeed, then 0.1% are allowed to fail — that 0.1% is your error budget for the window. It reframes reliability from "never fail" to "fail no more than this much," which is both honest and operationally useful.

Because 100% reliability is unattainable in practice and prohibitively expensive even to approach, the error budget gives teams permission to take risk deliberately. As long as budget remains, you can ship faster, run experiments, and tolerate the occasional blip. When the budget is exhausted, the policy shifts the team toward stabilization.

Key ideas behind error budgets

An error budget is simple arithmetic that becomes powerful when paired with a policy. These ideas turn it from a number into a decision-making tool.

100% minus the SLO

The budget is the gap between perfection and your objective. A 99.9% SLO yields a 0.1% budget, which over 30 days is roughly 43 minutes of allowable downtime.
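The arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming a time-based availability SLO expressed as a percentage and a window measured in days; the function names are made up for this example.

```python
def error_budget_fraction(slo_percent: float) -> float:
    """Return the error budget as a fraction (about 0.001 for a 99.9% SLO)."""
    if not 0 < slo_percent < 100:
        raise ValueError("SLO must be strictly between 0 and 100 percent")
    return (100.0 - slo_percent) / 100.0


def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Convert the budget into minutes of full downtime over the window."""
    window_minutes = window_days * 24 * 60  # 30 days = 43,200 minutes
    return error_budget_fraction(slo_percent) * window_minutes


print(round(allowed_downtime_minutes(99.9, 30), 1))  # 43.2
```

The same function works for any window length, so a 7-day window at 99.9% comes out to roughly 10 minutes.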

Spending the budget

Every outage, failed deploy, or latency breach "spends" budget. Teams treat it like currency: while there's plenty left, they invest it in velocity and risk.
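To make the currency metaphor concrete, here is a rough sketch of a budget balance for a request-based SLO. It assumes you already have total and failed request counts for the whole SLO window; the function name and the sample numbers are illustrative only.

```python
def budget_remaining(slo_percent: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent share of the error budget (negative if overspent)."""
    # Number of failures the SLO tolerates for this much traffic.
    allowed_failures = total_requests * (100.0 - slo_percent) / 100.0
    if allowed_failures <= 0:
        raise ValueError("no budget: need traffic and an SLO below 100%")
    return 1.0 - failed_requests / allowed_failures


# 1,000,000 requests at a 99.9% SLO tolerate about 1,000 failures.
# With 250 failures so far, roughly three quarters of the budget is left.
print(f"{budget_remaining(99.9, 1_000_000, 250):.0%}")
```

A negative result means the window has already spent more than the SLO allows.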

Error-budget policy

A written policy says what happens when the budget runs low — for example, freezing risky releases and redirecting effort to reliability until the budget recovers.
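A policy like this can be written down as a small lookup from remaining budget to an agreed action. The sketch below is hypothetical: the 25% threshold and the action wording are placeholders that each team would set for itself, not a standard.

```python
def policy_action(remaining_fraction: float) -> str:
    """Map the unspent share of the budget to an agreed action.

    Thresholds are illustrative assumptions, not recommended values.
    """
    if remaining_fraction <= 0.0:
        # Budget exhausted: stabilization takes priority.
        return "freeze risky releases and prioritize reliability work"
    if remaining_fraction < 0.25:
        # Budget running low: slow down before it is gone.
        return "require extra review for risky changes"
    return "ship normally"


for remaining in (0.80, 0.10, -0.05):
    print(f"{remaining:+.0%} remaining -> {policy_action(remaining)}")
```

Keeping the rules this explicit makes it harder to argue about them in the middle of an incident.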

Velocity vs reliability

The budget defuses the eternal fight between product and operations. Instead of trading opinions, both sides agree to follow the budget: room to ship, or time to stabilize.

Burn rate

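Burn rate measures how fast you're consuming the budget relative to the pace that would use it up exactly at the end of the window: a burn rate of 1 spends the budget on schedule, and anything higher exhausts it early. A sudden spike that burns days of budget in minutes is a strong, early signal worth alerting on.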
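One common way to put a number on burn rate is to divide the observed error rate by the error rate the SLO allows. The sketch below assumes a request-based SLO and a short measurement interval such as the last hour; the function names and the sample traffic figures are illustrative.

```python
def burn_rate(slo_percent: float, total_requests: int, failed_requests: int) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    allowed_error_rate = (100.0 - slo_percent) / 100.0
    if total_requests == 0:
        return 0.0  # no traffic, nothing burned
    return (failed_requests / total_requests) / allowed_error_rate


def days_until_exhausted(rate: float, window_days: int = 30) -> float:
    """Days a full budget would last if this burn rate were sustained."""
    if rate <= 0:
        return float("inf")
    return window_days / rate


# 144 failures out of 10,000 requests is a 1.44% error rate,
# which is about 14.4 times the 0.1% allowed by a 99.9% SLO.
rate = burn_rate(99.9, 10_000, 144)
print(f"burn rate ~{rate:.1f}, budget gone in ~{days_until_exhausted(rate):.1f} days")
```

At a burn rate of 14.4, one hour consumes about 2% of a 30-day budget (14.4 / 720 hours), which is why a short, sharp spike deserves a page rather than a ticket.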

Why error budgets matter

Error budgets were popularized by Google's SRE practice as a way to make reliability a shared, data-driven decision rather than a turf war. They give product teams a clear runway to move fast and give operations an objective trigger to pull the brakes — both grounded in the same SLO.

The discipline only works if you actually enforce the policy. A budget you always ignore is just a number on a dashboard. The teams that benefit are the ones that genuinely slow down when the budget is spent and genuinely take risk when it's healthy.

Tracking the budget with AllStak

An error budget is only as good as the signals feeding it. AllStak's uptime monitoring, error tracking, and request-performance data give you the availability and reliability measurements that show how much budget each incident consumed.

Pair those signals with notification rules so a fast burn — a sudden surge of failures — pages the right people before the budget for the whole window is gone.

Error budget FAQ

How do you calculate an error budget?

Subtract your SLO from 100%. A 99.9% availability SLO gives a 0.1% error budget, which over a 30-day window is about 43 minutes of allowable downtime.

What does it mean to "spend" an error budget?

Every failure that counts against your SLO consumes budget. Outages, failed deploys, and latency breaches all draw it down. When it's healthy you can take risk; when it's depleted you focus on reliability.

What is an error-budget policy?

It's an agreed set of actions triggered by budget levels — for instance, freezing risky feature releases and prioritizing reliability work once the budget is exhausted, until it recovers.

Why not aim for 100% reliability?

Each additional nine of reliability costs disproportionately more, and users rarely perceive the difference. A sensible error budget lets you invest that effort where it actually moves the product forward.
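To see how quickly the margin shrinks, the sketch below prints the allowable downtime over a 30-day window for several SLO targets. It only illustrates the budget side of the trade-off; it says nothing about the engineering cost of each step, and the names used are made up for this example.

```python
WINDOW_MINUTES = 30 * 24 * 60  # 30-day window, an assumption for this example


def downtime_minutes(slo_percent: float) -> float:
    """Minutes of full downtime the SLO allows over the window."""
    return (100.0 - slo_percent) / 100.0 * WINDOW_MINUTES


for slo in (99.0, 99.9, 99.99, 99.999):
    print(f"{slo}% SLO -> {downtime_minutes(slo):.2f} minutes")
```

Each added nine cuts the budget by a factor of ten: roughly 432 minutes at 99%, 43.2 at 99.9%, 4.32 at 99.99%, and under half a minute at 99.999%.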

Watch your error budget burn in real time

AllStak's uptime, error, and performance data show how each incident draws down your budget, and notification rules warn you before it's gone.