Reliability Glossary

Latency percentiles explained

A latency percentile like p99 tells you the response time that 99% of requests come in under, exposing the slow tail that averages hide.

Latency percentiles, defined

A percentile ranks your requests by response time and tells you the value at a given position. p99 latency of 800 ms means 99% of requests finished in 800 ms or less, and the slowest 1% took longer. p50 (the median) is the midpoint; p95 and p99 describe progressively slower portions of the distribution — the "tail."
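As a rough illustration of reading a percentile off a ranked list, the sketch below uses the nearest-rank method. The `percentile` helper and the sample latencies are hypothetical. Monitoring tools often use interpolation or histogram approximations instead, so treat the exact numbers as illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (one slow outlier).
latencies_ms = [120, 95, 110, 130, 105, 2400, 115, 100, 125, 140]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
print(p50, p90, p99)
```

With only ten samples, the p99 is simply the slowest request. Stable high percentiles need far more data than this toy list.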

Percentiles matter because the average lies. A handful of very slow requests can drag the mean up while hiding the fact that most users are fine — or, worse, a low average can mask a painful tail. Since real users experience individual requests, not an average, you tune for the percentiles they actually feel.

Reading the distribution

Each percentile answers a different question about how your service feels. Together they paint a far truer picture than any single number.

p50 — the median

Half of requests are faster than the p50 and half are slower. It describes the typical experience, but says nothing about how bad things get for unlucky users.

p95 — the near tail

95% of requests are faster than the p95. It catches the slowdowns that a meaningful slice of users hit regularly — often the first place a degradation shows up.

p99 — the far tail

99% of requests are faster than the p99; the slowest 1% are worse. At scale that 1% is a large number of real, frustrated users, which is why p99 gets so much attention.

Why the average misleads

An average blends fast and slow into one number that matches no actual request. It can look healthy while a painful tail quietly hurts a meaningful share of your users.
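A small made-up dataset makes the point concrete. The sketch below assumes 98 requests at 100 ms and 2 at 5,000 ms (hypothetical numbers) and compares the mean with the median and a nearest-rank p99.

```python
import math
import statistics

# Hypothetical traffic: 98 fast requests and 2 very slow ones, in milliseconds.
latencies_ms = [100] * 98 + [5000] * 2

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)

# Nearest-rank p99: the value at the 99th position of the sorted list.
ordered = sorted(latencies_ms)
p99_ms = ordered[math.ceil(99 * len(ordered) / 100) - 1]

# The mean matches no actual request; the percentiles are real observed values.
mean_matches_a_request = mean_ms in latencies_ms
print(mean_ms, median_ms, p99_ms, mean_matches_a_request)
```

Here the mean lands between the two groups, a response time no request actually had, while the p50 and p99 each report a latency that some request really experienced.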

Tail latency

The slow end of the distribution. Tail latency compounds: a page that fans out to many backend calls cannot finish until the slowest one returns, so a high tail on any dependency turns into visibly slow pages.

Why percentiles matter

User experience is governed by the tail, not the average. If your p50 is snappy but your p99 is several seconds, 1 in 100 requests is painfully slow, and the more requests a single page makes, the more likely each user hits the tail at least once. This is why latency SLOs are written against high percentiles.
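The effect of request count can be estimated with simple arithmetic. The sketch below assumes each request independently has a 1% chance of being slower than the p99, which is a simplification (real requests are often correlated). The function name `chance_of_hitting_tail` is hypothetical.

```python
def chance_of_hitting_tail(requests_per_page, tail_fraction=0.01):
    """Probability that at least one request on a page lands in the slow tail.

    Assumes requests are independent; tail_fraction=0.01 corresponds to
    "slower than the p99".
    """
    if requests_per_page < 0:
        raise ValueError("requests_per_page must be non-negative")
    # P(at least one slow) = 1 - P(all requests are fast)
    return 1 - (1 - tail_fraction) ** requests_per_page


for n in (1, 10, 100):
    print(f"{n:>3} requests per page: {chance_of_hitting_tail(n):.1%}")
```

Under these assumptions, a page making 100 requests hits the p99 tail at least once on roughly 63% of loads, compared with 1% for a single request.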

Percentiles also guide where to invest. Improving the p50 might shave milliseconds nobody notices, while cutting the p99 can rescue the worst experiences and the customers most likely to churn. Watching them as a trend reveals regressions an average would hide entirely.

Latency percentiles in AllStak

AllStak's request performance and application monitoring surface latency as percentiles, not just averages, so you can watch your p95 and p99 alongside throughput and error rates and see when the tail starts to drift.

When a slow percentile points at a particular endpoint, distributed tracing in the same platform helps you follow a slow request across services to find which span is responsible for the tail.

Latency percentiles FAQ

What does p99 latency mean?

p99 latency is the response time that 99% of requests meet or beat. If your p99 is 800 ms, then 99% of requests completed in 800 ms or less and the slowest 1% took longer.

Why not just use the average latency?

Averages blend fast and slow requests into one number that no real request matches. A small number of very slow requests can either inflate the average or hide in it, so percentiles describe real experience far better.

What is tail latency?

Tail latency is the slow end of the distribution — your p95, p99, and beyond. It matters because pages that make many backend calls are only as fast as their slowest dependency, so the tail shapes real-world speed.

Which percentile should I set an SLO on?

Latency SLOs are typically written against p95 or p99 because those percentiles reflect the slowest experiences your users regularly hit. The exact choice depends on how latency-sensitive the service is.
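One common way to phrase such an SLO is "at least X% of requests complete within Y ms." The sketch below checks that condition over a window of samples. The 800 ms threshold, the 99% target, and the `meets_latency_slo` helper are all hypothetical, not settings from any particular tool.

```python
def meets_latency_slo(latencies_ms, threshold_ms, target_fraction):
    """Return True if at least target_fraction of requests finished within threshold_ms."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    within = sum(1 for value in latencies_ms if value <= threshold_ms)
    return within / len(latencies_ms) >= target_fraction


# Hypothetical windows of 1,000 requests each, in milliseconds.
healthy_window = [200] * 995 + [1500] * 5     # 99.5% within 800 ms
degraded_window = [200] * 980 + [1500] * 20   # 98.0% within 800 ms

healthy_ok = meets_latency_slo(healthy_window, threshold_ms=800, target_fraction=0.99)
degraded_ok = meets_latency_slo(degraded_window, threshold_ms=800, target_fraction=0.99)
print(healthy_ok, degraded_ok)
```

Checking the fraction of requests under a threshold is equivalent to asking whether the p99 is at or below 800 ms, and it avoids computing the percentile itself.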

See your p95 and p99, not just the average

AllStak's request performance shows latency as percentiles, and distributed tracing helps you find the slow span behind a bad tail. Start free.