Kubernetes CrashLoopBackOff: what it means and how to fix it
CrashLoopBackOff isn't the error. It's Kubernetes telling you that a container keeps dying and that it is deliberately waiting longer between restarts. The real error is usually one or two kubectl commands away.
What CrashLoopBackOff actually means
CrashLoopBackOff is a waiting state, not an error in itself. The sequence: the kubelet starts your container, the container exits (crash or clean exit), the restart policy says restart it, and it dies again. After repeated failures, Kubernetes inserts an exponentially growing delay between restart attempts — 10s, 20s, 40s, doubling up to a five-minute cap — and during that delay the pod shows CrashLoopBackOff. If a container then runs cleanly for ten minutes, the backoff timer resets.
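To make the schedule concrete, here is a minimal Python sketch that reproduces the doubling delay with a five-minute cap. The function name and its parameters are hypothetical, and the numbers are simply the ones quoted above; the kubelet's real implementation may differ in detail.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Approximate wait (seconds) before each of the first `restarts` restarts."""
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(min(delay, cap))
        # Double the delay after every failed run, never exceeding the cap.
        delay = min(delay * 2, cap)
    return delays


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

Under these assumptions the cap is reached on the sixth restart, which is why a long-running loop settles at roughly one attempt every five minutes.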
Because the state only says "it keeps dying", the diagnosis lives elsewhere: in the container's exit code, its last logs, and the pod's events. It's also worth knowing what CrashLoopBackOff is not: an image that can't be pulled shows ImagePullBackOff, and a missing ConfigMap or Secret referenced in env vars shows CreateContainerConfigError. CrashLoopBackOff specifically means the container started and then died.
Common root causes of CrashLoopBackOff
The application crashes at startup
A missing or malformed env value, an unreachable database, a failed migration, or an unhandled exception in initialization code. The container typically exits with code 1, and the last lines of kubectl logs --previous contain the application's own error message.
OOMKilled: memory limit too low
The container exceeded its memory limit and the kernel killed it — exit code 137, with reason OOMKilled in the pod's last state. The app may be leaking, or the limit may simply be below the workload's honest needs. Logs often end mid-sentence because the kill is instant.
A misconfigured liveness probe
A classic: the app needs 60 seconds to start, but the liveness probe starts checking at 10 seconds, so Kubernetes kills the container before it has finished booting, and does so again on every restart. Events show "Liveness probe failed" before each restart. The fix is a startupProbe (or a longer initialDelaySeconds), not faster code.
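The arithmetic behind that failure is easy to check. The sketch below is a rough approximation, not the kubelet's exact timing: it assumes every check fails, ignores timeoutSeconds and scheduling jitter, and uses hypothetical function names. The period of 10 seconds and threshold of 3 are assumed values for the example.

```python
def liveness_deadline(initial_delay_seconds, period_seconds, failure_threshold):
    """Approximate seconds after start at which a always-failing liveness probe gives up."""
    # First check at initial_delay_seconds, then one check per period.
    return initial_delay_seconds + (failure_threshold - 1) * period_seconds


def startup_budget(period_seconds, failure_threshold):
    """Approximate seconds a startupProbe allows the app to finish booting."""
    return period_seconds * failure_threshold


app_startup_seconds = 60  # the slow starter from the example above

deadline = liveness_deadline(initial_delay_seconds=10, period_seconds=10, failure_threshold=3)
print(deadline, deadline < app_startup_seconds)  # 30 True: killed before it is up

budget = startup_budget(period_seconds=10, failure_threshold=30)
print(budget, budget >= app_startup_seconds)  # 300 True: enough room to boot
```

If the liveness deadline is shorter than the measured startup time, the container can never survive its own boot, which is the loop described above.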
Wrong command or a process that exits immediately
A bad command/args override, an entrypoint script that finishes, or a process that daemonizes itself — the container's main process must stay in the foreground. A container that exits with code 0 and restartPolicy: Always still loops, because Kubernetes expects services to run forever.
How to investigate and fix CrashLoopBackOff
Two commands, kubectl describe pod and kubectl logs --previous, diagnose the large majority of crash loops. Work from the exit code outward.
1. Describe the pod
kubectl describe pod <name> shows the restart count, the Last State block with the exit code and reason (Error, OOMKilled), and the Events list — probe failures, kill signals, scheduling problems. This single output usually classifies the failure before you read any logs.
2. Read the previous container's logs
kubectl logs <pod> --previous prints the output of the crashed instance — the current one may have no logs yet. The application's dying words (a stack trace, a "missing env var" message, a connection error) are usually in the last twenty lines.
3. Interpret the exit code
Exit 1 (or other small codes): application error — read the logs. Exit 137: SIGKILL, most often OOMKilled (confirm via the reason field) or a liveness-probe kill after the grace period. Exit 143: SIGTERM, a graceful shutdown request. Exit 0 with continued restarts: the process is finishing when it should serve.
4. Check the probes against real startup time
If events show liveness failures, measure how long the app genuinely takes to become ready and compare it with the probe's initialDelaySeconds, periodSeconds, and failureThreshold. Add a startupProbe for slow starters so the liveness probe only takes over after the app is up.
5. Verify config, secrets, and dependencies
Confirm every env var, ConfigMap, and Secret the app reads actually contains what it expects (kubectl exec into a debug pod, or inspect the object with kubectl get secret <name> -o yaml, keeping in mind that Secret values are shown base64-encoded). If the app crashes reaching a database or queue, test that dependency from inside the cluster network, not from your laptop.
6. Correlate with the deploy and fix forward or roll back
If the loop began with a rollout, kubectl rollout undo deployment/<name> restores the previous revision while you diagnose the new one. For OOMKilled, raise the memory limit to the workload's measured needs, and investigate a leak if usage keeps climbing after the raise.
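The exit-code rules from the steps above can be sketched as a small Python helper. It assumes you pass it one entry from status.containerStatuses, as printed by kubectl get pod <name> -o json; the function name and the sample values are hypothetical, and the hints are only starting points for the investigation.

```python
def classify_termination(container_status):
    """Map one status.containerStatuses entry to a short diagnosis hint."""
    terminated = container_status.get("lastState", {}).get("terminated")
    if not terminated:
        return "no previous termination recorded"
    code = terminated.get("exitCode")
    reason = terminated.get("reason", "")
    # The reason field separates an OOM kill from other SIGKILLs.
    if reason == "OOMKilled":
        return "OOMKilled: compare memory usage with the limit"
    if code == 0:
        return "exit 0: the main process finished instead of serving"
    if code == 137:
        return "SIGKILL: check events for liveness-probe kills"
    if code == 143:
        return "SIGTERM: the container was asked to shut down"
    return f"exit {code}: application error, read kubectl logs --previous"


# Made-up sample shaped like a real containerStatuses entry.
sample_status = {
    "name": "api",
    "restartCount": 12,
    "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
}
print(classify_termination(sample_status))
```

Checking the reason before the exit code matters here, because an OOM kill and a forced kill after a failed probe both surface as exit 137.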
How to prevent crash loops
- Use startup probes for any app that takes more than a few seconds to boot, and set liveness probes against measured startup times.
- Set memory requests and limits from observed usage with headroom — limits guessed at deploy time are where OOMKilled loops are born.
- Validate required env vars and config at process start with a clear fatal message, so a bad rollout fails loudly and legibly.
- Make services resilient to slow dependencies — retry with backoff at startup instead of crashing when the database isn't ready yet.
- Alert on pod restart counts and OOMKilled events, not just availability — a service can stay "up" while quietly looping.
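Two of these habits, validating config at process start and retrying slow dependencies with backoff, fit in a few lines of application code. The following is a minimal Python sketch, not a prescribed implementation: require_env, wait_for, DATABASE_URL, and the readiness check are all hypothetical names, and the delays are assumed values you would tune for your own service.

```python
import os
import sys
import time


def require_env(names, environ=os.environ):
    """Return the required settings, or exit with one clear message naming what is missing."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("fatal: missing required environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


def wait_for(check, attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call check() until it returns True, sleeping with exponential backoff between tries."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False


if __name__ == "__main__":
    # Placeholder values so the sketch runs without a real environment or database.
    settings = require_env(["DATABASE_URL"], environ={"DATABASE_URL": "postgres://example"})
    database_ready = wait_for(lambda: True)
    print(settings["DATABASE_URL"], database_ready)
```

A missing variable then shows up as a single legible line in kubectl logs --previous, and a database that is a few seconds late costs a short wait instead of a restart.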
How AllStak helps with CrashLoopBackOff
AllStak's Kubernetes monitoring shows pod states — including CrashLoopBackOff — alongside restart counts, so a container that's quietly dying every few minutes surfaces as a visible, alertable signal instead of a number buried in kubectl output. Container memory charts next to the pod's limit make OOMKilled loops obvious: usage sawtoothing into the ceiling, restart, repeat.
Pod logs in the same platform preserve the crashed container's final lines past the restart, and deploy markers on the timeline show whether the loop began with a rollout. AllStak doesn't diagnose the crash for you — but it puts the state, the exit evidence, the logs, and the memory chart on one screen, which is most of the investigation.
CrashLoopBackOff — frequently asked questions
Is CrashLoopBackOff an image problem?
Usually not. A pull failure (wrong tag, missing registry credentials) shows ImagePullBackOff or ErrImagePull instead — the container never starts at all. CrashLoopBackOff means the image pulled and the container started, then died. The closest image-related cause is a wrong entrypoint or command inside an otherwise valid image.
Does deleting the pod fix it?
It resets the backoff timer and gets you a fresh container faster, but if the cause is unchanged — same config, same limits, same image — the new pod loops identically. Deleting pods is occasionally useful to re-read updated ConfigMaps/Secrets, but it's a retry, not a fix.
What does exit code 137 mean?
128 + 9: the process was killed with SIGKILL. In Kubernetes that's most often the kernel OOM-killing a container that exceeded its memory limit (the pod's last state shows reason OOMKilled), or the kubelet force-killing a container that ignored SIGTERM after a failed liveness probe. Check the reason field to tell them apart.
How long does the backoff delay get?
The restart delay starts around ten seconds and doubles with each failure — 10s, 20s, 40s, 80s — capping at five minutes. So a pod stuck in a long loop restarts roughly every five minutes. After a container runs cleanly for ten minutes, the timer resets to the beginning.
See your crash loops before they page you
AllStak's Kubernetes monitoring tracks pod states, restart counts, container memory, and logs in one place — the whole CrashLoopBackOff investigation on one screen.