Docker container restart loop: what it means and how to fix it

"Restarting (1) 5 seconds ago" means your container keeps exiting and the restart policy keeps reviving it. The exit code and the logs say why.

What a restart loop actually is

A Docker restart loop is the combination of two things: a container whose main process keeps exiting, and a restart policy — on-failure, always, or unless-stopped — that keeps starting it again. Docker adds a doubling delay between restarts (starting at 100ms, capped at one minute), which is why docker ps shows "Restarting (1) X seconds ago" with the container's last exit code in parentheses.
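To make the delay concrete, here is a minimal Python sketch of the schedule described above. It assumes nothing beyond the doubling-from-100ms and one-minute-cap behaviour; the function name restart_delays_ms is illustrative and is not part of Docker.

```python
def restart_delays_ms(attempts, initial_ms=100, cap_ms=60_000):
    """Return the delay before each of the first `attempts` restarts, in ms."""
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        delays.append(min(delay, cap_ms))  # never wait longer than the cap
        delay *= 2                         # each delay doubles the previous one
    return delays


print(restart_delays_ms(12))
# [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 51200, 60000, 60000]
```

Under these assumptions, a container that fails instantly settles into roughly one retry per minute after about ten attempts, which is why an old loop looks slow in docker ps.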

The loop itself is just policy doing its job; the question is why the process exits. Containers live and die with their main (PID 1) process, so anything that ends that process — a crash, a configuration error, a completed script, an OOM kill, or a process that daemonizes itself into the background — ends the container. The exit code recorded by Docker is the primary clue for which of these happened.
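As a rough aid for reading that clue, the sketch below maps an exit code to its most likely meaning, using the convention that codes above 128 are 128 plus a signal number. The helper explain_exit_code and its wording are illustrative, and the signal names assume a POSIX host.

```python
import signal


def explain_exit_code(code):
    """Map a container exit code to a short, likely explanation."""
    if code == 0:
        return "process finished normally"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        signum = code - 128  # shells and Docker report 128 + signal number
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return "application-defined error"


print(explain_exit_code(137))  # killed by SIGKILL
```

An application can choose any exit code it likes, so treat the result as a starting hypothesis and confirm it against the logs.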

Common root causes of a restart loop

The app crashes at startup

Typical triggers are missing env vars, a bad config file mount, a port already in use inside the container, or a code error in initialization. The container exits with a non-zero code (usually 1) within seconds of starting, and docker logs shows the same error message on every iteration.
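One way to make that repeated message useful is to validate configuration before anything else runs. The following is a minimal sketch, assuming the service reads its settings from environment variables; the variable names in the usage comment are hypothetical.

```python
import sys


def missing_settings(required, environ):
    """Return the required variable names that are unset or empty, in order."""
    return [name for name in required if not environ.get(name)]


def check_config_or_exit(required, environ):
    """Exit with code 1 and one clear log line if any setting is missing."""
    missing = missing_settings(required, environ)
    if missing:
        # A single unambiguous line, so every loop iteration states the cause.
        print(
            f"FATAL: missing required environment variables: {', '.join(missing)}",
            file=sys.stderr,
        )
        sys.exit(1)


# Typical use at the top of the service's entry point (names are hypothetical):
#   import os
#   check_config_or_exit(["DATABASE_URL", "API_KEY"], os.environ)
```

The container still exits and the policy still restarts it, but docker logs now names the missing variable instead of showing a stack trace from deep inside the app.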

A dependency isn't ready at startup

The app starts before its database, queue, or cache accepts connections, fails the initial connect, and exits — classic in docker-compose, where depends_on orders startup but does not wait for readiness. Once everything is warm a manual restart works, which is the telltale sign.
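A service can tolerate this by waiting for its dependency instead of exiting on the first refused connection. The sketch below retries a plain TCP connect with doubling delays; wait_for_port, its defaults, and the host name in the usage comment are assumptions for illustration. An open port only shows that something is listening, so a real service would usually retry its actual database or queue client the same way.

```python
import socket
import time


def wait_for_port(host, port, attempts=8, initial_delay=0.5, max_delay=10.0,
                  sleep=time.sleep):
    """Try a TCP connect until it succeeds or the attempts run out."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True  # something is accepting connections
        except OSError:
            if attempt == attempts:
                return False  # give up; the caller decides whether to exit
            sleep(delay)
            delay = min(delay * 2, max_delay)  # back off, up to max_delay
    return False


# Typical use before opening the real connection (host name is hypothetical):
#   if not wait_for_port("db", 5432):
#       raise SystemExit("FATAL: database not reachable after retries")
```

The sleep parameter exists only so the waiting can be replaced in tests; in production the default time.sleep is what you want.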

The process doesn't stay in the foreground

An entrypoint that launches a daemonizing service (nginx without "daemon off;", a service started with & or systemctl-style init) returns immediately — so PID 1 exits with code 0 and Docker, under restart: always, dutifully loops it. Containers need a foreground process.
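The effect can be reproduced outside Docker. This sketch, which assumes a POSIX sh and sleep are available, runs a command the way an entrypoint would and measures how long the launching process stays alive; the helper name run_as_main_process is illustrative.

```python
import subprocess
import time


def run_as_main_process(shell_command):
    """Run a command via sh and return (exit_code, seconds_alive)."""
    start = time.monotonic()
    result = subprocess.run(
        ["sh", "-c", shell_command],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode, time.monotonic() - start


# Backgrounded with &: the shell exits 0 immediately while sleep runs on alone.
# In a container, that exit of the main process would end the container.
background = run_as_main_process("sleep 2 &")

# exec replaces the shell with sleep, so the launching process lives as long
# as the service does.
foreground = run_as_main_process("exec sleep 1")

print(background[0], foreground[0])
```

Both commands report exit code 0, but only the second keeps the main process alive for the lifetime of the work, which is the property a container needs.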

OOM kills and flapping healthchecks

A container breaching its --memory limit is killed with exit 137 (State.OOMKilled = true in docker inspect). Separately, a too-strict HEALTHCHECK doesn't make Docker itself restart anything, but the orchestration around it can: Swarm replaces tasks whose containers report "unhealthy", and add-on auto-restart tooling does the same for Compose or plain Docker. Kubernetes ignores the image's HEALTHCHECK, yet an over-strict liveness probe has the same effect. The result is the same loop with a different driver.

How to investigate and fix a restart loop

Get the exit code, read the logs, and reproduce interactively if needed — most loops resolve to one clear error message the container has been printing all along.

  1. Check the status and exit code

    docker ps -a shows the restart state and last exit code; docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' <name> gives both precisely. Code 1 = app error, 137 = SIGKILL (often OOM), 126/127 = command not executable/not found, 0 = the process simply finished.

  2. Read the container's logs

    A restart policy restarts the same container, so its log survives each restart: docker logs --tail 100 <name> shows the last output the process printed, usually spanning several attempts. The repeated error at the end of every iteration (a missing variable, a refused connection, a stack trace) is usually the entire diagnosis.

  3. Check for OOM kills

    If the exit code is 137, check State.OOMKilled in docker inspect and the host's dmesg. If true, raise the --memory limit to the workload's real needs or fix the app's memory growth; if false, something sent SIGKILL — check healthcheck-driven orchestration and manual kills.

  4. Verify the entrypoint runs in the foreground

    If the container exits 0 almost instantly, the main process isn't staying alive. Make the service run foreground (nginx -g 'daemon off;', exec the final process in shell scripts so it becomes PID 1) and confirm the CMD/ENTRYPOINT combination is what you think it is with docker inspect.

  5. Reproduce interactively

    Stop the loop (docker update --restart=no <name>, then docker stop <name>) and run the image by hand: docker run -it --entrypoint sh <image>, passing the same environment and mounts as the real container (-e or --env-file, -v), then invoke the real command yourself. Watching it fail in an interactive shell beats reading logs through a keyhole.

  6. Fix startup ordering for dependencies

    Add retry-with-backoff to the app's connection logic (the robust fix), or use depends_on with condition: service_healthy plus a healthcheck on the dependency in Compose. A service that tolerates a slow database start survives reboots, deploys, and the next incident too.
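The first few steps above can be condensed into a small triage helper. This is a sketch only: it reads the JSON that docker inspect <name> prints and suggests where to look next. The function name triage and the hint wording are illustrative, and the sample document is hand-written and trimmed to the three fields the function uses.

```python
import json


def triage(inspect_json):
    """Suggest a next step from the JSON printed by `docker inspect <name>`."""
    container = json.loads(inspect_json)[0]  # inspect prints a JSON array
    state = container["State"]
    code = state["ExitCode"]
    restarts = container.get("RestartCount", 0)

    if state.get("OOMKilled"):
        hint = "OOM kill: raise the memory limit or fix memory growth"
    elif code == 137:
        hint = "SIGKILL without OOM: check orchestrator healthchecks and manual kills"
    elif code == 0:
        hint = "clean exit: make the main process run in the foreground"
    elif code in (126, 127):
        hint = "command not executable or not found: check CMD/ENTRYPOINT"
    else:
        hint = "application error: read docker logs for the repeated message"
    return f"exit {code}, {restarts} restarts: {hint}"


# Hand-written sample, trimmed to the fields used above.
sample = json.dumps(
    [{"RestartCount": 14, "State": {"ExitCode": 137, "OOMKilled": True}}]
)
print(triage(sample))
# exit 137, 14 restarts: OOM kill: raise the memory limit or fix memory growth
```

In practice the input would come from capturing the command's output, for example with subprocess.run(["docker", "inspect", name], capture_output=True, text=True).stdout.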

How to prevent restart loops

  • Build connection retries with backoff into services instead of assuming dependencies are ready the instant the container starts.
  • Validate required configuration at startup and fail with a clear fatal log line: with a restart policy in place the loop may be unavoidable, but a slow diagnosis is not.
  • Set memory limits from measured usage with headroom, and monitor container memory so OOM kills stop being a surprise.
  • Write healthchecks that test real readiness with sane intervals and retries — a flapping healthcheck causes the restarts it's meant to detect.
  • Alert on container restart counts and uptime, not just "is it running" — restart: always can mask a dying service for weeks.
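For the last point, a restart-count alert can be as simple as comparing the newest count with the oldest one inside a time window. The sketch below assumes you already sample each container's restart count periodically (for example, RestartCount from docker inspect); the function name, the ten-minute window, and the threshold of three are illustrative choices, not recommendations.

```python
def is_restart_looping(samples, window_seconds=600, threshold=3):
    """Return True if the restart count grew by `threshold` within the window.

    samples: list of (unix_time, restart_count) pairs, oldest first.
    """
    if not samples:
        return False
    latest_time, latest_count = samples[-1]
    # Keep only the samples that fall inside the window ending at the newest one.
    in_window = [(t, c) for t, c in samples if latest_time - t <= window_seconds]
    baseline = in_window[0][1]
    return latest_count - baseline >= threshold


looping = [(0, 2), (300, 2), (600, 4), (900, 6)]
stable = [(0, 5), (300, 5), (600, 5)]
print(is_restart_looping(looping), is_restart_looping(stable))
# True False
```

A container that is recreated starts counting from zero again, so this check treats a falling count as no growth rather than as an error.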

How AllStak helps with container restart loops

AllStak's Docker monitoring tracks container lifecycle events and restart counts per host, so a container quietly looping under restart: always becomes a visible spike you can alert on — instead of something you discover weeks later in docker ps. Per-container memory charts beside the configured limit make OOM-driven loops self-evident.

Container logs collected into the same platform preserve the dying process's output across restarts, with host metrics and deploy markers on the same timeline — so "it started looping when the new image shipped" or "it loops whenever the host runs hot" stops being a hunch and becomes a chart. The diagnosis is still yours; the evidence is in one place.

Docker restart loops — frequently asked questions

What does exit code 137 mean for a container?

128 + 9 — the process received SIGKILL. The usual cause is the kernel OOM killer enforcing the container's memory limit; docker inspect's State.OOMKilled confirms it. If OOMKilled is false, the SIGKILL came from elsewhere: docker kill, an orchestrator replacing an unhealthy container, or a stop whose grace period expired.

Which restart policy should I use?

For long-running services, unless-stopped is the usual choice: it survives daemon restarts but respects a manual stop. on-failure[:max-retries] suits jobs that should retry a bounded number of times. always is like unless-stopped, except that a manually stopped container is started again when the daemon restarts, which surprises people. None of them fixes a crashing app; they only decide how persistently it loops.

Why doesn't depends_on fix my startup crashes?

Because plain depends_on only orders container start, not readiness — your app launches the moment the database container starts, often seconds before PostgreSQL accepts connections. Use depends_on with condition: service_healthy plus a healthcheck on the dependency, and add connection retries in the app itself for everything Compose can't see.

How do I stop the loop long enough to debug?

docker update --restart=no <name> changes the policy on the live container, then docker stop ends it without revival. From there, inspect the exit code and logs calmly, or run the image interactively with docker run -it --entrypoint sh to execute the real command by hand and watch it fail.
