Learn: Production Errors & Incidents Explained

Practical explainers for the errors on-call engineers actually face (HTTP 5xx, OutOfMemory, CrashLoopBackOff, connection failures), each covering causes, investigation, and prevention.

HTTP 500 Internal Server Error: what it means and how to fix it

A 500 is your application admitting it failed. Here is how to read it, trace it to the exception behind it, and stop it from coming back.

HTTP 502 Bad Gateway: what it means and how to fix it

A 502 is your proxy telling you it couldn't get a valid response from your application. The fix is almost always one hop behind the proxy.

HTTP 503 Service Unavailable: what it means and how to fix it

A 503 means the service is deliberately saying "not right now" — because of overload, maintenance, or no healthy backends. The fix depends on which one it is.

HTTP 504 Gateway Timeout: what it means and how to fix it

A 504 means your proxy waited, got nothing, and gave up. The real question is always: what made the upstream so slow?

Java OutOfMemoryError: what it means and how to fix it

"OutOfMemoryError" is a family of distinct failures — heap, Metaspace, GC overhead, native memory — and each one points to a different fix.
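
As a rough illustration of that split, the sketch below maps the message text that follows `java.lang.OutOfMemoryError:` to one of the families named above. The helper name `classify_oom` and the family labels are hypothetical, chosen only for this example; the matched substrings are common HotSpot messages, and anything unrecognized is reported as unknown rather than guessed.

```python
# Hypothetical helper: map an OutOfMemoryError log line to a failure family.
# The substrings are common HotSpot messages; the labels are illustrative.
OOM_FAMILIES = [
    ("Java heap space", "heap"),
    ("GC overhead limit exceeded", "gc-overhead"),
    ("Metaspace", "metaspace"),
    # Off-heap cases: direct buffers, or native threads the OS could not create.
    ("Direct buffer memory", "native-memory"),
    ("unable to create native thread", "native-memory"),
]


def classify_oom(log_line: str) -> str:
    """Return the failure family for a log line, or 'unknown' / 'not-oom'."""
    if "OutOfMemoryError" not in log_line:
        return "not-oom"
    for needle, family in OOM_FAMILIES:
        if needle in log_line:
            return family
    return "unknown"
```

Calling `classify_oom` on the first OutOfMemoryError line in a log is one quick way to decide which kind of investigation to start with.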

Node.js "JavaScript heap out of memory": what it means and how to fix it

V8 ran out of old-space heap and aborted the process. Whether the fix is a flag or a leak hunt depends on what the memory curve looks like.

PostgreSQL "connection refused": what it means and how to fix it

"Connection refused" is a TCP-level rejection: nothing accepted the connection at that address and port. That single fact rules out half the usual suspects.
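
A minimal sketch of how that distinction can be checked from a client machine, using only the Python standard library. The function name `probe_tcp` and the result labels are hypothetical. A "refused" result means the host was reachable and actively rejected the connection, while a "timeout" suggests packets were dropped somewhere along the way.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connection attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # something accepted the connection
    except ConnectionRefusedError:
        return "refused"  # host reachable, nothing listening on that port
    except socket.timeout:
        return "timeout"  # no answer at all within the timeout
    except OSError:
        return "error"  # DNS failure, unreachable network, and similar
```

For example, `probe_tcp("127.0.0.1", 5432)` probes the default PostgreSQL port on the local machine; the host and port here are placeholders for whatever address the failing client is actually configured with.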

Redis connection timeout: what it means and how to fix it

Redis answers in microseconds — until one slow command, a persistence fork, or a saturated network makes every client wait. Here's how to find which.

Kubernetes CrashLoopBackOff: what it means and how to fix it

CrashLoopBackOff isn't the error — it's Kubernetes telling you a container keeps dying and it's deliberately waiting longer between restarts. The real error is one kubectl command away.
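
To make "waiting longer between restarts" concrete, the sketch below computes the delay before each restart, assuming the kubelet defaults described in the Kubernetes documentation: a 10-second initial delay that doubles after each crash and is capped at 5 minutes. The function name `backoff_delays` and its parameters are hypothetical, and a cluster may be configured differently.

```python
def backoff_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Return the delay in seconds before each of the first `restarts` restarts.

    Assumes an exponential back-off: start at `base`, double each time,
    and never exceed `cap`.
    """
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

Under these assumptions a pod that has crashed six or more times restarts only once every five minutes, which is why a fix can appear to take a while to show up.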

Docker container restart loop: what it means and how to fix it

"Restarting (1) 5 seconds ago" means your container keeps exiting and the restart policy keeps reviving it. The exit code and the logs say why.
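
The number in parentheses is the container's last exit code. The sketch below shows one way to read it, assuming the usual Unix convention that codes above 128 mean "terminated by signal (code minus 128)" and the shell conventions for 126 and 127. The helper name `describe_exit_code` is hypothetical, and the signal names come from the platform's `signal` module, so they are only meaningful on a Unix-like system.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a rough, convention-based reading of a container exit code."""
    if code == 0:
        return "exited cleanly (exit code 0)"
    if code == 126:
        return "command found but not executable (exit code 126)"
    if code == 127:
        return "command not found (exit code 127)"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"  # not a signal this platform knows
        return f"killed by {name} (exit code {code})"
    return f"application error (exit code {code})"
```

Under that convention, 137 reads as SIGKILL (often the out-of-memory killer or a forced stop) and 143 as SIGTERM, while a small code such as 1 usually means the application itself exited with an error and the logs are the next place to look.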