
HTTP 504 Gateway Timeout: what it means and how to fix it

A 504 means your proxy waited, got nothing, and gave up. The real question is always: what made the upstream so slow?

What an HTTP 504 actually means

HTTP 504 Gateway Timeout means a gateway or proxy (nginx, a load balancer, or a CDN) forwarded the request to an upstream server and did not receive a response within its configured timeout. Like the 502, the 504 page is generated by the intermediary, not your application. Unlike the 502, the upstream did not send back something broken; it sent back nothing within the time allowed.

An important subtlety: the upstream may still be working on the request after the proxy gave up. The user sees a 504, the app finishes the work seconds later, and side effects — a payment, an email, a database write — may have happened anyway. That is why 504s on non-idempotent endpoints deserve extra care, and why the fix is usually to make the work faster or asynchronous, not just to raise the timeout.

Common root causes of a 504

Genuinely slow upstream work

A slow database query, a missing index, an N+1 query pattern, or a long call to an external API pushes response time past the proxy's limit. The proxy is behaving correctly; the application is the slow part.
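To make the N+1 pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The authors/posts schema and both function names are made up for illustration; the point is only the query count, which grows with the number of rows in the first version and stays at one in the second.

```python
import sqlite3

# Hypothetical schema for illustration: authors and their posts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'l1');
""")

def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author."""
    result = {}
    queries = 0
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries

def titles_single_join(conn):
    """Same data in a single round trip using a JOIN.

    Note: an inner JOIN omits authors without posts; the sample data
    has none, so both functions return the same mapping here.
    """
    result = {}
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

With two authors the first version issues three queries; with ten thousand authors it would issue ten thousand and one, each paying a network round trip against a real database.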

Proxy timeout shorter than legitimate work

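Some endpoints legitimately take a long time (reports, exports, large uploads), but the proxy's read timeout or the load balancer's idle timeout is shorter. In nginx, for example, proxy_read_timeout defaults to 60s and applies to the gap between two successive reads from the upstream, not to the whole response, so an endpoint that sends nothing until the work is finished runs into it. The mismatch turns a working feature into a 504.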

Worker saturation — requests queued, never picked up

All app workers or threads are busy (often pinned by slow downstream calls), so new requests sit in a queue until the proxy's timer expires. The endpoint isn't slow by itself — the service has no free hands.
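The arithmetic behind saturation can be sketched in a few lines. This is a deliberately simplified fluid approximation (steady arrival rate, fixed service time, first-in-first-out queue), and the function names are illustrative, not from any library. It estimates how long an overload has to last before queue wait plus service time exceeds the proxy timeout.

```python
def capacity_per_second(workers, service_time_s):
    """Maximum sustained throughput when every worker is busy."""
    return workers / service_time_s

def seconds_until_504s(workers, service_time_s, arrival_rate, proxy_timeout_s):
    """Estimate how long overload must last before new requests time out.

    Returns None when the service keeps up with the arrival rate.
    Assumes a steady arrival rate and a fixed service time.
    """
    capacity = capacity_per_second(workers, service_time_s)
    if arrival_rate <= capacity:
        return None
    backlog_growth = arrival_rate - capacity  # requests queued per second
    # A request arriving at time t waits backlog(t) / capacity seconds,
    # with backlog(t) = backlog_growth * t. It times out once
    # wait + service_time exceeds the proxy timeout.
    max_wait = proxy_timeout_s - service_time_s
    return max_wait * capacity / backlog_growth
```

For example, 8 workers pinned at 2 seconds per request can serve 4 requests per second. At 5 requests per second and a 60-second proxy timeout, this model predicts the first 504s after roughly 232 seconds of overload, even though no single request takes longer than 2 seconds of actual work.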

Network problems between proxy and upstream

Packet loss, a misbehaving overlay network, security-group changes, or DNS pointing at an unreachable address can make the proxy's connection attempts hang until they time out. This is less common than a slow application, but the symptoms are identical.
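One quick way to separate a network problem from a slow application is to attempt a plain TCP connection from the proxy host to the upstream address. The sketch below uses only the standard library; probe_tcp is an illustrative helper name, and the host and port you pass in are whatever your proxy is configured to reach.

```python
import socket
import time

def probe_tcp(host, port, timeout_s=3.0):
    """Try a TCP connect and return (outcome, elapsed_seconds).

    outcome is "connected", "timeout", "refused", or "error: <details>".
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            outcome = "connected"
    except socket.timeout:
        # No answer at all: typical of dropped packets or a firewall rule.
        outcome = "timeout"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on that port.
        outcome = "refused"
    except OSError as exc:
        # Everything else, including DNS resolution failures.
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start
```

A "timeout" outcome that takes the full timeout_s points at the network path, while a fast "connected" suggests the slowness is inside the application.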

How to investigate and fix a 504

Identify which hop timed out, measure how long the upstream actually takes, and then decide whether to make the work faster, move it off the request path, or deliberately adjust the timeout chain.

  1. Identify which layer timed out

    A request often passes CDN → load balancer → nginx → app. Each hop has its own timeout, and the 504 page's branding and headers reveal who gave up first. Knowing the layer tells you which timeout value and which log to read.

  2. Measure the upstream's real latency

    Use access logs (nginx $upstream_response_time), APM data, or traces to see how long the affected endpoint actually takes. If it routinely runs close to the timeout, you've found the problem; if it's normally fast, look for saturation or network issues instead.

  3. Find the slow dependency

    Check database slow-query logs, external API latencies, and lock contention at the failure timestamps. Most 504s decompose into one query or one third-party call that consumes nearly the whole budget.

  4. Check worker and pool saturation

    Inspect worker/thread pool usage, request queue depth, and database connection pool waits. If queues are deep, the fix is to unpin workers (timeouts on downstream calls), add capacity, or shed load — not to touch the endpoint that happened to time out.

  5. Audit the timeout chain

    List every timeout on the path — CDN origin timeout, LB idle timeout, proxy_read_timeout, app worker timeout — and make them deliberate: each outer layer slightly longer than the inner one, so failures produce clear errors at the right layer instead of mysterious races.

  6. Correlate with deploys and traffic

    504s that start after a deploy point to a new slow query or a removed index; 504s that track traffic peaks point to capacity. For genuinely long work, move it off the request path with a job queue and return 202 with a status endpoint instead of holding the connection.
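The latency measurement in step 2 can be sketched as a small log scan. This assumes your nginx log_format writes $upstream_response_time as a field named urt= (a hypothetical convention, since nginx log formats are user-defined) and that each request reached a single upstream, so the field holds one number or a dash.

```python
import math
import re

# Assumed log field: "urt=<seconds>", e.g. urt=0.120. A dash means no upstream.
URT = re.compile(r"\burt=(\d+(?:\.\d+)?)")

def upstream_times(lines):
    """Extract upstream response times in seconds, skipping lines without one."""
    return [float(m.group(1)) for line in lines if (m := URT.search(line))]

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def near_timeout(lines, timeout_s=60.0, pct=99, margin=0.8):
    """True when the chosen percentile is at or above margin * timeout_s."""
    times = upstream_times(lines)
    return bool(times) and percentile(times, pct) >= margin * timeout_s
```

Run it per endpoint rather than across the whole log: a single slow route is easy to miss when its samples are diluted by thousands of fast requests.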

How to prevent 504 errors

  • Monitor endpoint latency percentiles (p95/p99) and alert when they approach the proxy timeout — a 504 is just latency that crossed the line.
  • Put explicit timeouts on every downstream call so one slow dependency can't pin workers and starve the whole service.
  • Move long-running work — reports, exports, bulk operations — to background jobs instead of holding HTTP connections open.
  • Watch database slow-query logs and add indexes before a query's growth pushes it past the timeout.
  • Document the full timeout chain and review it when adding a new layer (CDN, mesh, gateway) so values stay coherent.
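The timeout-chain review in the last item lends itself to a small automated check. The sketch below is one possible shape for it: the layer names and values are invented for illustration, and the only rule it enforces is the one stated earlier, that each outer layer waits longer than the layer behind it.

```python
def audit_timeout_chain(chain):
    """Check a timeout chain given as (layer_name, seconds), outermost first.

    Returns a list of human-readable problems. An empty list means every
    outer layer waits strictly longer than the layer behind it.
    """
    problems = []
    for (outer, outer_t), (inner, inner_t) in zip(chain, chain[1:]):
        if outer_t <= inner_t:
            problems.append(
                f"{outer} ({outer_t}s) should exceed {inner} ({inner_t}s)"
            )
    return problems

# Hypothetical values for illustration only.
chain = [
    ("cdn_origin_timeout", 100),
    ("lb_idle_timeout", 60),
    ("nginx_proxy_read_timeout", 60),
    ("app_worker_timeout", 30),
]

for problem in audit_timeout_chain(chain):
    print(problem)
```

With these sample values the check flags the load balancer and nginx sharing the same 60-second limit, which is exactly the kind of tie that makes it unclear which layer will give up first. Keeping the chain in version control and running a check like this in CI is one way to keep the values coherent as layers are added.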

How AllStak helps with 504 errors

AllStak's uptime monitoring catches 504s from outside with response-time history, so you can see latency creeping toward the timeout days before the first failure. Application monitoring shows which endpoints are slow and how their latency distribution shifted, narrowing the search to the specific route and time window.

Infrastructure metrics reveal the saturation side — CPU, memory, and load on app and database hosts — and centralized logs hold the slow-query lines and proxy timeout messages from the same minutes. With deploy markers on the same timeline, telling "new slow query" apart from "traffic outgrew capacity" takes minutes instead of meetings.

HTTP 504 — frequently asked questions

What is the difference between a 504 and a 502?

Both are proxy-generated, but a 502 means the upstream responded badly — refused the connection, closed it early, or sent garbage — while a 504 means the upstream didn't respond at all within the timeout. 502 points to a dead or crashing app; 504 points to a slow or unreachable one.

Does the request still execute after a 504?

Often yes. The proxy gave up waiting, but the upstream usually keeps processing unless it detects the closed connection and aborts. That means writes, payments, or emails may complete even though the client saw an error — design non-idempotent endpoints with idempotency keys so retries are safe.
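An idempotency key can be sketched as a lookup that runs the operation only the first time a key is seen and replays the stored result afterwards. The class below is an in-memory illustration with made-up names; a production version would keep the results in a shared store such as a database table and expire old keys.

```python
import threading

class IdempotentExecutor:
    """Runs an operation at most once per idempotency key (in-memory sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}

    def run(self, key, operation):
        # Holding one lock during the operation keeps the sketch simple but
        # serializes all requests; a real store would lock per key.
        with self._lock:
            if key not in self._results:
                self._results[key] = operation()
            return self._results[key]

# Usage: the client sends the same key when it retries after a 504.
executor = IdempotentExecutor()
charges = []

def charge_card():
    charges.append("charged")
    return {"charge_id": len(charges)}

first = executor.run("order-42", charge_card)
retry = executor.run("order-42", charge_card)  # replayed, not re-executed
```

Here the retry returns the original result and the card is charged once, which is the property that makes retrying after a 504 safe.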

Should I just raise the proxy timeout?

Only when the work is legitimately long and you've decided to accept it on the request path. Raising timeouts to mask slow queries trades a clear error for held connections, deeper queues, and worse cascading failures under load. Measure first; usually the right fix is faster work or a background job.
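The background-job alternative can be sketched without any web framework: accept the request, hand the work to a pool, and return 202 with a URL the client can poll. JobRunner, the /jobs/ path, and the response bodies below are illustrative assumptions, and an in-process thread pool stands in for a real job queue, which would survive restarts.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor

class JobRunner:
    """Accepts long work, answers immediately, and exposes job status."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def submit(self, func, *args):
        """Start the work in the background and return (202, body) right away."""
        job_id = uuid.uuid4().hex
        future = self._pool.submit(func, *args)
        with self._lock:
            self._futures[job_id] = future
        return 202, {"job_id": job_id, "status_url": f"/jobs/{job_id}"}

    def wait(self, job_id, timeout_s=None):
        """Block until the job finishes (handy in tests, not in handlers)."""
        with self._lock:
            future = self._futures[job_id]
        future.exception(timeout=timeout_s)  # waits without raising job errors

    def status(self, job_id):
        """Body for the status endpoint: pending, done, or failed."""
        with self._lock:
            future = self._futures.get(job_id)
        if future is None:
            return 404, {"error": "unknown job"}
        if not future.done():
            return 200, {"state": "pending"}
        if future.exception() is not None:
            return 200, {"state": "failed", "error": str(future.exception())}
        return 200, {"state": "done", "result": future.result()}
```

The request that triggers the work now completes in milliseconds regardless of how long the export or report takes, so no proxy timeout on the path is involved.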

Why does my client time out before any 504 appears?

Because the client's own timeout is shorter than the proxy's. The browser, SDK, or mobile app gives up and reports a network error while the proxy is still waiting, so no 504 is ever generated and server-side logs show little: nginx, for example, records status 499 when the client closes the connection first. Keep client timeouts slightly longer than the slowest hop, or shorten the server-side budget.

See latency before it becomes a 504

AllStak tracks endpoint response times from outside and in, so the slow query behind your next timeout shows up on a chart before it shows up as an outage.