Java OutOfMemoryError: what it means and how to fix it

"OutOfMemoryError" is a family of distinct failures — heap, Metaspace, GC overhead, native memory — and each one points to a different fix.

What a Java OutOfMemoryError actually means

java.lang.OutOfMemoryError is thrown when the JVM cannot satisfy an allocation and the garbage collector cannot free enough memory to help. The message after the error name matters enormously: "Java heap space" means the object heap (capped by -Xmx) is exhausted; "GC overhead limit exceeded" means the JVM is spending almost all its time collecting while reclaiming almost nothing — a heap on the brink; "Metaspace" means class metadata space ran out, typically from classloader leaks or heavy dynamic class generation; and "unable to create native thread" means the OS, not the heap, refused resources.

A separate failure is often confused with all of these: a container being OOMKilled (exit code 137). There, the Linux kernel kills the whole process because its total memory (heap plus Metaspace plus thread stacks plus direct buffers plus native allocations) exceeded the cgroup limit. No Java exception is thrown at all; the process just disappears. Diagnosing memory problems starts with identifying which of these you actually have.

Common root causes of OutOfMemoryError

A memory leak holding objects alive

Unbounded caches and maps, collections that only grow, static fields referencing large graphs, listeners never unregistered, or ThreadLocals never cleared in pooled threads. The GC works correctly — your code simply never lets go. Heap usage after each full GC climbs steadily until -Xmx is hit.
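As a minimal sketch of the "bound the cache" idea, the class below caps a map at a fixed number of entries and evicts the least recently used one. The name BoundedCache is hypothetical, the class is not thread-safe, and a dedicated cache library would normally be the better choice in production.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class: a size-bounded LRU cache built on LinkedHashMap.
// Not thread-safe; wrap it or use a cache library for concurrent access.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

Because the map can never hold more than maxEntries values, it cannot become the ever-growing object graph described above, whatever the traffic pattern.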

Undersized heap for the real workload

No leak — the application genuinely needs more than -Xmx allows: bigger datasets, more concurrent users, larger batch jobs than anyone load-tested. The telltale sign: heap after full GC stays flat over time but peaks under load reach the ceiling.

Metaspace growth and classloader leaks

Frameworks that generate classes at runtime (proxies, bytecode generation), repeated hot redeploys in an app server, or classloaders pinned by a single surviving reference. Metaspace lives outside the heap and grows until MaxMetaspaceSize (or native memory) runs out — raising -Xmx does nothing for it.
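Metaspace usage can be read from inside the process through the standard management beans. The sketch below assumes a HotSpot-style JVM that exposes a memory pool named "Metaspace"; the class name MetaspaceProbe is hypothetical, and on other JVMs the method simply returns -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how many bytes the "Metaspace" pool currently uses.
final class MetaspaceProbe {
    // Returns used bytes, or -1 if this JVM exposes no pool with that name.
    static long usedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used bytes: " + usedBytes());
    }
}
```

Sampling this value across redeploys is one way to see whether class metadata keeps growing while the heap itself looks healthy.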

Native and off-heap memory exceeding the container limit

Direct ByteBuffers, JNI libraries, thread stacks, and the JVM's own overhead all live outside -Xmx. Setting -Xmx equal to (or near) the container's memory limit leaves no headroom, so the kernel OOM-kills the process — exit 137, no stack trace, often misread as a random crash.
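To make the off-heap point concrete, the sketch below allocates a direct ByteBuffer and reads the JVM's "direct" buffer pool through BufferPoolMXBean. The class name DirectMemoryProbe is hypothetical, and the pool name "direct" is what HotSpot-based JVMs report; treat both as assumptions.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper: shows that direct buffers are tracked outside the Java heap.
final class DirectMemoryProbe {
    // Returns bytes held by the "direct" buffer pool, or -1 if no such pool exists.
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(8 * 1024 * 1024);
        long after = directBytesUsed();
        System.out.println("direct pool before=" + before + " after=" + after
                + " buffer capacity=" + buffer.capacity());
    }
}
```

The memory reported here counts against the container limit but not against -Xmx, which is why it needs its own headroom.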

How to investigate and fix an OutOfMemoryError

Identify the exact flavor first, capture a heap dump, and use it to decide between the two fundamentally different fixes: patching a leak or resizing memory.

  1. Identify which OOM you actually have

    Read the full error message and the process exit code. A Java stack trace with "Java heap space" / "Metaspace" / "GC overhead limit exceeded" is a JVM-level error; a silent death with exit code 137 and an oom-kill line in dmesg or kubectl describe is the kernel killing the container. They have different causes and different fixes.
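The triage described in this step can be sketched as a small classifier. The enum and method names are hypothetical, and exit code 137 only means the process received SIGKILL, so the kernel OOM-kill verdict still needs confirming in dmesg or kubectl describe.

```java
// Hypothetical triage helper: maps an error message and exit code to an OOM flavor.
final class OomClassifier {
    enum Kind { HEAP, GC_OVERHEAD, METASPACE, NATIVE_THREAD, LIKELY_KERNEL_OOM_KILL, UNKNOWN }

    static Kind classify(String errorMessage, int exitCode) {
        if (errorMessage != null) {
            if (errorMessage.contains("Java heap space")) return Kind.HEAP;
            if (errorMessage.contains("GC overhead limit exceeded")) return Kind.GC_OVERHEAD;
            if (errorMessage.contains("Metaspace")) return Kind.METASPACE;
            if (errorMessage.contains("native thread")) return Kind.NATIVE_THREAD;
        }
        // 137 = 128 + 9 (SIGKILL). With no Java error this is probably the kernel,
        // but confirm with the oom-kill line before concluding.
        if (exitCode == 137) return Kind.LIKELY_KERNEL_OOM_KILL;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("java.lang.OutOfMemoryError: Java heap space", 1));
        System.out.println(classify(null, 137));
    }
}
```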

  2. Enable and capture a heap dump

    Run with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps so the JVM writes an .hprof file automatically at the moment of failure. For a live process, jcmd <pid> GC.heap_dump <file> (for example /dumps/heap.hprof) captures one on demand. Without a dump you are guessing; with one, the answer is usually obvious.
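A dump can also be triggered from inside the application, for example from an admin endpoint. The sketch below uses com.sun.management.HotSpotDiagnosticMXBean, which is specific to HotSpot-based JVMs; the class name HeapDumper and the file naming scheme are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: writes an .hprof heap dump of the current JVM on demand.
final class HeapDumper {
    // liveObjectsOnly = true dumps only reachable objects, which keeps the file smaller.
    static File dump(File directory, boolean liveObjectsOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The target file must not already exist; a timestamp keeps names unique.
        File target = new File(directory, "heap-" + System.currentTimeMillis() + ".hprof");
        try {
            diagnostics.dumpHeap(target.getAbsolutePath(), liveObjectsOnly);
        } catch (IOException e) {
            throw new UncheckedIOException("heap dump failed: " + target, e);
        }
        return target;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println("wrote " + dump(dir, true));
    }
}
```

Dumps can be as large as the heap itself, so the target directory needs enough free disk space.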

  3. Analyze the dump: leak or undersizing?

    Open the dump in Eclipse MAT or a similar analyzer and look at the dominator tree. One object graph holding 70% of the heap — a cache, a map, a list — is a leak with a name and a code path. Memory spread evenly across legitimate working data suggests the heap is simply too small.

  4. Read the GC logs for the trend

    Enable GC logging (-Xlog:gc* on JDK 9 and later; -XX:+PrintGCDetails on JDK 8) and chart heap usage after each full GC. A steadily rising floor over hours or days is the signature of a leak; a flat floor with load-correlated peaks is undersizing. This single chart is one of the most reliable leak detectors available.
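The "rising floor" test can be approximated numerically. The sketch below fits a least-squares line through post-GC heap samples and flags a leak when the fitted growth over the window exceeds a chosen fraction of the first sample. The class name, method names, and threshold are assumptions for illustration, not an established rule.

```java
// Hypothetical helper: estimates the trend of heap usage measured after each full GC.
final class PostGcTrend {
    // Least-squares slope, in bytes per sample, of evenly spaced post-GC readings.
    static double slope(long[] postGcBytes) {
        int n = postGcBytes.length;
        if (n < 2) return 0.0;
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (long y : postGcBytes) meanY += y;
        meanY /= n;
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            numerator += dx * (postGcBytes[i] - meanY);
            denominator += dx * dx;
        }
        return numerator / denominator;
    }

    // True when the fitted growth across the window is at least minGrowthFraction
    // of the first sample (for example 0.2 for 20 percent).
    static boolean looksLikeLeak(long[] postGcBytes, double minGrowthFraction) {
        if (postGcBytes.length < 2 || postGcBytes[0] <= 0) return false;
        double fittedGrowth = slope(postGcBytes) * (postGcBytes.length - 1);
        return fittedGrowth / postGcBytes[0] >= minGrowthFraction;
    }
}
```

A flat but noisy floor yields a slope near zero, while a steady climb produces a clearly positive one, matching the two signatures described in this step.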

  5. Check container limits versus JVM sizing

    For exit-137 kills, compare the container memory limit against -Xmx plus Metaspace, thread stacks, and direct memory. In containers, prefer -XX:MaxRAMPercentage (e.g. -XX:MaxRAMPercentage=75) over a hardcoded -Xmx, leaving real headroom for everything that lives outside the heap.
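The comparison in this step is simple arithmetic, sketched below. All figures in the usage note (a 2 GiB limit, 256 MiB of Metaspace, 150 threads with 1 MiB stacks, 64 MiB of direct memory) are made-up illustrations, and the class and method names are hypothetical; real values should come from measurement.

```java
// Hypothetical helper: rough memory budget for a JVM running inside a container.
final class ContainerBudget {
    // Estimated memory the JVM uses outside the Java heap.
    static long nonHeapEstimate(long metaspaceBytes, int threadCount,
                                long stackBytesPerThread, long directBytes) {
        return metaspaceBytes + (long) threadCount * stackBytesPerThread + directBytes;
    }

    // Bytes left under the container limit; a negative value means the limit is exceeded.
    static long headroom(long containerLimitBytes, long maxHeapBytes, long nonHeapBytes) {
        return containerLimitBytes - maxHeapBytes - nonHeapBytes;
    }

    public static void main(String[] args) {
        // The heap ceiling this JVM actually resolved from -Xmx or MaxRAMPercentage.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("effective max heap bytes: " + maxHeap);
    }
}
```

With the illustrative numbers, a heap of 75% of 2 GiB (1536 MiB) leaves about 42 MiB of headroom, while a heap equal to the full 2 GiB limit overshoots by 470 MiB, which is the situation that ends in an exit-137 kill.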

  6. Fix, then verify under load

    Patch the leak (bound the cache, unregister the listener, clear the ThreadLocal) or resize deliberately — then rerun the same load and confirm the post-GC floor stays flat. An OOM fix isn't done until the memory trend proves it.
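One of the fixes named here, clearing a ThreadLocal, usually comes down to a try/finally block. The sketch below is a minimal illustration with hypothetical names (RequestContext, handle); the point is only that remove() runs even when the work throws, so a pooled thread does not keep the value alive after the request ends.

```java
// Hypothetical request handler: scopes a ThreadLocal value to a single call.
final class RequestContext {
    private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    // Uses a per-thread scratch buffer and always clears it afterwards.
    static int handle(int bufferSize) {
        BUFFER.set(new byte[bufferSize]);
        try {
            // Real work would read and write the buffer here.
            return BUFFER.get().length;
        } finally {
            // Without this, a pooled thread would pin the buffer indefinitely.
            BUFFER.remove();
        }
    }

    // True when the calling thread holds no leftover value.
    static boolean isCleared() {
        return BUFFER.get() == null;
    }
}
```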

How to prevent OutOfMemoryError

  • Always run with -XX:+HeapDumpOnOutOfMemoryError in production: the flag costs nothing until an OOM occurs, and on that day the dump saves the investigation.
  • Monitor heap usage and GC time continuously, and alert on a rising post-GC floor — that's the leak announcing itself weeks early.
  • Bound every cache (size and TTL) and prefer weak/soft references or dedicated cache libraries over bare static maps.
  • Size the JVM relative to its container with MaxRAMPercentage and leave headroom for Metaspace, threads, and direct buffers.
  • Load-test memory behavior with production-sized data before launch — most "sudden" OOMs are workloads nobody ever simulated.
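For the monitoring bullet, a post-GC heap figure can be sampled in-process from the standard memory pool beans. The sketch below sums getCollectionUsage() across heap pools; the class name is hypothetical, and because each pool reports usage as of its own most recent collection, the total is an approximation of the floor rather than an exact measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: approximates heap usage as seen right after garbage collection.
final class PostGcFloor {
    // Sums, over all heap pools, the bytes in use after each pool's latest collection.
    static long postGcHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            // Null when the pool does not support collection usage.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) total += afterGc.getUsed();
        }
        return total;
    }

    public static void main(String[] args) {
        System.gc(); // only a request; used here so the sample has something to show
        System.out.println("approximate post-GC heap bytes: " + postGcHeapBytes());
    }
}
```

Exporting this number on a schedule gives a monitoring system the series it needs to alert on a rising floor.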

How AllStak helps with Java memory problems

AllStak's infrastructure monitoring tracks memory usage on every host and container over time, so the slow climb that precedes an OutOfMemoryError is visible as a trend — and alertable — long before the crash. When the process dies, the memory chart at the moment of failure tells you immediately whether you're looking at a gradual leak or a sudden spike.

Error tracking captures the OutOfMemoryError events your application manages to report, with release tags that show which deploy the growth started under, and centralized logs hold the GC lines and kernel oom-kill messages from the same timeline. AllStak won't analyze your heap dump for you — that's a job for MAT — but it tells you when to take one and which process to take it from.

Java OutOfMemoryError — frequently asked questions

Does restarting fix an OutOfMemoryError?

It clears the symptom, not the cause. If the heap is undersized, the next equivalent load reproduces it; if there's a leak, the post-GC floor starts climbing again immediately and the crash returns on a schedule. Restart to restore service, but capture a heap dump and the GC trend before the evidence is gone.

What's the difference between OutOfMemoryError and OOMKilled?

OutOfMemoryError is the JVM failing to allocate within its own configured limits, and you get a Java stack trace. OOMKilled (exit 137) is the Linux kernel killing the entire process because total container memory exceeded the cgroup limit, with no Java error at all. The first is fixed with JVM flags and code changes; the second by aligning container limits with the total JVM footprint.

Will increasing -Xmx solve it?

Only if the problem is genuine undersizing — a flat post-GC floor with load peaks touching the ceiling. Against a leak, a bigger heap just delays the crash and lengthens GC pauses on the way there. And for Metaspace, native-thread, or OOMKilled failures, -Xmx is the wrong knob entirely.

What does "GC overhead limit exceeded" mean?

The JVM detected it was spending the vast majority of time (by default over 98%) in garbage collection while recovering a tiny fraction of heap (under 2%) — so instead of grinding on, it fails fast. Treat it exactly like "Java heap space": the heap is effectively full, from either a leak or undersizing.
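A rough view of GC time share is available from the standard GarbageCollectorMXBean counters. The sketch below computes a lifetime average, which is not the same measurement the JVM uses for its own overhead check, and with concurrent collectors the reported time may not all be pause time. Class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: share of JVM uptime attributed to garbage collection.
final class GcTimeShare {
    // Total accumulated collection time across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) total += millis;
        }
        return total;
    }

    // Fraction between 0 and 1; returns 0 when uptime is not positive.
    static double fraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) return 0.0;
        return Math.min(1.0, Math.max(0L, gcMillis) / (double) uptimeMillis);
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("GC share of uptime: " + fraction(totalGcMillis(), uptime));
    }
}
```

A value creeping toward 1.0 over recent intervals is the same condition this error reports, seen before the JVM gives up.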

See the memory climb before the crash

AllStak charts memory on every host and container and alerts on the trend, so a leak becomes a graph you act on — not a 3 a.m. outage.