When a Java service hits OutOfMemoryError, the fastest mistake is to treat every case like a generic heap problem. Java memory incidents are often less about “memory is full” and more about which memory area is under pressure and why that pressure built up.
The short version: capture the exact OutOfMemoryError variant first. Heap pressure, metaspace growth, direct memory exhaustion, and large queued backlogs do not point to the same fix path.
If you want the wider Java routing view first, step back to the Java Troubleshooting Guide.
Start with the exact error shape
Different OutOfMemoryError messages mean different bottlenecks.
For example:
- Java heap space
- GC overhead limit exceeded
- Metaspace
- Direct buffer memory
These do not imply the same root cause, and they should not trigger the same response.
That is why the first job is not “increase heap” but “identify which memory area is failing.”
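As a sketch, that variant-to-area mapping can be made explicit in a crash handler or log parser. The `OomRouter` class and its `area` helper are illustrative names of my own; the message strings are HotSpot's standard variants:

```java
// Sketch: route on the exact OutOfMemoryError message instead of treating every OOM alike.
// The "area" labels are this example's own convention, not a JVM concept.
public class OomRouter {
    static String area(String oomMessage) {
        if (oomMessage == null) return "unknown";
        if (oomMessage.contains("Java heap space")) return "heap";
        if (oomMessage.contains("GC overhead limit exceeded")) return "heap (GC thrash)";
        if (oomMessage.contains("Metaspace")) return "metaspace";
        if (oomMessage.contains("Direct buffer memory")) return "native/direct";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(area("Java heap space")); // prints "heap"
        System.out.println(area("Metaspace"));       // prints "metaspace"
    }
}
```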
What OOM incidents often look like in production
Before the crash or forced restart, you may see:
- queue backlog continuing to grow
- GC activity increasing sharply
- latency getting worse before the process dies
- container memory limits reached even when heap sizing looks reasonable
- deployment or traffic changes exposing old retention assumptions
An OOM is usually the end of a story that started earlier with retention, backlog, class loading, or native pressure.
Common causes
1. Heap retained by application objects
This is the most familiar pattern.
Large objects or too many retained references can slowly fill the heap:
- large collections
- caches without clear bounds
- in-memory queues
- request payload retention
- response aggregation buffers
There may be no leak in the strict sense, but if objects live much longer than expected, the JVM can still fail under real traffic.
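A minimal sketch of the retention pattern, with hypothetical names (`RequestLog`, `record`): a static collection that outlives every request that fills it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the retention pattern: a "temporary" collection that outlives the
// requests that filled it. The names here are illustrative.
public class RequestLog {
    // static => lives for the whole JVM; every entry is retained until the process dies
    private static final List<byte[]> recentPayloads = new ArrayList<>();

    static void record(byte[] payload) {
        recentPayloads.add(payload); // no bound, no eviction: grows with traffic
    }

    static int retained() {
        return recentPayloads.size();
    }
}
```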
2. Metaspace growth
Not every OOM is about ordinary objects.
Heavy class loading, dynamic proxies, bytecode generation, or repeated classloader churn can push metaspace much higher than expected.
This is especially relevant in systems with:
- plugin loading
- dynamic frameworks
- repeated redeploy patterns
- custom classloader usage
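One way to watch metaspace before it fails, sketched here with the standard `java.lang.management` beans (the pool names are HotSpot-specific and may differ on other JVMs):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Sketch: read metaspace usage from the platform management beans, so growth
// can be watched long before it becomes an OutOfMemoryError.
public class MetaspaceCheck {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Metaspace")) {
                long used = pool.getUsage().getUsed();
                System.out.println(pool.getName() + " used: " + (used / 1024) + " KiB");
            }
        }
    }
}
```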
3. Direct or native memory pressure
Some incidents happen outside the normal heap story.
Examples include:
- direct byte buffers
- JNI allocations
- off-heap caches
- process memory overhead in containers
In those cases, heap metrics may look acceptable while the process still fails.
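A minimal illustration of why heap charts can mislead here: direct buffers are allocated outside the heap and counted against `-XX:MaxDirectMemorySize`, not against heap sizing.

```java
import java.nio.ByteBuffer;

// Sketch: this 1 MiB allocation never shows up as heap usage, because direct
// buffers live in native memory outside the Java heap.
public class DirectBufferDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024); // 1 MiB off-heap
        System.out.println("direct=" + buf.isDirect() + " capacity=" + buf.capacity());
    }
}
```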
4. Queue backlog and retained work inflate memory
This is often missed.
If thread pools, messaging buffers, or request backlogs keep growing, the queued work itself can retain many objects at once.
That means the root problem may be throughput collapse rather than a classic object leak.
5. Capacity assumptions are wrong
Traffic, payload size, tenant count, or data shape may have outgrown the original JVM sizing and queue design.
Sometimes the code did not change much, but the workload did.
A practical debugging order
1. Capture the exact OutOfMemoryError variant
Do not summarize it as just “OOM.”
The exact message narrows the search space immediately.
2. Identify whether pressure is heap, metaspace, or native
This single distinction prevents many wasted hours.
Heap tuning will not solve direct buffer exhaustion. Metaspace fixes will not solve queue retention.
3. Inspect caches, collections, queues, and payload-heavy paths
Look for the places where the application can retain far more data than expected.
Ask:
- what grows with traffic?
- what grows with retries or backlog?
- what has no clear upper bound?
4. Compare recent traffic and deployment changes
The incident may follow:
- a new feature path
- larger request payloads
- more concurrent work
- changed cache behavior
- classloading differences
OOM incidents often make more sense when viewed as a workload shift.
5. Change JVM sizing only after the pressure source is clear
More memory can buy time, but it should not replace diagnosis.
If the pressure source is retention or runaway backlog, a larger heap may only delay the same failure.
Example: queue backlog causing heap pressure
```java
ExecutorService pool = Executors.newFixedThreadPool(8);
// newFixedThreadPool backs the pool with an unbounded LinkedBlockingQueue,
// so every submitted task is retained until a worker gets to it.
for (Task t : tasks) {
    pool.submit(() -> process(t));
}
```
If process(t) slows down and incoming work keeps arriving, queued tasks may retain payload objects, references, and closures long enough to turn a throughput incident into a heap incident.
That is why some OOMs are really backlog problems wearing a memory-shaped mask.
A useful JVM option
```shell
java -XX:+HeapDumpOnOutOfMemoryError -jar app.jar
```
Capturing a heap dump on the first OOM gives you a concrete object graph instead of guessing after the process is gone.
If storage and policy allow it, this is often one of the highest-value safeguards you can enable.
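If you want a dump before the JVM dies rather than at the moment it does, the same data can be captured on demand through HotSpot's diagnostic bean. This is a HotSpot-specific API, and the `HeapDumper` wrapper is illustrative:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Sketch: take a heap dump on demand via HotSpot's diagnostic MXBean.
// Recent JDKs require the output path to end in ".hprof".
public class HeapDumper {
    public static void dump(String path) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, true); // true = dump only live (reachable) objects
        } catch (IOException e) {
            throw new RuntimeException("heap dump failed: " + path, e);
        }
    }
}
```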
What to change after you find the pressure source
If heap retention is the issue
Reduce retention, bound caches and queues, and remove long-lived references.
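As one sketch of bounding a cache, `LinkedHashMap` can evict in access order. The `BoundedCache` name and the 10,000-entry limit are placeholders, not recommendations:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: an LRU-ish cache with a hard size bound, so it cannot fill the heap.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 10_000; // placeholder bound

    public BoundedCache() {
        super(16, 0.75f, true); // access-order, so eviction approximates LRU
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES; // evict instead of retaining forever
    }
}
```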
If metaspace is the issue
Inspect classloader behavior, dynamic code generation, and redeploy patterns.
If native or direct memory is the issue
Trace off-heap usage and container memory assumptions instead of focusing only on heap charts.
If backlog is the issue
Treat queue growth and throughput collapse as the primary incident, not a secondary detail.
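One sketch of that stance in code: give the pool a bounded queue and an explicit rejection policy, so backlog growth becomes visible back-pressure instead of silent retention. The sizes here are placeholders:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: bounded queue + explicit rejection policy instead of an unbounded backlog.
public class BoundedPool {
    public static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                8, 8,                                       // fixed pool of 8 workers
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1_000),            // backlog cannot grow unchecked
                new ThreadPoolExecutor.CallerRunsPolicy()); // push back on the submitter
    }
}
```

`CallerRunsPolicy` slows the producer down when the queue is full; other policies (abort, discard) make different trade-offs, but all of them are better than discovering the backlog in a heap dump.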
If the workload simply outgrew sizing
Resize intentionally, but only after you understand the path consuming memory.
A useful incident question
Ask this:
Which memory area actually failed, and what was growing there before the JVM died?
That question is much more actionable than “Should we raise heap?”
FAQ
Q. Should I just increase heap size first?
Not before you know which memory area is actually failing.
Q. Can thread pools cause memory pressure too?
Yes. Large backlogs and queued work can retain many objects at once.
Q. What is the fastest first step?
Capture the exact error variant and map it to the affected memory area.
Q. Is every OOM a memory leak?
No. Some are classic leaks, but others come from backlog, larger workloads, or configuration that no longer matches reality.
Read Next
- If queued work and backlog look more suspicious than pure retention, open Java Thread Pool Queue Keeps Growing next.
- If metaspace looks like the dominant issue, compare with Java Metaspace Usage High.
- If the incident also showed heavy GC behavior, continue with Java GC Pauses Too Long.
- If you want the wider Java routing view first, go back to the Java Troubleshooting Guide.
Sources:
- https://docs.oracle.com/javase/8/docs/api/java/lang/OutOfMemoryError.html
- https://docs.oracle.com/en/java/javase/21/troubleshoot/