Java Troubleshooting Guide: Where to Start With OOM, Thread Pools, and Runtime Pressure
When a Java service starts failing under load, the hardest part is usually not finding a tuning flag. It is deciding whether the incident is truly memory pressure, queue backlog, thread contention, or a runtime that is already overloaded before the crash appears.
The short version: start with the visible bottleneck, not with a favorite JVM fix. Then move into the narrowest guide that matches what you can actually observe.
Use this page as the Java troubleshooting hub on this blog.
Start with the first visible bottleneck
Useful first questions:
- are you seeing OutOfMemoryError?
- is a thread pool queue growing continuously?
- are GC pauses suddenly much longer?
- does CPU stay hot while throughput falls?
- do thread dumps show blocked progress or deadlock?
- is the service saturating before it fully crashes?
That framing keeps you from raising heap when backlog is the real issue, or tuning executors when the process is actually failing due to memory-area pressure.
A quick JVM triage command set
When you need a fast first look, these commands often help choose the next branch:
jcmd <pid> GC.heap_info
jcmd <pid> Thread.print
jcmd <pid> VM.native_memory summary
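You are not trying to solve the whole incident from one command. You are trying to decide whether the strongest signal is memory pressure, queue growth, hot threads, or blocked progress. Note that VM.native_memory only reports data when the JVM was started with -XX:NativeMemoryTracking=summary (or detail).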
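If attaching jcmd is not convenient, a rough in-process look at the same signals is possible through the standard java.lang.management beans. The following is a minimal sketch, not a replacement for the commands above; the class name TriageSnapshot and its method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: the class and method names are illustrative only.
final class TriageSnapshot {

    /** Heap used as a fraction of max heap, or -1 when the max is undefined. */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    /** Counts live threads by state, similar to tallying a thread dump by hand. */
    static Map<Thread.State, Integer> threadStateCounts() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.printf("heap used fraction: %.2f%n", heapUsedFraction());
        System.out.println("thread states: " + threadStateCounts());
    }
}
```

A high heap fraction points toward the memory branch, while a large BLOCKED or WAITING count points toward backlog, contention, or deadlock. This sketch does not cover native memory, which still needs the jcmd route.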
A simple Java triage map
Use this as the fastest first split:
- OOM variants, retained objects, memory-area pressure: start with OOM
- queue growth, busy workers, saturation: start with thread-pool backlog
- pause spikes, allocation churn, retained heap: start with GC pauses
- hot threads, retry loops, contention, wasted work: start with JVM CPU
- blocked locks, waiting cycles, no forward progress: start with deadlock
This map is intentionally simple. It is better to pick one strong branch than to half-debug three categories at once.
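For teams that keep runbooks in code, the map above can be written down as a plain lookup. This is only an illustrative sketch: the enum constants and the TriageMap class are invented names, and the guide titles are the ones used on this page.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical runbook lookup: symptom category -> guide to start with.
final class TriageMap {

    enum Symptom {
        OOM_OR_MEMORY_AREA,   // OOM variants, retained objects, memory-area pressure
        QUEUE_GROWTH,         // queue growth, busy workers, saturation
        PAUSE_SPIKES,         // pause spikes, allocation churn, retained heap
        HOT_THREADS,          // hot threads, retry loops, contention, wasted work
        NO_FORWARD_PROGRESS   // blocked locks, waiting cycles
    }

    private static final Map<Symptom, String> START_WITH = new EnumMap<>(Symptom.class);

    static {
        START_WITH.put(Symptom.OOM_OR_MEMORY_AREA, "Java OutOfMemoryError");
        START_WITH.put(Symptom.QUEUE_GROWTH, "Java Thread Pool Queue Keeps Growing");
        START_WITH.put(Symptom.PAUSE_SPIKES, "Java GC Pauses Too Long");
        START_WITH.put(Symptom.HOT_THREADS, "Java JVM CPU High");
        START_WITH.put(Symptom.NO_FORWARD_PROGRESS, "Java Thread Deadlock");
    }

    /** Returns the guide to open first for the single strongest symptom. */
    static String startWith(Symptom strongest) {
        return START_WITH.get(strongest);
    }

    public static void main(String[] args) {
        System.out.println(startWith(Symptom.QUEUE_GROWTH));
    }
}
```

The lookup deliberately takes one symptom rather than a set, which mirrors the advice to commit to a single strong branch.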
When the problem is probably memory pressure
Start with Java OutOfMemoryError when the symptom looks like:
- heap, metaspace, or native pressure is visible
- the exact OOM variant matters
- large collections, caches, or retained payloads look suspicious
- class metadata growth seems abnormal
If the retained-object story is still unclear, continue with Java Heap Dump.
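To see which memory area is actually under pressure, one option is to list the JVM memory pools from inside the process. The sketch below assumes a standard JVM that exposes MemoryPoolMXBean instances; pool names vary by collector, and the MemoryAreaReport class is a made-up name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints one line per JVM memory pool.
final class MemoryAreaReport {

    private static final long MIB = 1024L * 1024L;

    /** One formatted line per pool: type, name, used MiB, and max (or "unbounded"). */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            String type = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            String max = usage.getMax() < 0 ? "unbounded" : (usage.getMax() / MIB) + " MiB";
            lines.add(String.format("%-8s %-32s used=%d MiB max=%s",
                    type, pool.getName(), usage.getUsed() / MIB, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Steady growth in a non-heap pool such as Metaspace suggests a class metadata problem rather than a heap sizing problem. Native allocations outside these pools will not show up here.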
When the problem is probably queue backlog or executor saturation
Start with Java Thread Pool Queue Keeps Growing when the symptom looks like:
- executor queues keep growing
- workers stay busy or blocked
- throughput falls before memory becomes the loudest signal
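These symptoms can be sampled directly when the service uses a java.util.concurrent.ThreadPoolExecutor. The following is a minimal sketch; ExecutorSnapshot and its saturated() rule are illustrative assumptions, and the counts returned by the executor are approximate under concurrent load.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a point-in-time view of one executor.
final class ExecutorSnapshot {
    final int queued;
    final int active;
    final int poolSize;
    final int maxPoolSize;

    private ExecutorSnapshot(int queued, int active, int poolSize, int maxPoolSize) {
        this.queued = queued;
        this.active = active;
        this.poolSize = poolSize;
        this.maxPoolSize = maxPoolSize;
    }

    static ExecutorSnapshot of(ThreadPoolExecutor executor) {
        return new ExecutorSnapshot(
                executor.getQueue().size(),
                executor.getActiveCount(),
                executor.getPoolSize(),
                executor.getMaximumPoolSize());
    }

    /** Assumed rule of thumb: every allowed worker is busy and work is still waiting. */
    boolean saturated() {
        return active >= maxPoolSize && queued > 0;
    }

    @Override
    public String toString() {
        return "queued=" + queued + " active=" + active
                + " pool=" + poolSize + "/" + maxPoolSize + " saturated=" + saturated();
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(200); // stand-in for slow work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println(ExecutorSnapshot.of(pool));
        pool.shutdown();
    }
}
```

Logging a snapshot like this every few seconds shows whether the queue is draining or only growing, which is the distinction this branch depends on.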
For adjacent branches, continue with:
When the problem is probably long GC pauses or retained heap
Start with Java GC Pauses Too Long when the symptom looks like:
- pause spikes are easier to see than a clean crash
- allocation churn changed after traffic or payload shifts
- old generation growth looks suspicious
If you need retained-object evidence rather than pause symptoms, move next to Java Heap Dump.
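When GC logs are not at hand, the collector beans give a coarse view of collection activity. The sketch below is an approximation under stated assumptions: GcPauseSample is a made-up name, and the reported collection time is accumulated elapsed time, which is not the same as stop-the-world pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: totals across all registered collectors.
final class GcPauseSample {

    /** Sum of collection counts; collectors reporting -1 (undefined) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Sum of approximate accumulated collection time in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** Average milliseconds per collection between two samples; 0 when nothing ran. */
    static double averageMillis(long countDelta, long millisDelta) {
        return countDelta <= 0 ? 0.0 : (double) millisDelta / countDelta;
    }

    public static void main(String[] args) {
        System.out.println("collections=" + totalCollections()
                + " totalMillis=" + totalCollectionMillis());
    }
}
```

Taking two samples a minute apart and passing the differences to averageMillis gives a trend line. For exact pause durations, GC logging remains the better source.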
When the problem is probably hot CPU or contention
Start with Java JVM CPU High when the symptom looks like:
- CPU remains hot while throughput gets worse
- hot threads matter more than raw host metrics
- retries, spinning, contention, or wasted work are likely
If lock contention is especially visible, continue with Java Thread Contention High.
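To check whether a few threads account for most of the CPU, per-thread CPU time can be read from ThreadMXBean. This is a minimal sketch that assumes the JVM supports and has enabled thread CPU time measurement; HotThreads is an illustrative name, and the values are cumulative since each thread started.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks live threads by cumulative CPU time.
final class HotThreads {

    /** Returns up to `limit` lines of "threadName cpuMillis", highest CPU first. */
    static List<String> topByCpu(int limit) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return List.of(); // measurement unavailable on this JVM
        }
        List<Map.Entry<String, Long>> samples = new ArrayList<>();
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread is gone
            ThreadInfo info = threads.getThreadInfo(id);  // null if the thread is gone
            if (cpuNanos < 0 || info == null) {
                continue;
            }
            samples.add(Map.entry(info.getThreadName(), cpuNanos));
        }
        samples.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, samples.size()); i++) {
            Map.Entry<String, Long> s = samples.get(i);
            lines.add(s.getKey() + " " + (s.getValue() / 1_000_000) + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        topByCpu(5).forEach(System.out::println);
    }
}
```

Because the totals are cumulative, comparing two samples taken a few seconds apart is more informative than a single reading. The thread names can then be matched against a thread dump.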
When the problem is probably stalled progress
Start with Java Thread Deadlock when the service looks stuck rather than simply slow.
This branch fits when:
- thread dumps show waiting cycles
- lock ownership matters more than throughput metrics
- the service stops making forward progress
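Waiting cycles can also be detected programmatically with ThreadMXBean, which is handy for a health endpoint or a scheduled check. The following is a hedged sketch; DeadlockCheck is a made-up name, and the fallback to monitor-only detection is an assumption about JVMs that do not support ownable synchronizer tracking.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: describes threads the JVM reports as deadlocked.
final class DeadlockCheck {

    /** Returns one line per deadlocked thread, or an empty list when none are found. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()          // monitors and ownable synchronizers
                : threads.findMonitorDeadlockedThreads();  // object monitors only
        if (ids == null) {
            return List.of(); // both methods return null when no deadlock exists
        }
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread exited after detection
            }
            lines.add(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> found = describeDeadlocks();
        System.out.println(found.isEmpty() ? "no deadlock detected" : String.join("\n", found));
    }
}
```

An empty result does not prove the service is healthy. Threads stuck waiting on an external resource or a never-completed future will not be reported as deadlocked.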
A practical incident order
When the right branch is not obvious, use this order:
- identify the first visible bottleneck
- decide whether the system is crashing, stalling, or saturating
- separate memory-area pressure from backlog pressure
- separate hot-CPU contention from blocked-progress deadlock
- move into the narrowest guide that matches the strongest symptom
This reduces the chance of tuning the wrong layer first.
FAQ
Q. Is this a JVM tuning guide?
No. It is a symptom-first routing guide.
Q. What if queue growth and memory pressure happen together?
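Start with the symptom that appeared first, then check the other guide right after. A queue that keeps growing also retains its pending tasks in memory, so backlog can be the cause of the memory pressure rather than a separate problem.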
Q. What if GC pauses got worse but OOM never happened?
That is still a valid memory-pressure branch. Start with GC pauses, then move to heap-dump analysis if needed.
Q. When should I check deadlock before CPU?
When forward progress stops and thread dumps point to waiting cycles rather than simple hot loops.
Read Next
- If memory pressure is the clearest symptom, continue with Java OutOfMemoryError.
- If queue backlog is easier to see than memory pressure, continue with Java Thread Pool Queue Keeps Growing.
- If pause spikes are more visible than crashes, continue with Java GC Pauses Too Long.
- If hot threads or contention dominate the incident, continue with Java JVM CPU High.
Related Posts
- Java OutOfMemoryError
- Java Thread Pool Queue Keeps Growing
- Java GC Pauses Too Long
- Java Heap Dump
- Java JVM CPU High
- Java Thread Contention High
- Java Thread Deadlock
- Java Connection Pool Exhausted