Java Troubleshooting Guide: Where to Start With OOM, Thread Pools, and Runtime Pressure



When a Java service starts failing under load, the hardest part is usually not finding a tuning flag. It is deciding whether the incident is truly memory pressure, queue backlog, thread contention, or a runtime that is already overloaded before the crash appears.

The short version: start with the visible bottleneck, not with a favorite JVM fix. Then move into the narrowest guide that matches what you can actually observe.

Use this page as the Java troubleshooting hub on this blog.


Start with the first visible bottleneck

Useful first questions:

  • are you seeing OutOfMemoryError?
  • is a thread pool queue growing continuously?
  • are GC pauses suddenly much longer?
  • does CPU stay hot while throughput falls?
  • do thread dumps show blocked progress or deadlock?
  • is the service saturating before it fully crashes?

That framing keeps you from raising the heap size when backlog is the real issue, or tuning executors when the process is actually failing because of memory-area pressure.

A quick JVM triage command set

When you need a fast first look, these commands often help choose the next branch:

jcmd <pid> GC.heap_info
jcmd <pid> Thread.print
jcmd <pid> VM.native_memory summary

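You are not trying to solve the whole incident from one command. You are trying to decide whether the strongest signal is memory pressure, queue growth, hot threads, or blocked progress. Note that VM.native_memory only reports data when the JVM was started with Native Memory Tracking enabled (for example -XX:NativeMemoryTracking=summary).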
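When attaching jcmd is not an option, a rough in-process equivalent of the first two commands can be sketched with the standard java.lang.management beans. This is a minimal sketch, not a replacement for jcmd; the class name TriageSnapshot and its method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: a coarse first look at heap and thread state from inside the JVM.
class TriageSnapshot {

    /** Heap usage as a fraction of the maximum, or -1 when the maximum is undefined. */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    /** Counts live threads by state, similar in spirit to skimming a thread dump. */
    static Map<Thread.State, Integer> threadStateCounts() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip locked-monitor and synchronizer details to keep the call cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.printf("heap used fraction: %.2f%n", heapUsedFraction());
        System.out.println("thread states: " + threadStateCounts());
    }
}
```

A high heap fraction points toward the memory branch, while a large BLOCKED or WAITING count points toward backlog or stalled progress. Neither number is a diagnosis on its own.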

A simple Java triage map

Use this as the fastest first split:

  • OOM variants, retained objects, memory-area pressure: start with OOM
  • queue growth, busy workers, saturation: start with thread-pool backlog
  • pause spikes, allocation churn, retained heap: start with GC pauses
  • hot threads, retry loops, contention, wasted work: start with JVM CPU
  • blocked locks, waiting cycles, no forward progress: start with deadlock

This map is intentionally simple. It is better to pick one strong branch than to half-debug three categories at once.
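The map can also be written down as a small lookup, which makes the "pick one strong branch" rule explicit. The sketch below is illustrative only: the enum names are invented from the bullets above, and breaking ties by declaration order is an assumption, not something this guide prescribes.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical encoding of the triage map; all names are made up for illustration.
class TriageMap {

    /** Observable signals, one per phrase in the map above. */
    enum Signal {
        OOM_VARIANT, RETAINED_OBJECTS, MEMORY_AREA_PRESSURE,
        QUEUE_GROWTH, BUSY_WORKERS, SATURATION,
        PAUSE_SPIKES, ALLOCATION_CHURN, RETAINED_HEAP,
        HOT_THREADS, RETRY_LOOPS, CONTENTION, WASTED_WORK,
        BLOCKED_LOCKS, WAITING_CYCLES, NO_FORWARD_PROGRESS
    }

    enum Branch {
        OOM(EnumSet.of(Signal.OOM_VARIANT, Signal.RETAINED_OBJECTS, Signal.MEMORY_AREA_PRESSURE)),
        THREAD_POOL_BACKLOG(EnumSet.of(Signal.QUEUE_GROWTH, Signal.BUSY_WORKERS, Signal.SATURATION)),
        GC_PAUSES(EnumSet.of(Signal.PAUSE_SPIKES, Signal.ALLOCATION_CHURN, Signal.RETAINED_HEAP)),
        JVM_CPU(EnumSet.of(Signal.HOT_THREADS, Signal.RETRY_LOOPS, Signal.CONTENTION, Signal.WASTED_WORK)),
        DEADLOCK(EnumSet.of(Signal.BLOCKED_LOCKS, Signal.WAITING_CYCLES, Signal.NO_FORWARD_PROGRESS));

        private final Set<Signal> signals;

        Branch(Set<Signal> signals) {
            this.signals = signals;
        }
    }

    /** Returns the branch matching the most observed signals; ties go to the earlier branch. */
    static Optional<Branch> strongestBranch(Set<Signal> observed) {
        Branch best = null;
        int bestHits = 0;
        for (Branch branch : Branch.values()) {
            int hits = 0;
            for (Signal signal : branch.signals) {
                if (observed.contains(signal)) {
                    hits++;
                }
            }
            if (hits > bestHits) { // strict comparison keeps the earlier branch on a tie
                best = branch;
                bestHits = hits;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Set<Signal> observed = EnumSet.of(Signal.QUEUE_GROWTH, Signal.BUSY_WORKERS, Signal.HOT_THREADS);
        System.out.println(strongestBranch(observed));
    }
}
```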

When the problem is probably memory pressure

Start with Java OutOfMemoryError when the symptom looks like:

  • heap, metaspace, or native pressure is visible
  • the exact OOM variant matters
  • large collections, caches, or retained payloads look suspicious
  • class metadata growth seems abnormal

If the retained-object story is still unclear, continue with Java Heap Dump.
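Because the exact memory area matters, it can help to list every memory pool the JVM exposes before deciding which OOM variant you are chasing. The following is a minimal sketch using the standard MemoryPoolMXBean API. Pool names depend on the collector in use, so the code prints whatever the JVM reports rather than assuming specific names; the class name MemoryAreas is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one line per memory pool, split by heap vs non-heap.
class MemoryAreas {

    private static final long MB = 1024L * 1024L;

    static List<String> poolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MB) + " MB";
            lines.add(String.format("%-8s %-32s used=%d MB max=%s",
                    pool.getType().name(), pool.getName(), usage.getUsed() / MB, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        poolLines().forEach(System.out::println);
    }
}
```

Lines tagged NON_HEAP cover areas such as class metadata and compiled code. Native allocations outside these pools do not appear here at all.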

When the problem is probably queue backlog or executor saturation

Start with Java Thread Pool Queue Keeps Growing when the symptom looks like:

  • executor queues keep growing
  • workers stay busy or blocked
  • throughput falls before memory becomes the loudest signal
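These symptoms can be read directly from a ThreadPoolExecutor. The sketch below samples queue depth and active workers, then demonstrates the reading on a deliberately blocked single-worker pool. The Sample class, the saturated() rule (all workers busy while work is queued), and demoBlockedPool are illustrative assumptions, not a standard definition.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical probe for executor backlog.
class BacklogProbe {

    static final class Sample {
        final int queued;
        final int active;
        final int maxPoolSize;

        Sample(int queued, int active, int maxPoolSize) {
            this.queued = queued;
            this.active = active;
            this.maxPoolSize = maxPoolSize;
        }

        /** Assumed rule of thumb: every worker is busy and work is still waiting. */
        boolean saturated() {
            return active >= maxPoolSize && queued > 0;
        }
    }

    static Sample sample(ThreadPoolExecutor pool) {
        // getActiveCount() is documented as approximate; treat it as a hint.
        return new Sample(pool.getQueue().size(), pool.getActiveCount(), pool.getMaximumPoolSize());
    }

    /** Builds a one-worker pool, blocks the worker, queues two more tasks, and samples it. */
    static Sample demoBlockedPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocked = () -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            for (int i = 0; i < 3; i++) {
                pool.execute(blocked);
            }
            started.await(); // the first task is running; the other two are queued
            return sample(pool);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let every task finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Sample s = demoBlockedPool();
        System.out.println("queued=" + s.queued + " active=" + s.active + " saturated=" + s.saturated());
    }
}
```

In a real service you would call sample() periodically and watch whether queued keeps rising across samples. A single reading only shows the current depth.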


When the problem is probably long GC pauses or retained heap

Start with Java GC Pauses Too Long when the symptom looks like:

  • pause spikes are easier to see than a clean crash
  • allocation churn changed after traffic or payload shifts
  • old generation growth looks suspicious

If you need retained-object evidence rather than pause symptoms, move next to Java Heap Dump.
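When GC logs are not at hand, the standard GarbageCollectorMXBean counters give a coarse view of whether collections are becoming more expensive. The sketch below sums collection count and accumulated time across collectors and computes an average between two samples. Note the caveat: getCollectionTime() reports approximate accumulated elapsed time, which for concurrent collectors is not the same as stop-the-world pause time. The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe: coarse GC cost between two samples.
class GcPauseProbe {

    /** Returns {total collections, total collection millis} across all collectors. */
    static long[] totals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, millis};
    }

    /** Average milliseconds per collection between two samples, or 0 if none ran. */
    static double avgMillisPerCollection(long[] before, long[] after) {
        long collections = after[0] - before[0];
        if (collections <= 0) {
            return 0.0;
        }
        return (double) (after[1] - before[1]) / collections;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] before = totals();
        Thread.sleep(1000); // sampling window; pick one that suits the incident
        long[] after = totals();
        System.out.printf("avg ms per collection in window: %.1f%n",
                avgMillisPerCollection(before, after));
    }
}
```

An average that climbs across successive windows is a reason to look at GC logs next, where the actual pause durations are recorded.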

When the problem is probably hot CPU or contention

Start with Java JVM CPU High when the symptom looks like:

  • CPU remains hot while throughput gets worse
  • hot threads matter more than raw host metrics
  • retries, spinning, contention, or wasted work are likely

If lock contention is especially visible, continue with Java Thread Contention High.
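To see which threads are hot rather than relying on host-level CPU, per-thread CPU time is available through ThreadMXBean on JVMs that support it. The sketch below ranks live threads by cumulative CPU time. It reports totals since each thread started, not a rate, so comparing two runs a few seconds apart is usually more informative. The class name HotThreads and its output format are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rank live threads by cumulative CPU time.
class HotThreads {

    static List<String> topByCpu(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        // Check support first; isThreadCpuTimeEnabled() throws when unsupported.
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return out;
        }
        List<long[]> rows = new ArrayList<>(); // each row is {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (nanos >= 0) {
                rows.add(new long[] {id, nanos});
            }
        }
        rows.sort((a, b) -> Long.compare(b[1], a[1])); // highest CPU time first
        for (long[] row : rows) {
            if (out.size() >= limit) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(row[0]);
            if (info == null) {
                continue; // thread exited between the two calls
            }
            out.add(String.format("%-40s %8d ms  %s",
                    info.getThreadName(), row[1] / 1_000_000, info.getThreadState()));
        }
        return out;
    }

    public static void main(String[] args) {
        topByCpu(5).forEach(System.out::println);
    }
}
```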

When the problem is probably stalled progress

Start with Java Thread Deadlock when the service looks stuck rather than simply slow.

This branch fits when:

  • thread dumps show waiting cycles
  • lock ownership matters more than throughput metrics
  • the service stops making forward progress
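Waiting cycles can be checked programmatically as well as by reading a thread dump. The sketch below uses ThreadMXBean to ask the JVM for deadlocked threads and reports who is waiting on what. It only detects true lock cycles; threads that are merely slow or waiting on an external resource will not appear. The class name DeadlockCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: describe any lock cycles the JVM can detect.
class DeadlockCheck {

    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() also covers java.util.concurrent locks, where supported;
        // otherwise fall back to monitor-only detection.
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        List<String> lines = new ArrayList<>();
        if (ids == null) {
            return lines; // null means no deadlock was found
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread exited after detection
            }
            lines.add(info.getThreadName() + " waits on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> deadlocks = describeDeadlocks();
        if (deadlocks.isEmpty()) {
            System.out.println("no deadlocked threads detected");
        } else {
            deadlocks.forEach(System.out::println);
        }
    }
}
```

An empty result with a service that still looks stuck suggests the stall is not a lock cycle, which is a useful reason to revisit the backlog or CPU branches.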

A practical incident order

When the right branch is not obvious, use this order:

  1. identify the first visible bottleneck
  2. decide whether the system is crashing, stalling, or saturating
  3. separate memory-area pressure from backlog pressure
  4. separate hot-CPU contention from blocked-progress deadlock
  5. move into the narrowest guide that matches the strongest symptom

This reduces the chance of tuning the wrong layer first.

FAQ

Q. Is this a JVM tuning guide?

No. It is a symptom-first routing guide.

Q. What if queue growth and memory pressure happen together?

Start with the symptom that appeared first, then check the guide for the other symptom immediately afterward.

Q. What if GC pauses got worse but OOM never happened?

That is still a valid memory-pressure branch. Start with GC pauses, then move to heap-dump analysis if needed.

Q. When should I check deadlock before CPU?

When forward progress stops and thread dumps point to waiting cycles rather than simple hot loops.
