Java Troubleshooting Guide: Where to Start With OOM, Thread Pools, and Runtime Pressure


The hardest part of a Java incident is usually not remembering one more JVM flag. It is deciding whether the visible bottleneck is memory pressure, queue backlog, hot-thread contention, or a runtime that was already saturating before the crash appeared.

This guide is the routing page for Java incidents on this blog. It helps you decide:

  • whether the visible problem belongs mostly to the OOM branch, the executor-backlog branch, the GC or CPU branches, or the deadlock branch
  • what to inspect in the first few minutes so you can choose the next article quickly
  • which false starts often waste time during Java troubleshooting

The short version: route by the strongest visible bottleneck first, not by the most familiar tuning habit.


When this hub is the right starting point

Use this page when:

  • OutOfMemoryError is visible
  • a thread-pool queue keeps growing
  • GC pauses suddenly become much longer
  • CPU stays hot while throughput falls
  • thread dumps show blocked progress or waiting cycles
  • the service looks saturated before it fully crashes

At that point, the most valuable thing is usually choosing the right branch, not applying a fix immediately.

A five-minute JVM triage pass

jcmd <pid> GC.heap_info
jcmd <pid> Thread.print
jcmd <pid> VM.native_memory summary

VM.native_memory summary only reports data when Native Memory Tracking (NMT) was enabled at JVM startup, for example with -XX:NativeMemoryTracking=summary. The point of this command set is not to solve the incident from one screen. The point is to decide which branch deserves the next thirty minutes.

  • strong heap or memory-area pressure: start with memory
  • blocked threads and waiting cycles: start with deadlock or contention
  • hot threads and saturated executors: start with CPU or backlog
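When jcmd is not available, for example inside a locked-down container, a rough in-process version of the same pass can be built from the standard java.lang.management API. The sketch below is an illustration, not a replacement for jcmd: the class name TriageSnapshot is hypothetical, and it only covers heap usage, thread states, and deadlock detection, not native memory.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical in-process triage snapshot built only on java.lang.management.
final class TriageSnapshot {

    // Fraction of the maximum heap currently in use, or -1 if the JVM reports no max.
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when undefined
        return max > 0 ? (double) heap.getUsed() / max : -1;
    }

    // Live threads grouped by state: many BLOCKED threads point toward contention
    // or deadlock, many RUNNABLE threads with falling throughput point toward CPU.
    static Map<Thread.State, Integer> threadStateCounts() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // a thread may exit between the two calls
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Threads caught in a monitor or ownable-synchronizer cycle.
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.printf("heap used fraction: %.2f%n", heapUsedFraction());
        System.out.println("thread states: " + threadStateCounts());
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Read the three numbers the same way as the jcmd output: they pick a branch, they do not diagnose it.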

Three questions to ask early

1. Is the system crashing, stalling, or saturating?

An OOM, a deadlock, and a saturated executor can all look like “the service is bad,” but the first guide should not be the same.

2. Is the dominant shape memory growth or queue growth?

Large memory use does not automatically mean memory is the root cause. Queue backlog often pushes memory upward as a secondary symptom.

3. Is the visible pain mostly pause time or contention?

Long GC pauses and hot CPU can both hurt throughput. The right first branch depends on which one leads the incident.

When the memory branch should lead

Start with the memory branch, usually by taking a heap dump to see what is retaining memory, when:

  • heap, metaspace, or native memory pressure is visible
  • the exact OOM variant matters (for example, Java heap space versus Metaspace)
  • large collections, caches, or retained payloads look suspicious
  • class-metadata growth looks abnormal

If the retained-object story is still unclear, continue immediately by analyzing a heap dump to inspect the reference chains that keep those objects alive.
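The usual way to capture that dump from outside is jcmd <pid> GC.heap_dump <file>. When the service itself should write one, for example from an internal admin endpoint, HotSpot-based JVMs expose com.sun.management.HotSpotDiagnosticMXBean. A minimal sketch, assuming a HotSpot-derived JDK; the HeapDumper wrapper is a hypothetical name:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that writes an .hprof file from inside a HotSpot-based JVM.
final class HeapDumper {

    static Path dumpHeap(Path target, boolean liveObjectsOnly) throws IOException {
        // dumpHeap rejects file names without the .hprof suffix; fail early with a clear message.
        Path name = target.getFileName();
        if (name == null || !name.toString().endsWith(".hprof")) {
            throw new IllegalArgumentException("heap dump file should end with .hprof");
        }
        // dumpHeap refuses to overwrite an existing file.
        Files.deleteIfExists(target);
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live=true restricts the dump to objects that are still reachable.
        diagnostics.dumpHeap(target.toAbsolutePath().toString(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("triage").resolve("app.hprof");
        System.out.println("heap dump written to " + dumpHeap(out, true));
    }
}
```

A live-only dump is smaller and focuses the reference-chain analysis on what is actually retained, at the cost of a collection before the dump is written.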

When the backlog or executor branch should lead

Start by checking your thread pool and queue settings when:

  • executor queues keep growing
  • workers stay busy or blocked
  • backlog stands out before memory becomes the loudest signal

This branch is really about explaining why accepted work is not becoming completed work quickly enough.
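One way to make "accepted work is not becoming completed work" measurable is to sample a ThreadPoolExecutor's own counters. The sketch below assumes the pool is a plain java.util.concurrent.ThreadPoolExecutor you can reach; BacklogProbe and Backlog are hypothetical names. The demo reproduces the classic silent-backlog shape: two workers stuck on a slow dependency and an unbounded queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical probe for the gap between accepted and completed executor work.
final class BacklogProbe {

    record Backlog(int queued, int active, long completed, long submitted) {}

    static Backlog sample(ThreadPoolExecutor pool) {
        return new Backlog(
                pool.getQueue().size(),       // accepted but not started yet
                pool.getActiveCount(),        // approximate number of busy workers
                pool.getCompletedTaskCount(), // approximate, never decreases
                pool.getTaskCount());         // approximate total ever scheduled
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        // Two workers and an unbounded queue: backlog can grow with no error at all.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // stands in for a slow downstream call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("while stuck: " + sample(pool)); // 8 tasks wait in the queue
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("after drain: " + sample(pool));
    }
}
```

A queue size that keeps rising while the completed count stalls is the signal that this branch should lead, even if heap usage is climbing at the same time.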

When the GC or retained-heap branch should lead

Start with Java GC Pauses Too Long when:

  • pause spikes are more visible than a clean crash
  • traffic or payload changes increased allocation churn
  • old-generation growth looks suspicious

If you need retained-object evidence rather than pause symptoms, move next to analyzing a heap dump to spot long-lived large objects.
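Before committing to the GC branch, it can help to check whether collection time is actually a large share of wall-clock time. A minimal sketch using GarbageCollectorMXBean; GcTimeShare is a hypothetical name, and getCollectionTime reports accumulated collection time per collector, which for concurrent collectors may include concurrent phases, so treat the result as a rough signal rather than an exact pause total.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical estimate of how much of a sampling window went to garbage collection.
final class GcTimeShare {

    // Accumulated collection time across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Share of the window (0.0 to roughly 1.0) spent in collection, by before/after deltas.
    static double gcShareOver(long windowMillis) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        Thread.sleep(windowMillis);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcDelta = totalGcMillis() - gcBefore;
        return elapsedMillis > 0 ? (double) gcDelta / elapsedMillis : 0.0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("GC share over 10s: %.3f%n", gcShareOver(10_000));
    }
}
```

A share that stays high across several windows supports starting with GC; a low share with falling throughput points back toward the CPU or backlog branches.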

When the CPU or contention branch should lead

Start by getting a thread dump to see where hot threads are spending their time when:

  • CPU remains hot while throughput gets worse
  • hot threads matter more than raw host-level metrics
  • retries, spinning, contention, or wasted work look likely

If lock contention is especially obvious, find the exact lock acquisition bottleneck and review your concurrency design.
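To connect "CPU stays hot" to specific threads before reading a full thread dump, per-thread CPU time can be sampled twice and ranked by the difference. A minimal sketch, assuming the JVM supports thread CPU time measurement (HotSpot on common platforms does); HotThreads and Hot are hypothetical names. Match the top names against a Thread.print dump to see what those threads were doing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ranking of live threads by CPU consumed during a sampling window.
final class HotThreads {

    record Hot(String name, long cpuNanos) {}

    static List<Hot> sample(long windowMillis, int limit) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not supported");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // -1 if the thread already died
        }
        Thread.sleep(windowMillis);
        List<Hot> ranked = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            long start = before.getOrDefault(id, 0L); // threads born mid-window start at 0
            long now = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (start < 0 || now < 0 || info == null) {
                continue; // thread exited during the window
            }
            ranked.add(new Hot(info.getThreadName(), now - start));
        }
        ranked.sort((a, b) -> Long.compare(b.cpuNanos(), a.cpuNanos()));
        return List.copyOf(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) throws InterruptedException {
        sample(1_000, 5).forEach(h ->
                System.out.printf("%-30s %8.1f ms%n", h.name(), h.cpuNanos() / 1e6));
    }
}
```

Sampling a delta instead of reading cumulative CPU time keeps long-lived but idle threads from crowding out the threads that are hot right now.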

When the deadlock or stalled-progress branch should lead

Start by looking for lock ordering inversions or deadlocks when the service looks stuck rather than simply slow.

This branch fits when:

  • thread dumps show waiting cycles
  • lock ownership matters more than throughput metrics
  • the service stops making real forward progress
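The JVM can confirm a waiting cycle directly: ThreadMXBean.findDeadlockedThreads returns the threads caught in a monitor or ownable-synchronizer cycle, or null when there is none. A minimal reporting sketch; DeadlockReport is a hypothetical name, and the output mirrors the "Found one Java-level deadlock" section that jcmd <pid> Thread.print prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical report of deadlocked threads: who waits on which lock, and who holds it.
final class DeadlockReport {

    static List<String> describe() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
        List<String> lines = new ArrayList<>();
        if (ids == null) {
            return lines;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue;
            }
            lines.add(info.getThreadName() + " waits on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> report = describe();
        System.out.println(report.isEmpty() ? "no deadlock detected" : String.join("\n", report));
    }
}
```

Each line names one edge of the cycle, which is the lock-ordering evidence this branch needs before anyone touches pool sizes.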

In that situation, unblocking progress matters more than tuning throughput.

Common wrong starts

Raising heap when backlog is the real driver

If queue growth is leading the incident, more heap may only hide the symptom longer.
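One way to keep backlog from hiding inside the heap is to cap the executor queue, so overload shows up as rejections that callers can count, shed, or retry. A minimal sketch using ThreadPoolExecutor with an ArrayBlockingQueue and the standard AbortPolicy; BoundedPool and trySubmit are hypothetical names, and the right capacity depends on your latency budget.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical pool whose backlog is capped, so overload becomes visible as rejections.
final class BoundedPool {

    static ThreadPoolExecutor create(int workers, int queueCapacity) {
        // AbortPolicy throws RejectedExecutionException once workers and queue are full.
        return new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Returns false instead of queueing indefinitely, so the caller decides what to shed.
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }
}
```

With two workers and a capacity of three, at most five slow tasks are accepted at once; the rest are refused immediately instead of accumulating as retained objects.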

Treating every throughput drop as a CPU problem

Long GC pauses can feel like hot-CPU incidents from the outside, even when the first branch should be memory and allocation behavior.

Tweaking executors while deadlock signals are already visible

If thread dumps show waiting cycles, pool-size changes can make the picture noisier instead of clearer.

A very short routing map

  • OOM or memory-area pressure is clearest: start with memory
  • queue growth is clearest: start with thread-pool backlog
  • pause spikes dominate: start with GC
  • hot threads or contention dominate: start with CPU
  • forward progress has nearly stopped: start with deadlock
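The map, together with the advice to compare one adjacent branch next, can be read as "rank branches by signal strength and take the top two." A hypothetical encoding, assuming each branch's strength is scored on whatever scale the team agrees on; IncidentRouter and its enum are illustrative names, not a prescribed tool.

```java
import java.util.List;
import java.util.Map;

// Hypothetical encoding of the routing map: strongest signal first, then one neighbor.
final class IncidentRouter {

    enum Branch { MEMORY, THREAD_POOL_BACKLOG, GC, CPU, DEADLOCK }

    // Returns the branch to read first, followed by the adjacent branch to compare next.
    static List<Branch> route(Map<Branch, Double> strength) {
        return strength.entrySet().stream()
                .sorted(Map.Entry.<Branch, Double>comparingByValue().reversed())
                .limit(2)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

For example, strong queue growth with moderate heap pressure routes to the thread-pool backlog guide first and the memory guide second.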

Then compare one adjacent branch immediately after. Java incidents often overlap across backlog, memory, and lock behavior.

A practical incident order

  1. Write down the first bottleneck that became obvious.
  2. Decide whether the system is crashing, stalling, or saturating.
  3. Separate memory-area pressure from backlog pressure.
  4. Separate hot-thread contention from blocked-progress deadlock.
  5. Read the narrowest guide that matches the strongest symptom, then compare one neighboring branch.

That order usually reduces the chance of tuning the wrong layer first.

FAQ

Q. Is this a JVM tuning guide?

No. It is a symptom-first troubleshooting hub.

Q. What if queue growth and memory pressure appear together?

Start with the symptom that became visible first, then compare the adjacent guide right after.

Q. What if GC pauses got worse but OOM never happened?

That is still a valid memory-pressure branch. Start with GC pauses, then move to heap-dump analysis if needed.

Q. When should I check deadlock before CPU?

When forward progress has mostly stopped and thread dumps point to waiting cycles rather than just hot loops.

Where to go next

  • If memory pressure is the clearest symptom, prioritize finding the OOM root cause via heap dumps.
  • If queue backlog is easiest to see or thread starvation is suspected, review the Java ForkJoinPool Starvation case study.
  • If pause spikes are more visible than crashes, continue with Java GC Pauses Too Long.
  • If hot threads or contention dominate, use thread dumps to isolate the exact bottleneck.
