Java JVM CPU High: What to Check First



When Java CPU usage goes high, the easiest mistake is to treat it as one generic scaling problem. In reality, high CPU can come from very different sources: real application work, garbage collection pressure, retry loops, lock contention, or threads that keep waking up without making progress.

The short version: start with hot threads, then compare them with GC activity from the same incident window. Host-level CPU tells you that pressure exists. Hot thread stacks tell you whether the CPU is being spent on useful work, memory cleanup, retries, or contention.

If you want the wider Java routing view first, step back to the Java Troubleshooting Guide.


Start with hot threads, not only host CPU

Machine-level CPU charts are helpful, but they do not tell you where the pressure is coming from.

The first practical split is:

  • CPU spent in application code
  • CPU spent in garbage collection
  • CPU wasted in retries, spins, or coordination

That distinction matters because each path leads to a different fix.


What high CPU usually looks like in production

This symptom often appears alongside:

  • request latency spikes
  • queue growth or backlog
  • high allocation rates
  • thread contention or blocked-worker side effects
  • burst traffic that never fully settles back down

Sometimes the service is simply busy with legitimate work. But just as often, the CPU rise is a side effect of wasted work or a bottleneck somewhere else.


Common causes

1. Busy request paths or tight loops

Application code may be doing far more work than expected after:

  • traffic growth
  • payload size changes
  • a new feature rollout
  • accidental quadratic behavior

If hot stacks point to request parsing, serialization, filtering, sorting, or repeated transformations, the CPU rise may be real useful work that grew beyond assumptions.
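
A classic accidental-quadratic pattern is membership checking with `List.contains` inside a loop. A minimal sketch (method and class names are illustrative) of the slow shape and its linear replacement:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupExample {
    // O(n^2): contains() scans the whole output list on every iteration
    static List<Integer> dedupQuadratic(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : input) {
            if (!out.contains(v)) {   // linear scan per element
                out.add(v);
            }
        }
        return out;
    }

    // O(n): HashSet lookup is constant time on average
    static List<Integer> dedupLinear(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        for (Integer v : input) {
            if (seen.add(v)) {        // add() returns false if already present
                out.add(v);
            }
        }
        return out;
    }
}
```

Both versions return the same result on small inputs, which is exactly why the quadratic one survives review and only shows up as CPU once payloads grow.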

2. GC overhead is consuming the CPU

Allocation churn can turn memory pressure into CPU pressure.

This often happens when:

  • large temporary objects are created rapidly
  • request fan-out creates many short-lived allocations
  • caches or queues retain more than expected
  • heap pressure forces frequent collections

In these cases, the application may look CPU-bound even though the deeper driver is memory behavior.
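
One common source of that churn is string building on a hot path. A sketch of the allocation-heavy shape and a lower-churn alternative (names are illustrative):

```java
public class AllocationChurn {
    // Allocation-heavy: each += creates a new String and copies the old one,
    // so a loop of n parts allocates O(n) intermediate strings
    static String joinChurny(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p + ",";
        }
        return result;
    }

    // Lower churn: one growable buffer reused for the whole loop
    static String joinLean(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }
}
```

The output is identical; only the number of temporary objects per call differs, which is what the GC ends up paying for.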

3. Lock contention, retries, or spin loops waste CPU

Not all CPU is productive.

Threads may repeatedly:

  • wake up and retry
  • poll a shared state
  • contend for the same lock
  • spin on availability checks

That can produce high CPU without proportional throughput.

4. Backlog pressure moves CPU to the wrong layer

If queues grow, workers saturate, or connection waits cascade, the system may spend more CPU on:

  • scheduling
  • retries
  • timeouts
  • queue management

The visible CPU spike then hides the real bottleneck.

5. Too many threads are fighting over shared state

Increasing thread count can sometimes worsen CPU behavior.

More threads may mean:

  • more context switching
  • more monitor contention
  • more failed acquisition attempts
  • more GC pressure from queued or duplicated work

This is why thread count is rarely the first setting to change.


A practical debugging order

1. Capture hot threads during the incident

Start with the threads actually consuming CPU.

Useful commands include:

top -H -p <pid>
jcmd <pid> Thread.print

Matching a hot OS thread to a Java stack is often the fastest way to move from “CPU is high” to an actual culprit: take the thread id from top -H, convert it to hex, and find the matching nid field in the thread dump.
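
If attaching external tools is awkward, a similar ranking can be approximated in-process with `ThreadMXBean`. A sketch, assuming thread CPU timing is supported and enabled on the JVM:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Comparator;

public class HotThreads {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isThreadCpuTimeSupported()) {
            System.out.println("thread CPU timing not supported on this JVM");
            return;
        }
        // Rank live threads by accumulated CPU time and print the top five
        Arrays.stream(bean.dumpAllThreads(false, false))
              .sorted(Comparator.comparingLong(
                      (ThreadInfo ti) -> bean.getThreadCpuTime(ti.getThreadId())).reversed())
              .limit(5)
              .forEach(ti -> System.out.printf("%d ms  %s%n",
                      bean.getThreadCpuTime(ti.getThreadId()) / 1_000_000,
                      ti.getThreadName()));
    }
}
```

This shows accumulated CPU per thread, not the stack that is burning it right now, so it complements a thread dump rather than replacing one.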

2. Compare CPU spikes with GC activity

Look at:

  • GC frequency
  • GC pause timing
  • allocation rate
  • old generation pressure

If CPU spikes line up with GC churn, heap behavior is likely part of the story.
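
Unified JVM logging (JDK 9+) can capture the GC side of that comparison with timestamps you can line up against the CPU chart. A typical invocation (the jar name is illustrative):

```
java -Xlog:gc*:file=gc.log:time,uptime -jar app.jar
```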

3. Check for retries, polling, and contention

Ask:

  • are there loops that keep checking state?
  • are timeouts triggering retry storms?
  • are many threads contending for one monitor?

If throughput does not rise with CPU, wasted work should move up the suspect list.

4. Compare queue growth and latency with CPU rise

If queue depth and latency rise before CPU does, then high CPU may be the effect of overload rather than the original cause.

This is especially common when:

  • executors are saturated
  • callers retry aggressively
  • downstream systems slow down

5. Only tune threads or heap after the source is clear

If the problem is real application work, scaling may help.

If the problem is wasted work or memory churn, scaling alone may just make the incident more expensive.


Example: hot loop hidden as “high CPU”

private final AtomicBoolean ready = new AtomicBoolean(false);

while (!ready.get()) {
    // busy-wait: keeps a core spinning until another thread sets ready
}

This code may look harmless in a small test, but under production load it can keep cores busy without doing useful work.

A better pattern usually involves:

  • waiting on a proper signal
  • backing off instead of spinning
  • reducing needless polling
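
One way to replace the spin with a proper signal is a `CountDownLatch`; the waiting thread parks instead of burning a core. A sketch (class and method names are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReadySignal {
    private final CountDownLatch ready = new CountDownLatch(1);

    // Producer side: signal once, waking every waiter without polling
    void markReady() {
        ready.countDown();
    }

    // Consumer side: blocks (parked, not spinning) until signaled or timed out
    boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
        return ready.await(timeout, unit);
    }
}
```

The timeout keeps the waiter from hanging forever if the signal never arrives, which is the usual production concern with blocking waits.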

What to change after you find the pattern

If hot stacks point to real work

Optimize the expensive path or scale the service intentionally.

If hot stacks point to GC

Reduce allocation churn, inspect retention, and follow the memory path before changing random heap flags.

If hot stacks point to retries or spins

Reduce waste first. Backoff, deduplicate, or redesign the coordination path.

If hot stacks point to contention

Shorten critical sections and revisit shared-state design before adding threads.
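
Shortening a critical section often just means moving expensive work outside the lock. A minimal sketch (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class ShortCriticalSection {
    private final List<String> log = new ArrayList<>();

    // Long critical section: formatting happens while holding the lock,
    // so every caller serializes on the String.format call too
    synchronized void recordSlow(int id, long nanos) {
        log.add(String.format("req=%d took=%dns", id, nanos));
    }

    // Shorter: do the expensive formatting first, lock only for the add
    void recordFast(int id, long nanos) {
        String line = String.format("req=%d took=%dns", id, nanos);
        synchronized (this) {
            log.add(line);
        }
    }

    synchronized int size() {
        return log.size();
    }
}
```

Under contention the second form holds the monitor for a fraction of the time, which lowers both blocked-thread counts and the CPU spent on failed acquisitions.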

If CPU rises behind queue growth

Treat the queue and backpressure problem first, because CPU may be downstream of that incident.


A useful incident question

Ask this:

Is the CPU being spent on useful application work, memory cleanup, or work that should not exist at all?

That question is much more actionable than “Why is CPU high?”


FAQ

Q. Does high CPU always mean not enough threads?

No. Extra threads can make contention, scheduling overhead, and GC pressure worse.

Q. What is the fastest first step?

Capture hot threads and compare them with GC activity from the same incident window.

Q. Should I scale the service first?

Only after you know whether the CPU rise comes from real work, wasted work, or memory pressure.

Q. Can queue backlog cause CPU spikes too?

Yes. Retries, scheduling churn, and coordination overhead can all rise when the system falls behind.

