When a Java service stops making progress, the problem may be a real thread deadlock, heavy lock contention that looks similar, or a queue stalled behind a small number of blocked workers. Those situations can feel identical from the outside, but they do not have the same fix.
The short version: separate true deadlock from generic waiting before you restart the process. A real deadlock means at least two execution paths are waiting on each other in a cycle. High contention and worker starvation can make a service look frozen too, but the thread states tell a different story.
If you want the wider Java routing view first, step back to the Java Troubleshooting Guide.
Start with thread state and lock ownership
Deadlock is fundamentally about a wait cycle.
That makes these artifacts more useful than request latency alone:
- thread dumps
- monitor ownership
- lock ordering
- repeated blocked/waiting relationships
If you do not capture that state before restart, you often lose the only clear proof of what happened.
What deadlock usually looks like
In production, a suspected deadlock often appears as:
- requests hanging indefinitely
- worker threads staying blocked for the same long interval
- queues growing behind a small set of stuck threads
- very low throughput even when CPU is not especially high
- repeated thread dumps showing the same wait relationships
If the same threads keep waiting on the same locks across multiple dumps, a true cycle becomes much more plausible.
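You do not have to rely only on manual dump inspection. The JVM can report monitor and ownable-synchronizer cycles directly through ThreadMXBean. The sketch below is a minimal in-process check (the class name DeadlockWatchdog is hypothetical); a real watchdog would run this on a schedule and log the full ThreadInfo stack traces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockWatchdog {
    // Asks the JVM whether any threads are currently part of a wait cycle.
    // findDeadlockedThreads() returns null when no cycle exists.
    static String checkForDeadlock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return "no deadlock detected";
        }
        StringBuilder sb = new StringBuilder("deadlocked threads:");
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            sb.append(' ').append(info.getThreadName());
        }
        return sb.toString();
    }
}
```

Calling this periodically gives you the "same threads, same locks, across time" signal automatically, without waiting for someone to diff jstack output by hand.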
Common causes
1. Lock ordering is inconsistent
This is the classic cause.
One code path acquires locks in one order, while another path acquires the same locks in the opposite order.
synchronized (a) {
    synchronized (b) {
        // work
    }
}
If another path takes b and then a, deadlock only needs the right timing to appear.
2. Multiple locks are held across broad critical sections
Even if lock ordering is mostly safe, large synchronized scopes increase the chance that two flows overlap badly.
This becomes more dangerous when:
- several locks are nested
- critical sections do more than state mutation
- shared objects are touched by many request paths
3. Blocking work happens inside synchronized sections
I/O or slow work inside a lock does not automatically create a true deadlock, but it can make contention and waiting cascades much worse.
It also makes the incident harder to interpret because many threads pile up behind one stalled path.
4. Worker starvation hides the real issue
Queues may keep growing because only a few threads are stuck while the rest wait behind them.
In that case, operators may diagnose “deadlock” when the real issue is:
- too few free workers
- same-pool nested waits
- one stuck dependency path blocking everyone else
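The same-pool nested wait pattern can be reproduced in a few lines. This is a hedged sketch (the class and method names are hypothetical): a task submits a child task to the same pool and blocks on its result. With a single worker, the child sits in the queue behind its parent, so nothing moves, yet no lock cycle exists.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SamePoolNestedWait {
    // Returns true if the outer task starves waiting on its own pool.
    static boolean starves(ExecutorService pool) throws InterruptedException {
        Future<String> outer = pool.submit(() -> {
            // The child is queued behind the task that is submitting it.
            Future<String> inner = pool.submit(() -> "done");
            return inner.get(); // blocks the worker that should run the child
        });
        try {
            outer.get(1, TimeUnit.SECONDS);
            return false; // another worker ran the child; no starvation
        } catch (TimeoutException e) {
            return true;  // the only worker is waiting on its own pool
        } catch (ExecutionException e) {
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A thread dump of this state shows WAITING threads parked on a future, with no monitor owner anywhere, which is one quick way to tell starvation apart from a lock cycle.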
5. It is heavy contention, not deadlock
This is a very common false alarm.
High contention can make a service look frozen even when there is no strict cyclic wait. Progress is still possible, just painfully slow.
A practical debugging order
1. Capture thread dumps from the incident window
Take more than one if possible.
You want to know whether the same threads remain BLOCKED, WAITING, or TIMED_WAITING around the same locks and owners across time.
2. Identify owner, waiting, and blocked relationships
For each suspicious lock, find:
- which thread owns it
- which thread is waiting for it
- whether the owner is itself waiting on another lock
This is how a cycle becomes visible.
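These owner/waiter edges can also be extracted programmatically. The sketch below (class name hypothetical) walks all threads and reports, for each thread blocked on a monitor, which lock it wants and who owns it. Following these edges from owner to owner is exactly how a cycle surfaces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class LockOwnership {
    // One "waiter -> owner via lock" edge per thread that is currently
    // blocked on a monitor whose owner is known.
    static List<String> blockedRelationships() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> edges = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getLockName() != null && info.getLockOwnerName() != null) {
                edges.add(info.getThreadName() + " -> " + info.getLockOwnerName()
                        + " via " + info.getLockName());
            }
        }
        return edges;
    }
}
```

If an edge list ever contains A -> B and B -> A (directly or through intermediaries), you are looking at a true cycle rather than contention.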
3. Check lock order across code paths
Search for the synchronized or lock-taking paths involved and compare their acquisition order.
If the same pair of locks is taken in different order in different code paths, the root cause becomes much clearer.
4. Compare blocked threads with queue growth and worker starvation
If backlog is growing but only a few threads are truly stuck, you may be looking at downstream blocking or executor starvation rather than a lock cycle.
5. Only restart after you preserve enough evidence
Restarting may restore availability, but it also destroys the state you need to fix the actual problem.
If the service must be restarted, capture as much thread and lock state as you can first.
Example: opposite lock ordering
// path 1
synchronized (a) {
    synchronized (b) {
        update();
    }
}

// path 2
synchronized (b) {
    synchronized (a) {
        update();
    }
}
This code may run for a long time without incident. Then one day the timing lines up under load and the deadlock finally appears.
That is why “it worked in tests” does not rule out lock-order bugs.
What to change after you confirm the issue
Enforce one lock order everywhere
This is the most direct fix for classic deadlock.
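One common way to enforce a single order is to sort the locks by a stable key before acquiring them. The sketch below assumes a hypothetical Account type with a unique numeric id; every code path that touches two accounts then acquires their monitors in the same global order, regardless of argument order.

```java
public class OrderedTransfer {
    static class Account {
        final long id; // unique, stable ordering key
        long balance;
        Account(long id, long balance) { this.id = id; this.balance = balance; }
    }

    // Always lock the account with the smaller id first, so that
    // transfer(a, b) and transfer(b, a) take the same lock order.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}
```

Without the ordering step, two threads calling transfer(a, b) and transfer(b, a) concurrently are one unlucky interleaving away from the opposite-ordering deadlock shown above.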
Reduce nested locking
If too many paths hold multiple locks at once, simplify ownership and critical sections.
Move slow work outside synchronized sections
Even when it is not the root deadlock, slow work inside locks magnifies incidents.
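A common refactoring is snapshot-then-flush: copy the shared state while holding the lock, then release the lock before the slow part runs. The sketch below is a hypothetical event buffer; slowWrite stands in for whatever I/O used to happen inside the critical section.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotThenFlush {
    private final Object lock = new Object();
    private final List<String> pending = new ArrayList<>();

    void add(String event) {
        synchronized (lock) {
            pending.add(event);
        }
    }

    // Hold the lock only long enough to copy and clear the buffer.
    // The slow write happens outside, so other threads keep adding
    // events instead of queuing behind the I/O.
    List<String> flush() {
        List<String> snapshot;
        synchronized (lock) {
            snapshot = new ArrayList<>(pending);
            pending.clear();
        }
        slowWrite(snapshot); // e.g. network or disk I/O, outside the lock
        return snapshot;
    }

    void slowWrite(List<String> batch) {
        // placeholder for real I/O
    }
}
```

The critical section shrinks from "lock held for the duration of the I/O" to "lock held for one list copy", which is usually the difference between a pile-up and a non-event.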
Separate deadlock from starvation patterns
If the real issue is pool starvation or queue buildup, fix that path instead of focusing only on monitor cycles.
Add better diagnostic hooks
Thread dumps, blocked thread metrics, and lock-related incident playbooks reduce guesswork the next time this happens.
A useful incident question
Ask this:
Are two or more threads waiting on each other in a stable cycle, or are many threads simply piling up behind one slow or contended path?
That distinction changes the fix completely.
FAQ
Q. Is every blocked thread a deadlock?
No. Many incidents are contention or starvation rather than a true cyclic wait.
Q. What is the fastest first step?
Take thread dumps and look for repeated waiting cycles around the same locks.
Q. Should I restart immediately?
Only after you capture enough state to understand whether the problem is deadlock, contention, or backlog.
Q. Can CPU still be high while threads are deadlocked?
Yes. Some threads may still spin, retry, or process backlog while the truly deadlocked threads remain stuck.
Read Next
- If backlog is easier to see than lock ownership, continue with Java Thread Pool Queue Keeps Growing.
- If spinning threads and runtime heat are more obvious than blocked monitors, compare with Java JVM CPU High.
- If the issue looks more like hot locking than a true cycle, check Java Thread Contention High.
- If you want the broader routing view, return to the Java Troubleshooting Guide.
Sources:
- https://docs.oracle.com/en/java/javase/21/troubleshoot/
- https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html