When you capture a Java heap dump, the hardest part is often deciding whether you are looking at a true leak, a workload burst, or a backlog that temporarily retains too many objects. A heap dump gives you a real object graph, but it still needs interpretation.
The short version: start with retained memory, not raw object count. The biggest class by instance count is not always the real problem. The more useful question is which objects dominate retained heap and why those objects are still reachable.
If you want the wider Java routing view first, step back to the Java Troubleshooting Guide.
Start with retained memory, not raw count
A class with many instances is not automatically the culprit.
What matters more is:
- retained size
- dominator relationships
- reference chains
- whether the objects should still be alive at all
That is why dominators and paths to GC roots are usually more valuable than a simple histogram.
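The count-versus-retained distinction can be shown with a toy allocation pattern (not a real profiler; the class and sizes here are purely illustrative): a histogram sorted by instance count points at the small objects, while retained memory points at the few objects pinning large buffers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration: why instance count and retained memory point at
// different suspects. All numbers are illustrative, not measurements.
public class RetainedVsCount {
    static List<String> manySmall = new ArrayList<>();
    static List<byte[]> fewLarge = new ArrayList<>();

    public static void main(String[] args) {
        // 100k small strings: dominates the histogram by count.
        for (int i = 0; i < 100_000; i++) manySmall.add("req-" + i);
        // 10 objects each pinning an 8 MB buffer: dominates retained heap.
        for (int i = 0; i < 10; i++) fewLarge.add(new byte[8 * 1024 * 1024]);

        long smallApprox = manySmall.size() * 48L;             // rough per-String cost
        long largeApprox = fewLarge.size() * 8L * 1024 * 1024; // buffer bytes alone

        System.out.println("count winner: "
                + (manySmall.size() > fewLarge.size() ? "strings" : "buffers"));
        System.out.println("retained winner: "
                + (largeApprox > smallApprox ? "buffers" : "strings"));
    }
}
```

In a real dump, the dominator tree surfaces the buffer holders directly, while the histogram buries them under the string count.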
What heap dumps are most useful for
Heap dumps help most when you need to answer questions like:
- what is retaining the most memory right now?
- is this growth coming from cache, backlog, or leaked references?
- are objects staying alive longer than expected?
- is this consistent with OOM or long GC pauses?
A dump taken at the right moment can replace guessing with a concrete retention path.
Common causes
1. Large collections keep growing
This is one of the most common findings.
Maps, lists, queues, and caches can dominate heap when entries are:
- never evicted
- consumed too slowly
- duplicated across requests
- larger than expected
The memory issue may not be a leak in the classic sense. It may simply be unbounded retention.
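One common fix for this pattern is to make the collection bounded. A minimal sketch, using `LinkedHashMap`'s `removeEldestEntry` hook to get LRU eviction (the capacity below is an illustrative limit, not a recommendation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal bounded LRU cache: once size exceeds maxEntries, the
// least-recently-used entry is evicted on the next insert.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry
    }
}
```

In a heap dump, a cache like this shows a stable entry count instead of a dominator that grows with uptime.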
2. Backlog retains request data
Queued work can keep payloads, contexts, responses, and closures alive much longer than intended.
If the system is falling behind, the heap dump may show the queue symptom more clearly than the original performance bottleneck.
3. Reference chains prevent cleanup
Objects that should be collectible may still be reachable through:
- singletons
- static holders
- thread locals
- listener registries
- caches with no real expiration
This is where paths to GC roots become especially valuable.
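A static holder leak often looks like this in code. The registry below is hypothetical, but the shape matches what a path to GC roots reveals: static field → list → listener → everything the listener transitively holds. The fix is usually the missing deregistration call:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical static registry: every registered listener (and everything
// it references) stays reachable from a GC root until deregistered.
public class ListenerRegistry {
    private static final List<Runnable> LISTENERS = new ArrayList<>();

    public static void register(Runnable l)   { LISTENERS.add(l); }
    public static void deregister(Runnable l) { LISTENERS.remove(l); } // the cleanup that is often missing
    public static int count()                 { return LISTENERS.size(); }
}
```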
4. The snapshot was taken at the wrong moment
A dump captured during a short burst may show temporary pressure rather than a long-term leak.
That is why a single dump is useful, but comparing multiple dumps against traffic timing is often better.
5. Large retained graphs are only part of the story
Sometimes the dump shows large retained structures, but the real incident started elsewhere:
- queue buildup
- slow downstream dependencies
- traffic spikes
- retry storms
The dump still helps, but only if you read it in operational context.
A practical debugging order
1. Identify the largest retained objects and dominators
Start with what dominates retained heap, not what merely has many instances.
This tells you where to spend your attention first.
2. Inspect the reference path that keeps them alive
Ask:
- which object owns this graph?
- should that owner still be reachable?
- is the reference intentional or accidental?
This step is often where the incident shifts from “memory is high” to a real code path.
3. Compare caches, queues, and large collections with expected size
Do not just ask whether they are large. Ask whether they are large relative to design expectations.
For example:
- is the queue size normal for current load?
- is the cache bounded?
- did a collection grow after a deployment?
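The "large relative to expectations" question can also be asked at runtime, before the next dump. A sketch, assuming a `ThreadPoolExecutor`-backed queue; the threshold is illustrative and would come from your own load expectations:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: compare actual queue depth against an expected bound, so a
// heap dump finding can be cross-checked against live queue behavior.
public class QueueCheck {
    static boolean queueWithinExpectation(ThreadPoolExecutor pool, int expectedMax) {
        return pool.getQueue().size() <= expectedMax;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1000));
        System.out.println(queueWithinExpectation(pool, 100)); // empty queue is within bounds
        pool.shutdown();
    }
}
```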
4. Compare heap dump timing with traffic or deployment changes
The same retained graph can mean very different things depending on when the dump was taken.
A snapshot captured during a brief burst is not interpreted the same way as one taken after hours of steady growth.
5. Move back to pauses or OOM when the dump confirms retention
If the same retained graph is stretching GC pauses or pushing the service toward failure, connect the evidence back to the operational symptom.
Example: queue retention disguised as a leak
Capture the dump from the running process:
jcmd <pid> GC.heap_dump heap.hprof
Suppose the dump shows many request payload objects retained by tasks in a thread pool queue. That can look like a classic memory leak at first, but the real issue may be that workers slowed down and backlog kept those objects alive too long.
This is why a heap dump should be read together with queue and throughput signals.
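The scenario above can be reproduced in a few lines. A sketch with illustrative payload sizes and a deliberately slow worker: each queued task captures its payload, so the queue (reachable from the executor, a GC root path) retains every pending request body.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a single slow worker lets the queue pin dozens of payloads.
// In a heap dump this looks like "many byte[] retained by FutureTask/Runnable".
public class BacklogRetention {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(100));

        for (int i = 0; i < 50; i++) {
            byte[] payload = new byte[1024 * 1024]; // 1 MB "request body"
            pool.execute(() -> {
                try { Thread.sleep(1_000); } catch (InterruptedException ignored) { }
                payload[0] = 1; // the lambda captures payload, so the queue retains it
            });
        }
        // Roughly 49 payloads are now reachable only through the queue:
        System.out.println("queued tasks retaining payloads: " + pool.getQueue().size());
        pool.shutdownNow();
    }
}
```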
What to change after you find the retained graph
If a cache or collection is unbounded
Add real limits, eviction, or lifecycle control.
If backlog retains too much data
Reduce queue buildup and fix the throughput bottleneck that keeps tasks waiting.
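One way to cap the retention side while the throughput fix lands is a bounded queue plus a backpressure-style rejection policy. A sketch; the pool sizes and queue limit are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a hard cap on queued work, with CallerRunsPolicy slowing the
// producer instead of letting the backlog (and retained payloads) grow.
public class BoundedBacklog {
    static ThreadPoolExecutor newBoundedPool() {
        return new ThreadPoolExecutor(
                4, 8, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(500),              // hard cap on queued payloads
                new ThreadPoolExecutor.CallerRunsPolicy()); // backpressure on overflow
    }
}
```

This does not fix the bottleneck, but it bounds how much memory the backlog can ever retain.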
If thread locals or static references hold data
Tighten cleanup and ownership boundaries.
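For thread locals on pooled threads, the usual tightening is set-in-try, remove-in-finally, so a worker thread never retains per-request data after the request ends. A minimal sketch; the context object here is hypothetical:

```java
// Sketch: scope a ThreadLocal to one unit of work. Without the finally
// block, a pooled worker thread would pin the context indefinitely.
public class RequestScope {
    static final ThreadLocal<Object> CONTEXT = new ThreadLocal<>();

    static void handle(Object requestContext, Runnable work) {
        CONTEXT.set(requestContext);
        try {
            work.run();
        } finally {
            CONTEXT.remove(); // release the reference when the work ends
        }
    }
}
```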
If the dump reflects a short burst
Confirm with later snapshots before labeling it a true leak.
If the same graph keeps growing across dumps
Treat it as a strong leak or retention signal and trace that owner path directly.
A useful incident question
Ask this:
Which object graph owns the most retained memory, and should that ownership still exist at this point in the request or task lifecycle?
That question usually leads to a real fix much faster than staring at class counts.
FAQ
Q. Does a large object count mean a leak?
Not always. Retained size and reachability matter more than count alone.
Q. Should I take several dumps?
Yes, if you need to compare whether the same retained graph keeps growing over time.
Q. What is the fastest first step?
Find the biggest dominators and the reference path that keeps them alive.
Q. Can a heap dump show backlog rather than a true leak?
Yes. Queued work often creates real memory pressure without being a classic, permanent leak.
Read Next
- If retained objects are also stretching pause time, continue with Java GC Pauses Too Long.
- If the same retained graph is now pushing the service into failure, compare with Java OutOfMemoryError.
- If queue growth appears to be the owner path, check Java Thread Pool Queue Keeps Growing.
- If you want the wider routing view again, return to the Java Troubleshooting Guide.