When goroutine count keeps climbing and never settles down, the real problem is usually not that goroutines “leak” magically. The real problem is that some work started, got stuck, and never reached a clean exit path.
That is why goroutine leak incidents often show up together with timeouts, queue growth, memory pressure, or shutdown problems. The goroutines are only the visible symptom. The root cause is usually blocked communication, missing cancellation, or long-lived workers with no clear stop rule.
This guide focuses on the practical path:
- how to confirm that you are looking at a real leak instead of a temporary spike
- how to inspect where goroutines are stuck
- how to fix the most common leak patterns in real Go services
The short version: first confirm that the count stays elevated after load falls, then inspect what those goroutines are waiting on, and finally trace whether channel ownership, cancellation, or worker shutdown is broken.
If you want the wider Go routing view first, step back to the Golang Troubleshooting Guide.
What a goroutine leak usually means
A goroutine leak usually means one of these situations:
- a goroutine is blocked on a send or receive that will never complete
- a background task keeps running after the request that started it is gone
- a loop driven by a ticker, watcher, or retry path has no real exit condition
- a worker pool or queue consumer has no clean shutdown path
In all of those cases, the process is still alive and the code still looks “normal” from a distance. The leak appears because more and more goroutines accumulate in the same waiting state over time.
That is also why a single goroutine dump is often more useful than staring at the total count alone. Count tells you that something is wrong. State tells you what is wrong.
Confirm that it is a real leak first
A temporary spike is not the same as a leak.
Before changing code, check whether goroutine count rises during a burst and then falls back, or whether it keeps climbing even after traffic drops. A healthy service may briefly create many goroutines under load and still recover normally.
runtime.NumGoroutine() is the fastest first signal:
```go
package main

import (
	"log"
	"runtime"
	"time"
)

// logGoroutineCount logs the current goroutine count every 30 seconds,
// which is enough to see whether the baseline drifts upward over time.
func logGoroutineCount() {
	for range time.Tick(30 * time.Second) {
		log.Printf("goroutines=%d", runtime.NumGoroutine())
	}
}
```
This does not tell you the cause, but it helps answer the first question: does the service recover after the burst, or does the baseline keep drifting upward?
If the count keeps rising after work completes, a leak becomes much more likely.
Where to inspect stuck goroutines
Once the count looks suspicious, inspect what the waiting goroutines are doing.
In practice, the most useful tools are:
- runtime.NumGoroutine() for trend monitoring
- net/http/pprof for goroutine and blocking profiles
- stack dumps during or shortly after the incident
A small pprof setup is often enough:
```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers
)

func main() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// ... start the rest of the service here; main must keep running,
	// or the process (and the pprof endpoint) exits immediately.
}
```
Then capture the goroutine profile:

```shell
go tool pprof http://localhost:6060/debug/pprof/goroutine
```

If you prefer a quick text view first, open:

```shell
curl http://localhost:6060/debug/pprof/goroutine?debug=2
```
What matters most is repetition. If many goroutines are stuck in the same function, channel operation, or wait path, that stack is usually the leak family worth chasing first.
The official Go diagnostics docs are a good companion when you want a broader debugging map: https://go.dev/doc/diagnostics.
Common leak patterns to check
1. Blocked send with no reliable receiver
One of the most common leak patterns is a goroutine trying to send into a channel that no longer has an active receiver.
```go
func startWorker(ch chan<- int) {
	go func() {
		result := expensiveWork()
		ch <- result
	}()
}
```
This looks harmless, but if the receiver exits early or the channel is no longer drained, that send can block forever.
Check:
- who owns closing the channel
- whether the receiver can return before reading
- whether the send should respect ctx.Done()
A safer pattern often looks like this:
```go
func startWorker(ctx context.Context, ch chan<- int) {
	go func() {
		result := expensiveWork()
		select {
		case ch <- result:
		case <-ctx.Done():
			return
		}
	}()
}
```
2. Missing cancellation in background work
Leak incidents often happen when request-scoped work launches background goroutines that do not inherit cancellation.
```go
func handle() {
	go syncData(context.Background())
}
```
If that work should end when the request ends, context.Background() is usually the wrong starting point.
Use the request context, propagate it through dependent calls, and make the worker exit when cancellation fires:
```go
func handle(ctx context.Context) {
	go syncData(ctx)
}

func syncData(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
			doOneStep()
		}
	}
}
```
3. Ticker or watcher loops with no stop path
Long-lived loops are easy to forget because they are meant to survive for a while. The leak starts when “for a while” quietly becomes forever.
```go
func startWatcher(ctx context.Context) {
	ticker := time.NewTicker(10 * time.Second)
	go func() {
		for range ticker.C {
			refresh()
		}
	}()
}
```
Two things are wrong here:
- the ticker is never stopped
- the goroutine never exits on cancellation
The safer version wires both:
```go
func startWatcher(ctx context.Context) {
	ticker := time.NewTicker(10 * time.Second)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				refresh()
			}
		}
	}()
}
```
4. Worker shutdown and queue drain bugs
Wait groups, worker pools, and queue consumers can also leak when shutdown order is incomplete.
Typical failure patterns are:
- producers continue sending after workers began shutting down
- workers wait forever on a queue that is never closed
- shutdown waits on workers, but workers are blocked on resources that shutdown already removed
When you see many goroutines waiting in worker functions, compare the startup path and the shutdown path side by side. In leak bugs, the startup story is often clear, but the ownership of stopping is vague.
A simple debugging order that works well in production
When the incident is active, this order is usually enough to narrow it down:
- confirm that goroutine count stays elevated after load drops
- capture one or two goroutine dumps close to the incident
- group repeated stacks by waiting location
- inspect channel ownership and cancellation paths for those stacks
- review long-lived loops, worker lifecycles, and recent concurrency changes
This order works because it prevents a common mistake: jumping straight into code review before confirming where the stuck goroutines actually are.
If the same service is also timing out, compare the incident with Golang Context Deadline Exceeded. Timeout-heavy incidents often overlap with missing cancellation or blocked dependencies.
How to tell a temporary spike from a growing baseline
A healthy service can create extra goroutines during:
- fan-out requests
- batch processing
- connection churn
- short retry storms
That alone is not a leak.
A leak is more likely when:
- the count remains elevated after the burst is gone
- each traffic wave leaves the baseline a little higher
- shutdown becomes slow because work never drains
- the same stack traces appear repeatedly across dumps
If you only look at one moment, you may confuse a burst with a leak. If you compare the count before load, during load, and several minutes after load, the pattern becomes much clearer.
FAQ
Q. Is every goroutine spike a leak?
No. Healthy services can spike during bursts and recover afterward. The key question is whether the baseline settles back down.
Q. Where do leaks usually hide?
Most often in blocked channels, missing context cancellation, long-lived loops, and worker shutdown paths.
Q. What should I capture first during an incident?
Start with goroutine count and one goroutine dump. That usually gives enough signal to decide which code path to inspect next.
Read Next
- If you want the Go routing view first, go back to the Golang Troubleshooting Guide.
- If timeouts are the more visible symptom, go next to Golang Context Deadline Exceeded.
- If you want to compare a different concurrency saturation pattern, open Java Thread Pool Queue Growing.