Golang Goroutine Leak: How to Find It

When goroutine count keeps climbing and never settles down, the real problem is usually not that goroutines “leak” magically. The real problem is that some work started, got stuck, and never reached a clean exit path.

That is why goroutine leak incidents often show up together with timeouts, queue growth, memory pressure, or shutdown problems. The goroutines are only the visible symptom. The root cause is usually blocked communication, missing cancellation, or long-lived workers with no clear stop rule.

This guide focuses on the practical path:

  • how to confirm that you are looking at a real leak instead of a temporary spike
  • how to inspect where goroutines are stuck
  • how to fix the most common leak patterns in real Go services

The short version: first confirm that the count stays elevated after load falls, then inspect what those goroutines are waiting on, and finally trace whether channel ownership, cancellation, or worker shutdown is broken.

If you want the wider Go troubleshooting view first, step back to the Golang Troubleshooting Guide.


What a goroutine leak usually means

A goroutine leak usually means one of these situations:

  • a goroutine is blocked on a send or receive that will never complete
  • a background task keeps running after the request that started it is gone
  • a loop driven by a ticker, watcher, or retry path has no real exit condition
  • a worker pool or queue consumer has no clean shutdown path

In all of those cases, the process is still alive and the code still looks “normal” from a distance. The leak appears because more and more goroutines accumulate in the same waiting state over time.

That is also why a single goroutine dump is often more useful than staring at the total count alone. Count tells you that something is wrong. State tells you what is wrong.


Confirm that it is a real leak first

A temporary spike is not the same as a leak.

Before changing code, check whether goroutine count rises during a burst and then falls back, or whether it keeps climbing even after traffic drops. A healthy service may briefly create many goroutines under load and still recover normally.

runtime.NumGoroutine() is the fastest first signal:

package main

import (
	"log"
	"runtime"
	"time"
)

// logGoroutineCount logs the goroutine count every 30 seconds so the
// trend is visible over time. time.Tick is acceptable here because this
// logger is meant to live for the whole process.
func logGoroutineCount() {
	for range time.Tick(30 * time.Second) {
		log.Printf("goroutines=%d", runtime.NumGoroutine())
	}
}

This does not tell you the cause, but it helps answer the first question: does the service recover after the burst, or does the baseline keep drifting upward?

If the count keeps rising after work completes, a leak becomes much more likely.


Where to inspect stuck goroutines

Once the count looks suspicious, inspect what the waiting goroutines are doing.

In practice, the most useful tools are:

  • runtime.NumGoroutine() for trend monitoring
  • net/http/pprof for goroutine and blocking profiles
  • stack dumps during or shortly after the incident

A small pprof setup is often enough:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve pprof on a separate, non-public port.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the service runs here and keeps main alive.
	select {}
}

Then capture the goroutine profile:

go tool pprof http://localhost:6060/debug/pprof/goroutine

If you prefer a quick text view first, fetch the full stack dump directly:

curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'

What matters most is repetition. If many goroutines are stuck in the same function, channel operation, or wait path, that stack is usually the leak family worth chasing first.

The official Go diagnostics docs are a good companion when you want a broader debugging map: https://go.dev/doc/diagnostics.


Common leak patterns to check

1. Blocked send with no reliable receiver

One of the most common leak patterns is a goroutine trying to send into a channel that no longer has an active receiver.

func startWorker(ch chan<- int) {
	go func() {
		result := expensiveWork()
		ch <- result // blocks forever if no receiver is still listening
	}()
}

This looks harmless, but if the receiver exits early or the channel is no longer drained, that send can block forever.

Check:

  • who owns closing the channel
  • whether the receiver can return before reading
  • whether the send should respect ctx.Done()

A safer pattern often looks like this:

func startWorker(ctx context.Context, ch chan<- int) {
	go func() {
		result := expensiveWork()
		select {
		case ch <- result:
		case <-ctx.Done():
			return
		}
	}()
}

2. Missing cancellation in background work

Leak incidents often happen when request-scoped work launches background goroutines that do not inherit cancellation.

func handle() {
	go syncData(context.Background())
}

If that work should end when the request ends, context.Background() is usually the wrong starting point.

Use the request context, propagate it through dependent calls, and make the worker exit when cancellation fires:

func handle(ctx context.Context) {
	go syncData(ctx)
}

func syncData(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
			// doOneStep should do one bounded unit of work; if it can
			// block, pass ctx into it as well.
			doOneStep()
		}
	}
}

3. Ticker or watcher loops with no stop path

Long-lived loops are easy to forget because they are meant to survive for a while. The leak starts when “for a while” quietly becomes forever.

func startWatcher(ctx context.Context) {
	ticker := time.NewTicker(10 * time.Second)
	go func() {
		for range ticker.C {
			refresh()
		}
	}()
}

Two things are wrong here:

  • the ticker is never stopped
  • the goroutine never exits on cancellation

The safer version wires both:

func startWatcher(ctx context.Context) {
	ticker := time.NewTicker(10 * time.Second)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				refresh()
			}
		}
	}()
}

4. Worker shutdown and queue drain bugs

Wait groups, worker pools, and queue consumers can also leak when shutdown order is incomplete.

Typical failure patterns are:

  • producers keep sending after workers have begun shutting down
  • workers wait forever on a queue that is never closed
  • shutdown waits on workers, but workers are blocked on resources that shutdown already removed

When you see many goroutines waiting in worker functions, compare the startup path and the shutdown path side by side. In leak bugs, the startup story is often clear, but the ownership of stopping is vague.


A simple debugging order that works well in production

When the incident is active, this order is usually enough to narrow it down:

  1. confirm that goroutine count stays elevated after load drops
  2. capture one or two goroutine dumps close to the incident
  3. group repeated stacks by waiting location
  4. inspect channel ownership and cancellation paths for those stacks
  5. review long-lived loops, worker lifecycles, and recent concurrency changes

This order works because it prevents a common mistake: jumping straight into code review before confirming where the stuck goroutines actually are.

If the same service is also timing out, compare the incident with Golang Context Deadline Exceeded. Timeout-heavy incidents often overlap with missing cancellation or blocked dependencies.


How to tell a temporary spike from a growing baseline

A healthy service can create extra goroutines during:

  • fan-out requests
  • batch processing
  • connection churn
  • short retry storms

That alone is not a leak.

A leak is more likely when:

  • the count remains elevated after the burst is gone
  • each traffic wave leaves the baseline a little higher
  • shutdown becomes slow because work never drains
  • the same stack traces appear repeatedly across dumps

If you only look at one moment, you may confuse a burst with a leak. If you compare the count before load, during load, and several minutes after load, the pattern becomes much clearer.


FAQ

Q. Is every goroutine spike a leak?

No. Healthy services can spike during bursts and recover afterward. The key question is whether the baseline settles back down.

Q. Where do leaks usually hide?

Most often in blocked channels, missing context cancellation, long-lived loops, and worker shutdown paths.

Q. What should I capture first during an incident?

Start with goroutine count and one goroutine dump. That usually gives enough signal to decide which code path to inspect next.

