When Gunicorn workers keep restarting, the real issue may be timeout pressure, memory growth, boot-time failure, deliberate recycle settings, or a signal path that keeps killing workers under load.
That is why restart incidents are easy to misread. Some restarts are expected because of configured limits. Others signal a real runtime problem. If you do not separate those two first, you can spend time debugging healthy recycle behavior while the real issue is elsewhere.
This guide focuses on the practical path:
- how to separate boot failures, runtime restarts, and deliberate worker recycle
- what restart timing tells you about the likely cause
- what to inspect first in timeout, memory, and startup paths
The short version: first determine whether the restart happens at boot, during runtime, or on an expected recycle schedule, then compare timing with traffic, memory, and timeout-heavy paths before changing worker settings.
If you want the broader Python routing view first, go to the Python Troubleshooting Guide.
Start with restart timing
When do workers restart?
- immediately on boot
- after traffic spikes
- after a fixed time or request count
- after memory climbs
That timing usually points to the correct branch much faster than reading one stack trace in isolation.
It helps separate:
- startup failure
- runtime instability
- expected recycle behavior
Without that split, teams often treat every restart as an application crash when some are actually configured restarts.
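One way to get that split quickly is to measure the gaps between worker boots in the error log. The sketch below assumes Gunicorn's default error-log line format (`[timestamp] [pid] [LEVEL] Booting worker with pid: N`); adjust the regex if your logging config differs, and `LOG_LINES` here is invented sample data.

```python
import re
from datetime import datetime

# Hypothetical sample of Gunicorn error-log lines; the exact format can
# vary with your logging config, so adapt the regex to match yours.
LOG_LINES = [
    "[2024-05-01 12:00:01 +0000] [10] [INFO] Booting worker with pid: 11",
    "[2024-05-01 12:00:31 +0000] [10] [INFO] Booting worker with pid: 12",
    "[2024-05-01 12:01:01 +0000] [10] [INFO] Booting worker with pid: 13",
]

BOOT_RE = re.compile(r"^\[(.+?) \+\d{4}\].*Booting worker with pid: (\d+)")

def boot_intervals(lines):
    """Return the seconds between consecutive worker boots."""
    times = []
    for line in lines:
        m = BOOT_RE.match(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

print(boot_intervals(LOG_LINES))  # → [30.0, 30.0]
```

Evenly spaced gaps suggest a recycle policy or a tight boot loop; gaps that cluster around traffic spikes point at runtime pressure instead.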
Boot failure versus runtime restart is the first big branch
If workers restart immediately on boot, suspect:
- import failures
- config mistakes
- environment mismatch
- app startup paths that fail before readiness
If workers restart only after traffic or memory changes, suspect:
- worker timeout
- memory pressure
- request-path instability
- signal or platform restarts under load
Those branches lead to very different fixes.
Common causes to check
1. Worker timeout
Requests or upstream dependencies exceed the allowed worker window.
Typical clues:
- restarts happen under traffic spikes
- timeout-heavy endpoints dominate logs
- restarts appear after long requests or blocked dependencies
In that case the restart is not random. The worker simply cannot finish within the configured budget.
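The relevant knobs live in the Gunicorn config. The values below are illustrative, not recommendations; the point is that with sync workers, anything that runs past `timeout` produces a kill-and-refork that reads like a restart in the logs.

```python
# gunicorn.conf.py — a minimal sketch of the timeout-related settings.
# Values are illustrative, not recommendations.

workers = 4
timeout = 30            # sync workers are killed if a request exceeds this (seconds)
graceful_timeout = 30   # time given to finish in-flight work on a restart signal

# With sync workers, a request or blocked upstream call that runs past
# `timeout` makes the arbiter kill and re-fork the worker — it shows up
# as a restart, not as an application exception.
```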
2. Memory pressure
Workers are recycled or killed when memory climbs too far.
This often looks like:
- restart timing follows memory growth
- one worker class or endpoint allocates more than expected
- memory-heavy requests make the pattern worse over time
This is why worker restarts and Python memory incidents often overlap.
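If memory growth is the suspect, it helps to have the worker report its own footprint. This sketch uses the stdlib `resource` module (POSIX only); the threshold is a made-up assumption, and in practice you would call something like `over_limit` from a Gunicorn request hook and exit so the arbiter re-forks a fresh worker.

```python
import resource
import sys

def rss_mb():
    """Peak resident set size of this process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is kilobytes on Linux but bytes on macOS.
    if sys.platform == "darwin":
        peak //= 1024
    return peak // 1024

def over_limit(limit_mb=512):
    """True if this worker has grown past the (assumed) limit and should recycle."""
    return rss_mb() > limit_mb

print(rss_mb() > 0)  # → True: any running process has nonzero peak RSS
```

Logging `rss_mb()` alongside restart events is often enough to confirm or rule out the memory branch before any profiling.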
3. Boot-time import or config failure
Workers restart because they never become healthy.
Common patterns:
- import-time exceptions
- missing env vars
- startup code that depends on unavailable services
- config changes that break worker initialization
When this is the branch, runtime traffic analysis will not help much. The failure happens before the worker is truly serving.
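Missing environment variables are one of the cheapest boot-failure causes to check. A small preflight like the sketch below, run before starting Gunicorn, turns a silent boot loop into an explicit list; `REQUIRED_VARS` is a hypothetical list — substitute your app's own settings.

```python
import os

# Hypothetical required settings; replace with your app's real ones.
REQUIRED_VARS = ["DATABASE_URL", "SECRET_KEY"]

def missing_env(required, environ=os.environ):
    """Return the required variables that are absent, so logs can say why boot failed."""
    return [name for name in required if name not in environ]

print(missing_env(REQUIRED_VARS, {"DATABASE_URL": "postgres://..."}))
# → ['SECRET_KEY']
```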
4. Deliberate recycle mistaken for failure
Some restarts are expected because of Gunicorn settings or platform behavior.
That can happen with:
- worker recycle policies
- request-count based limits
- platform restarts
- deployment restarts misread as application instability
The key question is whether the restart is harmful and unexpected, or expected behavior that is merely visible.
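Deliberate recycle usually comes from settings like these in the Gunicorn config. The numbers are placeholders; the point is that with them set, periodic worker restarts in the logs are configured behavior, not instability.

```python
# gunicorn.conf.py — deliberate recycle settings that produce expected restarts.
# Numbers are placeholders, not recommendations.

max_requests = 1000        # recycle each worker after roughly this many requests
max_requests_jitter = 100  # randomize the limit so workers don't all restart at once

# With these set, evenly spaced worker restarts are normal operation.
# Check the config before treating the pattern as a crash loop.
```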
A practical debugging order
When workers keep restarting, this order usually helps most:
- identify whether restart happens at boot, runtime, or expected recycle points
- compare restart timing with traffic and memory shape
- inspect timeout-heavy request paths
- inspect boot logs and recent import/config changes
- decide whether the issue is startup failure, runtime pressure, or normal recycle
This order matters because it prevents two common mistakes:
- tuning worker counts before understanding restart timing
- blaming Gunicorn settings when the real issue is app startup or request runtime behavior
If CPU or memory pressure is part of the same incident, compare with Python CPU Usage High and Python Memory Usage High.
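The debugging order above can be sketched as a tiny classifier. The inputs are simplified yes/no observations you would gather from logs and metrics, not real telemetry, and the branch labels mirror the sections in this guide.

```python
def classify(boot_ok, follows_traffic, follows_memory, fixed_interval):
    """Map restart observations to the branch worth debugging first."""
    if not boot_ok:
        return "boot failure: check imports, env vars, startup config"
    if fixed_interval:
        return "expected recycle: check max_requests and platform policies"
    if follows_memory:
        return "memory pressure: profile allocation-heavy paths"
    if follows_traffic:
        return "runtime pressure: check timeouts and blocked dependencies"
    return "unclear: gather restart timing before tuning workers"

print(classify(boot_ok=True, follows_traffic=True,
               follows_memory=False, fixed_interval=False))
# → "runtime pressure: check timeouts and blocked dependencies"
```

Note the order inside the function: boot health and recycle policy are checked before traffic and memory, which is exactly why tuning worker counts first is premature.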
A small example that still needs the real branch
```shell
gunicorn app:app --workers 4 --timeout 30
```
Slow startup, memory spikes, import failures, or aggressive timeouts can all make workers restart in a loop.
The command itself does not tell you which one is happening. The useful signal comes from timing, logs, and what the worker was doing right before the restart.
A good question for every restart incident
For any restart pattern, ask:
- what was the worker doing right before restart
- was it handling traffic, starting up, or waiting
- did memory or timeout pressure rise first
- would this restart still happen with no user traffic
This framing helps because Gunicorn incidents are often timing incidents before they become configuration incidents.
FAQ
Q. Does a restarting worker always mean Gunicorn is broken?
No. The worker may be restarting because of application startup failure, timeout-heavy paths, memory pressure, or even expected recycle behavior.
Q. What should I inspect first?
Restart timing comes first. It usually tells you whether to debug boot, runtime request paths, or expected recycle settings.
Q. Why do traffic spikes often line up with worker restarts?
Because traffic spikes expose timeout-heavy endpoints, memory-heavy paths, and blocked dependencies much more aggressively.
Read Next
- If you want the broader Python routing view first, go to the Python Troubleshooting Guide.
- If CPU pressure is part of the same incident, compare with Python CPU Usage High.
- If memory pressure is part of the same incident, compare with Python Memory Usage High.