When a readiness probe keeps failing, the pod may be alive, but Kubernetes will not send it traffic. The problem is usually not that the pod is dead; it is that the probe does not match the app's actual startup timing, endpoint behavior, or dependency expectations.
The short version: compare probe behavior with real app behavior. A readiness probe should reflect when the app can safely serve traffic, not just when the process happens to be running.
Quick Answer
If a Kubernetes readiness probe keeps failing, first determine whether the app is actually unhealthy or whether the probe is checking the wrong endpoint, port, protocol, or timing window. In practice, most incidents come from startup taking longer than the probe allows, a stale probe path, dependency checks that are too strict, or thresholds that are tighter than real production latency.
What to Check First
- does the probe hit the same path, port, and protocol the app really serves?
- did failures start after a deploy, framework change, or config change?
- is the container healthy but still marked unready?
- do logs show startup delay, dependency timeout, or slow warm-up?
- should this really be a startupProbe problem rather than a readiness problem?
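As a reference point for that checklist, here is a minimal readiness probe sketch showing the fields worth comparing against real app behavior. The container name, image, port, and /healthz path are illustrative assumptions, not a prescription:

```yaml
# Hypothetical container spec fragment; substitute the path and port
# your app actually serves.
containers:
  - name: web
    image: example/web:1.0
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz         # must match a path the app really exposes
        port: 8080             # must match the port the app really listens on
      initialDelaySeconds: 5   # wait before the first check
      periodSeconds: 10        # how often the probe runs
      timeoutSeconds: 2        # how long each check may take
      failureThreshold: 3      # consecutive failures before the pod is marked unready
```

Every field in this block maps to one of the questions above: path and port to the first question, the timing fields to startup delay, and failureThreshold to whether this should really be a startupProbe.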
Start with probe behavior versus real app behavior
If the path, timeout, or expected response does not match reality, the pod stays unready even when the process is technically alive.
That means you need to compare:
- what the readiness probe actually calls
- what the app really exposes
- how long startup actually takes
- whether readiness depends on external systems too early
This is usually more useful than staring at the pod phase alone.
What readiness failure usually means
In practice, repeated readiness failure often means:
- the wrong path or port is probed
- startup takes longer than the probe allows
- the readiness check depends on too much
- thresholds are too aggressive
- the app is alive but not yet safe for traffic
The key is to separate “wrong probe” from “correct probe revealing a real dependency problem.”
Common causes
1. The probe path or port is wrong
The app may be:
- listening on a different port
- exposing a different endpoint
- returning a different status code than the probe expects
This is common after container or framework changes.
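A typical mismatch looks like the sketch below. The paths and ports are illustrative: the probe still targets the endpoint the app used to serve, while the current build listens somewhere else.

```yaml
# Illustrative drift: a framework upgrade moved the health endpoint,
# but the probe was never updated.
readinessProbe:
  httpGet:
    path: /health   # app now serves /actuator/health instead (hypothetical)
    port: 8080      # app now listens on 9090 (hypothetical)
```

In this situation the app can be perfectly healthy and still never become ready, because the probe is asking a question nothing answers.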
2. Startup takes longer than the probe allows
If the app is slow to initialize, readiness can fail for timing reasons rather than real unhealthiness.
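One common remedy, sketched here with illustrative values: give slow initialization its own budget with a startupProbe, so the readiness probe only starts judging once startup has succeeded.

```yaml
# Illustrative budget: up to 30 * 5s = 150s of startup time before failure.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # generous budget for slow warm-up
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3    # normal budget once the app is up
```

Kubernetes disables the readiness and liveness probes until the startup probe succeeds, which is exactly the separation a slow-booting app needs.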
3. The probe depends on external systems too early
If readiness checks databases, queues, or upstream services too aggressively, a temporary external issue can keep the pod unready even when the app itself is mostly fine.
4. Timeouts and thresholds are unrealistic
Aggressive timeouts, short intervals, and strict thresholds can mark healthy pods as unready.
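For example, if the health endpoint occasionally takes 1.5 seconds under load, a one-second timeout with a failure threshold of one will flap constantly. A hedged sketch of more realistic values (numbers are illustrative, not recommendations):

```yaml
# Too aggressive for an endpoint whose slow tail is ~1.5s (illustrative):
#   timeoutSeconds: 1
#   failureThreshold: 1
# More realistic:
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3      # comfortably above observed worst-case latency
  failureThreshold: 3    # tolerate transient slowness before marking unready
```

The right numbers come from measuring the endpoint's real latency, not from defaults.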
5. The app behavior changed but the probe did not
Sometimes the service evolved, but the readiness probe still encodes an older assumption about paths, ports, or startup behavior.
That drift creates noisy, confusing readiness incidents.
A quick triage table
| Symptom | Most likely cause | Check first |
|---|---|---|
| App works with direct curl but readiness still fails | wrong path, port, or probe type | probe config versus actual listener |
| Pod becomes ready only long after startup | timing window too short | initialDelaySeconds, timeoutSeconds, startupProbe |
| Readiness fails during dependency slowness | readiness is coupled too tightly to upstream systems | logs and dependency latency |
| Pod flips between ready and unready | thresholds are too strict for normal latency | probe timing and threshold values |
| Service has no endpoints but container is running | readiness mismatch, not a crash | endpoint behavior inside the pod |
A practical debugging order
1. Confirm what the readiness probe actually calls
Start with the exact path, port, method, and timing.
2. Test the path and port from the pod context
This tells you whether the endpoint really behaves the way the probe expects from inside the container environment.
3. Compare startup time with probe timing
If the app needs more warm-up time, the probe may be correct in shape but wrong in timing.
4. Reduce unnecessary dependency checks in readiness when appropriate
Readiness should protect traffic routing, but it should not hold the pod hostage to every temporary upstream wobble unless that coupling is truly required.
5. Verify the pod becomes ready when the probe matches real behavior
This final step confirms whether the issue was probe design, startup timing, or a deeper runtime problem.
Quick commands
```shell
kubectl describe pod <pod> -n <ns>     # probe config and recent probe failure events
kubectl logs <pod> -n <ns>             # startup timing and dependency errors
kubectl exec -it <pod> -n <ns> -- sh   # then test the endpoint from inside, e.g.
                                       #   wget -qO- http://localhost:<port>/<path>
```
These show what probe is being called, what the app says in logs, and whether the endpoint really responds from inside the pod.
Look for wrong path or port assumptions, slow startup timing, and readiness checks that depend on external systems too early.
What to change after you find the mismatch
If the path or port is wrong
Align the probe with the actual endpoint the app exposes.
If startup is too slow for the probe
Adjust probe timing so Kubernetes waits for realistic warm-up.
If readiness checks too much
Reduce dependency coupling so temporary upstream issues do not unnecessarily block traffic.
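One way to reduce that coupling, sketched with hypothetical endpoint names: point readiness at a shallow check that only verifies the app itself, and keep deep dependency health out of the probe path.

```yaml
# /ready and /live are hypothetical endpoint names the app would expose.
readinessProbe:
  httpGet:
    path: /ready   # shallow: app initialized and able to serve traffic
    port: 8080
livenessProbe:
  httpGet:
    path: /live    # process-level health only, no dependency fan-out
    port: 8080
```

Dependency outages then surface as request errors and alerts, not as every pod silently dropping out of the Service.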
If thresholds are too strict
Tune intervals and failure thresholds to reflect real operating conditions.
If app behavior drifted
Update the readiness contract so it still matches how the service actually becomes traffic-safe.
A useful incident question
Ask this:
What exact condition should make this pod safe to receive traffic, and does the current readiness probe actually test that condition accurately?
That question usually makes probe design problems obvious.
Bottom Line
Readiness failures are usually routing problems before they are restart problems. Start by proving whether the probe matches real application behavior, then decide whether to fix endpoint design, probe timing, or a dependency that keeps the app temporarily unready. If you only loosen thresholds without finding the mismatch, the incident usually comes back on the next deploy or traffic spike.
FAQ
Q. Is readiness the same as liveness?
No. Readiness controls whether traffic should reach the pod, while liveness controls whether the pod should be restarted.
Q. What is the fastest first step?
Check the exact path, port, and timing that the readiness probe uses.
Q. If the pod is alive, shouldn’t it be ready?
Not always. Alive only means the process exists. Ready means it is safe to receive traffic.
Q. Should readiness depend on every external dependency?
Usually not unless the service truly cannot serve safely without that dependency.
Read Next
- If the service still cannot send traffic even with healthy pods, compare with Kubernetes Service Has No Endpoints.
- If the pod is crashing rather than just staying unready, continue with Kubernetes CrashLoopBackOff.
- For the broader infrastructure archive, browse the Infra category.
Related Posts
- Kubernetes Service Has No Endpoints
- Kubernetes CrashLoopBackOff
- Kubernetes OOMKilled
- Infra category archive