Kubernetes Readiness Probe Failed: What to Check First


When a readiness probe keeps failing, the pod may be alive, but Kubernetes will not send it traffic. The problem is usually not that the pod is dead; it is that the probe does not match the app's actual startup timing, endpoint behavior, or dependency expectations.

The short version: compare probe behavior with real app behavior. A readiness probe should reflect when the app can safely serve traffic, not just when the process happens to be running.


Quick Answer

If a Kubernetes readiness probe keeps failing, first determine whether the app is actually unhealthy or whether the probe is checking the wrong endpoint, port, protocol, or timing window. In practice, most incidents come from startup taking longer than the probe allows, a stale probe path, dependency checks that are too strict, or thresholds that are tighter than real production latency.

What to Check First

  • does the probe hit the same path, port, and protocol the app really serves?
  • did failures start after a deploy, framework change, or config change?
  • is the container healthy but still marked unready?
  • do logs show startup delay, dependency timeout, or slow warm-up?
  • should this really be a startupProbe problem rather than a readiness problem?
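Before checking any of that, it helps to see what a readiness probe's fields actually control. A minimal sketch (the `/healthz` path and port 8080 are placeholders; substitute whatever your app really serves):

```yaml
# Sketch of a readiness probe; /healthz and 8080 are assumptions,
# not values from any particular app.
readinessProbe:
  httpGet:
    path: /healthz        # must match a path the app really exposes
    port: 8080            # must match the port the app listens on
  initialDelaySeconds: 5  # wait before the first probe
  periodSeconds: 10       # how often to probe
  timeoutSeconds: 2       # how long each probe may take
  failureThreshold: 3     # consecutive failures before marking unready
```

For an httpGet probe, any status code from 200 up to but not including 400 counts as success; anything else, or a timeout, counts as a failure.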

Start with probe behavior versus real app behavior

If the path, timeout, or expected response does not match reality, the pod stays unready even when the process is technically alive.

That means you need to compare:

  • what the readiness probe actually calls
  • what the app really exposes
  • how long startup actually takes
  • whether readiness depends on external systems too early

This is usually more useful than staring at the pod phase alone.


What readiness failure usually means

In practice, repeated readiness failure often means:

  • the wrong path or port is probed
  • startup takes longer than the probe allows
  • the readiness check depends on too much
  • thresholds are too aggressive
  • the app is alive but not yet safe for traffic

The key is to separate “wrong probe” from “correct probe revealing a real dependency problem.”


Common causes

1. The probe path or port is wrong

The app may be:

  • listening on a different port
  • exposing a different endpoint
  • returning a different status code than the probe expects

This is common after container or framework changes.
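A common form of this drift: the container now listens on one port while the probe still targets another. A minimal sketch (image name and both port numbers are made up):

```yaml
containers:
  - name: web
    image: example/web:latest   # placeholder image
    ports:
      - containerPort: 8080     # the app actually listens here
    readinessProbe:
      httpGet:
        path: /healthz
        port: 9090              # stale: probe still targets the old port
```

Here every probe fails with a connection refused even though the app is perfectly healthy on 8080; aligning the probe's `port` with the real listener fixes it.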

2. Startup takes longer than the probe allows

If the app is slow to initialize, readiness can fail for timing reasons rather than real unhealthiness.
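When slow initialization is the real cause, a startupProbe can hold off the readiness probe until warm-up finishes. A sketch, assuming the same hypothetical `/healthz` endpoint:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080           # assumed port
  periodSeconds: 10
  failureThreshold: 30   # tolerates up to 300s of startup (30 x 10s)
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
```

Until the startupProbe succeeds, Kubernetes does not run the readiness or liveness probes at all, so a slow cold start no longer counts against them.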

3. The probe depends on external systems too early

If readiness checks databases, queues, or upstream services too aggressively, a temporary external issue can keep the pod unready even when the app itself is mostly fine.

4. Timeouts and thresholds are unrealistic

Aggressive timeouts, short intervals, and strict thresholds can mark healthy pods as unready.
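It helps to compute the actual tolerance window. With the Kubernetes defaults (`periodSeconds: 10`, `timeoutSeconds: 1`, `failureThreshold: 3`), any response taking just over one second fails a probe, and three consecutive failures, roughly 30 seconds, mark the pod unready. A more forgiving sketch (port is an assumption):

```yaml
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080           # assumed port
  periodSeconds: 10
  timeoutSeconds: 3      # default is 1s, often too tight under load
  failureThreshold: 6    # stays ready through ~60s of slow responses
```

The trade-off: a looser probe also takes longer to pull a genuinely broken pod out of rotation, so tune against measured latency, not guesswork.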

5. The app behavior changed but the probe did not

Sometimes the service evolved, but the readiness probe still encodes an older assumption about paths, ports, or startup behavior.

That drift creates noisy and confusing readiness incidents.

A quick triage table

| Symptom | Most likely cause | Check first |
| --- | --- | --- |
| App works with direct curl but readiness still fails | wrong path, port, or probe type | probe config versus actual listener |
| Pod becomes ready only long after startup | timing window too short | initialDelaySeconds, timeoutSeconds, startupProbe |
| Readiness fails during dependency slowness | readiness is coupled too tightly to upstream systems | logs and dependency latency |
| Pod flips between ready and unready | thresholds are too strict for normal latency | probe timing and threshold values |
| Service has no endpoints but container is running | readiness mismatch, not a crash | endpoint behavior inside the pod |

A practical debugging order

1. Confirm what the readiness probe actually calls

Start with the exact path, port, method, and timing.

2. Test the path and port from the pod context

This tells you whether the endpoint really behaves the way the probe expects from inside the container environment.

3. Compare startup time with probe timing

If the app needs more warm-up time, the probe may be correct in shape but wrong in timing.

4. Reduce unnecessary dependency checks in readiness when appropriate

Readiness should protect traffic routing, but it should not hold the pod hostage to every temporary upstream wobble unless the service truly cannot serve without that dependency.

5. Verify the pod becomes ready when the probe matches real behavior

This final step confirms whether the issue was probe design, startup timing, or a deeper runtime problem.


Quick commands

kubectl describe pod <pod> -n <ns>
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[*].readinessProbe}'
kubectl logs <pod> -n <ns>
kubectl exec -it <pod> -n <ns> -- sh
kubectl get endpoints <svc> -n <ns>

`describe` shows the recent probe failure events, the jsonpath query prints the exact probe spec in use, the logs show what the app says during startup, the shell lets you hit the health endpoint from inside the pod, and the endpoints check confirms whether the Service actually sees the pod as ready.

Look for wrong path or port assumptions, slow startup timing, and readiness checks that depend on external systems too early.


What to change after you find the mismatch

If the path or port is wrong

Align the probe with the actual endpoint the app exposes.

If startup is too slow for the probe

Adjust probe timing so Kubernetes waits for realistic warm-up.

If readiness checks too much

Reduce dependency coupling so temporary upstream issues do not unnecessarily block traffic.

If thresholds are too strict

Tune intervals and failure thresholds to reflect real operating conditions.

If app behavior drifted

Update the readiness contract so it still matches how the service actually becomes traffic-safe.


A useful incident question

Ask this:

What exact condition should make this pod safe to receive traffic, and does the current readiness probe actually test that condition accurately?

That question usually makes probe design problems obvious.

Bottom Line

Readiness failures are usually routing problems before they are restart problems. Start by proving whether the probe matches real application behavior, then decide whether to fix endpoint design, probe timing, or a dependency that keeps the app temporarily unready. If you only loosen thresholds without finding the mismatch, the incident usually comes back on the next deploy or traffic spike.


FAQ

Q. Is readiness the same as liveness?

No. Readiness controls whether traffic should reach the pod, while liveness controls whether the pod should be restarted.

Q. What is the fastest first step?

Check the exact path, port, and timing that the readiness probe uses.

Q. If the pod is alive, shouldn’t it be ready?

Not always. Alive only means the process exists. Ready means it is safe to receive traffic.

Q. Should readiness depend on every external dependency?

Usually not unless the service truly cannot serve safely without that dependency.

