Redis latency spikes become hard to debug when teams treat every slowdown as the same class of incident. In one case the server is busy running expensive commands. In another, Redis is fast but the host, hypervisor, or network path is already noisy. In another, persistence or memory pressure adds pauses that only show up during certain windows.
The short version: split the incident into command cost, environment baseline, network path, and persistence or memory side effects before changing Redis settings. That branching step is what keeps a latency investigation from turning into random tuning.
Start by separating four different latency buckets
Redis latency usually comes from one or more of these buckets:
- expensive Redis commands
- intrinsic environment or host latency
- network and client round-trip delay
- persistence, memory pressure, or swapping side effects
If you do not separate those buckets early, almost every attempted fix becomes guesswork. A command tuning change will not solve noisy virtualization, and a networking change will not fix a blocking Lua script or a big-key delete.
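The branching step above can be sketched as a tiny triage helper. This is a hypothetical illustration, not a Redis API: the signal names and bucket mapping are assumptions you would replace with whatever your monitoring actually emits.

```python
# Hypothetical triage helper: map observed signal names to the four
# latency buckets above so each bucket gets its own investigation path.
def classify_latency_signals(signals):
    """Return the buckets suggested by a set of observed signal names."""
    buckets = {
        "command_cost": {"slowlog_entries", "big_key_access", "lua_script"},
        "environment_baseline": {"host_jitter", "noisy_neighbor"},
        "network_path": {"high_rtt", "many_round_trips"},
        "persistence_or_memory": {"fork_spike", "aof_rewrite", "swapping"},
    }
    hits = {name for name, markers in buckets.items() if markers & set(signals)}
    return sorted(hits) or ["unclassified"]

# A mixed incident maps to more than one bucket:
print(classify_latency_signals({"slowlog_entries", "fork_spike"}))
```

The point is not the code itself but the shape of the decision: one incident can legitimately land in two buckets at once, and each bucket gets a different fix.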
Slow commands are only one part of the story
Slow commands are a common cause, so SLOWLOG is still one of the first tools to inspect.
But a clean slowlog does not mean Redis is healthy. Latency spikes can still come from:
- fork or rewrite activity during persistence
- host-level jitter or virtualization overhead
- memory pressure and swapping
- too many client round trips
That is why a good incident review has to branch instead of assuming every spike is a command problem.
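One way to make that branching concrete is to check whether the incident window actually contains slowlog hits. SLOWLOG GET entries start with (id, unix timestamp, duration in microseconds, command args); the sketch below filters a sample reply by window, with invented timestamps for illustration.

```python
# SLOWLOG GET entries carry (id, unix_ts, duration_us, args, ...).
# Minimal sketch: does the incident window contain any slowlog hits?
# An empty result is the cue to branch to the other causes listed above.
def slowlog_hits_in_window(entries, start_ts, end_ts):
    """Return slowlog entries whose timestamp falls inside the window."""
    return [e for e in entries if start_ts <= e[1] <= end_ts]

entries = [
    (12, 1719830100, 85000, ["HGETALL", "user:hot"]),       # 85 ms command
    (11, 1719820000, 42000, ["LRANGE", "feed:1", "0", "-1"]),
]
print(slowlog_hits_in_window(entries, 1719830000, 1719830200))
```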
Use latency monitoring when you need event-level clues
Redis exposes latency monitoring for cases where the system feels slow but the application logs do not clearly tell you why.
A practical start is:
redis-cli CONFIG SET latency-monitor-threshold 100
redis-cli LATENCY LATEST
redis-cli LATENCY HISTORY command
redis-cli LATENCY DOCTOR
Note that "command" in LATENCY HISTORY is a real event name reported by LATENCY LATEST, not a placeholder for your own command.
This helps when spikes are real but not obviously tied to one query path. If a team says “the cache feels random” or “p95 jumps for a few minutes,” latency monitor often gives you the first useful time-aligned clue.
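To get that time alignment, it helps to turn the raw LATENCY LATEST reply into timestamps you can line up against application logs. Each row of the reply is (event name, unix timestamp of the latest event, latest latency in ms, all-time max in ms); the sample values below are invented.

```python
# LATENCY LATEST rows are (event, last_unix_ts, last_ms, max_ms).
# Sketch: turn a reply (hard-coded sample here) into time-aligned clues.
from datetime import datetime, timezone

sample_reply = [
    ("command", 1719830400, 210, 540),
    ("fork", 1719830350, 130, 130),
]

def summarize_latency_latest(rows):
    out = []
    for event, ts, last_ms, max_ms in rows:
        when = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
        out.append(f"{event}: last {last_ms} ms (max {max_ms} ms) at {when}")
    return out

for line in summarize_latency_latest(sample_reply):
    print(line)
```

Lining those ISO timestamps up against a p95 graph is usually the fastest way to see whether the spike window matches a fork, a rewrite, or ordinary command execution.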
Do not ignore intrinsic latency and networking
Redis documentation emphasizes that your operating system, hypervisor, and network create a baseline you cannot beat.
Ask:
- is Redis running in a noisy virtualized environment?
- is the client far from the Redis node?
- are too many sequential round trips happening?
- would pipelining reduce visible delay?
Sometimes Redis is not the bottleneck. The surrounding path is. That distinction matters because teams often increase Redis resources when the real problem is chatty client behavior or a poor runtime environment.
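The pipelining question can be answered with back-of-envelope arithmetic before touching any code: N sequential commands pay the round trip N times, while a pipeline pays it roughly once. The RTT and per-command costs below are assumptions for illustration.

```python
# Back-of-envelope model of client-visible latency.
# Sequential: every command pays the full round trip.
# Pipelined: one round trip plus the server-side cost of each command.
def client_visible_ms(n_commands, rtt_ms, per_command_ms, pipelined):
    if pipelined:
        return rtt_ms + n_commands * per_command_ms
    return n_commands * (rtt_ms + per_command_ms)

# 200 GETs at 1 ms RTT and 0.05 ms server-side cost each:
print(client_visible_ms(200, 1.0, 0.05, pipelined=False))  # ~210 ms
print(client_visible_ms(200, 1.0, 0.05, pipelined=True))   # ~11 ms
```

When the sequential number dwarfs the pipelined one, the "Redis is slow" report is really a chatty-client report, and no amount of server tuning will close that gap.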
Persistence, swapping, and memory pressure can create ugly spikes
Redis latency often gets worse when:
- memory is tight
- the kernel swaps
- persistence fork and rewrite work overlap with traffic
- disk behavior becomes slow or unstable
If spikes line up with save or rewrite windows, compare the incident with Redis Persistence Latency. If memory is also rising, compare it with Redis Memory Usage High.
These are classic cases where the visible symptom is “Redis latency,” but the operational cause is broader system work happening around Redis.
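Checking whether spikes line up with save or rewrite windows is a simple interval-overlap question. A minimal sketch, assuming you can pull spike timestamps from monitoring and persistence windows from your own records (the numbers here are invented):

```python
# Sketch: which latency spikes fall inside known persistence windows
# (e.g. BGSAVE or AOF rewrite intervals pulled from your monitoring)?
def spikes_in_windows(spike_ts, windows):
    """Return the spike timestamps that fall inside any (start, end) window."""
    return [t for t in spike_ts if any(start <= t <= end for start, end in windows)]

spikes = [100, 260, 400]        # assumed spike timestamps
save_windows = [(250, 300)]     # assumed BGSAVE window
print(spikes_in_windows(spikes, save_windows))  # only the 260 spike overlaps
```

If most spikes land inside the windows, the investigation belongs in the persistence bucket, not the command bucket.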
Big keys often turn normal commands into spike generators
One oversized key can make ordinary reads, writes, deletes, expirations, and rewrites far more expensive than expected.
That is why a team may think “Redis is randomly spiking” when the real story is:
- one key family became too large
- one feature touched that family in a burst
- normal commands suddenly became expensive
If a spike clusters around one feature or one key family, Redis Big Keys is often the next best guide.
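A quick way to test the big-key hypothesis is to compare key sizes within one family. The sketch below assumes you already sampled per-key sizes (for example with MEMORY USAGE during a SCAN) and flags outliers against the family median; the sizes are invented.

```python
# Sketch, assuming per-key sizes sampled elsewhere (e.g. MEMORY USAGE
# during a SCAN). Flag keys whose size dwarfs the median of the family.
from statistics import median

def flag_big_keys(key_sizes, ratio=10):
    """key_sizes: {key: approx_bytes}. Flag keys > ratio x the median size."""
    med = median(key_sizes.values())
    return sorted(k for k, size in key_sizes.items() if size > ratio * med)

sizes = {"user:1": 900, "user:2": 1_100, "user:hot": 250_000}
print(flag_big_keys(sizes))  # the outlier key stands out immediately
```

A median-relative threshold works better here than an absolute one, because "too big" depends on what the rest of the family looks like.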
A practical debugging order
Use this order during an incident:
- inspect SLOWLOG
- inspect latency monitor events
- compare spikes against persistence and save windows
- check whether memory pressure or swapping is involved
- test whether host or network baseline is already too high
This sequence usually gets you closer to the real cause than changing timeouts or Redis config blindly.
A quick command set for the first 10 minutes
redis-cli SLOWLOG GET 10
redis-cli LATENCY LATEST
redis-cli INFO memory
redis-cli INFO persistence
redis-cli INFO stats
Read those outputs together instead of one by one. A slowlog-heavy incident points toward command cost. Clean slowlog plus persistence activity points somewhere else. Rising memory pressure plus latency events often means the spike is part of a broader resource issue.
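That cross-reading can be partly automated. INFO replies are field:value lines; the sketch below parses them and applies the two heuristics just described. The sample values are invented, and the 90% threshold is an assumption, not a Redis default.

```python
# Sketch: parse the field:value lines of an INFO reply and apply the
# cross-reading heuristics above. Sample values are invented.
def parse_info(text):
    """Parse INFO output into a flat dict, skipping section headers."""
    info = {}
    for line in text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            info[key] = value.strip()
    return info

def info_clues(info):
    """Return human-readable clues suggested by a parsed INFO dict."""
    clues = []
    if info.get("rdb_bgsave_in_progress") == "1" or \
       info.get("aof_rewrite_in_progress") == "1":
        clues.append("persistence work overlapping traffic")
    used = int(info.get("used_memory", 0))
    cap = int(info.get("maxmemory", 0) or 0)
    if cap and used > 0.9 * cap:   # assumed 90% pressure threshold
        clues.append("memory pressure")
    return clues

sample = """# Persistence
rdb_bgsave_in_progress:1
aof_rewrite_in_progress:0
# Memory
used_memory:950000000
maxmemory:1000000000"""

print(info_clues(parse_info(sample)))
```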
What teams often miss
Latency spikes are often mixed incidents.
For example:
- one command family becomes slower
- the host baseline is already noisy
- persistence windows make the peak worse
In those cases, looking for only one root cause makes the incident feel more mysterious than it really is. Redis can be both a victim and a contributor in the same outage window.
A practical question to keep asking
During a spike, do not ask only “what command was slow?” Ask “which layer became slow first?”
That framing helps separate:
- a Redis execution problem
- an environment baseline problem
- a persistence or memory-side effect problem
That is often the difference between fixing the actual bottleneck and only tuning the most visible symptom.
FAQ
Q. Are Redis latency spikes always caused by Redis commands?
No. They can also come from networking, the operating system, swapping, or persistence side effects.
Q. What is the fastest first step?
Inspect SLOWLOG, then compare it with latency monitor events from the same incident window.
Q. When should I suspect big keys?
When spikes cluster around one feature, one key family, or one command path touching unusually large data.
Q. If slowlog is clean, can Redis still feel slow?
Yes. Persistence, host latency, and memory pressure can all create visible spikes.
Read Next
- If you want the command-level path, continue with Redis Slowlog Guide.
- If you suspect data shape is the real cause, continue with Redis Big Keys.
- If the timing matches save or rewrite activity, continue with Redis Persistence Latency.
Sources:
- https://redis.io/docs/latest/operate/oss_and_stack/management/optimization/latency/
- https://redis.io/docs/latest/operate/oss_and_stack/management/optimization/latency-monitor/