If you run MySQL with read replicas, you will eventually run into replication lag. A write succeeds on the primary, but the replica has not caught up yet, so users may not see the data they just saved.
In this post, we will cover:
- what replication lag is
- why it happens
- how it affects user experience
- what to inspect first
The core idea is that replication lag is not just an internal database delay. It directly affects read consistency and product behavior.
What is replication lag?
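Replication lag is the gap between when a change is committed on the primary and when that change becomes visible on a replica. In MySQL's standard asynchronous replication, the replica first copies the primary's binary log events into its relay log and then applies them, so it falls behind whenever either step cannot keep pace.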
In simple terms:
- the write already succeeded
- but the replica is still behind
Any read routed to that replica during the gap returns the old value, and users see that directly.
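To make the definition concrete, here is a minimal sketch of heartbeat-style lag measurement, the approach tools such as pt-heartbeat take. A job on the primary writes the current time to a row on a fixed interval, and lag is roughly the current time minus the newest heartbeat the replica has applied. The function name and the heartbeat setup are assumptions for illustration, and the sketch assumes primary and replica clocks are synchronized.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_lag(replica_last_heartbeat: datetime, now: datetime) -> timedelta:
    """Estimate replication lag from a heartbeat row (hypothetical setup).

    Assumption: a job on the primary updates a heartbeat row with the current
    UTC time on a fixed interval. The replica's copy of that row holds the
    newest heartbeat it has applied, so lag ~= now - that timestamp.
    Assumes primary and replica clocks are synchronized (e.g. via NTP).
    """
    lag = now - replica_last_heartbeat
    # Small clock skew can make the difference negative; clamp to zero.
    return max(lag, timedelta(0))


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)
    applied = datetime(2024, 1, 1, 12, 0, 7, tzinfo=timezone.utc)
    print(heartbeat_lag(applied, now).total_seconds())  # 3.0
```

Because heartbeats are written on an interval, a fully caught-up replica can still show up to one interval of apparent lag, so readings below that interval are best treated as noise.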
Why does it matter?
When lag is noticeable, users can experience:
- recently updated data not appearing
- list and detail views disagreeing
- inconsistent reads within one workflow
So this is not only about “some delay.” It can become a trust and correctness issue.
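The read-after-write problem is easy to reproduce in a toy model. The sketch below is a hypothetical in-memory primary and replica, not MySQL code: the primary appends each change to an ordered log, and the replica applies that log only when apply() is called, standing in for an applier that is running behind.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


@dataclass
class Replica:
    primary: Primary
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many log entries have been applied so far

    def apply(self, max_events):
        """Apply up to max_events pending changes, like a slow applier."""
        pending = self.primary.log[self.applied:self.applied + max_events]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def read(self, key):
        return self.data.get(key)


def demo():
    primary = Primary()
    replica = Replica(primary)
    primary.write("display_name", "Alice")
    replica.apply(max_events=10)                 # replica is caught up
    primary.write("display_name", "Alice Kim")   # user saves a change
    stale = replica.read("display_name")         # applier has not run yet
    replica.apply(max_events=10)                 # replica catches up
    fresh = replica.read("display_name")
    return primary.data["display_name"], stale, fresh


if __name__ == "__main__":
    print(demo())  # ('Alice Kim', 'Alice', 'Alice Kim')
```

The write succeeded, yet the read in between returns the old name. Nothing is lost; the replica is simply behind, which is exactly what users experience as "my change disappeared."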
Why does replication lag happen?
Common causes include:
- too much change volume for replicas to apply
- long transactions or large batch jobs
- slow queries consuming replica resources
- I/O or network bottlenecks
- underpowered replica instances
So lag is often connected to broader workload and resource problems, not only to the replication mechanism itself.
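Most of these causes reduce to the same arithmetic: when the primary produces changes faster than a replica can apply them, the backlog grows by the difference every second, and it only shrinks once write volume drops below the apply rate. The sketch below assumes a constant apply capacity in events per second, which real replicas do not have; the function names and numbers are illustrative, not measurements.

```python
from typing import List


def simulate_backlog(write_rates: List[int], apply_capacity: int) -> List[int]:
    """Return the replica's pending-event backlog after each second.

    write_rates: events/second written on the primary, one entry per second.
    apply_capacity: events/second the replica can apply (assumed constant).
    """
    backlog = 0
    history = []
    for rate in write_rates:
        # Backlog grows by the surplus and never drops below zero.
        backlog = max(0, backlog + rate - apply_capacity)
        history.append(backlog)
    return history


def estimated_lag_seconds(backlog: int, apply_capacity: int) -> float:
    """Rough time to drain the current backlog if writes stopped."""
    if apply_capacity <= 0:
        raise ValueError("apply_capacity must be positive")
    return backlog / apply_capacity


if __name__ == "__main__":
    # Three seconds of batch writes, then normal traffic.
    print(simulate_backlog([1500, 1500, 1500, 500, 500], 1000))
    # [500, 1000, 1500, 1000, 500]
    print(estimated_lag_seconds(1500, 1000))  # 1.5
```

A three-second batch at 1,500 events/s against a replica that applies 1,000 events/s leaves 1,500 events queued, about 1.5 seconds of work, which then drains only as fast as the spare apply capacity allows. Every replica receives the same write stream, so adding replicas leaves each one's backlog unchanged.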
What should you inspect first?
A practical sequence is:
- identify when lag grows
- check write load and batch jobs at that time
- inspect slow queries on replicas
- review CPU, disk, and network bottlenecks
- review read-routing strategy in the app
The key question is not only “is the database slow?” but “why can the replica not keep up?”
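For the first step, identifying when lag grows, it helps to turn periodic lag samples into time windows you can line up against batch schedules, slow query logs, and resource graphs. The sketch below assumes you already poll a lag value such as Seconds_Behind_Source from SHOW REPLICA STATUS (Seconds_Behind_Master from SHOW SLAVE STATUS before MySQL 8.0.22); the polling code, the sample format, and the threshold are assumptions. That column can be NULL, for example when the replication SQL thread is not running, so None samples are treated as breaches rather than ignored.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, Optional[float]]          # (timestamp, lag in seconds)
Window = Tuple[str, str, Optional[float]]     # (start, end, peak lag)


def lag_windows(samples: List[Sample], threshold_s: float) -> List[Window]:
    """Group consecutive samples whose lag exceeds threshold_s into windows.

    samples must be in time order. A lag of None (unknown) counts as a breach.
    peak is the largest numeric lag in the window, or None if all were unknown.
    """
    windows: List[Window] = []
    current = None  # [start, end, peak] of the window being built
    for ts, lag in samples:
        breached = lag is None or lag > threshold_s
        if breached:
            if current is None:
                current = [ts, ts, lag]
            else:
                current[1] = ts
                if lag is not None and (current[2] is None or lag > current[2]):
                    current[2] = lag
        elif current is not None:
            windows.append(tuple(current))
            current = None
    if current is not None:
        windows.append(tuple(current))
    return windows


if __name__ == "__main__":
    samples = [("02:00", 0), ("02:01", 12), ("02:02", 45),
               ("02:03", 3), ("02:10", None), ("02:11", 0)]
    print(lag_windows(samples, threshold_s=10))
    # [('02:01', '02:02', 45), ('02:10', '02:10', None)]
```

The first window points at something to correlate (a nightly job, a slow replica query, a disk spike); the second points at a replication thread that may have stopped, which is a different problem from slow apply.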
How should the application respond?
If every read is blindly routed to replicas, consistency issues become more visible. That is why some systems use patterns like:
- read from the primary right after a write (read-your-writes)
- keep strongly consistent reads on the primary for critical flows
- send traffic that tolerates slightly stale data to replicas
So replication lag is both a database operations issue and an application routing design issue.
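The first pattern, reading from the primary right after a write, can be sketched as a small router that remembers each session's last write and pins that session's reads to the primary for a short window. ReadRouter, pin_seconds, and the session-keyed dictionary are hypothetical; pin_seconds should comfortably exceed the lag you normally observe, otherwise the window can expire before the replica catches up.

```python
import time
from typing import Callable, Dict


class ReadRouter:
    """Hypothetical router: primary after a write, replicas otherwise."""

    def __init__(self, pin_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self.pin_seconds = pin_seconds  # assumed tuning knob
        self.clock = clock              # injectable for testing
        self._last_write: Dict[str, float] = {}

    def record_write(self, session_id: str) -> None:
        self._last_write[session_id] = self.clock()

    def target_for_read(self, session_id: str, critical: bool = False) -> str:
        if critical:
            return "primary"  # strong reads for critical flows
        last = self._last_write.get(session_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"  # this session wrote recently: avoid stale reads
        return "replica"


if __name__ == "__main__":
    now = [100.0]
    router = ReadRouter(pin_seconds=2.0, clock=lambda: now[0])
    router.record_write("session-1")
    print(router.target_for_read("session-1"))  # primary (just wrote)
    print(router.target_for_read("session-2"))  # replica (no recent write)
    now[0] = 103.0
    print(router.target_for_read("session-1"))  # replica (pin expired)
```

In a multi-instance application the last-write timestamp has to live somewhere shared, such as the user's session or a cookie, rather than in process memory. Where GTIDs are enabled, MySQL also offers a more precise option: the application can record the GTID of its write and have the replica wait for it with WAIT_FOR_EXECUTED_GTID_SET before reading.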
Common misunderstandings
1. More replicas automatically fix lag
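Not necessarily. Every replica has to apply the same stream of changes from the primary, so adding replicas does not make any single replica apply faster. Extra replicas can spread read load, which helps when heavy read queries are competing with the applier for resources, but if the apply rate itself is the bottleneck, it remains.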
2. Replication lag is only a DB team issue
Read routing and consistency expectations are tightly connected to application design.
3. Small lag is always harmless
Depending on the product, even small inconsistency windows can be very visible to users.
FAQ
Q. Which products are most sensitive to replication lag?
Systems where users expect immediate read-after-write consistency.
Q. Should I inspect the DB or the app first?
Both. The cause often lives in the DB workload, while the visible impact often depends on app routing.
Q. Can lag be eliminated completely?
It is difficult to guarantee zero lag in all cases, so teams often reduce impact through routing and consistency strategy.
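Since lag cannot be ruled out, a common mitigation is lag-aware routing: skip replicas whose recently observed lag exceeds what a given read can tolerate, and fall back to the primary when none qualify. This is a minimal sketch; choose_read_target and the replica names are hypothetical, and the lag values are assumed to come from monitoring such as Seconds_Behind_Source polling.

```python
from typing import Dict, Optional


def choose_read_target(replica_lag: Dict[str, Optional[float]],
                       max_lag_s: float) -> str:
    """Pick the least-lagged acceptable replica, or fall back to the primary.

    replica_lag maps a replica name to its last observed lag in seconds;
    None means unknown and is treated as unhealthy.
    """
    healthy = {name: lag for name, lag in replica_lag.items()
               if lag is not None and lag <= max_lag_s}
    if not healthy:
        return "primary"
    return min(healthy, key=healthy.get)


if __name__ == "__main__":
    lags = {"replica-a": 0.4, "replica-b": 7.0, "replica-c": None}
    print(choose_read_target(lags, max_lag_s=1.0))  # replica-a
    print(choose_read_target({"replica-a": 5.0, "replica-b": None},
                             max_lag_s=1.0))        # primary
```

The lag values are themselves a little old by the time a read uses them, so the threshold should leave some margin, and a burst of fallbacks can shift significant load onto the primary.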
Read Next
- For primary-side pressure and query inefficiency, continue with the MySQL Slow Query Guide.
- For workload pressure and saturation patterns, read the MySQL Too Many Connections Guide.