Does MySQL Replication hamper the performance of my DB?
There are two major things a Master has to handle because of a Replication Setup
DISK I/O
The Master has to write every completed SQL transaction to the binary logs
NETWORK I/O
When a Slave is connected, the Master must play traffic cop.
- Master writes SQL statement to the most recent Binary Log
- Master polls for DB Connections emanating from Slaves
- For Each Slave DB Connection, the Master does the Following:
- Master receives request for the oldest SQL Statement from the Binary Log since the the Slave's last entry in its Relay Logs
- Master sends the following to the Slave's IO Thread
- Master Log Filename
- Master Log Position
- SQL Statement at the Master's Log Filename and Position
If a busy Master has Slaves with Seconds_Behind_Master : 0
, simply marvelous.
If either of these metrics become noticeable, you may need to do certain things:
- Scale up DB Server
- Place Binary Logs on a Separate Disk
- Faster Disk I/O
- RAID10 (rather than RAID5)
- Is raid 5 suitable for a mysql installation?
- Master - Slave MySQL setup on a VMWare Cloud — Is it needed? (ServerFault)
- More RAM
- Scale out the Topology
- Add More Slaves
- Make Each a Slave Master to Other Slaves
- Create Distribution Masters
Here are my other posts about setting up replication topologies
Aug 20, 2012
: Best solution for cross-datacenter MySQL master-slave replicationAug 20, 2012
: How to use MySQL Server 5.6 for GeoIP replication?Jun 25, 2012
: Mysql database replication on different vlan/subnet/another siteDec 19, 2011
: Best way to setup master to multi master replicationDec 17, 2011
: Mysql Master-Master Replication Topologies on >2 machinesAug 18, 2011
: With MySQL Replication, what level of resilience is possible?Jun 01, 2011
: What can we do in MySQL 5.0 Replication to address bandwidth concerns?Mar 29, 2011
: MySQL high availability, failover and replication with Latency
To give a simple answer to your question, "No, replication does not kill the performance of your master."
The impact on your master from replication is typically very small compared to all of the other things it has to do, because the master only really has to accomplish two significant things in a replication environment:
- formulate and write events to the binlog on the local hard drive, and
- send a copy of every event it writes to the binlog to every connected slave
I do not consider writing the binary log to be a cost of replication, because you should always have binary logging turned on even if you're not replicating. It's an extremely valuable troubleshooting and recovery tool.
The cost of sending the replication events to the slave(s) is also negligible, because the slave is responsible for maintaining a persistent TCP connection to the master, which only has to copy the data onto the socket as the events occur. Beyond that, the master neither knows nor cares whether or when the slave gets around to executing them.
A partial exception to that last statement is semi-synchronous replication, which is not the default. In this mode, the master waits for at least one slave to acknowledge receipt and persistent storage (though not actual execution) of the binary log events from each transaction, before the master returns control to the client on each commit.
But in every case, the master is not responsible for actually executing the updates on the slave -- it just sends one of two things to the slave: either a copy of the actual input query that ran (in statement-based mode) or the data for the rows actually inserted/updated/deleted by each query (in row-based mode). In mixed mode, the query optimizer will decide which format to use on a per-event basis.