Why do I sometimes see more than one
Binlog_Dump thread on the master after I have restarted the slave?
Binlog_Dump is a continuous process
that is handled by the server in the following way:
Catch up on the updates.
Once there are no more updates left,
go into pthread_cond_wait(), from which we can be awakened either by an update
or a kill.
On wake up, check the reason. If we
are not supposed to die, continue the Binlog_dump loop.
If there is some fatal error, such as
detecting a dead client, terminate the loop.
So if the slave thread stops on the
slave, the corresponding Binlog_Dump thread on the master will not notice it
until after at least one update to the master (or a kill), which is needed to
wake it up from pthread_cond_wait(). In the meantime, the slave could have
opened another connection, which resulted in another Binlog_Dump thread.
The above problem should not be
present in Version 3.23.26 and later versions. In Version 3.23.26 we added
server-id to each replication server, and now all the old zombie threads are
killed on the master when a new replication thread connects from the same slave
How do I rotate replication logs?
In Version 3.23.28 you should use
PURGE MASTER LOGS TO command after determining which logs can be deleted, and
optionally backing them up first. In earlier versions the process is much more
painful, and cannot be safely done without stopping all the slaves in the case
that you plan to re-use log names. You will need to stop the slave threads,
edit the binary log index file, delete all the old logs, restart the master,
start slave threads,and then remove the old log files.
How do I upgrade on a hot replication
setup?
If you are upgrading pre-3.23.26
versions, you should just lock the master tables, let the slave catch up, then
run FLUSH MASTER on the master, and FLUSH SLAVE on the slave to reset the logs,
then restart new versions of the master and the slave. Note that the slave can
stay down for some time - since the master is logging all the updates, the
slave will be able to catch up once it is up and can connect.
After 3.23.26, we have locked the
replication protocol for modifications, so you can upgrade masters and slave on
the fly to a newer 3.23 version and you can have different versions of MySQL
running on the slave and the master, as long as they are both newer than
3.23.26.
What issues should I be aware of when
setting up two-way replication?
MySQL replication currently does not
support any locking protocol between master and slave to guarantee the
atomicity of a distributed (cross-server) update. In in other words, it is
possible for client A to make an update to co-master 1, and in the meantime,
before it propagates to co-master 2, client B could make an update to co-master
2 that will make the update of client A work differently than it did on
co-master 1. Thus when the update of client A will make it to co-master 2, it
will produce tables that will be different than what you have on co-master 1,
even after all the updates from co-master 2 have also propagated. So you should
not co-chain two servers in a two-way replication relationship, unless you are
sure that you updates can safely happen in any order, or unless you take care
of mis-ordered updates somehow in the client code.
You must also realize that two-way
replication actually does not improve performance very much, if at all, as far
as updates are concerned. Both servers need to do the same amount of updates
each, as you would have one server do. The only difference is that there will
be a little less lock contention, because the updates originating on another
server will be serialized in one slave thread. This benefit, though, might be
offset by network delays.
How can I use replication to improve
performance of my system?
You should set up one server as the
master, and direct all writes to it, and configure as many slaves as you have
the money and rackspace for, distributing the reads among the master and the
slaves. You can also start the slaves with --skip-bdb, --low-priority-updates
and --delay-key-write-for-all-tables to get speed improvements for the slave.
In this case the slave will use non-transactional MyISAM tables instead of BDB
tables to get more speed.
What should I do to prepare my client
code to use performance-enhancing replication?
A: If the part of your code that is
responsible for database access has been properly abstracted/modularized,
converting it to run with the replicated setup should be very smooth and easy -
just change the implementation of your database access to read from some slave
or the master, and to awlays write to the master. If your code does not have
this level of abstraction, setting up a replicated system will give you an
opportunity/motivation to it clean up. You should start by creating a wrapper
library /module with the following functions:
safe_writer_connect()
safe_reader_connect()
safe_reader_query()
safe_writer_query()
safe_ means that the function will
take care of handling all the error conditions.
You should then convert your client
code to use the wrapper library. It may be a painful and scary process at
first, but it will pay off in the long run. All applications that follow the
above pattern will be able to take advantage of one-master/many slaves
solution. The code will be a lot easier to maintain, and adding troubleshooting
options will be trivial. You will just need to modify one or two functions, for
example, to log how long each query took, or which query, among your many
thousands, gave you an error. If you have written a lot of code already, you
may want to automate the conversion task by using Monty's replace utility,
which comes with the standard distribution of MySQL, or just write your own
Perl script. Hopefully, your code follows some recognizable pattern. If not,
then you are probably better off re-writing it anyway, or at least going
through and manually beating it into a pattern.
Note that, of course, you can use
different names for the functions. What is important is having unified
interface for connecting for reads, connecting for writes, doing a read, and
doing a write.
When and how much can MySQL
replication improve the performance of my system?
MySQL replication is most beneficial
for a system with frequent reads and not so frequent writes. In theory, by
using a one master/many slaves setup you can scale by adding more slaves until
you either run out of network bandwidth, or your update load grows to the point
that the master cannot handle it.
In order to determine how many slaves
you can get before the added benefits begin to level out, and how much you can
improve performance of your site, you need to know your query patterns, and
empirically (by benchmarking) determine the relationship between the throughput
on reads (reads per second, or max_reads) and on writes max_writes) on a
typical master and a typical slave. The example below will show you a rather
simplified calculation of what you can get with replication for our imagined
system.
Let's say our system load consists of
10% writes and 90% reads, and we have determined that max_reads = 1200 - 2 *
max_writes, or in other words, our system can do 1200 reads per second with no
writes, our average write is twice as slow as average read, and the
relationship is linear. Let us suppose that our master and slave are of the
same capacity, and we have N slaves and 1 master. Then we have for each server
(master or slave):
reads = 1200 - 2 * writes (from
bencmarks)
reads = 9* writes / (N + 1) (reads
split, but writes go to all servers)
9*writes/(N+1) + 2 * writes = 1200
writes = 1200/(2 + 9/(N+1)
So if N = 0, which means we have no
replication, our system can handle 1200/11, about 109 writes per second (which
means we will have 9 times as many reads due to the nature of our application).
If N = 1, we can get up to 184 writes
per second.
If N = 8, we get up to 400.
If N = 17, 480 writes.
Eventually as N approaches infinity
(and our budget negative infinity), we can get very close to 600 writes per
second, increasing system throughput about 5.5 times. However, with only 8
servers, we increased it almost 4 times already.
Note that our computations assumed
infinite network bandwidth, and neglected several other factors that could turn
out to be signficant on your system. In many cases, you may not be able to make
a computation similar to the one above that will accurately predict what will
happen on your system if you add N replication slaves. However, answering the
following questions should help you decided whether and how much, if at all,
the replication will improve the performance of your system:
What is the read/write ratio on your
system?
How much more write load can one
server handle if you reduce the reads?
How many slaves do you have bandwidth
for on your network?
How can I use replication to provide
redundancy/high availability?
With the currently available features,
you would have to set up a master and a slave (or several slaves), and write a
script that will monitor the master to see if it is up, and instruct your
applications and the slaves of the master change in case of failure. Some
suggestions:
To tell a slave to change the master
use the CHANGE MASTER TO command.
A good way to keep your applications
informed where the master is by having a dynamic DNS entry for the master. With
bind you can use nsupdate to dynamically update your DNS.
You should run your slaves with the
log-bin option and without log-slave-updates. This way the slave will be ready
to become a master as soon as you issue STOP SLAVE; RESET MASTER, and CHANGE
MASTER TO on the other slaves. It will also help you catch spurious updates
that may happen because of misconfiguration of the slave (ideally, you want to
configure access rights so that no client can update the slave, except for the
slave thread) combined with the bugs in your client programs (they should never
update the slave directly).
We are currently working on
intergrating an automatic master election system into MySQL, but until it is
ready, you will have to create your own monitoring tools.
No comments:
Post a Comment