How to recover MariaDB Galera Cluster after full crash?
Pre-Recovery Settings:
- Ensure that the MYSQL_HOME path is being exported in the .profile. If the MySQL install is in a different location, then make that change to the MYSQL_HOME.(Example: MYSQL_HOME=/path/to/mysql)
Crash Recovery Steps:
Find a valid seqno. Look at the grastate.dat file on each server to see which machine has the most current data. The node with the biggest seqno is the node with the current data.
Next, look at three grastate.dat files.
a) Node0: This grastate.dat shows a graceful shutdown. Note the seqno. We are looking for the node with the largest seqno.
/var/lib/mysql/grastate.dat
version: 2.1
uuid: cbd332a9-f617-11e2-b77d-3ee9fa637069
seqno: 43760
b) Node1: This grastate.dat file shows -1 in the seqno. This node crashed during transaction processing. Start this node using the wsrep-recover option. MySQL will store the last committed GTID in the InnoDB data header.
/var/lib/mysql/grastate.dat
version: 2.1
uuid: cbd332a9-f617-11e2-b77d-3ee9fa637069
seqno: -1
c) Node2: This grastate.dat file has no seqno or group ID. This node crashed during DDL.
/var/lib/mysql/grastate.dat
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
- Next, recover the node with the uuid, but no seqno. To get the seqno, use the --wsrep-recover option. To recover the seqno:
/path/to/mysql/bin/mysqld --wsrep-recover. Mysqld will read the InnoDB header files and shutdown immediately. The last wsrep position is printed in mysqld.log file.
Example: 140716 12:55:45 [Note] WSREP: Found saved state: cbd332a9- f617-11e2-b77d-3ee9fa637069:36742
Look at the seqno from Node0 (seqno: 43760) and Node1 (seqno: -1). Node0 has the current snapshot of data and should be started first.
On Node0, issue this command to start the node:
a) nohup /path/to/mysql/bin/mysqld_safe -- wsrep_cluster_address=gcomm:// &; wait for this node to come online.
b) Then start Node1, and Node2. These two nodes should be started one at a time, and can be started as you normally would.
c) Once all three nodes are up and in a primary state, restart Node0 in the normal fashion (so it comes up as part of the entire cluster, not just a bootstrap).
- If Node1 or Node2 had the highest seqno, then that Node would be booted as the bootstrap, and you would allow the remaining nodes to start up one at a time (connecting to the Node with the highest seqno).