problems with node joining the cluster when using sst:xtrabackup (galera)
The problem was that there was a directory of database backups (dbexport) in MariaDB's data directory (probably /var/lib/mysql/). When doing the SST, the provider scans the data directory to find the files to send. It saw the directory and assumed that it was a database, since that's what the directories in the data directory are for. Removing the backup directory fixed the problem. As a best practice, don't add or change anything under /var/lib/ by hand; programs usually keep their data files in there, and messing with them can cause problems like this.
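If you want to check a data directory for stray directories like this before a node joins, here is a rough Python sketch. It is only a heuristic of mine, not something Galera or xtrabackup provides: it assumes a real database directory contains at least one file with a typical table-file extension (the suffix list is my assumption), and the function name find_suspect_dirs is made up for illustration. It can flag legitimate directories, so treat the result as a list of things to look at, not things to delete.

```python
import os

# Assumption: suffixes commonly seen on files inside a MariaDB database
# directory. This list is illustrative, not exhaustive.
TABLE_FILE_SUFFIXES = (".frm", ".ibd", ".myd", ".myi", ".opt")


def find_suspect_dirs(datadir):
    """Return sorted names of subdirectories holding no table-like files.

    Hypothetical helper: a directory with no file matching the suffixes
    above is reported as a candidate for "not actually a database".
    """
    suspects = []
    for entry in os.scandir(datadir):
        # Only directories are treated as databases; skip plain files
        # such as ibdata1 and skip symlinks.
        if not entry.is_dir(follow_symlinks=False):
            continue
        names = os.listdir(entry.path)
        if not any(name.lower().endswith(TABLE_FILE_SUFFIXES) for name in names):
            suspects.append(entry.name)
    return sorted(suspects)
```

You would call it as find_suspect_dirs("/var/lib/mysql"), adjusting the path to wherever your data directory actually lives.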
After the main problem was resolved, a new message was noticed in the logs:
[Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (359350ee-5c63-11e3-0800-6673d15135cd): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():442. IST will be unavailable.
This message is normal. When a node joins a Galera cluster, it first tries to perform an IST (Incremental State Transfer) instead of a full SST (State Snapshot Transfer). If the node was previously part of the cluster and the difference between the state it had when it left and the current state of the cluster is small enough that a donor still holds the missing transactions in its write-set cache, IST is available, which transfers only the differences between the node's state and the cluster's state. This is much faster than transferring all of the data. If the node was previously part of the cluster but left a long time ago, it will need to do an SST. In this case, the joining node's state UUID was 00000000-0000-0000-0000-000000000000, which basically means it is new to the cluster. I run a MariaDB/Galera cluster and this message annoys me whenever IST is not available. It would be nice if it wasn't a warning and was reworded. I'm not sure why Operation not permitted is in there, but it's nothing to worry about.
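To make the reading of that warning concrete, here is a small Python sketch that pulls the two UUIDs out of a log line shaped like the one above and reports whether the local one is all zeros. The regular expression is written against that one message only, and the function name explain_ist_warning is made up for illustration; other Galera versions may word the message differently.

```python
import re

# The all-zero UUID that the joining node reported in the warning.
NIL_UUID = "00000000-0000-0000-0000-000000000000"

# Textual UUID shape: 8-4-4-4-12 hexadecimal digits.
UUID_RE = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"

# Assumption: the pattern mirrors the wording of the warning quoted above.
PATTERN = re.compile(
    r"Local state UUID \((" + UUID_RE + r")\) does not match "
    r"group state UUID \((" + UUID_RE + r")\)"
)


def explain_ist_warning(line):
    """Return the UUIDs found in the warning, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    local_uuid, group_uuid = match.groups()
    return {
        "local": local_uuid,
        "group": group_uuid,
        # An all-zero local UUID means the node has no cluster state yet.
        "new_node": local_uuid == NIL_UUID,
    }
```

Feeding it the warning from the logs gives new_node set to True, which is the "new to the cluster, so SST is expected" case.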
Additionally, it is recommended that you run an odd number of nodes to prevent split-brain conditions. If possible, you should add another MariaDB server to the cluster, or run garbd if you cannot. garbd acts as a node in the cluster without being a database server. It allows you to have an odd number of nodes without needing another database server.
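The arithmetic behind the odd-number advice can be sketched in a few lines of Python. This assumes every member (including garbd) has one equal vote and that a partition keeps running only if it holds a strict majority of the previous membership; the function names are made up for illustration.

```python
def has_quorum(partition_size, cluster_size):
    """True if the partition holds a strict majority of the members."""
    return 2 * partition_size > cluster_size


def survives_even_split(cluster_size):
    """True if one side keeps quorum when the cluster splits as evenly as possible."""
    # Size of the larger half after the most even split.
    larger_half = (cluster_size + 1) // 2
    return has_quorum(larger_half, cluster_size)
```

With two nodes, survives_even_split(2) is False because a 1/1 split leaves neither side with a majority. With three members (for example two database servers plus garbd), survives_even_split(3) is True because the side with two members still has a majority.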