SQL_SLAVE_SKIP_COUNTER = 1 fails, setting @@gtid_slave_pos used to skip a given GTID position
I found the following worked for me. This does not restore a slave into state that is an exact replica of master. There will be data differences. I will use pt-table-sync to fix those.
1. Restart Replication without GTID method
2. Stop Parallel slave threads
3. Enable GTID replication
4. Using percona-toolkit pt-slave-restart to skip past all the errors.
1. Restart Replication without GTID method Using master binglog position
CHANGE MASTER TO MASTER_HOST='12.34.56.789',MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS= 107;
This is well documented, Please google and find instructions.
2. Stop Parallel slave threads
This was part of the problem as seen in the original question.
ERROR 1966 (HY000): When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter cannot be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position.
I want to be able to skip events and not worry about trying to figure out or increase the GTID position for everyone.
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.35 sec)
MariaDB [(none)]> set global slave_parallel_threads = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> set global slave_parallel_mode = none;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> Start SLAVE;
Query OK, 0 rows affected (0.00 sec)
Now if I check Parallel slave threads I see
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
..........
Parallel_Mode: none
I can reverse this process to re-enable Parallel slave threads when I am done. And I know that GTID is working.
3. Enable GTID replication
I can now try restarting the slave with GTID enabled.
On the master
MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mariadb-bin.000001
Position: 510
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
SELECT BINLOG_GTID_POS('mariadb-bin.000001', 510);
+--------------------------------------------+
| BINLOG_GTID_POS('mariadb-bin.000001', 510) |
+--------------------------------------------+
| 1-101-1 |
+--------------------------------------------+
1 row in set (0.00 sec)
On the slave
STOP SLAVE;
SET GLOBAL gtid_slave_pos = '1-101-1';
CHANGE MASTER TO master_use_gtid=slave_pos;
START SLAVE;
Now when I check the slave it has some events to skip to get back into the same state as the master.
Last_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Log_File: binary.000599
Read_Master_Log_Pos: 364810491
Relay_Log_File: tmsdb-relay-bin.001240
Relay_Log_Pos: 716
Relay_Master_Log_File: binary.000599
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1950
Last_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
Skip_Counter: 0
Exec_Master_Log_Pos: 286447058
Relay_Log_Space: 78364447
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1950
Last_SQL_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1050
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 1-1050-5005223,2-1051-379101,3-1010-3273
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: none
1 row in set (0.00 sec)
4. Using percona-toolkit pt-slave-restart to skip past all the errors
sudo yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm
sudo yum search percona-toolkit
pt-slave-restart will skip all the events need to get the slave into a working state.
# pt-slave-restart
2017-12-22T13:39:59 tmsdb-relay-bin.001240 716 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 69702 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 97912 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 98144 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 363903 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 364135 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 712776 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 713008 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 759737 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 827932 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 828164 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 934851 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 952088 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 952320 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1084249 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1084481 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1351188 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1351420 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1621561 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1693920 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1711677 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1711909 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1880931 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1881163 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1916544 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 2124672 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2124904 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2125136 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2452030 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2452262 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2819749 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2819981 1950
Now when I check my slave status
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: masterhost
Master_User: maxscale
Master_Port: 3306
Connect_Retry: 5
Master_Log_File: binary.000600
Read_Master_Log_Pos: 37801368
Relay_Log_File: tmsdb-relay-bin.001242
Relay_Log_Pos: 37801653
Relay_Master_Log_File: binary.000600
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 37801368
Relay_Log_Space: 37801991
Until_Condition: None
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Master_Server_Id: 1050
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 1-1050-5014401,2-1051-379101,3-1010-3273
Parallel_Mode: none
1 row in set (0.00 sec)
Lastly I need to restart the server and make sure it is reboot safe, etc.