Example: how to migrate a ZooKeeper cluster
# https://gist.github.com/iameugenejo/d101d01e1002b95076d5b762d4b3405d
# migrating a 3-node ZooKeeper cluster to a new 3-node cluster
# o1, o2, o3 = old zk nodes, assume o1 is the leader
# n1, n2, n3 = new zk nodes
# quorum: minimum number of healthy nodes the cluster needs, i.e. a strict majority (floor(N/2) + 1)
# 6-node cluster's quorum is 4
# 5-node cluster's quorum is 3
# 4-node cluster's quorum is 3
# 3-node cluster's quorum is 2
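The quorum sizes listed above follow from the majority rule. A minimal sketch of that arithmetic (the function name `quorum` is only illustrative and is not part of any ZooKeeper API):

```python
def quorum(ensemble_size: int) -> int:
    """Smallest number of healthy nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one node")
    return ensemble_size // 2 + 1


# Reproduce the table from the notes above.
for size in (6, 5, 4, 3):
    print(f"{size}-node cluster's quorum is {quorum(size)}")
```

Note that a 4-node cluster needs the same quorum as a 5-node one, so an even-sized ensemble tolerates no more failures than the next smaller odd size.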
1. Have the 3 new nodes join the cluster one at a time, and let each one complete replication:
- add n1 with (o1, o2, o3, n1) configured
- add n2 with (o1, o2, o3, n1, n2) configured
- update n1 with (o1, o2, o3, n1, n2) configuration, restart
- add n3 with (o1, o2, o3, n1, n2, n3) configured
- update n1 with (o1, o2, o3, n1, n2, n3) configuration, restart
- update n2 with (o1, o2, o3, n1, n2, n3) configuration, restart
- at this point, from o1, o2, o3's perspective, it's a 3-node cluster with a quorum count of 2 and 3 healthy nodes
- from n1, n2, n3's perspective, it's a 6-node cluster with a quorum count of 4 and 6 healthy nodes
- from the clients' perspective, they are still connected to the old 3-node cluster
2. update the clients that use the zookeeper cluster to connect to (n1, n2, n3):
- from the clients' perspective, now the clients are connected to the 6-node cluster.
3. update o2, o3 with (o1, o2, o3, n1, n2, n3) and restart:
- from o1's perspective it's still a 3-node cluster
- from the perspective of the rest of the nodes, it's a 6-node cluster
4. kill o1 to trigger a leader re-election:
- 3-node cluster is now gone
- 6-node cluster still meets the quorum with 5 healthy nodes (5 >= 4)
- Note: somehow zookeeper gets stuck at this point when one of the new nodes becomes the leader (to check, try: echo "get /zookeeper/config" | zookeeper-shell localhost:2181)
- when it gets stuck, restart all new zookeeper nodes (n1, n2, n3)
5. update every node with (o2, o3, n1, n2, n3) configured, one at a time:
- the intermediate state with the 6-node cluster meets the quorum with 5 healthy nodes (5 >= 4)
- the final state with the 5-node cluster meets the quorum with 5 healthy nodes (5 >= 3)
6. kill o2 and update every node with (o3, n1, n2, n3) configured, one at a time:
- the intermediate state with the 5-node cluster meets the quorum with 4 healthy nodes (4 >= 3)
- the final state with the 4-node cluster meets the quorum with 4 healthy nodes (4 >= 3)
7. kill o3 and update every node with (n1, n2, n3) configured, one at a time:
- the intermediate state with the 4-node cluster meets the quorum with 3 healthy nodes (3 >= 3)
- the final state with the 3-node cluster meets the quorum with 3 healthy nodes (3 >= 2)
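
The quorum checks in steps 4 to 7 can be replayed as a small model. This is only a sketch of the arithmetic, not something that talks to ZooKeeper: `quorum`, `check`, `STATES` and `RESULTS` are hypothetical names, and the model ignores the short dip in healthy nodes while a single node is being restarted.

```python
def quorum(ensemble_size: int) -> int:
    """Strict majority of the configured ensemble."""
    return ensemble_size // 2 + 1


def check(label, configured, healthy):
    """Return (label, ensemble size, healthy count, quorum, quorum met?)."""
    alive = [node for node in configured if node in healthy]
    needed = quorum(len(configured))
    return (label, len(configured), len(alive), needed, len(alive) >= needed)


ALL = ["o1", "o2", "o3", "n1", "n2", "n3"]

# (label, configured ensemble, nodes still running) for steps 4 to 7
STATES = [
    ("step 4: o1 killed", ALL, ALL[1:]),
    ("step 5 intermediate", ALL, ALL[1:]),
    ("step 5 final", ALL[1:], ALL[1:]),
    ("step 6 intermediate", ALL[1:], ALL[2:]),
    ("step 6 final", ALL[2:], ALL[2:]),
    ("step 7 intermediate", ALL[2:], ALL[3:]),
    ("step 7 final", ALL[3:], ALL[3:]),
]

RESULTS = [check(*state) for state in STATES]

for label, size, alive, needed, ok in RESULTS:
    print(f"{label}: {size}-node cluster, {alive} healthy, quorum {needed}, ok={ok}")
```

The tightest point in this model is the intermediate state of step 7, where the 4-node configuration has exactly 3 healthy nodes against a quorum of 3.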