KVM+DRBD replicated between two active-passive servers with manual switching
Why not use something that has been tested by thousands of users and has proven its reliability? You can simply deploy the free Hyper-V Server with, for example, StarWind VSAN Free and get true HA without any issues. Check out this manual: https://www.starwindsoftware.com/resource-library/starwind-virtual-san-hyperconverged-2-node-scenario-with-hyper-v-server-2016
I have an installation very similar to the one you described: a KVM server with a standby replica kept in sync by DRBD in active/passive mode. To keep the system as simple as possible (and to avoid automatically triggered split-brain situations, e.g. caused by my customer messing with the cluster network), I also dropped automatic cluster failover.
The system is 5+ years old and has never given me any problems. My volume setup is the following:
- a dedicated RAID volume for VM storage;
- a small overlay volume containing QEMU/KVM config files;
- bigger volumes for virtual disks;
- a DRBD resource managing the entire dedicated array block device.
I wrote some shell scripts to help me in case of failover. You can find them here
Please note that the system was architected for maximum performance, even at the expense of features such as fast snapshots and file-based (rather than volume-based) virtual disks.
If I were rebuilding a similar active/passive setup now, I would lean heavily toward ZFS with continuous asynchronous replication via send/recv. It is not real-time, block-based replication, but it is more than sufficient for 90%+ of cases.
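One replication cycle with send/recv amounts to "take a new snapshot, then stream the difference from the previous snapshot to the standby". The sketch below only builds those shell commands so they can be inspected or fed to a cron job or timer. It assumes hypothetical names (dataset `tank/vms`, standby host `standby`, a `repl-<UTC timestamp>` snapshot naming scheme), that the previous snapshot already exists on both sides, and it leaves out pruning of old snapshots.

```python
import shlex
from datetime import datetime, timezone
from typing import List, Optional


def snapshot_name(dataset: str, when: datetime) -> str:
    """Hypothetical naming scheme: <dataset>@repl-<UTC timestamp>."""
    # `when` is expected to be in UTC, since the name ends in a literal "Z".
    return f"{dataset}@repl-{when.strftime('%Y%m%dT%H%M%SZ')}"


def replication_commands(dataset: str, prev_snap: Optional[str], when: datetime,
                         remote_host: str, remote_dataset: str) -> List[str]:
    """Shell commands for one cycle: snapshot, then `zfs send | ssh zfs recv`.

    With prev_snap=None a full stream is sent (initial seeding); otherwise an
    incremental stream from prev_snap to the new snapshot is sent.
    """
    new_snap = snapshot_name(dataset, when)
    take = f"zfs snapshot {shlex.quote(new_snap)}"
    if prev_snap is None:
        send = f"zfs send {shlex.quote(new_snap)}"
        remote_cmd = f"zfs recv {shlex.quote(remote_dataset)}"
    else:
        send = f"zfs send -i {shlex.quote(prev_snap)} {shlex.quote(new_snap)}"
        # -F rolls the target back to its latest snapshot before receiving,
        # discarding any stray changes made on the standby copy.
        remote_cmd = f"zfs recv -F {shlex.quote(remote_dataset)}"
    # The remote command is quoted once more because ssh passes it to a shell.
    pipe = f"{send} | ssh {shlex.quote(remote_host)} {shlex.quote(remote_cmd)}"
    return [take, pipe]


if __name__ == "__main__":
    t = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for cmd in replication_commands("tank/vms", "tank/vms@repl-20231231T230000Z",
                                    t, "standby", "tank/vms"):
        print(cmd)
```

Running it repeatedly, each time passing the snapshot created by the previous run as `prev_snap`, gives the "continuous" asynchronous replication; the replication lag is roughly the interval between runs.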
If real-time replication is really needed, I would use DRBD on top of a ZVOL, with XFS on the DRBD device; in fact, I tested such a setup, plus automatic Pacemaker switchover, in my lab with great results. If using third-party kernel modules (as ZoL is) is not possible, I would use a DRBD resource on top of an lvmthin volume, again with XFS.
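To make the layering concrete (backing volume, then DRBD, then XFS), here is a small sketch that renders a two-node DRBD resource stanza whose backing disk is either a ZVOL or an lvmthin volume. The resource name, host names, addresses and device paths are hypothetical; the syntax is the classic DRBD 8.4 style, and DRBD 9 configurations may need extra options (such as `node-id`), so check `drbd.conf(5)` for your version.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str     # must match `uname -n` on that host
    address: str  # replication link "IP:port"


def drbd_resource(name: str, minor: int, backing_disk: str,
                  node_a: Node, node_b: Node) -> str:
    """Render a two-node DRBD resource stanza (DRBD 8.4-style syntax).

    Both nodes are assumed to use the same backing-device path.
    """
    lines = [
        f"resource {name} {{",
        f"  device /dev/drbd{minor};",
        f"  disk {backing_disk};",
        "  meta-disk internal;",
    ]
    for node in (node_a, node_b):
        lines += [f"  on {node.name} {{", f"    address {node.address};", "  }"]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    a = Node("node-a", "10.0.0.1:7789")
    b = Node("node-b", "10.0.0.2:7789")
    # ZVOL variant, e.g. created with: zfs create -V 500G tank/vmstore
    print(drbd_resource("vmstore", 0, "/dev/zvol/tank/vmstore", a, b))
    # lvmthin variant, e.g. created with: lvcreate -V 500G -T vg0/thinpool -n vmstore
    print(drbd_resource("vmstore", 0, "/dev/vg0/vmstore", a, b))
```

The only difference between the two variants is the `disk` line; in both cases XFS is created on `/dev/drbd0` (on the primary only), never on the ZVOL or thin volume underneath.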
You can absolutely set up DRBD and use it in a purely manual fashion. The process should not be complex at all: you simply do by hand what a Pacemaker or rgmanager cluster would do. Essentially:
- Stop the VM on the active node
- Demote DRBD on the active node
- Promote DRBD on the peer node
- Start the VM on the peer node
Naturally, this requires that both nodes have the proper packages installed and that the VM's configuration and definition exist on both nodes.
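The four manual steps can be scripted roughly as follows. This is a minimal sketch, not the scripts mentioned earlier: it assumes passwordless ssh from an admin box to both nodes, a hypothetical VM `vm1` and DRBD resource `r0`, and that nothing else holds the DRBD device open once the VM is down (otherwise `drbdadm secondary` will refuse to demote). It defaults to a dry run that only prints the commands.

```python
import subprocess
import time
from typing import List

Cmd = List[str]


def on_host(host: str, *args: str) -> Cmd:
    """Prefix a command with ssh so it runs on `host` (simple arguments only)."""
    return ["ssh", host, *args]


def failover_steps(domain: str, resource: str, active: str, peer: str) -> List[Cmd]:
    """The four manual steps, in order: stop VM, demote, promote, start VM."""
    return [
        on_host(active, "virsh", "shutdown", domain),
        on_host(active, "drbdadm", "secondary", resource),
        on_host(peer, "drbdadm", "primary", resource),
        on_host(peer, "virsh", "start", domain),
    ]


def wait_until_shut_off(host: str, domain: str, timeout: float = 300.0) -> None:
    """`virsh shutdown` only asks the guest to stop, so poll until it has."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = subprocess.run(on_host(host, "virsh", "domstate", domain),
                               capture_output=True, text=True, check=True).stdout
        if state.strip() == "shut off":
            return
        time.sleep(5)
    raise TimeoutError(f"{domain} on {host} did not shut off within {timeout}s")


def run_failover(domain: str, resource: str, active: str, peer: str,
                 dry_run: bool = True) -> List[Cmd]:
    """Print (and, unless dry_run, execute) the switch-over; stop on first error."""
    steps = failover_steps(domain, resource, active, peer)
    for i, cmd in enumerate(steps):
        print("+", " ".join(cmd))
        if dry_run:
            continue
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        if i == 0:
            # Never demote DRBD while the guest may still be writing.
            wait_until_shut_off(active, domain)
    return steps


if __name__ == "__main__":
    # Hypothetical names: VM "vm1", DRBD resource "r0", hosts "node-a"/"node-b".
    run_failover("vm1", "r0", "node-a", "node-b", dry_run=True)
```

Stopping at the first failing command is deliberate: if the demotion fails, the script never promotes the peer, so the two nodes cannot both end up primary.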
I can assure you that the Linux HA stack (Corosync and Pacemaker) is still actively developed and supported. Many guides are old; the software has been around for 10 years. When set up properly, it has no major problems or issues. It is not abandoned; it is simply no longer "new and exciting".