How to install Ubuntu 14.04/16.04 64-bit with a dual-boot RAID 1 partition on a UEFI/GPT system?

UPDATE: I have verified that the description below also works for Ubuntu 16.04. Other users have reported working on 17.10 and 18.04.1.

NOTE: This HOWTO will not give you LVM. If you want LVM too, try Install Ubuntu 18.04 desktop with RAID 1 and LVM on machine with UEFI BIOS instead.

After days of trying, I now have a working system! In brief, the solution consisted of the following steps:

  1. Boot using a Ubuntu Live CD/USB.
  2. Partition the SSDs as required.
  3. Install missing packages (mdadm and grub-efi).
  4. Create the RAID partitions.
  5. Run the Ubiquity installer (but do not boot into the new system).
  6. Patch the installed system (initramfs) to enable boot from a RAIDed root.
  7. Populate the EFI partition of the first SSD with GRUB and install it into the EFI boot chain.
  8. Clone the EFI partition to the other SSD and install it into the boot chain.
  9. Done! Your system will now have RAID 1 redundancy. Note that nothing special needs to be done after e.g. a kernel update, as the UEFI partitions are untouched.

A key component of step 6 of the solution was a delay in the boot sequence that otherwise dumped me squarely at the GRUB prompt (without a keyboard!) if either of the SSDs was missing.

Detailed HOWTO

1. Boot

Boot using EFI from the USB stick. Exactly how to do this will vary from system to system. Select Try Ubuntu without installing.

Start a terminal emulator, e.g. xterm, to run the commands below.

1.1 Login from another computer

While trying this out, I often found it easier to log in from another, already fully configured computer. This simplified cut-and-paste of commands, etc. If you want to do the same, you can log in via ssh by doing the following:

On the computer to be configured, install the openssh server:

sudo apt-get install openssh-server

Change the password. The default password for the user ubuntu is blank. A medium-strength password is fine; it will be forgotten as soon as you reboot your new computer.

passwd

Now you can log into the Ubuntu live session from another computer. The instructions below are for Linux:

ssh -l ubuntu <your-new-computer>

If you get a warning about a suspected man-in-the-middle attack, you need to clear the ssh keys used to identify the new computer. This is because openssh-server generates new server keys whenever it is installed. The command to use is typically printed in the warning and should look like

ssh-keygen -f <path-to-.ssh/known_hosts> -R <your-new-computer>

After executing that command, you should be able to log in to the Ubuntu live session.

2. Partition disks

Clear any old partitions and boot blocks. Warning! This will destroy data on your disks!

sudo sgdisk -z /dev/sda
sudo sgdisk -z /dev/sdb

Create new partitions on the smaller of your drives: 100M for the ESP, 32G for the RAID swap, and the rest for the RAID root. If your sda drive is the smaller one, follow Section 2.1; otherwise follow Section 2.2.

2.1 Create partition tables (/dev/sda is smaller)

Do the following steps:

sudo sgdisk -n 1:0:+100M -t 1:ef00 -c 1:"EFI System" /dev/sda
sudo sgdisk -n 2:0:+32G -t 2:fd00 -c 2:"Linux RAID" /dev/sda
sudo sgdisk -n 3:0:0 -t 3:fd00 -c 3:"Linux RAID" /dev/sda

Copy the partition table to the other disk and regenerate unique UUIDs (the -G will actually regenerate the UUIDs for sda).

sudo sgdisk /dev/sda -R /dev/sdb -G

2.2 Create partition tables (/dev/sdb is smaller)

Do the following steps:

sudo sgdisk -n 1:0:+100M -t 1:ef00 -c 1:"EFI System" /dev/sdb
sudo sgdisk -n 2:0:+32G -t 2:fd00 -c 2:"Linux RAID" /dev/sdb
sudo sgdisk -n 3:0:0 -t 3:fd00 -c 3:"Linux RAID" /dev/sdb

Copy the partition table to the other disk and regenerate unique UUIDs (the -G will actually regenerate the UUIDs for sdb).

sudo sgdisk /dev/sdb -R /dev/sda -G

2.3 Create FAT32 file system on /dev/sda1

Create a FAT32 file system for the EFI partition.

sudo mkfs.fat -F 32 /dev/sda1
mkdir /tmp/sda1
sudo mount /dev/sda1 /tmp/sda1
sudo mkdir /tmp/sda1/EFI
sudo umount /dev/sda1

3. Install missing packages

The Ubuntu Live CD comes without two key packages: grub-efi and mdadm. Install them. (I'm not 100% sure grub-efi is needed here, but to maintain symmetry with the coming installation, bring it in as well.)

sudo apt-get update
sudo apt-get -y install grub-efi-amd64 # (or grub-efi-amd64-signed)
sudo apt-get -y install mdadm

You may need grub-efi-amd64-signed instead of grub-efi-amd64 if you have secure boot enabled. (See comment by Alecz.)

4. Create the RAID partitions

Create the RAID devices in degraded mode. They will be completed later. Creating a full RAID1 sometimes gave me problems during the Ubiquity installation below; I am not sure why (mount/unmount? format?).

sudo mdadm --create /dev/md0 --bitmap=internal --level=1 --raid-disks=2 /dev/sda2 missing
sudo mdadm --create /dev/md1 --bitmap=internal --level=1 --raid-disks=2 /dev/sda3 missing

Verify RAID status.

cat /proc/mdstat

Personalities : [raid1] 
md1 : active raid1 sda3[0]
      216269952 blocks super 1.2 [2/1] [U_]
      bitmap: 0/2 pages [0KB], 65536KB chunk

md0 : active raid1 sda2[0]
      33537920 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

Partition the md devices.

sudo sgdisk -z /dev/md0
sudo sgdisk -z /dev/md1
sudo sgdisk -N 1 -t 1:8200 -c 1:"Linux swap" /dev/md0
sudo sgdisk -N 1 -t 1:8300 -c 1:"Linux filesystem" /dev/md1
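Before running the installer, you may want to double-check the layout. A quick way (lsblk should be available on the live session) is:

lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

You should see /dev/md0 and /dev/md1 with one partition each, plus the EFI partitions on the physical drives.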

5. Run the installer

Run the Ubiquity installer, excluding the boot loader, which will fail anyway. (Note: If you have logged in via ssh, you will probably want to execute this on your new computer instead.)

sudo ubiquity -b

Choose Something else as the installation type and modify the md1p1 type to ext4, format: yes, and mount point /. The md0p1 partition will automatically be selected as swap.

Get a cup of coffee while the installation finishes.

Important: After the installation has finished, select Continue testing as the system is not boot ready yet.

5.1 Complete the RAID devices

Attach the waiting sdb partitions to the RAID.

sudo mdadm --add /dev/md0 /dev/sdb2
sudo mdadm --add /dev/md1 /dev/sdb3

Verify that all RAID devices are ok (and possibly still sync'ing).

cat /proc/mdstat

Personalities : [raid1] 
md1 : active raid1 sdb3[1] sda3[0]
      216269952 blocks super 1.2 [2/1] [U_]
      [>....................]  recovery =  0.2% (465536/216269952)  finish=17.9min speed=200000K/sec
      bitmap: 2/2 pages [8KB], 65536KB chunk

md0 : active raid1 sdb2[1] sda2[0]
      33537920 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

The process below may be continued while the sync is in progress, including through the reboots.
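If you want to keep an eye on the sync while you continue, one simple way (assuming the standard watch utility on the live session) is:

watch -n 10 cat /proc/mdstat

Press Ctrl-C to exit when you are done.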

6. Configure the installed system

Set up to enable a chroot into the installed system.

sudo -s
mount /dev/md1p1 /mnt
mount -o bind /dev /mnt/dev
mount -o bind /dev/pts /mnt/dev/pts
mount -o bind /sys /mnt/sys
mount -o bind /proc /mnt/proc
cat /etc/resolv.conf >> /mnt/etc/resolv.conf
chroot /mnt

Configure and install packages.

apt-get install -y grub-efi-amd64 # (or grub-efi-amd64-signed; same as in step 3)
apt-get install -y mdadm

If your md devices are still sync'ing, you may see occasional warnings like:

/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..

This is normal and can be ignored (see answer at bottom of this question).

nano /etc/grub.d/10_linux
# change quick_boot and quiet_boot to 0

Disabling quick_boot will avoid the Diskfilter writes are not supported bug. Disabling quiet_boot is purely a matter of personal preference.
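If you prefer to make the edit non-interactively, here is a sed sketch (this assumes quick_boot and quiet_boot are set on their own lines in your version of 10_linux; verify the result afterwards):

sed -i 's/^quick_boot=.*/quick_boot="0"/' /etc/grub.d/10_linux
sed -i 's/^quiet_boot=.*/quiet_boot="0"/' /etc/grub.d/10_linux
grep -E '^(quick|quiet)_boot' /etc/grub.d/10_linux  # verify the change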

Modify /etc/mdadm/mdadm.conf to remove any label references, i.e. change

ARRAY /dev/md/0 metadata=1.2 name=ubuntu:0 UUID=f0e36215:7232c9e1:2800002e:e80a5599
ARRAY /dev/md/1 metadata=1.2 name=ubuntu:1 UUID=4b42f85c:46b93d8e:f7ed9920:42ea4623

to

ARRAY /dev/md/0 UUID=f0e36215:7232c9e1:2800002e:e80a5599
ARRAY /dev/md/1 UUID=4b42f85c:46b93d8e:f7ed9920:42ea4623

This step may be unnecessary, but I've seen some pages suggest that the naming schemes may be unstable (name=ubuntu:0/1) and this may stop a perfectly fine RAID device from assembling during boot.
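If you would rather script that edit, here is a sed sketch that only touches the ARRAY lines (double-check the file afterwards, as your mdadm.conf formatting may differ):

sed -i -e '/^ARRAY/s/ metadata=[^ ]*//' -e '/^ARRAY/s/ name=[^ ]*//' /etc/mdadm/mdadm.conf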

Modify lines in /etc/default/grub to read

#GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""

Again, this step may be unnecessary, but I prefer to boot with my eyes open...

6.1. Add sleep script

(It has been suggested by the community that this step might be unnecessary and can be replaced by using GRUB_CMDLINE_LINUX="rootdelay=30" in /etc/default/grub. For reasons explained at the bottom of this HOWTO, I suggest sticking with the sleep script even though it is uglier than using rootdelay. Thus, we continue with our regular program...)

Create a script that will wait for the RAID devices to settle. Without this delay, mounting of root may fail due to the RAID assembly not being finished in time. I found this out the hard way - the problem did not show up until I had disconnected one of the SSDs to simulate disk failure! The timing may need to be adjusted depending on available hardware, e.g. slow external USB disks, etc.

Enter the following code into /usr/share/initramfs-tools/scripts/local-premount/sleepAwhile:

#!/bin/sh
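# Give udevd and mdadm 30 seconds (6 x 5 s) to settle before root is mounted.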
echo
echo "sleeping for 30 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 25 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 20 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 15 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 10 seconds while udevd and mdadm settle down"
sleep 5
echo "sleeping for 5 seconds while udevd and mdadm settle down"
sleep 5
echo "done sleeping"

Make the script executable and install it.

chmod a+x /usr/share/initramfs-tools/scripts/local-premount/sleepAwhile
update-grub
update-initramfs -u

7. Enable boot from the first SSD

Now the system is almost ready; only the UEFI boot entries need to be installed.

mount /dev/sda1 /boot/efi
grub-install --boot-directory=/boot --bootloader-id=Ubuntu --target=x86_64-efi --efi-directory=/boot/efi --recheck
update-grub
umount /dev/sda1

This will install the boot loader in /boot/efi/EFI/Ubuntu (a.k.a. EFI/Ubuntu on /dev/sda1) and add it first in the UEFI boot chain of the computer.
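To double-check that the entry was created and sits first in the boot order, you can list the current UEFI entries (this assumes the EFI variables are reachable inside the chroot through the bind-mounted /sys):

efibootmgr -v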

8. Enable boot from the second SSD

We're almost done. At this point, we should be able to reboot from the sda drive. Furthermore, mdadm should be able to handle a failure of either the sda or the sdb drive. However, the EFI partition is not RAIDed, so we need to clone it.

dd if=/dev/sda1 of=/dev/sdb1

In addition to installing the boot loader on the second drive, this will make the UUID of the FAT32 file system on the sdb1 partition (as reported by blkid) match that of sda1 and /etc/fstab. (Note however that the partition UUIDs of /dev/sda1 and /dev/sdb1 will still be different - compare ls -la /dev/disk/by-partuuid | grep sd[ab]1 with blkid /dev/sd[ab]1 after the install to check for yourself.)
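If you want to make that comparison yourself, run both commands after the clone:

blkid /dev/sda1 /dev/sdb1                      # file system UUIDs: now identical
ls -la /dev/disk/by-partuuid | grep 'sd[ab]1'  # partition UUIDs: still different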

Finally, we must insert the sdb1 partition into the boot order. (Note: This step may be unnecessary, depending on your BIOS. I have gotten reports that some BIOSes automatically generate a list of valid ESPs.)

efibootmgr -c -g -d /dev/sdb -p 1 -L "Ubuntu #2" -l '\EFI\ubuntu\grubx64.efi'

I did not test it, but it is probably necessary to have unique labels (-L) for the ESPs on sda and sdb.

This will generate a printout of the current boot order, e.g.

Timeout: 0 seconds
BootOrder: 0009,0008,0000,0001,0002,000B,0003,0004,0005,0006,0007
Boot0000  Windows Boot Manager
Boot0001  DTO UEFI USB Floppy/CD
Boot0002  DTO UEFI USB Hard Drive
Boot0003* DTO UEFI ATAPI CD-ROM Drive
Boot0004  CD/DVD Drive 
Boot0005  DTO Legacy USB Floppy/CD
Boot0006* Hard Drive
Boot0007* IBA GE Slot 00C8 v1550
Boot0008* Ubuntu
Boot000B  KingstonDT 101 II PMAP
Boot0009* Ubuntu #2

Note that Ubuntu #2 (sdb) and Ubuntu (sda) are the first in the boot order.

Reboot

Now we are ready to reboot.

exit # from chroot
exit # from sudo -s
sudo reboot

The system should now reboot into Ubuntu. (You may have to remove the Ubuntu Live installation media first.)

After boot, you may run

sudo update-grub

to attach the Windows boot loader to the grub boot chain.

Virtual machine gotchas

If you want to try this out in a virtual machine first, there are some caveats: Apparently, the NVRAM that holds the UEFI information is remembered between reboots, but not between shutdown-restart cycles. In that case, you may end up at the UEFI Shell console. The following commands should boot you into your machine from /dev/sda1 (use FS1: for /dev/sdb1):

FS0:
\EFI\ubuntu\grubx64.efi

The first solution in the top answer of UEFI boot in virtualbox - Ubuntu 12.04 might also be helpful.

Simulating a disk failure

Failure of either RAID component device can be simulated using mdadm (a minimal sketch follows the status listing below). However, to verify that the boot setup would survive a disk failure, I had to shut down the computer and disconnect power from a disk. If you do so, first ensure that the md devices are sync'ed.

cat /proc/mdstat 

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1 sdb3[2] sda3[0]
      216269952 blocks super 1.2 [2/2] [UU]
      bitmap: 2/2 pages [8KB], 65536KB chunk

md0 : active raid1 sda2[0] sdb2[2]
      33537920 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
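As mentioned above, a component failure can also be simulated purely in software. This does not exercise the boot path, but a minimal sketch (using md1 and sdb3 as example devices; adapt to your setup) looks like:

sudo mdadm /dev/md1 --fail /dev/sdb3    # mark the component as faulty
sudo mdadm /dev/md1 --remove /dev/sdb3  # drop it from the array
sudo mdadm /dev/md1 --re-add /dev/sdb3  # put it back; the internal bitmap keeps the resync short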

In the instructions below, sdX is the failed device (X=a or b) and sdY is the ok device.

Disconnect a drive

Shut down the computer. Disconnect a drive. Restart. Ubuntu should now boot with the RAID devices in degraded mode. (Celebrate! This is what you were trying to achieve! ;)

cat /proc/mdstat 

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1 sda3[0]
      216269952 blocks super 1.2 [2/1] [U_]
      bitmap: 2/2 pages [8KB], 65536KB chunk

md0 : active raid1 sda2[0]
      33537920 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

Recover from a failed disk

This is the process to follow if you have had to replace a faulty disk. If you want to emulate a replacement, you may boot into a Ubuntu Live session and use

dd if=/dev/zero of=/dev/sdX

to wipe the disk clean before rebooting into the real system. If you just tested the boot/RAID redundancy in the section above, you can skip this step. However, you must at least perform steps 2 and 4 below to recover full boot/RAID redundancy for your system.

Restoring the RAID+boot system after a disk replacement requires the following steps:

  1. Partition the new drive.
  2. Add partitions to md devices.
  3. Clone the boot partition.
  4. Add an EFI record for the clone.

1. Partition the new drive

Copy the partition table from the healthy drive:

sudo sgdisk /dev/sdY -R /dev/sdX

Re-randomize UUIDs on the new drive.

sudo sgdisk /dev/sdX -G

2. Add to md devices

sudo mdadm --add /dev/md0 /dev/sdX2
sudo mdadm --add /dev/md1 /dev/sdX3

3. Clone the boot partition

Clone the ESP from the healthy drive. (Careful, maybe do a dump-to-file of both ESPs first to enable recovery if you really screw it up.)
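If you want that safety net, here is a sketch of the dump-to-file (the backup file names are just examples):

sudo dd if=/dev/sdY1 of=/root/esp-sdY1.img
sudo dd if=/dev/sdX1 of=/root/esp-sdX1.img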

sudo dd if=/dev/sdY1 of=/dev/sdX1

4. Insert the newly revived disk into the boot order

Add an EFI record for the clone. Modify the -L label as required.

sudo efibootmgr -c -g -d /dev/sdX -p 1 -L "Ubuntu #2" -l '\EFI\ubuntu\grubx64.efi'

Now, rebooting the system should bring it back to normal (the RAID devices may still be sync'ing)!

Why the sleep script?

It has been suggested by the community that adding a sleep script might be unnecessary and could be replaced by using GRUB_CMDLINE_LINUX="rootdelay=30" in /etc/default/grub followed by sudo update-grub. This suggestion is certainly cleaner and does work in a disk failure/replace scenario. However, there is a caveat...
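For reference, the suggested alternative amounts to something like the following, done in place of the sleep script of step 6.1:

sudo nano /etc/default/grub
# set GRUB_CMDLINE_LINUX="rootdelay=30"
sudo update-grub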

I disconnected my second SSD and found that with rootdelay=30, etc. instead of the sleep script:

  1. The system does boot in degraded mode without the "failed" drive.
  2. In a non-degraded boot (both drives present), the boot time is reduced; the delay is only perceptible when the second drive is missing.

Points 1 and 2 sounded great until I re-added my second drive. At boot, the RAID array failed to assemble and left me at the initramfs prompt, not knowing what to do. It might have been possible to salvage the situation by a) booting from the Ubuntu Live USB stick, b) installing mdadm and c) re-assembling the array manually, but... I messed up somewhere. Instead, when I re-ran this test with the sleep script (yes, I did start the HOWTO from the top for the nth time...), the system did boot. The arrays were in degraded mode and I could manually re-add the /dev/sdb[23] partitions without any extra USB stick. I don't know why the sleep script works whereas rootdelay doesn't. Perhaps mdadm gets confused by two slightly out-of-sync component devices, but I thought mdadm was designed to handle that. Anyway, since the sleep script works, I'm sticking with it.

It could be argued that removing a perfectly healthy RAID component device, rebooting the RAID into degraded mode and then re-adding the component device is an unrealistic scenario: the realistic scenario is rather that one device fails and is replaced by a new one, leaving less opportunity for mdadm to get confused. I agree with that argument. However, I don't know how to test how the system tolerates a hardware failure except by actually disabling some hardware! And after testing, I want to get back to a redundant, working system. (Well, I could attach my second SSD to another machine and wipe it before I re-add it, but that's not feasible.)

In summary: To my knowledge, the rootdelay solution is clean, faster than the sleep script for non-degraded boots, and should work for a real drive failure/replace scenario. However, I don't know a feasible way to test it. So, for the time being, I will stick to the ugly sleep script.