Ubuntu 24.04, Dracut and Native ZFS Encryption

The Situation

Recently I got interested in using systemd-cryptenroll to set up automatic unlocking of my Ubuntu ZFS root filesystem. systemd-cryptenroll provides very nice support for a range of unlock mechanisms, including TPM2 and FIDO2 hardware tokens.

initramfs-tools doesn't provide systemd-cryptsetup, but dracut does. So under the stock initramfs-tools setup, enrolling keys with systemd-cryptenroll gets you nowhere: you can't use any of the advanced unlock features like TPM2 or FIDO2 at boot.

Ubuntu includes dracut, so we can just install that:

apt install -y dracut zfs-dracut

And it'll just work, right?

The Problem

If you just had ZFS without encryption, or ZFS sitting on top of a LUKS device, this would probably work. But it doesn't work with natively encrypted ZFS - you get dumped to the emergency shell instead.

If you run journalctl -a and poke around, the problem becomes obvious - dracut tries to mount the ZFS root and fails because the file system key isn't available under /run/keystore/rpool.

The problem is an interesting sequencing issue: the zfs-dracut modules know how to mount a filesystem, and they know how to look for - and wait for - a keystore to become available at a mount path...but dracut has no idea how or where to find and mount the LUKS-encrypted ZFS volume which Ubuntu's regular setup uses to hold the key.

Specifically: on a default, out-of-the-box Ubuntu install with ZFS and native encryption, you'll have a layout like this:

NAME                                               USED  AVAIL  REFER  MOUNTPOINT
bpool                                              167M  1.59G    96K  /boot
bpool/BOOT                                         166M  1.59G    96K  none
bpool/BOOT/ubuntu_sqtmt2                           166M  1.59G  87.4M  /boot
rpool                                             4.38G  85.2G   192K  /
rpool/ROOT                                        4.34G  85.2G   192K  none
rpool/ROOT/ubuntu_sqtmt2                          4.34G  85.2G  3.12G  /
rpool/ROOT/ubuntu_sqtmt2/srv                       192K  85.2G   192K  /srv
rpool/ROOT/ubuntu_sqtmt2/usr                       656K  85.2G   192K  /usr
rpool/ROOT/ubuntu_sqtmt2/usr/local                 464K  85.2G   384K  /usr/local
rpool/ROOT/ubuntu_sqtmt2/var                      1.13G  85.2G   192K  /var
rpool/ROOT/ubuntu_sqtmt2/var/games                 192K  85.2G   192K  /var/games
rpool/ROOT/ubuntu_sqtmt2/var/lib                  1.11G  85.2G   971M  /var/lib
rpool/ROOT/ubuntu_sqtmt2/var/lib/AccountsService   456K  85.2G   268K  /var/lib/AccountsService
rpool/ROOT/ubuntu_sqtmt2/var/lib/NetworkManager    472K  85.2G   260K  /var/lib/NetworkManager
rpool/ROOT/ubuntu_sqtmt2/var/lib/apt              84.3M  85.2G  84.0M  /var/lib/apt
rpool/ROOT/ubuntu_sqtmt2/var/lib/dpkg             83.2M  85.2G  59.8M  /var/lib/dpkg
rpool/ROOT/ubuntu_sqtmt2/var/log                  16.6M  85.2G  15.6M  /var/log
rpool/ROOT/ubuntu_sqtmt2/var/mail                  192K  85.2G   192K  /var/mail
rpool/ROOT/ubuntu_sqtmt2/var/snap                 2.22M  85.2G  2.22M  /var/snap
rpool/ROOT/ubuntu_sqtmt2/var/spool                 356K  85.2G   252K  /var/spool
rpool/ROOT/ubuntu_sqtmt2/var/www                   192K  85.2G   192K  /var/www
rpool/USERDATA                                    11.1M  85.2G   192K  none
rpool/USERDATA/home_fxfmf6                        3.44M  85.2G  3.44M  /home
rpool/USERDATA/root_fxfmf6                         388K  85.2G   388K  /root
rpool/USERDATA/will_w6bp5u                        7.14M  85.2G  3.73M  /home/will
rpool/keystore                                    22.5M  85.3G  16.5M  -

The bottom line - this one:

rpool/keystore                                    22.5M  85.3G  16.5M  -

is a ZFS volume (zvol) holding a LUKS-encrypted ext4 filesystem, which contains a single file - system.key - specified as the encryption key for the entire rpool (i.e. every dataset except this volume is set up this way).

$ zfs get keylocation rpool
NAME   PROPERTY     VALUE                                  SOURCE
rpool  keylocation  file:///run/keystore/rpool/system.key  local

We need this path to exist before - or while - the ZFS mount scripts are looking for it.

The Solution

The only reference I could find to this bug for Ubuntu specifically was here, and I've posted this solution there too.

I tried a number of approaches, but the punchline seems to be a combination of two problems:

  1. dracut doesn't include /etc/crypttab unless you specify --hostonly - which under this configuration then causes it to not include LUKS decryption at all, because it can't tell we'll need it.

  2. Even if dracut did include this, it can't properly detect root in /etc/fstab, and has no knowledge of how to determine what the LUKS volume is or where it should mount it.

It feels like zfs-dracut should be able to solve this problem for itself - i.e. at dracut build time, if you see a ZFS root, check for a file-based keylocation property, find the mountpoint of that path, then check whether that mountpoint is backed by a device which is in turn a LUKS device - but that's a heck'in chain of causality nobody handles yet.
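A rough sketch of what that detection chain could look like, purely hypothetical: the sample keylocation value is copied from the zfs get output above, and the steps that need a live pool (findmnt, cryptsetup) are left as comments so the path-handling logic stands alone:

```shell
#!/bin/bash
# Hypothetical sketch of the chain zfs-dracut could walk at build time.
# Sample value: what `zfs get -H -o value keylocation rpool` returns here.
keyloc="file:///run/keystore/rpool/system.key"

case "$keyloc" in
file://*)
    keypath="${keyloc#file://}"   # strip the URI scheme -> filesystem path
    keydir="${keypath%/*}"        # directory the initramfs must mount
    # On a live system, resolve the device behind that mountpoint:
    #   dev=$(findmnt -n -o SOURCE "$keydir")   # e.g. /dev/mapper/keystore-rpool
    # ...and check whether its backing zvol is LUKS, needing an unlock first:
    #   cryptsetup isLuks /dev/zvol/rpool/keystore && echo "LUKS unlock needed"
    echo "key expected at: $keypath"
    ;;
*)
    echo "no file-based keylocation: $keyloc"
    ;;
esac
```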

In the mean time, the fix is to create two files:

cat << EOF > /etc/dracut.conf.d/00-crypttab.conf
install_items+=" /etc/crypttab "
EOF

This forces /etc/crypttab into the initramfs that dracut builds, which means dracut will detect and try to unlock whatever is in it at build time. Your /etc/crypttab will then need the following line:

keystore-rpool /dev/zvol/rpool/keystore

If you want to use TPM or FIDO (as I do), then it would look like:

keystore-rpool /dev/zvol/rpool/keystore - tpm2-device=auto,fido2-device=auto
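For reference, a crypttab line has up to four whitespace-separated fields - name, device, key file, options - where "-" in the key-file column means "no key file": systemd-cryptsetup will prompt, or use an enrolled token per the options (note systemd spells the option tpm2-device). A small illustrative parse in bash:

```shell
#!/bin/bash
# Illustration only: split a crypttab line into its four fields.
line="keystore-rpool /dev/zvol/rpool/keystore - tpm2-device=auto,fido2-device=auto"
read -r name device keyfile options <<< "$line"
echo "name=$name device=$device"
echo "keyfile=$keyfile options=$options"
```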

The second file you must create tells dracut to force a line into the /etc/fstab it builds into the initramfs. This relies on an undocumented feature: dracut's command-line --mount option does the same thing, and that option simply appends to a shell variable - which is exactly how dracut.conf files work. By inspecting dracut's script, we can see the following is the correct way to add fstab lines:

cat << EOF > /etc/dracut.conf.d/01-keystore-rpool-mnt.conf
fstab_lines+=" /dev/mapper/keystore-rpool /run/keystore/rpool auto "
EOF

The whitespace, and truncating the line before the dump and fsck ordering numbers, are both mandatory.
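To make that truncation concrete: the injected line carries only the first three fstab columns (device, mountpoint, filesystem type), with the options, dump, and fsck-pass columns left off entirely. A quick illustration (my own check, not dracut's code):

```shell
#!/bin/bash
# Show that the injected fstab line has exactly three fields:
# device, mountpoint, and filesystem type - nothing after "auto".
line="/dev/mapper/keystore-rpool /run/keystore/rpool auto"
set -- $line          # word-split into positional parameters
nfields=$#
dev=$1; mnt=$2; fstype=$3
echo "fields: $nfields"
echo "device=$dev mountpoint=$mnt fstype=$fstype"
```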

Run dracut -f to rebuild your initramfs, then reboot. If you have done nothing else, you should simply be prompted by systemd-cryptsetup to enter your password for /run/keystore/rpool, and once it unlocks, ZFS will mount and everything will boot successfully. I find this a lot faster than clevis-initramfs.

Enrolling Your TPM

Unfortunately, due to dracut's weird hostonly behavior, we also need to add another override file here:

cat << EOF > /etc/dracut.conf.d/02-crypt-libs.conf
add_dracutmodules+=" tpm2-tss fido2 "
EOF

You need to run dracut -f and reboot before the next step - this is so your PCR registers will be correct.

Enrolling your TPM is simple enough if it's set up - the following command is appropriate:

systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=0+1+2+3+4+7+8+9+14 /dev/zvol/rpool/keystore

This is IMO the minimum you should do, but it will break pretty easily - rebuilding the initramfs with dracut, or a kernel upgrade, will disable it. See the Linux TPM PCR registry for what each PCR measures.

PCRs 0-7 are fairly boring system-level stuff (7 is the activation state of Secure Boot). But you want 8 and 9 specifically, since they determine whether your TPM can simply be bypassed - without 8 or 9, an attacker can replace your kernel or initramfs, or just mess with the command-line settings in grub, and still have the TPM unlock the volume (and thus probably gain access to all your files).

Of course, if you're just trying to protect the hard disk against being readable once it's taken out of the computer, sealing against the TPM is fine (in fact PCR 7 - the default - might be all you need against a not-particularly-determined attacker).

This does of course leave some problems with using an initramfs at all: if the initramfs drops you into a root shell for any reason other than the TPM being unable to unlock the keys (because the initramfs has changed), then someone just got root access to your filesystem.

Enrolling Your YubiKey

Since your TPM unlock is pretty likely to break if there's a surprise kernel upgrade, it's a good idea to also enroll a hardware token. There's a little bit of a conflict in threat-model here: the TPM notionally protects you against the "evil maid" attack - if someone comes in and messes with your laptop in a hotel room, it'll refuse to decrypt anything...but then, of course, you may well enter your password and run the compromised bootloader anyway.

The Yubikey enrollment (or any FIDO2 authenticator) has the same problem - they might not get your password, but who knows what they did by putting a signed kernel and a custom initramfs onto your system partition. Just be aware of where the limits are here.

Enrollment is the same as before - plug in your token, and run:

systemd-cryptenroll --fido2-device=auto /dev/zvol/rpool/keystore

Then follow the prompts. In my case:

# systemd-cryptenroll --fido2-device=auto /dev/zvol/rpool/keystore
🔐 Please enter current passphrase for disk /dev/zvol/rpool/keystore: •••••••••••••••
Initializing FIDO2 credential on security token.
👆 (Hint: This might require confirmation of user presence on security token.)
🔐 Please enter security token PIN: ••••••••                
Generating secret key on FIDO2 security token.
👆 In order to allow secret key generation, please confirm presence on security token.
New FIDO2 token enrolled as key slot 2.

This follows however your token is set up. Mine requires both a PIN and user presence for FIDO2, so I get prompted for both.

We can test this out by deliberately running dracut -f (which will break our TPM config) and then rebooting.

Remaining Problems

This gets you most of the way there, but not all the way. The problem with this approach is still a sequencing one: the ZFS mount scripts only wait around for you because they're expecting udev to bring up a block device which will then be mounted correctly. You have about 10-15 seconds with password auth before you'll get dumped to an emergency shell anyway.

That seems to be a ZFS mount script problem, though - I suspect that to fix it, they'd have to - again - know that they're actually waiting on a password prompt and not just failing to mount.