Ubuntu 24.04, Dracut and Native ZFS Encryption

The Situation

Recently I got interested in using systemd-cryptenroll to setup automatic unlocking of my Ubuntu ZFS root filesystem. systemd-cryptenroll provides very nice support for a range of unlocking measures,

initramfs-tools doesn't provide systemd-cryptsetup, but dracut does. So setting up your disks with systemd-cryptenroll just means you can't use any of the advanced unlock features like tpm2 or fido.

Ubuntu includes dracut, so we can just install that:

apt install -y dracut zfs-dracut

And it'll work on right?

The Problem

If you just had ZFS without encryption, or had ZFS sitting over a LUKS device this would probably work. But it doesn't work with ZFS with native encryption, and you get dumped to the emergency shell instead.

If you run journalctl -a and poke around, the problem becomes obvious - dracut tries to mount the ZFS root and fails because the file system key isn't available under /run/keystore/rpool.

The problem is a kind of interesting sequencing issue: the zfs-dracut modules know how to mount a filesystem, and they know how to look for and wait for a key-store to become available at a mountpath...but dracut has no idea how or where to find and mount the LUKS-encrypted ZFS volume which Ubuntu's regular store uses to hold the key.

Specifically: on a default out of the box Ubuntu with ZFS and native encryption install, you'll have a layout like this:

NAME                                               USED  AVAIL  REFER  MOUNTPOINT
bpool                                              167M  1.59G    96K  /boot
bpool/BOOT                                         166M  1.59G    96K  none
bpool/BOOT/ubuntu_sqtmt2                           166M  1.59G  87.4M  /boot
rpool                                             4.38G  85.2G   192K  /
rpool/ROOT                                        4.34G  85.2G   192K  none
rpool/ROOT/ubuntu_sqtmt2                          4.34G  85.2G  3.12G  /
rpool/ROOT/ubuntu_sqtmt2/srv                       192K  85.2G   192K  /srv
rpool/ROOT/ubuntu_sqtmt2/usr                       656K  85.2G   192K  /usr
rpool/ROOT/ubuntu_sqtmt2/usr/local                 464K  85.2G   384K  /usr/local
rpool/ROOT/ubuntu_sqtmt2/var                      1.13G  85.2G   192K  /var
rpool/ROOT/ubuntu_sqtmt2/var/games                 192K  85.2G   192K  /var/games
rpool/ROOT/ubuntu_sqtmt2/var/lib                  1.11G  85.2G   971M  /var/lib
rpool/ROOT/ubuntu_sqtmt2/var/lib/AccountsService   456K  85.2G   268K  /var/lib/AccountsService
rpool/ROOT/ubuntu_sqtmt2/var/lib/NetworkManager    472K  85.2G   260K  /var/lib/NetworkManager
rpool/ROOT/ubuntu_sqtmt2/var/lib/apt              84.3M  85.2G  84.0M  /var/lib/apt
rpool/ROOT/ubuntu_sqtmt2/var/lib/dpkg             83.2M  85.2G  59.8M  /var/lib/dpkg
rpool/ROOT/ubuntu_sqtmt2/var/log                  16.6M  85.2G  15.6M  /var/log
rpool/ROOT/ubuntu_sqtmt2/var/mail                  192K  85.2G   192K  /var/mail
rpool/ROOT/ubuntu_sqtmt2/var/snap                 2.22M  85.2G  2.22M  /var/snap
rpool/ROOT/ubuntu_sqtmt2/var/spool                 356K  85.2G   252K  /var/spool
rpool/ROOT/ubuntu_sqtmt2/var/www                   192K  85.2G   192K  /var/www
rpool/USERDATA                                    11.1M  85.2G   192K  none
rpool/USERDATA/home_fxfmf6                        3.44M  85.2G  3.44M  /home
rpool/USERDATA/root_fxfmf6                         388K  85.2G   388K  /root
rpool/USERDATA/will_w6bp5u                        7.14M  85.2G  3.73M  /home/will
rpool/keystore                                    22.5M  85.3G  16.5M  -

The bottom line - this one:

rpool/keystore                                    22.5M  85.3G  16.5M  -

is a ZFS volume which holds an ext4 filesystem, which has a single file in it - system.key which is specified as the encryption key for the entire rpool system (i.e. everywhere except this volume is setup this way).

$ zfs get keylocation rpool
NAME   PROPERTY     VALUE                                  SOURCE
rpool  keylocation  file:///run/keystore/rpool/system.key  local

We need this path to start existing while the ZFS mount scripts are looking for it - or before.

The Solution

The only reference I could find to this bug for Ubuntu specifically was here, and I've posted this solution there too.

I tried a number of approaches, but the punchline seems to be a combination of two problems:

dracut doesn't include /etc/crypttab unless you specify --host-only - which under this configuration causes it to then not include the LUKS decryption at all because it can't tell we'll need it.
Even if dracut did include this, it can't properly detect root in /etc/fstab, and has no knowledge of how to determine what the LUKS volume is or where it should mount it.

It feels like zfs-dracut should be able to solve this problem for itself - i.e. at dracut time if you see a ZFS root, check for a file-based keylocation parameter, and then check for the mountpoint of that, then check if the mountpoint is on a device which in turn is on a LUKS device - but that's a heck'in chain of causality not handled yet.

In the mean time, the fix is to create two files:

cat << EOF > /etc/dracut.conf.d/00-crypttab.conf
install_items+=" /etc/crypttab "
EOF

This forces /etc/crypttab into the initramfs that dracut builds. This means dracut will detect and try to mount whatever is in it at the time. Your /etc/crypttab will then need the following line:

keystore-rpool /dev/zvol/rpool/keystore

If you want to use tpm or fido (as I do then it would look like):

keystore-rpool /dev/zvol/rpool/keystore - tpm-device=auto,fido2-device=auto

The second file you must create tells dracut to force a line into the /etc/fstab file it creates, but it does this by using an undocumented feature - dracut has the command line --mount option which will do this, and that option is simply modifying shell variables which is how the dracut.conf file works. So by inspecting dracut's script, we can see the following is the correct way to add fstab lines:

cat << EOF > /etc/dracut.conf.d/01-keystore-rpool-mnt.conf
fstab_lines+=" /dev/mapper/keystore-rpool /run/keystore/rpool auto "

Whitespace and truncanting the ordering and fsck'ing numbers is mandatory.

Run dracut -f to rebuild your initramfs and then reboot. If you have done nothing else, you should just be prompted by systemd-cryptsetup to enter your passed for /run/keystore/rpool and when it unlocks ZFS will mount and eveything will boot successfully. This is a lot faster then clevis-initramfs I find.

Enrolling Your TPM

Unfortunately due to the weird behavior of dracut with regards to host-only we also need to add another override file here:

cat << EOF > /etc/dracut.conf.d/02-crypt-libs.conf
add_dracutmodules+=" tpm2-tss fido2 "

You need to run dracut -f and reboot before the next step - this is so your PCR registers will be correct.

Enrolling your TPM is simple enough if it's setup - the following command is appropriate:

systemd-cryptenroll --tpm2-device=auto --tpm2-pcsr=0+1+2+3+4+7+8+9+14 /dev/zvol/rpool/keystore

This is IMO the minimum you should do, but it will break pretty easily (rebuilding with dracut, or a kernel upgrade) will disable it. From the Linux TPM PCR registry.

PCR's 0-7 are fairly boring system level stuff (7 is the activation of Secure Boot). But you want 8 and 9 specifically since they determine if your TPM can just be bypassed - without 8 or 9, an attacker can replace your kernel, initramfs, or just mess with command line settings in grub and still have the TPM unlock the volume (and thus probably give them access to all your files).

Of course if you're just trying to protect the hard disk against being readable once you take it out of the computer, sealing against the TPM is fine (in fact PCR 7 - the default, might be all you need against a not particularly determined attacker).

This does of course leave some problems with using an initramfs at all: if the initramfs drops you into a root shell for any reason other then the TPM is unable to unlock the keys (because the initramfs has changed) then someone just got root access to your filesystem.

Enrolling Your YubiKey

Since your login is pretty likely to break if there's a surprise kernel upgrade, it's a good idea to also enroll a hardware token. There's a little bit of a conflict here in threat-model: the TPM notionally protects us against the "evil maid" attack - if someone comes in and messes with your laptop in a hotel room, then it'll refuse to decrypt anything...but then of course, you enter the password and run the compromised bootloader anyway possibly.

The Yubikey enrollment (or any FIDO2 authenticator) has the same problem - they might not get your password, but who knows what they did by putting a signed kernel and a custom initramfs onto your system partition. Just be aware of where the limits are here.

Enrollment is the same as before - plug in your token, and run:

systemd-cryptenroll --fido2-device=auto /dev/zvol/rpool/keystore

Then follow the prompts. In my case:

$#$ systemd-cryptenroll --fido2-device=auto /dev/zvol/rpool/keystore
🔐 Please enter current passphrase for disk /dev/zvol/rpool/keystore: •••••••••••••••
Initializing FIDO2 credential on security token.
👆 (Hint: This might require confirmation of user presence on security token.)
🔐 Please enter security token PIN: ••••••••                
Generating secret key on FIDO2 security token.
👆 In order to allow secret key generation, please confirm presence on security token.
New FIDO2 token enrolled as key slot 2.

This follows however your token is setup. Mine requires both a PIN and user presence for FIDO2, so I get prompted for those.

We can test this out by deliberalyte running dracut -f which will break our TPM config and then rebooting.

Remaining Problems

This gets you most of the way there, but not all the way - the problem with this approach is still a sequencing one, the ZFS mount scripts only wait around for you because they're expected udev to bring up a block device which will be correctly mounted. You have about 10-15 seconds with password auth before you'll get dumped to an emergency shell anyway.

That seems to be a ZFS mount script problem though - I suspect to fix it they'd have to - again - know that they're actually waiting on a password prompt and not just failing to mount.

Patching Ubuntu Packages with dgit and pbuilder

Will Rouesnel

2025-03-23 09:57

The Situation¶

What I want from my open-source operating system (Ubuntu) is a sensible way to make "many eyes make all bugs shallow" a reality. At least for the obvious stuff - like the UI.

To this end I need a couple of things:

the ability to get the source code
the ability to build and package the code
the ability to install that package onto my own system safely to test it - and uninstall it and revert back to the regular package if I cannot.
the ability to upstream my changes

This is, in my experience, hard with Ubuntu - or at least certainly not a streamlined process like I believe it should be.

Fortunately the situation today is a lot better then it was, but there are many false starts out there on the web. In this post I am recreating the notes and code which I used to do this again recently, along with the justifications why. There are obvious shortcomings, that aren't easy to fill (namely, rootless Podman should mean I can do this all without sudo except for the final sudo apt upgrade - but that's not the case today, not flexibly).

Problem¶

Our problem is that we want to be able to pull the source code of a given package - easily. We may need to pull the code of several packages so we can patch one, and rebuild many against it - so things we build need to become available to the next builds.

And we'd like to be able to track this in source control, and public PPAs from it if it proves useful. Because it's just the thing to do.

Tools¶

You will need the following:

dgit
pbuilder
devscripts

apt install -y dgit pbuilder devscripts

dgit is the real magic in this process - it takes a bunch of things which should be possible but are not talked about extensively online and makes them automated, and well documented - the dgit man pages are excellent.

The second part of this mix is pbuilder. pbuilder has a lot of variants, but when you get right down to it I've found that basic pbuilder is the easiest to get running: the tooling for things like cowbuilder qemubuilder and dbocker all tend to feel either under-developed (dbocker) or have limitations which limit easy usage (qemubuilder can't use bind mounts, so what we're doing here won't work.)

Setting up pbuilder¶

Getting pbuilder working smoothly on your system is key to making this a pleasant experience. Anyone can, with some effort, somehow get a package to build - but doing it in a way where you can get your patched code running quickly, and restore it just as easily is the key to making it accessible.

Testing environments are good and all, but for the "I just need this one feature" sort of user patch of system-level or networking software (in my case NetworkManager), it's too much work for too little reward and tends not to properly fix the problem. The hacker spirit needs a duct-taped proof before it's productionized (and for bugs the question is "have you fixed it in the circumstance - your system - where it comes up").

My solution for setting up pbuilder is in this Github repository here (by me).

In short, ./setup-pbuilder-repo here will install a .pbuilderrc file plus a bunch of hook scripts, and setup a local repository for apt with a signing key marked as pbuilder-repo. This is the standard chroot based classic pbuilder - there is some support for trying to use qemubuilder or cowbuilder, but in my experience both of these created more problems then they solved. Something clever could be done here, but for the time and effort you could also just run a full Ubuntu VM and do your building there - our goal is packaging, not reinventing containerization.

Simply clone the repository, run ./setup-pbuilder-repo and then run sudo pbuilder create to setup a local repository.

Building a local package which upgrades the existing one¶

With the pbuilder environment setup, the process for building a new version of a local package looks like this (pretty much from the dgit-user page):

Increment the version numbers by the amount needed (and write a new commit log):

gbp dch -S --local wrouesnel --since=dgit/dgit/jammy --ignore-branch --commit

Build with our pbuilder setup:

dgit pbuilder

If you get a message like:

Format `3.0 (quilt)', need to check/update patch stack
Would remove .idea/.gitignore
Would remove .idea/misc.xml
Would remove .idea/vcs.xml
Would remove configure~
Would remove install-sh~

dgit: error: tree contains uncommitted files (NB dgit didn't run rules clean)
dgit: If this is just missing .gitignore entries, use a different clean
dgit: mode, eg --clean=dpkg-source,no-check (-wdn/-wddn) to ignore them
dgit: or --clean=git (-wg/-wgf) to use `git clean' instead.

then as noted dgit is telling you the answer - you have extraneous files (common if like me you used a Jetbrains IDE to browse and edit). Run git --reset hard to get rid of fluff (modified autotools files usually) and then run:

dgit --clean=dpkg-source,no-check pbuilder

to build while ignoring anything extra.

If you're just building binaries for your local system, then with this setup this is enough. After the build you can apt upgrade and your new build will be installed.

Setting up a PPA¶

For ease of use (or if you're doing this because you want to apply it to a cluster) then first build and sign a source package:

First you'll want to declare a release for the version of your OS (I use Ubuntu so these will be Ubuntu specific):

gbp dch -R --local wrouesnel --since=dgit/dgit/jammy --ignore-branch --commit

This lets you edit the changelog (on my system despite my efforts dch is convinced it should use my old email address from somewhere and I cannot figure out from where). Note that I had a lot of trouble getting a perfectly clean history out of git automatically with gbp here - it liked to pick up it's own changelog commits as things it wanted to include - I assume I'm using it wrong somehow, but documentation on other tools is tricky.

Then, you want to build a signed source package for your changelog:

dgit --clean=dpkg-source,no-check -k${PPA_SIGNING_KEYID} build -S

where ${PPA_SIGNING_KEYID} is the key ID you registered with Launchpad.

Then just push it to the PPA:

dput ppa:${PPA_NAME} <your package.changes file>

If dput complains about missing or incorrect signatures (I had an issue with missing signatures despite the gbp commands above) then it's easily fixed by running the debsign command on the *.dsc and *.changes files:

debsign -k${PPA_SIGNING_KEYID} <your package name>.dsc
debsign -k${PPA_SIGNING_KEYID} <your package name>_source.changes

This will either sign it in place, or detect a signature and ask if you want to replace it (which is useful if you missigned initially). Then just retry the dput upload.

TPM Secured GPG Keys which never touch the hard disk

Will Rouesnel

2025-03-22 22:38

Background¶

gpg >= v2.3 has supported TPMs natively. This support works totally fine for some applications via the keytotpm function.

However this function is very specific in the operation it performs: the keytotpm function encrypts the key you have on-disk, in place, with the TPM's RSA2048 key. What this ensures is that if your disk and the computer's TPM are separated, the key is effectively unreadable. In fact if the key file is used anywhere but your computer, it is also unusable.

Combined with a password on your keyring, this is excellent protection against many attacks.

It does have one significant drawback though: (1) the key material, at some point, exists in an unprotected form on the computer.

Situation¶

Our situation then is that we would like to generate a TPM-backed key where the private key is never exposed at all. Essentially our presumed adversary might collect any unsecured key material on the PC at any point.

This isn't a super-practical scenario in a lot of cases: i.e. with this much control, you could also just use the key by whatever compromise and key-logging you deployed.

But there are some practical advantages here: we can generate and sign this key using another key by more trustworthy means, such as GPG keys held on a Yubikey or similar device.

It's also a practical way to grant access to key material for multiple processes - i.e. if we wanted to run a bunch of CI/CD runner processes where attestations of the runner identity were needed by client code - since all cryptography will be done by the TPM, and the key will never leave the TPM even to be used, we can use this as a reasonable proof that whatever else was done, it was done on a specific runner.

Solution¶

Our solution is obvious: we want to create a GPG key with private key material which is stored on the TPM and can never leave it. The tools to do this exist. We will do this by leveraging the PKCS11 system, and the various interop tools to run it.

Step 1: Initialize a new key in the TPM¶

We use tpm2_ptool (apt install libtpm2-pks11-tools) to setup a new token in the TPM:

# Initialize a new store. The store retains some data, but will not contain key material.
tpm2_ptool
# Prompt user for pin numbers - see below for explanation
read -r -s -p "Enter User PIN: " uspin
echo
read -r -s -p "Enter Mgmt PIN: " sopin
echo

# Add a new token to the store
tpm2_ptool addtoken --pid=1 --label="gpg" --userpin=$uspin --sopin=$sopin

# Add a new key to the token - this generates the private key
tpm2_ptool addkey --label="gpg" --key-label="gpg" --userpin=$uspin --algorithm=rsa2048

A note on pin numbers: you can leave the "User PIN" blank - this will enable using the key without prompting. Be aware that the consequences of this are that anyone with access to the motherboard of your PC though can use but not copy your key. This extends to multi-user systems where anyone in the tss group (on Ubuntu at least) will be able to do the same.

In a practical sense you should set the user pin to a short word - it's generally 4-6 characters, but they can be any characters. When you use the key, you enter the pin, and if someone tries to brute force it then the TPM will lockout the pin after a certain number of attempts. This is what the "Mgmt PIN" is for - which can be used to unlock a locked user PIN but not use the key the User PIN protects.

All of this is inherited from smartcard logic. In a practical sense, it would be safe to stick the User PIN in your system keyring, unlocked by your user account at boot, and not think about it.

Step 2: Generate and Sign an X509 certificate the key¶

To get this all working, we're leveraging a couple of different tools: apt install libtpm2-pkcs11-1 libtpm2-pkcs11-tools gnupg-pkcs11-scd gnutls-bin libnss3-tools p11-kit

You want to run p11-kit list-modules which will give you an output (on Ubuntu 24.04) which looks something like this:

$ p11-kit list-modules
module: p11-kit-trust
    path: /usr/lib/x86_64-linux-gnu/pkcs11/p11-kit-trust.so
    uri: pkcs11:library-description=PKCS%2311%20Kit%20Trust%20Module;library-manufacturer=PKCS%2311%20Kit
    library-description: PKCS#11 Kit Trust Module
    library-manufacturer: PKCS#11 Kit
    library-version: 0.25
    token: System Trust
        uri: pkcs11:model=p11-kit-trust;manufacturer=PKCS%2311%20Kit;serial=1;token=System%20Trust
        manufacturer: PKCS#11 Kit
        model: p11-kit-trust
        serial-number: 1
        hardware-version: 0.25
        flags:
              write-protected
              token-initialized
module: opensc-pkcs11
    path: /usr/lib/x86_64-linux-gnu/pkcs11/opensc-pkcs11.so
    uri: pkcs11:library-description=OpenSC%20smartcard%20framework;library-manufacturer=OpenSC%20Project
    library-description: OpenSC smartcard framework
    library-manufacturer: OpenSC Project
    library-version: 0.25
    token: OpenPGP card (User PIN)
        uri: pkcs11:model=PKCS%2315%20emulated;manufacturer=Yubico;serial=000618103012;token=OpenPGP%20card%20%28User%20PIN%29
        manufacturer: Yubico
        model: PKCS#15 emulated
        serial-number: 000618103012
        hardware-version: 3.4
        firmware-version: 3.4
        flags:
              rng
              login-required
              user-pin-initialized
              token-initialized
    token: OpenPGP card (User PIN (sig))
        uri: pkcs11:model=PKCS%2315%20emulated;manufacturer=Yubico;serial=000618103012;token=OpenPGP%20card%20%28User%20PIN%20%28sig%29%29
        manufacturer: Yubico
        model: PKCS#15 emulated
        serial-number: 000618103012
        hardware-version: 3.4
        firmware-version: 3.4
        flags:
              rng
              login-required
              user-pin-initialized
              token-initialized
module: tpm2_pkcs11
    path: /usr/lib/x86_64-linux-gnu/pkcs11/libtpm2_pkcs11.so
    uri: pkcs11:library-description=TPM2.0%20Cryptoki;library-manufacturer=tpm2-software.github.io
    library-description: TPM2.0 Cryptoki
    library-manufacturer: tpm2-software.github.io
    library-version: 1.9
    token: 
        uri: pkcs11:model=AMD%00%00%00%00%00%00%00%00%00%00%00%00%00;manufacturer=AMD;serial=0000000000000000;token=
        manufacturer: AMD
        model: AMD
        serial-number: 0000000000000000
        hardware-version: 1.38
        firmware-version: 3.37
        flags:
              rng
              login-required

The line we want is module tpm2_pkcs11 since that gives us the filepath to the TPM PKCS module. Run the following to store it:

tpm_lib="$(p11-kit list-modules | grep -A1 'module: tpm2_pkcs11' | tail -n1 | sed 's/^\s*//g' | cut -d' ' -f2)"
# Check this worked on your machine.
echo "$tpm_lib"

Our next step is to get the URIs we need - make sure you're in the same terminal session with all the environment variables we set above and run:

token_uri="$(p11tool --list-token-urls | grep token=gpg)"
private_uri="$(p11tool --list-privkeys --login --only-urls --set-pin=${uspin} ${private_uri})"

# Check the private key is accessible - this should display success
p11tool --test-sign --login --set-pin=${uspin} "${private_uri}"

What this has done is recorded the URLs we need from the PKCS libraries which represent the public and private keys in the TPM - access which is being provided by the tpm2_pkcs11 module.

At this point many guides refer to using OpenSSL to generate and sign certificates - I could not find a way to make this work on Ubuntu 22.04 or 24.04, hitting errors with the OpenSSL pkcs11 engine everytime. The working solution was to use certtool which is provided by gnutls-bin.

We'll be using it in self-signed mode: you could obviously use a CA and setup a more complicated system etc. but all of this is just to support GPG recognizing and being able to issue keys from the TPM - which the X509 story is a pre-requisite, but unused otherwise.

So firstly run this to setup your common name:

read -r -p "Enter Name (firstname lastname):" name
read -r -p "Enter Email (user@domain):" email

These are superfluous in many ways - we're just constructing a CN we'll recognized when importing later. In my application the goal was a GPG key which identified me and would be trust-signed by the keys on my YubiKey, which in turn would be signed by an offline master key (or one stored with a strong password and TPM backed on my main machine, with a paper backup somewhere).

We now need to emit a template file for the certificate (there's a lot you can do here - again, look it up if you want to use more X509 features):

template_ini="$(mktemp template.XXXXXXX.ini)"
cat << EOF > "$template.ini"
cn = "${name}"
serial = $(date --utc +%Y%m%d%H%M%S)
expiration_days = 365
email = "${email}"
signing_key
encryption_key
cert_signing_key
EOF

I'm not sure if these are optimal parameters for the application, but again, the GPG key will be totally independent of this certificate once created - this is a handle to use the TPM's key material.

Generate the new self-signed certificate:

GNUTLS_PIN="${uspin}" certtool --generate-self-signed --template "$template_ini" \
    --load-private "${private_uri}" --outfile "${name}.crt"

And then we can add the certificate to the TPM:

tpm2_ptool addcert --label=gpg --key-label=gpg "${name}.crt"

And that's a measure of success! At this point we have a certificate and key loaded securely onto the system TPM, protected by a PIN, and the private key is completely unexportable (barring your thoughts on either a backdoor existing, or someone decapping the chip and reading the bits out of the secure enclave at enormous time and expense - but we'll see GPG provides us a mitigation for that too).

Step 3: Configure GPG to use gnupg-pkcs-scd¶

gnupg-pkcs-scd is a deplacement scdaemon (Smart Card Daemon) for gpg which allows interfacing with the PKCS11 stack as a source of smart cards. Specifically - it emulates the OpenPGP card standard for GPG. This is very useful if you're using PIV mode on a YubiKey, CAC cards or other enterprisey things - it's also useful for what we're doing here.

So first - create the configuration file:

cat << EOF > "$HOME/.gnupg/gnupg-pkcs11.conf"
provider tpm
provider-tpm-library "${tpm_lib}"
EOF

Note: you can have multiple providers and gnupg-pkcs-scd will merge them and present them as a single smart card to GPG: this is super useful if you have say, a YubiKey in PIV mode you also want to use - checkout the man page for more information.

Configure GPG to use the daemon:

cat << EOF >> "$HOME/.gnupg/gpg-agent.conf"
scdaemon-program $(command -v gnupg-pkcs11-scd)
pinentry-program $(command -v pinentry-gnome3)
EOF

Restart the daemon:

systemctl --user restart gpg-agent.service
gpg --card-status

Step 4: Import the keys¶

The instructions here aren't great because GPG is very prompt driven (fairly on the assumption this is an important thing you're doing - but it can be scripted I just haven't yet).

Firstly you need to get a listing of your key-grips from the agent - note "KEY-FRIEDLY" is not my spelling mistake, the protocol really writes that:

gpg-agent --server gpg-connect-agent << EOF 2>/dev/null | grep KEY-FRIEDNLY
SCD LEARN
EOF

If you only have a few keys then you can get the exact key grip with this command:

gpg-agent --server gpg-connect-agent << EOF 2>/dev/null | grep KEY-FRIEDNLY | grep gpg | cut -d' ' -f3
SCD LEARN
EOF

But do check that looks sensible.

Now you just need to import the key: GPG does promise to do a "hands off" import with Option 14 in the following command, but it will fail to detect any key-usages. So run the following, and select Option 13 when prompted - then select the keygrip you found above.

You'll be prompted to select usages - you likely want the line to read Current allowed actions: Sign Certify Encrypt. You can customize according to intended use, but for RSA that list is fine. You might want to add Authenticate if you want to use this key for say, SSH authorization. See this InfoSec Stack Overflow post for an interesting discussion about this.

The punchline is that because the underlying key is RSA, it can soundly to everything we need - the Sign/Certify distinction is left over from the days of DSA keys, which can't do things like encryption.

One final thing is ideally you should set your key expiry to 364 days: because above we set the X509 certificate to expire after 365 days. Again: the certificate doesn't actually affect anything GPG wise since we just use it to get a keygrip to access the key - but you should set expiries (and keep them reasonably short) and if we're going to do that, we might as well line things up (plus if you do start using the X509 part, then it'll be sensible).

Conclusions¶

I sank way too many hours into this chasing down a lot of commands which just didn't work. As far as I know, on Ubuntu 24.04, these commands do and do have the intended effect. If I've missed any packages for commands here, use apt-file or similar to find the necessary packages and install them.

Cryptographic things like this alway eat time because once setup they will get out of your way but until they're setup there's quite a lot of "annoying to undo" things.

Should you do this? Probably not - the use case of protecting the key from global surveillance like this is pretty low. There are some other wrinkles: i.e. it would take some work to get "the same" GPG key back out of the TPM if you lost the certificate entry in the daemon itself - but there's no secret key in it, so you can back that up however you want without worry.

For most users keytotpm is a much better and more versatile solution. But this is also a decent entry into the world of TPM key manipulation and those libaries can be used for other things (i.e. OpenVPN or SSH).

SQLAlchemy Enums - Careful what goes into the database

Will Rouesnel

2024-04-25 21:54

The Situation¶

SQLAlchemy is an obvious choice when you need to throw together anything dealing with databases in Python. There might be other options, there might be faster options, but if you need it done then SQLAlchemy will do it for you pretty well and very ergonomically.

The problem I ran into recently is dealing with Python enums recently. Or more specifically: I had a user input problem which obviously turned into an enum application side - I had a limited set of inputs I wanted to allow, because those were what we supported - and I didn't want strings all through my code testing for values.

So on the client side it's obvious: check if the string matches an enum value, and use that. The enum would look something like below:

In [1]:

from enum import Enum

class Color(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

Now from this, we have our second problem: storing this in the database. We want to not do work here - that's we're using SQLAlchemy, so we can have our commmon problems handled. And so, SQLAlchemy helps us - here's automatic enum type handling for us.

Easy - so our model using the declarative syntax, and typehints can be written as follows:

In [2]:

import sqlalchemy
from sqlalchemy.orm import Mapped, DeclarativeBase, Session, mapped_column
from sqlalchemy import create_engine, select, text

class Base(DeclarativeBase):
    pass

class TestTable(Base):
    __tablename__ = "test_table"
    id: Mapped[int] = mapped_column(primary_key=True)
    value: Mapped[Color]

This is essentially identical to the documentation we see above. And, if we run this in a sample program - it works!

In [3]:

engine = create_engine("sqlite://")

Base.metadata.create_all(engine)

with Session(engine) as session:
    # Create normal values
    for enum_item in Color:
        session.add(TestTable(value=enum_item))
    session.commit()

# Now try and read the values back
with Session(engine) as session:
    records = session.scalars(select(TestTable)).all()
    for record in records:
        print(record.value)

Color.RED
Color.GREEN
Color.BLUE

Right? We stored some enum's to the database and retreived them in simple elegant code. This is exactly what we want...right?

But the question is...what did we actually store? Let's extend the program to do a raw query to read back that table...

In [5]:

from sqlalchemy import text

with engine.connect() as conn:
    print(conn.execute(text("SELECT * FROM test_table;")).all())

[(1, 'RED'), (2, 'GREEN'), (3, 'BLUE')]

Notice the tuples: the second column, we see "RED", "GREEN" and "BLUE"...but our enum defines our colors as RED is "red". What's going on? And is something wrong here?

Depending how you view the situation, yes, but also no - but it's likely this isn't what you wanted either.

The primary reason to use SQLAlchemy enum types is to take advantage of something like PostgreSQL supporting native enum types in the database. Everywhere else in SQLAlchemy, when we define a python class - like we do with TestTable above - we're not defining a Python object, we're defining a Python object which is describing the database objects we want and how they'll behave.

And so long as we're using things that come from SQLAlchemy - and under-the-hood SQLAlchemy is converting that enum.Enum to sqlalchemy.Enum - then this makes complete sense. The enum we declare is declaring what values we store, and what data value they map too...in the sense that we might use the data elsewhere, in our application. Basically our database will hold the symbolic value RED and we interpret that as meaning "red" - but we reserve the right to change that interpretation.

But if we're coming at this from a Python application perspective - i.e. the reason we made an enum - we likely have a different view of the problem. We're thinking "we want the data to look a particular way, and then to refer to it symbolically in code which we might change" - i.e. the immutable element is the data, the value, of the enum - because that's what we'll present to the user, but not what we want to have all over the application.

In isolation these are separate problems, but automatic enum handling makes the boundary here fuzzy: because while the database is defined in our code, from one perspective, it's also external to it - i.e. we may be writing code which is meant to simply interface with and understand a database not under our control. Basically, the enum.Enum object feels like it's us saying "this is how we'll interpret the external world" and not us saying "this is what the database looks like".

And in that case then, our view of what the enum is is probably more like "the enum is the internal symbolic representation of how we plan to consume database values" - i.e. we expect to map "red" to Color.RED from the database. Rather then reading the database and interpreting RED as "red".

Nobodies wrong - but you probably have your assumptions going into this (I know I did...but it compiled, it worked, and I never questioned it - and so long as I'm the sole owner, who cares right?)

The Problem¶

There are a few problems though with this interpretation. One is obvious: we're a simple, apparently safe refactor away from ruining our database schema and we might be aware of it. In the above, naive interpretation, changing Color.RED to Color.LEGACY_RED for example, is implying that RED is no longer a valid value in the database - which if we think of the enum as an application mapping to an external interface is something which might make sense.

This is the sort of change which crops up all the time. We know the string "red" is out there, hardcoded and compiled into a bunch of old systems so we can't just go and change a color name in the database. Or we're doing rolling deployments and we need consistency of values - or share the database or any number of other complex environment concerns. Either way: we want to avoid needlessly updating the database value - changing our code, but not an apparent variable constant - should be safe.

However we're not storing the data we think we are. We expected "red", "green" and "blue" and got "RED", "GREEN" and "BLUE". It's worth noting that the SQLAlchemy documentation leads you astray like this, since the second example showing using typing.Literal for the mapping uses the string assignments from the first (and neither shows a sample table result which makes it obvious on a quick read).

If we change a name in this enum, then the result is actually bad if we've used it anywhere - we stop being able to read models out of this table at all. So if we do the following:

In [6]:

class Color(Enum):
    LEGACY_RED = "red"
    GREEN = "green"
    BLUE = "blue"

Then try to read the models we've created, it won't work - in fact we can't read any part of that table anymore (this post is written as a Jupyter notebook so the redefinition below is needed to setup the SQLAlchemy model again)

In [8]:

class Base(DeclarativeBase):
    pass

class TestTable(Base):
    __tablename__ = "test_table"
    id: Mapped[int] = mapped_column(primary_key=True)
    value: Mapped[Color]

with Session(engine) as session:
    records = session.scalars(select(TestTable)).all()
    for record in records:
        print(record.value)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.local/lib/python3.10/site-packages/sqlalchemy/sql/sqltypes.py in _object_value_for_elem(self, elem)
   1608         try:
-> 1609             return self._object_lookup[elem]
   1610         except KeyError as err:

KeyError: 'RED'

The above exception was the direct cause of the following exception:

LookupError                               Traceback (most recent call last)
/tmp/ipykernel_69447/1820198460.py in <module>
      8 
      9 with Session(engine) as session:
---> 10     records = session.scalars(select(TestTable)).all()
     11     for record in records:
     12         print(record.value)

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in all(self)
   1767 
   1768         """
-> 1769         return self._allrows()
   1770 
   1771     def __iter__(self) -> Iterator[_R]:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in _allrows(self)
    546         make_row = self._row_getter
    547 
--> 548         rows = self._fetchall_impl()
    549         made_rows: List[_InterimRowType[_R]]
    550         if make_row:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in _fetchall_impl(self)
   1674 
   1675     def _fetchall_impl(self) -> List[_InterimRowType[Row[Any]]]:
-> 1676         return self._real_result._fetchall_impl()
   1677 
   1678     def _fetchmany_impl(

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in _fetchall_impl(self)
   2268             self._raise_hard_closed()
   2269         try:
-> 2270             return list(self.iterator)
   2271         finally:
   2272             self._soft_close()

~/.local/lib/python3.10/site-packages/sqlalchemy/orm/loading.py in chunks(size)
    217                     break
    218             else:
--> 219                 fetch = cursor._raw_all_rows()
    220 
    221             if single_entity:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in _raw_all_rows(self)
    539         assert make_row is not None
    540         rows = self._fetchall_impl()
--> 541         return [make_row(row) for row in rows]
    542 
    543     def _allrows(self) -> List[_R]:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in <listcomp>(.0)
    539         assert make_row is not None
    540         rows = self._fetchall_impl()
--> 541         return [make_row(row) for row in rows]
    542 
    543     def _allrows(self) -> List[_R]:

lib/sqlalchemy/cyextension/resultproxy.pyx in sqlalchemy.cyextension.resultproxy.BaseRow.__init__()

lib/sqlalchemy/cyextension/resultproxy.pyx in sqlalchemy.cyextension.resultproxy._apply_processors()

~/.local/lib/python3.10/site-packages/sqlalchemy/sql/sqltypes.py in process(value)
   1727                 value = parent_processor(value)
   1728 
-> 1729             value = self._object_value_for_elem(value)
   1730             return value
   1731 

~/.local/lib/python3.10/site-packages/sqlalchemy/sql/sqltypes.py in _object_value_for_elem(self, elem)
   1609             return self._object_lookup[elem]
   1610         except KeyError as err:
-> 1611             raise LookupError(
   1612                 "'%s' is not among the defined enum values. "
   1613                 "Enum name: %s. Possible values: %s"

LookupError: 'RED' is not among the defined enum values. Enum name: color. Possible values: LEGACY_RED, GREEN, BLUE

Even though we did a proper refactor, we can no longer read this table - in fact we can't even read part of it without using raw SQL and giving up on our models entirely. Obviously if we were writing an application, we've just broken all our queries - but not because we messed anything up, but because we thought we were making a code change when in reality we were making a data change.

This behavior also makes it pretty much impossible to handle externally managed schemas or existing schemas - we don't really want our enum to have to follow someone else's data scheme, even if they're well behaved.

Finally it also hightlights another danger we've walked into: what if we try to read this column, and there are values there we don't recognize? We would also get the same error - in this case, RED is unknown because we removed it. But if a new version of our application comes along and has inserted ORANGE then we'd also have the same problem - we've lost backwards and forwards compatibility, in a way which doesn't necessarily show up easily. There's just no easy way to deal with these LookupError validation problems when we're loading large chunks of models - they happen at the wrong part of the stack

The Solution¶

Doing the obvious thing here got us a working applications with a bunch of technical footguns - which is unfortunate, but it does work. There are plenty of situations where we'd never encounter these though - although many more where we might. So what should we do instead?

To get the behavior we expected when we used an enum we can do the following in our model definition:

In [11]:

class Base(DeclarativeBase):
    pass

class TestTable(Base):
    __tablename__ = "test_table"
    id: Mapped[int] = mapped_column(primary_key=True)
    value: Mapped[Color] = mapped_column(sqlalchemy.Enum(Color, values_callable=lambda t: [ str(item.value) for item in t ]))

Notice the values_callable parameter. The order returned here should be the order our enum returns (and it is - it's simply passed our Enum object) - and returns the list of values which should be assigned in the database for it. In this case we simply do a Python string conversion of the enum value (which will just return the literal string - but if you were doing something ill-advised like mixing in numbers, then this makes it sensible for the DB).

When we run this with a new database, we now see that we get what we expected in the underlying table:

In [13]:

engine = create_engine("sqlite://")

Base.metadata.create_all(engine)

with Session(engine) as session:
    # Create normal values
    for enum_item in Color:
        session.add(TestTable(value=enum_item))
    session.commit()

# Now try and read the values back
with Session(engine) as session:
    records = session.scalars(select(TestTable)).all()
    print("We restored the following values in code...")
    for record in records:
        print(record.value)

print("But the underlying table contains...")
with engine.connect() as conn:
    print(conn.execute(text("SELECT * FROM test_table;")).all())

We restored the following values in code...
Color.LEGACY_RED
Color.GREEN
Color.BLUE
But the underlying table contains...
[(1, 'red'), (2, 'green'), (3, 'blue')]

Perfect. Now if we're connecting to an external database, or a schema we don't control, everything works great. But what about when we have unknown values? What happens then? Well we haven't fixed that, but we're much less likely to encounter it by accident now. Of course it's worth noting, SQLAlchemy also doesn't validate the inputs we put into this model against the enum before we write it either. So if we do this, then we're back to it not working:

In [15]:

with Session(engine) as session:
    session.add(TestTable(value="reed"))
    session.commit()

In [16]:

# Now try and read the values back
with Session(engine) as session:
    records = session.scalars(select(TestTable)).all()
    print("We restored the following values in code...")
    for record in records:
        print(record.value)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.local/lib/python3.10/site-packages/sqlalchemy/sql/sqltypes.py in _object_value_for_elem(self, elem)
   1608         try:
-> 1609             return self._object_lookup[elem]
   1610         except KeyError as err:

KeyError: 'reed'

The above exception was the direct cause of the following exception:

LookupError                               Traceback (most recent call last)
/tmp/ipykernel_69447/3460624042.py in <module>
      1 # Now try and read the values back
      2 with Session(engine) as session:
----> 3     records = session.scalars(select(TestTable)).all()
      4     print("We restored the following values in code...")
      5     for record in records:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in all(self)
   1767 
   1768         """
-> 1769         return self._allrows()
   1770 
   1771     def __iter__(self) -> Iterator[_R]:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in _allrows(self)
    546         make_row = self._row_getter
    547 
--> 548         rows = self._fetchall_impl()
    549         made_rows: List[_InterimRowType[_R]]
    550         if make_row:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in _fetchall_impl(self)
   1674 
   1675     def _fetchall_impl(self) -> List[_InterimRowType[Row[Any]]]:
-> 1676         return self._real_result._fetchall_impl()
   1677 
   1678     def _fetchmany_impl(

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in _fetchall_impl(self)
   2268             self._raise_hard_closed()
   2269         try:
-> 2270             return list(self.iterator)
   2271         finally:
   2272             self._soft_close()

~/.local/lib/python3.10/site-packages/sqlalchemy/orm/loading.py in chunks(size)
    217                     break
    218             else:
--> 219                 fetch = cursor._raw_all_rows()
    220 
    221             if single_entity:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in _raw_all_rows(self)
    539         assert make_row is not None
    540         rows = self._fetchall_impl()
--> 541         return [make_row(row) for row in rows]
    542 
    543     def _allrows(self) -> List[_R]:

~/.local/lib/python3.10/site-packages/sqlalchemy/engine/result.py in <listcomp>(.0)
    539         assert make_row is not None
    540         rows = self._fetchall_impl()
--> 541         return [make_row(row) for row in rows]
    542 
    543     def _allrows(self) -> List[_R]:

lib/sqlalchemy/cyextension/resultproxy.pyx in sqlalchemy.cyextension.resultproxy.BaseRow.__init__()

lib/sqlalchemy/cyextension/resultproxy.pyx in sqlalchemy.cyextension.resultproxy._apply_processors()

~/.local/lib/python3.10/site-packages/sqlalchemy/sql/sqltypes.py in process(value)
   1727                 value = parent_processor(value)
   1728 
-> 1729             value = self._object_value_for_elem(value)
   1730             return value
   1731 

~/.local/lib/python3.10/site-packages/sqlalchemy/sql/sqltypes.py in _object_value_for_elem(self, elem)
   1609             return self._object_lookup[elem]
   1610         except KeyError as err:
-> 1611             raise LookupError(
   1612                 "'%s' is not among the defined enum values. "
   1613                 "Enum name: %s. Possible values: %s"

LookupError: 'reed' is not among the defined enum values. Enum name: color. Possible values: red, green, blue

Broken again.

So how do we fix this?

Handling Unknown Values¶

All the cases we've seen of LookupErrors are essentially a problem that we have no unknown value handler - ultimately in all applications where the value could change - which I would argue should always be considered to be all of them - we in fact should have had an option which specified handling an unknown one.

At this point we need to subclass the SQLAlchemy Enum type, and specify that directly - which do like so:

In [25]:

import typing as t

class EnumWithUnknown(sqlalchemy.Enum):
    def __init__(self, *enums, **kw: t.Any):
        super().__init__(*enums, **kw)
        # SQLAlchemy sets the _adapted_from keyword argument sometimes, which contains a reference to the original type - but won't include
        # original keyword arguments, so we need to handle that here.
        self._unknown_value = kw["_adapted_from"]._unknown_value if "_adapted_from" in kw else kw.get("unknown_value",None)
        if self._unknown_value is None:
            raise ValueError("unknown_value should be a member of the enum")
    
    # This is the function which resolves the object for the DB value
    def _object_value_for_elem(self, elem):
        try:
            return self._object_lookup[elem]
        except LookupError:
            return self._unknown_value

And then we can use this type like follows:

In [26]:

class Color(Enum):
    UNKNOWN = "unknown"
    LEGACY_RED = "red"
    GREEN = "green"
    BLUE = "blue"

class Base(DeclarativeBase):
    pass

class TestTable(Base):
    __tablename__ = "test_table"
    id: Mapped[int] = mapped_column(primary_key=True)
    value: Mapped[Color] = mapped_column(EnumWithUnknown(Color, values_callable=lambda t: [ str(item.value) for item in t ], 
                                                         unknown_value=Color.UNKNOWN))

Let's run that against the database we just inserted reed into:

In [27]:

# Now try and read the values back
with Session(engine) as session:
    records = session.scalars(select(TestTable)).all()
    print("We restored the following values in code...")
    for record in records:
        print(record.value)

We restored the following values in code...
Color.LEGACY_RED
Color.GREEN
Color.BLUE
Color.UNKNOWN

And fixed! We obviously have changed our application logic, but this is now much safer and code which will work as we expect it too in all circumstances.

From a practical perspective we've had to expand our design space to assume indeterminate colors can exist - which might be awkward, but the trade-off is robustness: our application logic can now choose how it handles "unknown" - we could crash if we wanted, but we can also choose just to ignore those records we don't understand or display them as "unknown" and prevent user interaction or whatever else we want.

Discussion¶

This is an interesting case where in my opinion the "default" design isn't what you would want, but the logic for it is actually sound. SQLAlchemy models define databases - they are principally built on assuming you are describing the actual state of a database, with constraints provided by a database - i.e. in a database with first-class enumeration support, some of the tripwires here just wouldn't work without a schema upgrade.

Conversely, if you did a schema upgrade, your old applications still wouldn't know how to parse new values unless you did everything perfectly in lockstep - which in my experience isn't reality.

Basically it's an interesting case where everything is justifiably right, but leaves some design footguns lying around which might be a bit of a surprise (hence this post). The kicker for me is the effect on using session.scalar calls to return models - since unless we're querying more specifically, having unknown values we can't handle in tables leads to being unable to list any elements on the table ergonomically.

Conclusions¶

Think carefully before using automagic enum methods in SQLAlchemy. What you want to do now is likely subtly wrong, and while there's a simple and elegant way to use enum.Enum with SQLAlchemy, the magic will give you working code quickly but with potentially nasty problems from subtle bugs or data mismatches later.

Listings¶

The full listing for the code samples here can also be found here.

DHCP Fixed IPs and ESPHome

Will Rouesnel

2023-12-27 21:34

DHCP Fixed IPs and ESPHome¶

The Problem¶

My Home Assistant installation runs in Docker, and ESPHome runs in a separate docker container. I use a separate Wifi SSID for my random ESP devices to give them some isolation from my main network, so mDNS doesn't work.

ESPHome however, loves mDNS - to discover and install devices.

I've just bought a bunch of the Athom Smart Plugs, and want to rename some of their outputs to get sensible labels - as well as generally just manage them.

ESPHome's Config Files¶

ESPHome is actually very well documented but it can be hard to figure out what it's documenting sometimes, since there's a combination of device and environment information in it's YAML config files. This is fine - it's a matter of approach - ESPHome likes to think of your environment as a dynamic thing.

For our purposes the issue is we need to make sure ESPHome knows to connect to our devices at their DHCP fixed IP addresses - and to do this we need the wifi.use_address setting - documented here.

This setting is how we solve the problem: we're not going to set a static IP on the ESPHome device itself (since we're letting DHCP handle that via a static reserved - i.e. a fixed IP in Unifi where I'm actually doing this). Instead, we're just telling ESPHome how to contact this specific device at it's static IP (or DNS name, but I'm choosing not to trust those on my local networks for IOT stuff.)

Importantly: wifi.use_address isn't a setting which gets configured on the device. It's local to the ESPHome application - all it does is says "use this IP address to communicate with the device". i.e. you can have a device which currently has a totally different IP address to the one you're configuring, and as long as you set use_address to the current value it's on, ESPHome will update it. This is very useful if you're changing IP addresses around, or only have a DNS name or something.

The other important thing to note about this solution is that when you're not using mDNS, you're going to want to set the environment variable ESPHOME_DASHBOARD_USE_PING=1 on the ESPhome dashboard process. This simply tells the dash to use ICMP ping to determine device availability, rather then mDNS, to have your devices show up properly as Online (though it doesn't much affect usability if you don't).

The Solution¶

User Level¶

To implement this solution for each of my smart devices, I have a stack of YAML files which layer up to provide the necessary functionality following some conventions.

At the top-level is the "user" level - one specific device on the network. After it's booted and been initially joined to my IOT SSID, it gets a YAML file named after it that looks like this.

# sp-attic-ventilation.yaml
packages:
  athom.smart-plug-v2: !include .common.athom-smartplug-v2.yaml

esphome:
  name: "sp-attic-ventilation"
  friendly_name: "Attic Ventilation"
  name_add_mac_suffix: false

wifi:
  use_address: 192.168.210.66

There's not much here - just the IP address which I assigned, plus a name which is the same as the hostname I assigned which follows the nominal convention of <device-type-abbreviation>-<location>-<controlled device>. So smartplug - sp, located in the attic, controlling the ventilation. You don't have to do this - but it helps. Then we include the friendly name - this will appear in Home Assistant, and disable adding the MAC suffix - this is a handy default when you're installing and configuring multiple devices initially using fallback APs.

The important part here is to note the include file: ESPHome's web interface will automatically hide a file named secrets.yaml as well as any files prefixed with . which is a convenient way to manage templates and packages.

Device Common Files¶

The next step up in the stack is a device-common file. Athom Technology publish these on their Github account. This sort of thing is why I love Athom and ESPHome - because we can customize this to work how we want it too. The default smart plug listing is here, but we're going to customize it though not extensively - namely we're adding this line:

packages:
  home: !include .home.yaml

I've included my full listing here (note the removed "time" section).

The Home File¶

The Home file is the apex of my little ESPHome config stack. In short it's the definition of things which I want to be always true about ESP devices in my home. All of the settings here can be overridden in downstream files if needed, but it's how we get a very succinct config. There's not a lot here but it does capture the important stuff:

# Home-specific features
mdns:
  disabled: false

web_server:
  port: 80

# Common security parameters for all ESPHome devices.
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  domain: !secret domain
  
  ap:
    password: !secret fallback_wifi_password

ota:
  password: !secret ota_password

time:
  - platform: sntp
    id: my_time
    timezone: Australia/Sydney
    servers:
    - !secret ntp_server1

This file extensively references into secrets.yaml, which is templated by my Ansible deployment playbook for ESPHome (which in turn uses my Keepass database for these values). It mostly sets up the critical things I always want on my smart devices: namely, the onboard HTTP server should always be available (life-saver for debugging and a fallback for control - every ESP chip I have seems to run it fine).

One of the crucial things I do is hard code the wifi parameters: the reason I do this is because for as many devices as possible I disable persistent storage to protect the ESP write flash. It's enabled for the smart plugs because they don't change state very often, but for something like a light controller it's a waste of flash cycles. But this does mean that if the wifi settings are configured via the fallback AP mode, they'll be lost if there's a power cut - and then all my devices will turn on AP mode and need to be reconfigured.

This is also the reason you definitely want to configure wifi.ap.password: because if your devices are unable to connected to your wifi (by default for 1 minute), or don't persist settings and are down, then the first thing they'll do (and out of the box Athom devices do this becaue obviously you need to configure them yourself) is open a public wifi network to let them be configured by just any random passer-by. The consequences of this range from someone having some fun toggling a button to someone implanting an advanced persistent threat.

For much the same reason, you should also configure an over-the-air password - ota.password. There's a difference between control of a device and being able to flash firmware, so this should be enforced. This value lives in my password manager, so I'll always have it around.

Beyond that there's just convenience: i.e. I force NTP to point to the Unifi router on my network so everyone has a common agreement on the definition of time.

Alternatives¶

Static IPs¶

ESPHome does have full support for static IPs via the wifi.manual_ip parameter. It would be entirely valid to take our wifi section from above and change it to look like this:

wifi:
  use_address: 192.168.210.66
  manual_ip:
    static_ip: 192.168.210.66
    subnet: 255.255.255.0
    gateway: 192.168.210.1
    dns1: 192.168.210.1

This device would work just fine on a network without DHCP - it would come up, grab an IP and be happy. The reason I don't do this is convenience of management: having the devices send DHCPDISCOVER packets is a nice way to make sure they're alive, and turns control of the isolated network segment they're on more over to my Unifi Router, which is what I want. If I want to re-ip a network, then updating static address allocations centrally is more convenient (you do have to coordinate rebooting the devices, but they will "get it").

You could obviously do all sorts of fancy scripting around this, but all of that is a lot of work for a very limited gain.

Enable mDNS¶

ESPHome uses mDNS extensively, and even with an isolated network you can make it work: my Home Assistant and ESPHome docker containers have IP addresses on that network segment so they can talk to these devices, and as a result they can also receive mDNS from them provided I configure it to be bridged properly.

The reason not to for me is ultimately just that keeping track of a list of IPs is simple: whereas mDNS in more complicated network arrangements like mine is not, and the complexity just isn't worth it - once configured, I never have to really think about these devices. I've lost my Unifi router config and just restored it from a backup and everything was fine. My configs are tracked in Git, my passwords in Keepass - rebuilding this environment is straightforward.

Conclusions¶

If you're trying to figure out how to flash an ESPDevice, you need to set wifi.use_address to the known IP of the device.

In an environment with DHCP Fixed IP addresses, this means you'll include this value in your ESPHome YAML config files, and it should match your static reservations.

A convenient way to do this is to layer your ESPHome YAML files, with your vendor/device-type files in the middle of the "stack".

Logitech G815 Review / Impressions

Will Rouesnel

2022-10-22 21:14

Logitech G815 Review / Impressions¶

I recently decided I wanted to upgrade my keyboard. I had two principle goals: the first was to find a production keyboard I could still buy. My former go to was the Logitech K740 (Logitech Illuminated Keyboard) which had been out of production for a very long time. The last time I tried to replace one I ended up buying about 3 keyboards off eBay before I suceeded in getting what I was actually after.

With that one now on the way out due to the key caps breaking off on frequently used keys like the backspace, and some suspected trouble with key registration it seemed like it was finally time to choose a new keyboard and adapt to it. The typing experience and it's ergonomics has become important to me, between age and profession, so it's a big decision.

Why a mechanical keyboard?¶

I've been curious to try a mechanical keyboard essentially due to hype, although there is some solid logic behind it. My K740s have failed due to the scissor-type plastic (nylon) mechanism failing, and once it goes there's nothing you can do. They also build up dust underneath the keys, but removing the key caps is not super-well supported - and I've lived with a very fiddly backspace for a while now, as well as some problems with key registration if I don't hit the larger keys (backspace, tab, enter) suitably dead-center.

To be clear: these are emergent problems - as new, the keyboards were solid but they failed in a predictable way.

So what I'm looking for by going with a mechanical keyboard is improved durability for key registration, and a nice typing experience. With the G815 I'm buying a gaming keyboard, but I'm buying it because I want good key registration for typing.

G815: First impressions - there's an ergonomics change¶

The K740 is a very thin keyboard, with a built in palm rest. It is 9.3mm thick - that is incredibly slender, and no mechanical keyboard is going to beat that. The G815/915 series is the thinnest mechanical keyboard on the market at 22mm thick, but that's still more then double. Up front: It's noticeable, my typing position was substantially changed.

The G815 doesn't come with a palm rest out of the box: people have said they don't think it needs it, I would disagree. The first thing I found myself doing was raising my arm rests to get my hands flat to the keyboard. It's what I'm doing while typing this review. I'll be buying a palm rest soon and updating this post when I do.

The G Keys¶

The bigger issue I found, which I did not see talked about before buying in the reviews and is probably universal to this type of gaming keyboard design is the addition of the G keys to the left hand side of the keyboard.

I did not realize this before I bought the keyboard because it's a habit I do without thinking about it, but I essentially use my left hand to find the top-left of the keyboard when typing with my pinky finger. On a regular keyboard, holding the top-left of the chassis like this works fine because it's pretty well lined up with escape and the top row of number keys.

The addition of the G keys however changes the ergonomics of this in a big way - my initial attempts at typing were frustrated and difficult because all my instincts about where the keys are were wrong: I'm so used to using that pinky to control where the top of the keyboard is that it was very difficult to adapt without it. If you are considering this keyboard, or any gaming style keyboard with extra left hand macro keys, you would be well advised to really check if this is something you're doing: it was a huge surprise to me, and the change in how I type is, as of writing (so about 45 minutes after unboxing it) still feeling rough. I'm expecting to adapt, but I'm also feeling a muscle strain in my left arm due to the new typing position so it's not an easy adaptation, and as noted above may involve more peripherals to get it comfortable.

I strongly encourage not underestimating this - this is a peripheral I use for 8 hours a day for my job. It's function and whether it causes muscle strain is vital.

The Key Action¶

Mechnical keyboards are all about the key action of th keyboard. I can't give any advice here: YouTube will show you people using it, how it sounds and tell you how it feels but it is something which needs to be experienced for yourself. I can say that despite my complaints about the additional G keys, and the fact it's not as thin as the K740, the "Linear" type key model fo the G815 feels great to type on when you're in the zone on it. The action is smooth, comfortable and feels solid - this is consistent with some other reviews which noted that the Linear key switches tended to feel the best after a little while of typing, and this I can believe.

Some very good advice when you get into reviewing keyboards and other "things you never think about" is that almost all of them can be criticized - perfect doesn't exist, and the criticisms always feel louder then the good points. The most I can add here is, if you can use one in person, then that's the best way to explore the space (this is an expensive keyboard, so just buying a whole lot of them - as I suspect gets most YouTubers into making YouTube videos about keyboards - is a danger).

Conclusions - we'll see¶

It's no fun getting a fairly expensive new thing and feeling "hmmm" about how well it works. The G keys might be the real problem here - that change in typing experience was a huge surprise to me, so if you find this review then that's my core take away: be wary of layout changes like that. There is a numpad-less variant of the G815 which can be had, but I like my media keys and numpad so that's why I bought the larger one. If you don't need or want a numpad, then I'd recommend that one at the present time - no G keys means no problems.

I'm hoping at the moment I'll adapt to the G keys: their potential utility is high (though you can't program them on Linux), but if I could buy a full-size variant without them tomorrow I'd do it and not bother with the adaptation.

But the keys feel great to use, so hence the conclusion: we'll see.

Conclusions Update (same day) - went back to the K740¶

This is probably a good gaming keyboard.

I say that because I'm sure the G keys are effective for gaming purposes. But for the way I type, which is not true touch typing, the presence of the G keys and the offset they introduce had two pronounced effects: (1) it was almost impossible for me to re-centre my typing of the keyboard when I moved my hands away without a pronounced and noticeable process of feeling out where the top-left edge of the keyboard is.

The problem of key-centering was replicable with my wife, who has much smaller hands, typing on the keyboard - she found the same subtle problem trying to line up, finding she inevitably ended up hitting the caps lock key when she did.

The second problem (2) was wrist strain: because the G keys are actual keys and live on the left hand side of the keyboard, my natural resting position for my left hand which is off to the side with my palm free introduced a great deal of strain to my left arm specifically. The pictures below of my hands sort of show the problem - on the top is my backup K740 and the bottom the G815:

K740 resting position G815 resting position

This is with my hands trying to rest in a ready position on the keyboard: you can see the problem - I'm having to actively support the left hand to stop it from depressing the G keys. In my experience put a strain through the tendon running right up my arm and was quite painful after a short amount of use. It is possible a wrist rest would help fix this problem, but I'm not wild about the prospect since it's not an included feature of the keyboard unlike the K740, and I also do not experience this problem using other normal thickness keyboards - this seems to be an issue specifically with how I hold my hands to type and the existence of the extra macro row.

Wrapping Up¶

None of the reviews I read or watched for this keyboard before buying it mentioned this possible issue with the full-size keyboard and G keys, though I do recall that most reviewers favore using TKL (ten key-less) variants of the keyboard for endurance typing - which notably does not have the G keys.

Please keep in mind that if you're reading this, this is all based on quirks of typing which may be specific to just how I hold my hands - I am not a touch typist, just a decently fast one from long practice and most of my typing is done using two-fingers on each hand. You may have a fundamentally different experience with this keyboard then I do.

But, I have seen no reviews of gaming keyboards with these extra macro keys in this position which commented on the possible issues in use that they may introduce - it was a huge surprise when I opened this, and significantly impactful in a very direct way.

Easy Ephemeral Virtual Machines with libvirt

Will Rouesnel

2022-08-21 21:11

The Situation¶

At a previous job I was finally fed up with docker containers: generally speaking I was always working to setup whole systems or test whole system stuff, and docker containers - even when suitable - don't look anything like a whole system.

While Vagrant does exist, there was always something slightly "off" about the feeling of using it - it did what you want, but had a lot of opinions on it.

So the question I asked myself was, what was I actually wanting to do?

What we want to do¶

Since this was a job specific issue, the thing I wanted to do was boot cloud-specific environments quickly in a way which would let me deploy the codebase as it ran in the cloud. The company had since simply moved to launching cloud VM instances for this on AWS, but ultimately this left holes in the experience - i.e. try getting access to the disk of a cloud VM - on my local machine I can just mount it directly, or dive in with wxHexEditor if I really want to - on the cloud I get to spend some time trying to security manage an instance into the right environment, attaching EBS volumes and...just a lot of not the current problem.

So: the problem I wanted to solve is, given a cloud-init compatible disk image, give myself a command line parameter which would provision and boot the machine with sensible defaults, and give me an SSH login for it that would just work.

The Solution¶

What I ended up pulling together to do this is called kvmboot and for me at least works pretty nicely. It has also accidentally become my repository for build recipes to get various flavors of Windows VMs kicked out in a non-annoying state as quickly as possible - the result of the job I took after the original inspiration.

The environment currently works on Ubuntu (what I'm running at home) and should work on Fedora (what I was running when I developed it - hence the SELinux workarounds in the repository).

What it is is pretty simple - launch-cloud-image is a large bash script which spits out an opinionated take on a reasonable libvirt. libvirt ships with a number of tools to accomplish things like this, but no real set of instructions to produce something as useful as I've found this customization - of course that might just be me.

Usage¶

The basic usage I have for it today is setting up Amazon AMI provisioning scripts. Amazong provide a downloadable version of Amazon Linux 2 for KVM, and launch-cloud-image makes using it very easy: -

kvmboot $ time ./launch-cloud-image --ram 2G --video amzn2-kvm-2.0.20210813.1-x86_64.xfs.gpt.qcow2 blogtest

xorriso 1.5.2 : RockRidge filesystem manipulator, libburnia project.

Drive current: -outdev '/tmp/lci.blogtest.userdata.3dQylgsKb.iso'
Media current: stdio file, overwriteable
Media status : is blank
Media summary: 0 sessions, 0 data blocks, 0 data, 51.0g free
xorriso : NOTE : -blank as_needed: no need for action detected
xorriso : WARNING : -volid text does not comply to ISO 9660 / ECMA 119 rules
xorriso : UPDATE :      12 files added in 1 seconds
Added to ISO image: directory '/'='/tmp/lci.blogtest.userdata.kq9RDblTKJ'
ISO image produced: 41 sectors
Written to medium : 192 sectors at LBA 32
Writing to '/tmp/lci.blogtest.userdata.3dQylgsKb.iso' completed successfully.

xorriso : NOTE : Re-assessing -outdev '/tmp/lci.blogtest.userdata.3dQylgsKb.iso'
xorriso : NOTE : Loading ISO image tree from LBA 0
xorriso : UPDATE :      12 nodes read in 1 seconds
Drive current: -dev '/tmp/lci.blogtest.userdata.3dQylgsKb.iso'
Media current: stdio file, overwriteable
Media status : is written , is appendable
Media summary: 1 session, 41 data blocks, 82.0k data, 51.0g free
Volume id    : 'config-2'
User Login: will
Root disk path: /home/will/.local/share/libvirt/images/lci.blogtest.root.qcow2
ISO file path: /home/will/.local/share/libvirt/images/lci.blogtest.userdata.3dQylgsKb.iso
Virtual machine created as: blogtest
blogtest.default.libvirt : will : aedeebootahnouD7Meig

real	0m16.764s
user	0m0.326s
sys	0m0.077s

16 seconds isn't bad from nothing to what I'd get an in EC2 VM - and since I have SSH access I can jump right into using Ansible or something else to provision that machine. Or just alias it so I can kick one up quickly to try silly things.

kvmboot $ ssh will@blogtest.default.libvirt

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
19 package(s) needed for security, out of 59 available
Run "sudo yum update" to apply all updates.
[will@blogtest ~]$ # and then you try stuff here

What's nice is that this is absolutely standard libvirt. It appears in virt-manager, you can play around with it using all the standard virt-manager commands and management. It'll work with remote libvirtd's if you have them, but it's a super-convenient way to use a barebones VM environment - about as easy as doing docker run -it ubuntu bash or something similar, but with way more isolation.

But it also works for Windows!¶

This was the real joy of this solution: when I stumbled into a bunch of Windows provisioning, I'd never had a good solution. But it turns out launch-cloud-image (I should probably rename it kvmboot like the repo) actually works really well for this use case. By the addition of an installation mode, and some support scripting to build the automatic installation disk images, it can in fact support the whole lifecycle to go from "Windows ISO" to "cloud-initable Windows image" to "Windows workstation with all the cruft removed".

As a result the repository itself has grown a lot of my research into how to easily get usable Windows environments, but it does work and it works great - with Windows 10 we can automate the SSH installation and have it drop you straight into Powershell, ready for provisioning.

Conclusion¶

I use this script all the time. It's the fastest way I know to get VM environments up which look like the type of cloud instance machines you would be using in the public cloud, and the dnsmasq integration and naming makes them super easy to work with while being standard, boring libvirt - no magic.

Log OpenSSH public keys from failed logins

Will Rouesnel

2022-08-09 09:46

Problem¶

I setup an autossh dialback on a machine in the office and forgot to note down the public key.

While certainly not safe to do so, how hard could it really be to grab the public key from the machine with the fixed IP that's hitting my server every 3 seconds for the last 24 hours and give it a login (to be clear: a login to my reverseit tool which is only ever going to allow me to connect back to the other end if it is in fact the machine I think it is).

Solution¶

This StackOverflow solution looks like what I needed, only when I implemented it the keys I got back still didn't work.

The reason is because: you don't need to do it.

As of OpenSSH 8.9 in Ubuntu Jammy, debug level 2 will produce log messages that start with

debug2: userauth_pubkey: valid user will querying public key rsa-sha2-512 AAAAB3Nz....

and just give you the whole public key...almost.

The problem is OpenSSH log messages are truncated by default - if longer then 1024 characters to be precise, which modern public keys are longer than (when RSA - ECC would fit).

This is controlled by a #define in log.c:

#define MSGBUFSIZ 1024

Upping this to 8192 I recompiled and...it still didn't work.

Pasting the log lines I was getting into VS Code, I found that all of them were exactly 500 characters. That sounds like a format string to me, so some more spelunking and there it is - in log.c there's the do_log function with this line:

		openlog(progname, LOG_PID, log_facility);
		syslog(pri, "%.500s", fmtbuf);
		closelog();

I'm guessing this is to work with legacy syslog limited to about 512 byte messages. We're trying to log to journald so let's just increase that to 8192 and try it out.

debug2: userauth_pubkey: valid user will querying public key rsa-sha2-512 AAAAB3NzaC1yc2EAAAADAQABAAABgQCklLxvJWTabmkVDFpOyVUhKTynHtTGfL3ngRH41sdMoiIE7j5WWcA+zvJ2ZqXzH+b5qIAMwb13H4ZkXmu6HLidlaZ0T9VBkKGjUpeHDhJ4fd1p+uw9WTRisVV+Xmw9mjbpiR8+AGXnoNwIeX5tMukglAFwEIQ8GQtM8EV4tS36RWxZjOSoT5sQlAjYsgEzQ7PHXsH3hgM7dyIK1HXrr2XcwFZPCts2EhOyh4e0hyUsvm9Nix2Y7qlqhFA+nH4buuSNpJZ2LjNb9CmWo5bjiYvrRLnU0qJMuPXp0jJeV+LwGA+W/JMbsep9xoqSA6aEQvlRUQx5jRyaJZf9GKqGBNe+v55vEbaTb+PXBU4o7nVFGCygZj2fLrW475o7vZBXJJjdgW/rZ1Eh4G2/Aukz3kfrMiJynRQOc5sFHL1ogZhHEVDqViZVLAHA2aoMCYtrsBJ9BBr/r73bzs9HbsND1wqi5ejYSiODZwX0DGmWZD21OPAj/SDMPUap6Nt/tG7oqs0= [preauth]

Oh wow - there's a lot there! in fact there's the [preauth] tag at the end which is completely cut off normally.

Full Patch¶

patch
diff --git a/log.c b/log.c
index bdc4b6515..09474e23a 100644
--- a/log.c
+++ b/log.c
@@ -325,7 +325,7 @@ log_redirect_stderr_to(const char *logfile)
 	log_stderr_fd = fd;
 }
 
-#define MSGBUFSIZ 1024
+#define MSGBUFSIZ 8192
 
 void
 set_log_handler(log_handler_fn *handler, void *ctx)
@@ -417,7 +417,7 @@ do_log(LogLevel level, int force, const char *suffix, const char *fmt,
 		closelog_r(&sdata);
 #else
 		openlog(progname, LOG_PID, log_facility);
-		syslog(pri, "%.500s", fmtbuf);
+		syslog(pri, "%.8192s", fmtbuf);
 		closelog();
 #endif
 	}
--

Use git apply in the working tree of the OpenSSH, which I recommend editing with dgit.

Conclusions¶

OpenSSH does log offered public keys, at DEBUG2 level. But on any standard Ubuntu install, you will not get enough text to see them.

The giveaway for, at least these logs being truncated is whether you can see [preauth] after them. This behavior is kind of silly (and should be configurable) - ideally though we would at least get a ... or <truncated> message when this is happening because with variable length fields like public keys it is not obvious.

Jipi and the Paranoid Chip

wrouesnel

2022-06-20 15:21

This is a short story by Neil Stephenson which used to be hosted online here. It's outlined more in the wikipedia article here and I've been wanting to read it again due to the recent furor surrounding Google's LaMDA (Is Google’s LaMDA conscious? A philosopher’s view).

But alas! The original hosting returns a 404 now: fortunately the Google cached version is still available and I've downloaded that and made it part of my private collection.

So: to ensure this stays up I'm also including the cached copy as a part of this blog. It goes without saying that all rights to this story belong to the original author.

Click here to read Jipi and the Paranoid Chip (or any of the links above).

Install Firefox as a deb on Ubuntu 22.04

Will Rouesnel

2022-05-04 11:38

Introduction¶

Ubuntu 22.04 removes a native Firefox package in favor of a snap package. I'm sure this has advantages.

But the reality for me was several fold: startup times were noticeably slower, and the selenium geckodriver just plain didn't work for me (issue here), with some debate online but no canonical solution. I also couldn't get Jupyterlab to autolaunch (minor, but annoying).

Solution below reproduced from https://balintreczey.hu/blog/firefox-on-ubuntu-22-04-from-deb-not-from-snap/ with adaptations which worked for me.

Solution¶

You can still install Firefox as a native deb from the Mozilla team PPA. The process which worked for me was:

Step 1¶

Add the (Ubuntu) Mozilla team PPA to your list of software sources by running the following command in the same Terminal window:

sudo add-apt-repository ppa:mozillateam/ppa

Step 2¶

Pin the Firefox package

echo '
Package: *
Pin: release o=LP-PPA-mozillateam
Pin-Priority: 1001
' | sudo tee /etc/apt/preferences.d/mozilla-firefox

Step 3¶

Ensure upgrades will work automatically

echo 'Unattended-Upgrade::Allowed-Origins:: "LP-PPA-mozillateam:${distro_codename}";' | sudo tee /etc/apt/apt.conf.d/51unattended-upgrades-firefox

Step 4¶

Install firefox (this will warn of a downgade - ignore it)

sudo apt install firefox

Step 5¶

Remove the Firefox snap

sudo snap remove firefox

Conclusion¶

This worked for me - Firefox starts, my existing Selenium scripts work.