I have a project (Home Automation Upgrade (2022)) that requires a virtual machine that boots by EFI (Extensible Firmware Interface), and as part of this campaign I needed to upgrade my procedures for creating virtual machines in general. In particular I wanted to burst into the new millennium with a btrfs (B-tree filesystem) root, which has been the default for OpenSuSE for several years and which has a lot of technical advantages.
Of course I knew that all this would be a big learning experience, so
instead of delaying the upgrade by making the btrfs root a prerequisite for
running Home Assistant, I used the Debian x86_64 image for Home Assistant. It's
working well and has given no trouble. But the OS and kernel are not going to
get updated, just the Home Assistant software, and we don't get root access to
the OS, similar to the situation with commercial home automation appliances.
[Update: the OS and kernel do get updated, removing my biggest
objection to using the H-A OS image.] All my other machines run OpenSuSE
Tumbleweed and my system administration software is configured for that distro,
so I continued the sub-project to put OpenSuSE's package of Home Assistant
Supervised
on a second VM running Tumbleweed.
Dramatis personae: this subproject involves five hosts:
Orion is the virtual machine that is going to get Home Assistant Supervised. It is my first machine with a btrfs filesystem, and the third with EFI booting (the first VM with EFI). It is currently operational and is awaiting installation of Home Assistant Supervised.
[Update: I discovered something nasty while putting together all the pieces of the puzzle. H-A Core lives in a Docker instance, for which they want the overlay2 union filesystem. But the Docker docs warn that overlay2 is bad mojo on any underlying filesystem that has copy-on-write, specifically btrfs; it is typically used over ext4. Docker does have a storage driver specifically for btrfs, but it isn't the storage driver that the H-A instructions specify. I see a big can of big worms opening, discussed more at the end of this document. (A quick check of the storage driver in use appears just after this list.)]
Dragon currently is running the Home Assistant OS image, but when ready, Orion will be renamed to Dragon. Dragon has EFI, but I actually don't know what it's using as a root filesystem.
Ocelot is a disused VM for a project that didn't work out. But I saved its identity, IP addresses, and host keys. I'm temporarily resurrecting it for this demo.
Jacinth is the host of Dragon and Orion. It's an Intel NUC6CAYH with a Celeron J3455 CPU. Its main jobs are main router and directory service master. It runs 24/7, and the low power of the Celeron is appreciated. And it's not that slow, though others of my machines are faster. My IoT devices use Z-Wave to communicate, via a proprietary USB dongle (radio) called a Z-Stick, which needs to be passed through to Dragon. To avoid taking it away from Dragon while I try to set up Orion, I provided a flash memory stick that can be passed through to Orion instead. On it I deposited my SSH public key, which the Tumbleweed installer will offer to copy to become /root/.ssh/authorized_keys, helpful for post-installation setup.
Iris is the host of Ocelot. It's identical to Jacinth, effectively a hot spare, but its main jobs are backup storage, distro installation master, and audio-visual. As on Jacinth, I provided and passed through a flash memory stick with my SSH public key. Fakeout: the stick had an old rescue disc on it, and when Iris woke from Linux hibernation, it booted into the rescue disc. Wipe and reformat the media if needed.
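Apropos of the Orion update above: a quick way to see which storage driver a given Docker daemon is actually using, as a sketch (run it wherever Docker is installed):
docker info --format '{{.Driver}}'
# prints the storage driver in use, e.g. overlay2 or btrfs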
Here's what I did to get OpenSuSE Tumbleweed (with jimc package selection, admin scripts, etc) onto a KVM-qemu virtual machine with EFI booting and a btrfs root filesystem. The reader should keep in mind that, while documenting this installation, my goal is not only to share commonly applicable points with the general community, but also to aid my memory in reproducing my idiosyncratic preferred features on future virtual machines. This to-do list is the union of quite a number of false starts and repair interventions. It was done first on Orion, and again on Ocelot to verify that the procedure was working reliably and/or to shake out remaining bugs. The procedure is described in terms of Ocelot.
This is all happening on the VM's host: Jacinth hosting Orion, or Iris hosting Ocelot. On my machines that host VMs I have a partition with a lot of free space, called /s1, a directory /s1/kvm, and within that a home directory for each guest, e.g. /s1/kvm/ocelot. The VM with the Home Assistant image is called dragon, and in a very few places this name appears on Ocelot, to imitate the procedure on Orion, where I wanted to avoid forgetting to rename things when Orion is eventually renamed to Dragon.
Create the Installation DVD
I knew that I would need to repeat these steps several times (ended up 8 times), so I downloaded the OpenSuSE Tumbleweed installation DVD. Normally the ISO file would go on the VM host, but I've found it convenient to put one copy on my installation server, and let all the hosts get it by NFS. How I use it:
ln -s /net/distro/s1/SuSE/SuSE-build/x86_64/99.8/iso/openSUSE-Tumbleweed-DVD-x86_64-current_j.iso ocelot-cd.iso
I'm creating new XML from scratch using virt-manager, rather than copying an old VM's XML, to get the latest best practices and to leave behind accumulated cruft. Then I will compare the new and old XML, and I'll bring forward a few important details.
Prerequisite: you will need to install SuSE package qemu-hw-usb-host to pass the Z-Stick through to the guest, as well as to create the XML to do the pass-through. Do that now if not yet installed.
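On OpenSuSE the package comes from the standard repos; installing it on the host is just:
zypper install qemu-hw-usb-host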
Prerequisite: Make sure the installation disc can be read. If NFS mounting is flaky, virt-manager will hang until it comes back (i.e. forever) when writing out the XML, and you can't kill it. The file command has to actually read the file, so use this command. Do it in the background so if it hangs you can just ignore it. -L means to follow the symbolic links.
file -L ocelot-cd.iso &
Start up the virt-manager GUI (no command line arguments).
Create a new VM; use the icon or the file menu. It will pop up a sequence of windows to gather information.
In the first pop-up, it wants to know where the OS installation media is. Click Local Install Media. The architecture should be x86_64. Click Forward.
Use the file browser to find the link to the ISO image (ocelot-cd.iso). Click on Browse Local to look outside the default directory of /var/lib/libvirt/images, which is too small with my partition layout.
Specify the OS type. There's a checkbox to automatically detect it from the installation media, but it couldn't figure it out; turn that off. Type openSUSE Tumbleweed; just type the first few letters and a match list will be shown that you can pick from. Click Forward.
Give it 2048 MiB of memory and 2 CPUs, my usual resources for this kind of VM. Click Forward.
Enable storage: turn it on. Pick Select Custom Storage. Click on Manage to get a file browser. Go direct to Ocelot's homedir; look in the sidebar. Click on ocelot-disc1.raw, then Choose Volume. Click Forward in the containing popup.
Ready to begin installation: change the VM's name (to ocelot). Click on Customize configuration before install.
I always use bridge networking and the host already has a bridge set
up: br0. Under Network Selection (click on the headline), pick Bridge
Device and fill in the device name br0.
Now we're on the customization page.
Under Overview, choose the firmware. The choices offered are all builds of the TianoCore EFI booter, but they have different certificates installed, used when verifying boot images. The ones whose names include the string ms have Microsoft signing certs, opensuse is for Tumbleweed and Leap, and suse has SLES certs. The ones without a vendor name have no certs and will fail Secure Boot unless you import your own cert (or turn it off). Since this VM will run Tumbleweed, I picked UEFI x86_64: /usr/share/qemu/ovmf-x86_64-opensuse-code.bin.
The guest's MAC address is assigned by virt-manager; the corresponding IP address is unknown at this point, and DHCP will reveal it later.
Run lsusb on the host to discover the vendor and product identifiers (4 hex digits each) for both the flash stick and the Z-Stick; a sketch follows below.
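What to look for in the output; the IDs shown here are hypothetical examples, yours will differ:
lsusb
# Bus 002 Device 003: ID 0658:0200 Sigma Designs, Inc.          <- the Z-Stick
# Bus 002 Device 007: ID 0781:5567 SanDisk Corp. Cruzer Blade   <- the flash stick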
Steps that virt-manager does autonomously:
The boot process:
The DVD's boot menu appears; pause on the Installation entry so you can take notes. Then move the highlight to Installation and press Return.
Early installation steps:
Here we reconcile SuSE and local policies about storage, and develop new policies for btrfs.
It would be very convenient if Guided Setup could be used. But they propose to give 0.5GiB to the EFI partition. I don't know what they're planning to stuff into 0.5GiB, but that's way overkill. Debian, Raspbian, Ubuntu and Gentoo put the kernel and initrd into the EFI partition, and I found that 64MiB was enough for one kernel at a time (Gentoo on Raspberry Pi). Tumbleweed puts the kernel in /boot which, if not a separate partition, is in the root, and only actual EFI stuff is in the EFI partition: currently just over 10MiB. On a machine dual-booting Windows this went up to 20MiB. Normally I generously allow 64MiB. But you can't shrink partitions like that and transfer the space to other partitions; you have to delete all the partitions and do them over in the Expert Partitioner.
Start up Expert Partitioner. Start with Existing Partitions (there aren't any, yet). Select /dev/vda. By default they put the partition table in a GPT (GUID Partition Table), vs. MBR (Master Boot Record). Make sure you're getting a GPT; it's technically much superior.
This procedure will be repeated for the EFI, swap and root partitions, in that order. After selecting /dev/vda or one of its partitions, click on Add Partition. Go through these pages (illustrated for EFI):
In Fstab Options, mount by device path, not UUID: while a UUID is guaranteed unique, the resulting 128 bit (32 hex digit) integer is unreadable and isn't recognized as a name by humans. (Click on OK, and Next on the containing page.) A command-line sketch of the finished layout follows below.
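For the record, here is a command-line equivalent of the layout I build in the Expert Partitioner. This is a sketch only: the sizes are my choices, the disc is assumed to be /dev/vda, and the installer, not parted, is what I actually used.
parted -s /dev/vda mklabel gpt
parted -s /dev/vda mkpart EFI fat32 1MiB 65MiB             # 64MiB EFI system partition
parted -s /dev/vda set 1 esp on
parted -s /dev/vda mkpart swap linux-swap 65MiB 2113MiB    # 2GiB swap
parted -s /dev/vda mkpart root btrfs 2113MiB 100%          # btrfs root gets the rest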
Set Hostname via DHCP: change it to No. I want to rename the interface from enp1s0 to en0, but don't try it here; it got confused once and killed the net, preventing installation. Click Next.
This particular instance was created just to verify and collect procedures for the initial installation, so I'm not going to go through the full post_jump procedure (install my wanted packages, (re)enable services, remove unwanted packages, online update, install my configuration files). But there are a few items needed before post_jump that should be documented.
Will it let me log in to the console as root? Yes.
What SSH host keys is it using? The wrong ones, created randomly. How about SSL host keys? It has none, yet.
In each VM's homedir I made a directory called ./config into which I could drop host keys and similar important items, and just rsync the whole directory onto the virtual machine.
To get SSH and SSL right on Ocelot, I did these steps:
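The core of it is to copy the prepared tree onto the running guest and restart the affected service; a minimal sketch, assuming ./config mirrors the guest's root and the guest answers on its DHCP-assigned address:
rsync -av ./config/ root@ocelot:/
ssh root@ocelot 'systemctl restart sshd'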
Additional files that are good to have in ./config/; most are copied from the host.
How to create the host keys and SSHFP records in ./etc/ssh:
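A sketch of one way to do it with ssh-keygen; the particular key types generated are my assumption:
cd ./config/etc/ssh
ssh-keygen -t ed25519 -N '' -f ssh_host_ed25519_key
ssh-keygen -t ecdsa -N '' -f ssh_host_ecdsa_key
ssh-keygen -t rsa -N '' -f ssh_host_rsa_key
# Emit SSHFP resource records for the zone file; hash type 2 is SHA-256:
ssh-keygen -r ocelot -f ssh_host_ed25519_key.pub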
Update jacinth:/home/hostdata/hostdata.db, uncommenting the stanza for Orion. In its hostgroup line change grub_noefi to grub_efi, add btrfs, and add jacinth_vms. To Jacinth's hostgroups add orion_vhost; it should already have dragon_vhost. Add Orion's SSHFP records, just hash type 2 (the long ones).
Install the derived files. Copy changed ones (non-DNS like /etc/hosts) into the post_jump storage area.
An important item that gave me a lot of grief: in
/etc/default/grub you need:
SUSE_BTRFS_SNAPSHOT_BOOTING="true"
See this forum post about the issue. Basically, grub2-mkconfig runs all (or most?) filenames through a function make_system_path_relative_to_its_root which chops off the mount point of the partition containing the file. But on btrfs filesystems only, it prepends the subvolume path (rooted at /@ on SuSE). Grub at boot time cannot read such a path, with the result that after you rebuild /boot/grub2/grub.cfg (e.g. after getting a new kernel), the machine cannot boot. Except, if /etc/default/grub contains SUSE_BTRFS_SNAPSHOT_BOOTING="true", the -r option is added to the call to grub2-mkrelpath (from btrfsprogs package), which makes it behave like on other filesystem types, i.e. no prepended subvolume path.
SuSE includes SUSE_BTRFS_SNAPSHOT_BOOTING="true" by default, but I turned it off because I had not turned on snapshots. Bad move. Now that the option is true again, bootable grub.cfg is being produced again, including on hosts with ext4 root filesystems.
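After changing /etc/default/grub, regenerate the config and make sure the kernel paths came out without the subvolume prefix; the grep is a quick heuristic:
grub2-mkconfig -o /boot/grub2/grub.cfg
grep 'linux.*/@' /boot/grub2/grub.cfg   # should print nothing if the bad /@ prefix is gone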
Before monkeying with Ocelot any more, I'm powering it off and making
a backup copy (took only 2 or 3 mins with this minimal content). Don't
try this with the VM running; if you restore it the result will be
corrupt.
pigz -c ocelot-disc1.raw > ocelot-disc1.pristine
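The matching restore, should it ever be needed (again with the VM powered off):
unpigz -c ocelot-disc1.pristine > ocelot-disc1.raw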
Also, we no longer need the DVD, so I renamed the ISO link to ocelot-cd.iso.nfs and created a zero-length ISO image to replace it. The booter doesn't balk at what looks like removable media that has been removed; it just gives a one-line error message and boots /dev/vda. I can bring the DVD back if I need to use it as a rescue disc (see the More option on the initial screen).
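The rename and placeholder, as a sketch (ocelot-cd.iso is the symlink created earlier):
mv ocelot-cd.iso ocelot-cd.iso.nfs
touch ocelot-cd.iso   # zero-length placeholder keeps the XML's CD path valid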
I want the domain XML in the home directory where I can get at it. The domain XML is hiding in /etc/libvirt/qemu/ocelot.xml, and one could copy it from there, but the straight-arrow way to extract it is:
virsh dumpxml ocelot > ocelot.xml
I also want my NVRAM in the homedir. It's stored in /var/lib/libvirt/qemu/nvram/ocelot_VARS.fd; copy that file (217 bytes) into the homedir. Then edit the XML to point to the new containing directory; look for the <nvram> element early in the file.
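A sketch of the move, with the XML fragment to adjust shown as a comment:
cp /var/lib/libvirt/qemu/nvram/ocelot_VARS.fd /s1/kvm/ocelot/
# then in ocelot.xml make the <nvram> element read something like:
#   <nvram>/s1/kvm/ocelot/ocelot_VARS.fd</nvram>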
To install the altered XML file, just do:
virsh define ./ocelot.xml
This assumes that you didn't alter the VM's name or UUID.
If you did so, e.g. if you're cloning one machine as the basis for
another, be sure to give it a unique name, and execute uuidgen (command
line options are not necessary) to
print out a new UUID, and paste it into the file. If the VM was defined
already with a different UUID, before defining it you need to do:
virsh undefine --nvram ocelot
The --nvram option gives it permission to toss the machine's
nvram; without this, virsh will refuse to undefine.
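Putting it together, a sketch of the re-define sequence when cloning (the sed expression assumes the <uuid> element sits on one line, which it normally does):
virsh undefine --nvram ocelot
sed -i "s|<uuid>.*</uuid>|<uuid>$(uuidgen)</uuid>|" ocelot.xml
virsh define ./ocelot.xml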
I have a script that runs virt-viewer. The command line that it execs,
if my laptop is the host, is:
virt-viewer -c qemu:///system -w -r $guest
Or in the more common case that the host is remote:
virt-viewer -c qemu+ssh://root@$host/system -w -r $guest
To see the early boot process, start virt-viewer before booting
the guest. Or you can start it any time later.
On Iris (the host): virsh start ocelot
It boots right up, no hassle.