Upgrading to OpenSuSE Tumbleweed

James F. Carter <jimc@jfcarter.net>, 2017-05-23

OpenSuSE Leap 42.1, which is what I'm running, has just reached end of life, which is what prompts this upgrade. See SuSE's version lifetime page. 42.2 is expected to last until second quarter 2018, i.e. 1 year more, and I think it's been out about 6 months. 42.3 is expected to come out at the end of July 2017, i.e. in two months.

The Lifetime page says that a major release, i.e. the whole 42.x series, should last 3 years, but minor releases like 42.1 should appear annually and support (i.e. bugfixes and security) will continue for 6 months after the next version comes out. This schedule is very inconvenient for me. I'm going to try to switch to the Tumbleweed rolling distro.

Introduction
Infrastructure Setup
Initial Installation
Template for an Upgrade
Overcoming Problems
Upgrading Production Machines
Remaining Problems
- Fixed Problems

Introduction

Goals, Issues, and Actions

I always begin a project by stating its goals, issues likely to be encountered, and then the actions to be taken. I have a lot of experience upgrading SuSE distros from 6.4 to 13.1 and then Leap 42.1, but these have been fixed, non-rolling distros. Tumbleweed is going to be different in some ways. But I expect that a lot of steps for the fixed distros will carry over to Tumbleweed.

Goals

To get the OpenSuSE Tumbleweed distro on all the machines with the minimum fuss, labor and disruption.
To spread out future upgrades over the whole year, piecemeal, rather than having to do a big upgrade campaign on someone else's schedule.
To add new features that will be useful in our context.
To preserve our unique setup parameters and administrative methods.
To simplify our lives by junking cruft and special cases that are no longer useful.

Issues

Tumbleweed is going to need some changes in administrative practices. In particular, there is no longer an update repo (new packages are added directly to the main distro), so I will need to do zypper dist-upgrade rather than zypper patch. And I will need to retrieve and save the updated RPMs rather than the new patch RPMs. (Update: there is an update repo. Urgent patches go there but are moved to the main repo after system integration tests are finished. At present it has one package, vs. 15651 packages in update/42.1.)
At home I only have x86_64 CPUs. i586 was retired several years ago. Plans for several kinds of ARM machine fell through. Even so, I want to leave a path for ARM support. At work the demise of i586 was anticipated, and either the remnant i586's have been replaced or they have a known upgrade path that can be executed as part of this project.
We've been over-conservative in adopting new features. We need to aggressively identify what's new with this distro version, and get good ones installed (and garbage omitted).

Actions

Here's an overview of installing a new distro:

Infrastructure: Download and emplace the new distro in the enterprise mirror.
Virtual machine: Upgrade the VM up to the minute with the old distro. Save an image to revert to. Upgrade it. Test screwups. Revert to the old image. Repeat.
Package selection: Update /m1/custom/mathnet.sel according to our preferred package selection. We won't know all the package problems until we try to install on the virtual machine. Adjustments to mathnet.sel will be frequent while we prepare for the upgrade.
SuSE Build Service and PackMan: Packages (and their dependencies) that aren't in the main distro will have to be downloaded. Fortunately we have gotten rid of most locally compiled packages.
Configuration files: Do a giant diff between what's on the virtual machine and what Mathnet installs on the previous distro. Bring forward local hacks (unless they deserve to be tossed). Make sure they work in the new distro version.
No hands installer (instsetup): Create an installer image for the new version and make sure it can really upgrade a machine without intervention from the sysop. I've found that it's sufficient to do a dist-upgrade (which did not always work on older versions), and I may bypass instsetup this time around.
Upgrade the production machines. Try to detect screwups early.

Maintaining the Old Static Distro

Presently for the static distros I have this basic infrastructure:

There is an enterprise mirror on distro.cft.ca.us (Diamond). It has an instance (mounted ISO image) of the static main distro, plus an update area which is refreshed weekly. There are also two more repo areas, one for packages obtained from the SuSE Build Service and one for non-distro packages, e.g. PackMan, or locally compiled hacks.
/m1/custom/couchnet.sel and extra.sel have lists of keystone (mostly) packages that are supposed to be on the machine.
The script audit-pkgs has three sub-functions: -i means to install any missing packages from couchnet.sel. -e means to remove any package that is not on the list or is not a dependency of a listed package. -u means to install patches, bringing in new versions of packages that are already installed.
audit-pkgs -u is executed weekly on each host, after patches have been downloaded from SuSE's repo server.
distro.cft.ca.us has a library of configuration file master copies, with an instance for each OS version, and with the ability to restrict to subsets of the hosts. The script sync_jump is used to copy modified files into this library. pushconfig sends the files out to the various hosts, weekly, according to their OS version. The term post-jump is used for this process, carried forward from Sun Microsystems' Solaris installation.
/m1/custom/conffiles on the target contains specially modified configuration files specific to the target, which pushconfig refrains from overwriting.
For installation, the script post_jump executes on the master site with remote execution on the target. It has these phases:
- Consistency checks: does the target have the expected distro version, and is it in the right hostgroups?
- Emplace or refresh the remote execution keys, the repo definitions, and critical infrastructure packages.
- Copy configuration files and admin scripts from the library.
- audit-pkgs -i: Install wanted packages that are missing. In general it is better to install a minimal system and let audit-pkgs -i install desktop and server packages, because at this point zypper has access to the update repo.
- audit-pkgs -e: Remove unwanted packages.
- audit-pkgs -u: Update back-version packages to the latest version.
- audit-scripts enables wanted services and disables unwanted ones.
- Miscellaneous special cases in configuration, like TeXlive font indices, and the Kerberos host keys.
- Run daily housekeeping for the first time.
To install the distro on a new machine I generally either use the network installer or the distro ISO (on USB, usually). I install a minimal system since I've had inconsistent results trying to get the installer to use the update repo. Then I run post_jump.
To upgrade an existing machine to a new distro version, I have two strategies:
- I've found that zypper dist-upgrade (with the new repo definitions but running on the old kernel and utilities) is consistently successful. This was not always true in the past.
- I have a script instsetup which puts a chroot environment on the target, containing a minimal OS of the new version plus the SuSE installer. You boot the new installer kernel and execute the installation script that instsetup customized for the machine. It does zypper dist-upgrade amid various infrastructure steps. This strategy will work in non-upward-compatible situations like when they changed the RPM file compression algo and when I'm changing architecture from i586 to x86_64.
- In both cases I run post_jump afterward to get the updated configuration files and the new package selection.
There is a collection of about 65 functional tests of various services such as sshd, apache2 and cron. Not every service is on every machine; typically 50 services are eligible for checking (and 30 more that get generic checks). I normally run the script checkout.sh after an upgrade, which runs these tests. Also a script does the tests every three hours and restarts (most) services that have died or that fail the functional test.

Changes for Tumbleweed

For using Tumbleweed I want to continue as much as possible with these admin scripts and procedures. I envision these changes:

In a rolling distro they just add/replace new versions of packages in the main distro directory, and I will need to add/delete such packages instead of adding to the update repo. (Tumbleweed does have an update repo but it has only urgent patches that will soon be moved into the main repo.)
On the other hand, I could consider having a complete mirror of the main distro. The disadvantage of the complete mirror is its size and its frequent updates. SuSE warns that the complete content including Tumbleweed occupies 2.95Tb. It should be possible to mirror just Tumbleweed, 40.2Gb. HTTP URL: http://rsync.opensuse.org/tumbleweed/repo/oss/ The following is written assuming I'm storing only the packages I need, not making a complete mirror.
There is a Tumbleweed DVD ISO (3.9Gb) that appears to be updated every day or two. Also the net installer (0.11Gb) and rescue CD (0.65Gb).
Nasty little tidbit: the net installer downloads the installation system, which knows which product it is a part of. This means that I can install from download.opensuse.org, or from a local copy of the DVD (local HTTP), but I can't mix them up or install from a local subset of the distro.
But there's a workaround: when starting the installer put kexec=1 on the kernel command line. It will download the kernel from the installation media to be used, e.g. the local repo, and reboot using kexec.
I need to have the SuSE repo active at all times, so the installer knows about updated packages. See below in New Repo Definitions about priorities.
I also need to configure the lifetime of the package index from SuSE. This is enormous, and likely it will take a long time for each client to download it and rebuild the index of all 39,000 packages. The packages.gz file is 9.0Mb, compared to 1.5Mb for the 42.1 DVD, which has only a subset of the packages.
A web proxy (Squid) could considerably speed up downloading packages and metadata from the SuSE server. However, I should rely on Squid for intra-day downloads, but on cowboy programming, i.e. the local mirror, for long-term storage of RPM files.
For weekly updates, do I have to start the process and then wait for a slow download? How could I do this overnight, similar to the way I refresh the fixed distro's update area? I'll download the Tumbleweed repo metadata (packages.gz), then compare with the metadata of files in the local mirror. If there is an updated version I can download it on the spot. It should be possible to use the requires and provides metadata to recognize when the updated package requires new packages that we need to download. Update: this plan was canceled, too complicated. New versions on the master of installed packages are downloaded without regard to possible new dependencies. When the new version is installed, the new dependencies are downloaded at that time, and Squid will speed that up.
When packages are added to or purged from the local mirrors, their metadata has to be reindexed. I wish this could be done overnight, but I haven't come up with a secure way, so a report will be generated, and reindexing will be part of the later update/installation process.
I still need the SuSE-build local mirror, for packages obtained from developer homedirs or special collections, but presently most of the content is main distro material that isn't on the DVD, and these will migrate to the main distro's local mirror.
I should turn on keep-packages for the SuSE repo. Then after an update/upgrade I will retrieve the downloaded packages (if any), save them in the local mirror, and rebuild its index. The overnight update script is supposed to prevent this from happening, but inevitably I'll need to cut corners causing missed packages.
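A minimal sketch of the keep-packages step, assuming zypper's usual cache location /var/cache/zypp/packages (an assumption; zypp.conf can relocate it). The helper name list_cached_rpms is hypothetical and is not one of the scripts described in this writeup.

```shell
# Hypothetical helper: list the RPM files zypper kept after an upgrade.
# $1 is the cache directory; the default is an assumption about where
# zypper stores downloaded packages on the target.
list_cached_rpms() {
    cache_dir="${1:-/var/cache/zypp/packages}"
    # A missing cache directory just means nothing was downloaded.
    [ -d "$cache_dir" ] || return 0
    find "$cache_dir" -type f -name '*.rpm' | sort
}

# To turn on package retention for one repo (as root on the target):
#   zypper modifyrepo --keep-packages download.opensuse.org-oss
# After the upgrade, on the target:
#   list_cached_rpms
```

The resulting list is what would be copied to the local mirror before the mirror's index is rebuilt.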

I need to create or modify these scripts and/or services:

Set up the Squid web proxy. [Done and working.]
- This is squid-3.3.14.
- 8.3Mb installed.
- Uses /usr/sbin/pinger (its own program) which will override my program of the same name.
- Systemd unit file in /usr/lib/systemd/system/squid.service
- Operational configuration in /etc/squid/squid.conf .
  - cache_dir set to /s1/squid-cache (750 squid:root).
  - Max cache size set to 1Gb (approx). After I upgraded several machines, it's up to only 72Mb.
  - Max size of an individual file is 200Mb. The largest file in the v42.1 distro is 158Mb for images/common-base-x86_64.tar.xz .
  - Using port 3128 which is the default. (Needs firewall hole. Done.)
  - Access control is basically, anyone who can talk to Diamond can use the proxy, but only for filenames with Tumbleweed or OpenSuSE (case insensitive) in the path name.
- In the .repo files you will need something like:
  http://download.opensuse.org/distribution/13.1/repo/oss/?proxy=distro.cft.ca.us&proxyport=3128
  It turns out that you only need ?proxy=http://distro.cft.ca.us:3128 i.e. a complete URL including the scheme and port.
- Needs a functest module. [Done]
We need to change the repo definitions to use the proxy. [Done.] And maybe it's time to make the pathnames a little more logical. [Guess what, I already changed the HTTP and DIR paths for the main distro to be just SuSE, not SuSE-dir.]
audit-repos, when installing repo definitions on the master site, needs to use the local disc rather than HTTP URLs. This way, problems are avoided when you update Apache. Change http://distro.cft.ca.us/SuSE/etc to dir:/s1/SuSE/SuSE-distro/etc (only one initial slash). [Done.]
commrepo (new script) reads the repo metadata files of the SuSE Tumbleweed repo and our local mirror. It emits one list of relative filenames of packages that have been updated on SuSE, plus required packages that are not locally available. It emits another list of obsolete local packages. We should have a policy of how many versions to retain, probably just two. The script should be able to take a list of keystone packages, and to include in the obsolete list all locally stored packages that are not required by any keystone.
commrepo needs a wrapper that downloads the SuSE metadata file, execs commrepo, downloads the listed updated packages, tosses obsolete packages, and rebuilds the local repo index.
I have commrepo mostly written, but I'm getting nervous about security patches, and about how long it's going to take to debug this thing, so I'm deferring completion. I will rely on manual steps and simple scripts instead.
Instead of commrepo, I'm going to have a new script called updaterepo. It will basically be an improved rsyncsuse. See previous discussion of the rsync features appropriate for updating the local mirrors. It will dynamically match up local repos and off-site masters. [Done.]
mksuserepo (existing) needs some way to sign the repo index without human supervision. PROBLEM, not done yet.
audit-pkgs (existing) needs a new mode to do a dist-upgrade. [Done.]
It also needs to refuse to update any of the Apache or Squid components unless given a special flag. If all the hosts are downloading packages from the local mirror, and Apache goes out of service on the master, chaos ensues. The right procedure is, update the master with the Apache flag, wait for it to finish, then run audit-pkgs on the other hosts. The way it will probably work out is, you update all the hosts at once, the one on the master site gives an error message, and you do that one over with the Apache flag set. PROBLEM, not done yet.
retrieve-pkgs (new script), executing on the master, will recover downloaded packages from the target and then purge them. And it rebuilds the mirror index if anything was downloaded. [Done.]
distupgrade (new script) is kind of a mini post_jump that will do much of the post_jump activities, suitable for weekly updates. PROBLEM, not done yet.
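The Apache/Squid guard for audit-pkgs is not written yet; the following is only a sketch of one way it might work, assuming the list of packages to update is available one name per line. The function name filter_web_pkgs and the --web flag are hypothetical.

```shell
# Hypothetical filter: reads package names on stdin, one per line.
# Unless the first argument is "--web", names starting with apache2 or
# squid are held back and reported on stderr; everything else passes.
filter_web_pkgs() {
    allow_web="${1:-}"
    while IFS= read -r pkg; do
        case "$pkg" in
            apache2*|squid*)
                if [ "$allow_web" = "--web" ]; then
                    printf '%s\n' "$pkg"
                else
                    printf 'held back: %s\n' "$pkg" >&2
                fi
                ;;
            *)
                printf '%s\n' "$pkg"
                ;;
        esac
    done
}
```

On the master site the update would then be repeated with the flag set, once the other hosts are not depending on Apache.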

Here's an important detail: what version number do I use for Tumbleweed? In some contexts SuSE uses tw, but my scripts in some cases assume a numerical version number. I'm going to use 99.8.

Infrastructure Setup

Following along in my writeup on Upgrading to OpenSuSE Leap 42.1:

The New Repo Definitions

I need a lot more repos for Tumbleweed than for v42.1. All will need new repo template files except SBS (SuSE Build Service) and CouchNet. Except for Couch-SuSE-tw, the local mirrors contain only installed packages.

Couch-SuSE-tw: main OSS repo. A recent snapshot DVD ISO image is mounted here. It contains only a subset (about 15%) of the main repo packages.
download.opensuse.org-oss: The official main OSS repo.
Couch-Snoss-tw: Local mirror of the main non-OSS repo.
download.opensuse.org-non-oss: The official main non-OSS repo.
download.opensuse.org-twupd: Update repo for emergency patches, both OSS and non-OSS. The name is unofficial.
Couch-Update-tw: Update repo #2 for packages on the main repo that are newer than what's on the snapshot DVD. Old versions are systematically purged when they drop off the master repo (unless installed), unlike with the fixed distro update repo.
Couch-Build-tw: SuSE Build Service (SBS) repo. This is for SuSE packages in special collections. With the fixed distros, the local main repo had only packages on the DVD, and additional mainline packages lived in SBS, but for Tumbleweed they now have their own repo, Couch-Update-tw.
CouchNet-tw: For locally compiled packages and for packages obtained from alien sources.
Couch-PackMan-tw: Local mirror for PackMan packages.
packman-tw: Official PackMan site.

As discussed previously, I don't want to locally mirror the entire Tumbleweed distro (40.2Gb with swarms of updates each day) plus PackMan; instead I want to keep mostly the packages I am actually using. However, I want to do network installations from a local mirror, and that means I need to put on the (local) net an authentic, untampered Tumbleweed snapshot DVD ISO image. [Done.]

I need to set up repo priorities. My goal is for the latest version of each package to be installed, whichever repo has it. But unfortunately Zypper has priorities backward. At least according to the SLES deployment guide dated 2016-03-14, if several repos have a package (referring to the basename), it is downloaded from the repo with the numerically lowest priority (in 1-200, default 99). If there is a choice of versions in repos of the same priority, the latest is preferred, but later versions in higher numbered repos are not considered. If I make the SuSE repo's priority numerically lower than the local mirror, I'll get the latest version, but for every package: the local mirror will never be used. Therefore I will need to identify and download new versions to the local mirror, but only for packages I actually use. This script is called updaterepo.
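To make the priority rule concrete, here is a small sketch that mimics it on a list of candidates. It assumes input lines of the form "repo priority version" separated by single spaces, and it uses GNU sort -V, which only approximates RPM's own version comparison. The function name pick_candidate and the version numbers in the example are made up for illustration.

```shell
# Hypothetical model of the rule described above: only repos with the
# numerically lowest priority are considered, and among those the
# latest version wins. Input: "repo priority version" per line.
pick_candidate() {
    awk '
        { name[NR] = $1; prio[NR] = $2 + 0; ver[NR] = $3 }
        NR == 1 || prio[NR] < min { min = prio[NR] }
        END {
            for (i = 1; i <= NR; i++)
                if (prio[i] == min) print name[i], prio[i], ver[i]
        }
    ' | sort -t ' ' -k3,3V | tail -n 1
}

# Example (illustrative versions only):
#   printf '%s\n' \
#       'Couch-Update-tw 70 2.4.25-3.1' \
#       'Couch-SuSE-tw 70 2.4.25-1.1' \
#       'download.opensuse.org-oss 99 2.4.26-1.1' | pick_candidate
# The priority 99 line is never considered, even though its version
# is the newest of the three.
```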

I've adopted these priorities (lower numbers are preferred) (repo aliases beginning with Couch are local):

download.opensuse.org-twupd: 21; prefer emergency updates.
CouchNet-tw: 60; CouchNet can override any other source.
Couch-Build-tw: 62; SBS versions can override the main distro, but conversely if a better version appears in the main distro I'll have to notice and remove the SBS package.
Couch-PackMan-tw: 64; prefer local PackMan over most others.
packman-tw: 66; download missing packages rather than using SuSE.
Couch-Update-tw: 70; this is for newer versions than on the DVD.
Couch-Snoss-tw: 70; like Couch-Update except packages are never on the DVD and the baseurl is different.
Couch-SuSE-tw: 70; a moderately recent DVD snapshot. In this group of 3 repos, the newest package version in any of them is preferred.
download.opensuse.org-oss: 99; the official repo is available for never before installed packages, but updated versions have to be copied into Couch-Update-tw by the updaterepo script.
download.opensuse.org-non-oss: 99; same as download.opensuse.org-oss.
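A repo definition carrying one of these priorities might be generated roughly as follows. This is a sketch, not the actual audit-repos template: the function name make_repo_stanza is hypothetical, the key names are the ordinary zypper .repo keys, and the base URL in the usage line is an assumption.

```shell
# Hypothetical generator for a zypper .repo stanza.
# Usage: make_repo_stanza ALIAS BASEURL PRIORITY [PROXY_URL]
make_repo_stanza() {
    alias_name="$1"
    baseurl="$2"
    priority="$3"
    proxy="${4:-}"
    # The proxy is given as a complete URL with scheme and port.
    if [ -n "$proxy" ]; then
        baseurl="${baseurl}?proxy=${proxy}"
    fi
    cat <<EOF
[$alias_name]
name=$alias_name
enabled=1
autorefresh=1
baseurl=$baseurl
priority=$priority
keeppackages=1
EOF
}

# Usage (the base URL is an assumption):
#   make_repo_stanza download.opensuse.org-oss \
#       http://download.opensuse.org/tumbleweed/repo/oss/ 99 \
#       http://distro.cft.ca.us:3128
```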

In the local mirrors I want to limit the number of old versions kept. In the main OSS repo most packages have only one version, and when multiple they are close in time; the biggest time spread I saw was one month. A reasonable strategy would be to download every version but to delete local versions that are no longer on the master repo.

Designing the updaterepo script:

It matches up local and remote master repos. It turns out that you can put extraneous parameters in a repo definition file such as Xmaster (in the local repo) which identifies the master from which new packages should be downloaded.
It makes a list of all the packages in the union of the local mirrors. Duplicate packages in different mirrors are possible.
It makes a separate list of potentially deletable packages. Reasons for non-deletion include being on the DVD, being in a repo with no remote master (i.e. CouchNet or Couch-Build), or being too young; downloaded packages are retained unconditionally for 30 days.
For every host having relevant versions (presently only 99.8), it collects a list of installed packages from rpm -qa. It merges, uniquifies and reformats this list. The result shows all packages that should be in the union of the local mirrors.
Unused packages are purged. The package must be potentially deletable, and not on the master (comparing the version), and not installed on any host (comparing only the basename).
Packages on the remote master with the same basename as an installed package are downloaded to the matching local mirror, if not there already. This includes all remote versions, particularly new versions that are not yet installed.
Since the union of the contents of the local and remote repos is known, installed packages that come from no known repo can be detected; those get a noisy warning that they should be found and saved in the appropriate local mirror.
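The purge rule in the steps above can be sketched with plain text lists. This is not the real updaterepo, just an illustration under these assumptions: each host contributes a file of package basenames (for example from rpm -qa --qf '%{NAME}\n'), and the deletable and master lists hold "name version" lines. The function names are hypothetical.

```shell
# Hypothetical: merge per-host lists of installed package basenames.
merge_installed() {
    sort -u "$@"
}

# Hypothetical: print the packages that may be purged.
#   $1 = potentially deletable local packages, "name version" per line
#   $2 = packages on the remote master, "name version" per line
#   $3 = merged installed basenames, one per line
# A package is purged only if its exact name and version is gone from
# the master AND its basename is installed on no host.
purge_candidates() {
    awk '
        FILENAME == ARGV[1] { on_master[$0] = 1; next }
        FILENAME == ARGV[2] { installed[$1] = 1; next }
        !($0 in on_master) && !($1 in installed) { print }
    ' "$2" "$3" "$1"
}
```

The three files must be distinct paths, since awk tells them apart by filename.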

How is updaterepo going to obtain the rpm -qa output? This involves copying a file from each of the hosts to Diamond. I want it to run overnight, so it will run without human supervision or a human's authority.

NFS: This is the natural choice for Mathnet (at work), but on CouchNet NFS has not been reliable, particularly for the virtual machines, so NFS is rejected.
scp (SSH): It requires a remote execution key. The generic key is not available. A special execution key is possible, and I've done it before, but it's kind of complicated, and I'm leaving special execution as a less preferred solution.
HTTP: Although all the CouchNet machines are supposed to have webservers, using the full infrastructure of Apache just for file access seems like overkill, and in a development situation the webserver may become broken. On Mathnet only servers, and only some of them, have Apache. So HTTP is rejected.
FTP: On CouchNet all hosts have socket activated vsftpd and it works pretty reliably. I think the same is true on Mathnet (can't quite remember). The infrastructure for FTP is a lot more lightweight than for HTTP. I think this is the protocol I will use.
Not so fast! At midnight when the updates are downloaded, almost all of the hosts are asleep, and the installed package lists cannot be retrieved. Let's try more mechanisms.
NFS again: The leaf node can write its rpmqa file to a directory on a machine that's always up: Jacinth. Updaterepo runs on Diamond, which can retrieve the files from Jacinth. But we still have the issue of not 100% reliable NFS. And root on the leaf node will be squashed to nobody on Jacinth, hindering writing. But I can su to some other user and chown the receiving directory to that user.
NFS security has been tightened up, and it appears to cue on the real UID (i.e. root) vs. effective UID of the user trying to steal the file. The leaf nodes, executing as root, were unable to write on the destination directory on Jacinth. Directories that the client does not have permission to traverse just don't appear in a directory listing (readdir).
Mail: That's so low tech, but is probably the cleanest solution. I generalized the housekeeping report storage script, and the installed package lists are being deposited.

Creating the New Repos

Shell Variable Cruft Check

You can source the file /s1/SuSE/source.me to set useful shell variables for the distro root and the major scripts in distro maintenance. This file was reviewed for being up to date. It auto-sets the architecture (according to the machine it's run on) and the release, taking the lexically highest version in /s1/SuSE/SuSE-build/$ARCH/ , which will be the new version when you create the directory for it. Here is jimc's version of source.me.

Remove Old Distros

We need to keep files for v42.1 until all hosts have been upgraded, but 13.1 can definitely be tossed. There is nothing from before 13.1. All sub-repos already have only x86_64 (and noarch and setup). At home, before cleanup, the repo occupied 114 Gb (oink). After, 69Gb; tossed 85Gb.

At the same time, we need to get rid of the configuration files for old distro versions. These are in /home/post_jump/${RLSE} or on Mathnet, /h1/post_jump/${RLSE}. Being anally retentive I made a directory ./ancient and moved all the obsolete dirs into it. When deleting one of these directories it's a good idea to shred the non-public files. Here's how, illustrated for distro version 10.2:
find 10.2 -type f ! -perm -04 -ls |& less #Are you going to remove the right files?
find 10.2 -type f ! -perm -04 -print | xargs -n 25 shred -u -n 3
rm -r 10.2

Create Directories

Create all the directories for distro components. Owned by root:root, mode 755, except the update dir has to be owned by wwwrun. The actual content (metadata files) is provided later. These are illustrated for the new version, 99.8 and all paths are relative to $di (/s1/SuSE).

./CouchNet/x86_64/99.8 -- Locally created packages, or from alien repos.
./PackMan/x86_64/99.8 -- Local copies of PackMan packages.
./SuSE-build/x86_64/99.8 -- Local copies from SuSE special collections.
./SuSE-build/x86_64/99.8/iso -- Storage for ISO images of installation discs.
./SuSE-noss/x86_64/99.8 -- Local copies of non-open-source SuSE packages (which are never on the DVD).
./SuSE-update/x86_64/99.8 -- Local copies of main distro packages that are newer than what's on the DVD.
./SuSE/x86_64/99.8 -- The Tumbleweed DVD is mounted here.
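Creating the directories listed above could be scripted roughly like this. It is a sketch: the function name create_distro_dirs is hypothetical, and the chown step is skipped unless running as root on a machine that has a wwwrun user.

```shell
# Hypothetical helper: create the per-version repo directories.
# Usage: create_distro_dirs ROOT VERSION [ARCH]
#   e.g. create_distro_dirs /s1/SuSE 99.8
create_distro_dirs() {
    root="$1"
    version="$2"
    arch="${3:-x86_64}"
    for repo in CouchNet PackMan SuSE-build SuSE-noss SuSE-update SuSE; do
        mkdir -p "$root/$repo/$arch/$version" || return 1
        chmod 755 "$root/$repo/$arch/$version"
    done
    # Storage for ISO images of installation discs.
    mkdir -p "$root/SuSE-build/$arch/$version/iso" || return 1
    # The update dir has to be owned by wwwrun; this only works as root
    # and only where that user exists, so it is guarded.
    if [ "$(id -u)" = "0" ] && id wwwrun >/dev/null 2>&1; then
        chown wwwrun "$root/SuSE-update/$arch/$version"
    fi
}
```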

Download ISO Images

Web links:

Tumbleweed download index page
ISO image directory. Ignore the innumerable change notices. You're looking for $NAME-$ARCH-current.iso and its sha256 checksum.
Tumbleweed DVD, a snapshot of its current state. (3.9Gb) I initially bypassed this, but found later that it was essential.
Tumbleweed network installer CD (85Mb)
Rescue CD (613Mb)

To be totally straight arrow about your security, you need to follow this procedure for each downloaded image, here called $name.iso. One of my downloaded files came from a Russian mirror site, and given the pervasive dirty tricks (particularly in I.T.), from Russia in the USA 2016 election, it would be a good idea to take security seriously.

You should have downloaded the checksum files ($name.iso.sha256).
You need to download the openSUSE Project Signing Key if you don't already have it. Command:
gpg --keyserver hkp://keys.gnupg.net --recv-keys 0x22C07BA534178CD02EFE22AAB88B2FD43DBDC284
It is also included in your old distro. Take the last 8 digits of the key hash, convert to lower case, and look for $distro/gpg-pubkey-3dbdc284-xxxxxxxx.asc . Then do:
gpg --import $distro/gpg-pubkey-3dbdc284-xxxxxxxx.asc
If you really believe in this key, sign it yourself, to avoid the messages that there is no trusted signature.
Check the signature of the checksum file, proving that it was signed by the openSUSE team (or someone who had stolen their secret key). GPG is picky about file extensions, so you need to fake it out like this:
ln -s $name.iso.sha256 gorf.gpg
gpg -a gorf.gpg
It maunders that the signature is good, but there is no trusted signature; there is no indication that the signature belongs to the owner. The payload is written to the file without the gpg extension, i.e. gorf.
Now check the file itself:
cat gorf # And note the filename, the second field, after the checksum.
ln -s $name.iso $filename
sha256sum -c - < gorf
It should report $filename: OK. If not, either you mixed up the filenames, or there was an undetected error during download (unlikely), or the national security agency of a country that shall remain unnamed is meddling with I.T. infrastructure.
Remove the gorf files and symbolic links, and repeat for the other ISO images.

A less high tech alternative is to do sha256sum $name.iso, then grep Tumbleweed $name.iso.sha256, and compare the sums by eyeball, 64 hex digits.
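The eyeball comparison can be automated with a few lines of shell. This sketch only compares checksums; it does not check the GPG signature on the checksum file, so it is no substitute for the full procedure. The function name verify_iso_sum is hypothetical, and it assumes the checksum file has one line starting with the 64 hex digit sum.

```shell
# Hypothetical helper: compare an ISO's SHA-256 against its checksum file.
# Usage: verify_iso_sum IMAGE.iso IMAGE.iso.sha256
# Returns 0 if the sums match, nonzero otherwise.
# NOTE: this does NOT verify the GPG signature on the checksum file.
verify_iso_sum() {
    iso="$1"
    sumfile="$2"
    # Take the first line that begins with 64 hex digits; the
    # clearsign envelope lines around it are ignored.
    expected=$(grep -Eo '^[0-9a-f]{64}' "$sumfile" | head -n 1)
    actual=$(sha256sum "$iso" | cut -d ' ' -f 1)
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
#   verify_iso_sum $name.iso $name.iso.sha256 && echo OK
```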

Populate the Distro Metadata

Formerly I mounted the fixed distro's ISO image on ./SuSE/x86_64/$version, but for Tumbleweed I need to populate it from the DVD (if I downloaded it, which I didn't) or from the SuSE repo itself. Metadata for the SuSE-build and CouchNet repos can be copied from the old versions. [Update: I still need to mount the DVD, and to download an up-to-date instance before upgrade campaigns or before installing Tumbleweed on a new machine.]

To populate the SuSE repo: (0.67Gb download) (The barely visible last argument is a dot, meaning to populate the current directory.)
cd ./SuSE/x86_64/99.8
rsync -a --no-r --dirs --log-format="%o %f" rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/oss/ .
rsync -a --log-format="%o %f" rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/oss/{boot,docu,media.1} .

To populate the non-OSS repo: (0.0002Gb download :-)
cd ./SuSE-noss/x86_64/99.8
rsync -a --no-r --dirs --log-format="%o %f" rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/non-oss/ .
rsync -a --log-format="%o %f" rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/non-oss/{boot,media.1} .

Package keys wanted: The first two come with the main distro DVD; the others go in the CouchNet or Mathnet repo.

gpg-pubkey-307e3d54-4be01a65 -- SuSE Package Signing Key
gpg-pubkey-3dbdc284-53674dd4 -- openSUSE Project Signing Key (new)
gpg-pubkey-1abd1afb-4c97c60c -- PackMan Project (signing key) (2006)
gpg-pubkey-c8da93d2-493f7d78 -- VLC openSUSE Repository (not for Mathnet)
gpg-pubkey-5c6c793e-4b272bea -- Carter Family Trust Distro Signing Key
gpg-pubkey-0cc9523f-4a9865cc -- (or) Mathnet Distro Signing Key

These keys are obsolete:

gpg-pubkey-3dbdc284-4be1884d -- openSUSE Project Signing Key (old, v11.4)

How to identify a public key:

gpg gpg-pubkey-0cc9523f-4a9865cc.asc

It prints: pub 1024D/0CC9523F 2009-08-28 UCLA-Mathnet Distro Signing Key <distro@math.ucla.edu>

Check Update Download Script

For Tumbleweed the update repos and the main repos are updated the same way using the new script updaterepo. See the 42.1 transition writeup for how to update the script rsyncsuse if reverting to fixed distros.

We need to switch from weekly execution of $di/bin/rsyncsuse.sh to updaterepo. [Done.]

Virtual Machine

One of my virtual machines for development is called Oso. It is hosted on Diamond, the repo site and compute server, using KVM as the framework and qemu as the virtual executor. The guest connects itself to the host's network bridge (br0) and thinks that it is directly on the local network. Its disc is 17.2Gb (external). Deducting the swap partition and inode storage, it has 15.3Gb (total internal) of which 7.3Gb is unoccupied. This should be plenty to hold the RPM files that have to be downloaded from SuSE. So I'm going to develop it up the kazoo: it will be the first machine to migrate to Tumbleweed.

VM Specifications: Oso is defined in this file. Key parameters of Oso and Petra are:

RAM: 1Gb
CPUs: 2
Architecture: x86_64 (both VMs), formerly i686 for Petra.
Disc: 17Gb (why such an odd number? Not odd, it's 2³⁴ bytes) on the virtio bus. 7.3Gb are currently unoccupied in the root partition. (#2 in boot order.)
CD: Symlink to whatever ISO image, or to /dev/null. Currently it points to the network installer for Tumbleweed. (#1 in boot order.)
Graphics: VNC on Cirrus Logic emulation
MAC address: 52:54:00:09:c8:d4 (Oso) or 52:54:00:09:c8:c7 (Petra) on the host's network bridge (virtio). Local convention: the MAC address of a VM is the KVM vendor number (52:54:00) followed by the last 3 octets of the machine's assigned IPv4 address.
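
The MAC convention is easy to mechanize. A minimal sketch, assuming the address is given in dotted decimal without leading zeros; the function name vm_mac is hypothetical, and the leading octet 10 in the example is a placeholder, not the real CouchNet prefix.

```shell
#!/bin/bash
# Sketch of the local MAC convention: 52:54:00 plus the last three
# octets of the IPv4 address in hex.  vm_mac is a hypothetical name.
vm_mac () {
    local ip="$1" a b c d
    # Split a.b.c.d on dots; the first octet (a) is not used.
    IFS=. read -r a b c d <<EOF
$ip
EOF
    printf '52:54:00:%02x:%02x:%02x\n' "$b" "$c" "$d"
}

# Illustrative address only: any address ending in 9.200.212 will do.
vm_mac 10.9.200.212
```

An address ending in 9.200.212 gives 52:54:00:09:c8:d4, which is Oso's MAC as listed above.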

An initial step is to junk old saved copies of Oso's disc and to make a new backup of its current state on v42.1. Command line: gzip -c disc1.raw > ./disc1.421final.gz Run time: 12 mins. Output file size: 5.7Gb.

Commands to use on the virtual machine with libvirt:

gzip -c disc1.raw > disc1.421final.gz # To save a disc snapshot (12 min)
zcat disc1.421final.gz > disc1.raw # To restore a snapshot (8 min).
virsh undefine oso # To undefine a VM (when altering the XML).
virsh define /s1/kvm/oso/oso.xml # To define a VM (with new XML).
virsh start oso # To start the machine.
virsh shutdown oso # Signals the machine to do a normal shutdown and poweroff.
virsh destroy oso # To kill a hung machine.
virt-viewer -c qemu+ssh://root@diamond/system -w -r oso # Start viewer on your desktop/laptop machine using VNC.
virt-viewer -w -r oso # Only works on the execution host; not recommended; the X-Windows protocol is a slug compared to VNC.
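
Since virsh shutdown only signals the guest, a disc snapshot taken right afterward could catch the machine still running. A minimal sketch of a wait loop, relying on virsh domstate printing "shut off" for a stopped domain; the function name wait_for_shutoff and the 120 second default are my assumptions, not part of the original procedure.

```shell
#!/bin/bash
# Hypothetical helper: poll until libvirt reports the domain as stopped.
# Relies on "virsh domstate NAME" printing "shut off" for a stopped VM.
wait_for_shutoff () {
    local dom="$1" timeout="${2:-120}" waited=0
    while [ "$(virsh domstate "$dom" 2>/dev/null)" != "shut off" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1              # still running; the caller decides what to do
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Intended use on the execution host (not run here):
#   virsh shutdown oso && wait_for_shutoff oso && \
#       gzip -c disc1.raw > disc1.421final.gz
```

If the wait times out, virsh destroy is the fallback, at the cost of an unclean guest filesystem in the snapshot.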

Lurking dragon: make sure on the VM that /etc/default/grub_installdevice says to install grub in the MBR. 99% of the time you want the device to be (hd0), not e.g. (hd0,1) which would install in the root partition's boot sector. To detect whether (hd0) is correct you could do:

grub2-probe -t drive /boot/.

Remove the partition ID, e.g. it might print (hd0,msdos2) and you ignore ,msdos2 keeping (hd0).
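
Stripping the partition ID can be done with a one-line filter. A minimal sketch; strip_partition is a hypothetical name, and the sample input is the one quoted above.

```shell
#!/bin/bash
# Sketch: reduce grub2-probe's answer to the whole-disc device.
# strip_partition is a hypothetical name.
strip_partition () {
    # "(hd0,msdos2)" becomes "(hd0)"; "(hd0)" passes through unchanged.
    sed -e 's/,[^)]*)$/)/'
}

echo '(hd0,msdos2)' | strip_partition

# On the VM itself one would run (not executed here):
#   grub2-probe -t drive /boot/. | strip_partition
```

The result can then be compared by eye with what /etc/default/grub_installdevice says.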

Initializing the Local Mirror Repo

Now transforming this package list into a script to download the actual packages to the CouchNet enterprise mirror. 10min flat to download; sheesh, that's fast! 2.46Gb actually downloaded. We have 50Mbit/s FIOS. Here's the script:

#!/bin/bash
# Downloads a list of packages.  Do "rpm -qa" to get the list.
# Expects to see names like pkgname-1.2-3.4.x86_64

function snarf () {
    # Turn pkgname-1.2-3.4.x86_64 into x86_64/pkgname-1.2-3.4.x86_64.rpm,
    # the path relative to the repo's suse/ directory.
    local arch=${1##*.}
    echo "$arch/$1.rpm"
}

function myrsync () {
    # -a = preserve metadata, -R = preserve input path on output,
    # --no-o, --no-g = files to be owned by local root:root, not whatever
    # special user is on their server.
    rsync -a --no-o --no-g -R --files-from - --log-format="%o %f" \
        rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/oss/suse/ \
        /s1/SuSE/SuSE/x86_64/99.8/suse/
}

cat <<EOF | (while read pkg junk ; do snarf "$pkg" ; done) | myrsync
aaa_base-extras-13.2+git20170512.8fa87a3-1.1.x86_64
abiword-docs-3.0.2-1.2.noarch
etc. etc.
EOF

39 packages could not be downloaded. These are obsolete, except for one of them which appears to be no longer available.

The gpg-pubkey pseudo-packages are a special case.
x86_64/kernel-default-4.1.39-56.1.x86_64.rpm (we're on 4.10.13)
x86_64/libgnutls28-3.2.15-8.1.x86_64.rpm (we're on gnutls-30)
x86_64/libhogweed2-2.7.1-9.1.x86_64.rpm (we're on hogweed-4)
x86_64/libnettle4-2.7.1-9.1.x86_64.rpm (we're on nettle-6)
noarch/lightdm-webkit-greeter-branding-bevel-1371971285-1.1.noarch.rpm (no longer on SBS, though lightdm-webkit2-greeter is.)
x86_64/openSUSE-release-dvd-42.1-1.46.x86_64.rpm (obsolete)
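
The gpg-pubkey special case arises because those names have no ".arch" suffix, so the path builder in the download script turns them into nonsense paths. A minimal sketch of a stricter variant; snarf_checked is a hypothetical name, and the list of accepted architectures is my assumption.

```shell
#!/bin/bash
# Sketch: variant of snarf that skips names with no ".arch" suffix,
# such as the gpg-pubkey pseudo-packages.  snarf_checked is hypothetical.
snarf_checked () {
    local pkg="$1"
    case "$pkg" in
        gpg-pubkey-*)
            return 0 ;;                       # key pseudo-package: no RPM file to fetch
        *.x86_64|*.noarch|*.i586|*.i686)
            echo "${pkg##*.}/$pkg.rpm" ;;     # arch directory, then file name
        *)
            echo "skipping unrecognized name: $pkg" >&2 ;;
    esac
}

printf '%s\n' \
    aaa_base-extras-13.2+git20170512.8fa87a3-1.1.x86_64 \
    gpg-pubkey-307e3d54-4be01a65 \
    abiword-docs-3.0.2-1.2.noarch |
while read -r pkg junk; do snarf_checked "$pkg"; done
```

Only the two real packages produce paths; the key is silently dropped, which keeps it out of the list of failed downloads.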

7 packages were in SuSE Build Service (SBS) repositories and were downloaded into the local SBS repo. Two were in the non-OSS repo; a similar repo was created locally and the packages were downloaded to it.

The following 17 packages are probably from PackMan:

x86_64/gstreamer-plugins-libav-1.6.1-2.1.x86_64.rpm
x86_64/gstreamer-plugins-ugly-orig-addon-1.6.1-54.1.x86_64.rpm
x86_64/libdca0-0.0.5-3.1.x86_64.rpm
x86_64/libfaac0-1.28-9.1.x86_64.rpm
x86_64/libfaad2-2.7-15.1.x86_64.rpm
x86_64/libmad0-0.15.1b-1.1.x86_64.rpm
x86_64/libmp3lame0-3.99.5-1015.1.x86_64.rpm
x86_64/libmpeg2-0-0.5.1-3.1.x86_64.rpm
x86_64/libmpeg2convert0-0.5.1-3.1.x86_64.rpm
x86_64/libopencore-amrnb0-0.1.3-4.1.x86_64.rpm
x86_64/libopencore-amrwb0-0.1.3-4.1.x86_64.rpm
x86_64/librtmp1-2.4+git20150115-1.1.x86_64.rpm
x86_64/libtwolame0-0.3.13-2.1.x86_64.rpm
x86_64/libvo-aacenc0-0.1.2-3.1.x86_64.rpm
x86_64/libvo-amrwbenc0-0.1.2-3.1.x86_64.rpm
x86_64/libx264-148-0.148svn20150804-1.1.x86_64.rpm
x86_64/mpeg2dec-0.5.1-3.1.x86_64.rpm

I'm going to defer the PackMan downloads, since most of these are undoubtedly 18 months out of date, and getting current versions is going to involve either a lot of hand labor or letting Zypper pick the version, which is much preferred. [Downloads accomplished using the new updaterepo script.]

Repo Maintenance Scripts

The next group of steps will be:

Clone the config file library for 99.8. [Done]
Edit the repos in the library. I need to provide definitions for the local mirrors. I need to add proxy parameters to the remote masters, and to turn on keep-packages. See above for a list of repos. [Done]
Create the directories for the local repos, and retrieve the metadata content. [Done]
Create and debug the updaterepo script. [Done.]
It's hard to get rsync's filter rules right. It would not download any files, even ones that I knew were on the server and that the rules were expected to accept. At first I blamed server configuration, but it turned out that I was excluding the containing directories. Anyway, I ended up listing all promising files, then picking relevant ones locally, and using --files-from to download only those.
The PackMan repo was a special challenge, since it is an rpm-md type repo vs. yast2 for everything else.
Improve /s1/SuSE/bin/mksuserepo so it can sign a repo without human intervention. [Done.] Here we have to balance security, usability and availability. While the individual packages are signed (but for SBS packages I rarely have the authors' keys nor any secure way to know that they aren't fake), it's also important that the repo contents not be tampered with. In an enterprise, it would be reasonable to download the files first, then have a responsible sysop give his passphrase to decrypt the secret key for the repo signing key. In my small operation this is definitely overkill and I want to automate the whole process. [Done.] [Famous last words; Zypper rejected the content file.]
Turn it on to be run from cron. [Done, runs, gets updates.]

Initial Installation

Now I'm ready to install Tumbleweed for the first time. I want to upgrade v42.1 but save the result before installing any CouchNet customizations, so I know which customizations I'll actually need. I'm also going to do a completely default installation on bare (virtual) metal, to learn what SuSE wants to give me when not influenced by existing package selection and configuration.

Unhacked Instance of Tumbleweed

To make progress I need an unhacked instance of Tumbleweed, to compare with the normal CouchNet v42.1 installation. It's going to go on Oso. Steps:

Switch Oso's CD link to the network installer ISO (was /dev/null).
Power off Oso, then start it again. That's when it re-reads the link.
It promptly boots the net installer. Hit F4 (Source) promptly. The default is to boot from the hard disc after a timeout.
Scroll to HTTP and hit Enter. (Actually HTTP is the default.) Fill in the server and directory. Normally these would refer to the CouchNet enterprise mirror, but since this is not yet populated, take the defaults, which are download.opensuse.org and tumbleweed/repo/oss/ (no leading slash, yes trailing slash). Hit OK. It doesn't ask you if you need to use a proxy.
Scroll to Upgrade and press enter. Downloading is not too slow. When you get the Plymouth progress bar, hit Esc to see the real boot messages. Starting about 11:00.
It calls the NIC ens3. It gets a random DHCP IPv6 address (as suggested by my router advertisement), and it gets, but doesn't announce, its own assigned IPv4 address from DHCP.
Agree to the installer's license. (There's another one later.)
It probes the system, then downloads packages and metadata including ./setup/descr/packages.gz .
Select for update: There's only one candidate, the root partition. Hit Next.
It proposes to remove all the CouchNet repos. That's correct for this initial installation. Hit Next.
It proposes to add the Tumbleweed OSS, non-OSS and update repos. Tidbit: there is an update repo. The description says that it holds urgent updates until they get integrated into the main repos. Accept proposal, hit Next.
Read and agree to the OSS repo license. Don't sue us, we'll sue you. openSUSE is a trademark. No warranty. You aren't allowed to develop nuclear weapons using openSUSE. Linux is a registered trademark of Linus Torvalds. Hit Next. It takes quite a long time to get to the next step.
Installation Settings:
- System and Hardware: Yes it found the NIC on Virtio Ethernet Card 0. (Hit OK.)
- 2646 updated packages, 1515 new packages, 204 removed packages, total size (installed?) 7Gb.
- Bootloader: grub2. No Trusted Boot (EFI) on this machine. Status Location: /dev/vda1 (root). It says to not install booter in MBR, do install in partition header.
- Change to put the booter in the MBR and not the partition header. I tried putting it in both places but got burned. Forum posts suggest that this is a known problem, but poorly explain what's going wrong.
- There is no section for allowing SSH through the firewall if there is one. (firewall.J was preserved and it lets SSH come in.)
- Hit Update.
Agree to the Adobe ICC Profiles license. It's going to install Nouveau (and then not use it); agree to the at-your-own-risk license. Agree to the GStreamer Fluendo license. Start the upgrade. (At 11:27)
It needs 4158 OSS packages, 2 non-OSS, 1 from Main Update Repo. About 6.8Gb to download. Predicted time 105 mins (varies with weather on the download server). Evidently it's predicted to fit (just barely) on the available disc. On its way...
Download finished at 12:51. 78 mins. Not bad.
Post-install scripts and activities, including a few forgotten package downloads, ran until 13:02.
Reboot. When it loads the net installer CD, the default is to boot from the hard disc. Either wait for the timeout, or hit Enter. Then select openSUSE Tumbleweed. It boots. Seems normal (except named failed to start). It started the X-Windows greeter, looks like my lightdm, but died one second later.
Flip to virtual terminal 1 and log in as root. It let me on.
Disc usage: Root is using 9.2Gb, 5.3Gb available, 1.9Gb more used than before the upgrade.
The RPM files were deleted after installation (enabling everything to fit).

Checking Out Tumbleweed

Oso is on the net, and many items are working. First I ran checkout.sh just to see what survived. Discrepancies:

Big ugly circular dependency in boot units: basic.target timers.target time-sync.target chrony.J.service network.target wicked.service firewall.J.service basic.target .
rpcbind.socket: nobody listening, netstat is not installed.
earlyhostname failed.
dbus: could not connect to dbus.
rsyslog: it's running but the test message was not found.
named: failed.
avahi-daemon: running, but query timed out. Offsite queries OK.
autovt@.service is in a weird state; this is a template unit.
bridge.J: wanted, disabled, active (counts as a failure).
rng-tools: dead.
sshd: key_load_public invalid format, but the command was executed.
wakeup.J: wanted, disabled, active (counts as a failure).
slpd: failed.
cronj: failed.
display-manager: failed. (See more info below about customizing MDM.)
5 daemons are wanted but not enabled.
12 daemons are enabled but unwanted. The list is not properly folded.

Only the dbus item is critical. Dbus seems to be working; the problem must be with the tester. I'll debug this stuff later.

I saved a copy of oso:/etc and its package list, rpm -qa.

Shutting down Oso and making a copy of this disc.

Default Installation

My next step will be to wipe Oso's disc and install on bare (virtual) metal. The result will be called oso-pure. This time around I'm going to accept its offer to put a btrfs filesystem on the root partition. [Update: reverted to ext4.] Later I plan to convert the old machines to btrfs. [Update: not going to happen.] In package selection I will use defaults as much as possible, to find out what they intend to give me, but I will use an XFCE desktop framework and will decline KDE and Gnome (as usual). AppArmor also will be bypassed, but the SuSEFirewall will stay (for now).

I already made a copy of Oso's disc. Wipe the disc with 0's. 17179869184 bytes seems like an odd size; actually it is 2³⁴ bytes or 16Gb.
Run virt-viewer. Start Oso, booting from the network install disc, and pick New Install.
It is using its assigned IPv6 address from DHCP. Downloading installation system from the SuSE server.
Installation system does not match your boot medium. Sorry, this will not work. I tried using the Curses UI, and tried the up-to-the-minute network installer, unsuccessfully.
In a Tumbleweed blog post dated 2016-10-02, Dominique Leuenberger answers a user complaint on this issue, saying to add kexec=1 on the kernel command line; the installer will then download the initrd and kernel from the repo rather than relying on the back-version boot media which, for Tumbleweed, is never up to date. So how do you do that?
Boot the net installer ISO. Scroll to Installation. Type kexec=1 (no spaces, appears as Boot Options) and hit Enter. Hit Esc when the Plymouth progress bar covers the screen. OK, now it downloads the current kernel and initrd, kexecs it, and goes through the initrd again. Now it downloads the installation system. And it starts up!
Initial installer steps:
- Agree with the installer's license and/or set language and keyboard.
- You don't get a chance to set up custom repos.
- Proposed partitions: 1.39Gb swap, 10.00Gb root with btrfs, 4.60Gb home with xfs. Lots of sub-volumes, some with no COW. Many of these sub-volumes are for software that I'm not installing, like mariadb and mailman. I edited to ditch the separate home (which was turned into a sub-volume), and otherwise accepted the proposal.
- Time zone: Pacific/Los Angeles.
- Computer Role: Custom. And here you can configure repos. I.e. choose which of their repos to use. Just take the defaults.
- Agree to the main repo license. Back to Computer Role. Hit Next.
Software Selection: Since I picked the Custom role, I'm minus some software. I made these changes, trying to leave defaults as much as possible.
- I engaged these patterns:
  - Console Tools
  - 64bit Runtime Environment (vs. x86)
  - XFCE Desktop Environment (vs. Gnome, KDE, LXDE, LXQt, Enlightenment, MATE)
  - C/C++ Development
- I left these patterns turned off:
  - AppArmor (on by default; I turned it off)
  - Office Software
  - Games
  - Technical Writing
  - All server patterns
  - Non-OSS packages (but it installed a few anyway)
- These wanted packages were already selected: tk, rsync, m4, expect, pam-modules.
- I specially added these packages: (Hit Details, then the Search tab): pam_krb5, krb5-client.
- This package is missing and is going to cause problems when a user tries to log in: pam_ldap. It's in the main distro in v42.1. Found it for Tumbleweed.
Final installer steps:
- Local User: select Skip User Creation, hit Next.
- Password for root: give it.
- Now we're on the Installation Settings page.
  - Booting: Trusted Boot is off by default, at least for virtual machines (it knows this is a VM). Change to install the booter in both the MBR and the root partition header. [Update: just the MBR or bad things will happen.]
  - Software: 1.2Gb to download, 4Gb installed.
  - Firewall: enabled, change to open and turn on SSH. You need to do both.
  - Nothing about image installation, nor any sign of images being downloaded.
- Let her rip! 1857 packages to install. Downloading took 37min; post-install activities took 4 mins more.
Logging in as root: flip to VT1; too dangerous to run XFCE as root. It knows that the hostname is oso. It's using the correct IPv4 and IPv6 addresses on ens3. It let me on.
In diamond: /scr/oso-setup-1706/oso-pureinst I saved a copy of the package list and of /etc.

Package Selection

When rebooting Oso, remember that it still has the network installer disc in the virtual drive, and there will be a 30sec timeout before it does the default of booting from the hard disc. Soon this can be reverted.

Actually I faked myself out: Oso is supposed to be set up as a user workstation (rather than pure development), so it should have these patterns turned on: Office, Technical Writing, and Games. In couchnet.sel I don't select everything in these categories; the purpose is to see what they think a normal user workstation ought to have on it. Now I'm installing the omitted patterns. I've turned on keeppackages=1 and will retrieve missing packages into the local mirror repo. 2672 packages to download including texlive. Downloaded 1.3Gb, installed 2.8Gb, elapsed time 41min including 4min post-install activities.

For testing, I'm also creating a local account for myself (with SSH access).

I now have a set of files of package basenames (minus version and arch) for Claude (1967 packages; Oso should come out similar), the Oso pure installation (1859 packages), and Oso upgraded (8388 packages, oink), plus Oso not upgraded but with the user packages (4531 packages). Command line to create these lists:

ssh oso rpm -qa | sed -e 's/-[^-]*-[^-]*$//' | sort -o bases.oso.pureuser
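
Once two such lists exist, the differences can be pulled out with comm. A minimal sketch; only_in_first is a hypothetical name, and bases.claude is an assumed file name following the pattern of the command above.

```shell
#!/bin/bash
# Sketch: list basenames present in the first file but not the second.
# only_in_first is a hypothetical name; each input holds one package
# basename per line, as produced by the rpm -qa | sed pipeline.
only_in_first () {
    local a b
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u "$1" > "$a"           # comm needs sorted input; -u drops repeats
    sort -u "$2" > "$b"
    comm -23 "$a" "$b"            # -23: hide lines unique to file 2 and common lines
    rm -f "$a" "$b"
}

# Intended use (not run here):
#   only_in_first bases.claude bases.oso.pureuser
```

Swapping the two arguments gives the packages that are only on the second machine.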

Package groups on Claude-42.1 and not on Oso-pureuser. Not every package, just the interesting ones.

abiword, dia, gimp,
Some games are on Oso and some aren't.
alsa-plugins-pulse
apache2 and friends
bind (DNS)
bridge-utils
chrony
ConsoleKit etc
cpp48 (upgraded to cpp7) and related compiler stuff
cyrus-sasl-saslauthd
gstreamer (presumably 1.0) and also gstreamer-0_10
gtk2 stuff is not on oso.pure but is on oso.pureuser
kernel-default (so what package has the kernel on Oso?)
krb5 stuff
(skipping all libraries)
openslp-server
A lot of perl modules
pam_ldap and pam_ssh (this will make trouble) (Found it.)
php5 and modules
pulseaudio
python modules (upgraded to python3)
ruby (upgraded from 2.1 to 2.2)
termcap
texlive (93 pkgs) (is on oso.pureuser but not oso.pure)
tkinfo
w3m-inline-image
various X clients
xf86-{input,video}-(various)
xfce4 panel plugins (not all)

Package groups on Oso-pure and not on Claude-42.1:

Adobe fonts
apache-pdfbox (investigate)
blog (investigate)
brasero
cdrdao + cdrecord (this has come back?)
cpp7, gcc7, etc. (upgraded from cpp48)
emacs (Gnu)
freerdp (investigate this)
ftview and friends (investigate)
graphviz (investigate)
grub2-x86_64-efi
gstreamer-plugins-cogl (investigate what is cogl)
gtk3 (upgrade from gtk2)
(skipping all libs)
libreoffice and numerous friends are on both machines.
ntp (I want chrony)
open-iscsi (why?)
openmpi
openssl-1_0_0 (version split? make sure everything works.)
plymouth and friends (toss)
pragha (music player? investigate)
There are modules for both python2 and python3
remmina (Multi protocol remote desktop; tried, didn't like)
rollback-helper (investigate)
rsocket (sounds ominous)
ruby2.2 (upgrade from 2.1)
samba (toss)
screen
sensors (is back)
sound-juicer (to rip CDs; good but not on a VM)
SUSEConnect (what is it?)
susepaste (investigate)
swig (investigate)
texlive, 2359 packages, none are supposed to be on Claude.
tmux (investigate)
transmission-common (investigate)
tuned (investigate)
xclip
xfce4-screenshooter
xorriso (investigate)

Notable daemons running on Oso-pure:

A ton of kernel threads.
systemd (of course) for system and user
auditd
dbus
avahi-daemon
rngd (fill-watermark = 3700, same as for v42.1)
polkitd
wicked
ntpd
lightdm with gtk greeter
sshd
gvfsd
postfix
cron
cupsd
These are useless: ModemManager nscd

Packages are not radically different from what's on Claude now. Actually one of my leading issues is, do I get valuable security by running auditd, which I don't run now? There are, as usual, several interesting unfamiliar packages in the default load, which I will want to investigate later.

Toward a Working VM

For package selection, I'm going forward just copying couchnet.sel from v42.1. Of course screwups will be revealed and I'll deal with them when found. The goal in this step is to run post_jump on oso-pureuser and then to get it to pass the tests in checkout.sh. Real reconciliation between v42.1 config files and as-installed Tumbleweed config files will happen once that's done, so the edited v42.1 files will be functional in the Tumbleweed context.

Here's a list of needed script updates:
- I intend to leave the configuration files alone for now.
- Tumbleweed's /etc/os-release does not state the architecture at all, and doesn't have a VERSION= keyword. Got to fake it from the PRETTY_NAME. [Fixed in a lot of scripts.]
- New script retrieve-pkgs which copies packages in the target machine's /var/cache/zypp/packages to the local mirrors. [Done.]
- Change audit-pkgs with a new action -U to do a dist-upgrade. [Done.]
- post_jump needs to auto switch to -U (from -u) on Tumbleweed. [Done.]
- Best to run post_jump the first time by copying and pasting lines one at a time. [That's what I did.]
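
Faking the version from PRETTY_NAME might look like the following. This is a minimal sketch, not the code that went into the scripts: get_field and os_version are hypothetical names, 99.8 is the stand-in number used for the Tumbleweed config library, and taking the architecture from uname -m is my assumption.

```shell
#!/bin/bash
# Sketch only; not the actual code in the CouchNet scripts.
# get_field and os_version are hypothetical names.
get_field () {                    # get_field KEY FILE
    # Print the value of a KEY=... line with the quotes removed.
    # A commented-out "# VERSION=" line does not match.
    sed -n "s/^$1=//p" "$2" 2>/dev/null | tr -d '"' | head -n 1
}

os_version () {
    local file="${1:-/etc/os-release}" pretty version
    pretty=$(get_field PRETTY_NAME "$file")
    version=$(get_field VERSION "$file")
    case "$pretty" in
        *Tumbleweed*) echo 99.8 ;;            # stand-in number for the rolling distro
        *) echo "${version:-unknown}" ;;      # fixed distros state VERSION= directly
    esac
}

# os-release does not give the architecture; ask the kernel instead.
echo "version=$(os_version) arch=$(uname -m)"
```

Reading the file with sed rather than sourcing it avoids executing whatever else might be in it, and avoids inheriting a stray VERSION from the environment.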
Next step is to run post_jump on oso-pureuser, fix program problems as I go along, and finally work on package selection problems.
- I'm running in a subshell, cutting and pasting post_jump line by line, to catch problems immediately.
- It installs /etc/resolv.conf from the master to the slave, and the master (Diamond) has link local DNS addresses like fe80::201:c0ff:fe12:3044%br0 (and br0 doesn't exist on Oso). The resolver library successfully ignores them and uses the global addresses. I didn't try to fix this.
- Among the basic config files installed in the first batch of post_jump, the only major change is to sshd_config. The v42.1 sshd_config works on Tumbleweed.
- Tumbleweed's /etc/os-release lacks VERSION= and the architecture. Got to fake it for Tumbleweed by cowboy programming, several places. It works now.
- audit-pkgs -G takes a long time to deliver the list of wanted kernels. Be patient, it's not hung.
- Squid installations in general should be configured to avoid open proxying, i.e. downloading anything for any client, so they aren't used as force amplifiers in a DDOS attack. My squid was cueing on target hostnames (download.opensuse.org etc), but if (when) the request is diverted to a mirror it will fail giving a 403 return code (permission denied). So we have to cue on path segments. I ended up with these ACLs:
  - /tumbleweed/repo/ for the main repo.
  - /update/tumbleweed/ for the update repo's baseurl.
  - /repositories/openSUSE:/Factory:/ is what the update repo's baseurl redirects to.
  - /suse/openSUSE_Tumbleweed/ for PackMan.
- I ran audit-repos which copies the new repo definition files to the target (Oso), but I'm having trouble refreshing the repos. The symptom is that it downloads some or all of the files, says it is rebuilding the cache, and then maunders, Failed to cache repo (4), without useful error messages. strace was my friend. There were several screwups that had to be fixed.
- When the new repo packages are downloaded at midnight, e.g. from download.opensuse.org-oss to Couch-SuSE-tw, how am I going to sign the repo metadata? I put in a -i option to mksuserepo making it omit packages.gz and friends, because those are the only ones that change (and are the most important for security). The resulting content file was accepted by zypper-1.12.50-19.1 for OpenSuSE Leap 42.1 but was rejected by zypper-1.13.28-1.1 for Tumbleweed as of 2017-06-20. Re-signing without -i fixed some of the CouchNet repos. I'll need to deal with unsigned keys in PGP, later.
- When I refresh Couch-Build-tw and Couch-PackMan-tw (zypper refresh), /usr/bin/susetags2solv from libsolv-tools-0.6.27-2.1 gets SIGSEGV. Always fails. "rpm -V libsolv-tools" returns 0 (not corrupt). ldd indicates no weird libraries. (It uses liblzma and libbz2 but nothing for regular gz.) It was called with the -c $dir/content option, and is supposed to receive a packages file on stdin. It writes a solv file on stdout.
  What do you want to bet, that the URL-like objects in the content file are the cause of SIGSEGV? Working on Couch-Build-tw. Removed NAME, which is probably a holdover from SuSE-11.4. That cured the problem! Also worked for Couch-PackMan-tw. download.opensuse.org-oss/content has a REPOID line and Couch-Build-tw doesn't; lack of that line doesn't seem to make trouble.
- Web resources about YaST2 repository metadata:
  - Overall format of the repo.
  - Format of packages.gz with definitions of the keywords.
  - Metadata signatures. (Doesn't tell what is required to be in the content file.)
- Back to running audit-repos. After I put in the cowboy programming to use version 99.8 to represent Tumbleweed, it now works and the repos are refreshed.
- Once again running post_jump by hand. Oso has btrfs with about 22 sub-volumes. The section in post_jump to generate /etc/exports is totally flummoxed. I'm not going to try to fix it now -- have to come back to it. PROBLEM
- Move /usr/local into /m1/local… Workstation means a Sun3-50 that mounts /usr/local (etc.) from its fileserver. With btrfs, the host mounts the /usr/local subvolume, faking out this test. I'm commenting it out.
  Another problem -- with btrfs, /usr/local is mounted, so you can't remove it and replace with a symlink. This needs to be fixed somehow. PROBLEM [Fixed I hope]
- pushconfig also needs to be able to determine the version that we're using with Tumbleweed.
- Phase 3, /dev/root symlink, with btrfs, we mount by GUID rather than /dev/whatever, and it fails to determine the root filesystem device. Actually when I'm using labels to mount the root on v42.1 it fails also. Looking for it in /proc/mounts fixes the problem.
- /etc/machine-id has mode 666! Oooooo! A bug! PROBLEM
- Phase 3 ends with installation of local kernel modules in t=$opt_s/mathnet/modules/$arch/. , with a note saying it's broken and should be fixed or removed. This of course is only for Mathnet and I don't think we've used this for ages. It ends up finding and installing no modules. I'm leaving the code in, in case we ever have to resurrect it.
- Phase 5, installing missing packages. It installed package signing keys and removed existing related keys. How likely is it that this is bogus?
  - gpg-pubkey-1abd1afb-4c97c60c installed, PackMan 2006 - 2014, key ID 4096R/1ABD1AFB
  - gpg-pubkey-1abd1afb-54176598 removed (not on Diamond)
  - gpg-pubkey-c8da93d2-493f7d78 installed, VLC 2006 - 2010, key ID 1024D/C8DA93D2
  - gpg-pubkey-1abd1afb-54176598 removed (not on Diamond)
  These wanted packages were not found:
  - chromium-desktop-gnome
  - gimp-help-browser
  - krb5-appl-clients
  - krb5-appl-servers
  - libreoffice-templates-en
  - perl-Astro-Sunrise (really wanted)
  - python-gstreamer-0_10 (probably should be tossed)
  - openssl-doc
  - python-pycurl
  - python-qt4
  - xchat
  - ConsoleKit
  - ConsoleKit-x11
  - susehelp_en
  - lightdm-webkit-greeter-branding-bevel
  These packages were found through capabilities:
  - dbus-1-python3 provided by python3-dbus-python
  - dirmngr provided by gpg2
  These conflicts were resolved manually:
  - liberation2-fonts-2.00.1-7.6.noarch conflicts with liberation-fonts-1.07.4-1.5.noarch , toss the latter.
  - systemd-logger-232-10.2.x86_64 (symbol syslog) conflicts with rsyslog-8.27.0-1.1.x86_64 , toss the former. Is this wise? Everything seems to be working; let's accept this package selection issue. Now, how to make it happen automatically?
  - ffmpeg-3.3.2-3.1.x86_64 requires libavcodec57 = 3.3.2-3.1 which cannot be provided. Install it (with vendor change from OpenSuSE to PackMan). Similar issues with libavutil55-3.3.2-3.1.x86_64 libswresample2-3.3.2-3.1.x86_64. This would happen automatically in a dist-upgrade.
  428 new packages to install, 1 to remove, 3 to upgrade with vendor change. 439Mb to download, 1.7Gb installed.
  These packages could replace installed ones, but they come from a repo with a lower priority. Spot checks show the installation was from Couch-SuSE-tw (priority 64) while the lower priority (100) belongs to download.opensuse.org-oss (the master repo). This issue would be resolved if packages were freshly downloaded from the master.
  - apache2-doc-2.4.26-1.1.noarch
  - gvim-8.0.627-2.1.x86_64
  - libopenblas_pthreads0-0.2.19-2.4.x86_64
  - Similarly for libopenblas_serial0 .
  - libreoffice-icon-theme-tango-5.4.0.0.beta2-1.1.noarch
  - openvpn-2.4.2-2.1.x86_64
  - whois-5.2.16-1.1.x86_64
  Why are we installing ntp-doc when we use chrony? Because we're idiots. [Removed from couchnet.sel.]
- Erasing unwanted packages: Removed 2316 packages. While there were warning messages, none appear ominous.
- Investigate these interesting packages that were removed. The list of removed packages is saved in ~/upgrade/suse-tw/removed-pkgs
  - apache-pdfbox-1.8.12-4.2
  - pidgin -- is it able to do voice and/or video over XMPP? (not removed)
- Dist-upgrade step. post_jump now uses -U for Tumbleweed and -u for any other version. audit-pkgs has the -y option; is this wise? No, and it was removed in v11.4 (I think) with the advent of the --non-interactive option to zypper.
  It proposes to add 19 new packages: flash-player-ppapi php7 and php-7 modules. To remove chromium-pepper-flash php5 and the same friends. (These are all the currently installed php5 modules.) Upgrading 14 multimedia packages, all vendor change to PackMan. Total of 35 packages to be downloaded, 16.6Mb. Minus 4.9Mb after the installation. Doing it.
  It wanted to reinstall Mesa-demos and pam_ldap again. I didn't see any error the first time around. Doing it. It downloaded and reinstalled the packages (NOKEY warning), and wanted to do them again, infinitely. Killed. This has got to be a bug in zypper. PROBLEM Also, audit-pkgs should do the dist-upgrade only once. [Fixed.]
- audit-scripts:
  - Circular dependency (noted above) set off by acpid.service. There are actually 618 of these, fixed by ignoring them.
  - What is autovt@.service , not in conf file, we're killing it. This may kill the getty on vt1. [Added to scripts.dat]
  - What is issue-generator.service
  - ConsoleKit is not installed (gone), no console-kit-daemon.service
  - default.target and graphical.target have no [Install] section, could not reenable.
  - postfix, insserv tried to do /etc/init.d/postfix,start=3,5 ; don't we have a systemd unit for postfix? PROBLEM
  - wakeup.J.service apparently has no systemd unit, can we fix? PROBLEM
- These items were enabled and not in conf file; they were allowed to be killed. runlevel5.target dm-event.socket fstrim.timer logrotate.timer
- Comparing /etc/passwd group shadow: passwd, shadow were unchanged. Added wwwrun to group www and vnc to shadow, and group wwwrun is new (484).
- /etc/krb5.conf is missing! krb-maint test -v of course failed. At some point the symlink got installed and Kerberos is back in action.
- Housekeeping -- it should exec /tmp/fixup.sh to accept all the new files. [Done.]
- Check if /etc/default/grub has the correct version name (Tumbleweed). [Edited file in post_jump dir.]
- Checking X.509 certificates: /etc/pki/trust/anchors-jail does not exist, should be created. [Added to post_jump dir.]
- I used retrieve-pkgs to retrieve 69 packages downloaded from PackMan and from the main SuSE repo. retrieve-pkgs uses the -b option to mksuserepo, which I've learned is poisonous. [Fixed.]
- Rebooted Oso. Fell on face. (Eventual conclusion: I'm not going to debug btrfs today, reverting to ext4.) Grub says: error: file '/@/.snapshots/1/snapshot/boot/grub2/i386-pc/normal.mod' not found.
  Response #1: Installer thinks you have EFI when you don't. Mount the root partition (on /mnt) and reinstall grub2. grub-install /dev/sda --root-directory=/mnt (needs grub2-install)
  Another forum post: Avoid installing Grub in a partition's boot sector because the partition's filesystem may move blocks around. If it's also on the MBR, it may be unable to read its core image. It's OK to have a dedicated partition for Grub, e.g. BIOS Boot Partition on GPT.
  Booted the rescue system. It took about 90 secs to download the initrd, then kexec-ed what it had gotten (didn't need manual kexec=1). Loading rescue system (from SuSE), very slow today. fsck doesn't work on btrfs, use btrfs check /dev/vda2. It took about 120 secs, half of it in checking extents (with no progress indication). No errors. Mounted. /.snapshots dir is empty. /boot/grub2/i386-pc/ exists but is empty. Did grub not get installed?
  grub2-2.02-3.1.x86_64 is installed but normal.mod is nowhere to be found. grub2-i386-pc-2.02-3.1.x86_64 is installed and provides /usr/lib/grub2/i386-pc/normal.mod . grub2-install /dev/vda --root-directory=/mnt (it says, no error reported). Unmounting and rebooting.
  Well, that's a regression: grub says error: disk 'hd0,msdos2' not found. https://www.gnu.org/software/grub/manual/html_node/GRUB-only-offers-a-rescue-shell.html#GRUB-only-offers-a-rescue-shell Per set the prefix is (hd0,msdos2)/@/.snapshots/… (the path that failed), and root=hd0,msdos2 . ls reports no devices in existence.
- Forum post #1: the grub image lacked biosdisk driver/module. I never did get any explanation of why ls reports nothing.
- Conclusion: this is going to take some serious debugging, and I need security patches a lot more than I need btrfs. So I'm going to revert to ext4. First, can I rescue the data from the btrfs filesystem? If not, I'll have to reinstall from the beginning.
  losetup -f # Prints the name of an unused loop device
  losetup -P /dev/loop0 disc1.raw # Attaches the device to the file; -P = read partition table
  losetup -d /dev/loop0 # Tears down the loop device
  
  In parted, do unit B and print, and you will get the partition boundaries in bytes, for use with the -o option of losetup. Remember to remove the unit of 'B'.
  It looks like the rescue is succeeding. Later steps:
  - Toss /.snapshots (empty dir).
  - Fix /scr symlink to m1/scr (was /m1/scr).
  - Edit /etc/fstab removing btrfs subvolumes, and changing to mount by labels oso-root and oso-swap.
  - Make sure /etc/default/grub is correct, also /etc/default/grub.m4 .
  - Reinstall grub. This will have to be done with the rescue system not with Diamond's own discs.
  Booting the rescue system: I tried from Diamond, and it found no bootable kernel. Missing from the distro? I reverted to download.suse.org-oss.
  Somehow the backup GPT was corrupt (but the primary one was OK). Fixed with gdisk. But it still won't mount /dev/vda2. Rebooted the rescue system. Still won't mount /dev/vda2. I'm not going to be dragged into debugging this -- I'm going to trash the disc and start from scratch. This is going to replicate oso-pureuser.
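For the loop-device rescue described above, here is a minimal sketch of turning the byte values from parted (unit B) into a losetup offset. The function names and the example numbers are made up for illustration, and --sizelimit is an addition not mentioned above. The sketch only prints the command, so it can be inspected before anything touches a device.

```shell
# Strip the trailing 'B' that parted appends when "unit B" is in effect.
strip_unit_b() {
    printf '%s\n' "${1%B}"
}

# Print (do not run) a losetup command that attaches one partition of a
# raw disc image. Arguments: image file, start and size as shown by parted.
loop_cmd_for_partition() {
    local image="$1" start size
    start=$(strip_unit_b "$2")
    size=$(strip_unit_b "$3")
    printf 'losetup -o %s --sizelimit %s -f --show %s\n' "$start" "$size" "$image"
}

# Example with invented partition boundaries:
loop_cmd_for_partition disc1.raw 8225280B 2147483648B
```

If the printed command looks right, run it as root; losetup --show then prints the loop device that was allocated, which can be mounted or checked as usual.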
Reinstallation: btrfs out, ext4 in.
- When starting the net installer, you need to preset two items: Source is http://diamond/SuSE/x86_64/99.8/ (no leading slash, yes trailing slash in the directory part). Then, kexec=1 as a boot option.
- $URL/content: invalid signature. Installation aborted. Duh, content.key is for the openSUSE Project. Need to replace it with the CFT public key. [Done.] May the fleas of a thousand camels infest someone's nether regions! It still refuses the repo. Reverting to the stupid download.opensuse.org-oss. Installer is running. [Update: the problem was that a required checksum, for packages.gz, was missing. If it's provided, a human has to give the password for the distro signing key, precluding fully automatic overnight updates. Hiss, boo! The change in mksuserepo was reverted so packages.gz is included.]
- Partitioning: It proposes to shrink the existing swap partition, plus BIOS Grub, another swap, btrfs root, xfs home. Wipe them all and give it: BIOS Grub (needs 1.0Mb, rounded to a cylinder boundary of 7.84Mb, do not format, do not mount), swap (2Gb), root (ext4, the rest).
- Computer role: Server, Text Mode. Local user: Create myself, so I can get in if sshd won't let root log in. Installation settings: Booter in MBR, not in root partition. Enable and open firewall for SSH.
- Software: AppArmor off, 64bit runtime on, practically nothing else. These were specially added: m4 expect pam_krb5 krb5-client. pam_ldap is not in the main SuSE repo but is in Couch-Build-tw. 1054 packages, 525Mb to download, installed 1.8Gb.
- Let her rip! Start 15:46, done 16:03 (17 mins), post install activities 2 mins more. It rebooted successfully, getty on vt1. ssh as root works; got the new host key.
- Making a backup of Oso's disc. disc1.textmode.
- post_jump -r 99.8 oso :
  - Phase 1, 2, 3 look OK.
  - Installing 338 missing keystone packages, 2057 total.
  - Removing 472 unwanted packages. It got the dependencies right the first time.
  - dist-upgrade: 59 packages to downgrade, 22 new, 19 to remove, 2 to reinstall and change vendor. Total 83 to install. The downgraded packages: spot checks show that the main repo (from which these were installed) has a newer version than the local mirror. [Moral: run updaterepo at the beginning of the day when you install Tumbleweed on machines.]
  - Complaint from postfix: These dirs in /etc/postfix/ are present but not in the package. postfix-files postfix-script post-install . It wants them removed. This message appears to be bogus; they are absent from postfix-2.11.6 (v42.1) and earlier, and are listed in postfix-3.2.2 . Ignore.
  - Re-install Grub: claims no error reported. (2 places)
  - Kerberos initialization: no /etc/krb5.conf, can't get host key. This seems to have self-healed.
  - Housekeeping: Maybe we should suppress checkdisc if POST_JUMP is set. [Done]
  - Reboot after post_jump:
    - Failed to start earlyhostname named display-manager cronj
    - Network was not up either.
- Getting Oso to run after post_jump:
  - systemctl status wicked says the network is up. The network interface is ens3 while the firewall looks for the trusted network on eth0. Got to create a udev rule in /etc/udev/rules.d/70-persistent-net.rules
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:09:c8:d4", ATTR{type}=="1", NAME="eth0"
  - The udev rule worked, the NIC ended up as eth0. But at 2.49 secs into the boot (just at the end of device driver loading), /var/log/debug shows this message: virtio_net virtio0 ens3: renamed from eth0. This has to be fixed.
  - /etc/systemd/network/70-rename-eth0 may have had no effect; the NIC was renamed to ens3 again at 2.55 secs. See systemd.link(5); it came in after systemd-210 and before systemd-233.
  - Suppressing persistent names with a kernel parameter of net.ifnames=0 . [I fixed this, but how? Find out. I remember using the kernel parameter by hand, but it's not on /proc/cmdline now, nor in /etc/default/grub. So I put it in grub.m4.]
  - earlyhostname problem: the specified hostname is invalid. Probably because it's empty, but I can't figure why. The unit was created for SuSE 12.1 when /etc/HOSTNAME had the FQDN, and it used sed to remove the realm, but in modern times /etc/hostname (lower case) has the 1-component name, so all that's necessary is to do hostname --file /etc/hostname, and it's fixed. This fix was propagated to /etc/systemd/system/earlyhostname.service on all hosts.
  - What's wrong with cronj: no Astro::Sunrise perl module. Found it, but installing it has to wait until the network is de-hosed. [Fixed, cronj runs.]
  - named's complaints: Can't load a shared library, possibly with id=gost (a dynamic crypto issue). Is it a chroot issue? One forum person copied /usr/pbi/bind-amd64/lib/engines/libgost.so to the chroot jail and that fixed it. Growl, a Russian engine that we don't have anyway. /usr/diklo/sbin/named-start copied /lib/engines and /lib64/engines which no longer exist. The new location is /usr/lib64/openssl-1_0_0/engines/ . Now named starts. Until we go to openssl-1_0_1, hiss, boo! Remember to extract the changes off Oso once the network is fixed. named-start now looks for the engines dynamically and the fix was installed on all hosts.
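The udev rule used above for pinning the NIC name can be generated per machine rather than typed by hand. This is only a sketch: persistent_net_rule is a hypothetical helper, and as noted above the rule alone was not enough on Oso (the NIC was renamed back to ens3 until net.ifnames=0 was put on the kernel command line).

```shell
# Print a one-line udev rule that names the NIC with the given MAC address.
# Arguments: MAC address, desired interface name (default eth0).
persistent_net_rule() {
    local mac name="${2:-eth0}"
    # Lower-case the MAC; the rule value is compared as a plain string.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{type}=="1", NAME="%s"\n' "$mac" "$name"
}

# Example using Oso's MAC address from the rule above:
persistent_net_rule 52:54:00:09:C8:D4 eth0
```

The output would be redirected into /etc/udev/rules.d/70-persistent-net.rules on the target machine.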
- I ran checkout.sh . Discrepancies:
  - Big ugly circular dependency in boot units: basic.target timers.target time-sync.target chrony.J.service network.target wicked.service firewall.J.service basic.target . Earlier units require later units. [Fixed by ignoring 618 ridiculous circular dependencies, most not involving my units.] This deserves a bug report. PROBLEM
  - Wicked: IPv6 is using a pool DHCP address plus RFC 4862 plus a random address (RFC 4941 privacy address). Need to (a) get rid of the lease [done], (b) suppress the random address AGAIN. /etc/sysctl.conf turns this off as it should (in key net.ipv6.conf.$IFC.use_tempaddr = 0 or file /proc/sys/net/ipv6/conf/$IFC/use_tempaddr) but something is configured to turn on the privacy thing periodically, almost certainly it's wicked. This deserves a bug report. PROBLEM
  - All tests from remote machines to Oso fail because of the IPv6 addressing. [Self healed.]
  - Dbus: could not connect, one of those failures when dbus is obviously running. [Fixed, problem was a race condition in the tester causing it to lose a message.]
  - alsasound is inactive, no sound on this machine. Maybe the tester should be a little smarter about this. [Smartened]
  - apache2 has no ServerName, would not start. Never set up because Oso is going to be reinstalled repeatedly. I did so, but this setup needs to be saved somewhere so it can be restored. [Saved.]
  - autovt@ confused the tester. [De-confused.]
  - bridge.J is disabled but is active (why?) Same for wakeup.J. [Fixed Scripts.pm, status was 'generated' not 'enabled'.]
  - krb-client.J says Oso is offline. Kerberos is hosed. [Fixed, the symlink to /etc/krb5.conf was missing but eventually got installed. Kerberos works now and the host keys were installed.]
  - rsyslog: Can't find the test message. Nasty! systemd-journald used to forward messages to syslog by default, but this was turned off in systemd-216. You need in /etc/systemd/journald.conf to explicitly set ForwardToSyslog=yes. To limit journal bloat, I also set MaxRetentionSec=14day and MaxFileSec=2day. This means to toss messages older than 14 days, and to rotate files every 2 days.
  - slpd: Took a long time to start, evidently it died. This screws up logrotate when it tries to HUP. strace didn't give obvious clues. Disabling slpd; try to fix this later. PROBLEM
  - display-manager is hosed. I'm using lightdm, and the symlink that picks the greeter (formerly lightdm-webkit-greeter-0.1.2) is dangling. The greeter is now called lightdm-webkit2-greeter-2.2.2 and it doesn't seem to call update-alternatives. I couldn't figure out the correct arguments for update-alternatives --install. I tried fixing the symlink by hand, but that just moved the failure elsewhere. Time to switch display managers once again, this time to mdm (Mint Display Manager). It's working enough that testing can continue, but needs customization. (See Customizing MDM.)
  - rpcbind.socket: netstat is not installed, tester failed. netstat is in package net-tools-deprecated. You're supposed to use ip instead. In my case fuser /run/rpcbind.sock was the better choice.
  - rpmconfigcheck is in a weird state. systemctl start rpmconfigcheck works fine, but it doesn't start at boot. See fix further on.
  - sshd says: key_load_public: invalid format. But the command does get run. In the initiator's public key file I had the limited execution stuff (command="uname -a" etc). This used to be OK, but now results in a warning. [Removed, and the tester was fixed to not need it.] (Why does the initiator need the public key file? To send the key hash so the server can find its copy of the public key, without having to multiply out the private key into the public key every time.)
  - These daemons are wanted but disabled: bridge.J console-kit-daemon wakeup.J . console-kit-daemon is gone. bridge.J wakeup.J aren't really disabled. [Fixed.]
  - These daemons are unwanted but are enabled: autovt@ [Added to scripts.dat.] And the tester prints the list with no indentation and no ending newline: messy output. [Output fixed.]
  Additional issues noted:
  - logrotate is not getting run! See /usr/lib/systemd/system/logrotate.timer and .service (disabled). [Fix: another script logrotate.J in /usr/diklo/lib/daily for v99.8 only.] [slpd refuses to start up and causes an error when it is hupped after rotating its log file.] [Disabling slpd.] PROBLEM
  - This message is logged frequently, at least once/minute: dns-resolver[4595]: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched... This appears to have self-healed.
  - Need to enforce update-alternatives choosing /usr/lib64/libopenblas_serial.so.0 for libblas libcblas liblapack libopenblas. I had this at Mathnet; not on CouchNet? PROBLEM
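For the rsyslog item above, the journald settings could also be kept in a drop-in file instead of editing /etc/systemd/journald.conf directly, which makes them easier to push from the post_jump area. This is a sketch, not what was actually done on Oso; the file name 50-forward-syslog.conf and the function name are made up, and the directory argument exists only so the function can be tried out in a scratch directory.

```shell
# Write a journald drop-in with the three settings discussed above.
# Argument: target directory (default /etc/systemd/journald.conf.d).
write_journald_dropin() {
    local dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/50-forward-syslog.conf" <<'EOF'
[Journal]
ForwardToSyslog=yes
MaxRetentionSec=14day
MaxFileSec=2day
EOF
}

# Real use (as root), followed by a restart so journald rereads its config:
#   write_journald_dropin && systemctl restart systemd-journald
```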
- More script and infrastructure changes:
  - Mount openSUSE-Tumbleweed-DVD-x86_64-Current.iso on /s1/SuSE/SuSE/x86_64/99.8 [Done.]
  - Edit the repo template for Couch-SuSE-tw to use the ISO image. [Done]
  - Rename the former Couch-SuSE-tw to Couch-Update-tw and make a new repo template for it. [Done]
  - Change updaterepo and retrieve-pkgs to retrieve into $su. [Done]
  - Change updaterepo to purge $su packages <= what's in $sd. [Done]
  - Change snarf-bs to download into Couch-Update-tw or Couch-Build-tw according to the URL. (Not done yet.) PROBLEM

Reconciling Configuration Files

Now that I have a working VM, it's time to get the configuration files into shape. The first is couchnet.sel, i.e. package selection. All references to v42.1 and v13.1 are to be tossed, except for historical notes. [Done.] I didn't do a package by package evaluation of which should be kept on the machine. Probably I should come back to do that, but there's no sign of radical reorganization, and I want to make progress so I can get my security patches installed.

After installing oso-pureuser I saved the pristine /etc. Comparing that with post_jump. Executing on Diamond.

diff -r -w /home/post_jump/99.8/etc /s1/scr/oso-setup-1706/oso-pureinst/etc >& oso.diff

There were 102 unequal files and 8689 lines of diff. Here are the highlights, omitting cases (like /etc/hosts) where the CouchNet version is obviously the better choice.

/etc/csh.cshrc has 300 lines of intricate stuff that we're replacing. Similarly 300 lines for csh.login, and 400 lines for /etc/profile. Plus particular packages drop code fragments in directories that they source. Someday I should very closely look at what I'm chopping out.
/etc/cups/cupsd.conf -- There is a possibly new stanza for <Policy authenticated> containing JobPrivateAccess default and similar. CouchNet is not paranoid about printing, and I think I can safely ignore this, since no activities are governed by this policy.
/etc/default/grub -- GRUB_DISTRIBUTOR provides the OS name for the boot selection menu. Starting (when?), if you leave GRUB_DISTRIBUTOR unset, the installer will get it from /etc/os-release. I changed to unset it, keeping all other CouchNet hacks.
/etc/logrotate.conf -- The nocompress parameter was bogusly not inherited in per-file stanzas, starting in v12.2 and continuing through Leap 42.1. This appears to be fixed in logrotate-3.11.0 (Tumbleweed 2017-07-23). All non-global nocompress parameters were removed. [The files are not being compressed, except for zypper.log which is supposed to be compressed.]
Checking in /etc/pam.d/common-session* -- Yes I have these:
- pam_loginuid.so records the UID of the logged in user, for auditd. Do not use for su; you want the real UID.
- pam_umask.so sets the umask for the session.
/etc/pam.d/useradd and usermod -- They have pam_permit.so for everything; I do a complete session. Which is right? What happened to userdel?
It's long past time to get rid of the fallback PAM directories. To remove: pam.d (symlink) pam.d.114/ pam.d.nok114/ pam.d.rpmorig/ [Done.]
Need to actually read the man page for ssh_config and sshd_config and update them as needed. The v42.1 file seems to work OK, though. Tumbleweed has openssh-7.2p2 (vs 6.6p1 in v42.1), so for KEX, curve25519-sha256@libssh.org can be used successfully (starting 6.7). [Updated.]
For the main cipher, I did some timing tests. With the AES crypto engine, aes256-gcm is equal in speed to aes128-gcm, and twice as fast as (and more secure than) aes256-ctr. aes256-gcm is 2.5x faster than chacha, but without the engine chacha is 2 to 2.5x faster than aes128-gcm. Virtual machines don't have access to the engine, but I'm not going to have a different version for them. So the order I've adopted is
- aes256-gcm@openssh.com
- aes128-gcm@openssh.com
- chacha20-poly1305@openssh.com
- aes256-ctr
- aes192-ctr
- aes128-ctr
- aes256-cbc
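The cipher order above goes into ssh_config and sshd_config as a single comma-separated Ciphers line. Here is a small sketch that builds that line from the list; cipher_line is a hypothetical helper, not part of the CouchNet scripts.

```shell
# Join the arguments with commas and print an ssh(d)_config Ciphers line.
# The body runs in a subshell so the IFS change does not leak out.
cipher_line() (
    IFS=,
    printf 'Ciphers %s\n' "$*"
)

cipher_line \
    aes256-gcm@openssh.com \
    aes128-gcm@openssh.com \
    chacha20-poly1305@openssh.com \
    aes256-ctr \
    aes192-ctr \
    aes128-ctr \
    aes256-cbc

# "ssh -Q cipher" lists the ciphers the local OpenSSH build supports,
# which is worth comparing against before installing the line everywhere.
```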
/etc/sysconfig -- diff doesn't work right on sysconfig files due to random ordering and wandering comments. Need to come back and systematically use confutil to compare them. Only 4 diffs. [Updated.]
/etc/sysctl.conf -- See sysctl.conf(5), sysctl.d(5) and sysctl(8) for overriding contingencies. Something is setting to use IPv6 RFC 4941 privacy addresses, and it ought to be turned on or off with sysctl, but I can't find it. (I want this off.) PROBLEM
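To catch whatever is re-enabling the privacy addresses, it may help to dump use_tempaddr for every interface right after boot and again later, and see when the value changes. This is a diagnostic sketch only; show_tempaddr is a made-up name, and the optional directory argument is there so it can be tested against a fake tree. In the kernel's convention, 0 means no temporary addresses, 1 means generate them but prefer the public address, and 2 means prefer the temporary address.

```shell
# Print "interface=value" for each interface's use_tempaddr setting.
# Argument: base directory (default /proc/sys/net/ipv6/conf).
show_tempaddr() {
    local base="${1:-/proc/sys/net/ipv6/conf}" f
    for f in "$base"/*/use_tempaddr; do
        [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# Typical use: show_tempaddr ; sleep 600 ; show_tempaddr
```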

Template for an Upgrade

Reinstallation again. We're ready to re-test upgrading from v42.1 (first) and installing from scratch. This procedure (most of it) is going to become the template for upgrading the production machines.

Preparing to Upgrade

Verify that I've copied all modified files from Oso to the post_jump area, and retrieve any forgotten ones. Execute on Diamond.
/home/post_jump/pushconfig -C oso
Shut down Oso and save a copy of its disc. 8Mb/sec (variable). 4.6Gb compressed, 17.2Gb uncompressed.
cd /s1/kvm/oso
gzip -c disc1.raw > disc1.pureu-final.gz
Restore Oso's saved image from v42.1.
zcat disc1.421final.gz > disc1.raw
Start up Oso. Wait for housekeeping tasks like rebuilding Diffie-Hellman groups.
Update configuration files. Several important ones were written or altered after Oso (v42.1) went into storage. It checks the version actually on the machine (42.1) but Oso is in hostgroup v99.8. Execute on Diamond and include the options to override the version.
/home/post_jump/pushconfig -r 42.1 -R -C oso
Run checkout.sh and make sure Oso (42.1) is in good condition.
Make sure Oso has a current list of installed packages: /srv/ftp/updaterepo.rpmqa This script runs periodically, but Oso (v42.1) went into storage before it was written…
/usr/diklo/lib/daily/rpmqa.J
Pre-download new files from the SuSE repo. It needs to know what's actually installed on the machines so as to avoid downloading all 38000 packages. It's better to do this when Tumbleweed is running, but v42.1 is OK and few packages will be missed; the installer will have to download those on the spot from the SuSE repo. Ignore complaints that a zillion v42.1 package versions are not in the v99.8 repos (duh). Execute on Diamond.
/s1/SuSE/bin/updaterepo -v
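Since the saved disc images are the only way back if an upgrade goes wrong, it seems prudent to verify each compressed image before trusting it, and before overwriting disc1.raw with it. A minimal sketch, assuming plain gzip images as above; save_image and restore_image are hypothetical names.

```shell
# Compress a raw image, verify the result, then move it into place.
save_image() {
    local raw="$1" gz="$2"
    gzip -c "$raw" > "$gz.tmp" || return 1
    gzip -t "$gz.tmp" || return 1      # integrity check of the new archive
    mv "$gz.tmp" "$gz"
}

# Verify a saved image first; only then overwrite the raw disc file.
restore_image() {
    local gz="$1" raw="$2"
    gzip -t "$gz" || return 1
    gzip -dc "$gz" > "$raw"
}

# Example (in /s1/kvm/oso, with the VM shut down):
#   save_image disc1.raw disc1.pureu-final.gz
#   restore_image disc1.421final.gz disc1.raw
```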

Upgrading the Target Machine

This was written from experience on Oso, but to make it generic I'm using $target for the target machine. Pre-set this variable.

Check available disc space in the root of the target machine. Typical file bloat (before removal of no longer wanted packages) is 1.7Gb. If you don't have enough, pre-delete cruft.
Edit diamond: /usr/diklo/lib/site_perl/hostgroup.db and change the target machine (Oso) to be in the new version (v99.8). Install at least on Diamond and the target (might as well install everywhere).
Stop and disable restarter.timer and cronj.service . You don't want things being (re)started while you're reinstalling them.
You need to install the Tumbleweed repos first using audit-repos. Execute on Diamond, requires -r.
audit-repos -v -i $target -r 99.8 -u -k
Let's do this with the instance of zypper dist-upgrade in audit-pkgs. Execute on the target machine.
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target

Took 49min. 2451 packages to upgrade, 215 to downgrade, 767 new, 9 to reinstall, 199 to remove, 66 to change vendor, 7 to change arch. 3442 total packages, 3576 installation steps. Looks like it was uneventful except for non-threatening complaints that update-alternatives had to rebuild broken groups that included removed packages. Good news: it did mkinitrd only once, at the very end. (When I upgraded Orion, it took about 1/2 hour to run %post scripts. Don't panic.)
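The disc space check at the start of this procedure can be scripted so it is not forgotten on the production machines. A sketch, assuming the 1.7Gb of bloat mentioned above is a reasonable threshold; root_has_space is a made-up name.

```shell
# Succeed if the filesystem holding the given mount point has at least
# the requested number of KiB available.
# Arguments: needed KiB, mount point (default /).
root_has_space() {
    local need_kb="$1" mount="${2:-/}" avail_kb
    # -P keeps each filesystem on one line; column 4 is "Available".
    avail_kb=$(df -Pk "$mount" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: about 1.7 GB, rounded up to 1800000 KiB.
#   root_has_space 1800000 / || echo "pre-delete some cruft first"
```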
Don't reboot yet. You don't have the kernel command line parameter (net.ifnames=0) to keep the NIC on eth0 (vs. ens3).
Post_jump: You need to override the version allegedly on the machine. The package deletion step takes extra time because it has to index package requirements. Don't panic. Execute on Diamond:
post_jump -r 99.8 $target |& tee $j/jump.$target

Took 9min. Discrepancies found:
- World writable files: /srv/www/roundcubemail-fixed1/plugins/managesieve/vendor/bin/composer-php{,.bat} [Fixed]
- Hostgroup.pm line 367 to 398: $head is undef. Why? [Fixed I hope]
- gtk2-immodule-inuktitut-2.24.31-3.1 was removed. Not politically correct.
- 615 packages were removed.
- Dist upgrade (in lieu of installing patches): 2 packages to upgrade, 3 new, 47 to reinstall, 3 to remove, 47 to change vendor, 52 total packages.
- Grub was reinstalled successfully.
- Reenabling Postfix (LSB): /etc/init.d/postfix,start=3,5 doesn't exist at /sbin/insserv line 246. This used to be the syntax to enable a script in specific runlevels. I should create a systemd unit for Postfix.
- It tried to reenable named as LSB when it HAS a systemd unit that I wrote.
- Is there a problem with /etc/openldap/ldap.conf ? Passes functional test.
These packages were found by capabilities (change couchnet.sel to want the providing package) [Done]
- python2-qt4 providing python-qt4
- python3-dbus-python providing dbus-1-python3
- python2-pycurl providing python-pycurl
- gpg2 providing dirmngr
- hexchat providing xchat
- openssl-1_1_0-doc providing openssl-doc (maybe don't change)
These wanted packages could not be installed:
- ConsoleKit (not really gone, need to find)
- ConsoleKit-x11 (not really gone, need to find)
- chromium-desktop-gnome (gone)
- gimp-help-browser (gone)
- krb5-appl-clients (gone)
- krb5-appl-servers (gone)
- libreoffice-templates-en (Gone, copied from v42.1)
- python-gstreamer-0_10 (Found python-gstreamer v1.2.1)
- susehelp_en (Found)
Reboot Oso now.
Assuming previous steps did not fall on their face, check out everything. It takes 1 to 2 minutes because it has to wait for at jobs to reach their scheduled runtime.
checkout.sh > /tmp/check.out

Discrepancies:
- named is dead. GOST crypto engine not found. Botched creation of chroot jail. But I can't see what the error is. A Russian crypto engine that I never use -- hiss, boo! I suspect that it's just chance that it's loaded first so the error message implicates that one.
  Troubleshooting step #1: Turn off the chroot jail by setting NAMED_RUN_CHROOTED="no". named can start up now. It has mapped /usr/lib64/engines-1.0/libgost.so (which was not in the jail). Very recently this library was in /usr/lib64/openssl-1_0_0/engines/libgost.so . I'll bet there were complaints by jail builders and the engines were moved to a non-version-dependent directory which my script didn't recognize. Improved the script, named starts now.
- rpmconfigcheck is inactive. Looks like it was never started. It has an [Install] section making it WantedBy default.target. As recommended in systemd.special(7), there is a symlink /usr/lib/systemd/system/default.target -> graphical.target . Evidently /etc/systemd/system/default.target.wants/ is completely ignored, so anything WantedBy default.target is not going to start. rng-tools.service has a similar WantedBy (which is a hack that I put in wrongly). How do you decide between being WantedBy multi-user.target vs. graphical.target? Your [Install] section needs both of them, which is a bad workaround. Bug report here.
  OK, it got run, and there are some rpmsave/rpmnew files that post_jump did not find and remove. Removed by hand.
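The named failure above came from the OpenSSL engines moving between directories (/lib64/engines, /usr/lib64/openssl-1_0_0/engines, /usr/lib64/engines-1.0). Here is a sketch of how a jail builder could look for them dynamically instead of hardwiring a path. This is not the actual code in named-start; find_engine_dirs is a hypothetical name and the search depth of 2 is an assumption based on the paths seen so far.

```shell
# Print every directory named engines* (at most two levels below the
# given library roots) that contains at least one shared object.
find_engine_dirs() {
    local libroot
    for libroot in "$@"; do
        [ -d "$libroot" ] || continue
        find "$libroot" -maxdepth 2 -type d -name 'engines*'
    done | while IFS= read -r d; do
        # Skip empty leftovers; only report directories with *.so files.
        ls "$d"/*.so >/dev/null 2>&1 && printf '%s\n' "$d"
    done
}

# Typical call; copying the results into the chroot jail is left to the
# caller, since the jail layout is site specific:
#   find_engine_dirs /lib /lib64 /usr/lib64
```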
Comparing v42.1 and Tumbleweed, when the VM is idle the load on the host is very low for v42.1, but CPU is about 15% for Tumbleweed. This should not happen. Investigate. [See results below in CPU Usage with KVM.]

Installing from Scratch on Oso

If this is actually an existing machine (e.g. a VM for testing, like Oso), back it up in the normal way so you can restore the SSH keys, Apache configuration and home page, etc.
Following the Preparing to Upgrade steps, verify that altered files have been copied to the post_jump area, shut down Oso, and save a copy of its disc under the name disc1.upgrd.gz .
Shall I truly wipe the disc, or use the existing partitions and tell the installer to format them (destroying the contents)? Let's take the time for a complete wipe. Took 4 minutes.
dd if=/dev/zero of=disc1.raw bs=1M count=16384
Make a symlink from oso-cd.iso to the net installer, which is at /s1/SuSE/SuSE-build/x86_64/99.8/iso/openSUSE-Tumbleweed-NET-x86_64-Current.iso
Start it up.
- (On Xena as user:) virt-viewer -c qemu+ssh://root@diamond/system -w -r oso
- (On Diamond:) virsh start oso
- (On Oso via virt-viewer:) F4 Source, HTTP (press enter), Server = 192.9.200.194, Directory = SuSE/x86_64/99.8/ (no leading slash, yes trailing slash).
- Down arrow to Installation, but don't press enter yet.
- kexec=1 net.ifnames=0 (just type both, they appear in Boot Options)
- Now press enter.
- Hit Esc when the Plymouth progress bar covers the screen.
- It downloads the kernel and initrd, and reboots.
- It downloads the installer image, and the installer starts running.
Installer items.
- Agree to the license. Oops, I got it into Nynorsk. Try again.
- Partitions: Expert Partitioner. This time it decided on a MSDOS partition table. The BIOS Grub partition type is not available for MSDOS. How to get a GPT on it: Select Hard Discs in the left panel. Delete all proposed partitions. Select the raw device. Hit Edit, Expert, New Partition Table, GPT, OK.
- Now add these partitions in order:
  - 30 MiB, Data, Do not format, type BIOS Grub. Must be first.
  - 1.0 GiB, Swap, (fstab options) Mount by label oso-swap.
  - The rest (pre-filled), Operating System, ext4, on root, label oso-root.
- Set the timezone. Role is Custom. Software: toss AppArmor, add Console Tools, XFCE, no further additions or deletions. Special items: click on Details, Search, check for: rsync (got), m4 (NEED), expect (NEED), pam-modules (got), pam_krb5 (NEED), pam_ldap (missing), pam_ssh (NEED), krb-client (missing). The missing items will be installed later from SBS.
- Local User (set it just in case) and Root Password.
- Installation Settings:
  - Booter in MBR and not in root partition (change both).
  - SSH port open and SSH enabled (change both).
  - There's got to be a place for extra kernel command line args. Append net.ifnames=0 . But I can't find where to set it.
  - Hit Install.
- About 1730 packages to install. Very fast (because Diamond has the distro and is the host for Oso); sneeze and you'll miss it. Actually about 20min.
- When rebooting, add to the kernel command line: net.ifnames=0 (type 'e' and it will give you an editor). Yes, that got it onto eth0.
- But the installer thought it was going to use ens3. In /etc/sysconfig/network , mv ifcfg-ens3 ifcfg-eth0 ; systemctl reload wicked
- Bingo, it's on the net (IPv4 only).
Oso is an old machine and has various files backed up which a completely new machine would not have. It will really help to restore important backed-up files before running post_jump. (On the first attempt I only restored the SSH host keys.)
- On Diamond, clear Oso's key(s) out of /root/.ssh/known_hosts .
- Try: ssh oso uname -a ; and when it asks are you sure you want to continue connecting, tell it yes. (Or get rid of forgotten keys and try again.) Root password is required on this and following steps, and it works (before CouchNet's sshd_config is installed by post_jump).
- rsync -a /home/backup/oso/etc/ssh/ssh_host_* oso:/etc/ssh/
- rsync -a /home/backup/oso/root/.ssh oso:/root/ #(to get authorized_keys)
- ssh oso systemctl reload sshd
- Once again, clear Oso's key(s) out of diamond:/root/.ssh/known_hosts.
- Restore the old keys to /root/.ssh/known_hosts, e.g. by doing ssh oso uname -a. Each instance is related but different.
- On other hosts too, if it could do ssh to Oso before the reinstallation, it can do SSH now.
- Given that there is a backed up copy of /etc/udev/rules.d/70-persistent-net.rules , it should be copied onto Oso at this point. For reference, this is the content (all on one line) (replace with the actual MAC address):
  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:09:c8:d4", ATTR{type}=="1", NAME="eth0"
- Restore the SSL host certificate:
  rsync -a /home/backup/oso/etc/ssl/{private,hostcerts} oso:/etc/ssl/
- Restore the Kerberos host keys:
  rsync -a /home/backup/oso/etc/krb5 oso:/etc/
- Restore the Apache configuration and htdocs.
  rsync -a /home/backup/oso/etc/apache2 oso:/etc
  rsync -a /home/backup/oso/etc/sysconfig/apache2 oso:/etc/sysconfig/
  rsync -a /home/backup/oso/home/httpd oso:/home/
- Edit /etc/default/grub adding net.ifnames=0 to GRUB_CMDLINE_LINUX and run grub2-mkconfig -o /boot/grub2/grub.cfg.
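The last step above (adding net.ifnames=0 to GRUB_CMDLINE_LINUX) is easy to forget and easy to do twice, so here is a sketch of an idempotent edit. add_kernel_param is a made-up helper; it assumes GNU sed, a GRUB_CMDLINE_LINUX line already present in double-quoted form, and a parameter without characters special to sed (true for net.ifnames=0).

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX unless it is already there.
# Arguments: grub defaults file, parameter.
add_kernel_param() {
    local file="$1" param="$2"
    # Nothing to do if the parameter is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX=\".*${param}" "$file"; then
        return 0
    fi
    # First expression handles an empty value; "t" skips the second
    # expression when the first one already made the substitution.
    sed -i \
        -e "s/^GRUB_CMDLINE_LINUX=\"\"/GRUB_CMDLINE_LINUX=\"${param}\"/" \
        -e "t" \
        -e "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 ${param}\"/" \
        "$file"
}

# Real use (as root on the target):
#   add_kernel_param /etc/default/grub net.ifnames=0
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

GRUB_CMDLINE_LINUX_DEFAULT is deliberately left alone, since the pattern requires the equals sign directly after GRUB_CMDLINE_LINUX.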
Post_jump.
- Command line (on Diamond): post_jump -r 99.8 oso |& tee $j/jump.oso
- Missing keys were installed: SuSE Build Service, CFT distro, PackMan.
- pam_ldap was installed (from SBS).
- World writable: /etc/machine-id . Needs bug report. So what do I tell them? Making the report would be very hard, because I have to point the finger of blame and I have no idea how this happened. PROBLEM
- Installing 1827 packages.
- Oops, after installing 1199/1827 packages it hung, net is up, but ssh is hung, and via virt-viewer, the console has a blank screen. Power cycling Oso and running post_jump again. This is supposed to work without damaging the installation. Beware of version skew. While I'm at it, I'm switching the symlink for the CD drive back to /dev/null. virsh reboot oso did nothing. virsh destroy oso destroyed it.
- It continued with post_jump. Package deletion phase was botched. Dist-upgrade of 775 packages.
- Installing Kerberos host keys failed. Jacinth is the Kerberos master and this step should be done there. (Worked.)
- Wait patiently for ssh-keygen to produce the custom Diffie-Hellman groups.
- Update /root/.ssh/known_hosts: it's missing and the script has a compulsion to not create it. PROBLEM
- Finished. 1 hour elapsed, including futile waiting when the machine hung.
Reboot and run post_jump again. If I had restored backed-up files and done the other items in that step, networking would have come up OK. But I didn't.
- Not a good sign: it's off the net. But many services seem to have come up. Connecting with virt-viewer. Firewall is bitching about UDP packets from/to dhcpv6-client and server. Idiot thing, it's using ens3 as the network interface, no wonder nothing works.
- The important fix is to add net.ifnames=0 to the kernel command line. Added to /etc/default/grub in the post_jump area, should be turned on by /usr/diklo/lib/daily/grubdflt.J which is run as a late step in post_jump. You also need to rename/copy /etc/sysconfig/network/ifcfg-ens3 to ifcfg-eth0. Now how are we going to automate that, since the bus name is static per machine, but not predictable from one machine to the next?
- Running post_jump again.
- Installing 617 packages missing from last time.
- Removing 728 packages.
- 82 packages to upgrade.
- post_jump took 17min this time.
- Rebooting. Something takes much longer than normal to go down. It took 3.5min to halt and finally reboot. Let it run.
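On the question above of automating the ifcfg-ens3 to ifcfg-eth0 rename when the bus-derived name differs per machine: one possible approach, offered only as a sketch, is to rename the file when there is exactly one ifcfg-en* candidate and to refuse otherwise. rename_sole_ifcfg is a hypothetical name, and the assumption is that the installer's predictable Ethernet names all start with "en".

```shell
# Rename the only ifcfg-en* file in the directory to ifcfg-<target>.
# Arguments: directory (default /etc/sysconfig/network), target (default eth0).
rename_sole_ifcfg() {
    local dir="${1:-/etc/sysconfig/network}" target="${2:-eth0}"
    # Reuse the positional parameters to hold the glob matches.
    set -- "$dir"/ifcfg-en*
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "need exactly one ifcfg-en* file in $dir" >&2
        return 1
    fi
    if [ -e "$dir/ifcfg-$target" ]; then
        echo "$dir/ifcfg-$target already exists" >&2
        return 1
    fi
    mv "$1" "$dir/ifcfg-$target"
}

# Real use (as root on the target), then: systemctl reload wicked
```

A machine with several wired NICs would make the function bail out, which seems safer than guessing which one should become eth0.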
Restore these items from backup (or create them for a totally new machine): The second time through, I did these already.
- Already restored the SSH host keys.
- Host certificate in /etc/ssl/private and hostcerts (and restart services that use it).
- Apache configuration files and /home/httpd (and restart Apache).
Running checkout.sh. Discrepancies:
- apache2 is running, but the saved configuration files need to be restored. Done, but SSLSessionCache is missing.
- rpmconfigcheck: /etc/pam.d/racoon.rpmorig is still there.
- sshd: output mangled; it included a message about adding oso to known_hosts. Not really a failure.
- vsftpd.socket: missing host cert. Need to restore from backup.
- wakeup.J and bridge.J: wanted but inactive.
- Since I'm going to reinstall Oso immediately, I'm not going to fix any of these.
Running checkout.sh after the second installation. Discrepancies:
- Wicked is fixated on an aleatory IPv6 address, but the RFC 4862 address has a PTR in DNS (dnsmasq on Jacinth), and this is enough to satisfy the tester. I tried and failed to get rid of the lease.
- wickedd-dhcp6 didn't get the host's assigned IPv6 addr. Counts as a failure.
- apache2: both URLs are said to be offline. Ping now auto-switches between IPv6 and IPv4, and the host's IPv6 address is hosed, causing the offline business. [Tester was improved.] And /etc/sysconfig/apache2 did not include -SSH. All fixed, Apache works now.
- Micro-emacs is hosed, says Incomplete termcap entry. Probably there was an incomplete termcap entry. This self-healed, probably when a required package was dragged in by something else.
- Except for the IPv6 lease, Oso is in good shape. Except for using too much host CPU when idle.
- MDM still needs to be customized. (See Customizing MDM below.)
I retrieved downloaded packages: /s1/SuSE/bin/retrieve-pkgs -v -w (-w means to list the packages being retrieved.)

Overcoming Problems

Several problems were encountered during the various installations and upgrades, and their solutions turned out to be too big to include inline with the installation sections.

Customizing MDM (Mint Display Manager)

The display manager is the program that starts the X-Windows graphics server, puts up a window to solicit the user's loginID and password, and starts the user's session with his favorite desktop environment. It seems that on every operating system upgrade the previous display manager breaks and I have to find a new one. With v42.1 I was using lightdm, and a custom greeter using the Webkit engine, which basically means Javascript. It looked nice, but unfortunately it was unreliable, and after the upgrade to Tumbleweed, important symlinks set by update-alternatives were messed up, and I was unable to fix them by hand and get a working greeter.

For my new display manager I'm using MDM: Mint Display Manager. See this Wikipedia article about Linux Mint. It has a non-tree-form ancestor relation with Ubuntu and Debian; I am not tempted to investigate Mint to replace OpenSuSE. However, the requirements for MDM itself are not onerous; all the required libraries are clearly relevant to MDM's mission, and it doesn't drag in the whole Gnome infrastructure. It is themeable, and the syntax and semantics of the theme file follow that for GDM (what version?). Basically you are arranging and specifying properties for GTK widgets (what version, 2 I think). Unfortunately Glade cannot handle the file format.

CouchNet Theme for MDM

Greeter design: Each of my machines has a background image that is used both for the greeter and for the webserver's front page. Generally photos like this have visual interest in the center, and therefore I try to move the greeter window into a corner, normally northeast. A black background for this window works out best when contrasted with the various colored images. These modifications are surprisingly hard for some greeters. As a fallback I have a design using XDM, which is functional but doesn't look completely professional.

The theme file is partially documented in the Graphical Greeter Themes writeup, a page from the GDM Reference Manual (what version?). Surprisingly, a Google search does not reveal any other home for this page, and it doesn't look like an official publication point for GDM. You will need to use your hacking ability to figure out some of the necessary properties and container classes.

The mdmsetup utility is mainly for picking a theme and setting most commonly used options like timed login. It isn't for editing the theme definition file.

The theme needs to be installed in /usr/share/mdm/themes/$ITS_NAME/ The files are:

GdmGreeterTheme.desktop -- A pretty standard XDG-type desktop file. Required, but you don't have to have as many translations as Gnome does.
background.jpeg -- The filename is hardwired in the XML, and a generic symlink to the real background, such as background.jpeg, is recommended. Pixmaps are supported, of which jpeg and png are the most practical these days, and a separate widget can show SVG.
screenshot.png -- Optional but recommended; the thumbnails are shown by mdmsetup so you know what theme you want. It should be a screenshot of the greeter, 200x150px in size. All the provided ones are PNG. I don't know if PNG is required. Suggested procedure: use virt-viewer to connect to the guest, or vncviewer to a host (assuming one has VNC service), then run this command line and click on the viewer window. It uses ImageMagick's convert to shrink the image to thumbnail size.
xwd | convert xwd:- -resize 200x150 screenshot.png
couchnet.xml -- This file is named in GdmGreeterTheme.desktop . I'm not going to give a complete tutorial on how to edit this file, but I copied ../circles/circles.xml and hacked.

Session Configuration for MDM

I have several issues which I'm tentatively blaming on MDM:

The greeter has action items for shutdown, reboot and suspend, which do nothing.
When the greeter is soliciting the password, another app was caught stealing focus (and a few bytes of the password).
Frequently but not always, the greeter starts out asking for the password.
When I log in, pam_ssh.so fails to start ssh-agent and to load it with keys. (ssh-agent is started later by startxfce, but of course is empty.) However, pam_krb5.so gets my Kerberos credential. I wish I had positive proof that PAM is being used, and if so, what generic stack is used.
Neither my own ~/.xkeyboard nor the generic /etc/X11/xdm/xkeyboard (which is actually identical except for comments) is being executed at login.

/etc/mdm/custom.conf and friends have not been hacked, and very likely this is the cause of many of the missing features. See /usr/share/mdm/defaults.conf for the pre-customization state. There are 68 default settings, 47 to "". Here is where it stands with these items:

No Shutdown Action

In /usr/share/mdm/defaults.conf, RebootCommand and friends are set to (e.g.) /sbin/shutdown -r now "Rebooted via mdm.". While this is correct with SysVinit, Tumbleweed has systemd-sysvinit-233, which makes /sbin/shutdown a symlink to /usr/bin/systemctl; that may have responded to SysVinit options at one time, but does not do so now. I replaced the command line with /usr/bin/systemctl reboot --message="User reboot via MDM" and the shutdown or reboot now happens, including the message. Remember that HaltCommand should request poweroff, which turns off power to the machine, rather than halt, which stops the system with power still on. Remember also that /usr/share/mdm/distro.conf is the correct place for distro customization. Bug report here.
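As a rough sketch, the override could be written to distro.conf like this. It assumes that RebootCommand and HaltCommand live in the [daemon] section (as they do in the GDM 2.x configuration that MDM inherited) and that the file does not already have a [daemon] section. The function name and the HaltCommand message text are made up for illustration.

```shell
# Sketch: append systemd-style power commands to MDM's distro override file.
# $1 = config file (defaults to /usr/share/mdm/distro.conf)
# Assumption: the keys belong in [daemon] and that section is not yet present.
write_mdm_power_cmds() {
    conf=${1:-/usr/share/mdm/distro.conf}
    cat >> "$conf" <<'EOF'
[daemon]
RebootCommand=/usr/bin/systemctl reboot --message="User reboot via MDM"
HaltCommand=/usr/bin/systemctl poweroff --message="User shutdown via MDM"
EOF
}
```

Note that HaltCommand asks for poweroff, not halt, for the reason given above.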

No Keyboard Grab for Password

There does not seem to be any configuration parameter affecting the keyboard grab. Bug report here.

Starts Asking for Password

That's not a bug, that's a feature! By default, the last user who logged in is preselected. However, suppose you're a different user; what's a good way to change to your loginID other than giving a garbage password? Hit the Start Over button. Except it does not start over. Bug report here.

No ssh-agent

Well, isn't that cute! For MDM we're using the generic PAM stack that comes with the package, and jimc's idea of setting up a full login session, like pam_ssh.so, is not happening. Fixed with a symlink to the full login stack.

No xkeyboard Setup

BaseXsession is set by default to /etc/mdm/Xsession. When I changed to my hacked /etc/X11/xdm/Xsession, all my normal session features returned, including configuring the keyboard.

Complaint about +extension XEVIE

defaults.conf adds this extension to the standard X-server command line, but /var/log/Xorg.0.log has a complaint that it is unsupported. It turns out that this is the X Event Interception Extension, for intercepting all keyboard and mouse activity. It is deprecated and was removed in X.Org-7.5 (we have 7.6.1). The distro should use distro.conf to make it vanish. Bug report here.

Is VNC Working?

No, and it's not going to, when MDM is involved. Unfortunately, XDMCP (and a lot of other stuff) has been decommitted in MDM (2013-09-25 by Clem). See the discussion below of VNC.

Diarrhea on the Log File

Error message:

GLib-CRITICAL: g_key_file_get_value: assertion 'key_file != NULL' failed

When MDM starts up it spews out 110 instances of this in syslog (including on all logged-in SSH sessions, due to the critical level). It also puts out two or three each time the greeter starts, i.e. when a user logs out and back in again.

The cure: touch /usr/share/mdm/distro.conf . Bug report here.

CPU Usage with KVM

It's pretty clear that when I upgrade a VM, even though the guest is completely idle, there is considerable CPU load on the host, at least twice what it was with v42.1 on the guest. For example, 2 snapshots:

Oso (guest with Tumbleweed): Load 0.22, mdmgreeter 0.3% CPU, top 0.3% CPU, kworker 0.3% CPU, idle time 99.3% (of 1 CPU).
Diamond (host with v42.1): Load 0.13, qemu 23.3%, firefox 0.7%, 4 others 0.3% each, idle time 96.9% (of 1 CPU).
Oso (guest): Load 0.0, top 0.7% CPU, X 0.3%, mdmgreeter 0.3%, idle 99.3%.
Diamond (host): Load 0.24, qemu 23.5% CPU, 7x 0.3% CPU (including Firefox), 94.8% idle.

I went through all the modules (device drivers) on the guest and unloaded them one by one. Bingo: when I unloaded uhci_hcd, the host's CPU dropped from 15-25% down to 1-2%.

On none of the virtual machines do I pass through USB devices from the host, so I'm blacklisting the driver. I would have expected that the right way to do so would be a blacklist command in /etc/modprobe.d/99-local.conf, but it was ineffective. install uhci_hcd /bin/true was also ineffective (and also deprecated). I ended up creating a systemd unit stomp-usb, which runs late, i.e. after graphical.target or multi-user.target, and which explicitly unloads the module. That took care of it.
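A script that a unit like stomp-usb runs might look roughly like the sketch below. The function names are hypothetical and this is not the actual unit's content; it only illustrates checking /proc/modules and unloading the USB host controller drivers if present. modprobe -r can fail if another loaded module still depends on the one being removed.

```shell
# Return success if module $1 appears in the module list.
# $2 = module list file (defaults to /proc/modules; overridable for testing)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Unload the USB host controller drivers that are not needed on the guests.
# Must run as root; intended to be started late in boot.
stomp_usb() {
    for mod in uhci_hcd ehci_hcd; do
        if module_loaded "$mod"; then
            modprobe -r "$mod" || echo "could not unload $mod" >&2
        fi
    done
}
```

Calling stomp_usb from a oneshot service ordered late in the boot would match the behavior described above.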

ehci_hcd is also loaded. There is no evidence that it causes elevated CPU on the host, but per guilt by association I'm similarly blacklisting it.

The host's CPU load is inversely proportional to the intrinsic CPU power of the host. Which is the least on Jacinth, intermediate on Diamond, and greatest on Xena. The CPU load on these hosts (with and without uhci_hcd) is:

Host	With uhci_hcd	Without uhci_hcd
Jacinth	2.0%	1.6%
Diamond	15-25%	1-2%
Xena	1.3%	1.0-1.3%

With v42.1 on both Jacinth (the host) and Orion (the guest), and with uhci_hcd loaded on Orion, the host's CPU usage was 2.0%, i.e. I'm not seeing excessive CPU utilization. But when I unload uhci_hcd the CPU goes down by about 0.4%, so there is CPU utilization attributable to uhci_hcd, just not excessive. Jacinth's processor is an AMD G-T40E @1.0GHz, while Diamond and Xena have Intel Core i7 and i5.

LDAP (slapd) was Hosed [Fixed]

This problem turned out to have multiple parts. My slapd is configured out of the database, the modern recommendation. /etc/systemd/system/ldap.service starts slapd with -F /var/lib/ldap/slapd.d which contains cn\=config.ldif and dependent objects (files). All configuration filenames below will be relative to this directory. Filenames must be quoted because they contain shell-active characters. I tried restoring everything from backups, but the operational instance and the backups shared the same problems (so I reverted to the operational instance).

back_hdb.la Not Found

The main database uses the HDB backend. The symptom was that both slapadd and slapd complained: lt_dlopenext failed: (back_hdb.la) file not found. The file exists: /usr/lib64/openldap/back_hdb.la from openldap2-2.4.45-24.1.x86_64. strace revealed that it looked for /usr/lib/openldap/modules/back_hdb.la followed by some generic and useless directories.

Growl! ./cn=config/cn=module{0}.ldif contains olcModulePath: /usr/lib/openldap/modules which was correct in v42.1 but no longer. I replaced it with /usr/lib64/openldap and on the v42.1 machines I made a symlink: ln -s ../lib/openldap/modules /usr/lib64/openldap, which will be overwritten when Tumbleweed is installed. This problem is fixed; now we proceed to the next one.

Not so fast! Zypper honors the symlink when installing, rather than overwriting it with a directory, so it's best to just not make it, and to do this fixup procedure after upgrading each directory server.

What a Mess!

First rule of troubleshooting: don't change anything not directly related to the current symptom being fixed. But /var/lib/ldap is such a mess, with the main database files strewn around. I created a new directory ./couchnet.db , mode 750 owned by ldap:ldap, and I moved all the database files into it, so now /var/lib/ldap contains just ./couchnet.db, ./slapd.d, and ./sync.d (for housekeeping to import changed flat files like /etc/passwd). And I edited ./cn=config/olcDatabase={1}hdb.ldif , olcDbDirectory: /var/lib/ldap/couchnet.db .

Configuration Error in Sync Overlay

Now, upon starting, slapd complains config error processing olcOverlay={0}syncprov,olcDatabase={1}hdb,cn=config. After I re-read my writeup on Setting Up LDAP plus section 18.3.3 of the OpenLDAP 2.4 Administrator's Guide (about multi-master replication), I had a brainwave: suppose openldap2-2.4.41 either has replication compiled into the program or knows where to find the module, but openldap2-2.4.45 needs an explicit olcModuleLoad: syncprov.la. (prepend {3} or whatever to make a unique list index.) This is in ./cn=config/cn=module{0}.ldif . Sure enough, when I loaded the module explicitly, slapd started working.
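The two edits to cn=module{0}.ldif (the module path and the explicit syncprov load) could be scripted roughly as follows, for repeating on each directory server after its upgrade. This is a sketch only: the function name is hypothetical, the list index is passed in by hand, slapd is assumed to be stopped, and hand-editing files under slapd.d may make slapd warn about a checksum mismatch on the edited file.

```shell
# Sketch: fix olcModulePath and make sure syncprov.la is loaded explicitly.
# $1 = LDIF file, $2 = list index to use for the new olcModuleLoad value
fix_module_ldif() {
    ldif=${1:-'/var/lib/ldap/slapd.d/cn=config/cn=module{0}.ldif'}
    index=${2:-3}
    # Point the module path at the Tumbleweed location.
    sed -i 's|^olcModulePath:.*|olcModulePath: /usr/lib64/openldap|' "$ldif"
    # Append the explicit module load only if it is not already there.
    if ! grep -q '^olcModuleLoad: {[0-9]*}syncprov' "$ldif"; then
        printf 'olcModuleLoad: {%s}syncprov.la\n' "$index" >> "$ldif"
    fi
}
```

The filename has to be quoted because of the braces and the equals sign. Running the function twice is harmless, since the second run finds the syncprov line and leaves it alone.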

Checkout

I tried this command line:

ldapsearch -x -H ldap://xena.cft.ca.us -b uid=jimc,ou=People,dc=cft,dc=ca,dc=us -LLL

And it returned my People page, with or without -ZZ (use TLS), except omitting the password because I omitted authentication. A good sign. I have a utility ldapsync -v -n to sync flat files like /etc/passwd . It said that it wanted to modify the passwd and group rows for lxdm (yes, it's different in Tumbleweed) and add a row for mdm (yes, it's new in my Tumbleweed package selection), but all the other data is identical. So LDAP is entirely operational.

Supporting VNC

A VNC (Virtual Network Computing) viewer and server use the RFB (Remote Frame Buffer) protocol to transfer graphical data from the server to the viewer, where the user can view it, and to transfer keyboard and pointer (mouse) events from the viewer to the server. On Unix the server imitates an X-Windows server by making a framebuffer available to server-side programs. This can be the physical framebuffer or a virtual one. In either case the server-side programs use normal X-Windows libraries to draw on the framebuffer, the same as they would for a user physically present at the physical display. If the server is configured to allow multiple connections to the same framebuffer, e.g. if a user is logged in to the physical display and a VNC connection to it is allowed at the same time, authentication and access control are important. But if sharing the framebuffer is forbidden, the normal paradigm is that the server's display manager puts a greeter on the framebuffer, which takes care of authentication and authorization, same as on the physical display. Then the user's preferred desktop environment is started. The RFB protocol is not intrinsically encrypted, and in most cases a separate secure tunnel is used for the data transfer.

In my environment, VNC is used for developing and testing the display manager and the desktop environment sessions; it is rare for a user to have a desktop environment on the viewer host and to do real work in a separate desktop environment on the server. But rarely does not mean never. Even so, an important goal is that VNC support should be unobtrusive and light on resources. The socket activation paradigm of systemd is ideal: systemd on the server listens for a connection, then starts the VNC server, which exits when the session is over. Xinetd is an obsolete alternative. It's overkill for me to start a VNC server at boot time.

The default connect port for VNC is 5900/tcp, corresponding to display :0. However, the qemu virtual machine emulator uses 5900 by default to send out the VM's physical display over VNC, so I plan to use 5901 to serve a virtual framebuffer on the host. The VNC server can also serve to a (web?) browser on 5800, or can call back to the viewer on 5500, but I don't use either of these features.

The normal paradigm is that the Xvnc server (package xorg-x11-Xvnc-1.8.0) contacts an XDMCP-capable display manager on localhost via port 177/udp. They negotiate a session cookie, and the display manager puts a greeter window on the virtual framebuffer, and after authentication spawns the user's session. XDM, GDM and LightDM are XDMCP-capable. But unfortunately XDMCP (and a lot of other stuff) has been decommitted in MDM (2013-09-25 by Clem).
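For reference, the normal paradigm with systemd socket activation might be set up roughly as sketched below. This assumes an XDMCP-capable display manager (such as XDM) is listening on localhost, so it does not apply to MDM as it stands. The unit names, the geometry, and the helper function are illustrative assumptions, and -securitytypes none leaves authentication entirely to the greeter, so the port should only be reachable through a secure tunnel.

```shell
# Sketch: write a socket unit and a per-connection service template for Xvnc.
# $1 = directory to write into (e.g. /etc/systemd/system)
# $2 = display number; the TCP port is 5900 plus this (default 1, i.e. 5901)
write_xvnc_units() {
    dir=$1
    display=${2:-1}
    port=$((5900 + display))
    cat > "$dir/xvnc.socket" <<EOF
[Unit]
Description=Xvnc socket (sketch)

[Socket]
ListenStream=$port
Accept=yes

[Install]
WantedBy=sockets.target
EOF
    # With Accept=yes, systemd starts one instance of the template per
    # connection and hands it the connected socket on stdin.
    cat > "$dir/xvnc@.service" <<'EOF'
[Unit]
Description=Xvnc per-connection server (sketch)

[Service]
ExecStart=/usr/bin/Xvnc -inetd -once -query localhost -geometry 1024x768 -securitytypes none
StandardInput=socket
StandardError=journal
EOF
}
```

After writing the files, systemctl daemon-reload followed by systemctl enable --now xvnc.socket would start the listener. Nothing runs until a viewer connects, which is the unobtrusive behavior wanted here.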

So how am I going to get a greeter going?

xdmcpc or PXDMCP by Peter Åstrand, original (?) author Peter Eriksson (2005). This is actually an alternative to the -query option of the X-server; in other words it tells an XDMCP-capable display manager to put a greeter and user session on the display indicated on the client's command line. So it won't help me.
rxmgr ; this is proprietary from Attachmate. Not quite sure what it actually does.
Neither of these programs is going to help me.

How's this for a kludge: MDM will constitutively run what it thinks is an X-server on display #1, but it is really a script which waits forever. The [daemon] MdmXserverTimeout parameter will have to be set to the duration of forever. Systemd will wait for a connection to the VNC port. When a connection is made, the socket (on stdin) will magically be transferred to the waiting script and it will be released and will run Xvnc.

This is going to take some work to set up, so I'm going to defer it until it's actually needed. PROBLEM

What To Do About MythTV

Until recently, Iris' main role was to run MythTV to record TV shows. However, the source for my wife's preferred shows has dried up, and most likely we will rely on Internet streaming from now on. This section probably ought to be moved to a separate document, but for now it stays as part of the upgrade history. We have been using MythTV and like it, but PHP has grown to PHP7, whereas MythTV still uses PHP5, and installing it makes a mess.

Other packages also use PHP5 and not PHP7: Roundcubemail and ownCloud. The solution was to evict the php7 package, zypper addlock php7, and then reinstall php5 and the needed extension modules.

Goals for the home theater machine and software:

It needs a user interface that is easy for a non-geek to use. And the system administrator needs a reasonable setup tool too.
Its interaction with the operating system has to be reasonably sanitary. In particular, I prefer to be able to advance to PHP-7. [Not going to happen.] Windows software is excluded.
I would prefer if Iris could sleep (S3 state). In the past this was impossible due to bugs in the video capture device driver. But that is not under the control of the home theater software.
It has to record programs reliably and without a lot of handholding.
It needs to show these media types:
- Over-the-air HDTV. Not distinguishing realtime vs. recorded programs; MythTV records a realtime show in a temporary file and then plays back that file, potentially right up to the point that it's currently writing.
- DVDs. We have not used this feature much due to setup and user training limitations.
- Streaming media. We have never used this, though MythTV is able to do it, but we're going to have to do so now.
- Archival shows in reasonable formats (MPEG2 and 4), e.g. from Olympic Games. A decent indexer would be much appreciated. I had great trouble getting shows indexed in MythTV, which made it very hard to view the programs.
It goes without saying that the software has to be open source.

What's currently available? Here are some recent 10 best lists.

Top 10 Best HTPC Software for Your TV by Tuukka (updated 2017-07-01). I think these are in order of the author's preference. I've excluded non-Linux software like Windows Media Center.
- Kodi. Formerly called XBMC. …is clearly the best, but it is not the easiest to set up to make it user-friendly. The provided skin is elegant, and others are available. Link to (his?) customization guide.
- Plex. Its backend is compatible with a variety of frontends including Kodi (and of course its own frontend). Its strength is in finding movie and music metadata. It can transcode. It can record and display live TV. Neat feature: you can pause on one device and resume viewing on any other. Link to setup guide.
- Emby. Never takes more than a few clicks to find the latest show. You can set up a custom view for each user with individual view points. You can pause, then resume on a different device. Does live TV. It has a module for Kodi; I assume this means that Kodi can read files on the Emby backend.
- MythTV is a great program for the advanced user. Unfortunately, MythTV development has been discontinued. Jimc research: PackMan just got an update to mythtv-backend-0.28.1+git.20170712.eef6a480b0, note the date. Not clear what may have been discontinued.
5 Best Home Theatre and Media Center Software (Published by Ashutosh KS, in Internet, Google says date is 2016-09-06.)
- Kodi. Pro: lots of codecs; streams from popular sources like Spotify, Pandora, Youtube; access to a big library including metadata; various add-ons. Con: Lots of options and add-ons, which take time to configure. More complex than others. Fullscreen is required (no fractional windows). (Commenter says, alt-enter pops into a fractional window.)
- MediaPortal. Forked from Kodi. Exclusively for Windows.
- Plex. He describes some of the features as being premium, which suggests to jimc that Plex is not exactly open source.
- Emby. Pro: Lots of codecs. Record live streams and OTA for later viewing. Lots of plugins and mobile apps. Con: He says it's less user-friendly than Kodi or Plex.
- MythTV. Pro: Lots of features and add-ons. Lots of codecs. Con: Not as good a UI as others.
- Conclusion: He likes Kodi and Plex best.
Home Theater Software Showdown: Kodi vs Plex by Eric Ravenscraft, 2015-12-06 (or 2015-06-12) on Lifehacker.
- Kodi (formerly XBMC) has a long history of development. Open source.
- Plex is a fork of Kodi but has evolved in the direction of a simple UI.
- Plex has a media storage service (costs money). Both of them can store media on your own server.
- Kodi doesn't transcode; Plex does.
- With the backend and frontend on the same host, Kodi setup is a piece of cake. Trans-net file sharing takes more work and trips up some users. On the other hand, Plex's cloud server (costs money) handles communication between multiple servers and players and is very easy to set up.
- Plex has implementations for a zillion platforms. But Linux is not listed. Kodi has fewer platforms, not listing Linux either, even though we know it runs on Linux.
- Kodi is skinnable and the UI can be tweaked. Plex is a lot less flexible in this regard.
- Similarly, Kodi has a ton of add-ons, while Plex has a lot fewer.
- Kodi is FOSS (free open source software). Plex is mostly free. Viewers for iOS, Android and a few others cost money. The cloud server requires a subscription.
- Jimc's conclusion: Kodi is where it's at.

Kodi home site.
The issuer of the license (which one?) for Kodi is the XBMC Foundation. XBMC means XBox Media Center; XBox was its first architecture.
Current version (as of 2017-08-23) is 17.4 Krypton.
Kodi is not on SuSE.
Kodi is on PackMan, version 17.3 uploaded 2017-08-17. Page says last update was in 2015 but I think this refers to other than the RPMs.

Diversion for CouchNet System Administration

I had to take some time out to deal with a few problems on the non-upgraded machines on CouchNet.

My wild-side host certificate is about to expire. I opened this can of worms and out crawled a very gross Taenia saginata. I have used StartCom for several years, but they have been bought out by WoSign.com (a Hong Kong CA) and relocated from Israel to Spain. Both WoSign and StartCom have been blacklisted by Mozilla and Google for fatally sloppy operating practices (at the very least). So I dropped them like a blighted potato.
I signed up with Let's Encrypt, a free service recently (2016) founded by the Electronic Frontier Foundation, Mozilla Foundation, and University of Michigan, and with additional well-known supporting sponsors. Both their business model and their administrative procedures are unique, and it took some work to integrate the new provider with my setup, which assumes long-lived certs. Anyway, the job is done and most services seem happy.
OpenVPN had issues with the new certificates, mainly in getting the correct intermediate and root certificate into the configuration files. Now the tester, the client and the server are all using the same certs, and it seems to be happy.
Tigase (an XMPP server) is not so happy. Coincident with the advent of the new host certificate, it started complaining about a format error in the certificate and getting a segfault. I haven't resolved this satisfactorily, and am temporarily omitting XMPP service in order to concentrate on the upgrade. PROBLEM
Alsasound.service and/or alsa-restore.service have become confused on Oso and I need to do some research to find out which I really should be using. alsa-restore appears to be the primary service and alsasound is just an alias, so it is bypassed in scripts.dat.
chkstat (fixes the mode and owner of system files) exuded this error message: /var/lib/named/dev/random: don't know what to do with that type of file. This turns out to be bogus. Bug report here.

Upgrading Production Machines

Order of Upgrades

It looks like Oso is pretty close to ready, and it's time to put Tumbleweed into production. Here's the order for doing the upgrades:

Oso -- VM on Diamond. First to be finished, for testing.
Petra -- VM on Xena. Complicated networking. This will be the second test. Good test case for CPU utilization in KVM. Finished successfully (but testing is complicated because of Petra's weird networking.)
Claude -- VM on Jacinth. But it's mission-critical. I will clone it to become Orion (I use this name for machines that I'm setting up). Then I will upgrade Orion and check it out. When it appears functional, then I will swap the hostnames and IP addresses. Done and successful.
Aurora -- Hot spare for Jacinth. This will be the first real machine to be upgraded. Successful.
Xena -- My laptop. I should start eating my own dogfood promptly. Fixing all the details took a lot of work, but it's now presentable.
Kermit -- Audio playback. Upgrade was successful.
Iris -- Audio-video recording and playback. Upgrade was successful. But MythTV is not going to survive this.
Jacinth -- Main router and directory master site, with about 40 services including Wi-fi (hostapd). As always, upgrading Jacinth was traumatic, but has been concluded successfully. Tigase (XMPP server) is out of action, but the upgrade is not to blame.
Diamond -- Alice's desktop machine, and distro storage, thus it has extra complications. It was going to be done before Jacinth, but a problem appeared on the Windows virtual machine (Baobei), and the upgrade will be deferred until this is dealt with.

Upgrading Petra

I repeated the upgrade process on Petra, but not taking detailed notes. Since I had improved the procedure from experience with Oso, it went smoothly on Petra, except for coddling needed for the bizarre networking.

Cloning Claude into Orion

Since Claude is mission critical, I wanted to clone it, upgrade the clone, and then atomically replace the old Claude with the clone. I use the name Orion for machines that I'm installing, before an atomic replacement like this. The cloning process is sufficiently complex, and sufficiently useful in the future, that I wanted to make a record of the steps.

In hostgroup ( /usr/diklo/lib/site_perl/hostgroup.db ), change orion to be up, and in the v99.8 (Tumbleweed) hostgroup. For the rest of the hostgroups, just clone the line for Claude. Only has to be on Diamond but it will eventually propagate everywhere.
In /etc/firewallJ.d/trusted-adr.fw (on Jacinth, Orion and Claude's host), change Orion's MAC address to the correct value and systemctl reload firewall.J. Local convention: the first three octets are the KVM vendor code (52:54:00) and the rest are the last three octets of the host's assigned IPv4 address (in hex). Though not strictly necessary, eventually you should install this file on all hosts and reload the firewall, because they will reject packets from Orion until this is done. At least do it on Diamond.
Since we have working NFS, import the installer CD like this.
ln -s /net/diamond/s1/SuSE/SuSE-build/x86_64/99.8/iso/openSUSE-Tumbleweed-NET-x86_64-Current.iso /s1/kvm/
The XML definition file's CD definition relies on a symlink in the orion directory to the network installer, or the DVD, or /dev/null. Since we're not installing, the latter is the operational one. Due to a crock, probably starting in v12.2, a link to the actual /dev/null is not allowed, and there is a separate character device in the containing directory.
ln -s ../devnull orion-cd.iso
How much disc space does Orion/Claude need? At present it has an MSDOS disc label with 1Mb unallocated at the beginning for Grub, 0.5Gb swap, and 9.5Gb for the rest, which has 9433916 kb total, 6271856 (6.3Gb) used, and 2659788 (2.7Gb) available. This seems fine; I'm going to copy it exactly.
Without shutting down Claude, copy claude.xml and claude-disc1.raw into ../orion/ . Using dd for the disc may be slightly faster. Copy speed (with dd bs=1M): 38Mb/sec, expected time 5 mins.
Mount Orion's disc on Jacinth, using these steps:
parted orion-disc1.raw
(parted) unit B #Everything in bytes
(parted) print #Note the start of the root partition, without the B: 534773760 (bytes)
(parted) quit
losetup -f # Prints e.g. /dev/loop0, an unoccupied loop device
losetup -o 534773760 /dev/loop0 orion-disc1.raw
fsck -C -f /dev/loop0 # Orion only, not Claude. Fix problems if any.
mount /dev/loop0 /mnt # -o ro for Claude, not Orion.
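The interactive parted session above could probably be replaced by parsing parted's machine-readable output, as in this sketch. The function name is hypothetical, and partition number 2 for the root filesystem is an assumption based on the layout described (swap first, then the rest).

```shell
# Read `parted -s -m IMAGE unit B print` output on stdin and print the start
# offset in bytes (without the trailing B) of partition number $1.
partition_start() {
    awk -F: -v n="$1" '$1 == n { sub(/B$/, "", $2); print $2 }'
}

# Usage (as root), assuming the root filesystem is partition 2:
#   offset=$(parted -s -m orion-disc1.raw unit B print | partition_start 2)
#   loopdev=$(losetup -f --show -o "$offset" orion-disc1.raw)
#   fsck -C -f "$loopdev"
#   mount "$loopdev" /mnt
```

losetup -f --show picks a free loop device and prints its name, so the script does not have to assume /dev/loop0 is unoccupied.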
Semantically copy Claude's disc to Orion's. Only 17 files were actually copied. I could have done the first copy step in this way, but since both VMs are on the same host, using dd greatly speeded it up.
rsync -a -x -O claude:/ /mnt/
To be completely straight-arrow, shut down Claude, connect its disc to a loop device (readonly, but the offset is the same, because it was copied), and mount it on a different mount point. Then use rsync to copy from Claude's disc to Orion's disc. -x is not needed this time. About 45 files were copied, probably mostly date changes. Unmount Claude's disc, losetup -d /dev/loop1, and restart Claude.
Now turn Claude into Orion.
- In orion.xml (the VM definition file), change claude to orion wherever occurring. The name must be unique. Elements being changed include <name>, the hard disc, and the CD drive.
- Generate a new UUID: uuidgen -r. Insert in the <uuid> element. The UUID must be unique.
- Find the <interface> element and change the MAC address, which must be unique. The local convention is that the first 3 octets are the KVM vendor code (52:54:00) and the rest are the last 3 octets of the host's assigned IPv4 address (in hex).
- Now we're working in /mnt/etc; do not alter the host's /etc/HOSTNAME and related files. Change /mnt/etc/HOSTNAME and /mnt/etc/hostname to say orion.
- Check if /mnt/etc/sysconfig/network/ifcfg-eth0 has a fixed IPv4 and/or IPv6 address (possibly as a fallback for DHCP). Make it right, if so.
- /mnt/etc/firewall.J/* has a lot of references to Claude. Leave them alone. They only have effect on the actual Claude.
- In /mnt/etc/ssh/ remove Claude's host keys. When Orion first boots, it will generate new keys.
  rm /mnt/etc/ssh/ssh_host_*
- In /mnt/etc/ssl/ we have Claude's host key. Leave it alone, probably nobody will notice that it was stolen :-)
- Rename /mnt/etc/apache2/conf.d/claude.conf to orion.conf and edit changing the hostname wherever occurring.
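The local MAC address convention from the list above (KVM vendor code 52:54:00 followed by the last 3 octets of the IPv4 address in hex) can be computed rather than typed by hand. A minimal sketch; the function name mac_from_ipv4 and the address in the usage comment are made up for illustration.

```shell
# mac_from_ipv4 ADDR: print 52:54:00 (the KVM vendor code) followed by the
# last three octets of the dotted-quad ADDR, each as two hex digits.
mac_from_ipv4() {
    local IFS=.
    # Unquoted on purpose: split the address on dots into $1..$4.
    set -- $1
    printf '52:54:00:%02x:%02x:%02x\n' "$2" "$3" "$4"
}

# Possible usage (the address is a made-up example):
#   mac_from_ipv4 192.168.1.10
```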
Unmount Orion's disc and disconnect the loop device.
Attach the VM to KVM, and try starting it. This is with v42.1.
virsh define ./orion.xml
virsh start orion
checkout.sh discrepancies:
- Networking is hosed, because I didn't update Orion's MAC address in Jacinth's firewall. Fixed, and re-ran checkout.sh.
- Kerberos host keys are left over from Claude. Do not fix.
- Apache2 HTTP is OK but HTTPS failed. SSL host key is for Claude. Do not fix. When Orion turns back into Claude, both of these will self-heal (cross fingers).
Now upgrade Orion to Tumbleweed (v99.8) following the procedure above.
- The dist-upgrade was uneventful. It predicted it would use 1.0Gb more storage. Actually 1.3Gb.
- Post_jump was uneventful except rpmconfigcheck would not run (to remove updated configuration files). It now has a systemd unit and a wrapper script, and no LSB script. post_jump was fixed to use the wrapper. Kernels from v42.1 were removed by hand. After package deletions, disc usage is 0.13Gb more than on v42.1.
- checkout.sh had mostly the same failures that we found previously and aren't going to fix. Tester for rpmconfigcheck was fixed. Other problems will self-heal when Orion is renamed to Claude.
Next step is to re-do Turn Claude into Orion but reversed.
- Edit /usr/diklo/lib/site_perl/hostgroup.db , put Claude in v99.8 (from v42.1). Install everywhere, specifically on Orion.
- Shut down Orion.
- We'll leave orion.xml alone, i.e. as Orion, with its own UUID.
- Mount Orion's disc on /mnt .
- Change /mnt/etc/HOSTNAME and /mnt/etc/hostname to say claude.
- In /mnt/etc/sysconfig/network/ifcfg-eth0 revert the fixed IPs to Claude.
- In /mnt/etc/ssh/ replace the host keys with those from Claude.
- Rename /mnt/etc/apache2/conf.d/orion.conf to claude.conf and change the hostname to claude wherever occurring.
- umount /mnt and disconnect the loop device, losetup -d /dev/loop0
- Shut down Claude.
- Rename claude-disc1.raw to claude-disc1.421final
- Rename ../orion/orion-disc1.raw to ./claude-disc1.raw
- Start up Claude.
- Run checkout.sh, fix discrepancies.
  - krb-maint gets the required host keys for Claude.
  - apache2 is operational with HTTP and HTTPS. The latter is CFT only, due to using port 443 on Jacinth for OpenVPN. Web pages are being served on the public address (www.jfcarter.net).

Upgrading Claude (Continued)

Problems encountered when upgrading Claude. Most but not all are related to MDM customization.

The greeter window is in the northeast corner, as planned.
The greeter widgets avoid jumping around. (Good.)
The greeter window assembly uses available space efficiently (unlike the Circles theme). (Good.)
For a bogus loginID and/or password, it shows the PAM error message in a light salmon color. (Good.)
The theme engine is capable of showing a language selection and a clock, but I consider these, in my situation, a bourgeois affectation. At the Math Department, the language selector would be a lot more useful.
If ~/.dmrc is not owned by the user or is world writable, the greeter backend will not read it, noisily. (But it is owned by the user, mode 644.) Hao bun! ~jimc/ is owned by 1000:user mode 755, as are several interior files including .bashrc. [Fixed.]
A serious problem: when the user's session begins, something sources /etc/profile.d/alljava.csh, which uses CSH syntax, and it looks to me like Bash is the shell and doesn't like it.
The session output is redirected to ~/.xsession-errors by the daemon. The first progress message from /etc/mdm/Xsession is printed, then it sources /etc/profile and ~/.profile (if existing).
Idiot! SHELL = user's shell, i.e. /bin/tcsh, and /etc/profile -> /usr/diklo/default/bashrc auto-switches between /etc/profile.d/*.sh and /etc/profile.d/*.csh according to $SHELL, which of course is totally bogus. This has been working for 20 (?) years, why does it start failing now? Auto-switch was removed, works now.
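The failure mode is that a Bourne-syntax script chose its fragments by looking at $SHELL, which names the user's login shell, not the interpreter actually reading the file. A minimal sketch of the non-switching behavior, assuming the loader is always read by a Bourne-type shell; this is not the real /usr/diklo/default/bashrc, and the function name source_profile_d is hypothetical.

```shell
# source_profile_d DIR: source every readable DIR/*.sh fragment and never a
# *.csh one. $SHELL is deliberately not consulted: it names the login shell,
# which may be tcsh even though sh or bash is reading this file.
source_profile_d() {
    local f
    for f in "$1"/*.sh; do
        # If nothing matches, the literal pattern fails the -r test and is skipped.
        [ -r "$f" ] && . "$f"
    done
    return 0
}

# Possible usage:
#   source_profile_d /etc/profile.d
```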
In virt-viewer, the mouse is ineffective in the greeter and in the user session. It works in the installer, so virt-viewer is probably not the culprit. I don't know yet whether the mouse fails on a physical machine.
The issue turned out to be this: Sometime in the past the driver for the PS/2 mouse was flaky; I think the guest and host cursors were chronically out of sync. It was recommended to use a USB tablet, which I did. There is now a problem with the tablet, and reverting to the PS/2 mouse fixed the problem. And the cursors look in sync again. I'm speculating that there is a parameter for the tablet that used to have a useful default but which now must be specified explicitly. I'll want to try to figure this out. Later. PROBLEM
Tidbit of information: If the VM runs Microsoft Windows 10, virt-viewer will capture the cursor and you need to send a special key combination to get it loose: Ctrl-Windows for me, Ctrl-F11 for someone else. It's indicated in the titlebar. Workaround: auto-grab and release doesn't work with a mouse on Windows, but the tablet works fine, including auto-grab.
mdm emits a lot of GLib-CRITICAL: g_key_file_get_string: assertion 'key_file != NULL' failed. Lots of them when opening the greeter window, but two or three when starting the user session. See above, Session Configuration for MDM under Diarrhea of the Log File, for the ridiculous fix and a link to the bug report.
mdm is not using /usr/X11/xdm/Xsession, i.e. my Xsession. Time for a symlink. [Done.]
Where does the greeter write the name of the chosen session? Official place is /var/lib/xdm/dmrc . If mdm has its own special place, Xsession will need to look there. Looks like it writes directly in ~/.dmrc .
These items that Xsession uses are messed up:
- /usr/share/X11/xkb/xkbcomp is missing, split off to package xkbcomp (dragged in as a dependency of (?)), and it's now in /usr/bin. /etc/X11/xdm/wm.xsession modified accordingly.
- ck-launch-session is missing. ConsoleKit did not get installed. It's in SBS. Need to add to couchnet.sel [done].
- No dbus-launch because
  DBUS_SESSION_BUS_ADDRESS = unix:path=/run/user/500/bus
  On Xena: unix:abstract=/tmp/dbus-qEjR5P6gYt,guid=fd6f156d5dd5232e0e454b3b5985fc01
  Duh, it's saying that the caller (mdm) already set up dbus-daemon --session, so we shouldn't try to replace it.
- KrbAuthDialog-WARNING **: Unsupported cache type for 'DIR:/run/user/500/krb5cc_4tzw0j' Root can klist it. PROBLEM
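The xkbcomp and dbus-launch items above both come down to a session script probing its environment instead of assuming it. A minimal sketch of the two checks; the function names are hypothetical and this is not the actual content of wm.xsession.

```shell
# find_xkbcomp [DIR...]: print the first executable xkbcomp found in the given
# directories (default: the new location, then the old one).
find_xkbcomp() {
    local d
    [ $# -gt 0 ] || set -- /usr/bin /usr/share/X11/xkb
    for d in "$@"; do
        if [ -x "$d/xkbcomp" ]; then
            printf '%s\n' "$d/xkbcomp"
            return 0
        fi
    done
    return 1
}

# need_dbus_launch: succeed only if the caller (e.g. the display manager) has
# not already provided a session bus address.
need_dbus_launch() {
    [ -z "${DBUS_SESSION_BUS_ADDRESS:-}" ]
}

# Possible usage in a session script:
#   xkbcomp=$(find_xkbcomp) || echo "xkbcomp not found" >&2
#   if need_dbus_launch; then eval "$(dbus-launch --sh-syntax)"; fi
```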
/var/log/Xorg.0.log complains:
- Extension "XEVIE" is not recognized. Search above under MDM Session Setup for a fix and a link to the bug report.
- systemd-logind: logind integration requires -keeptty… --keeptty was not provided.
- No module cirrus, using modesetting (this is appropriate).
- /dev/input/event2 VirtualPS/2 VMware VMMouse was added.
- /dev/input/mouse1 VirtualPS/2 VMware VMMouse was added. But: No input driver specified, ignoring this device. Same for /dev/input/mouse0 .
- /dev/input/event1 VirtualPS/2 VMware VMMouse was added.
CUPS (printing) is a royal pain. The CouchNet design is, one host (Diamond) has the printer and 3 queues for it (normal, color, photo). Cups on Diamond publishes them via mDNS for use by DNS-SD (service discovery). These records are published:
- _ipp._tcp.local. PTR PrinterName@diamond._ipp._tcp.local.
- PrinterName@diamond._ipp._tcp.local. TXT "key=value"…
- PrinterName@diamond._ipp._tcp.local. SRV 0 0 631 diamond.local.
- diamond.local. A and AAAA records
The command to elicit these records is:
dig @ff02::fb -p 5353 _ipp._tcp.local. PTR #(or ANY)

This is working, and apps (e.g. Firefox) on Diamond and on leaf nodes can find the printer(s) and print on them.
However, leaf nodes run Cups, and are configured to poll mDNS periodically and to build transit queues to pass jobs submitted locally to the host with the printer. With cups-2.2.3 (Tumbleweed 2017-08-10), it gets into a mode where every time it re-discovers a printer it rebuilds the transit queue including downloading the PPD (about 10kb). This incessant net traffic was discovered because it prevents Diamond from sleeping when idle.
In addition, it is possible to subscribe to an RSS feed publishing changes in the status of printers. Cups purges expired subscriptions every 10 secs and logs a message to that effect, filling up /var/log/cups/error_log.
As far as I can tell, Cups on leaf nodes is never actually used (formerly it was essential). The cure is going to be to suppress Cups on the leaf nodes. [Done.]

Upgrading Aurora

Following along in Template for an Upgrade.

On aurora(42.1), backup-host . All but 4 files were already backed up.
pushconfig -C aurora -- 1 file to update, the Squid tester.
checkout.sh -- 1 test failed -- alsasound. Fixed (I hope) in TW.
updaterepo -v -- Did it, reindexed 3 repos.
df on Aurora: root has 9Gb free. Plenty.
In /usr/diklo/lib/site_perl/hostgroup.db put Aurora in v99.8
audit-repos -v -i aurora -r 99.8 -u -k #Install Tumbleweed repos.
audit-pkgs -v -r 99.8 -U -c -I # Distro Upgrade
Issue: Phoronix-test-suite requires php5. Toss Phoronix.
1765 packages to upgrade, 255 to downgrade, 539 new, 10 to reinstall, 225 to remove, 78 to change vendor, 8 to change arch. 2698 total items. 1.85Gb download, 1.7Gb additional space used.
Do not reboot the target yet. Post_jump will update grub.cfg adding the command line parameter which keeps the network on eth0/br0.
post_jump -r 99.8 aurora |& tee $j/jump.aurora # Minor complaints.
New systemd unit virtlockd.socket, I assume we want it, add to /m1/custom/scripts.dat. [Done, and created hostgroup vhost, and aurora is not in that hostgroup, so none of the VM daemons were enabled.]
Packages found by capabilities: [All fixed.]
- lirc from lirc-core. Do we still want package lirc and lirc-remotes? On Myth boxes.
- lirc-remotes from lirc-config
- openssl-doc from openssl-1_1_0-doc but I'm not going to fix this.
- gstreamer-0_10-plugins-bad and ugly; these are obsolete and should not be wanted.
- python-gstreamer-0_10 not found, probably should not be wanted.
- mysql-community-server-client not found, need to switch to mariadb.
- php5-imap requires back-version php5, can we toss?
/etc/openldap/ldap.conf is missing. It is created or checked by /usr/diklo/lib/daily/sorthosts.J which may not have been run yet. The file did eventually get created, at the end of post_jump.
Now reboot the target host.
Checkout.sh discrepancies:
- alsasound status is active/none. Counts as failure. The issue with alsasound is that /var/lib/alsa/asound.state has a different order (and possibly different control object numbers) in TW. Remove and re-save the file, the result will be consistent, and the test will be passed. [Done]
- libvirtd lacks virsh command. Need to install, and add to couchnet.sel. [Done.]
Special checkout items:
- The symlink for /m1/custom/background.jpeg was created, and the custom background is used by MDM and the CouchNet greeter.
- Jimc can log in. The existing ~/.dmrc is honored.
- Jimc (the console user) can use PulseAudio to play the test sound on the internal sound card's output.
- Jimc can do su root.
- insomnia -s puts the machine to sleep. It wakes up on USB (keyboard activity) and appears functional.
- Check if it sleeps when idle. Yes it does.
- date and rsyslog print localtime() in timezone -0700, but the systemd journal (e.g. systemctl status alsasound) reports in timezone -0600, i.e. normal time is 15:24 when the journal says 16:24. PROBLEM

Upgrading Xena

To discover hidden problems, I need to start eating my own dogfood as soon as possible, hence Xena (my laptop) is the next on the upgrade schedule.

Upgrade procedure: Following along in Template for an Upgrade.

df / => /dev/sda3 20510716 11481560 7964200 60% / (8.0Gb avail, plenty) After the upgrade, 6.9Gb available, used 1.1Gb additional.
Xena is in v99.8 hostgroup.
Install Tumbleweed repos.
Dist upgrade.
- The latest Phoronix test suite requires php5-imap which is uninstallable because it conflicts with php7 modules. Solution: toss Phoronix.
- 2717 packages to upgrade, 264 to downgrade, 844 new, 10 to reinstall, 246 to remove, 94 to change vendor, 9 to change arch. 3989 total items. Download 2.17Gb, installed size will grow by 1.7Gb.
- We have upgraded to PHP7. Need to check that all locally written PHP scripts still work. [Reverted to PHP5.]
- Looks like everything went OK.
Do not reboot the target yet.
I wonder if we should eventually run yast2-second-stage and yast2-firstboot (YaST2-Second-Stage.service and YaST2-Firstboot.service). (Not doing it now.) PROBLEM
post_jump.
- A lot of world writable files! It was all ~/.cache/texmf/fonts/luatex-cache/ . [Tossed.]
- Could not install chromium-desktop-gnome vlc-gnome. Duh, Gnome is not installed. Who asked for these? extra.sel. Removed.
- Installed about 55 keystone pkgs
- Removed 614 unwanted packages.
- 2nd dist-upgrade, 95 items.
- Should we completely remove lightdm and the CouchNet greeter? [Done.]
- /srv/ftp/updaterepo.rpmqa did get rebuilt.
- logrotate had an error code. wpa_supplicant.J dup log entry for /var/log/wpa_supplicant.log, awww. Moved their file to the jail.
- Need to review in detail, but other than that, looks OK.
Reboot.
- Failed to start purge-kernels.service . I finally found the error message: kernel-syms-4.10.13 requires kernel-default-devel-4.10.13 . This is the 2nd oldest version, not to be purged. And the package is installed! Removed both of them by hand. Same prob for 4.10.8-1 . Now it finishes successfully, removing both kernels. This service does not purge kernel-source; I did that by hand. [All happy now, 8.2Gb available, saved 0.2Gb.]
- Failed to start slapd / ldap.J [See fixes in Slapd Was Hosed.]
- Need to set up the greeter with a proper background. [Done]
checkout.sh discrepancies
- NetworkManager passed the test.
- NetworkManager-dispatcher is not running. It appears to now be dbus activated and it exits quickly. [Fixed scripts.extra]
- router-sol -- /proc/sys/net/ipv6/conf/accept_ra_defrtr = '0', should be 1. Only 0 when checked very soon after booting.
- libvirtd-gueses (typo) is disabled, and Petra isn't autostarting anyway. libvirt-guests reports correctly, no error here. libvirt reports correctly, no error here. [Fixed scripts.dat]
- No /var/lib/alsa/asound.state , need to create it. [Added to tester]
- vm-infra.J complains no vnet0 (because Petra was not started) [Fixed]. /usr/diklo/sbin/dhclient-vm-mon: line 110: fam2hip: $ihost4: must use subscript when assigning associative array. [There was a stray blank. Only appears when there's an error. Fixed]
- ldap is wanted, enabled, failed/failed. They now have /usr/sbin/slapd. Should be using /usr/lib/openldap/start (basically the original LSB script with parms from /etc/sysconfig/slapd) Fixed by using the correct binary, but I should come back to this and use their startup script instead. Blecch, both fixes were ineffective. Temporarily disabling ldap server. [See fixes in Slapd Was Hosed.]
- wpa_supplicant -- Selected interface 'p2p-dev-wlan0' -- it's disconnected. Wrong interface? That's the problem. [Tester fixed.]
- Should be enabled but aren't: libvirtd-gueses (a typo, fixed)
- Enabled but should not be: dbus-fi.epitest.hostap.WPASupplicant dbus-fi.w1.wpa_supplicant1 [enabled in scripts.extra]
Rebooting, and checkout again.
- ldap not fixed yet. [See fixes in Slapd Was Hosed.]
- mdm greeter action to shut down does not do anything. [Fixed, see Session Configuration for MDM.]

Once the basic problems were dealt with, I found these issues in the user session. See Session Configuration for MDM for fixes (for most of them).

In the MDM greeter, the Actions menu choice to shutdown or reboot the machine has no effect and no (findable) error messages. This is confirmed on Xena but probably affects all v99.8 hosts.
When MDM starts up, frequently (but not every time) it starts in the password phase. [That's not a bug, that's a feature!]
My ~/.xkeyboard file is supposed to swap ctrl and capslock. It also is supposed to register the alt key as a modifier, which of course doesn't happen. xkbcomp got moved to /usr/bin. Fixed to adapt to either location of the binary. It also didn't get executed at startup.
The button zones on the trackpad are not effective. The zones are set up but do not produce button codes. Evidence: in xev, a single click in the left zone produces a button1 event, but clicks in the middle or right zone produce no event at all.
xf86-input-synaptics is not installed [added to extra.sel] and the libinput driver is used. Installed that package, we're getting closer, synaptics driver is now being used (vs. libinput). Scroll events are produced. Button3 is produced, but only 2 button areas. Someone has an explicit SoftButtonAreas option for this: 70-synaptics.conf , which overrides my 60-synaptics-J.conf . Renumbering… Now it all works.
Something wacko with the beroot command. A change in the interpretation of ${@:0-1} when $# == 0. Worked around.
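Per the bash manual, a negative offset applied to $@ is taken relative to one more than the number of positional parameters, so -1 normally selects the last parameter but lands on index 0 (i.e. $0) when there are none. Whether that is exactly what changed for beroot is a guess, since the script is not shown here. A minimal sketch of one way to sidestep the slice altogether; the helper name last_arg is hypothetical and this is not necessarily the workaround that was actually applied.

```shell
# last_arg ARGS...: print the last argument, or an empty line if there are none.
# Looping over "$@" avoids the ${@:0-1} slice, so $0 can never leak in.
last_arg() {
    local last=
    for last in "$@"; do :; done
    printf '%s\n' "$last"
}

# Possible usage:
#   target=$(last_arg "$@")
```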
The session's SSH agent was started (under startxfce, not by sys.xsession nor by pam_ssh) but not loaded.
Features of the XFCE session that are horked: Most settings carried over correctly.
- Background is zoomed (anamorphically scaled), should be tiled. [Fixed.]
- Where is the workrave applet? See tale of woe in Remaining Problems.
- Mouse and Touchpad: new setting: Disable touchpad while typing, 0.5 sec timeout.
- If we had gstreamer-properties installed, the Settings app could start it.
These were checked and appear OK:
- For the desktop theme we're using Adwaita. OK.
- Desktop icon theme: Tango. OK.
- Desktop Font: Sans 11. OK.
- Appearance Desktop Settings: icons only, with images, no sounds.
- Panel: Looks like the panel widgets are all present and in the expected order. OK.
- Screensaver (XScreenSaver-5.37 2017-07-05): OK, no change.
- Session and Startup: No change, OK.
- Window manager settings: Theme Daloa, font DejaVu Sans Condensed 11 (important), jimc's buttons, F9-F10-F11 to maximize window. Looks like no changes.
Firefox has decided to use a light blue background that I don't like. Also the font is different and isn't my favorite. PROBLEM
Gnome Games (specifically, Mahjongg) are not installed. Wrong: in v42.1, gnome-games-3.10.0- requires a list of games, whereas in Tumbleweed, gnome-games-3.24.1 requires no games and you have to ask for them individually. Changing couchnet.sel accordingly.
On all v99.8 hosts and every day, /var/cache/man is set to be owned by man:man, whereas some /etc/permissions* wants it man:root 755. Also, Petra and Xena set /var/log/btmp to root:utmp while the files want it as root:root. The cure: override in /etc/permissions.local going along with what mandb sets the files to.
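Since the cure is to go along with whatever ownership is actually being set, the override lines can be generated from the live files. A minimal sketch, assuming GNU stat and the usual "path owner:group mode" layout of the /etc/permissions* files; the function name perm_line is hypothetical, and the output should be reviewed before it goes into /etc/permissions.local.

```shell
# perm_line PATH: print PATH followed by its current owner:group and octal
# mode, one line per call, in "path owner:group mode" form.
perm_line() {
    stat -c '%n %U:%G %a' -- "$1"
}

# Possible usage (as root; check the lines by eye before keeping them):
#   perm_line /var/cache/man >> /etc/permissions.local
#   perm_line /var/log/btmp  >> /etc/permissions.local
```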
On Diamond (and presumably other hosts) the hack packet statistics generator reports the uptime and total time as 0. Jacinth reports uptime 23.9hr, total 24.0hr, which is correct. PROBLEM

Upgrading Kermit

Kermit (see hardware review) is an AMD E-350 which has bounced between several roles but is now doing audio playback. The only challenge in this upgrade may be the sound. Following along in the upgrade instructions:

/s1/SuSE/bin/updaterepo -v (on Diamond). Issues:
- mod_http2 will disable itself under the Prefork MPM. Can we switch to Worker now? PROBLEM For HTTP/2 see the Apache howto about http2 or RFC 7540 or HTTP2 Explained by Daniel Stenberg. HTTP2 is binary. HTTP2 can run multiple streams over one TCP connection. There's a new server push feature. RFC 7540 blacklists cipher suites without ephemeral key exchange, or which use CBC mode.
- aspell is going to be removed from Factory [bsc#1052949]. (At least 1 package replaces with hunspell.)
- StrongSwan-5.3.4 gets sha3 plugin which supports Keccak. First app to have this.
/srv/ftp/updaterepo.rpmqa is up to date.
checkout.sh >& $j/check.out -- Petra failed to automount Kermit's NFS exports (this combination often gives trouble), and alsasound is in a strange state (should self-heal after the upgrade). Other than that, Kermit's services are all working.
df / -- 16.3Gb total, 7.6Gb used, 8.0Gb available. This is plenty.
Moved Kermit from hostgroup v42.1 into v99.8 .
audit-repos -v -i $target -r 99.8 -u -k
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Start 13:02, installation done 16:50 (3h48m), %post scripts done 17:27 (37m). What a slug!
Hit the problem seen before with php7-imap. Solution: toss Phoronix Test Suite. Similar issues with mythweb-0_27.
1688 packages to upgrade, 216 to downgrade, 519 new, 10 to reinstall, 211 to remove, 76 to change vendor, 7 to change arch. 2545 total items. download 1.64Gb, installed size increases by 1.0Gb.
post_jump -r 99.8 $target |& tee $j/jump.$target
Took 62min. Discrepancies found:
- World writeable files in /home/video/tv/*.png (old video thumbnails). [Fixed.]
- PHP-7 was downgraded to PHP-5 to accommodate MythTV.
- MythTV (specifically libmyth-0_28-0-0.28.1+git.20170712.eef6a480b0-1.2.x86_64) was downgraded to version 0.27. Then dist-upgrade removed all of them.
- Removed 669 packages.
- /etc/passwd - group - shadow have entries for mythtv and lirc which the master copy does not have. (Kept them on Kermit temporarily.)
- /etc/sysconfig/apache2 module list was monkeyed with. Compared with /m1/custom/conffiles/… php5 was removed.
- Does /usr/diklo/lib/daily/logrotate.J have an issue with /etc/openldap/ldap.conf ? Appears to claim that the file does not exist, when it is there.
These wanted packages could not be installed:
- No provider of mdm found? That's not right.
- mysql-community-server, -client, -tools. Need to switch to MariaDB.
Reboot Kermit now. Boot issues:
- systemd-modules-load.service failed. It tried to load powernow_k8 which complained no such device and croaked. This happened both in the initrd and in the active boot. Let's blacklist this module and see if it helps.
- upower.service failed to start the first time around, but it was started again, successfully. Bug was reported.
- Using XDM, and background is anamorphically scaled to the correct height but about double width (maybe 1920px, should be 1024px). Because MDM is bogusly not installed. [Fixed.]
- purge-kernels took a long time and I don't know if they were purged. Success, took 3 minutes.
- The grub menu is not themed and includes the installer for v42.1. See /etc/grub.d/42_installer and 43_rescue (removed).
checkout.sh >& $j/check.out
Checkout discrepancies:
- apache2 is not running. /etc/apache2/conf.d/mythweb.conf is hosed because mod_php is not loaded. I hid it, now Apache starts and passes its functional test.
- alsasound status is active/none, counts as a failure. Which are we supposed to be using, alsasound or alsa-restore? Switched to alsa-restore. And also, on a version upgrade the state file changes (different order and/or device numbers) and must be rebuilt. Remove it and re-run the tester to regenerate it.
- nfs-server: petra failed to mount all kermit exports, access denied. Xena can mount. The problem is with Petra.
Other issues:
- Grivet says: LWP will support https URLs if LWP::Protocol::https module is installed. Package: perl-LWP-Protocol-https . It's installed on Diamond; how did it get there and why was that ineffective on Kermit? [Added to couchnet.sel.]
- With that module installed, Grivet plays the music.

Upgrading Iris

Iris is the home theater PC, running MythTV and also doing audio playback. Following along in the upgrade instructions:

Preliminary steps, while Iris is still in v42.1:
- pushconfig -C $target -- All config files are up to date.
- checkout.sh -- Everything is working except irrelevant items.
- /srv/ftp/updaterepo.rpmqa -- Up to date.
- updaterepo -- Downloaded the latest versions.
- Disc space in root -- Total 20.5Gb, used 8.0Gb, available 11.5Gb, plenty.
In /usr/diklo/lib/site_perl/hostgroup.db put Iris in v99.8 from v42.1.
Stop and disable restarter.timer and cronj. You don't want stuff restarted that you're upgrading.
audit-repos -v -i $target -r 99.8 -u -k
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Issues:
- cdrkit-cdrtools-compat-1.1.11-26.1.x86_64 conflicts with cdrecord-3.02~a07-2.2.x86_64 -- toss cdrecord, actually I don't know where this came from, must be ancient. Also conflicts with mkisofs-3.02~a07-2.2.x86_64, toss.
- Phoronix Test Suite again. Toss.
- php7-imap-7.1.8-1.1.x86_64 conflicts with mythweb-0_27-0.27.0-1.2.noarch which requires php5. Toss mythweb and straighten out the PHP issue later.
- 1713 packages to upgrade, 224 to downgrade, 540 new, 10 to reinstall, 225 to remove, 85 to change vendor, 7 to change arch. 2605 total items. Will use additional 1.7Gb.
- Took 44min.
Don't reboot yet.
post_jump -r 99.8 $target |& tee $j/jump.$target
Issues found:
- Took 9 mins.
- World writeable files, all in /s1/video/tv/*.png [fixed].
- mdm-branding-upstream-2.0.18-2.7.noarch was not found on the media http://distro.cft.ca.us/SuSE-build/x86_64/99.8/ (aborted!) Growl, it's in the wrong architecture directory! Moved it.
- Re-did dist upgrade. Upgraded to MythTV-0.28 (from 0.27). Including mythweb (which requires php5 and got it).
- Removed 670 packages.
- /etc/sysconfig/apache2 lacks mod_php5 . See Kermit. Re-added it.
- Uninit value in split at /usr/diklo/lib/daily/fm.J line 15. In kernel 4.12.8, kernel threads have /proc/$PID/cmdline but it is empty; also a process may be included in the glob pattern but exits before its cmdline is examined. The script was changed to bypass those problems.
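The two hazards in the fm.J item above (an empty cmdline for kernel threads, and a process that exits between the glob and the read) apply to any scanner of /proc. fm.J itself is evidently Perl and is not shown here; the following is only a shell sketch of the same defensive pattern, with a hypothetical function name and an optional alternate root directory so it can be tried on a fake tree.

```shell
# list_cmdlines [PROCROOT]: print "PID first-word-of-cmdline" for every process
# with a non-empty cmdline. Kernel threads (empty cmdline) and processes that
# vanish before their cmdline can be read are skipped silently.
list_cmdlines() {
    local root=${1:-/proc} f pid cmd
    for f in "$root"/[0-9]*/cmdline; do
        [ -r "$f" ] || continue                      # already gone, or no match
        # cmdline is NUL-separated; a failed open just means the process exited.
        cmd=$( { tr '\0' ' ' < "$f"; } 2>/dev/null ) || continue
        [ -n "$cmd" ] || continue                    # kernel thread
        pid=${f%/cmdline}
        pid=${pid##*/}
        printf '%s %s\n' "$pid" "${cmd%% *}"
    done
}

# Possible usage:
#   list_cmdlines | sort -n
```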
These packages were installed via capabilities (same issue on Kermit):
- lirc from lirc-core [Updated both in /m1/custom/extra.sel]
- lirc-remotes from lirc-config
MySQL was not available, need to switch to MariaDB. Affected packages: mysql-community-server mysql-community-server-client mysql-community-server-tools PROBLEM
Reboot Iris now. It rebooted.
checkout.sh > /tmp/check.out
Discrepancies:
- Iris' /root/.ssh/known_hosts file is out of date, need to regenerate.
- nfs-server failed. Trying to use Petra as the partner, I thought I blacklisted petra!
- alsasound -- Need to rebuild /var/lib/alsa/asound.state because it is not upward compatible (different device numbers). Now it passes.
Does it play music? Yes! OTA to FMRadio device to ezstream to Icecast to meow.sh to GStreamer playbin.
I'm sure MythTV is hosed, but except for that, I believe Iris is operational and is successfully upgraded.

Upgrading Jacinth

I had intended to upgrade Diamond after Iris and before Jacinth, but Microsoft Windows 10 (on the VM Baobei hosted on Diamond) self-destructed just before I started the upgrade. Obviously monkeying with Diamond's KVM will not affect problems on Baobei, but it's prudent to not touch Diamond until Baobei is dealt with.

Steps in upgrading Jacinth:

Preparation:
- /home/post_jump/pushconfig -C $target -- Already up to date.
- checkout.sh >& $j/check.out -- OK except for ignorable flaky items.
- /srv/ftp/updaterepo.rpmqa -- Up to date.
- /s1/SuSE/bin/updaterepo -v -- Updated.
- df / : 20.5Gb total, 11.3Gb used, 8.1Gb available, plenty.
Edit /usr/diklo/lib/site_perl/hostgroup.db moving Jacinth to v99.8 from v42.1 .
Stop and disable restarter.timer and cronj.service .
audit-repos -v -i $target -r 99.8 -u -k -- On Diamond.
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Package selection issues:
- Can't find typelib(Workrave) for workrave-1.10.16, keep obsolete workrave-1.10.1 which gets a segfault.
- Toss Phoronix Test Suite.
- php7-imap-7.1.8 requires PHP7, cannot install, incompatible with php5-imap-5.5.14. Toss the latter.
2733 packages to upgrade, 250 to downgrade, 825 new, 10 to reinstall, 228 to remove, 94 to change vendor, 9 to change arch. 3971 total items. Download 2.16 GiB. Additional to use: 1.7 GiB.
Took 78min and aborted on item 2855/3971, http://download.opensuse.org/tumbleweed/repo/oss/suse/x86_64/libprojectM-qt5-2-2.1.0-14.1.x86_64.rpm?proxy=http://distro.cft.ca.us:3128 is temporarily inaccessible. Zypper retried 4 times but Squid got a 503 each time. A lot of %posttrans scripts were skipped. I'm redoing the dist-upgrade and watching it closely, and I plan to ignore the failure on this package. Famous last words: connectivity to download.opensuse.org (and other sites) is lost. eth1 is down. But access to Diamond remains. Retrying but letting it skip unavailable repos. (Find out what TCP_DENIED means in /var/log/squid/access_log .)
Retrying, 1191 total items remain. Upgrading the desktop environment (XFCE) when you're using it to do the upgrade gives some interesting effects. Whenever it installs or removes a font, all the XFCE plugins hog the CPU for 30 secs or so. It tried to load 20 to 30 packages from offsite, which I told it to ignore. Idiot, it's upgrading Phoronix Test Suite, not tossing it. But it had to retrieve from offsite: ignored/skipped. Finally finished after 78min. It ran all the %posttrans scripts that I remember from other machines: looks like before the abort it saved them in /var/adm/update-scripts and re-ran them when the job was retried.
Don't reboot yet. /etc/default/grub has net.ifnames=0 .
post_jump -r 99.8 $target |& tee $j/jump.$target
This is going to be a challenge; it will probably botch the install and dist-upgrade steps.
- Refreshing repos: for a 503 it retries every 30 secs 4 times, but eventually gives up, and post_jump continues.
- Lots of directories with mode 777 in /m1/openHAB-1.8/ (fixed).
- dhcpcd is the only possibly critical package that isn't getting installed. Skipped the installation phase.
- Deleting unwanted packages: took a long time to extract dependencies from every package. 689 packages to remove.
- Dist Upgrade. Off-site repo(s) are unavailable; I tried to ignore but it gave up.
- Got the grub.cfg file (with the needed kernel parm).
- Problems checking/fixing the Kerberos keys:
  sed: -e expression #2, char 21: unterminated address regex
  Multiple realms 'JFCARTER.NET JACINTH.CFT.CA.US' in /etc/krb5.conf, override with -r
  Whatever the problem was, after rebooting the keys are checkable and are all present.
- Running housekeeping. LDAP is hosed (see Slapd Was Hosed). Evidently fm.J is on Jacinth and needs the patch that Iris has [installed]. /etc/sysconfig/dhclient got changed.
Reboot. Cross fingers. What did not come up:
- dnsmasq: bad IPv4 address in /etc/dnsmasq.d/dhcp.conf at line 76. Yes it still supports 121=classless-static-route. The issue was, 0/0 used to be a legal IPv4 address; now it wants 0.0.0.0/0 . But will this be translated to the address of the host running dnsmasq? Hope it wasn't. [Verified.]
- upower (coredumps repeatedly). Eventually it started and kept running. Command line for later testing: just /usr/lib/upower/upowerd . The issue was that it's dbus activated, and if it comes up before dbus does, it reports an assertion failure (no dbus connection) and things go downhill from there. It's supposed to hang, or keep retrying, until dbus appears. My first fix: add "After=dbus" to the systemd unit file. Wrong, it needs to be After dbus-org.freedesktop.login1.service and also getty.target, because login1 won't start until something coincident with getty.target. Bug report here.
- mdm (fallback to xdm)
- ldap / slapd: I made the fixes in Slapd Was Hosed and it works now. Except no communication with Diamond (yet).
- strongswan (charon dumps core repeatedly). This self-healed.
- network6: Needs ifconfig command. (And needs systemd unit.) Edited to use ip command. Works now but still no IPv6 connectivity. dhclient-hooks -v for status check: our wild side IPv4 address has changed. Add -c to re-register. Registered on tunnelbroker.net, members.dyndns.org, admin.mailroute.net. IPv6 is back.
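The upower ordering fix above was made in the unit file itself. An alternative that survives package updates is a systemd drop-in; the sketch below swaps in that technique and is not what was actually done on Jacinth. The drop-in file name after-login1.conf and the function name are hypothetical; the After= targets are the ones named in the upower item.

```shell
# write_upower_dropin [UNITDIR]: create UNITDIR/upower.service.d/after-login1.conf
# so that upower is ordered after logind's D-Bus alias and getty.target.
# UNITDIR defaults to /etc/systemd/system.
write_upower_dropin() {
    local dir=${1:-/etc/systemd/system}/upower.service.d
    mkdir -p "$dir" || return
    cat > "$dir/after-login1.conf" <<'EOF'
[Unit]
After=dbus-org.freedesktop.login1.service getty.target
EOF
}

# Possible usage (as root):
#   write_upower_dropin && systemctl daemon-reload
```

After= in a drop-in adds to the ordering dependencies already declared by the packaged unit, so nothing from the original file is lost.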
checkout.sh discrepancies:
- avahi-daemon: Never worked right on Jacinth.
- network6: status check says 2001:470:1f05:844::3 is missing, which is not a lie. Eventually I re-registered the wild-side IPv4 address, which brought back 2001:470:1f05:844::3.
- domoticz: Is not installed. It's in obs:home:Guillaume_G. Downloading and installing. We got v3.4834. On startup it wants to upgrade to version 8153. Testing, it always gets an error sending a command to the switches. It turns out that v3.4834 was compiled without Z-Wave support, hiss, boo! The old version (domoticz-2.0.2276) required openzwave libopenzwave1_2 libopenzwave-devel but these are not available on Tumbleweed.
  The Z-Wave protocol is non-free, but has been illegally reverse-engineered to produce the openzwave set of packages. Could that be the reason it is banned from SuSE?
  Solution: I copied domoticz-2.0.2276 from my SBS stash, and downloaded the openzwave things from SuSE v42.3 (into local SBS repo). The database had been trashed in the failed upgrade, so I restored it from backups. This combination is installable and domoticz is once again operational.
- dovecot: Is not installed. Installed and not dead, but TLS establishment fails, no shared cipher. The weasels! They just overwrote everything in /etc/dovecot/conf.d . Need to restore. [Done and tested.]
- openvpn@server443 is dead (but openvpn@server is OK). Looks like a connection from researchscan351.eecs.umich.edu (141.212.122.96) killed it. Restarted successfully. Message: Aug 24 16:23:35 jacinth openvpn[2386]: TCP connection established with [AF_INET]141.212.122.96:26630 ; --mtu-disc is not supported on this OS ; Exiting due to fatal error. Next day an unrelated web crawler caused the same failure. Bug report here.
- alsa-restore: Needs to rebuild /var/lib/alsa/asound.state (done). I added this to the tester. But on Iris this fails at random times. Still investigating. PROBLEM
- strongswan: Dumps core continuously. I didn't do anything; it self-healed.
- postgresql: postgresql96-9.6.3 is installed. The database is from v9.4. Got to reload. [Done, successfully.]
Now that I have DNS and networking unscrambled, I can install missing packages. audit-pkgs -v -i -c -I . Package selection issues:
- owncloud-server-8.0.0 requires php5 >= 5.4.0 but it is uninstallable. Solution: do not install php7-7.1.8-1.1.x86_64. Similarly for various php7 modules.
- It's going to install MythTV-0.27.
- It's going to install mdm.
- What happened to Dovecot? We're asking for dovecot21 which is deprecated in Tumbleweed. Installed dovecot22; they overwrote all the configuration files, hiss, boo! Restored from backups. Dovecot works now.
I have too many packages that depend on PHP5 and aren't built for PHP7. Reverting to PHP5.
I cleaned up some packages: MythTV, MySQL, squirrelmail. These were already commented out and were removed: sogo, sope, prosody and its dependencies. MySQL did not actually go away; something depends on it. 32 packages removed.
Installing packages according to the alterations. Unavailable packages: dhcpcd libmm14 perl-Danga-Socket php5-pear-Crypt_GPG . The first 3 are obsolete; removed from /m1/custom/extra.sel . php5-pear-Crypt_GPG is used by Roundcube's Enigma and is important. Downloaded and installed.
Except, several PHP5 modules are back version and have an incompatible ABI: exif fileinfo pdo_pgsql pgsql pspell zip zlib , and intl needs libicui18n.so.52.1 which is not installed.
Look for /usr/lib64/php5/extensions/exif.so . The culprits are dated 2017-03-20; the currently updated modules are dated 2017-07-30. The culprits are from v42.1 and did not get dist-upgraded. php5-exif-5.6.30 requires php5-5.6.30, which is uninstallable. It would need to downgrade php5-5.6.31 to php5-5.6.30 together with a bunch of modules depending on it. Wait a minute, the one on Tumbleweed is php5-exif-5.6.31; why isn't that being installed? Because of priorities. Strategy: retrieve-pkgs ; updaterepo ; zypper dist-upgrade. The dist-upgrade wanted to upgrade everything to php7. I did zypper addlock php7, and that was honored by dist-upgrade. Now the correct packages are installed.
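The lock-then-upgrade sequence can be sketched as below. This is a minimal sketch, not a transcript of the session: only zypper addlock php7 and zypper dist-upgrade come from the text, while the run wrapper, the DRY_RUN variable and the function name are illustrative additions so that the sequence can be previewed before anything is changed.

```shell
# run CMD...: always print the command; execute it only when DRY_RUN=0.
# (Hypothetical helper; the default is a harmless preview.)
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 0 ] || return 0
    "$@"
}

# Lock php7 so the solver cannot select it, then do the distribution upgrade.
hold_php7_and_upgrade() {
    run zypper addlock php7 || return 1
    run zypper dist-upgrade
}

# Preview:          hold_php7_and_upgrade
# Apply (as root):  DRY_RUN=0 hold_php7_and_upgrade
```

The local retrieve-pkgs and updaterepo steps would come before this; they are left out because their arguments are site specific.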
Upgrading Dovecot from v2.1 to v2.2. There are just a few configuration changes, none of which are non-default on my system, so the upgrade should go smoothly. Cross fingers. It starts… It is creating custom Diffie-Hellman groups in /var/lib/dovecot/ssl-parameters.dat to resist Logjam. Looks like it's not going to open for business until that's finished. Later, when it regenerates the DH parameters it is willing to use the old ones until finished.
Can't do TLS on any ports. Weasels! They overwrote everything in /etc/dovecot/conf.d without doing the rpmsave thing! Restoring from backup. Now TLS works. Actual mail retrieval coming soon.
Need to test Roundcube. The krb5 extension is required for GSSAPI authentication…
IMAP login Problem after upgrade to v1.2, OP quwax, 2016-05-26. (Useless.)
The name of the plugin appears to be roundcubemail/plugins/krb_authentication. See the Official Plugin Repository for installation instructions and a searcher. Unfortunately it doesn't find either krb_authentication or krb5.
This plugin is on GitHub. It turns out that this plugin is included in the core. Now, how to enable it in the configuration? I edited /srv/www/roundcubemail/plugins/krb_authentication/config.inc.php , filling in my own hostname@REALM. But what it's really asking for is the PHP5 Kerberos (krb5) extension. Package php5-krb5 is not on SuSE Build Service. See the PECL package page; the latest version is 1.1.2 dated 2017-04-08. The apparent installation procedure, from Can't build krb5 php extension by G4schberle (2017-06-15) on Stack Exchange:
- Download the TGZ file. Unpack. Change to that directory.
- phpize (no command line arguments)
- ./configure
- make (lots of warnings e.g. assigning between different type pointers)
- Ended with an error, using a nonexistent struct member. Nobody had any suggestions for him. To me it looks like version skew, but in what devel package?
Build dependencies:
- php5-devel -- for phpize
- ./configure ran without any complaints. Famous last words.
- gssapi/gssapi.h missing. It's in krb5-devel .
- make -- compiled with no errors, no warnings, good!
- make test -- various problems, probably in the testers. Could not load the new library, undefined symbol GSS_C_NT_HOSTBASED_SERVICE. All 5 tests were skipped because config missing.
- make install -- no errors, produced /usr/lib64/php5/extensions/krb5.so.
- Running Roundcube: the krb5 extension is still required.
- I created (by hand) /etc/php5/conf.d/krb5.ini containing extension=krb5.so -- evidently, without this the module cannot be loaded. But something else prevents loading.
- mod_php5 maintains state; it's not going to load the new module. Restarted Apache. Yes it did load the module. Now I get a different error message…
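The build and activation steps above can be collected into a short script. A minimal sketch, assuming php5-devel and krb5-devel are already installed; the function names are illustrative, and the tarball and directory names in the usage comment are assumptions based on the version number mentioned earlier.

```shell
# Build and install the PECL krb5 extension from an unpacked source tree.
# Runs in a subshell so the caller's working directory is unchanged.
# Needs phpize (php5-devel) and gssapi/gssapi.h (krb5-devel).
build_krb5_ext() (
    cd "$1" || exit 1
    phpize && ./configure && make && make install
)

# Write the one-line ini file that tells PHP to load the module.
# For the setup described here the directory is /etc/php5/conf.d .
write_krb5_ini() {
    printf 'extension=krb5.so\n' > "$1/krb5.ini"
}

# Typical sequence, as root (file names are assumptions):
#   tar xzf krb5-1.1.2.tgz
#   build_krb5_ext krb5-1.1.2
#   write_krb5_ini /etc/php5/conf.d
#   systemctl restart apache2   # mod_php5 only picks up new modules on restart
```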
Now it complains: The gssapi_cn parameter is required for GSSAPI authentication. This is the Kerberos ticket cache as propagated to the server, e.g. KRB5CCNAME=DIR:/run/user/500/krb5cc_XXXXX. But this plugin ought to be bypassed if KRB5CCNAME is empty. I'm not propagating the credential anyway; rather, I'm relying on split authentication and authorization to get Dovecot to deliver the mail. I don't want to debug the client-server interaction. I'm going to try to sabotage /srv/www/roundcubemail/plugins/krb_authentication/krb_authentication.php . It still demands the gssapi_cn parameter. This is checked for in /usr/share/php5/Roundcube/rcube_imap_generic.php , conditional on $type == 'GSSAPI'. This value is the 3rd arg to function authenticate($user, $pwd, $type). Function connect() picks the type. If I'm reading the code right, if the mail server (Dovecot) supports GSSAPI, it is used, and can fail if the browser has not sent the needed parameters. I tampered with /usr/share/php5/Roundcube/rcube_imap_generic.php by removing 'GSSAPI' from the mechanism list. That did it; Roundcube is now showing mail. And jimc's generic REMOTE_USER plugin still works.
Enigma had a problem but finding the error message was hard. I eventually found it in /var/log/roundcube/errors . /home/httpd/htdocs/kerberos/roundcube/plugins/enigma/home is not writable (not that Enigma is going to create a user homedir today). Its perm is root:root 755, should be wwwrun:www 755. Once that's fixed, it's now signing and checking my mail.
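The ownership fix for Enigma's home directory can be checked before it is applied. A minimal sketch, assuming GNU stat; the function names are illustrative.

```shell
# Print "owner:group mode" for a path, e.g. "root:root 755" (GNU stat).
perm_of() {
    stat -c '%U:%G %a' "$1"
}

# Succeed if the path already has the wanted "owner:group mode" string.
perm_ok() {
    [ "$(perm_of "$1")" = "$2" ]
}

# As root, for the directory named in the error log:
#   d=/home/httpd/htdocs/kerberos/roundcube/plugins/enigma/home
#   perm_ok "$d" 'wwwrun:www 755' || { chown wwwrun:www "$d" && chmod 755 "$d"; }
```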
Now that I can read mail I discover that there is a plethora of messages complaining that /usr/bin/php /home/httpd/htdocs/owncloud/cron.php and /home/jimc/public_html/radio/icekiller -k -u admin/hackme http://localhost:8000 were not being executed.
The problem was in cronj. Whenever the job was to run as other than root, the file that receives stdout-stderr had to be re-owned to the target user, and formerly that worked fine, but now you can't create the file as root and then change its ownership (at least when it's in /tmp); you have to setUID to the target user and then create the file. icekiller and ownCloud are now being executed happily.
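The create-as-the-target-user approach can be sketched as follows. This is not cronj's actual code, only a minimal shell illustration of the idea, assuming runuser from util-linux is available; the function name run_job_as is hypothetical.

```shell
# run_job_as USER OUTFILE CMD...
# Run CMD as USER with stdout and stderr sent to OUTFILE. The redirection is
# performed by a shell that is already running as USER, so USER creates the
# file; nothing is created by root and re-owned afterwards.
run_job_as() {
    job_user=$1
    job_out=$2
    shift 2
    if [ "$(id -un)" = "$job_user" ]; then
        # Already the target user: no identity switch needed.
        "$@" > "$job_out" 2>&1
    else
        # Switch identity first, then let the child shell open the file.
        runuser -u "$job_user" -- \
            sh -c 'out=$1; shift; exec "$@" > "$out" 2>&1' sh "$job_out" "$@"
    fi
}
```

For example, as root: run_job_as wwwrun /tmp/owncloud-cron.out /usr/bin/php /home/httpd/htdocs/owncloud/cron.php (the user and output file names here are illustrative).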
Need to test ownCloud. Web interface works, cellphones are syncing.

Upgrading Diamond

The Windows VM Baobei has been reinstalled and appears to be working normally, and the guest is known to work on a host with Tumbleweed, so I'm ready to upgrade Diamond from v42.1 to Tumbleweed. Preparation steps:

/s1/SuSE/bin/updaterepo -v -r 99.8 -a x86_64 >& /tmp/system/updaterepo.log
Get this started early; it's on the critical path. Allow time to read the prolix changelogs.
Pre-removed Phoronix Test Suite.
Diamond is the host for the VM Oso. Power off Oso.
retrieve-pkgs -v -w $HOST for all hosts, plus reindexed PackMan.
/home/post_jump/sync_jump -C #-- All conf files are up to date.
Checkout.sh outcomes are either successful or excusable.
Installed packages list is available.

Doing the upgrade:

Check disc space: 15.3Gb total, 12.3Gb used, 2.2Gb available. Should be enough, but space is getting tight. /var/cache/insetup.jail has over 3Gb of cruft, all for obsolete OS versions. Deleting it. Now 5.6Gb available.
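A pre-flight space check along these lines can be scripted. A minimal sketch, assuming the root filesystem is the one that fills up; the function names and the threshold in the usage note are illustrative.

```shell
# Print the space available, in KiB, on the filesystem that holds the path.
# df -P keeps each filesystem on one line, so field 4 is always "Available".
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least the given number of KiB is free.
have_space() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}
```

Usage: have_space / 4000000 || echo "less than about 4 GB free on /" >&2 . The figure should cover the downloaded packages plus the predicted growth of the installed system.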
Diamond hostgroup was changed to v99.8 from v42.1. This is the last v42.1 machine.
Stopped restarter.timer and cronj.service.
Installing repos: audit-repos -v -i $target -r 99.8 -u -k
Installation and refresh succeeded. The main repo's metadata is enormous and slow. Oops, it was supposed to edit baseurls to use the local directories rather than HTTP to $target. Fixed (bad regexp).
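Rewriting the baseurls to local directories can be sketched with a literal prefix match, which sidesteps regexp quoting altogether. This is a minimal sketch and not the code in audit-repos; the function name, the URL prefix and the local directory in the usage note are hypothetical.

```shell
# localize_baseurl REPOFILE URL_PREFIX LOCAL_DIR
# Print REPOFILE with every "baseurl=URL_PREFIX..." line changed to use
# zypper's dir: scheme under LOCAL_DIR. The prefix is compared literally
# (awk index), so dots and slashes in the URL need no escaping.
localize_baseurl() {
    awk -v from="baseurl=$2" -v to="baseurl=dir://$3" '
        index($0, from) == 1 { $0 = to substr($0, length(from) + 1) }
        { print }
    ' "$1"
}
```

Usage: localize_baseurl main.repo http://diamond/SuSE /s1/SuSE > main.repo.new , then inspect the result before moving it into /etc/zypp/repos.d .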
Doing the upgrade:
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Package selection issue: apache2-worker-2.4.27-1.2.x86_64 conflicts with apache2-worker-2.4.16-18.1.x86_64 and other 2.4.16 packages. Toss the old ones.
2649 packages to upgrade, 210 to downgrade, 845 new, 12 to reinstall, 240 to remove, 71 to change vendor, 8 to change arch. 3859 total items. Download 2.0Gb, installed 1.1Gb additional (predicted) vs 1.4Gb (actual). Total runtime 54min.
Don't reboot yet, or networking will be horked.
post_jump -r 99.8 $target |& tee $j/jump.$target
post_jump problems:
- Oops, /etc/passwd and friends on the master site (Diamond) differ from what's in one or the other post_jump directory. Specify -u 0 to ignore this minor detail.
- post_jump wants to check if there's really a distro directory to download from, but there isn't, because Apache is not responding. post_jump should look for the baseurl and honor the dir: scheme. Band-aid applied: ignore if -u 0.
- World writeable: /home/backup/jacinth/baobei/lots of files due to dumb Windows permissions. Ignore.
- Installing 65 packages, removing 583 packages.
- Should have locked out php7. I'm going to have to revert to php5 later. [Done.]
- Total runtime 10 mins.
Tossed and locked out php7; reinstalling wanted packages (audit-pkgs -i). It's trying to install cups-browsed which is no longer wanted on CouchNet (removed from /m1/custom/extra.sel). The rest of the missing packages were installed.
Rebooted. Boot time discrepancies: slapd is hosed; no IPv6 for Oso, which was started at boot.
checkout.sh discrepancies:
baobei.automount works and so does baobei-umount.timer. Needs a checkout script for /etc/systemd/system/baobei.automount [done].
alsa-restore has a new control; need to rebuild /var/lib/alsa/asound.state [done]. The tester gets too many false positives anyway.
Apache2, which was horked before the reboot, is now working.
Squid tester tried to download http://download.opensuse.org/distribution/leap/42.1/repo/non-oss/ and failed. Probably removed from host. Why doesn't it use baseurls out of the current repo definitions? Because it recursed into the template and jail directories. [Fixed.]
Slapd is hosed, q.v. for the fixup procedure. It is now serving the directory.
Special check on printing: On Diamond, lp -d lp2 /etc/issue prints the file. On Xena it maunders Bad file descriptor. It's trying to connect to the nonexistent cupsd on localhost. I edited /etc/cups/client.conf to direct leaf nodes to Diamond. Xena can print now; and Diamond can still print.
User complaint: The MDM greeter's linespacing is awkward and it should be in the southwest corner, not northeast.
Oso still isn't getting IPv6. This self-healed, probably when something got restarted.
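For the printing fix on Xena described above, the client-side setting is a single ServerName line in client.conf. A minimal sketch, assuming the leaf node needs no other settings in that file (the function overwrites it); the function name is illustrative.

```shell
# write_cups_client_conf DIR SERVER
# Point the CUPS client tools (lp, lpstat, ...) at a remote print server
# instead of a cupsd on localhost. DIR is /etc/cups for real use.
write_cups_client_conf() {
    printf 'ServerName %s\n' "$2" > "$1/client.conf"
}

# On a leaf node, as root:
#   write_cups_client_conf /etc/cups diamond
#   lp -d lp2 /etc/issue      # should now reach the server's queue
```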

Updating Diamond is now finished, and this is the last host that needs to be updated.

Remaining Problems

wakeup.J.service is actually an LSB script. Need to extract the code into a separate script and to make a proper systemd unit for it. bridge.J has the same issue. PROBLEM
These are the remaining scripts in /etc/init.d :
- bridge.J -- Needs systemd unit
- named -- jimc already created a systemd unit for it
- postfix -- Unowned; also has a unit in /usr/lib/systemd/system
- rpcbind -- Unowned; also has a unit in /usr/lib/systemd/system
- wakeup.J -- Needs systemd unit
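For the two scripts that still need units, a oneshot wrapper is one possible shape. This is a minimal sketch under stated assumptions: the extracted script is assumed to keep the start and stop arguments of the LSB original, and the description and script path in the usage note are hypothetical.

```shell
# make_oneshot_unit DESCRIPTION SCRIPT
# Print a systemd unit that runs "SCRIPT start" once at boot and
# "SCRIPT stop" at shutdown; RemainAfterExit keeps the unit "active"
# in between so that the stop action is actually invoked.
make_oneshot_unit() {
    cat <<EOF
[Unit]
Description=$1

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=$2 start
ExecStop=$2 stop

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage, as root: make_oneshot_unit "Wakeup setup" /usr/local/sbin/wakeup.J > /etc/systemd/system/wakeup.J.service , followed by systemctl daemon-reload and systemctl enable wakeup.J.service . Any ordering the real script needs (for example After=network.target) would have to be added by hand.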
retrieve-pkgs does one host at a time, including reindexing every time. Need to do multiple hosts and reindex only once. PROBLEM
It is possible to install evince with no backends, whereupon it exits with no error message. You need at least one backend. Jimc is installing evince-plugin-{pdf,ps,dvi,djvu}document ; also available on the Tumbleweed DVD are comics, tiff, xps. Bug report here, comment #3.
Logging in with the MDM greeter and my custom theme, I was delayed while entering my password. The daily housekeeping report popped up and received focus (and several bytes from the password), and the only way to get back to the greeter was to use that window's close button. The greeter should positively grab the keyboard. Note, the housekeeping report generator runs as root, and uses /var/lib/mdm/authdir/:0.Xauth as its XAUTHORITY file to gain access to the server. Bug report here.
This occurred with mdm-2.0.16 which I swear was from the main Tumbleweed repo, but now as of 2017-08-22 it has been upgraded to mdm-2.0.18-2.7.x86_64 from obs://build.opensuse.org/X11, not in the Tumbleweed main repo.
Tigase XMPP server complains about a format error in the new host certificate, and it gets a segfault. I should either upgrade to the latest version and try again, or switch to yet another XMPP server. Metronome is a new one: a fork of Prosody (~2013), written in Lua, actively developed, with several public deployments for social media. It is not on SBS; install from source. PROBLEM
workrave-1.10.1 is installed from SuSE-13.1, hiss, boo! It gets a segfault on Tumbleweed. It is not present in any of the configured repos. I need to find a recent version, copy it to the appropriate local repo, and install it. Hung up trying to find typelib(Workrave). Also libgdome.so.0. I tried doing this the right way with SBS, but it's turning into a real can of worms, and also, the last maintenance to the package was in 2011, so I'm giving this up. PROBLEM

Fixed Problems

Claude and Oso are chattering with Diamond, on port 631 keeping it awake. Oso (and presumably Claude) downloads the PPD for all 3 printers (about 10k each) every 10 secs, as if it's polling. [Fixed by killing cups on leaf nodes; it's totally useless there.]
When an entire directory has vanished on the host, backup-host removes the directory from the backup instance before removing the content, failing. I thought I fixed this already. Use -n to not copy/remove and to retain the temp files. Fixed again, and checked.
updaterepo runs on Diamond at midnight, when most hosts are down, and it tries to retrieve rpmqa from them, failing. Fixed: leaf nodes mail lists to reports@jacinth, and updaterepo on Diamond retrieves them from http://jacinth/~reports/rpmqa.d/$HOSTNAME.
My custom greeter window on lightdm has a black background and a white border. The one on MDM lacks the border. I should add the border. But without documentation it isn't a quick fix. I also have a user complaint that the linespacing is awkward, and the box should be in the southwest corner, not northeast. OK, I moved the box and provided the white border, but wasn't able to deal with the spacing.
