OpenSuSE Leap 42.1, which is what I'm running, just reached end of life, which is what
prompts this upgrade.
See SuSE's version lifetime page.
42.2 is expected to last until second quarter 2018,
i.e. 1 year more, and I think it's been out about 6 months.
42.3 is expected to come out at the end of July 2017, i.e. in two months.
The Lifetime page says that a major release, i.e. the whole 42.x series, should last 3 years, but minor releases like 42.1 should appear annually and support (i.e. bugfixes and security) will continue for 6 months after the next version comes out. This schedule is very inconvenient for me. I'm going to try to switch to the Tumbleweed rolling distro.
I always begin a project by stating its goals,
issues likely to be encountered, and then the actions to be taken.
I have a lot of experience upgrading SuSE distros from 6.4 to 13.1 and then
Leap 42.1, but these have been fixed, non-rolling distros. Tumbleweed
is going to be different in some ways. But I expect that a lot of steps for
the fixed distros will carry over to Tumbleweed.
Updates will use zypper dist-upgrade rather than zypper patch. And I will need to retrieve and save the updated RPMs rather than the new patch RPMs. (Update: there is an update repo. Urgent patches go there but are moved to the main repo after system integration tests are finished. At present it has one package, vs. 15651 packages in update/42.1.)
Here's an overview of installing a new distro:
"No hands" installer (instsetup): Create an installer image for the new version and make sure it can really upgrade a machine without intervention from the sysop. I've found that it's sufficient to do a dist-upgrade (which did not always work on older versions), and I may bypass instsetup this time around.
Presently for the static distros I have this basic infrastructure:
There is an enterprise mirror on distro.cft.ca.us (Diamond). It has an instance (mounted ISO image) of the static main distro, plus an update area which is refreshed weekly. There are also two more repo areas, one for packages obtained from the SuSE Build Service and one for non-distro packages, e.g. PackMan, or locally compiled hacks.
/m1/custom/couchnet.sel and extra.sel have lists of keystone (mostly) packages that are supposed to be on the machine.
The script audit-pkgs has three sub-functions: -i means to install any missing packages from couchnet.sel. -e means to remove any package that is not on the list or is not a dependency of a listed package. -u means to install patches, bringing in new versions of packages that are already installed.
audit-pkgs -u is executed weekly on each host, after patches have been downloaded from SuSE's repo server.
distro.cft.ca.us has a library of configuration file master copies,
with an instance for each OS version, and with the ability to restrict
to subsets of the hosts. The script sync_jump is used to copy modified
files into this library. pushconfig sends the files out to the various
hosts, weekly, according to their OS version. The term "post-jump"
is used for this process, carried forward from Sun
Microsystems' Solaris installation.
/m1/custom/conffiles on the target contains specially modified configuration files specific to the target, which pushconfig refrains from overwriting.
For installation, the script post_jump executes on the master site with remote execution on the target. It has these phases:
To install the distro on a new machine I generally either use the network installer or the distro ISO (on USB, usually). I install a minimal system since I've had inconsistent results trying to get the installer to use the update repo. Then I run post_jump.
To upgrade an existing machine to a new distro version, I have two strategies:
zypper dist-upgrade (with the new repo definitions but running on the old kernel and utilities) is consistently successful. This was not always true in the past.
zypper dist-upgrade amid various infrastructure steps. This strategy will work in non-upward-compatible situations like when they changed the RPM file compression algo and when I'm changing architecture from i586 to x86_64.
There is a collection of about 65 functional tests of various services such as sshd, apache2 and cron. Not every service is on every machine; typically 50 services are eligible for checking (and 30 more that get generic checks). I normally run the script checkout.sh after an upgrade, which runs these tests. Also a script does the tests every three hours and restarts (most) services that have died or that fail the functional test.
For using Tumbleweed I want to continue as much as possible with these admin scripts and procedures. I envision these changes:
In a rolling distro they just add/replace new versions of packages in the main distro directory, and I will need to add/delete such packages instead of adding to the update repo. (Tumbleweed does have an update repo but it has only urgent patches that will soon be moved into the main repo.)
On the other hand, I could consider having a complete mirror of the main distro. The disadvantage of the complete mirror is its size and its frequent updates. SuSE warns that the complete content including Tumbleweed occupies 2.95Tb. It should be possible to mirror just Tumbleweed, 40.2Gb. HTTP URL: http://rsync.opensuse.org/tumbleweed/repo/oss/ The following is written assuming I'm storing only the packages I need, not making a complete mirror.
There is a Tumbleweed DVD ISO (3.9Gb) that appears to be updated every day or two. Also the net installer (0.11Gb) and rescue CD (0.65Gb).
Nasty little tidbit: the net installer downloads the
installation system, which knows which product it is a part of.
This means that I can install from download.opensuse.org,
or from a local copy of the DVD (local HTTP), but I can't mix them up
or install from a local subset of the distro.
But there's a workaround: when starting the installer put kexec=1 on the kernel command line. It will download the kernel from the installation media to be used, e.g. the local repo, and reboot using kexec.
I need to have the SuSE repo active at all times, so the installer
knows about updated packages. See below in "New Repo Definitions"
about priorities.
I also need to configure the lifetime of the package index from SuSE. This is enormous, and likely it will take a long time for each client to download it and rebuild the index of all 39,000 packages. The packages.gz file is 9.0Mb, compared to 1.5Mb for the 42.1 DVD, which has only a subset of the packages.
A web proxy (Squid) could considerably speed up downloading packages and metadata from the SuSE server. However, I should rely on Squid for intra-day downloads, but on cowboy programming, i.e. the local mirror, for long-term storage of RPM files.
For weekly updates, do I have to start the process and then wait
for a slow download? How could I do this overnight, similar to the way
I refresh the fixed distro's update area? I'll download the Tumbleweed
repo metadata (packages.gz), then compare with the metadata of files
in the local mirror. If there is an updated version I can download
it on the spot. It should be possible to use the "requires" and "provides"
metadata to recognize when the updated package requires
new packages that we need to download. Update: this plan was
canceled, too complicated. New versions on the master of installed
packages are downloaded without regard to possible new dependencies.
When the new version is installed, the new dependencies are downloaded
at that time, and Squid will speed that up.
When packages are added to or purged from the local mirrors, their metadata has to be reindexed. I wish this could be done overnight, but I haven't come up with a secure way, so a report will be generated, and reindexing will be part of the later update/installation process.
I still need the SuSE-build local mirror, for packages obtained from developer homedirs or special collections, but presently most of the content is main distro material that isn't on the DVD, and these will migrate to the main distro's local mirror.
I should turn on keep-packages for the SuSE repo. Then after an update/upgrade I will retrieve the downloaded packages (if any), save them in the local mirror, and rebuild its index. The overnight update script is supposed to prevent this from happening, but inevitably I'll need to cut corners causing missed packages.
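The retrieval step could look roughly like the sketch below. It is only an illustration: collect_cached_rpms is a hypothetical helper (not one of the real scripts), it copies locally rather than from a remote target, and the layout of one subdirectory per architecture in the mirror is an assumption taken from the package paths used elsewhere in these notes.

```shell
#!/bin/bash
# Sketch: copy RPMs that Zypper kept in its cache into a mirror directory.
# collect_cached_rpms is a hypothetical helper name.
# Usage: collect_cached_rpms CACHE_DIR MIRROR_DIR
collect_cached_rpms () {
    local cache=$1 mirror=$2 f arch
    find "$cache" -type f -name '*.rpm' -print | while read -r f ; do
        # pkgname-1.2-3.4.x86_64.rpm -> x86_64
        arch=${f%.rpm}
        arch=${arch##*.}
        mkdir -p "$mirror/$arch"
        # Report each file that was copied successfully.
        cp -p "$f" "$mirror/$arch/" && echo "saved $arch/${f##*/}"
    done
}
```

On a real host the cache directory would presumably be /var/cache/zypp/packages (Zypper's default, assumed here), and caching is turned on per repo with zypper modifyrepo --keep-packages followed by the repo alias. The mirror index still has to be rebuilt afterward.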
I need to create or modify these scripts and/or services:
Set up the Squid web proxy. [Done and working.]
Tumbleweed or OpenSuSE (case insensitive) in the path name.
http://download.opensuse.org/distribution/13.1/repo/oss/?proxy=distro.cft.ca.us&proxyport=3128 It turns out that you only need ?proxy=http://distro.cft.ca.us:3128 i.e. a complete URL including the scheme and port.
We need to change the repo definitions to use the proxy. [Done.] And maybe it's time to make the pathnames a little more logical. [Guess what, I already changed the HTTP and DIR paths for the main distro to be just SuSE, not SuSE-dir.]
audit-repos, when installing repo definitions on the master site,
needs to use the local disc rather than HTTP URLs. This way, problems
are avoided when you update Apache. Change
http://distro.cft.ca.us/SuSE/etc
to
dir:/s1/SuSE/SuSE-distro/etc
(only one initial slash). [Done.]
commrepo (new script) reads the repo metadata files of the SuSE Tumbleweed repo and our local mirror. It emits one list of relative filenames of packages that have been updated on SuSE, plus required packages that are not locally available. It emits another list of obsolete local packages. We should have a policy of how many versions to retain, probably just two. The script should be able to take a list of keystone packages, and to include in the obsolete list all locally stored packages that are not required by any keystone.
commrepo needs a wrapper that downloads the SuSE metadata file, execs commrepo, downloads the listed updated packages, tosses obsolete packages, and rebuilds the local repo index.
I have commrepo mostly written, but I'm getting nervous about security patches, and about how long it's going to take to debug this thing, so I'm deferring completion. I will rely on manual steps and simple scripts instead.
Instead of commrepo, I'm going to have a new script called updaterepo. It will basically be an improved rsyncsuse. See previous discussion of the rsync features appropriate for updating the local mirrors. It will dynamically match up local repos and off-site masters. [Done.]
mksuserepo (existing) needs some way to sign the repo index without human supervision. PROBLEM, not done yet.
audit-pkgs (existing) needs a new mode to do a dist-upgrade. [Done.]
It also needs to refuse to update any of the Apache or Squid components unless given a special flag. If all the hosts are downloading packages from the local mirror, and Apache goes out of service on the master, chaos ensues. The right procedure is, update the master with the Apache flag, wait for it to finish, then run audit-pkgs on the other hosts. The way it will probably work out is, you update all the hosts at once, the one on the master site gives an error message, and you do that one over with the Apache flag set. PROBLEM, not done yet.
retrieve-pkgs (new script), executing on the master, will recover downloaded packages from the target and then purge them. And it rebuilds the mirror index if anything was downloaded. [Done.]
distupgrade (new script) is kind of a mini post_jump that will do much of the post_jump activities, suitable for weekly updates. PROBLEM, not done yet.
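For the proxy item in the list above, the repo URL rewrite amounts to appending one query parameter. A minimal sketch, assuming a hypothetical helper name add_proxy; the only facts taken from these notes are the parameter form ?proxy= followed by a complete proxy URL with scheme and port.

```shell
#!/bin/bash
# Sketch: append the proxy parameter to a repo base URL.
# add_proxy is a hypothetical helper name.
# Usage: add_proxy REPO_URL PROXY_URL
add_proxy () {
    local url=$1 proxy=$2 sep='?'
    # If the URL already carries a query string, chain with '&' instead.
    case $url in
        *\?*) sep='&' ;;
    esac
    echo "${url}${sep}proxy=${proxy}"
}

add_proxy "http://download.opensuse.org/tumbleweed/repo/oss/" \
          "http://distro.cft.ca.us:3128"
```

The resulting string is what would go on the baseurl line of the repo definition for a remote master.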
Here's an important detail: what version number do I use for Tumbleweed?
In some contexts SuSE uses "tw", but my scripts in some cases assume a
numerical version number. I'm going to use 99.8.
Following along in my writeup on "Upgrading to OpenSuSE Leap 42.1":
I need a lot more repos for Tumbleweed than for v42.1. All will need new repo template files except SBS (SuSE Build Service) and CouchNet. Except for Couch-SuSE-tw, the local mirrors contain only installed packages.
As discussed previously, I don't want to locally mirror the entire Tumbleweed distro (40.2Gb with swarms of updates each day) plus PackMan; instead I want to keep mostly the packages I am actually using. However, I want to do network installations from a local mirror, and that means I need to put on the (local) net an authentic, untampered Tumbleweed snapshot DVD ISO image. [Done.]
I need to set up repo priorities. My goal is for the latest version of each package to be installed, whichever repo has it. But unfortunately Zypper has priorities backward. At least according to the SLES deployment guide dated 2016-03-14, if several repos have a package (referring to the basename), it is downloaded from the repo with the numerically lowest priority (in 1-200, default 99). If there is a choice of versions in repos of the same priority, the latest is preferred, but later versions in higher numbered repos are not considered. If I make the SuSE repo's priority numerically lower than the local mirror, I'll get the latest version, but for every package: the local mirror will never be used. Therefore I will need to identify and download new versions to the local mirror, but only for packages I actually use. This script is called updaterepo.
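The selection rule just described can be imitated with sort, which makes the consequence easy to see. This is only a model of the rule as stated above, not Zypper's code: pick_candidate is a made-up name, the input format (priority, version, repo alias per line) is invented for the illustration, and GNU sort's version ordering is merely an approximation of RPM version comparison.

```shell
#!/bin/bash
# Sketch: model of "numerically lowest priority wins, then latest version".
# pick_candidate is a hypothetical name; requires GNU sort for the V key.
# stdin: lines of "PRIORITY VERSION REPO" for one package basename.
pick_candidate () {
    # Key 1: priority, ascending numeric. Key 2: version, descending.
    sort -k1,1n -k2,2Vr | head -n 1
}

# The 2.0 version in the priority-99 repo is never considered.
# Repo aliases other than Couch-SuSE-tw are invented for the example.
printf '%s\n' \
    '99 2.0-1.1 Couch-SuSE-tw' \
    '90 1.9-1.1 SuSE-tw' \
    '90 1.8-1.1 Other-tw' | pick_candidate
```

With these inputs the line for version 1.9 at priority 90 is chosen, which is the behavior that forces new versions to be fetched into the local mirror ahead of time.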
I've adopted these priorities (lower numbers are preferred) (repo aliases beginning with Couch are local):
In the local mirrors I want to limit the number of old versions kept. In the main OSS repo most packages have only one version, and when multiple they are close in time; the biggest time spread I saw was one month. A reasonable strategy would be to download every version but to delete local versions that are no longer on the master repo.
Designing the updaterepo script:
It matches up local and remote master repos. It turns out that you can put extraneous parameters in a repo definition file such as Xmaster (in the local repo) which identifies the master from which new packages should be downloaded.
It makes a list of all the packages in the union of the local mirrors. Duplicate packages in different mirrors are possible.
It makes a separate list of potentially deletable packages. Reasons for non-deletion include being on the DVD, being in a repo with no remote master (i.e. CouchNet or Couch-Build), or being too young; downloaded packages are retained unconditionally for 30 days.
For every host having relevant versions (presently only 99.8), it collects a list of installed packages from rpm -qa. It merges, uniquifies and reformats this list. The result shows all packages that should be in the union of the local mirrors.
Unused packages are purged. The package must be potentially deletable, and not on the master (comparing the version), and not installed on any host (comparing only the basename).
Packages on the remote master with the same basename as an installed package are downloaded to the matching local mirror, if not there already. This includes all remote versions, particularly new versions that are not yet installed.
There could be packages from no known repo; those get a noisy
warning that they should be found and saved in the appropriate local
mirror. The union of the contents of the local and remote repos is
"known".
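The purge rule in the design above (potentially deletable, not on the master by full version, not installed anywhere by basename) can be sketched as a filter over three plain lists. This is not the real updaterepo; purge_candidates and strip_version are hypothetical names, and the one-name-per-line file format is an assumption.

```shell
#!/bin/bash
# Sketch of the purge decision only. Function names are hypothetical.
# Names look like pkgname-1.2-3.4.x86_64 (as printed by rpm -qa).

# Drop the version, release and arch, leaving the basename.
strip_version () {
    sed -e 's/-[^-]*-[^-]*$//'
}

# Usage: purge_candidates DELETABLE_FILE MASTER_FILE INSTALLED_BASES_FILE
#   DELETABLE_FILE:       full names of potentially deletable local packages
#   MASTER_FILE:          full names currently on the remote master
#   INSTALLED_BASES_FILE: basenames installed on at least one host
purge_candidates () {
    local deletable=$1 master=$2 installed=$3 pkg base
    while read -r pkg ; do
        # Still on the master (same version): keep it.
        grep -qxF -- "$pkg" "$master" && continue
        # Some host has a package with this basename: keep it.
        base=$(echo "$pkg" | strip_version)
        grep -qxF -- "$base" "$installed" && continue
        echo "$pkg"
    done < "$deletable"
}
```

The output is the list of local files that would be removed; the 30 day retention and the DVD check are assumed to have been applied already when the deletable list was built.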
How is updaterepo going to obtain the rpm -qa output? This involves copying a file from each of the hosts to Diamond. I want it to run overnight, so it will have no human supervision and authority.
NFS: This is the natural choice for Mathnet (at work), but on CouchNet NFS has not been reliable, particularly for the virtual machines, so NFS is rejected.
scp (SSH): It requires a remote execution key. The generic key is not available. A special execution key is possible, and I've done it before, but it's kind of complicated, and I'm leaving special execution as a less preferred solution.
HTTP: Although all the CouchNet machines are supposed to have webservers, using the full infrastructure of Apache just for file access seems like overkill, and in a development situation the webserver may become broken. On Mathnet only servers, and only some of them, have Apache. So HTTP is rejected.
FTP: On CouchNet all hosts have socket activated vsftpd and it works pretty reliably. I think the same is true on Mathnet (can't quite remember). The infrastructure for FTP is a lot more lightweight than for HTTP. I think this is the protocol I will use.
Not so fast! At midnight when the updates are downloaded, almost all of the hosts are asleep, and the installed package lists cannot be retrieved. Let's try more mechanisms.
NFS again: The leaf node can write its rpmqa file to a directory on
a machine that's always up: Jacinth. Updaterepo runs on Diamond, which
can retrieve the files from Jacinth. But we still have the issue of
not 100% reliable NFS. And root on the leaf node will be squashed to
nobody on Jacinth, hindering writing. But I can su to some
other user and chown the receiving directory to that user.
NFS security has been tightened up, and it appears to cue on the real UID (i.e. root) vs. effective UID of the user trying to steal the file. The leaf nodes, executing as root, were unable to write on the destination directory on Jacinth. Directories that the client does not have permission to traverse just don't appear in a directory listing (readdir).
Mail: That's so low tech, but is probably the cleanest solution. I generalized the housekeeping report storage script, and the installed package lists are being deposited.
You can source the file /s1/SuSE/source.me to set useful shell variables for the distro root and the major scripts in distro maintenance. This file was reviewed for being up to date. It auto-sets the architecture (according to the machine it's run on) and the release, taking the lexically highest version in /s1/SuSE/SuSE-build/$ARCH/ , which will be the new version when you create the directory for it. Here is jimc's version of source.me.
We need to keep files for v42.1 until all hosts have been upgraded, but 13.1 can definitely be tossed. There is nothing from before 13.1. All sub-repos already have only x86_64 (and noarch and setup). At home, before cleanup, the repo occupied 114 Gb (oink). After, 69Gb; tossed 85Gb.
At the same time, we need to get rid of the configuration files for
old distro versions. These are in /home/post_jump/${RLSE} or on Mathnet,
/h1/post_jump/${RLSE}.
Being anally retentive I made a directory ./ancient and moved all the
obsolete dirs into it. When deleting one of these directories it's a
good idea to shred the non-public files. Here's how, illustrated for
distro version 10.2:
find 10.2 -type f ! -perm -04 -ls |& less
#Are you going to remove the right files?
find 10.2 -type f ! -perm -04 -print0 | xargs -0 -n 25 shred -u -n 3
rm -r 10.2
Create all the directories for distro components. Owned by root:root, mode 755, except the update dir has to be owned by wwwrun. The actual content (metadata files) is provided later. These are illustrated for the new version, 99.8 and all paths are relative to $di (/s1/SuSE).
Web links:
To be totally straight arrow about your security, you need to follow this procedure for each downloaded image, here called $name.iso. One of my downloaded files came from a Russian mirror site, and given the pervasive dirty tricks (particularly in I.T.), from Russia in the USA 2016 election, it would be a good idea to take security seriously.
You should have downloaded the checksum files ($name.iso.sha256).
You need to download the openSUSE Project Signing Key if
you don't already have it. Command:
gpg --keyserver hkp://keys.gnupg.net --recv-keys 0x22C07BA534178CD02EFE22AAB88B2FD43DBDC284
It is also included in your old distro. Take the last 8 digits of the
key hash, convert to lower case, and look for
$distro/gpg-pubkey-3dbdc284-xxxxxxxx.asc . Then do:
gpg --import $distro/gpg-pubkey-3dbdc284-xxxxxxxx.asc
If you really believe in this key, sign it yourself, to avoid the messages that there is no trusted signature.
Check the signature of the checksum file, proving that it was
signed by the openSUSE team (or someone who had stolen their secret key).
GPG is picky about file extensions, so you need to fake it out like this:
ln -s $name.iso.sha256 gorf.gpg
gpg -a gorf.gpg
It maunders that the signature is good, but there is no trusted signature; there is no indication that the signature belongs to the owner. The payload is written to the file without the gpg extension, i.e. gorf.
Now check the file itself:
cat gorf # And note the filename, the second field, after the checksum.
ln -s $name.iso $filename
sha256sum -c - < gorf
It should report $filename: OK. If not, either you mixed up the
filenames, or there was an undetected error during download (unlikely), or
the national security agency of a country that shall remain unnamed is
meddling with I.T. infrastructure.
Remove the gorf files and symbolic links, and repeat for the other ISO images.
A less high tech alternative is to do sha256sum $name.iso, then
grep Tumbleweed $name.iso.sha256, and compare the sums by eyeball,
64 hex digits.
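The eyeball comparison can be left to the shell. A minimal sketch, assuming the checksum file contains the 64 hex digit sum as its first such string; verify_sha256 is a made-up name, and this checks only the checksum, not the GPG signature on the checksum file.

```shell
#!/bin/bash
# Sketch: compare an image's SHA-256 with the sum recorded in a checksum file.
# verify_sha256 is a hypothetical helper; it does NOT verify the signature.
# Usage: verify_sha256 IMAGE SUMFILE   (exit status 0 means the sums match)
verify_sha256 () {
    local image=$1 sumfile=$2 want have
    # First run of 64 lowercase hex digits in the checksum file.
    want=$(grep -Eo '[0-9a-f]{64}' "$sumfile" | head -n 1)
    # sha256sum prints "SUM  FILENAME"; keep only the sum.
    have=$(sha256sum "$image" | cut -d' ' -f1)
    [ -n "$want" ] && [ "$want" = "$have" ]
}
```

Typical use would be verify_sha256 $name.iso $name.iso.sha256 && echo OK, once the signature on the checksum file has been checked as described earlier.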
Formerly I mounted the fixed distro's ISO image on ./SuSE/x86_64/$version, but for Tumbleweed I need to populate it from the DVD (if I downloaded it, which I didn't) or from the SuSE repo itself. Metadata for the SuSE-build and CouchNet repos can be copied from the old versions. [Update: I still need to mount the DVD, and to download an up-to-date instance before upgrade campaigns or before installing Tumbleweed on a new machine.]
To populate the SuSE repo: (0.67Gb download) (The barely visible last
argument is a dot, meaning to populate the current directory.)
cd ./SuSE/x86_64/99.8
rsync -a --no-r --dirs --log-format="%o %f" rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/oss/ .
rsync -a --log-format="%o %f" rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/oss/{boot,docu,media.1} .
To populate the non-OSS repo: (0.0002Gb download :-)
cd ./SuSE-noss/x86_64/99.8
rsync -a --no-r --dirs --log-format="%o %f" rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/non-oss/ .
rsync -a --log-format="%o %f" rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/non-oss/{boot,media.1} .
Package keys wanted: The first two come with the main distro DVD; the others go in the CouchNet or Mathnet repo.
These keys are obsolete:
How to identify a public key:
gpg gpg-pubkey-0cc9523f-4a9865cc.asc
It prints: pub 1024D/0CC9523F 2009-08-28 UCLA-Mathnet Distro Signing Key <distro@math.ucla.edu>
For Tumbleweed the update repos and the main repos are updated the same way using the new script updaterepo. See the 42.1 transition writeup for how to update the script rsyncsuse if reverting to fixed distros.
We need to switch from weekly execution of $di/bin/rsyncsuse.sh to updaterepo. [Done.]
One of my virtual machines for development is called Oso. It is hosted on Diamond, the repo site and compute server, using KVM as the framework and qemu as the virtual executor. The guest connects itself to the host's network bridge (br0) and thinks that it is directly on the local network. Its disc is 17.2Gb (external). Deducting the swap partition and inode storage, it has 15.3Gb (total internal) of which 7.3Gb is unoccupied. This should be plenty to hold the RPM files that have to be downloaded from SuSE. So I'm going to develop it up the kazoo: it will be the first machine to migrate to Tumbleweed.
VM Specifications: Oso is defined in this file. Key parameters of Oso and Petra are:
An initial step is to junk old saved copies of Oso's disc and to make a
new backup of its current state on v42.1. Command line:
gzip -c disc1.raw > ./disc1.421final.gz
Run time: 12 mins. Output file size: 5.7Gb.
Commands to use on the virtual machine with libvirt:
Lurking dragon: make sure on the VM that /etc/default/grub_installdevice says to install grub in the MBR. 99% of the time you want the device to be (hd0), not e.g. (hd0,1) which would install in the root partition's boot sector. To detect whether (hd0) is correct you could do:
grub2-probe -t drive /boot/.
Remove the partition ID, e.g. it might print (hd0,msdos2) and you
ignore ,msdos2 keeping (hd0).
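Stripping the partition ID can be done with a one-line sed filter. A small sketch; grub_disk is a hypothetical name, and the input is assumed to have the parenthesized form shown above.

```shell
#!/bin/bash
# Sketch: reduce grub2-probe's drive output to the whole-disc device.
# grub_disk is a hypothetical helper name.
# stdin: e.g. "(hd0,msdos2)"; stdout: "(hd0)"
grub_disk () {
    # Keep everything after "(" up to the first comma or ")".
    sed -e 's/^(\([^,)]*\).*$/(\1)/'
}

echo '(hd0,msdos2)' | grub_disk
```

On the VM it would be fed from the probe command, i.e. grub2-probe -t drive /boot/. piped into grub_disk, and the result compared with the content of /etc/default/grub_installdevice.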
Now transforming this package list into a script to download the actual packages to the CouchNet enterprise mirror. 10min flat to download; sheesh, that's fast! 2.46Gb actually downloaded. We have 50Mbit/s FIOS. Here's the script:
#!/bin/bash
# Downloads a list of packages. Do "rpm -qa" to get the list.
# Expects to see names like pkgname-1.2-3.4.x86_64
function snarf () {
    local arch=${1##*.}
    echo $arch/$1.rpm
}
function myrsync () {
    # -a = preserve metadata, -R = preserve input path on output,
    # -no-{o,g} = files to be owned by local root:root, not whatever
    # special user is on their server.
    rsync -a --no-o --no-g -R --files-from - --log-format="%o %f" \
        rsync://rsync.opensuse.org/opensuse-full/opensuse/tumbleweed/repo/oss/suse/ \
        /s1/SuSE/SuSE/x86_64/99.8/suse/
}
cat <<EOF | (while read pkg junk ; do snarf $pkg ; done) | myrsync
aaa_base-extras-13.2+git20170512.8fa87a3-1.1.x86_64
abiword-docs-3.0.2-1.2.noarch
etc. etc.
EOF
39 packages could not be downloaded. These are obsolete, except for one of them which appears to be no longer available.
7 packages were in SuSE Build Service (SBS) repositories and were downloaded into the local SBS repo. Two were in the non-OSS repo; a similar repo was created locally and the packages were downloaded to it.
The following 17 packages are probably from PackMan:
I'm going to defer the PackMan downloads, since most of these are undoubtedly back versions, 18 months old, and to get current versions is going to involve either a lot of hand labor, or letting Zypper pick the version, which is much preferred. [Downloads accomplished using the new updaterepo script.]
The next group of steps will be:
Clone the config file library for 99.8. [Done]
Edit the repos in the library. I need to provide definitions for the local mirrors. I need to add proxy parameters to the remote masters, and to turn on keep-packages. See above for a list of repos. [Done]
Create the directories for the local repos, and retrieve the metadata content. [Done]
Create and debug the updaterepo script. [Done.]
It's hard to get rsync's filter rules right. It would not download any files, even ones that I knew were on the server and that the rules were expected to accept. At first I blamed server configuration, but it turned out that I was excluding containing directories. Anyway, I ended up listing all promising files, then picking relevant ones locally, and using --files-from to download only those.
The PackMan repo was a special challenge, since it is a rpm-md type repo vs. yast2 for everything else.
Improve /s1/SuSE/bin/mksuserepo so it can sign a repo
without human intervention. [Done.] Here we have to balance security,
usability and availability. While the individual packages are signed
(but for SBS packages I rarely have the authors' keys nor any secure
way to know that they aren't fake), it's also important that the repo
contents not be tampered with. In an enterprise, it would be
reasonable to download the files first, then have a responsible sysop
give his passphrase to decrypt the secret key for the repo signing
key. In my small operation this is definitely overkill and I want
to automate the whole process. [Done.] [Famous last words; Zypper
rejected the content file.]
Turn it on to be run from cron. [Done, runs, gets updates.]
Now I'm ready to install Tumbleweed for the first time. I'm going to want to upgrade v42.1 but to save the result before installing any CouchNet customizations, to know what customizations I'll actually need. I'm going to also do a completely default installation on bare (virtual) metal, to know what SuSE wants to give me when not influenced by existing package selection and configuration.
To make progress I need an unhacked instance of Tumbleweed, to compare with the normal CouchNet v42.1 installation. It's going to go on Oso. Steps:
"Linux" is a registered trademark of Linus Torvalds. Hit Next. It takes quite a long time to get to the next step.
Status Location: /dev/vda1 (root). It says to not install booter in MBR, do install in partition header.
"at your own risk" license. Agree to GStreamer Fluendo license. Start the upgrade. (At 11:27)
Main Update Repo. About 6.8Gb to download. Predicted time 105 mins (varies with weather on the download server). Evidently it's predicted to fit (just barely) on the available disc. On its way...
openSUSE Tumbleweed. It boots. Seems normal (except named failed to start). It started the X-Windows greeter, looks like my lightdm, but died one second later.
Oso is on the net, and many items are working. First I ran checkout.sh just to see what survived. Discrepancies:
key_load_public: invalid format, but the command was executed.
Only the dbus item is critical. Dbus seems to be working; the problem must be with the tester. I'll debug this stuff later.
I saved a copy of oso:/etc and its package list, rpm -qa.
Shutting down Oso and making a copy of this disc.
My next step will be to wipe Oso's disc and install on bare (virtual) metal. The result will be called oso-pure. This time around I'm going to accept its offer to put a btrfs filesystem on the root partition. [Update: reverted to ext4.] Later I plan to convert the old machines to btrfs. [Update: not going to happen.] In package selection I will use defaults as much as possible, to find out what they intend to give me, but I will use a XFCE desktop framework and will decline KDE and Gnome (as usual). AppArmor also will be bypassed, but the SuSEFirewall will stay (for now).
I already made a copy of Oso's disc. Wipe the disc with 0's. 17179869184 bytes seems like an odd size; actually it is 2^34 bytes or 16Gb.
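The size arithmetic is quick to confirm in the shell. A tiny sketch; the variable name disc_bytes is arbitrary.

```shell
#!/bin/bash
# Sketch: check that the odd-looking disc size is a power of two.
disc_bytes=$(( 1 << 34 ))                    # 2 to the 34th power
echo "$disc_bytes bytes"                     # 17179869184 bytes
echo "$(( disc_bytes >> 30 )) binary Gb"     # 16 (units of 2^30 bytes)
echo "$(( disc_bytes / 1000000000 )) decimal Gb, rounded down"   # 17
```

The decimal figure is why the same disc shows up as roughly 17.2Gb when measured from outside the VM.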
Run virt-viewer. Start Oso, booting from the network install disc,
and pick "New Install".
It is using its assigned IPv6 address from DHCP. Downloading installation system from the SuSE server.
Installation system does not match your boot medium. Sorry, this
will not work.
I tried using the Curses UI, and tried the
up-to-the-minute network installer, unsuccessfully.
In a Tumbleweed blog post dated 2016-10-02, Dominique Leuenberger answers a user complaint on this issue, saying to add kexec=1 on the kernel command line, and it will download the initrd and kernel from the repo rather than relying on the back-version boot media which, for Tumbleweed, is never up to date. So how do you do that?
Boot the net installer ISO. Scroll to Installation. Type kexec=1 (no spaces, appears as Boot Options) and hit Enter. Hit Esc when the Plymouth progress bar covers the screen. OK, now it downloads the current kernel and initrd, kexecs it, and goes through the initrd again. Now it downloads the installation system. And it starts up!
Initial installer steps:
Software Selection: Since I picked the Custom role, I'm minus some software. I made these changes, trying to leave defaults as much as possible.
Final installer steps:
Logging in as root: flip to VT1; too dangerous to run XFCE as root. It knows that the hostname is oso. It's using the correct IPv4 and IPv6 addresses on ens3. It let me on.
In diamond: /scr/oso-setup-1706/oso-pureinst I saved a copy of the package list and of /etc.
When rebooting Oso, remember that it still has the network installer disc in the virtual drive, and there will be a 30sec timeout before it does the default of booting from the hard disc. Soon this can be reverted.
Actually I faked myself out: Oso is supposed to be set up as a user workstation (rather than pure development), so it should have these patterns turned on: Office, Technical Writing, and Games. In couchnet.sel I don't select everything in these categories; the purpose is to see what they think a normal user workstation ought to have on it. Now I'm installing the omitted patterns. I've turned on keeppackages=1 and will retrieve missing packages into the local mirror repo. 2672 packages to download including texlive. Downloaded 1.3Gb, installed 2.8Gb, elapsed time 41min including 4min post-install activities.
For testing, I'm also creating a local account for myself (with SSH access).
I now have a set of files of package basenames (minus version and arch) on Claude (1967 packages) (Oso should come out similar), Oso pure installation (1859 packages), and Oso upgraded (8388 packages, oink). Plus Oso not upgraded but with the user packages (4531 packages). Command line to create these lists:
ssh oso rpm -qa | sed -e 's/-[^-]*-[^-]*$//' | sort -o bases.oso.pureuser
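Once two such lists exist, the "on A and not on B" comparisons can be done with comm. This is only a minimal sketch, assuming both list files are sorted the same way; the function names pkg_bases and only_in_first and the file name bases.claude are made up for illustration.

```shell
# Strip "-version-release.arch" from rpm -qa style names on stdin,
# using the same sed expression as the command line above.
pkg_bases() {
    sed -e 's/-[^-]*-[^-]*$//' | LC_ALL=C sort -u
}

# Print basenames found in sorted list $1 but not in sorted list $2.
only_in_first() {
    LC_ALL=C comm -23 "$1" "$2"
}
```

For example, ssh oso rpm -qa | pkg_bases > bases.oso.pureuser followed by only_in_first bases.claude bases.oso.pureuser would list what is on Claude but not on Oso.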
Package groups on Claude-42.1 and not on Oso-pureuser. Not every package, just the interesting ones.
Package groups on Oso-pure and not on Claude-42.1:
Notable daemons running on Oso-pure:
Packages are not radically different from what's on Claude now. Actually one of my leading issues is, do I get valuable security by running auditd, which I don't run now? There are, as usual, several interesting unfamiliar packages in the default load, which I will want to investigate later.
For package selection, I'm going forward just copying couchnet.sel from v42.1. Of course screwups will be revealed and I'll deal with them when found. The goal in this step is to run post_jump on oso-pureuser and then to get it to pass the tests in checkout.sh. Real reconciliation between v42.1 config files and as-installed Tumbleweed config files will happen once that's done, so the edited v42.1 files will be functional in the Tumbleweed context.
Here's a list of needed script updates:
-U to do a dist-upgrade. [Done.]
Next step is to run post_jump on oso-pureuser, fix program problems as I go along, and finally work on package selection problems.
I'm running in a subshell, cutting and pasting post_jump line by line, to catch problems immediately.
It installs /etc/resolv.conf from the master to the slave, and the master (Diamond) has link local DNS addresses like fe80::201:c0ff:fe12:3044%br0 (and br0 doesn't exist on Oso). The resolver library successfully ignores them and uses the global addresses. I didn't try to fix this.
Among the basic config files installed in the first batch of post_jump, the only major change is to sshd_config. The v42.1 sshd_config works on Tumbleweed.
Tumbleweed's /etc/os-release lacks VERSION= and the architecture. Got to fake it for Tumbleweed by cowboy programming, several places. It works now.
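The fakery amounts to something like the following. It is a sketch, not the code actually in my scripts: it assumes that NAME= in os-release mentions Tumbleweed, that fixed releases carry VERSION_ID=, and it uses 99.8 as the stand-in version number for Tumbleweed. The function name distro_version is hypothetical.

```shell
# Print a usable version number for the distro described by an
# os-release style file (default /etc/os-release).
distro_version() {
    file=${1:-/etc/os-release}
    if grep -qi '^NAME=.*tumbleweed' "$file"; then
        # Assumption: 99.8 is the made-up version standing for Tumbleweed.
        echo 99.8
    else
        # Fixed releases: take VERSION_ID, with or without quotes.
        sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$file"
    fi
}
```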
audit-pkgs -G takes a long time to deliver the list of wanted kernels. Be patient, it's not hung.
Squid installations in general should be configured to avoid open proxying, i.e. downloading anything for any client, so they aren't used as force amplifiers in a DDOS attack. My squid was cueing on target hostnames (download.opensuse.org etc), but if (when) the request is diverted to a mirror it will fail giving a 403 return code (permission denied). So we have to cue on path segments. I ended up with these ACLs:
I ran audit-repos, which copies the new repo definition files to the target (Oso), but I'm having trouble refreshing the repos. The symptom is, it downloads some or all of the files, says it is rebuilding the cache, and then maunders, Failed to cache repo (4). Without useful error messages. strace was my friend. There were several screwups that had to be fixed.
When the new repo packages are downloaded at midnight, e.g. from download.opensuse.org-oss to Couch-SuSE-tw, how am I going to sign the repo metadata? I put in a -i option to mksuserepo making it omit packages.gz and friends, because those are the only ones that change (and are the most important for security). The resulting content file was accepted by zypper-1.12.50-19.1 for OpenSuSE Leap 42.1 but was rejected by zypper-1.13.28-1.1 for Tumbleweed as of 2017-06-20. Re-signing without -i fixed some of the CouchNet repos. I'll need to deal with unsigned keys in PGP, later.
When I refresh Couch-Build-tw and Couch-PackMan-tw (zypper refresh), /usr/bin/susetags2solv from libsolv-tools-0.6.27-2.1 gets SIGSEGV. Always fails. "rpm -V libsolv-tools" returns 0 (not corrupt). ldd indicates no weird libraries. (It uses liblzma and libbz2 but nothing for regular gz.) It was called with the -c $dir/content option, and is supposed to receive a packages file on stdin. It writes a solv file on stdout.
What do you want to bet, that the URL-like objects in the content file are the cause of SIGSEGV? Working on Couch-Build-tw. Removed NAME, which is probably a holdover from SuSE-11.4. That cured the problem! Also worked for Couch-PackMan-tw. download.opensuse.org-oss/content has a REPOID line and Couch-Build-tw doesn't; lack of that line doesn't seem to make trouble.
Web resources about YaST2 repository metadata:
Back to running audit-repos. After I put in the cowboy programming to use version 99.8 to represent Tumbleweed, it now works and the repos are refreshed.
Once again running post_jump by hand. Oso has btrfs with about 22 sub-volumes. The section in post_jump to generate /etc/exports is totally flummoxed. I'm not going to try to fix it now -- have to come back to it. PROBLEM
Move /usr/local into /m1/local… Workstation means a Sun3-50 that mounts /usr/local (etc.) from its fileserver. With btrfs, the host mounts the /usr/local subvolume, faking out this test. I'm commenting it out.
Another problem -- with btrfs, /usr/local is mounted, so you can't remove it and replace with a symlink. This needs to be fixed somehow. PROBLEM [Fixed I hope]
pushconfig also needs to be able to determine the version that we're using with Tumbleweed.
Phase 3, /dev/root symlink, with btrfs, we mount by GUID rather than /dev/whatever, and it fails to determine the root filesystem device. Actually when I'm using labels to mount the root on v42.1 it fails also. Looking for it in /proc/mounts fixes the problem.
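Looking it up in /proc/mounts can be done in one awk call. A minimal sketch, assuming the last entry mounted on / is the real root and that a "rootfs" pseudo-entry, if present, should be skipped; root_device is a hypothetical name and this is not the exact code in post_jump.

```shell
# Print the device mounted on / according to a mounts-format file
# (default /proc/mounts). Later entries override earlier ones.
root_device() {
    awk '$2 == "/" && $1 != "rootfs" { dev = $1 }
         END { if (dev != "") print dev }' "${1:-/proc/mounts}"
}
```

This works the same whether the root was mounted by device name, label, or UUID, because the kernel reports the resolved device.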
/etc/machine-id has mode 666! Oooooo! A bug! PROBLEM
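A check like this could go into the tester. It is a sketch relying on GNU stat; check_mode is a hypothetical name, and the expected mode of 444 for /etc/machine-id is my assumption about what systemd normally creates.

```shell
# Print a complaint and return 1 when FILE's permission bits (octal)
# differ from WANT; return 2 if the file cannot be examined.
check_mode() {
    file=$1 want=$2
    have=$(stat -c '%a' "$file") || return 2
    if [ "$have" != "$want" ]; then
        echo "$file: mode $have, expected $want"
        return 1
    fi
}
```

Usage would be check_mode /etc/machine-id 444.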
Phase 3 ends with installation of local kernel modules in t=$opt_s/mathnet/modules/$arch/. , with a note saying it's broken and should be fixed or removed. This of course is only for Mathnet and I don't think we've used this for ages. It ends up finding and installing no modules. If we ever have to resurrect it, I'm leaving the code in.
Phase 5, installing missing packages. It installed package signing keys and removed existing related keys. How likely is this bogus?
These wanted packages were not found:
These packages were found through capabilities:
These conflicts were resolved manually:
428 new packages to install, 1 to remove, 3 to upgrade with vendor change. 439Mb to download, 1.7Gb installed.
These packages could replace installed ones, but they come from a repo with a lower priority. Spot checks show the installation was from Couch-SuSE-tw (priority 64) while the lower priority (100) belongs to download.opensuse.org-oss (the master repo). This issue would be resolved if packages were freshly downloaded from the master.
Why are we installing ntp-doc when we use chrony? Because we're idiots. [Removed from couchnet.sel.]
Erasing unwanted packages: Removed 2316 packages. While there were warning messages, none appear ominous.
Investigate these interesting packages that were removed. The list of removed packages is saved in ~/upgrade/suse-tw/removed-pkgs
Dist-upgrade step. post_jump now uses -U for Tumbleweed and -u for any other version. audit-pkgs has the -y option; is this wise? No, and it was removed in v11.4 (I think) with the advent of the --non-interactive option to zypper.
It proposes to add 19 new packages: flash-player-ppapi php7 and php-7 modules. To remove chromium-pepper-flash php5 and the same friends. (These are all the currently installed php5 modules.) Upgrading 14 multimedia packages, all vendor change to PackMan. Total of 35 packages to be downloaded, 16.6Mb. Minus 4.9Mb after the installation. Doing it.
It wanted to reinstall Mesa-demos and pam_ldap again. I didn't see any error the first time around. Doing it. It downloaded and reinstalled the packages (NOKEY warning), and wanted to do them again, infinitely. Killed. This has got to be a bug in zypper. PROBLEM Also, audit-pkgs should do the dist-upgrade only once. [Fixed.]
audit-scripts:
fixed by ignoring them.
These items were enabled and not in conf file; they were allowed to be killed. runlevel5.target dm-event.socket fstrim.timer logrotate.timer
Comparing /etc/passwd group shadow: passwd, shadow were unchanged. Added wwwrun to group www and vnc to shadow, and group wwwrun is new (484).
/etc/krb5.conf is missing! krb-maint test -v of course failed. At some point the symlink got installed and Kerberos is back in action.
Housekeeping -- it should exec /tmp/fixup.sh to accept all the new files. [Done.]
Check if /etc/default/grub has the correct version name (Tumbleweed). [Edited file in post_jump dir.]
Checking X.509 certificates: /etc/pki/trust/anchors-jail does not exist, should be created. [Added to post_jump dir.]
I used retrieve-pkgs to retrieve 69 packages downloaded from PackMan and from the main SuSE repo. retrieve-pkgs uses the -b option to mksuserepo, which I've learned is poisonous. [Fixed.]
Rebooted Oso. Fell on face. (Eventual conclusion: I'm not going to debug btrfs today, reverting to ext4.) Grub says: error: file '/@/.snapshots/1/snapshot/boot/grub2/i386-pc/normal.mod' not found.
Response #1: Installer thinks you have EFI when you don't. Mount the root partition (on /mnt) and reinstall grub2.
grub-install /dev/sda --root-directory=/mnt (needs grub2-install)
Another forum post: Avoid installing Grub in a partition's boot sector because the partition's filesystem may move blocks around. If it's also on the MBR, it may be unable to read its core image. It's OK to have a dedicated partition for Grub, e.g. BIOS Boot Partition on GPT.
Booted the rescue system. It took about 90 secs to download the initrd, then kexec-ed what it had gotten (didn't need manual kexec=1). Loading rescue system (from SuSE), very slow today. fsck doesn't work on btrfs, use btrfs check /dev/vda2. It took about 120 secs, half of it in checking extents (with no progress indication). No errors. Mounted. /.snapshots dir is empty. /boot/grub2/i386-pc/ exists but is empty. Did grub not get installed?
grub2-2.02-3.1.x86_64 is installed but normal.mod is nowhere to be found. grub2-i386-pc-2.02-3.1.x86_64 is installed and provides /usr/lib/grub2/i386-pc/normal.mod . grub2-install /dev/vda --root-directory=/mnt (it says, no error reported). Unmounting and rebooting.
Well, that's a regression: grub says error: disk 'hd0,msdos2' not found.
https://www.gnu.org/software/grub/manual/html_node/GRUB-only-offers-a-rescue-shell.html#GRUB-only-offers-a-rescue-shell
Per set, the prefix is (hd0,msdos2)/@/.snapshots/… (the path that failed), and root=hd0,msdos2 . ls reports no devices in existence.
ls reports nothing.
Conclusion: this is going to take some serious debugging, and I need security patches a lot more than I need btrfs. So I'm going to revert to ext4. First, can I rescue the data from the btrfs filesystem? If not, I'll have to reinstall from the beginning.
losetup -f # Prints the name of an unused loop device
losetup -P /dev/loop0 disc1.raw # Attaches the device to the file; -P = read partition table
losetup -d /dev/loop0 # Tears down the loop device
In parted, do unit B and print, and you will get the partition boundaries in bytes, for use with the -o option of losetup. Remember to remove the unit of 'B'.
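Removing the 'B' by hand is error prone, so here is a sketch that pulls the start offset out of parted's output. It assumes the usual column layout of parted's print command (partition number in the first column, start in the second); part_start_bytes is a hypothetical helper name.

```shell
# Read the output of "parted -s IMAGE unit B print" on stdin and print
# the start offset, in bytes, of partition number $1, minus the 'B'.
part_start_bytes() {
    awk -v n="$1" '$1 == n && $2 ~ /^[0-9]+B$/ {
        sub(/B$/, "", $2)
        print $2
    }'
}
```

With off=$(parted -s disc1.raw unit B print | part_start_bytes 2), the second partition could then be attached with losetup -o "$off" /dev/loop0 disc1.raw.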
It looks like the rescue is succeeding. Later steps:
Booting the rescue system: I tried from Diamond, and it found no bootable kernel. Missing from the distro? I reverted to download.opensuse.org-oss.
Somehow the backup GPT was corrupt (but the primary one was OK). Fixed with gdisk. But it still won't mount /dev/vda2. Rebooted the rescue system. Still won't mount /dev/vda2. I'm not going to be dragged into debugging this -- I'm going to trash the disc and start from scratch. This is going to replicate oso-pureuser.
Reinstallation: btrfs out, ext4 in.
When starting the net installer, you need to preset two items: Source is http://diamond/SuSE/x86_64/99.8/ (no leading slash, yes trailing slash in the directory part). Then, kexec=1 as a boot option.
$URL/content: invalid signature. Installation aborted. Duh, content.key is for the openSUSE Project. Need to replace it with the CFT public key. [Done.] May the fleas of a thousand camels infest someone's nether regions! It still refuses the repo. Reverting to the stupid download.opensuse.org-oss. Installer is running. [Update: the problem was that a required checksum, for packages.gz, was missing. If it's provided, a human has to give the password for the distro signing key, precluding fully automatic overnight updates. Hiss, boo! The change in mksuserepo was reverted so packages.gz is included.]
Partitioning: It proposes to shrink the existing swap partition, plus BIOS Grub, another swap, btrfs root, xfs home. Wipe them all and give it: BIOS Grub (needs 1.0Mb, rounded to a cylinder boundary of 7.84Mb, do not format, do not mount), swap (2Gb), root (ext4, the rest).
Computer role: Server, Text Mode. Local user: Create myself, so I can get in if sshd won't let root log in. Installation settings: Booter in MBR, not in root partition. Enable and open firewall for SSH.
Software: AppArmor off, 64bit runtime on, practically nothing else. These were specially added: m4 expect pam_krb5 krb5-client. pam_ldap is not in the main SuSE repo but is in Couch-Build-tw. 1054 packages, 525Mb to download, installed 1.8Gb.
Let her rip! Start 15:46, done 16:03 (17 mins), post install activities 2 mins more. It rebooted successfully, getty on vt1. ssh as root works; got the new host key.
Making a backup of Oso's disc. disc1.textmode.
post_jump -r 99.8 oso :
Getting Oso to run after post_jump:
systemctl status wicked says the network is up. The network interface is ens3 while the firewall looks for the trusted network on eth0. Got to create a udev rule in /etc/udev/rules.d/70-persistent-net.rules:
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:09:c8:d4", ATTR{type}=="1", NAME="eth0"
virtio_net virtio0 ens3: renamed from eth0. This has to be fixed.
hostname --file /etc/hostname, and it's fixed. This fix was propagated to /etc/systemd/system/earlyhostname.service on all hosts.
I ran checkout.sh . Discrepancies:
Big ugly circular dependency in boot units: basic.target timers.target time-sync.target chrony.J.service network.target wicked.service firewall.J.service basic.target . Earlier units require later units. [Fixed by ignoring 618 ridiculous circular dependencies, most not involving my units.] This deserves a bug report. PROBLEM
Wicked: IPv6 is using a pool DHCP address plus RFC 4862 plus a random address (RFC 4941 privacy address). Need to (a) get rid of the lease [done], (b) suppress the random address AGAIN. /etc/sysctl.conf turns this off as it should (in key net.ipv6.conf.$IFC.use_tempaddr = 0 or file /proc/sys/net/ipv6/conf/$IFC/use_tempaddr) but something is configured to turn on the privacy thing periodically, almost certainly it's wicked. This deserves a bug report. PROBLEM
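To catch the setting being flipped back on, the tester could poll the /proc files. A minimal sketch, assuming that any value greater than 0 in use_tempaddr means privacy addresses are in use; tempaddr_enabled is a hypothetical function name.

```shell
# List interfaces whose use_tempaddr is greater than 0, one per line
# as "interface value". $1 overrides the directory to scan.
tempaddr_enabled() {
    base=${1:-/proc/sys/net/ipv6/conf}
    for f in "$base"/*/use_tempaddr; do
        [ -r "$f" ] || continue
        val=$(cat "$f")
        if [ "$val" -gt 0 ] 2>/dev/null; then
            ifc=${f%/use_tempaddr}
            echo "${ifc##*/} $val"
        fi
    done
}
```

Running this from cron or a timer and logging any output would at least show when the value changes, which narrows down who is changing it.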
All tests from remote machines to Oso fail because of the IPv6 addressing. [Self healed.]
Dbus: could not connect, one of those failures when dbus is obviously running. [Fixed, problem was a race condition in the tester causing it to lose a message.]
alsasound is inactive, no sound on this machine. Maybe the tester should be a little smarter about this. [Smartened]
apache2 has no ServerName, would not start. Never set up because Oso is going to be reinstalled repeatedly. I did so, but this setup needs to be saved somewhere so it can be restored. [Saved.]
autovt@ confused the tester. [De-confused.]
bridge.J is disabled but is active (why?) Same for wakeup.J. [Fixed Scripts.pm, status was 'generated' not 'enabled'.]
krb-client.J says Oso is offline. Kerberos is hosed. [Fixed, the symlink to /etc/krb5.conf was missing but eventually got installed. Kerberos works now and the host keys were installed.]
rsyslog: Can't find the test message. Nasty! systemd-journald used to forward messages to syslog by default, but this was turned off in systemd-216. You need in /etc/systemd/journald.conf to explicitly set ForwardToSyslog=yes. To limit journal bloat, I also set MaxRetentionSec=14day and MaxFileSec=2day. This means to toss messages older than 14 days, and to rotate files every 2 days.
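The three settings can be kept together in one fragment. I edited journald.conf itself; the sketch below instead writes them to a file of your choosing, on the assumption that a drop-in such as /etc/systemd/journald.conf.d/couchnet.conf (a hypothetical name) is read by this systemd version. write_journald_dropin is also a made-up name.

```shell
# Write the journald settings described above to the file named in $1.
write_journald_dropin() {
    cat > "$1" <<'EOF'
[Journal]
ForwardToSyslog=yes
MaxRetentionSec=14day
MaxFileSec=2day
EOF
}
```

After installing the file, systemd-journald has to be restarted for the settings to take effect.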
slpd: Took a long time to start, evidently it died. This screws up logrotate when it tries to HUP. strace didn't give obvious clues. Disabling slpd; try to fix this later. PROBLEM
display-manager is hosed. I'm using lightdm, and the symlink that picks the greeter (formerly lightdm-webkit-greeter-0.1.2) is dangling. The greeter is now called lightdm-webkit2-greeter-2.2.2 and it doesn't seem to call update-alternatives. I couldn't figure out the correct arguments for update-alternatives --install. I tried fixing the symlink by hand, but that just moved the failure elsewhere. Time to switch display managers once again, this time to mdm (Mint Display Manager). It's working enough that testing can continue, but needs customization. (See Customizing MDM.)
rpcbind.socket: netstat is not installed, tester failed. netstat is in package net-tools-deprecated. You're supposed to use ip instead. In my case fuser /run/rpcbind.sock was the better choice.
rpmconfigcheck is in a weird state. systemctl start rpmconfigcheck works fine, but it doesn't start at boot. See fix further on.
sshd says: key_load_public: invalid format. But the command does get run. In the initiator's public key file I had the limited execution stuff (command="uname -a" etc). This used to be OK, but now results in a warning. [Removed, and the tester was fixed to not need it.] (Why does the initiator need the public key file? To send the key hash so the server can find its copy of the public key, without having to multiply out the private key into the public key every time.)
These daemons are wanted but disabled: bridge.J console-kit-daemon wakeup.J . console-kit-daemon is gone. bridge.J wakeup.J aren't really disabled. [Fixed.]
These daemons are unwanted but are enabled: autovt@ [Added to scripts.dat.] And the tester prints the list with no indentation and no ending newline: messy output. [Output fixed.]
Additional issues noted:
logrotate is not getting run! See /usr/lib/systemd/system/logrotate.timer and .service (disabled). [Fix: another script logrotate.J in /usr/diklo/lib/daily for v99.8 only.] [slpd refuses to start up and causes an error when it is hupped after rotating its log file.] [Disabling slpd.] PROBLEM
This message is logged frequently, at least once/minute:
dns-resolver[4595]: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched...
This appears to have self-healed.
Need to enforce update-alternatives choosing /usr/lib64/libopenblas_serial.so.0 for libblas libcblas liblapack libopenblas. I had this at Mathnet; not on CouchNet? PROBLEM
More script and infrastructure changes:
Now that I have a working VM, it's time to get the configuration files into shape. The first is couchnet.sel, i.e. package selection. All references to v42.1 and v13.1 are to be tossed, except for historical notes. [Done.] I didn't do a package by package evaluation of which should be kept on the machine. Probably I should come back to do that, but there's no sign of radical reorganization, and I want to make progress so I can get my security patches installed.
After installing oso-pureuser I saved the pristine /etc. Comparing that with post_jump. Executing on Diamond.
diff -r -w /home/post_jump/99.8/etc /s1/scr/oso-setup-1706/oso-pureinst/etc >& oso.diff
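Before wading through the full diff it helps to know how many files are involved. A minimal sketch using diff's brief mode; count_differing is a hypothetical name, and files that exist on only one side ("Only in ..." lines) are deliberately not counted.

```shell
# Count files that exist in both trees but differ, ignoring whitespace
# changes just as the full diff above does.
count_differing() {
    LC_ALL=C diff -rqw "$1" "$2" | grep -c '^Files .* differ$'
}
```

For example, count_differing /home/post_jump/99.8/etc /s1/scr/oso-setup-1706/oso-pureinst/etc prints the number of unequal files.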
There were 102 unequal files and 8689 lines of diff. Here are the highlights, omitting cases (like /etc/hosts) where the CouchNet version is obviously the better choice.
/etc/csh.cshrc has 300 lines of intricate stuff that we're replacing. Similarly 300 lines for csh.login, and 400 lines for /etc/profile. Plus particular packages drop code fragments in directories that they source. Someday I should very closely look at what I'm chopping out.
/etc/cups/cupsd.conf -- There is a possibly new stanza for <Policy authenticated> containing JobPrivateAccess default and similar. CouchNet is not paranoid about printing, and I think I can safely ignore this, since no activities are governed by this policy.
/etc/default/grub -- GRUB_DISTRIBUTOR provides the OS name for the boot selection menu. Starting (when?), if you leave GRUB_DISTRIBUTOR unset, the installer will get it from /etc/os-release. I changed to unset it, keeping all other CouchNet hacks.
/etc/logrotate.conf -- The nocompress parameter was bogusly not inherited in per-file stanzas, starting in v12.2 and continuing through Leap 42.1. This appears to be fixed in logrotate-3.11.0 (Tumbleweed 2017-07-23). All non-global nocompress parameters were removed. [The files are not being compressed, except for zypper.log which is supposed to be compressed.]
Checking in /etc/pam.d/common-session* -- Yes I have these:
su; you want the real UID.
/etc/pam.d/useradd and usermod -- They have pam_permit.so for everything; I do a complete session. Which is right? What happened to userdel?
It's long past time to get rid of the fallback PAM directories. To remove: pam.d (symlink) pam.d.114/ pam.d.nok114/ pam.d.rpmorig/ [Done.]
Need to actually read the man page for ssh_config and sshd_config and update them as needed. The v42.1 file seems to work OK, though. Tumbleweed has openssh-7.2p2 (vs 6.6p1 in v42.1), so for KEX, curve25519-sha256@libssh.org can be used successfully (starting 6.7). [Updated.]
For the main cipher, I did some timing tests. With the AES crypto engine, aes256-gcm is equal in speed to aes128-gcm, and twice as fast as (and more secure than) aes256-ctr. aes256-gcm is 2.5x faster than chacha, but without the engine chacha is 2 to 2.5x faster than aes128-gcm. Virtual machines don't have access to the engine, but I'm not going to have a different version for them. So the order I've adopted is
/etc/sysconfig -- diff doesn't work right on sysconfig files due to random ordering and wandering comments. Need to come back and systematically use confutil to compare them. Only 4 diffs. [Updated.]
/etc/sysctl.conf -- See sysctl.conf(5), sysctl.d(5) and sysctl(8) for overriding contingencies. Something is turning on IPv6 RFC 4941 privacy addresses. It ought to be possible to turn this on or off with sysctl, but I can't find where it is being set. (I want this off.) PROBLEM
Reinstallation again. We're ready to re-test upgrading from v42.1 (first) and installing from scratch. This procedure (most of it) is going to become the template for upgrading the production machines.
Verify that I've copied all modified files from Oso to the post_jump area, and retrieve any forgotten ones. Execute on Diamond.
/home/post_jump/pushconfig -C oso
Shut down Oso and save a copy of its disc. 8Mb/sec (variable). 4.6Gb compressed, 17.2Gb uncompressed.
cd /s1/kvm/oso
gzip -c disc1.raw > disc1.pureu-final.gz
Restore Oso's saved image from v42.1.
zcat disc1.421final.gz > disc1.raw
Start up Oso. Wait for housekeeping tasks like rebuilding Diffie-Hellman groups.
Update configuration files. Several important ones were written or altered after Oso (v42.1) went into storage. It checks the version actually on the machine (42.1) but Oso is in hostgroup v99.8. Execute on Diamond and include the options to override the version.
/home/post_jump/pushconfig -r 42.1 -R -C oso
Run checkout.sh and make sure Oso (42.1) is in good condition.
Make sure Oso has a current list of installed packages: /srv/ftp/updaterepo.rpmqa This script runs periodically, but Oso (v42.1) went into storage before it was written…
/usr/diklo/lib/daily/rpmqa.J
Pre-download new files from the SuSE repo. It needs to know what's actually installed on the machines so as to avoid downloading all 38000 packages. It's better to do this when Tumbleweed is running, but v42.1 is OK and few packages will be missed; the installer will have to download those on the spot from the SuSE repo. Ignore complaints that a zillion v42.1 package versions are not in the v99.8 repos (duh). Execute on Diamond.
/s1/SuSE/bin/updaterepo -v
This was written from experience on Oso, but to make it generic I'm using $target for the target machine. Pre-set this variable.
Check available disc space in the root of the target machine. Typical file bloat (before removal of no longer wanted packages) is 1.7Gb. If you don't have enough, pre-delete cruft.
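The space check can be scripted. A sketch under the assumption that df -Pk reports available kilobytes in the fourth column of its second line; enough_space is a hypothetical name, and 1800000 KB is just the 1.7Gb figure above rounded up.

```shell
# Succeed if the filesystem holding DIR (default /) has at least
# NEED_KB kilobytes available to ordinary users.
enough_space() {
    need_kb=$1
    dir=${2:-/}
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}
```

On the target, enough_space 1800000 / || echo "pre-delete some cruft" gives a quick go or no-go answer.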
Edit diamond: /usr/diklo/lib/site_perl/hostgroup.db and change the target machine (Oso) to be in the new version (v99.8). Install at least on Diamond and the target (might as well install everywhere).
Stop and disable restarter.timer and cronj.service . You don't want things being (re)started while you're reinstalling them.
You need to install the Tumbleweed repos first using audit-repos. Execute on Diamond, requires -r.
audit-repos -v -i $target -r 99.8 -u -k
Let's do this with the instance of zypper dist-upgrade
in audit-pkgs. Execute on the target machine.
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Took 49min. 2451 packages to upgrade, 215 to downgrade, 767 new, 9 to reinstall, 199 to remove, 66 to change vendor, 7 to change arch. 3442 total packages, 3576 installation steps. Looks like it was uneventful except for non-threatening complaints that update-alternatives had to rebuild broken groups that included removed packages. Good news: it did mkinitrd only once, at the very end. (When I upgraded Orion, it took about 1/2 hour to run %post scripts. Don't panic.)
Don't reboot yet. You don't have the kernel command line parameter (net.ifnames=0) to keep the NIC on eth0 (vs. ens3).
Post_jump: You need to override the version allegedly on the machine. The package deletion step takes extra time because it has to index package requirements. Don't panic. Execute on Diamond:
post_jump -r 99.8 $target |& tee $j/jump.$target
Took 9min. Discrepancies found:
These packages were found by capabilities (change couchnet.sel to want the providing package) [Done]
These wanted packages could not be installed:
Reboot Oso now.
Assuming previous steps did not fall on their face, check out everything. It takes 1 to 2 minutes because it has to wait for at jobs to reach their scheduled runtime.
checkout.sh > /tmp/check.out
Discrepancies:
named is dead. GOST crypto engine not found. Botched creation of chroot jail. But I can't see what the error is. A Russian crypto engine that I never use -- hiss, boo! I suspect that it's just chance that it's loaded first so the error message implicates that one.
Troubleshooting step #1: Turn off chrooting by setting NAMED_RUN_CHROOTED="no". named can start up now. It has mapped /usr/lib64/engines-1.0/libgost.so (which was not in the jail). Very recently this library was in /usr/lib64/openssl-1_0_0/engines/libgost.so . I'll bet there were complaints by jail builders and the engines were moved to a non-version-dependent directory which my script didn't recognize. Improved the script, named starts now.
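A cheap guard against this kind of surprise is to check that every file the daemon is known to map also exists inside the jail. This is only a sketch, not the improved jail script; missing_in_jail is a hypothetical name and the jail location in the usage line is an assumption.

```shell
# Given a jail root and a list of absolute paths, print each path
# that has no counterpart under the jail.
missing_in_jail() {
    jail=$1
    shift
    for path in "$@"; do
        [ -e "$jail$path" ] || echo "$path"
    done
}
```

For example, missing_in_jail /var/lib/named /usr/lib64/engines-1.0/libgost.so prints the engine path if the jail builder skipped it.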
rpmconfigcheck is inactive. Looks like it was never started. It has an [Install] section making it WantedBy default.target. As recommended in man systemd.special(7), there is a symlink /usr/lib/systemd/system/default.target -> graphical.target . Evidently /etc/systemd/system/default.target.wants/ is completely ignored, so anything WantedBy default.target is not going to start. rng-tools.service has a similar WantedBy (which is a hack that I put in wrongly). How do you decide between being WantedBy multi-user.target vs. graphical.target? Your [Install] section needs both of them, which is a bad workaround. Bug report here.
OK, it got run, and there are some rpmsave/rpmnew files that post_jump did not find and remove. Removed by hand.
Comparing v42.1 and Tumbleweed, when the VM is idle the load on the host is very low for v42.1, but CPU is about 15% for Tumbleweed. This should not happen. Investigate. [See results below in CPU Usage with KVM.]
If this is actually an existing machine (e.g. a VM for testing, like Oso), back it up in the normal way so you can restore the SSH keys, Apache configuration and home page, etc.
Following the Preparing to Upgrade steps, verify that altered files have been copied to the post_jump area, shut down Oso, and save a copy of its disc under the name disc1.upgrd.gz .
Shall I truly wipe the disc, or use the existing partitions and tell the installer to format them (destroying the contents)? Let's take the time for a complete wipe. Took 4 minutes.
dd if=/dev/zero of=disc1.raw bs=1M count=16384
Make a symlink from oso-cd.iso to the net installer, which is at /s1/SuSE/SuSE-build/x86_64/99.8/iso/openSUSE-Tumbleweed-NET-x86_64-Current.iso
Start it up.
Installer items.
Oso is an old machine and has various files backed up which a completely new machine would not have. It will really help to restore important backed-up files before running post_jump. (On the first attempt I only restored the SSH host keys.)
are you sure you want to continue connecting, tell it yes. (Or get rid of forgotten keys and try again.) Root password is required on this and following steps, and it works (before CouchNet's sshd_config is installed by post_jump).
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:09:c8:d4", ATTR{type}=="1", NAME="eth0"
grub2-mkconfig -o /boot/grub2/grub.cfg.
Post_jump.
supposed to work without damaging the installation. Beware of version skew. While I'm at it, I'm switching the symlink for the CD drive back to /dev/null.
virsh reboot oso did nothing.
virsh destroy oso destroyed it.
added oso to known_hosts. Not really a failure.
Incomplete termcap entry. Probably there was an incomplete termcap entry. This self-healed, probably when a required package was dragged in by something else.
Customizing MDM below.)
Several problems were encountered during the various installations and upgrades, and their solutions turned out to be too big to include inline with the installation sections.
The display manager is the program that starts the X-Windows graphics server, puts up a window to solicit the user's loginID and password, and starts the user's session with his favorite desktop environment. It seems that on every operating system upgrade the previous display manager breaks and I have to find a new one. With v42.1 I was using lightdm, and a custom greeter using the Webkit engine, which basically means Javascript. It looked nice, but unfortunately it was unreliable, and after the upgrade to Tumbleweed, important symlinks set by update-alternatives were messed up, and I was unable to fix them by hand and get a working greeter.
For my new display manager I'm using MDM: Mint Display Manager. See this Wikipedia article about Linux Mint. It has a non-tree-form ancestor relation with Ubuntu and Debian; I am not tempted to investigate Mint to replace OpenSuSE. However, the requirements for MDM itself are not onerous; all the required libraries are clearly relevant to MDM's mission, and it doesn't drag in the whole Gnome infrastructure. It is themeable and the syntax and semantics of the theme file follow that for GDM (what version?). Basically you are arranging and specifying properties for GTK widgets (what version, 2 I think). Unfortunately Glade cannot handle the file format.
Greeter design: Each of my machines has a background image that is used both for the greeter and for the webserver's front page. Generally photos like this have visual interest in the center, and therefore I try to move the greeter window into a corner, normally northeast. A black background for this window works out best when contrasted with the various colored images. These modifications are surprisingly hard for some greeters. As a fallback I have a design using XDM, which is functional but doesn't look completely professional.
The theme file is partially documented in this page from the GDM Reference Manual (what version?), the Graphical Greeter Themes writeup. Surprisingly, a Google search does not reveal any other home for this page and it doesn't look like an official publication point for GDM. You will need to use your hacking ability to figure out some of the necessary properties and container classes.
The mdmsetup utility is mainly for picking a theme and setting the most commonly used options, like timed login. It isn't for editing the theme definition file.
The theme needs to be installed in /usr/share/mdm/themes/$ITS_NAME/ . The files are:
convert to compress the image.
xwd | convert xwd:- -resize 200x150 screenshot.png
I have several issues which I'm tentatively blaming on MDM:
/etc/mdm/custom.conf and friends have not been hacked, and very likely this is the cause of many of the missing features. See /usr/share/mdm/defaults.conf for the pre-customization state. There are 68 default settings, 47 to "". Here is where it stands with these items:
In /usr/share/mdm/defaults.conf, RebootCommand and friends are set to (e.g.) /sbin/shutdown -r now "Rebooted via mdm." While this is correct with SysVinit, Tumbleweed has systemd-sysvinit-233, which makes /sbin/shutdown a symlink to /usr/bin/systemctl, which may have responded to SysVinit options at one time, but does not do so now. I replaced the command line with
/usr/bin/systemctl reboot --message="User reboot via MDM"
and the shutdown or reboot now happens, including the message. Remember that HaltCommand should request poweroff, meaning to turn off power in the machine, versus halt, which means to halt (with power still on). Remember that /usr/share/mdm/distro.conf is the correct place for distro customization.
Bug report here.
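A minimal sketch of what such an override might look like. The assumptions are labeled in the comments: the keys are taken to live in the [daemon] section (as in GDM 2.x, from which MDM descends), and the poweroff message text is made up for illustration; only the reboot line comes from the text above.

```shell
#!/bin/sh
# Sketch: emit MDM overrides that call systemctl directly instead of
# relying on SysVinit-style options to /sbin/shutdown.
# Assumption: RebootCommand/HaltCommand belong in [daemon], as in GDM 2.x.
# Assumption: the poweroff message text is illustrative only.
mdm_power_overrides() {
    cat <<'EOF'
[daemon]
RebootCommand=/usr/bin/systemctl reboot --message="User reboot via MDM"
HaltCommand=/usr/bin/systemctl poweroff --message="User poweroff via MDM"
EOF
}

# Usage (as root), if distro.conf has no [daemon] section yet:
#   mdm_power_overrides >> /usr/share/mdm/distro.conf
```

If distro.conf already has a [daemon] section, it is probably cleaner to merge the two keys into it by hand.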
There does not seem to be any configuration parameter affecting the keyboard grab. Bug report here.
That's not a bug, that's a feature! By default, the last user who logged in is preselected. However, suppose you're a different user; what's a good way to change to your loginID other than giving a garbage password? Hit the Start Over button. Except it does not start over.
Bug report here.
Well, isn't that cute! For MDM we're using the generic PAM stack that comes with the package, and jimc's idea of setting up a full login session, like pam_ssh.so, is not happening. Fixed with a symlink to the full login stack.
BaseXsession is set by default to /etc/mdm/Xsession. When I changed to my hacked /etc/X11/xdm/Xsession, all my normal session features returned, including configuring the keyboard.
defaults.conf adds this extension to the standard X-server command line, but /var/log/Xorg.0.log has a complaint that it is unsupported. It turns out that this is the X Event Interception Extension, for intercepting all keyboard and mouse activity; it is deprecated, and it was removed in X.Org-7.5 (we have 7.6.1). The distro should use distro.conf to make it vanish. Bug report here.
No, and it's not going to, when MDM is involved. Unfortunately, XDMCP (and a lot of other stuff) has been decommitted in MDM (2013-09-25 by Clem). See the discussion below of VNC.
Error message:
GLib-CRITICAL: g_key_file_get_value: assertion 'key_file != NULL' failed
When MDM starts up it spews out 110 instances of this in syslog (including on all logged-in SSH sessions, due to the critical level). It also puts out two or three each time the greeter starts, i.e. when a user logs out and back in again.
The cure: touch /usr/share/mdm/distro.conf . Bug report here.
It's pretty clear that when I upgrade a VM, even though the guest is completely idle, there is considerable CPU load on the host, at least twice what it was with v42.1 on the guest. For example, 2 snapshots:
I went through all the modules (device drivers) on the guest and unloaded them one by one. Bingo: when I unloaded uhci_hcd, the host's CPU dropped from 15-25% down to 1-2%.
On none of the virtual machines do I pass through USB devices from the host, so I'm blacklisting the driver. I would have expected that the right way to do so would be a blacklist command in /etc/modprobe.d/99-local.conf, but it was ineffective. install uhci_hcd /bin/true was also ineffective (and also deprecated). I ended up creating a systemd unit, stomp-usb, which runs late, i.e. after graphical.target or multi-user.target, and which explicitly unloads the module. That took care of it.
ehci_hcd is also loaded. There is no evidence that it causes elevated CPU on the host, but per "guilt by association" I'm similarly blacklisting it.
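A unit along these lines could do the late unload. This is a sketch, not the actual stomp-usb unit: the file name, the path /sbin/modprobe, and the choice of directives are all assumptions.

```shell
#!/bin/sh
# Sketch of a late-running oneshot unit in the spirit of stomp-usb.
# Assumptions: modprobe is at /sbin/modprobe; the directives below are
# illustrative and not copied from the real unit.
stomp_usb_unit() {
    cat <<'EOF'
[Unit]
Description=Unload unused USB host controller drivers on a VM guest
After=multi-user.target

[Service]
Type=oneshot
# A leading "-" makes systemd ignore failure, e.g. module not loaded.
ExecStart=-/sbin/modprobe -r uhci_hcd
ExecStart=-/sbin/modprobe -r ehci_hcd

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   stomp_usb_unit > /etc/systemd/system/stomp-usb.service
#   systemctl daemon-reload && systemctl enable stomp-usb.service
```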
The host's CPU load is inversely proportional to the intrinsic CPU power of the host, which is the least on Jacinth, intermediate on Diamond, and greatest on Xena. The CPU load on these hosts (with and without uhci_hcd) is:
Host | With uhci_hcd | Without uhci_hcd |
---|---|---|
Jacinth | 2.0% | 1.6% |
Diamond | 15-25% | 1-2% |
Xena | 1.3% | 1.0-1.3% |
On Jacinth with v42.1 and Orion with v42.1, and with uhci_hcd loaded on Orion, the CPU usage was 2.0%, i.e. I'm not seeing excessive CPU utilization. But when I unload uhci_hcd the CPU goes down by about 0.4%, so there is CPU utilization attributable to uhci_hcd, just not excessive. This host's processor is an AMD G-T40E @1.0GHz, while Diamond and Xena have Intel Core i7 and i5.
This problem turned out to have multiple parts. My slapd is configured out of the database, the modern recommendation. /etc/systemd/system/ldap.service starts slapd with -F /var/lib/ldap/slapd.d which contains cn\=config.ldif and dependent objects (files). All configuration filenames below will be relative to this directory. Filenames must be quoted because they contain shell-active characters. I tried restoring everything from backups, but the operational instance and the backups shared the same problems (so I reverted to the operational instance).
The main database uses the HDB backend.
The symptom was that both slapadd and slapd complained: lt_dlopenext failed: (back_hdb.la) file not found. The file exists: /usr/lib64/openldap/back_hdb.la from openldap2-2.4.45-24.1.x86_64. strace revealed that it looked for /usr/lib/openldap/modules/back_hdb.la followed by some generic and useless directories.
Growl! ./cn=config/cn=module{0}.ldif contains olcModulePath: /usr/lib/openldap/modules, which was correct in v42.1 but no longer. I replaced it with /usr/lib64/openldap and on the v42.1 machines I made a symlink: ln -s ../lib/openldap/modules /usr/lib64/openldap, which will be overwritten when Tumbleweed is installed. This problem is fixed; now we proceed to the next one.
Not so fast! Zypper honors the symlink when installing, rather than overwriting it with a directory, so it's best to just not make it, and to do this fixup procedure after upgrading each directory server.
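The path fixup can be sketched as a one-line sed edit. The function name is hypothetical, and it assumes GNU sed (for -i) and that slapd is stopped while the file is edited. I believe slapd may also warn about a checksum mismatch on a hand-edited slapd.d file, so keep a backup.

```shell
#!/bin/sh
# Sketch: point olcModulePath at the Tumbleweed module directory.
# Assumptions: GNU sed; slapd is stopped; the function name is made up.
fix_module_path() {
    ldif=$1
    newpath=${2:-/usr/lib64/openldap}
    # Rewrite only the olcModulePath attribute line; leave the rest alone.
    sed -i "s|^olcModulePath:.*|olcModulePath: ${newpath}|" "$ldif"
}

# Usage, from /var/lib/ldap/slapd.d (quote the name, it has shell-active chars):
#   fix_module_path 'cn=config/cn=module{0}.ldif'
```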
First rule of troubleshooting: don't change anything not directly related to the current symptom being fixed. But /var/lib/ldap is such a mess, with the main database files strewn around. I created a new directory ./couchnet.db , mode 750 owned by ldap:ldap, and I moved all the database files into it, so now /var/lib/ldap contains just ./couchnet.db, ./slapd.d, and ./sync.d (for housekeeping to import changed flat files like /etc/passwd). And I edited ./cn=config/olcDatabase={1}hdb.ldif , olcDbDirectory: /var/lib/ldap/couchnet.db .
Now, upon starting, slapd complains config error processing olcOverlay={0}syncprov,olcDatabase={1}hdb,cn=config. After I re-read my writeup on Setting Up LDAP plus section 18.3.3 of the OpenLDAP 2.4 Administrator's Guide (about multi-master replication), I had a brainwave: suppose openldap2-2.4.41 either has replication compiled into the program or knows where to find the module, but openldap2-2.4.45 needs an explicit olcModuleLoad: syncprov.la. (Prepend {3} or whatever to make a unique list index.) This is in ./cn=config/cn=module{0}.ldif . Sure enough, when I loaded the module explicitly, slapd started working.
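Picking the list index by hand is easy to get wrong, so here is a sketch that appends the line with the next free index. It assumes the existing olcModuleLoad values are numbered contiguously from {0}, that the file does not end in a blank line, and that slapd is stopped; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: append an olcModuleLoad line with the next free {N} index.
# Assumptions: existing entries are numbered {0}..{N-1} with no gaps,
# and the LDIF file has no trailing blank line.
add_module_load() {
    ldif=$1
    module=$2
    # Do nothing if the module is already listed.
    if grep -q "^olcModuleLoad: {[0-9]*}${module}\$" "$ldif"; then
        return 0
    fi
    # With contiguous numbering, the current count is the next index.
    next=$(grep -c '^olcModuleLoad:' "$ldif")
    printf 'olcModuleLoad: {%s}%s\n' "$next" "$module" >> "$ldif"
}

# Usage: add_module_load 'cn=config/cn=module{0}.ldif' syncprov.la
```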
I tried this command line:
ldapsearch -x -H ldap://xena.cft.ca.us -b uid=jimc,ou=People,dc=cft,dc=ca,dc=us -LLL
And it returned my People page, with or without -ZZ (use TLS), except
omitting the password because I omitted authentication. A good sign.
I have a utility ldapsync -v -n to sync flat files like /etc/passwd .
It said that it wanted to modify the passwd and group rows for lxdm (yes,
it's different in Tumbleweed) and add a row for mdm (yes, it's new in
my Tumbleweed package selection), but all the other data is identical.
So LDAP is entirely operational.
A VNC (Virtual Network Computing) viewer and server use the RFB (Remote Frame Buffer) protocol to transfer graphical data from the server to the viewer, where the user can view it, and to transfer keyboard and pointer (mouse) events from the viewer to the server. On Unix the server imitates an X-Windows server by making a framebuffer available to server-side programs. This can be the physical framebuffer or a virtual one. In either case the server-side programs use normal X-Windows libraries to draw on the framebuffer, the same as they would for a user physically present at the physical display. If the server is configured to allow multiple connections to the same framebuffer, e.g. if a user is logged in to the physical display and a VNC connection to it is allowed at the same time, authentication and access control are important. But if sharing the framebuffer is forbidden, the normal paradigm is that the server's display manager puts a greeter on the framebuffer, which takes care of authentication and authorization, same as on the physical display. Then the user's preferred desktop environment is started. The RFB protocol is not intrinsically encrypted, and in most cases a separate secure tunnel is used for the data transfer.
In my environment, VNC is used for developing and testing the display manager and the desktop environment sessions; it is rare for a user to have a desktop environment on the viewer host and to do real work in a separate desktop environment on the server. But rarely does not mean never. Even so, an important goal is that VNC support should be unobtrusive and light on resources. The socket activation paradigm of systemd is ideal: systemd on the server listens for a connection, then starts the VNC server, which exits when the session is over. Xinetd is an obsolete alternative. It's overkill for me to start a VNC server at boot time.
The default connect port for VNC is 5900/tcp, corresponding to display :0. However, the qemu virtual machine emulator uses 5900 by default to send out the VM's "physical" display over VNC, so I plan to use 5901 to serve a virtual framebuffer on the host. The VNC server can also serve to a (web?) browser on 5800, or can call back to the viewer on 5500, but I don't use either of these features.
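The port arithmetic is just a base plus the display number. A tiny sketch, with a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: RFB ports are a base port plus the X display number.
# Base 5900 is for viewers, 5800 for the browser service.
vnc_port() {
    display=${1#:}          # accept ":1" or "1"
    base=${2:-5900}
    echo $((base + display))
}

vnc_port :1        # prints 5901
vnc_port :0 5800   # prints 5800
```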
The normal paradigm is that the Xvnc server (package xorg-x11-Xvnc-1.8.0) contacts an XDMCP-capable display manager on localhost via port 177/udp. They negotiate a session cookie, and the display manager puts a greeter window on the virtual framebuffer, and after authentication spawns the user's session. XDM, GDM and LightDM are XDMCP-capable. But unfortunately XDMCP (and a lot of other stuff) has been decommitted in MDM (2013-09-25 by Clem).
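For reference, here is a sketch of how the normal paradigm is usually wired up with systemd socket activation. It only works with an XDMCP-capable display manager, so it does not help with MDM as is. The unit names, the geometry and depth, and running as user nobody are assumptions modeled on the units that TigerVNC packages commonly ship.

```shell
#!/bin/sh
# Sketch: write a socket-activated Xvnc unit pair into a directory.
# Assumptions: unit names, geometry, depth and User=nobody are
# illustrative; port 5901 follows the plan described above.
write_xvnc_units() {
    dir=$1
    cat > "$dir/xvnc.socket" <<'EOF'
[Unit]
Description=Xvnc listener (virtual framebuffer)

[Socket]
ListenStream=5901
Accept=yes

[Install]
WantedBy=sockets.target
EOF
    cat > "$dir/xvnc@.service" <<'EOF'
[Unit]
Description=Xvnc per-connection server

[Service]
# -inetd: the connection arrives on stdin; -query: ask the display
# manager on localhost for a greeter via XDMCP; -once: exit at logout.
ExecStart=-/usr/bin/Xvnc -inetd -query localhost -once -geometry 1024x768 -depth 24 -SecurityTypes None
StandardInput=socket
User=nobody
EOF
}

# Usage (as root):
#   write_xvnc_units /etc/systemd/system
#   systemctl daemon-reload && systemctl enable --now xvnc.socket
```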
So how am I going to get a greeter going?
xdmcpc or PXDMCP by Peter Åstrand, original (?) author Peter Eriksson (2005). This is actually an alternative to the -query option of the X-server; in other words it tells an XDMCP-capable display manager to put a greeter and user session on the display indicated on the client's command line. So it won't help me.
rxmgr ; this is proprietary from Attachmate. Not quite sure what it actually does.
Neither of these programs is going to help me.
How's this for a kludge: MDM will constitutively run what it thinks is an X-server on display #1, but it is really a script which waits forever. The [daemon] MdmXserverTimeout parameter will have to be set to the duration of "forever". Systemd will wait for a connection to the VNC port. When a connection is made, the socket (on stdin) will magically be transferred to the waiting script and it will be released and will run Xvnc.
This is going to take some work to set up, so I'm going to defer it until it's actually needed. PROBLEM
Until recently, Iris' main role was to run MythTV to record TV shows. However, the source for my wife's preferred shows has dried up, and most likely we will rely on Internet streaming from now on. This section probably ought to be moved to a separate document, but for now it stays as part of the upgrade history. We have been using MythTV and like it, but PHP has grown to PHP7, whereas MythTV still uses PHP5, and installing it makes a mess.
Other packages also use PHP5 and not PHP7: Roundcubemail and ownCloud. The solution was to evict the php7 package, zypper addlock php7, and then reinstall php5 and the needed extension modules.
Goals for the home theater machine and software:
It needs a user interface that is easy for a non-geek to use. And the system administrator needs a reasonable setup tool too.
Its interaction with the operating system has to be reasonably sanitary. In particular, I prefer to be able to advance to PHP-7. [Not going to happen.] Windows software is excluded.
I would prefer if Iris could sleep (S3 state). In the past this was impossible due to bugs in the video capture device driver. But that is not under the control of the home theater software.
It has to record programs reliably and without a lot of handholding.
It needs to show these media types:
It goes without saying that the software has to be open source.
What's currently available? Here are some recent "10 best" lists.
Top 10 Best HTPC Software for Your TV by Tuukka (updated 2017-07-01). I think these are in order of the author's preference. I've excluded non-Linux software like Windows Media Center.
Kodi. Formerly called XBMC. …is clearly the best, but it is not the easiest to set up to make it user-friendly. The provided skin is elegant, and others are available. Link to (his?) customization guide.
Plex. Its backend is compatible with a variety of frontends including Kodi (and of course its own frontend). Its strength is in finding movie and music metadata. It can transcode. It can record and display live TV. Neat feature: you can pause on one device and resume viewing on any other. Link to setup guide.
Emby. Never takes more than a few clicks to find the latest show.
You can set up a custom view for each user with individual view
points. You can pause, then resume on a different device. Does
live TV. It has a module for Kodi; I assume this means that Kodi
can read files on the Emby backend.
MythTV is a great program for the advanced user.
Unfortunately, MythTV development has been discontinued.
Jimc research: PackMan just got an update to
mythtv-backend-0.28.1+git.20170712.eef6a480b0, note the date.
Not clear what may have been discontinued.
5 Best Home Theatre and Media Center Software (Published by Ashutosh KS, in Internet, Google says date is 2016-09-06.)
Kodi. Pro: lots of codecs; streams from popular sources like Spotify, Pandora, Youtube; access to a big library including metadata; various add-ons. Con: Lots of options and add-ons, which take time to configure. More complex than others. Fullscreen is required (no fractional windows). (Commenter says, alt-enter pops into a fractional window.)
MediaPortal. Forked from Kodi. Exclusively for Windows.
Plex. He describes some of the features as being "premium", which suggests to jimc that Plex is not exactly open source.
Emby. Pro: Lots of codecs. Record live streams and OTA for later viewing. Lots of plugins and mobile apps. Con: He says it's less user-friendly than Kodi or Plex.
MythTV. Pro: Lots of features and add-ons. Lots of codecs. Con: Not as good a UI as others.
Conclusion: He likes Kodi and Plex best.
Home Theater Software Showdown: Kodi vs Plex by Eric Ravenscraft, 2015-12-06 (or 2015-06-12) on Lifehacker.
mostly free. Viewers for iOS, Android and a few others cost money. The cloud server requires a subscription.
XBox Media Center; XBox was its first architecture.
Krypton.
I had to take some time out to deal with a few problems on the non-upgraded machines on CouchNet.
My wild-side host certificate is about to expire. I opened this can of worms and out crawled a very gross Taenia saginata. I have used StartCom for several years, but they have been bought out by WoSign.com (a Hong Kong CA) and relocated from Israel to Spain. Both WoSign and StartCom have been blacklisted by Mozilla and Google for fatally sloppy operating practices (at the very least). So I dropped them like a blighted potato.
I signed up with Let's Encrypt, a free service recently (2016) founded by the Electronic Frontier Foundation, Mozilla Foundation, and University of Michigan, and with additional well-known supporting sponsors. Both their business model and their administrative procedures are unique, and it took some work to integrate the new provider with my setup, which assumes long-lived certs. Anyway, the job is done and most services seem happy.
OpenVPN had issues with the new certificates, mainly in getting the correct intermediate and root certificate into the configuration files. Now the tester, the client and the server are all using the same certs, and it seems to be happy.
Tigase (an XMPP server) is not so happy. Coincident with the advent of the new host certificate, it started complaining about a format error in the certificate and getting a segfault. I haven't resolved this satisfactorily, and am temporarily omitting XMPP service in order to concentrate on the upgrade. PROBLEM
Alsasound.service and/or alsa-restore.service have become confused on Oso and I need to do some research to find out which I really should be using. alsa-restore appears to be the primary service and alsasound is just an alias, so it is bypassed in scripts.dat.
chkstat (fixes the mode and owner of system files) exuded this error message: /var/lib/named/dev/random: don't know what to do with that type of file. This turns out to be bogus.
Bug report here.
It looks like Oso is pretty close to ready, and it's time to put Tumbleweed into production. Here's the order for doing the upgrades:
real machine to be upgraded. Successful.
I repeated the upgrade process on Petra, but not taking detailed notes. Since I had improved the procedure from experience with Oso, it went smoothly on Petra, except for coddling needed for the bizarre networking.
Since Claude is mission critical, I wanted to clone it, upgrade the clone, and then atomically replace the old Claude with the clone. I use the name Orion for machines that I'm installing, before an atomic replacement like this. The cloning process is sufficiently complex, and sufficiently useful in the future, that I wanted to make a record of the steps.
In hostgroup ( /usr/diklo/lib/site_perl/hostgroup.db ), change orion to be up, and in the v99.8 (Tumbleweed) hostgroup. For the rest of the hostgroups, just clone the line for Claude. Only has to be on Diamond but it will eventually propagate everywhere.
systemctl reload firewall.J. Local convention: the first three octets are the KVM vendor code (52:54:00) and the rest are the last three octets of the host's assigned IPv4 address (in hex). Though not strictly necessary, eventually you should install this file on all hosts and reload the firewall, because they will reject packets from Orion until this is done. At least do on Diamond.
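The local MAC convention is mechanical enough to compute. A sketch with a hypothetical helper name; the sample IPv4 address is made up:

```shell
#!/bin/sh
# Sketch: KVM vendor prefix 52:54:00 plus the last three octets of the
# host's IPv4 address, in hex. The sample address is illustrative only.
kvm_mac() {
    # Split the dotted quad; the first octet is not used.
    IFS=. read -r a b c d <<EOF
$1
EOF
    printf '52:54:00:%02x:%02x:%02x\n' "$b" "$c" "$d"
}

kvm_mac 192.168.1.20   # prints 52:54:00:a8:01:14
```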
Since we have working NFS, import the installer CD like this.
ln -s /net/diamond/s1/SuSE/SuSE-build/x86_64/99.8/iso/openSUSE-Tumbleweed-NET-x86_64-Current.iso /s1/kvm/
The XML definition file's CD definition relies on a symlink in the orion directory to the network installer, or the DVD, or /dev/null. Since we're not installing, the latter is the operational one. Due to a crock, probably starting in v12.2, a link to the actual /dev/null is not allowed, and there is a separate character device in the containing directory.
ln -s ../devnull orion-cd.iso
How much disc space does Orion/Claude need? At present it has a MSDOS disc label with 1Mb unallocated at the beginning for Grub, 0.5Gb swap, and 9.5Gb for the rest, which has 9433916 kb total, 6271856 (6.3Gb) used, and 2659788 (2.7Gb) available. This seems fine; I'm going to copy it exactly.
Without shutting down Claude, copy claude.xml and claude-disc1.raw into ../orion/ . Using dd for the disc may be slightly faster.
Copy speed (with dd bs=1M): 38Mb/sec, expected time 5 mins.
Mount Orion's disc on Jacinth, using these steps:
parted orion-disc1.raw
(parted) unit B #Everything in bytes
(parted) print #Note the start of the root partition, without the B: 534773760 (bytes)
(parted) quit
losetup -f # Prints e.g. /dev/loop0, an unoccupied loop device
losetup -o 534773760 /dev/loop0 orion-disc1.raw
fsck -C -f /dev/loop0 # Orion only, not Claude. Fix problems if any.
mount /dev/loop0 /mnt # -o ro for Claude, not Orion.
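The offset can also be pulled out non-interactively from parted's machine-readable output. This is a sketch: the helper name is made up, and it assumes the root filesystem is partition 2 (swap being partition 1), which the text above does not state.

```shell
#!/bin/sh
# Sketch: read "parted -m ... unit B print" on stdin and print the start
# offset, in bytes, of the requested partition number.
part_start_bytes() {
    partnum=$1
    # Partition lines look like "2:534773760B:<end>B:<size>B:...;"
    awk -F: -v n="$partnum" '$1 == n { sub(/B$/, "", $2); print $2 }'
}

# Usage (assumption: root is partition 2):
#   off=$(parted -m -s orion-disc1.raw unit B print | part_start_bytes 2)
#   loop=$(losetup -f)
#   losetup -o "$off" "$loop" orion-disc1.raw
```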
Semantically copy Claude's disc to Orion's. Only 17 files were actually copied. I could have done the first copy step in this way, but since both VMs are on the same host, using dd greatly sped it up.
rsync -a -x -O claude:/ /mnt/
To be completely straight-arrow, shut down Claude, connect its disc to a loop device (readonly, but the offset is the same, because it was copied), and mount it on a different mount point. Then use rsync to copy from Claude's disc to Orion's disc. -x is not needed this time. About 45 files were copied, probably mostly date changes. Unmount Claude's disc, losetup -d /dev/loop1, and restart Claude.
Now turn Claude into Orion.
claude to orion wherever occurring. The name must be unique. Elements being changed include <name>, the hard disc, and the CD drive.
uuidgen -r. Insert in the <uuid> element. The UUID must be unique.
orion.
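The rename and the UUID swap in the XML definition can be sketched with sed. The function name is hypothetical, GNU sed is assumed, and the blanket substitution assumes that the old name does not occur anywhere it should be kept.

```shell
#!/bin/sh
# Sketch: turn a copied libvirt domain definition into a new domain.
# Assumptions: GNU sed; every occurrence of the old name should change;
# the <uuid> element sits on a single line.
retarget_domain_xml() {
    xml=$1
    old=$2
    new=$3
    uuid=$4
    sed -i \
        -e "s/${old}/${new}/g" \
        -e "s|<uuid>.*</uuid>|<uuid>${uuid}</uuid>|" \
        "$xml"
}

# Usage: retarget_domain_xml orion.xml claude orion "$(uuidgen -r)"
```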
Unmount Orion's disc and disconnect the loop device.
Attach the VM to KVM, and try starting it. This is with v42.1.
virsh define ./orion.xml
virsh start orion
checkout.sh discrepancies:
Now upgrade Orion to Tumbleweed (v99.8) following the procedure above.
Next step is to re-do "Turn Claude into Orion" but reversed.
claude.
Problems encountered when upgrading Claude. Most but not all are related to MDM customization.
The greeter window is in the northeast corner, as planned.
The greeter widgets avoid jumping around. (Good.)
The greeter window assembly uses available space efficiently (unlike the Circles theme). (Good.)
For a bogus loginID and/or password, it shows the PAM error message in a light salmon color. (Good.)
The theme engine is capable of showing a language selection and a clock, but jimc considers these, in my situation, as a bourgeois affectation. At the Math Department, the language selector would be a lot more useful.
If ~/.dmrc is not owned by the user or is world writable, the greeter backend will not read it, noisily. (But it is owned by the user, mode 644.) Hao bun! ~jimc/ is owned by 1000:user mode 755, and several interior files including .bashrc. [Fixed.]
A serious problem: when the user's session begins, something sources /etc/profile.d/alljava.csh, which uses CSH syntax, and it looks to me like Bash is the shell and doesn't like it.
The session output is redirected to ~/.xsession-errors by the daemon. The first progress message from /etc/mdm/Xsession is printed, then it sources /etc/profile and ~/.profile (if existing).
Idiot! SHELL = user's shell, i.e. /bin/tcsh, and /etc/profile -> /usr/diklo/default/bashrc auto-switches between /etc/profile.d/*.sh and /etc/profile.d/*.csh according to $SHELL, which of course is totally bogus. This has been working for 20 (?) years, why does it start failing now? Auto-switch was removed, works now.
In virt-viewer, the mouse is ineffective in the greeter and in the user session. It works in the installer, so virt-viewer is probably not the culprit. I don't know yet whether the mouse fails on a physical machine.
The issue turned out to be this: Sometime in the past the driver for the PS/2 mouse was flaky; I think the guest and host cursors were chronically out of sync. It was recommended to use a USB tablet, which I did. There is now a problem with the tablet, and reverting to the PS/2 mouse fixed the problem. And the cursors look in sync again. I'm speculating that there is a parameter for the tablet that used to have a useful default but which now must be specified explicitly. I'll want to try to figure this out. Later. PROBLEM
Tidbit of information: If the VM runs Microsoft Windows 10, virt-viewer will capture the cursor and you need to send a special key combination to get it loose: Ctrl-Windows for me, Ctrl-F11 for someone else. It's indicated in the titlebar. Workaround: auto-grab and release doesn't work with a mouse on Windows, but the tablet works fine, including auto-grab.
mdm emits a lot of GLib-CRITICAL: g_key_file_get_string: assertion 'key_file != NULL' failed. Lots of them when opening the greeter window, but two or three when starting the user session. See above, Session Configuration for MDM under "Diarrhea of the Log File", for the ridiculous fix and a link to the bug report.
mdm is not using /etc/X11/xdm/Xsession, i.e. my Xsession. Time for a symlink. [Done.]
Where does the greeter write the name of the chosen session? Official place is /var/lib/xdm/dmrc . If mdm has its own special place, Xsession will need to look there. Looks like it writes directly in ~/.dmrc .
These items that Xsession uses are messed up:
/var/log/Xorg.0.log complains:
No input driver specified, ignoring this device. Same for /dev/input/mouse0 .
CUPS (printing) is a royal pain. The CouchNet design is, one host (Diamond) has the printer and 3 queues for it (normal, color, photo). Cups on Diamond publishes them via mDNS for use by DNS-SD (service discovery). These records are published:
The command to elicit these records is:
dig @ff02::fb -p 5353 _ipp._tcp.local. PTR #(or ANY)
This is working, and apps (e.g. Firefox) on Diamond and on leaf nodes can find the printer(s) and print on them.
However, leaf nodes run Cups, and are configured to poll mDNS periodically and to build transit queues to pass jobs submitted locally to the host with the printer. With cups-2.2.3 (Tumbleweed 2017-08-10), it gets into a mode where every time it re-discovers a printer it rebuilds the transit queue including downloading the PPD (about 10kb). This incessant net traffic was discovered because it prevents Diamond from sleeping when idle.
In addition, it is possible to subscribe to a RSS feed publishing changes in the status of printers. Cups purges expired subscriptions every 10 secs and logs a message to that effect, filling up /var/log/cups/error_log.
As far as I can tell, Cups on leaf nodes is never actually used (formerly it was essential). The cure is going to be to suppress Cups on the leaf nodes. [Done.]
Following along in Template for an Upgrade.
On aurora(42.1), backup-host . All but 4 files were already backed up.
pushconfig -C aurora -- 1 file to update, the Squid tester.
checkout.sh -- 1 test failed -- alsasound. Fixed (I hope) in TW.
updaterepo -v -- Did it, reindexed 3 repos.
df on Aurora: root has 9Gb free. Plenty.
In /usr/diklo/lib/site_perl/hostgroup.db put Aurora in v99.8
audit-repos -v -i aurora -r 99.8 -u -k #Install Tumbleweed repos.
audit-pkgs -v -r 99.8 -U -c -I # Distro Upgrade
Issue: Phoronix-test-suite requires php5. Toss Phoronix.
1765 packages to upgrade, 255 to downgrade, 539 new, 10 to reinstall,
225 to remove, 78 to change vendor, 8 to change arch. 2698 total items.
1.85Gb download, 1.7Gb additional space used.
Do not reboot the target yet. Post_jump will update grub.cfg adding the command line parameter which keeps the network on eth0/br0.
post_jump -r 99.8 aurora |& tee $j/jump.aurora # Minor complaints.
New systemd unit virtlockd.socket, I assume we want it, add to /m1/custom/scripts.dat. [Done, and created hostgroup vhost, and aurora is not in that hostgroup, so none of the VM daemons were enabled.]
Packages found by capabilities: [All fixed.]
/etc/openldap/ldap.conf is missing. It is created or checked by /usr/diklo/lib/daily/sorthosts.J which may not have been run yet. The file did eventually get created, at the end of post_jump.
Now reboot the target host.
Checkout.sh discrepancies:
Special checkout items:
su root.
insomnia -s puts the machine to sleep. It wakes up on USB (keyboard activity) and appears functional.
date and rsyslog print localtime() in timezone -0700, but systemd journal (e.g. systemctl status alsasound) reports in timezone -0600, i.e. normal time at 15:24, journal at 16:24. PROBLEM
To discover hidden problems, I need to start eating my own dogfood as soon as possible, hence Xena (my laptop) is the next on the upgrade schedule.
Upgrade procedure:
Following along in Template for an Upgrade.
Slapd Was Hosed.]
Slapd Was Hosed.]
Slapd Was Hosed.]
Session Configuration for MDM.]
Once the basic problems were dealt with, I found these issues in the user session. See Session Configuration for MDM for fixes (for most of them).
In the MDM greeter, the Actions menu choice to shutdown or reboot the machine has no effect and no (findable) error messages. This is confirmed on Xena but probably affects all v99.8 hosts.
When MDM starts up, frequently (but not every time) it starts in the password phase. [That's not a bug, that's a feature!]
My ~/.xkeyboard file is supposed to swap ctrl and capslock. It also is supposed to register the alt key as a modifier, which of course doesn't happen. xkbcomp got moved to /usr/bin. Fixed to adapt to either location of the binary. It also didn't get executed at startup.
The button zones on the trackpad are not effective. The zones are set up but do not produce button codes. Evidence: in xev, a single click in the left zone produces a button1 event, but clicks in the middle or right zone produce no event at all.
xf86-input-synaptics is not installed [added to extra.sel] and the libinput driver is used. Installed that package, we're getting closer, synaptics driver is now being used (vs. libinput). Scroll events are produced. Button3 is produced, but only 2 button areas. Someone has an explicit SoftButtonAreas option for this: 70-synaptics.conf , which overrides my 60-synaptics-J.conf . Renumbering… Now it all works.
Something wacko with the beroot command. A change in the interpretation of ${@:0-1} when $# == 0. Worked around.
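One way to sidestep that edge case is to avoid the negative-offset expansion altogether. This is only a sketch with a hypothetical helper name; the actual workaround used in beroot is not shown here.

```shell
#!/bin/sh
# Sketch: return the last positional argument, or an empty string when
# there are none, without relying on ${@:0-1}.
last_arg() {
    last=
    for a in "$@"; do
        last=$a
    done
    printf '%s\n' "$last"
}

last_arg one two three   # prints three
```

With no arguments the loop body never runs, so the result is an empty line whatever the shell version.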
The session's SSH agent was started (under startxfce, not by sys.xsession nor by pam_ssh) but not loaded.
Features of the XFCE session that are horked: Most settings carried over correctly.
Remaining Problems.
gstreamer-properties installed, the Settings app could start it.
These were checked and appear OK:
Firefox has decided to use a light blue background that I don't like. Also the font is different and isn't my favorite. PROBLEM
Gnome Games (specifically, Mahjongg) are not installed. Wrong: in v42.1, gnome-games-3.10.0- requires a list of games, whereas in Tumbleweed, gnome-games-3.24.1 requires no games and you have to ask for them individually. Changing couchnet.sel accordingly.
On all v99.8 hosts and every day, /var/cache/man is set to be owned by man:man, whereas some /etc/permissions* wants it man:root 755. Also, Petra and Xena set /var/log/btmp to root:utmp while the files want it as root:root. The cure: override in /etc/permissions.local going along with what mandb sets the files to.
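The override entries might look like the sketch below, in the "path owner:group mode" layout of the /etc/permissions files. The 755 for /var/cache/man is the mode quoted above; the 600 for /var/log/btmp is an assumption, since the text gives only its ownership.

```shell
#!/bin/sh
# Sketch: entries for /etc/permissions.local that follow what the
# system actually sets, so chkstat stops flip-flopping the owners.
# Assumption: mode 600 for btmp (only its ownership is given above).
permissions_local_overrides() {
    cat <<'EOF'
/var/cache/man/    man:man      755
/var/log/btmp      root:utmp    600
EOF
}

# Usage (as root): permissions_local_overrides >> /etc/permissions.local
```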
On Diamond (and presumably other hosts) the hack packet statistics generator reports the uptime and total time as 0. Jacinth reports uptime 23.9hr, total 24.0hr, which is correct. PROBLEM
Kermit (see hardware review) is an AMD E-350 which has bounced between several roles but is now doing audio playback. The only challenge in this upgrade may be the sound. Following along in the upgrade instructions:
/s1/SuSE/bin/updaterepo -v (on Diamond). Issues:
/srv/ftp/updaterepo.rpmqa is up to date.
checkout.sh >& $j/check.out -- Petra failed to automount Kermit's NFS exports (this combination often gives trouble), and alsasound is in a strange state (should self-heal after the upgrade). Other than that, Kermit's services are all working.
df / -- 16.3Gb total, 7.6Gb used, 8.0Gb available. This is plenty.
Moved Kermit from hostgroup v42.1 into v99.8 .
audit-repos -v -i $target -r 99.8 -u -k
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Start 13:02, installation done 16:50 (3h48m), %post scripts done
17:27 (37m). What a slug!
Hit the problem seen before with php7-imap. Solution: toss Phoronix
Test Suite. Similar issues with mythweb-0_27.
1688 packages to upgrade, 216 to downgrade, 519 new, 10 to reinstall,
211 to remove, 76 to change vendor, 7 to change arch.
2545 total items. download 1.64Gb, installed size increases by 1.0Gb.
post_jump -r 99.8 $target |& tee $j/jump.$target
Took 62min. Discrepancies found:
These wanted packages could not be installed:
Reboot Kermit now. Boot issues:
no such device and croaked. This happened both in the initrd and in the active boot. Let's blacklist this module and see if it helps.
checkout.sh >& $j/check.out
Checkout discrepancies:
Other issues:
Iris is the home theater PC, running MythTV and also doing audio playback. Following along in the upgrade instructions:
Preliminary steps, while Iris is still in v42.1:
In /usr/diklo/lib/site_perl/hostgroup.db put Iris in v99.8 from v42.1.
Stop and disable restarter.timer and cronj. You don't want stuff restarted that you're upgrading.
audit-repos -v -i $target -r 99.8 -u -k
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Issues:
Don't reboot yet.
post_jump -r 99.8 $target |& tee $j/jump.$target
Issues found:
These packages were installed via capabilities (same issue on Kermit):
MySQL was not available, need to switch to MariaDB. Affected packages: mysql-community-server mysql-community-server-client mysql-community-server-tools PROBLEM
Reboot Iris now. It rebooted.
checkout.sh > /tmp/check.out
Discrepancies:
Does it play music? Yes! OTA to FMRadio device to ezstream to Icecast to meow.sh to GStreamer playbin.
I'm sure MythTV is hosed, but except for that, I believe Iris is operational and is successfully upgraded.
I had intended to upgrade Diamond after Iris and before Jacinth, but
Microsoft Windows 10 (on the VM Baobei hosted on Diamond) self-destructed
just before I started the upgrade. Obviously monkeying with
Diamond's KVM will not affect problems on Baobei, but it's prudent to not
touch Diamond until Baobei is dealt with.
Steps in upgrading Jacinth:
Preparation:
Edit /usr/diklo/lib/site_perl/hostgroup.db moving Jacinth to v99.8 from v42.1 .
Stop and disable restarter.timer and cronj.service .
audit-repos -v -i $target -r 99.8 -u -k -- On Diamond.
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Package selection issues:
2733 packages to upgrade, 250 to downgrade, 825 new, 10 to reinstall, 228 to remove, 94 to change vendor, 9 to change arch. 3971 total items. Download 2.16 GiB. Additional to use: 1.7 GiB.
Took 78min and aborted on item 2855/3971, http://download.opensuse.org/tumbleweed/repo/oss/suse/x86_64/libprojectM-qt5-2-2.1.0-14.1.x86_64.rpm?proxy=http://distro.cft.ca.us:3128 is temporarily inaccessible. Zypper retried 4 times but Squid got a 503 each time. A lot of %posttrans scripts were skipped. I'm redoing the dist-upgrade and watching it closely, and I plan to ignore the failure on this package. Famous last words: connectivity to download.opensuse.org (and other sites) is lost. eth1 is down. But access to Diamond remains. Retrying but letting it skip unavailable repos. (Find out what TCP_DENIED means in /var/log/squid/access_log .)
Retrying, 1191 total items remain. Upgrading the desktop environment (XFCE) when you're using it to do the upgrade gives some interesting effects. Whenever it installs or removes a font, all the XFCE plugins hog the CPU for 30 secs or so. It tried to load 20 to 30 packages from offsite, which I told it to ignore. Idiot, it's upgrading Phoronix Test Suite, not tossing it. But it had to retrieve from offsite: ignored/skipped. Finally finished after 78min. It ran all the %posttrans scripts that I remember from other machines: looks like before the abort it saved them in /var/adm/update-scripts and re-ran them when the job was retried.
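For the TCP_DENIED question above, a minimal sketch of how the relevant entries could be pulled out of the Squid log. It assumes Squid's native access log format, in which field 3 is the client, field 4 is the result code and HTTP status, and field 7 is the URL; denied_requests is a hypothetical helper name. TCP_DENIED generally indicates that Squid's own access controls refused the request, as opposed to an upstream failure.

```bash
# Hypothetical helper: list client, result code and URL for denied requests.
# Assumes Squid's native log format (field 4 looks like TCP_DENIED/403).
# Reads the named log files, or stdin when none are given.
denied_requests() {
    awk '$4 ~ /^TCP_DENIED\// { print $3, $4, $7 }' "$@"
}
```

Usage would be along the lines of denied_requests /var/log/squid/access_log , possibly piped through sort and uniq -c to see which URLs are refused most often.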
Don't reboot yet. /etc/default/grub has net.ifnames=0 .
post_jump -r 99.8 $target |& tee $j/jump.$target
This is going to be a challenge; it will probably botch the install and dist-upgrade steps.
Slapd Was Hosed). Evidently fm.J is on Jacinth and needs the patch that Iris has [installed]. /etc/sysconfig/dhclient got changed.
Reboot. Cross fingers. What did not come up:
dnsmasq: bad IPv4 address in /etc/dnsmasq.d/dhcp.conf at line 76. Yes it still supports 121=classless-static-route. The issue was, 0/0 used to be a legal IPv4 address; now it wants 0.0.0.0/0 . But will this be translated to the address of the host running dnsmasq? Hope it wasn't. [Verified.]
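The 0/0 versus 0.0.0.0/0 change can be applied mechanically. A minimal sketch, assuming the default route appears as a comma-delimited 0/0 field on dhcp-option lines; fix_default_route is a hypothetical helper and the gateway address in the usage example is made up:

```bash
# Hypothetical helper: rewrite the shorthand default route "0/0" to
# "0.0.0.0/0" on dhcp-option lines only; all other lines pass through.
# Reads the named files, or stdin when none are given.
fix_default_route() {
    sed -E '/^dhcp-option=/ s#,0/0,#,0.0.0.0/0,#g' "$@"
}
```

For example, a line such as dhcp-option=121,0/0,192.168.1.1 would come out as dhcp-option=121,0.0.0.0/0,192.168.1.1 . A line that already uses 0.0.0.0/0 is left alone, so the filter can be rerun safely.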
upower (coredumps repeatedly). Eventually it started and kept running. Command line for later testing: just /usr/lib/upower/upowerd . The issue was that it's dbus activated, and if it comes up before dbus does, it reports an assertion failure (no dbus connection) and things go downhill from there. It's supposed to hang, or keep retrying, until dbus appears. My fix: add "After=dbus" to the systemd unit file. Wrong: it needs to be After=dbus-org.freedesktop.login1.service and also getty.target, because login1 won't start until something coincident with getty.target. Bug report here.
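The ordering fix for upower could also be kept out of the packaged unit file by using a systemd drop-in, which survives package updates. This is a sketch of that alternative, not what was actually done here (the notes say the unit file itself was edited); write_upower_dropin and the file name after-login1.conf are hypothetical:

```bash
# Hypothetical helper: write a drop-in that orders upower after logind's
# D-Bus name and getty.target. The directory is a parameter so the
# function can be tried in a scratch directory before touching /etc.
write_upower_dropin() {
    local dir=${1:-/etc/systemd/system/upower.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/after-login1.conf" <<'EOF'
[Unit]
After=dbus-org.freedesktop.login1.service getty.target
EOF
}
```

After writing the real drop-in, systemctl daemon-reload is needed before the new ordering takes effect.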
mdm (fallback to xdm)
ldap / slapd: I made the fixes in Slapd Was Hosed and it works now. Except no communication with Diamond (yet).
strongswan (charon dumps core repeatedly). This self-healed.
network6: Needs ifconfig command. (And needs systemd unit.) Edited to use ip command. Works now but still no IPv6 connectivity. dhclient-hooks -v for status check: our wild side IPv4 address has changed. Add -c to re-register. Registered on tunnelbroker.net, members.dyndns.org, admin.mailroute.net. IPv6 is back.
checkout.sh discrepancies:
avahi-daemon: Never worked right on Jacinth.
network6: status check says 2001:470:1f05:844::3 is missing, which is not a lie. Eventually I re-registered the wild-side IPv4 address, which brought back 2001:470:1f05:844::3.
domoticz: Is not installed. It's in obs:home:Guillaume_G. Downloading and installing. We got v3.4834. On startup it wants to upgrade to version 8153. Testing, it always gets an error sending a command to the switches. It turns out that v3.4834 was compiled without Z-Wave support, hiss, boo! The old version (domoticz-2.0.2276) required openzwave libopenzwave1_2 libopenzwave-devel but these are not available on Tumbleweed.
The Z-Wave protocol is non-free, but has been illegally reverse-engineered to produce the openzwave set of packages. Could that be the reason it is banned from SuSE?
Solution: I copied domoticz-2.0.2276 from my SBS stash, and downloaded the openzwave things from SuSE v42.3 (into local SBS repo). The database had been trashed in the failed upgrade, so I restored it from backups. This combination is installable and domoticz is once again operational.
dovecot: Is not installed. Installed and not dead, but TLS establishment fails, no shared cipher. The weasels! They just overwrote everything in /etc/dovecot/conf.d . Need to restore. [Done and tested.]
openvpn@server443 is dead (but openvpn@server is OK). Looks like a connection from researchscan351.eecs.umich.edu (141.212.122.96) killed it. Restarted successfully. Message: Aug 24 16:23:35 jacinth openvpn[2386]: TCP connection established with [AF_INET]141.212.122.96:26630 ; --mtu-disc is not supported on this OS ; Exiting due to fatal error. Next day an unrelated web crawler caused the same failure. Bug report here.
alsa-restore: Needs to rebuild /var/lib/alsa/asound.state (done). I added this to the tester. But on Iris this fails at random times. Still investigating. PROBLEM
strongswan: Dumps core continuously. I didn't do anything; it self-healed.
postgresql: postgresql96-9.6.3 is installed. The database is from v9.4. Got to reload. [Done, successfully.]
Now that I have DNS and networking unscrambled, I can install missing packages. audit-pkgs -v -i -c -I . Package selection issues:
I have too many packages that depend on PHP5 and aren't built for PHP7. Reverting to PHP5.
I cleaned up some packages: MythTV, MySQL, squirrelmail. These were already commented out and were removed: sogo, sope, prosody and its dependencies. MySQL did not actually go away; something depends on it. 32 packages removed.
Installing packages according to the alterations. Unavailable packages: dhcpcd libmm14 perl-Danga-Socket php5-pear-Crypt_GPG . The first 3 are obsolete; removed from /m1/custom/extra.sel . php5-pear-Crypt_GPG is used by Roundcube's Enigma and is important. Downloaded and installed.
Except, several PHP5 modules are back version and have an incompatible ABI: exif fileinfo pdo_pgsql pgsql pspell zip zlib , and intl needs libicui18n.so.52.1 which is not installed.
Look for /usr/lib64/php5/extensions/exif.so . The culprits are dated
2017-03-20; Currently updated: 2017-07-30. The culprits are from v42.1
and did not get dist-upgraded. php5-exif-5.6.30 requires php5-5.6.30
which is uninstallable. Needs to downgrade php5-5.6.31 to php5-5.6.30
together with a bunch of modules depending on it. Wait a minute, the
one on Tumbleweed is php5-exif-5.6.31, why isn't that being installed?
Because of priorities. Strategy: retrieve-pkgs ; updaterepo ; zypper
dist-upgrade. The dist-upgrade wanted to upgrade everything to php7.
So: zypper addlock php7 , and that was honored by dist-upgrade. Now the
correct packages are installed.
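The lock-then-upgrade sequence described above can be sketched as follows. The zypper subcommands (addlock, locks, dist-upgrade) are the standard ones; the wrapper function and the ZYPPER override variable are hypothetical conveniences so the sequence can be previewed without touching the system:

```bash
# Hypothetical wrapper: lock php7 so the solver cannot pull it in, show
# the lock list for confirmation, then rerun the dist-upgrade.
# Set ZYPPER=echo to print the commands instead of running them.
lock_php7_and_upgrade() {
    local zypper=${ZYPPER:-zypper}
    "$zypper" addlock php7 || return 1
    "$zypper" locks || return 1
    "$zypper" dist-upgrade
}
```

Running ZYPPER=echo lock_php7_and_upgrade first shows exactly which commands would be issued.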
Upgrading Dovecot from v2.1 to v2.2. There are just a few configuration changes, none of which are non-default on my system, so the upgrade should go smoothly. Cross fingers. It starts… It is creating custom Diffie-Hellman groups in /var/lib/dovecot/ssl-parameters.dat to resist Logjam. Looks like it's not going to open for business until that's finished. Later, when it regenerates the DH parameters it is willing to use the old ones until finished.
Can't do TLS on any ports. Weasels! They overwrote everything in /etc/dovecot/conf.d without doing the rpmsave thing! Restoring from backup. Now TLS works. Actual mail retrieval coming soon.
Need to test Roundcube. The krb5 extension is required for GSSAPI
authentication…
IMAP login Problem after upgrade to v1.2, OP quwax, 2016-05-26. (Useless.)
The name of the plugin appears to be roundcubemail/plugins/krb_authentication. See the Official Plugin Repository for installation instructions and a searcher. Unfortunately it doesn't find either krb_authentication or krb5.
This plugin is on GitHub. It turns out that this plugin is included in the core. Now, how to enable it in configuration. I edited /srv/www/roundcubemail/plugins/krb_authentication/config.inc.php filling in my own hostname@REALM. But what it's really asking for is the PHP5 Kerberos (krb5) extension. Package php5-krb5 is not on SuSE Build Service. See the PECL package page; the latest version is 1.1.2 dated 2017-04-08. Apparent installation procedure from Can't build krb5 php extension by G4schberle (2017-06-15) on Stack Exchange.
Build dependencies:
config missing.
Now it complains: The gssapi_cn parameter is required for GSSAPI authentication. This is the Kerberos ticket cache as propagated to the server, e.g. KRB5CCNAME=DIR:/run/user/500/krb5cc_XXXXX. But this plugin ought to be bypassed if KRB5CCNAME is empty. I'm not propagating the credential anyway; rather, I'm relying on split authentication and authorization to get Dovecot to deliver the mail. I don't want to debug the client-server interaction. I'm going to try to sabotage /srv/www/roundcubemail/plugins/krb_authentication/krb_authentication.php . It still demands the gssapi_cn parameter. This is checked for in /usr/share/php5/Roundcube/rcube_imap_generic.php . The check is conditional on $type == 'GSSAPI'. This value is the 3rd arg to function authenticate($user, $pwd, $type). Function connect() picks the type. If I'm reading the code right, if the mail server (Dovecot) supports GSSAPI, it is used, and can fail if the browser has not sent the needed parameters. I tampered with /usr/share/php5/Roundcube/rcube_imap_generic.php by removing 'GSSAPI' from the mechanism list. That did it; Roundcube is now showing mail. And jimc's generic REMOTE_USER plugin still works.
Enigma had a problem but finding the error message was hard. I eventually found it in /var/log/roundcube/errors . /home/httpd/htdocs/kerberos/roundcube/plugins/enigma/home is not writable (not that Enigma is going to create a user homedir today). Its perm is root:root 755, should be wwwrun:www 755. Once that's fixed, it's now signing and checking my mail.
Now that I can read mail I discover that there is a plethora of
messages complaining that
/usr/bin/php /home/httpd/htdocs/owncloud/cron.php
and
/home/jimc/public_html/radio/icekiller -k -u admin/hackme http://localhost:8000
were not being executed.
The problem was in cronj. Whenever the job was to run as other than root, the file that receives stdout-stderr had to be re-owned to the target user, and formerly that worked fine, but now you can't create the file as root and then change its ownership (at least when it's in /tmp); you have to setUID to the target user and then create the file. icekiller and ownCloud are now being executed happily.
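The create-as-the-target-user approach described above can be sketched in shell. This is an illustration of the idea, not cronj's actual code; make_output_file and the cronj.XXXXXX template are hypothetical, and the runuser branch assumes the caller is root:

```bash
# Hypothetical helper: create the job's output file already owned by the
# target user, instead of creating it as root and chown-ing it afterwards.
# Prints the path of the new file on stdout.
make_output_file() {
    local user=$1
    local template="${TMPDIR:-/tmp}/cronj.XXXXXX"    # made-up naming scheme
    if [ "$(id -un)" = "$user" ]; then
        mktemp "$template"                           # already the right user
    else
        runuser -u "$user" -- mktemp "$template"     # requires root
    fi
}
```

Because mktemp runs under the target user's identity, the file belongs to that user from the moment it exists, and no later change of ownership in /tmp is needed.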
Need to test ownCloud. Web interface works, cellphones are syncing.
The Windows VM Baobei has been reinstalled and appears to be working normally, and the guest is known to work on a host with Tumbleweed, so I'm ready to upgrade Diamond from v42.1 to Tumbleweed. Preparation steps:
/s1/SuSE/bin/updaterepo -v -r 99.8 -a x86_64 >& /tmp/system/updaterepo.log
Get this started early; it's on the critical path. Allow time to
read the prolix changelogs.
Pre-removed Phoronix Test Suite.
Diamond is the host for the VM Oso. Power off Oso.
retrieve-pkgs -v -w $HOST for all hosts, plus reindexed PackMan.
/home/post_jump/sync_jump -C #-- All conf files are up to date.
Checkout.sh outcomes are either successful or excusable.
Installed packages list is available.
Doing the upgrade:
Check disc space: 15.3Gb total, 12.3Gb used, 2.2Gb available. Should be enough, but space is getting tight. /var/cache/insetup.jail has over 3Gb of cruft, all for obsolete OS versions. Deleting it. Now 5.6Gb available.
Diamond hostgroup was changed to v99.8 from v42.1. This is the last v42.1 machine.
Stopped restarter.timer and cronj.service.
Installing repos: audit-repos -v -i $target -r 99.8 -u -k
Installation and refresh succeeded. The main repo's metadata is
enormous and slow. Oops, it was supposed to edit baseurls to use
the local directories rather than HTTP to $target. Fixed (bad regexp).
Doing the upgrade:
audit-pkgs -v -r 99.8 -U -c -I |& tee $j/distu.$target
Package selection issue: apache2-worker-2.4.27-1.2.x86_64 conflicts with apache2-worker-2.4.16-18.1.x86_64 and other 2.4.16 packages. Toss the old ones.
2649 packages to upgrade, 210 to downgrade, 845 new, 12 to reinstall, 240 to remove, 71 to change vendor, 8 to change arch. 3859 total items. Download 2.0Gb, installed 1.1Gb additional (predicted) vs 1.4Gb (actual). Total runtime 54min.
Don't reboot yet, or networking will be horked.
post_jump -r 99.8 $target |& tee $j/jump.$target
post_jump problems:
Tossed and locked out php7; reinstalling wanted packages (audit-pkgs -i). It's trying to install cups-browsed which is no longer wanted on CouchNet (removed from /m1/custom/extra.sel). The rest of the missing packages were installed.
Rebooted. Boot time discrepancies: slapd is hosed; no IPv6 for Oso, which was started at boot.
checkout.sh discrepancies:
lp -d lp2 /etc/issue prints the file. On Xena it maunders "Bad file descriptor". It's trying to connect to the nonexistent cupsd on localhost. I edited /etc/cups/client.conf to direct leaf nodes to Diamond. Xena can print now; and Diamond can still print.
self-healed, probably when something got restarted.
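The /etc/cups/client.conf edit mentioned above amounts to a single ServerName directive. A minimal sketch, assuming the leaf node needs nothing else in that file (the function overwrites it); write_cups_client_conf is a hypothetical helper, and the server name passed in is whatever name the print server answers to on the local network:

```bash
# Hypothetical helper: point a leaf node's CUPS client tools at a central
# print server instead of a local cupsd. Overwrites the target file.
# The config path is a parameter so it can be tried on a scratch file.
write_cups_client_conf() {
    local server=$1
    local conf=${2:-/etc/cups/client.conf}
    printf 'ServerName %s\n' "$server" > "$conf"
}
```

With this in place, client commands such as lp and lpstat talk to the named server directly, so no cupsd has to run on the leaf node.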
Updating Diamond is now finished, and this is the last host that needs to be updated.
wakeup.J.service is actually an LSB script. Need to extract the code into a separate script and make a proper systemd unit for it. bridge.j has the same issue. PROBLEM
These are the remaining scripts in /etc/init.d :
retrieve-pkgs does one host at a time, including reindexing every time. Need to do multiple hosts and reindex only once. PROBLEM
It is possible to install evince with no backends, whereupon it exits with no error message. You need at least one backend. Jimc is installing evince-plugin-{pdf,ps,dvi,djvu}document ; also available on the Tumbleweed DVD are comics, tiff, xps. Bug report here, comment #3.
Logging in with the MDM greeter and my custom theme, I was delayed while entering my password. The daily housekeeping report popped up and received focus (and several bytes from the password), and the only way to get back to the greeter was to use that window's close button. The greeter should positively grab the keyboard. Note, the housekeeping report generator runs as root, and uses /var/lib/mdm/authdir/:0.Xauth as its XAUTHORITY file to gain access to the server. Bug report here.
This occurred with mdm-2.0.16 which I swear was from the main Tumbleweed repo, but now as of 2017-08-22 it has been upgraded to mdm-2.0.18-2.7.x86_64 from obs://build.opensuse.org/X11, not in the Tumbleweed main repo.
Tigase XMPP server complains about a format error in the new host certificate, and it gets a segfault. I should either upgrade to the latest version and try again, or switch to yet another XMPP server. Metronome is a new one. Fork of Prosody (~2013), written in Lua, actively developed, with several public deployments for social media. Not on SBS, install from source. PROBLEM
workrave-1.10.1 is installed from SuSE-13.1, hiss, boo! It gets
a segfault on Tumbleweed. It is not present in any of the configured
repos. I need to find a recent version, copy it to the appropriate
local repo, and install it.
Hung up trying to find typelib(Workrave). Also libgdome.so.0.
I tried doing this the right way with SBS, but it's turning into a real
can of worms, and also, the last maintenance to the package was in 2011,
so I'm giving this up. PROBLEM
Claude and Oso are chattering with Diamond on port 631, keeping it awake. Oso (and presumably Claude) downloads the PPD for all 3 printers (about 10k each) every 10 secs, as if it's polling. [Fixed by killing cups on leaf nodes; it's totally useless there.]
When an entire directory has vanished on the host, backup-host removes the directory from the backup instance before removing the content, failing. I thought I fixed this already. Use -n to not copy/remove and to retain the temp files. Fixed again, and checked.
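The ordering problem described above (directory removed before its content) is avoided by a depth-first traversal. A minimal sketch of that idea, not backup-host's actual code; remove_tree is a hypothetical helper, and its -n flag is only loosely modeled on the dry-run option mentioned above:

```bash
# Hypothetical helper: remove a vanished directory from the backup
# instance, contents first. With -n as the first argument, only list
# what would be removed, in the order it would be removed.
remove_tree() {
    local dry_run=0
    if [ "$1" = "-n" ]; then
        dry_run=1
        shift
    fi
    local dir=$1
    [ -d "$dir" ] || return 0          # already gone: nothing to do
    if [ "$dry_run" -eq 1 ]; then
        find "$dir" -depth -print      # children are listed before parents
    else
        find "$dir" -depth -delete     # same order, actually removing
    fi
}
```

In the dry-run listing the directory itself appears last, which is the order in which the removals have to happen for the final rmdir to succeed.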
updaterepo runs on Diamond at midnight, when most hosts are down, and it tries to retrieve rpmqa from them, failing. Fixed: leaf nodes mail lists to reports@jacinth, and updaterepo on Diamond retrieves them from http://jacinth/~reports/rpmqa.d/$HOSTNAME.
My custom greeter window on lightdm has a black background and a white border. The one on MDM lacks the border. I should add the border. But without documentation it isn't a quick fix. I also have a user complaint that the linespacing is awkward, and the box should be in the southwest corner, not northeast. OK, I moved the box and provided the white border, but wasn't able to deal with the spacing.