Introduction | Configuration | Subsystems | Installation | Recommendations |
SuSE 9.3 has kernel 2.6.11.4, including a number of patches for issues that are troublesome in SuSE 9.2 (kernel 2.6.8). Plus, of course, its own collection of screwups. See here for a list of issues that are important to me on this machine. Here is a plan and checklist for upgrading from SuSE 9.2 to 9.3.
SuSE 10.0 has kernel 2.6.13, and even more patches; v10.1 has kernel 2.6.16. I used the same general procedure to upgrade to these versions. Where significant, the relevant kernel version is indicated in the plan.
Clear out fafnir:/home/upgrade/xena. Dispose of non-public files like this:
find . -type f \! -perm -04 -print0 | xargs -0 -n 25 shred -u
rm -r etc ...
Special backup of /root /boot /etc /usr/local /home and all backup.pln files. I'm making a tgz file of each directory onto Fafnir, the server. Here are the command lines for tar and friends:
cd /
tar czf - etc | ssh fafnir "cat - > /home/upgrade/xena/etc.tgz"
tar czf - home --exclude-from=$j/exclude.ls | ssh fafnir "cat ..."
locate backup.pln | sed 's/^\///' | tar cTzf - - | ssh fafnir "cat ..."
The backup files for root, etc and home need to be mode 600; the others don't have any secret data in them.
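The mode-600 requirement can be met when the archive is created, instead of with a chmod afterward. This is a minimal sketch that assumes a POSIX shell on the receiving end: set umask 077 in a subshell before cat creates the file. The name save_private is made up for illustration. Note that umask only affects newly created files, so an old archive that is being overwritten keeps whatever mode it already had.

```shell
#!/bin/sh
# save_private (hypothetical helper): copy stdin to the file named by $1,
# creating it with mode 600. The subshell keeps umask 077 from leaking
# into the caller's shell.
save_private() {
    ( umask 077 && cat - > "$1" )
}

# The same idea on the remote end of the pipeline (run from /):
#   tar czf - etc | ssh fafnir "umask 077; cat - > /home/upgrade/xena/etc.tgz"
```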
Also I need to copy the old /etc to /home/upgrade/xena/etc on xena itself so later it can be compared to the upgraded files.
Do the installation.
OK to use the regular installation method. On an older Inspiron model there was a problem initializing the I/O APIC, so the failsafe kernel args were needed. Now either the BIOS gets it right, or it's been blacklisted.
Under Update Mode, be sure to mark these choices:
Conflict reports: have to give up alsaplayer and audacity. mpg123 is no longer supported but can be kept. Major shock, they've junked Heimdal and gone over to MIT Kerberos. Bad move, I should have refused to upgrade Heimdal. See below. There were some miscellaneous old packages that it wanted to delete, and I went along. No other major software losses. Apparently fglrx (proprietary ATI driver for FireGL including Radeon X300) is bundled with SuSE and they have a better version than I do. This will have to be checked in detail later.
[9.3] There was 2.6 GB to be installed in 710 packages; estimated time 70 minutes. Actually took 43 minutes. 10.0 was similar.
In post-installation, bypass the network test and online update. We need to be sure that the VPN is working, and also, Mathnet's mirror site won't download the patch files until tomorrow AM.
Major items from the release notes:
Important features on the initial checkout:
The Inspiron 6000d has a SATA disc controller, and therefore Linux uses SCSI emulation. The primary disc (UDMA/100, i.e. 100 MB/s) is /dev/sda. The CD is /dev/sr0 due to ATAPI being enabled in libata and ata_piix.
They didn't mangle my /etc/X11/xorg.conf file (but see below), so X11 can start (but see AAAAUGH below) and the touchpad works from the beginning, including double but not triple clicking (same as on the previous kernel with patches). But the file will have to be checked carefully later for unwanted modifications.
I can log in and su to root. In kdm, "default" now means KDE. You need to select "custom" to use your .xsession.
[10.0] kdm does not call PAM to close the user's session; specifically, the Kerberos credential is not wiped. Gdm gets this right, and I switched over to use it.
I can log in as root using the backdoor (sulogin) on tty3, a special feature of jimc's setup in case Kerberos won't let me in.
Network configuration works the same as before, except eth0 is now wireless and eth1 is wired. On the next reboot they're interchanged again. I need to figure out how to set persistent device names; it's a royal pain when they switch all the time.
Jimc but not root has a current and valid Kerberos ticket. Evidently /etc/krb5.conf, the MIT library and the PAM module work well enough that Xena can get tickets from Fafnir. But the KDC on Xena itself is not running in 9.3.
NTP configuration file was not trashed. The clock is synced to Fafnir.
Birth control rules are not honored. Why? With the standard /etc/udev/udev.conf, udev looks at /etc/udev/rules.d/*.rules, and [for the udev version with SuSE 9.2] if a rule lacked any action then no action would be taken, i.e. that device inode would not be created. Starting with SuSE 9.3 it wants to see
OPTIONS="ignore_device"
(with quotes) to suppress creation of the device inode. With SuSE 10.0 I discovered this change and revised my birth control rules, which are effective again. However, the SuSE initrd creates /dev/isdn* (16 of them) and /dev/md* (32 of them) by cowboy programming, and if you don't like them you'll need a special startup script to delete them.
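Here is a sketch of such a startup script. It assumes the script runs after the initrd has populated /dev. The function name is hypothetical, and the patterns are narrowed to numbered nodes (isdn0, md31 and the like) so nothing else gets swept up. The directory is a parameter so the script can be tried on a scratch directory first. For comparison, in the udev syntax that uses == for matching, a suppressing rule would look like KERNEL=="isdn*", OPTIONS="ignore_device". The older udev in 9.3 may want a single = there, so check udev(8) on the actual system.

```shell
#!/bin/sh
# purge_unwanted_nodes (hypothetical): delete the numbered isdn* and md*
# device inodes that the initrd creates whether or not they are wanted.
# $1 = device directory, default /dev.
purge_unwanted_nodes() {
    devdir=${1:-/dev}
    for node in "$devdir"/isdn[0-9]* "$devdir"/md[0-9]*; do
        # An unmatched glob stays literal; skip it, and never touch directories.
        [ -e "$node" ] || continue
        [ -d "$node" ] && continue
        rm -f -- "$node"
    done
}

# Usage, late in boot as root:
#   purge_unwanted_nodes /dev
```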
Initial /etc reconfiguration: restore the original profile, csh.cshrc, csh.login, pam.krb91. With this done, you might think the system had not changed at all. Not very likely.
AAAAAAugh... Disaster! [9.3, not 10.0] After a reboot, the X-server will not start. To fix: in /etc/X11/xorg.conf, comment out the whole section for the Synaptics touchpad, and replace it with a generic mouse, protocol ExplorerPS/2. It can tap, including double click, but of course has no fancy features. At least it's usable. Likely, before the reboot it was using the generic xorg.conf file, but then it replaced that with the original xorg.conf file. The issue is that the X driver cannot elicit a response that it recognizes from the pad.
Compare files in /etc with the old ones saved in /home/upgrade/xena/etc.
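To get just the list of changed and added files before wading through the full diff, you can filter the diff -r output down to its per-file header lines. This is a sketch that assumes GNU diff, which prints a "diff ..." line for each pair of differing files and an "Only in" line for each file present on only one side. The name etc_changes is made up. Because -w is passed, whitespace-only changes are not listed.

```shell
#!/bin/sh
# etc_changes (hypothetical): summarize differences between a saved tree ($1)
# and the current one ($2), one line per changed or one-sided file.
etc_changes() {
    diff -r -w "$1" "$2" | grep -E '^(diff |Only in )'
}

# Usage:
#   etc_changes /home/upgrade/xena/etc /etc | less
```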
diff -r -w /home/upgrade/xena/etc /etc
Here are particular items that will have to be examined later (and, in hindsight, their resolutions):
[9.3] It was a mistake to let them install MIT Kerberos. Revert to Heimdal. All servers must be upgraded in a coordinated way. There's a section later on what I had to do to revert to Heimdal. In 10.0 I converted everything to MIT Kerberos and got it working.
/etc/cups/printcap is not getting CouchNet printers appended, because it was owned by root:root. Re-own it to lp:lp and it starts working. You need to put a line in /etc/permissions.local so this happens automatically.
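Here is a sketch of adding that line without duplicating it on a second run. The helper name add_permission is hypothetical, the mode 0644 is my assumption, and the entry uses the newer owner:group form. The permissions files still have to be applied afterward, which SuSEconfig normally does via chkstat.

```shell
#!/bin/sh
# add_permission (hypothetical): append "path owner:group mode" to a SuSE
# permissions file unless an entry for that path is already present.
# $1 = permissions file, $2 = path, $3 = owner:group, $4 = mode
add_permission() {
    if awk -v p="$2" '$1 == p { found = 1 } END { exit !found }' "$1" 2>/dev/null; then
        return 0    # already listed; leave the existing entry alone
    fi
    printf '%s %s %s\n' "$2" "$3" "$4" >> "$1"
}

# Usage (mode 0644 is an assumption):
#   add_permission /etc/permissions.local /etc/cups/printcap lp:lp 0644
```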
Study /usr/share/doc/packages/hal/spec/hal-spec.html and learn how to make HAL work for us, not against us.
Formerly, hotplug was jiggered to allow the sound devices to be created. This has stopped working. Look at /etc/hotplug/generic_udev.agent and generic_empty.agent, sound.agent. The problem was that /etc/udev/rules.d/50-udev.rules accidentally got deleted. For future reference, here's the command line to extract one or a few files from a RPM package:
rpm2cpio whatever.rpm | cpio -i -t    # To list the contents
rpm2cpio whatever.rpm | cpio -i -d etc/udev/rules.d/50-udev.rules
-d means create containing directories as needed. It's safest to cd to /tmp, do the extraction there, then find your file and move it to the final location. That way, a mistake won't trash a lot of files that you didn't want to reinstall.
[10.0] /etc/hotplug/blacklist has an entry for all sound modules including snd-intel8x0, but alsasound relies on hotplug to load it. Comment out the blacklist line and your sound will start working.
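You can script the step of commenting out the line. This sketch assumes the blacklist has one module name per line. The function name is hypothetical. It prints the edited file to stdout so the result can be checked before it replaces the original. It matches the name exactly as given, so an entry spelled snd_intel8x0 would need that spelling.

```shell
#!/bin/sh
# unblacklist (hypothetical): comment out the line naming module $2 in the
# hotplug blacklist file $1 and print the result to stdout.
unblacklist() {
    sed "s/^[[:space:]]*$2[[:space:]]*\$/# &/" "$1"
}

# Usage, as root, after reviewing the output:
#   unblacklist /etc/hotplug/blacklist snd-intel8x0 > /tmp/blacklist.new
```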
[10.1] The whole hotplug concept is out the window in kernel 2.6.16. Udev, Hal and Resmgr, inter alia, handle everything. Except...
[10.1] Formerly subfs was used to handle mounting removable media. Subfs was a horrid kludge, imitating the way Windows does mounting, and is now down the drain. After creating the device inode if necessary, udev notifies hal (via dbus) when a device becomes ready, hal notifies resmgr to give the console user permission, and then it notifies the Gnome Volume Manager (or the KDE equivalent) to actually mount the volume, as the user. If one has a Gnome Volume Manager. Since I use fvwm, not Gnome, my volumes don't get mounted. Hiss, boo!
Formerly, /dev/hdc was the CD drive. Now it is /dev/sr0. Review its mount options in /etc/fstab. In particular, are we still using subfs, and should we be? Or is HAL taking over? Yes, apparently subfs should be used. However, this is an extra package in SuSE 9.x, and [9.3] I need to locate the source package and build it for kernel 2.6.12.
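The fstab change itself is a one-word edit. This sketch prints fstab with a /dev/hdc device field switched to /dev/sr0 and leaves every other line alone. The name hdc_to_sr0 is made up. The mount options are deliberately left untouched, since those still need review.

```shell
#!/bin/sh
# hdc_to_sr0 (hypothetical): print fstab ($1) with the device field
# /dev/hdc changed to /dev/sr0; other lines pass through untouched.
hdc_to_sr0() {
    awk '$1 == "/dev/hdc" { sub(/\/dev\/hdc/, "/dev/sr0") } { print }' "$1"
}

# Usage:
#   hdc_to_sr0 /etc/fstab > /tmp/fstab.new && diff /etc/fstab /tmp/fstab.new
```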
/etc/init.d/alsasound -- how can we leave the sound running at shutdown, so the shutdown sound can be played? It looks like ALSA mutes everything when the driver is unloaded. Let's see what happens if we sabotage the stop procedure, not unloading drivers. Probably on some hardware this produces a horrible squawk. Ineffective, still no shutdown sound.
There are now a bunch of pairs, earlykbd and kbd, and similarly for kdm and syslog. The early component runs some tests and execs the late script if they pass. Apparently the late component has runlevel links for a later phase when contingencies are more likely to be favorable, e.g. NFS filesystems are mounted.
New in the boot script ordering comments: X-SuSE-Should-Start. linkconf.J will have to be updated accordingly. (Done.)
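For linkconf.J, reading a header keyword like X-SuSE-Should-Start comes down to extracting one line from the block between ### BEGIN INIT INFO and ### END INIT INFO. Here is a sketch, with init_header as a hypothetical name. It does not handle keywords that continue over several lines.

```shell
#!/bin/sh
# init_header (hypothetical, for a linkconf-style script): print the value
# of header keyword $2 (e.g. X-SuSE-Should-Start) from the LSB comment
# block of init script $1. Lines outside the block are ignored.
init_header() {
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$1" |
        sed -n "s/^#[[:space:]]*$2:[[:space:]]*//p"
}

# Usage:
#   init_header /etc/init.d/syslog X-SuSE-Should-Start
```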
The UID-GID of the test accounts should be changed to the 500 range. Xena and Fafnir both.
/etc/permissions and included files have a format change: uid:gid (with colon) rather than the former uid.gid (with dot). These are converted automatically. Don't change it back. But permissions.local is not converted automatically (hiss, boo). (Report to SuSE.)
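Until SuSE converts permissions.local automatically, a hand conversion along these lines should do. In this sketch, dots_to_colons is a hypothetical name, and it assumes owner.group is always the second field. Comment lines and entries that already use a colon pass through unchanged. A converted line comes out with single spaces between fields.

```shell
#!/bin/sh
# dots_to_colons (hypothetical): print permissions file $1 with owner.group
# in the second field rewritten as owner:group. Comments, short lines and
# fields that already contain a colon are passed through unchanged.
dots_to_colons() {
    awk '!/^[ \t]*#/ && NF >= 3 && $2 !~ /:/ { sub(/\./, ":", $2) } { print }' "$1"
}

# Usage:
#   dots_to_colons /etc/permissions.local > /tmp/permissions.local.new
```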
/etc/sysconfig/network/scripts/ifup-route installs a link-local special route for 169.254.0.0/16. This turns out to be the address range reserved for link-local autoconfiguration ("auto-DHCP"): Windows and Mac machines which attempt DHCP and time out will pick an address in this range. I don't really want this to be supported on my machine. (Nasty message to SuSE.)
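Short of patching ifup-route, you can remove the route after the interface comes up. This sketch reads ip route show output and prints the matching ip route del commands without running them. The name linklocal_del_cmds is made up, and the route will presumably come back on the next ifup.

```shell
#!/bin/sh
# linklocal_del_cmds (hypothetical): read "ip route show" output on stdin and
# print an "ip route del" command for each 169.254.0.0/16 route found.
# Nothing is deleted here; review the output, then pipe it to sh as root.
linklocal_del_cmds() {
    awk '$1 == "169.254.0.0/16" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") print "ip route del 169.254.0.0/16 dev " $(i + 1)
    }'
}

# Usage:
#   ip route show | linklocal_del_cmds
```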
[9.3] Sad news, I can't suspend to disc under kernel 2.6.11.4. It writes 0% of pages to the swap partition, hangs for about 30 seconds, and then gets a kernel panic: cannot satisfy kernel paging request at... It did not get far enough to write anything useful. Cure: use kernel 2.6.12-rc1 with patches, or upgrade to SuSE 10.0.
Apparently SuSE 9.3 uses the initramfs feature. It's a CPIO archive compressed with gzip, and the kernel unpacks its content into rootfs (a RAM-based filesystem), rather than mounting the initrd image and working out of that. With an old-style initrd it could be remounted onto /initrd, but this doesn't happen with an initramfs.
[9.3 only] For reasons unknown, the ahci driver was added to the initrd. But on my ICH6M chipset it failed (error -12, i.e. -ENOMEM) when probing the SATA controller. There's a comment somewhere that ahci can't handle the ICH6 chipset.
[10.1] Now it uses (successfully) ata_piix with ahci as a dependency.
Audit-scripts: this script turns services on and off according to my own list. This list was reviewed carefully. The default installation includes these services that I want turned off; they were not added to the list:
These services were added to the list:
These services were present before, but are no longer in the distro:
Execute audit-suse, a Mathnet local script to recognize missing and unwanted software. Remove junk software (a ton of it). Install forgotten software (not too much this time). The major problem item here is hevea, which was last seen in SuSE 8.2. Plus the Kerberos issue.
Online update. Done, no problems.
[9.3] Install Heimdal and Heimdal-client from SuSE 9.2. Bringing Kerberos to life was not easy. If you pull on one thread the whole sweater unravels. (In SuSE 10.0 I converted all the servers to MIT Kerberos.)
About 50 packages are compiled to use the MIT krb5 and GSSAPI libraries, so we aren't going to be able to switch to the Heimdal client library. It will have to coexist with MIT. Need rpm --force because both packages want to write on /etc/krb5.conf.
MIT Kerberos insists that in /etc/krb5.conf, comments must begin in column 1, while Heimdal allows whitespace before comments. Fixed.
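Moving the comments to column 1 takes a one-line sed. Here is a sketch with a hypothetical function name. It only handles comments introduced by #.

```shell
#!/bin/sh
# krb5_comments_to_col1 (hypothetical): print krb5.conf ($1) with any
# whitespace in front of a "#" comment removed, so the comment starts in
# column 1. Non-comment lines keep their indentation.
krb5_comments_to_col1() {
    sed 's/^[[:space:]]\{1,\}#/#/' "$1"
}

# Usage:
#   krb5_comments_to_col1 /etc/krb5.conf > /tmp/krb5.conf.new
```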
Re-installing Heimdal: it wants libdb-4.2.so while the distro provides libdb-4.3.so. They are not backward compatible, and given a symlink 4.3 -> 4.2, Heimdal servers and tools die. Solution: rdist the old libraries from Fafnir.
Probably it would have been sufficient to rdist the database, but I ended up running hprop, which does not honor [kdc] and puts the database where it feels like (heimdal.db). OK, I found it. Now all the Kerberos stuff seems to work.
But any Kerberos login, including su done by root, takes a minimum of 16 seconds. It turns out that (a) orion.cft.ca.us is listed in /etc/krb5.conf (necessary for my laptop scheme); (b) Orion is not in /etc/hosts, so resolution is by DNS; (c) the FQDN is not actually fully qualified, since it does not end in a dot, and so searchlist elements are appended before the raw name is tried. Those lookups have to go off-site, but DSL is turned off, hence there's a 2 second timeout for each one. The whole inefficient mess is attempted 4 times before pam_krb5.so gives up. Fixes: add Orion to /etc/hosts; append a dot to each hostname. But when the dots are appended, this precludes successful lookup in /etc/hosts, so I'm going to revert that fix. The whole thing is a mess.
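As a rough check on the 16 seconds, here is my own arithmetic. It assumes each attempt makes two off-site lookups that both time out; the actual number of searchlist entries isn't recorded here.

```latex
t_{\mathrm{login}} \approx N_{\mathrm{attempts}} \times N_{\mathrm{lookups}} \times t_{\mathrm{timeout}}
  = 4 \times 2 \times 2\,\mathrm{s} = 16\,\mathrm{s}
```

With Orion in /etc/hosts, the lookup should be answered locally, so the timeout term should drop out.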
In SuSE 9.2, when you load module tun it creates /dev/net/tun, taking 12 seconds due to timeouts in the hotplug-udev interaction. In SuSE 9.3, the device appears promptly. But I think there is still a short time after modprobe returns until the inode is created.
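If a script opens /dev/net/tun immediately after modprobe, polling for the inode covers that gap. This is a minimal sketch. The name wait_for_node and its one-second polling interval are my own choices.

```shell
#!/bin/sh
# wait_for_node (hypothetical): wait until path $1 exists, checking once a
# second for at most $2 extra tries (default 10). Returns 1 on timeout.
wait_for_node() {
    node=$1
    tries=${2:-10}
    while [ ! -e "$node" ]; do
        [ "$tries" -gt 0 ] || return 1
        tries=$((tries - 1))
        sleep 1
    done
}

# Usage, as root:
#   modprobe tun && wait_for_node /dev/net/tun 15 || echo "no tun device" >&2
```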
Install Hevea from SuSE 8.2. Not there: 8.1? Find out on Orion where it came from.
Final propagation of configuration files.
Review of error messages during boot.
tls-remote harlech.math.ucla.edu.
Do I need to download and install the latest wireless driver (ipw2200.ko)? An experimental driver like this advances quickly. No, the installed kernel has the same sources I have (v1.0.1).
Hardware and software activity checks.
ACPI readout of battery still works.
Processor. Does it process? :-) Yes. Powersaved raises the CPU frequency to 1.6 GHz when a compute-bound job runs, and the current to the processor jumps up about 0.9 amps, but productivity is only about 1 work unit/sec, i.e. not as good as on previous tests. It's using speedstep_centrino, should use acpi_cpufreq. Fixed, but speed is the same. Investigate later.
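To see which cpufreq driver is actually in charge, and at what frequency, you can read the standard sysfs files under /sys/devices/system/cpu/cpu0/cpufreq directly (scaling_cur_freq is in kHz). Here is a sketch with a hypothetical function name. The directory is a parameter only so the function can be tested against fake files.

```shell
#!/bin/sh
# cpufreq_status (hypothetical): show the active cpufreq driver, governor
# and current frequency from the cpufreq sysfs files.
# $1 = cpufreq directory, default /sys/devices/system/cpu/cpu0/cpufreq
cpufreq_status() {
    d=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    printf 'driver=%s governor=%s cur_khz=%s\n' \
        "$(cat "$d/scaling_driver")" \
        "$(cat "$d/scaling_governor")" \
        "$(cat "$d/scaling_cur_freq")"
}

# Usage: run it idle and again under a compute-bound job, then compare.
```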
Display. Still works.
Graphics. Is 3D acceleration happening? [9.3] The distro includes the fglrx module in the kernel-default-nongpl package. Trying it out: According to Xorg.0.log, it's working perfectly. Sadly, the screen is blank. Possibly there's version skew somewhere. This will have to be worked on later. [10.0] fglrx now works, with fast 3D performance.
Memory. Is it all there? Yes. memtest86-3.2 is in the distro, and works.
Sound card: Works.
Modem: Not tested, probably has to be recompiled. To be done later.
Hard disc: Working, attached by ata_piix.
Optical disc. Activities to be tested:
Network devices.
Wireless NIC: Works. It's being automatically configured, successfully. Distro's driver is v1.0.1; don't have to download and recompile.
Wired Ethernet: Works.
PCMCIA card (the old Xircom wired NIC): Hotplugging it works.
USB devices. (Not re-tested yet.)
Suspend power modes. Suspend to RAM and to disc. [9.3] Something makes it hang during suspend, after writing 0% to disc. Hiss, boo. See above for fix; also see the complete writeup here.
Are boot.udev and udevpurgeJ working? Yes, but see the earlier description of how to get /dev fully purged.
Automatic mounting: [9.3] If you insert a CD, nothing happens. But when you mount it manually, HAL will mount subfs on its /media mountpoint. Reason: /etc/fstab refers to /dev/hdc, should be /dev/sr0. Fixed, works now. Hotplug (?) signalled by HAL seems to mount the /dev/sr0 (type subfs) on its own mount point as soon as subfs on the other mount point mounts the device for real. This is probably harmless though annoying.
Command line forcefsck is recognized? Yes and no. boot.localfs honors it, but in SuSE 9.3 the root is mounted read-write from the beginning, and boot.rootfsck is incapable of checking it. (Bug has been reported to SuSE.) [10.0] The initrd is easily hacked... [10.1] They reverted to readonly mounting; boot.rootfsck again actually runs fsck and can be hacked to recognize the command line forcefsck.
boot.laptop.J -- are the tweaks still appropriate? Yes. See /usr/src/linux/Documentation/laptop-mode.txt, but the disc wants to manage power its own way. Discussion here.
boot.loadmodules -- are the preload modules still appropriate? Preloading b44 so it will become eth0 consistently. This is a really dumb kludge. I should figure out how to do persistent devices the right way.
boot.idedma -- are the tweaks still appropriate? /etc/sysconfig/ide is empty. With ata_piix, DMA is always used; don't mess around.
boot.sched -- are the timeslices still appropriate? Made smaller. This feature seems to be going away in kernel 2.6.12.
Does this machine have a hardware random number generator /dev/hw_random? No, hiss, boo.
Any weird-outs with firewall.J? Service ipsec-msft is now named ipsec-nat-t (port 4500). Bug during shutdown in detecting if the firewall was up, fixed.
Syslog events being logged? Yes.
Whatever we're using portmap for, does it happen? Used for FAM; don't know how to test.
At jobs (like xalarm) happening? Yes.
DNS: Working.
CUPS printing. Again, need to make /etc/cups/printcap owned by lp. Also again it had the wrong default destination (to fix: lpadmin -d lp). All done.
Keyboard font is being loaded and compose table is valid.
xntpd does sync the clock.
powersaved, can it adjust the CPU speed? Yes, 800 MHz when idle, 1600 MHz when busy. Using module acpi_cpufreq. But production is only about 1.05 work units/sec; should be higher. Do you suppose that there's a big improvement in acpi_cpufreq starting in kernel 2.6.12? No. This is a big mystery that will have to be worked on more. (Powersaved is the culprit, probably checking the battery too often. Fixed in SuSE 10.1.)
xdm, is the login screen right? Yes.
Cryptovault being mounted? Yes.
Kerberos -- Working. MIT libraries for PAM and other apps seem to coexist with Heimdal programs, e.g. kinit, klist, and the Heimdal server.
Postfix, test TLS mail to Harlech. Oops, session cache database support has changed; sdbm is no longer supported. btree is known to be usable. Fixed, mail is again working.
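The fix amounts to changing the database type in the TLS session cache parameters (smtp_tls_session_cache_database and smtpd_tls_session_cache_database) in main.cf. This sketch prints main.cf with sdbm: replaced by btree: on those lines. The name sdbm_to_btree is made up. After you install the result, postfix reload picks it up.

```shell
#!/bin/sh
# sdbm_to_btree (hypothetical): print Postfix main.cf ($1) with any TLS
# session cache database of type sdbm switched to btree. Other lines are
# passed through unchanged.
sdbm_to_btree() {
    sed 's/^\([[:alnum:]_]*_tls_session_cache_database[[:space:]]*=[[:space:]]*\)sdbm:/\1btree:/' "$1"
}

# Usage, as root:
#   sdbm_to_btree /etc/postfix/main.cf > /tmp/main.cf.new
```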
Apache -- Working, including TLS and CGI.
SSH -- Works, including GSSAPI and credential propagation.
FTP (vsftpd), which is started by xinetd -- Works.
OpenVPN -- Works.
Does Windows still boot? Yes.
These services are in use but were not properly tested.
At the end of installing the unmodified (or not very much modified) SuSE 9.3, I have the following problems:
Can't suspend to disc (software suspend). Now the problem is some non-obvious bug while writing the image to the swap file. In 2.6.12-rc1, one of the drivers gets stuck during resume; I have not yet figured out which one. All fixed in 10.0.
Radeon M300 graphics: fglrx (ATI proprietary driver for FireGL including Radeon X300) shows a blank screen. Reverted to the standard Radeon driver, without 3D. Hindsight: possibly I'm using the wrong AGP driver. I was using intel-agp.ko. It's an ATI graphics chip; should I be using ati-agp.ko? All fixed in 10.0 with fglrx-8.19.10.
The X-windows driver no longer recognizes the Alps touchpad. Workaround: configure it as a generic mouse, without touchpad features. Later, fixed.
CPU frequency: the acpi_cpufreq module is being used but it does not give the performance achieved on an earlier test. Fixed by dumping powersaved.
But ATAPI DMA is working.
Clearly I have to keep working to have the system the way I want it. My strategy for dealing with these problems is to advance to the most recent kernel, currently 2.6.12-rc3.
For my first attempt, the only patch I applied was the ATAPI DMA fix. The section that turns on the ATAPI features was rejected, for no reason that I could see (possibly a whitespace issue?), and I had to apply that one by hand, just two lines. It compiled uneventfully. I also jiggered /etc/modprobe.conf.local with these features:
The psmouse module must be loaded before evdev, otherwise it cannot successfully communicate with any PS/2 mouse or imitation, specifically with the Alps touchpad even in a generic configuration. Kernel 2.6.11.4 does not have a separate psmouse (and can't talk to the touchpad either, but can use it as a generic mouse).
The ati-agp kernel module should be loaded before either the radeon or the fglrx modules. I think. This will have to be tested. ati-agp.ko is present in or before 2.6.8.
Test results after this attempt:
Provided psmouse is loaded before evdev, the Alps touchpad is once again fully functional. There's a kernel message when it's loaded:
input.c: calling hotplug without a hotplug agent defined.
Suspend-to-disc is now working. So far (cross fingers) it has not crashed. The SuSE initrd ends with a second attempt to resume, after the needed drivers are loaded, and this succeeds. Suspend has these good and bad behaviors:
Frequently but not always, /sys/power/disk is set to "reboot". You need to echo "shutdown" into this file before suspending.
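That check can go into whatever script starts the suspend. Here is a sketch, with ensure_disk_mode_shutdown as a hypothetical name. On the kernels discussed here, the file holds just the current mode. Some later kernels list all modes with the selected one in brackets, so the sketch accepts both forms. The suspend itself would then be started with echo disk > /sys/power/state.

```shell
#!/bin/sh
# ensure_disk_mode_shutdown (hypothetical): make sure the suspend-to-disc
# mode is "shutdown" before suspending. $1 = mode file, default /sys/power/disk.
ensure_disk_mode_shutdown() {
    modefile=${1:-/sys/power/disk}
    current=$(cat "$modefile") || return 1
    case $current in
        shutdown|*'[shutdown]'*) return 0 ;;   # already selected
    esac
    echo shutdown > "$modefile"
}

# Usage, as root:
#   ensure_disk_mode_shutdown && echo disk > /sys/power/state
```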
A 30-second timeout that formerly occurred while resuming is now gone.
The ipw2200 driver still successfully suspends and resumes. The beep upon resume is caused by this driver. There is no corresponding syslog message and the driver is able to associate with the AP and send packets.
Upon resuming, the Alps touchpad's configuration parameters are restored. With 2.6.8 you had to flip to another VT and flip back, to get them restored.
The fglrx (ATI 3D graphics) driver (version 8.12.10) is not going to fly because the kernel module lacks conditionalization for kernel 2.6.12; it uses (apparently in an essential way) some data structures that got changed between kernel 2.6.11 and 2.6.12.
Wrong. With some web research and digging in kernel sources I got the thing to compile. However, now the problem is that it attempts to assign a region 0x3fff0000 bytes long on an MTRR. The MTRR size must be a power of 2. I wasn't able to find where this is getting set.
Fglrx driver version 8.19.10 has no such problems.
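The power-of-2 rule explains why that size is rejected: 0x3fff0000 is 1 GiB (0x40000000) minus 64 KiB (0x10000). Here is a small check in shell arithmetic, with is_pow2 as an illustrative name.

```shell
#!/bin/sh
# is_pow2: succeed if the integer $1 is a positive power of two, the size
# rule an MTRR region must satisfy. n & (n - 1) clears the lowest set bit,
# so the result is 0 only when exactly one bit is set.
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Usage:
#   is_pow2 $((0x3fff0000)) || echo "0x3fff0000 is not a power of 2"
#   is_pow2 $((0x40000000)) && echo "0x40000000 is"
```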
ATAPI DMA still works. It can read sequentially from a CD at 2.1e6 bytes/sec (14X).
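For the record, the 14X figure follows from the usual definition of 1X for data CDs: 75 sectors per second of 2048 bytes each.

```latex
1\mathrm{X} = 75\ \mathrm{sectors/s} \times 2048\ \mathrm{bytes/sector} = 153\,600\ \mathrm{bytes/s}
\qquad
\frac{2.1 \times 10^{6}\ \mathrm{bytes/s}}{153\,600\ \mathrm{bytes/s}} \approx 13.7 \approx 14\mathrm{X}
```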
The processor still runs at 1.06 work units per second.
Modules compiled for kernel 2.6.12-rc1, specifically ipw2200, do not work with 2.6.12-rc3 due to a disagreement about the version of symbol struct_module. This may be due to a change in kernel configuration, but I copied the .config file and only added/removed a few drivers. Recompiling makes the modules work again.
The rules have changed for reading one of the files which my stripchart displays, and it's getting charged for 10-20% of CPU time. This will have to be investigated and made to stop. The re-read interval was lengthened to 20 secs.
Current problems:
[9.3] Can't suspend to disc (software suspend). Now the problem is some non-obvious bug while writing the image to the swap file. In 2.6.12-rc1, one of the drivers gets stuck during resume; I have not yet figured out which one. Update: I think the problem is with ata_piix, and in 2.6.12-rc3, occasionally both on writing and reading the image, it reports an ATA timeout (20 secs) saying
ata1: command 0xca timed out, stat 0x50 host_stat 0x24
and then retries the operation successfully. Another suspend misbehavior: occasionally the mode will switch to "reboot". Keep your finger on the power button. [10.0] All this is fixed in 2.6.13.
Radeon M300 graphics: fglrx (ATI proprietary driver for FireGL including Radeon X300) does not compile cleanly in kernel 2.6.12-anything. With this fixed, it sets an MTRR range 0x3fff0000 bytes long (fatal). So I'm still using the non-accelerated Radeon driver. Use fglrx-8.19.10 or later.
In 2.6.11.4 the X-windows driver no longer recognizes the Alps touchpad. Workaround: configure it as a generic mouse, without touchpad features. Real fix: advance to 2.6.12-rc3, which seems to have all necessary patches merged in, and make sure that the psmouse module is loaded before evdev, because it cannot connect if loaded afterward. You will need these lines in /etc/modprobe.conf.local:
install evdev /sbin/modprobe psmouse ; /sbin/modprobe --ignore-install evdev
remove evdev /sbin/modprobe -r --ignore-remove evdev ; /sbin/modprobe -r psmouse
Unfortunately psmouse is not modular in SuSE's kernel 2.6.11.4 (although it could be) and so this fix will not work for SuSE 9.3. I have not actually checked if a modular psmouse.ko, loaded in the correct order, works properly in 2.6.11.4. In any case, the order dependence is a crock.
CPU frequency: the acpi_cpufreq module is being used but it does not give the performance achieved on an earlier test. This is not caused by using the Dothan voltage table as previously thought; it is caused by powersaved (not acpid). But I don't know what powersaved is actually doing. The fast performance is restored if powersaved is shut off; a drastically simplified substitute does not spoil the performance.
There is no Linux driver for the Ricoh RL5c476 chip that operates the MMC-SD card slot. Hell will freeze over before this is solved. [10.0] Well, I guess the ice age has come. See here for discussion.