Starting in v9.2, SuSE includes a User Mode Linux kernel and utilities in their distro. However, the documentation is somewhat scanty. How do you make it useful?
My intention is to tell how to use SuSE UML as it is, warts and all. Several problems are discussed which could be worked around by rebuilding the UML kernel, which would be a good idea for an individual user on a home system. However, sometimes there are operational or political considerations which make it necessary or prudent to stick with the distro as-is. That's the focus of this document.
SuSE's UML will improve over time, and so where this document says certain features are inconvenient or broken, you should try again after upgrades and see if they've improved.
These notes pertain to the SuSE Linux version 9.2 (Professional) packages with these build numbers (after online updates):
It is hard to determine exactly which version of User Mode Linux is on the SuSE 9.2 DVD, or which patches it has, since the patch archive is not in the source RPM. It is dated 2004-10-04, so patches issued up to August are most likely in the kernel. The host does have the SKAS3 patch; this is kernel 2.6.8 with SuSE hacks.
You are working on a project that might well trash your system,
so you need an expendable virtual machine. For example, I am debugging an
installation script, and whether or not it's correct, it will replace
every package on the (virtual) machine. Very helpful for this kind of
project, User Mode Linux includes a copy on write feature
that makes it easy to erase all the writes, thus reverting to a
prespecified initial state.
You have a network service which may be or is known to be insecure. When the evil hackers take over your virtual machine they can use it for their nefarious activities (if your firewall allows), but when you discover them you can simply kill the machine and revert to a known good copy.
When training system administrators (yourself or your students), you need to give them root access and let them modify any configuration file on the machine. If you can't give each one a dedicated physical machine, you can give each one a dedicated virtual machine. You can set up one initial configuration, and the per-student copy on write areas will be much smaller than if each student had the whole distro.
Suppose you have a particularly paranoid or demanding user, but
political or operational needs force you to accommodate him on your server, not
his private desktop workstation. He may be more comfortable if you give him
"his own machine". Of course you control the physical host and its
firewall, and you have at least read access to the virtual filesystem, viewing
it from outside. (Write access would break the virtual machine.)
You have a network project and you need a herd of interacting machines without the expense of physical processors, routers, etc. For example, a high availability cluster concept was debugged using two servers and two routers, all virtual machines on one box, whose real instance served as the client. Servers and routers could be killed at random intervals, only affecting the test session and not production services.
You want most of your services to use the latest kernel or distro version, but you have one or two applications that need an old version. The host and guest operating systems do not have to agree in version.
Virtual CPUs don't multiply by magic; the total compute power of your machine is shared between the real instance and the various virtual machines. And each virtual machine needs a root and (often) a swap filesystem, which tend to be large.
For designing and for system administration generally, you should treat your UML machine the same as your real machines. In particular, before starting the UML installation you need to decide several items, which are the same as for a real machine and for which you'll use similar policies. More details on most of these are given further on.
What are you going to use the machine for? This dictates the answer to the next item:
What software do you want on the machine? Software selection for a
minimal configuration is listed below, but if you want to provide services
to the outside world you will need software to do that work, for example the
Apache web server and its various modules and their numerous dependencies.
Conversely there's virtually no chance that you'll have a user running KDE on
the console, so it's a complete waste to install the KDE or Gnome
desktop environments.
Given the software, what simulated hardware resources will it require? How much physical memory, swap space (if any), disc space for software, and disc space for writeable payload? What kind of networking will it require: none, low bandwidth, high bandwidth? Can you use a fixed IP address, or will you be forced to use DHCP? Are network broadcasts needed? (Yes, for DHCP.) Will wild-side (global internet) clients need to be able to reach the UML?
What security features are desired? How likely is it that your UML will get hacked? Which user will execute UML? Do you need a chroot jail? What services need to be let through, and blocked, by the network firewall on the host and on the UML? Do you want copy on write (COW) for quick disaster recovery?
On this point, a hacker executing as root on the UML can overwrite the
UML kernel, or install his own module to extend the kernel, and
execute arbitrary code as the user executing UML. The kernel cannot be
improved to resist after a successful root exploit (with the possible
exception of selinux), same as on a real machine. Also actual security
flaws in the UML kernel may be discovered that let ordinary users do stuff
on the host using the executing user's identity. If the UML is doing
things that open it to hacking, it should be run by a special user who can
do no further damage (beyond having the UML itself trashed), and it should
be run inside a chroot jail, which is not easy to get out of.
What are its system administration needs? It needs online updates of
the SuSE distro, installation and maintenance of local software and scripts,
and local network services such as DNS and NIS. It would be a mistake to set up
a UML and then not maintain it just because it isn't "real".
You need to install these RPMs. (*) indicates packages required by yast2-uml, which are checked for at the time of creating the UML instance and installed if needed. Knowing what I know now, I think the minimal installation is kernel-um (on both the host and the guest) and uml-utilities (host only).
First you need to create plain files in the host OS which will be the discs of the guest OS, and you need to populate these filesystems. SuSE provides a script (uml-install-suse) and a YaST2 module (yast2-uml) to assist the process, but I found that both approaches had problems. I found that the script was hard to customize, but its default behavior installed too many packages for my small UML system and overfilled the default filesystem (512 MB) by 40 MB. The YaST2 module wants to create a completely new user to run the UML as, and you have to create the root device yourself anyway. And its assumptions about how you are going to use the UML were not exactly aligned with mine. Both the script and the module were not willing to do the installation over; you had to manually find and remove all their work before trying again.
I hand-customized the uml-install-suse script, but reinventing the wheel like this was inefficient, particularly in analysing package dependencies. So I gave up that approach.
Instead, I used the "Install into Directory" YaST2 module, which
first appeared in SuSE 9.2. Here are the steps:
Install the packages that you are going to be needing, from the list above.
You need installation media for your guest operating system. If you're
using the distro DVD, mount that now. Most people will put the same version on
the guest as is on the host. If you need a different version, use "Change
Source of Installation" so the right medium is turned on; but you'll have to
eventually go back to the SuSE 9.2 DVD (not CDs) to get kernel-um onto the
guest, even if the rest is back-version.
Decide where the images should go. They tend to be large, so pick a filesystem with enough space and consider the owner's disc quota (if restricted). For the root filesystem you will need a minimum of 512 MB, and I recommend 768 MB, same as for installing SuSE on a real machine. Depending on what you're doing you may want a swap partition. If used, it should be big enough to hold all the writeable storage of all the processes to be run simultaneously: same criteria as for a real machine. The steps below show creation of the swap partition; omit if you don't want one.
su to root, so the image file can be mounted.
For easy description of subsequent steps I'm assuming you will now
cd to the directory that will contain the images.
Now create the root and swap images. You can have up to eight drives
with 15 partitions on each, if it helps your project's design, but it's
simplest to stick with just these two. Here is the procedure (count = size in
megabytes, with the given blocksize of 1 megabyte). You can use any name you
want for the various image files. mkfs will complain "root.img is not a
block special device. Proceed anyway?" Tell it "y".
dd if=/dev/zero of=root.img bs=1M count=768
mkfs -t ext3 root.img
dd if=/dev/zero of=swap.img bs=1M count=256
mkswap swap.img
mount -o loop root.img /mnt
I specified a filesystem type of ext3, to get the protection of the journal. Follow your local policies to choose the filesystem type, e.g. ext2 or Reiser, same as you would on a real machine.
Decide which user is going to execute User Mode Linux. For a jail
application you would execute it as a special user dedicated to the jailed
service, i.e. one that could do no further damage if the Bad Guys escaped
somehow from the jail. The executing user will need to be able to write on the
image files, except if copy on write is engaged, in which case he will need to
be able to write on the copy but only to read the original. Do the
appropriate chmod and/or chown commands (as root) to root.img and swap.img.
Normally the installation is done by root, not the executing user.
In a training situation you can chown /mnt (i.e. the mounted filesystem) to
the user, but you'll have to work out a separate mount point for each student.
The "user" or "owner" mount option in fstab can be helpful.
I'm going to assume that root will finish the installation here.
Now the root filesystem has to be populated. After trying the
uml-install-suse script from the uml-utilities package, and after trying the
yast2-uml module, I decided instead to use the YaST2 "install into
directory" feature. The target directory must not exist at startup (that's
a crock). So here's what you do:
Create a filesystem as described above; 512 MB will be enough space if you delete a lot of software and don't add big packages like Apache.
Mount the filesystem on /mnt.
Decide what software you want. I was trying for a minimal system.
What SuSE calls "minimal" can be trimmed down even more, but if you
delete everything it's very hard to set up and investigate the system, and
some packages drag in surprising (and ridiculous) dependencies. Here's the
list I finally settled on (in alphabetical order). If your project
requires additional software, such as Apache, add it to the list.
ps command to list running processes.
fuser command and friends.
vi editor clone. A reasonable alternative would be uemacs. Gnu Emacs is much too bloated for a minimal system.
Run YaST2. Under "software", make sure the source of installation
is right. Select "Install into Directory". Installation options:
Installation Summary. Mark the "autocheck" checkbox at the bottom. Now de-select everything (use the minus key) except the packages listed above and/or specifically required for the project. Initially, all packages are marked with a checkmark and a white triangle (selected in the basic list). Dependent packages won't go away; they change to a checkmark and a black triangle. If the package that requires them is later killed, the dependent package will also be deselected. There are dire warnings when you try to deselect a number of the packages, but they are dependent and stay selected.
Let 'er rip. Remember to get kernel-um onto the guest by an alternate route if you're installing from media that doesn't have it. Installation of the above minimal set of software will take only about 5 minutes and will use up 302 MB, leaving 180 MB for payload in a 512 MB (gross) filesystem.
mv /mnt/gorf/* /mnt ; rmdir /mnt/gorf
After the installation, edit or replace important files in /etc and /root. If you have to do the job repeatedly (e.g. if you make mistakes), you will find it convenient to make a directory, let's call it preset/, with subdirectories etc and root which contain the files listed below. Then copy them all at once thusly:
(cd preset ; tar cf - .) | (cd /mnt ; tar xpf -)
Important files that I had to edit were:
linux.site.
Discs), you will want it in fstab.
Device | Mntpt | Type | Options | Freq | Pass |
---|---|---|---|---|---|
/dev/ubda | / | ext3 | defaults | 1 | 1 |
proc | /proc | proc | defaults | 0 | 0 |
devpts | /dev/pts | devpts | defaults | 0 | 0 |
/dev/ubdb | swap | swap | defaults | 0 | 0 |
hwclock to read the CMOS clock, and since there is no underlying hardware, it complains. You could jigger the script so it just exits, or add a harmless lie to etc/sysconfig/clock (q.v.)
exec $command -d -f, where command = halt or reboot. At least with SuSE's UML kernel build, if you run UML as root, this will do the reboot(2) system call on the UML kernel, which will call the BIOS to halt or reboot the machine -- the host machine. Until this is fixed, if you might ever run UML as root, you should edit /etc/init.d/halt (the last command) to print a message and sleep forever. Then use the management console to give the "halt" command.
S0:12345:respawn:/sbin/mingetty --noclear ttyS0 xterm. Specify --noclear so the last page of console messages remains visible in the xterm. If you aren't using the virtual consoles you should also comment out their gettys; otherwise init will try to start a getty on each one and it will fail, producing a load spike and a swarm of error messages every five minutes. In short, adjust inittab to match the console and serial line arrangement that you are going to specify on the kernel command line.
export LD_ASSUME_KERNEL=2.4.21. With the default SuSE /etc/profile, it will be sourced when root logs in. If you use (t)csh on your UML, make an analogous uml.csh file that does setenv LD_ASSUME_KERNEL 2.4.21. The reason for this is discussed under "Gotcha: Thread Local Storage".
ttyS0 so root can log in to it.
HOSTTYPE=s390. The IBM System/390 architecture also lacks a CMOS clock, and the whole boot.clock script is bypassed neatly.
chroot /mnt
insserv -d /etc/init.d/boot.proc   # Etc, give full path
insserv -d /etc/init.d/network     # Do in approximate startup order
exit                               # To exit from chroot
#!/bin/ash
case $1 in
    net | usb ) : ;;
    * )
        export UDEV_NO_SLEEP=1
        export UDEV_NO_DEVD=1
        ;;
esac
exec /sbin/udev.static $*
Don't forget, unmount the filesystem now.
Assuming you want COW (copy on write), create your COW file: uml_mkcow
root.cow root.img (i.e. the new COW file, then the readonly instance which
it overlays). You can let the kernel do this automatically when first
booting, but it's neater to do it explicitly. Do the needed chown and/or
chmod so the user to execute UML can write on the COW file. The COW file
is sparse: do ls -sl root.cow and you will find that it is slightly
longer (by addresses) than the base file, but occupies only 20 blocks of
512 bytes (when first created). To use, give a kernel parameter of
ubda=cow,readonly, i.e. ubda=root.cow,root.img.
If the type of your root filesystem is other than ext2, e.g. ext3 or Reiser, you will need an initrd. As a matter of fact, I've tried booting the SuSE UML kernel without an initrd and it was not able to fall back to ext2 as a real machine would; I never discovered why. So it would appear that you always need an initrd. Here is the command line; you need a symlink to the UML kernel to be in the current directory for this to work. It must be done as root.
ln -s /boot/linux ./linux
mkinitrd -b $PWD -k linux -i initrd.img -m ext3 -d /dev/ubda
Command line arguments:
After your UML has modified the filesystem, you can propagate those changes (if you believe they're not going to have to be reverted) back to the readonly copy with one or the other of these commands:
uml_moo root.cow root.new #(makes a new copy of the root)
uml_moo -d root.cow #(overwrites the original)
In both cases, uml_moo knows the filename of the original root from the header in the COW file. The second form is a lot faster and doesn't require disc space for two copies, but if anything goes wrong, the original is half updated and half not, i.e. it is destroyed or seriously damaged.
uml_moo does not reinitialize the COW file. To reclaim the disc space, delete the COW file and create it again as in the previous step. It doesn't hurt to not reinitialize the COW file (because the COW data is equal to the readonly data after uml_moo, unless later modified), but it wastes disc space.
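The merge-then-reclaim sequence just described can be sketched as a small helper that only prints the commands, so they can be reviewed before anything is overwritten. This is a minimal sketch: the function name cow_merge_commands is hypothetical, and it assumes file names without spaces.

```shell
# Print the commands that merge a COW file into its base image in place and
# then recreate an empty COW file, as described in the text above.
# cow_merge_commands is a hypothetical helper, not part of uml-utilities.
#   $1 = COW file, $2 = readonly base image
cow_merge_commands() {
    cow=$1 base=$2
    cat <<EOF
uml_moo -d $cow
rm -f $cow
uml_mkcow $cow $base
EOF
}
```

One way to use it is cow_merge_commands root.cow root.img | sh -e, with the guest shut down; the -e makes the shell stop before deleting the COW file if uml_moo fails. Since uml_moo -d overwrites the original, taking a backup of root.img first is prudent.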
Now the root filesystem is set up, but you also need to arrange for networking (if used), and you need to set up console access. These will be reflected in the kernel command line parameters, and there are several miscellaneous parameters that will be necessary.
A key decision on the network is whether you will have a fixed IP address, or whether you will have to use DHCP. If DHCP, you will need to create an Ethernet bridge on the host so the guest's broadcasts can reach the DHCP server. I bypassed this complication by using a fixed IP address. If you need bridging, see Rusty Russell's HOWTO for a good description.
There are several ways to handle networking. In the descriptions below it's assumed that you are configuring the UML's eth0, but you can use any eth number and you can have arbitrarily many differently configured ethernet interfaces. You will also have to configure your host's and guest's firewall; this is discussed briefly later. These are the currently supported networking transport types:
Packets are injected directly to the host's packet bus.
This is by far the simplest and the highest in performance. A
forerunner of TUN/TAP in kernel 2.2.x was called "Ethertap".
On the kernel command line you will specify
eth0=tuntap,,,ipaddr, e.g. eth0=tuntap,,,192.168.1.1.
The IP address is for the host side of the tap, and usually
is the assigned IP address of the host. If that is inconvenient,
an arbitrary address works just as well, provided it is not
the IP address of the UML guest. The omitted parameters are the
name of the tap device (you're letting UML obtain one) and its MAC
address (some random number).
It relies on a setUID helper uml_net to set up the tap device.
Note: for security reasons SuSE does not install it setUID;
you need to do it yourself and add an entry to /etc/permissions.local
to make sure the setUID bit stays set. On a public server, a hostile
user can assign any IP address he wants so as to make his UML
impersonate other machines such as your authentication server. If your
security policy forbids such a possibility, root can preconfigure the
tap device, in which case you would put on the kernel command line
eth0=tuntap,tap0 (substituting the tap device that you got).
See the HOWTO for details of how to pre-configure the tap device.
Another note: if you use udev, it has a compulsion to run
the if-up (interface initialization) script on /dev/net/tun, even
though it is not a configured (or configurable) interface, and it takes
fully 12 seconds, so the helper will time out. So root should load the
tun module ahead of time, for example at boot: add "tun"
to the space-separated list in MODULES_LOADED_ON_BOOT in the host's
/etc/sysconfig/kernel.
The tuntap transport works reliably provided you do these items:
Configuration issues: If the guest is going to communicate off-site, e.g. for DNS, NIS or web access, or for offering services, someone will need to configure the host to act as a router. /usr/bin/uml_net takes care of this and the proxy_arp item, but if you preconfigure the tap device you need to do these yourself:
echo 1 > /proc/sys/net/ipv4/ip_forward
If systems other than the host need to be able to originate connections to the guest, the easiest way is to turn on proxy ARP. If the tap device is tap0:
echo 1 > /proc/sys/net/ipv4/conf/tap0/proxy_arp
Then assign the guest a unique IP address on the same subnet as the host, and the host will attract guest packets to itself when outsiders send an ARP broadcast. But a networking purist would want you to assign the UML guest an IP address in a separate subnet, and to set up proper routes through the host on all machines needing access.
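For the preconfigured-tap case, the whole host-side setup might look roughly like the sketch below. It assumes the tunctl utility from uml-utilities (with its -u and -t options) plus ifconfig and route on the host; the helper name tap_setup_commands, the user name and the addresses are all hypothetical. The function only prints the commands, so root can inspect them first.

```shell
# Print the commands root would run to preconfigure a tap device for one guest.
# tap_setup_commands is a hypothetical helper; arguments:
#   $1 = user who will execute UML   $2 = tap device name
#   $3 = IP address for the host side of the tap   $4 = guest IP address
tap_setup_commands() {
    user=$1 tap=$2 host_ip=$3 guest_ip=$4
    cat <<EOF
tunctl -u $user -t $tap
ifconfig $tap $host_ip up
echo 1 > /proc/sys/net/ipv4/ip_forward
route add -host $guest_ip dev $tap
echo 1 > /proc/sys/net/ipv4/conf/$tap/proxy_arp
EOF
}
```

For example, tap_setup_commands umluser tap0 192.168.1.1 192.168.1.2 | sh (as root) would be followed by eth0=tuntap,tap0 on the guest's kernel command line. Drop the proxy_arp line if outsiders never need to originate connections to the guest.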
An alternative to proxy ARP is to use the masquerade feature of iptables, for connections originated by the guest, or DNAT for connections from outside to the host, which are to be handed off to the guest. In any case, the host's firewall will need to allow through traffic (on selected ports that are actually important to your mission) from the guest to the outside world, and vice versa if you're offering net services, neither of which it would have needed to do before UML.
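As an illustration of the masquerade and DNAT alternative, here is a minimal sketch of the kind of iptables rules involved. The helper name nat_rule_commands, the interface name and the port are assumptions for the example, and a real firewall (such as SuSEfirewall2) will have its own way of expressing these rules. The function prints the rules instead of applying them.

```shell
# Print iptables rules for a guest that sits behind the host.
# nat_rule_commands is a hypothetical helper; arguments:
#   $1 = guest IP   $2 = host's outside interface   $3 = TCP port handed to the guest
nat_rule_commands() {
    guest_ip=$1 ext_if=$2 port=$3
    cat <<EOF
iptables -t nat -A POSTROUTING -s $guest_ip -o $ext_if -j MASQUERADE
iptables -t nat -A PREROUTING -i $ext_if -p tcp --dport $port -j DNAT --to-destination $guest_ip:$port
iptables -A FORWARD -s $guest_ip -o $ext_if -j ACCEPT
iptables -A FORWARD -d $guest_ip -p tcp --dport $port -j ACCEPT
iptables -A FORWARD -d $guest_ip -m state --state ESTABLISHED,RELATED -j ACCEPT
EOF
}
```

The first rule masquerades connections originated by the guest, the second hands incoming connections on the chosen port to the guest, and the FORWARD rules let that traffic through. Running nat_rule_commands 192.168.1.2 eth0 80 shows the rules for a guest offering a web service.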
It uses a userspace helper, not setUID root, to
masquerade IP packets from the UML as if they were coming from the host
system. On the kernel command line specify
eth0=slirp,,/usr/bin/slirp, giving the full path name of the
userspace helper (or wrapper script). If you want to use all the
features (see the HOWTO) you may need to write a simple wrapper script
that execs the helper with the desired command line options.
Unfortunately SuSE doesn't include the man page for slirp and so
you'll have to read it from the web.
Firewall issues are limited to letting in the types of packets important to your mission, but since all packets are coming from and going to the slirp helper on the host, the firewall does not need to take into account the existence of the UML.
Use this to let multiple UMLs on one ethernet
segment talk to each other. But it can't communicate with machines
not so configured: specifically, it can't talk to the host itself.
It's sufficient if each UML has eth0=mcast on the kernel command
line. See the HOWTO for additional possibilities.
For within-host communication, only a very aggressive firewall would even see the packets, but for transport between hosts the host firewall would have to be configured to let the multicast packets through.
This is another way for multiple UMLs on one host to talk to each other. Performance is said to be better than with multicast, and you can optionally tie into a TUN/TAP device for offsite communication (but if you do that, you might as well use TUN/TAP directly). See the HOWTO for details on the switch daemon. There are no firewall issues with the switch daemon.
Using the SLIP protocol into your UML is supported but is obsolete, being more trouble than it's worth.
Special for intrusion detection machines, its purpose is to monitor packet traffic on the host system. Firewall issues are the same as if the sniffer were used directly on the host, i.e. you need to let in the packets that you expect to sniff.
As with networking, you can have multiple virtual terminals (consoles)
and serial lines on your UML, connected to the host by various transports.
"con1" is the device identifier for virtual terminal number one, generally
accessible on the UML as /dev/tty1, and so on for any number. "con"
without the number is used to configure all consoles except those explicitly
listed. Similarly, "ssl" is the generic designator for serial lines,
whereas "ssl0" will refer to /dev/ttyS0. In the examples we'll be
configuring con1. Remember that for most of these to be useful, you need a
getty running on the guest's tty, normally configured in /etc/inittab, and if
root is going to log in, the tty basename (e.g. "ttyS0") must appear
in /etc/securetty.
Unfortunately, in this kernel build the con (console tty) devices are broken. You will need to use serial lines (ssl) exclusively until it is fixed.
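Since serial lines are the working choice here, the serial tty has to be listed in the guest's securetty file before root can log in on it. A minimal sketch of an idempotent way to add it follows; the helper name ensure_securetty is hypothetical, and the file path is passed in so it can point into the mounted image.

```shell
# Append a tty basename to a securetty-style file unless it is already listed.
# ensure_securetty is a hypothetical helper; $1 = file, $2 = tty basename.
ensure_securetty() {
    file=$1 tty=$2
    # grep -x matches whole lines only, so "ttyS0" does not match "ttyS01".
    grep -qx "$tty" "$file" 2>/dev/null || echo "$tty" >> "$file"
}
```

With the root image still mounted on /mnt, this would be called as ensure_securetty /mnt/etc/securetty ttyS0. Running it twice leaves a single entry.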
Here are the available transports:
Xterm will be executed with the appropriate arguments to attach to the UML's console or line. It is assumed that the user executing UML can use X-Windows. This is by far the easiest way to handle the main console.
(Example: con1=port:9000) If you telnet
to that port of the host (even from off-site), you will be connected to
the guest UML's console or serial line. For this to work, the
telnet-server package must be installed on the guest, and you
should not have a getty running on the guest's tty. There is
no requirement that the network be configured on the guest. You're
allowed to attach multiple devices to the same port, like all of them:
ssl=port:9000. Each connection will get a separate telnet
session and tty until all devices are in use, after which connections
will hang. Remember that telnet is not encrypted and plain-text
passwords will be visible if it is used across the net. Ssh is more
secure.
You can direct particular devices to the file descriptors of the host's UML process. The assignment shown here is present by default, and con0 is used as the main console unless overridden. For all transports you can list the input, comma, then the output device, but that's useful only for fd transport.
The device is directed to /dev/null, i.e. output is thrown out and it blocks if you try to read it.
The device disappears completely.
With devfs or udev the device inode is not even created; with a
conventional /dev the device cannot be opened. You should not
configure a getty on a "none" device.
UML will allocate an available pseudo-tty (/dev/ttyxx in the first case, or /dev/pts/n in the second, depending on which the host is configured to support). The kernel messages at boot will announce which pseudo-tty goes with which console or serial line. You then attach to it with a terminal program such as minicom or kermit.
UML will attach to the designated real tty. This is most useful when attaching to one of the host's virtual terminals (that doesn't already have a host's getty on it). You can connect to the slave (tty) end of a pty/tty pair provided a terminal program is already connected to the master.
My project involves a lot of use of YaST2 to create installation scripts.
One way to run it is to slogin -X name_of_uml (as root), assuming you
have installed the yast2-qt package and that you have set up openssh to allow
root to get in. If you prefer to use a text console for reasons of security
or UML disc space, the different console and serial line transports have
more or fewer limitations. Let's discuss how to control YaST2 in text mode,
using the help function on the start screen as an example. Its label has "H"
highlighted as a hotkey, so various key combinations involving "H"
will activate it.
For all transports, escape-H and F1 are recognized, or if all else fails, hit tab or shift-tab enough times until the label is highlighted and hit return. On the host console with gpm (mouse daemon) running, you can click on a label with the mouse, but as far as I can tell, none of the UML transports have access to the mouse. On the host console and when a UML serial line is attached to a host virtual console (e.g. ssl1=tty:/dev/tty8), alt-H will be recognized, but alt is useless for other transports; also for slogin without -X or a local xterm. (con1=tty could not be tested; probably it will also recognize alt.)
The discs of UML are referred to as "ubd" (User Block Device) followed
by a disc letter and possibly a partition number. On the kernel command line
you would typically write ubda=root.img, designating the first disc and
giving the name of the root image file, or ubdb=swap.img to put the swap
image on the second disc. If you are doing copy on write, you list the COW
file first, e.g. ubda=root.cow,root.img. In the guest you would mount
/dev/ubda on the root.
In principle you can put your image on a real disc partition or a whole hard disc, but this is rarely done.
There is currently some confusion about the device names. Formerly ubd0 to ubd7 were preferred, but the driver now advertises ubda to ubdh, which will be the names created by udev. The kernel recognizes either name set on the command line, but uml_mconsole only recognizes the ones with numbers. The default SuSE /dev provides ubda through ubdd.
The kernel will complain that the block device has a foreign
partition table, and will proceed to treat the whole file as a single
partition. However, you can create multiple partitions on your disc,
same as on a real one, after which you can mount /dev/ubda1,
/dev/ubda2, etc. If you use the YaST2 partitioner (System category)
from within the guest, it will signal the driver to re-read the
partition table, something will create the device inodes (udev? yast?),
yast will create the mount points you specify, add them to /etc/fstab,
and mount the new partitions on them.
Note that even if you have udev, you will need to pre-create the /dev inodes for your root and swap devices, and the /dev/root symlink, since SuSE's /etc/init.d/boot.udev runs after swap is turned on and the root fsck occurs. (It's feasible to run it first thing if you mount a RAM-based filesystem over /dev, like devfs did, but SuSE doesn't get that aggressive.) udev will create inodes for only the devices listed on the kernel command line. If you do need to create the inodes by hand, the major number is 98 and the minor number is 16*disc number plus the partition number if any:
mknod /mnt/dev/ubda b 98 0
mknod /mnt/dev/ubda2 b 98 2
mknod /mnt/dev/ubdb b 98 16
etc.
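The minor-number rule above (16 times the disc number, plus the partition number) is easy to get wrong by hand, so here is a minimal sketch of a helper that computes it from the disc letter. The name ubd_minor is hypothetical.

```shell
# Print the minor number for a UML block device, following the rule above:
# minor = 16 * disc number + partition number (0 means the whole disc).
# ubd_minor is a hypothetical helper name, not part of uml-utilities.
#   $1 = disc letter (a, b, c, ...)   $2 = partition number (optional)
ubd_minor() (
    letter=$1
    part=${2:-0}
    code=$(printf '%d' "'$letter")   # numeric character code of the letter
    disc=$((code - 97))              # 'a' is 97 in ASCII, so a=0, b=1, ...
    echo $((16 * disc + part))
)
```

It can be dropped straight into the mknod call, e.g. mknod /mnt/dev/ubdb b 98 "$(ubd_minor b)".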
If you append "r" to the device name, e.g. ubdar, the
disc becomes readonly. If you append "s", writing is
synchronous, i.e. the file is opened with O_SYNC and the guest will
block until the data is actually sent to the underlying hardware:
taking a big performance hit for some protection if the host crashes.
You specify r or s on the kernel command line only; everywhere else
you give the unmodified device name.
Here is how to directly access one host directory on the guest.
This happens as the user running UML. Although ls reports the owner
and mode of files same as if run on the host, suggesting that root can write
on lots of files, you will get permission denied.
It is possible to make a hostfs directory the guest's root and boot from it. For details see the HOWTO.
In your kernel command line you will need some miscellaneous parameters:
It's very helpful to give the UML guest a name, which you can use with the management console. The UML's hostname is easy to remember. If this parameter is missing, a default of random letters will be provided.
Unless you've configured it properly, you should make sure selinux (the high security feature) is turned off, at least on a SuSE kernel.
Set this to 1 to see all kernel messages on stderr immediately (for kernel debugging). But if this is your only console, init will fail to recognize the console as being a tty and will give nasty messages. It's supposed to be off by default, but evidently SuSE turned it on.
The amount of simulated physical memory on the guest, in bytes, but units of k or m may be appended.
It's recommended to stick the RAM image in a tmpfs filesystem. This
is a ramdisc that can grow or shrink as needed and which the host can
push off to its swap space, whereas if the RAM image is put on a file
in /tmp, the host has a compulsion to keep the disc file up to date,
which detracts from performance. /dev/shm (a directory, not a device,
even though bogusly in /dev) is a good candidate on SuSE 9.2. To make
this happen, export TMPDIR=/dev/shm
when starting UML.
Tmpfs is bounded in size; the default is half of physical RAM. If
the UML RAM image is bigger than this, you need to mount (or remount)
/dev/shm with a bigger limit (in bytes):
mount -t tmpfs -o size=268435456,mode=1777 tmpfs /dev/shm
or the equivalent in fstab.
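As a minimal sketch of that arithmetic (the function names are made up for illustration), the following converts a size in megabytes to the byte count that the size= option expects. It prints a remount command rather than running it, since the real thing needs root on the host.

```shell
#!/bin/sh
# Convert megabytes to bytes for tmpfs's size= option.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print (not run) a remount command giving /dev/shm room for the given
# number of megabytes. Run the printed command as root on the host.
shm_remount_cmd() {
    echo "mount -o remount,size=$(mb_to_bytes "$1"),mode=1777 /dev/shm"
}

# Example: room for a 256 MB RAM image.
shm_remount_cmd 256
```

For 256 MB this yields size=268435456. Remember that /dev/shm may have other users besides UML, so leave some headroom beyond the guest's mem= value.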
If the guest(s) use more memory than is on the host, the host's swap space will be used, same as with any other process, and performance will go way down. But if the memory is not actually used, there is no performance penalty. If the guest has too little memory for the processes it is running, it will use its own swap file (if provided), again with a performance penalty. Think about swap issues when allocating tasks to UMLs.
If UML needs over 256 MB of RAM, take the memory size in bytes and divide by 4096, rounding up. As root, echo that number (or more) into /proc/sys/vm/max_map_count, whose default is 65536 (2^16), which is exactly 256 MB worth of 4096-byte pages. This limit is a denial-of-service resistance feature, which unfortunately interacts badly with how UML manages memory. The map limit is per process, i.e. separate for each guest.
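The division can be sketched in shell arithmetic as follows. The function name is hypothetical, and the final write into /proc is left as a comment because it has to be done as root on the host.

```shell
#!/bin/sh
# Number of 4096-byte pages needed to hold the given number of bytes,
# rounded up.
pages_for_bytes() {
    echo $(( ($1 + 4095) / 4096 ))
}

# Example: a guest started with mem=512m (536870912 bytes).
needed=$(pages_for_bytes 536870912)
echo "max_map_count should be at least $needed"

# As root on the host:
#   echo $needed > /proc/sys/vm/max_map_count
```

Adding 4095 before the integer division is what does the rounding up, so a size that is not a whole number of pages still gets enough map entries.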
initrd=: Specifies the filename of a compressed initrd image. I was not able to boot without an initrd.
root=: Which device has the root filesystem. The default is hardwired as /dev/ubda.
console=: Which device is the main console.
The default is the first con device listed on the command line.
You may put the console on a serial line. You may have several
console= parameters to send console messages to multiple devices.
It's common to say stderr=1 console=stderr
in addition to
your other console assignment, to get kernel messages on stderr
of the UML kernel itself.
To get the kernel to print all the command line parameters.
A multi-threaded process like UML needs someplace to put its writeable static data, separate for each thread. Different kernels have different styles of mapping this memory. The 2.6.x kernel allocates one of the memory management registers for this purpose. But the UML kernel has no access to the underlying hardware. Libc is able to fall back to older styles if necessary, but the decision is made per the kernel version, not on whether the kernel can actually get the required register. (This is a crock.)
Therefore, when you run UML on a 2.6 kernel it will start the boot process, abuse the TLS, and give itself a mysterious stop signal. To make the library fall back to a feasible TLS style, do this:
export LD_ASSUME_KERNEL=2.4.1
(SuSE uses 2.4.21 TLS style, which seems to be effective.)
Also, once you've logged in to UML you need to do the same thing; otherwise any system utilities that are multi-threaded will segfault: network utilities in particular. In the section on configuration files a suggestion was made to set this in a file in /etc/profile.d.
At last we're ready to start the UML machine. Here is the script that I'm using (minus comments):
#!/bin/sh
export LD_ASSUME_KERNEL=2.4.1
export TMPDIR=/dev/shm
exec /boot/linux \
    umid=petra \
    selinux=0 \
    mem=256m \
    initrd=initrd.img \
    ubda=root.cow.img,root.img \
    ubdb=swap.img \
    eth0=tuntap,,FE:FD:C0:09:C8:C3,192.9.200.195 \
    stderr=1 \
    console=stderr \
    con=none \
    ssl=none \
    ssl0=xterm \
    console=ttyS0
You need to substitute your own values for several parameters: the umid (instance name), memory size, image filenames, and the tap's ether and IP addresses.
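If you run several guests, one way to handle the substitution is to generate the argument list from the per-guest values. This is only a sketch: the function name uml_args is made up, and the image filenames are simply the ones from my script, which you would also turn into parameters if they differ per guest.

```shell
#!/bin/sh
# Print the UML kernel arguments for one guest, one per line.
# Arguments: umid, memory size, tap MAC address, tap IP address.
uml_args() {
    umid=$1 mem=$2 mac=$3 ip=$4
    printf '%s\n' \
        "umid=$umid" \
        "selinux=0" \
        "mem=$mem" \
        "initrd=initrd.img" \
        "ubda=root.cow.img,root.img" \
        "ubdb=swap.img" \
        "eth0=tuntap,,$mac,$ip" \
        "stderr=1" \
        "console=stderr" \
        "con=none" \
        "ssl=none" \
        "ssl0=xterm" \
        "console=ttyS0"
}

# Show what would be passed to /boot/linux for the guest in my script.
uml_args petra 256m FE:FD:C0:09:C8:C3 192.9.200.195
```

To launch instead of print, something like exec /boot/linux $(uml_args ...) should work, since none of the values contain spaces. You still need to export LD_ASSUME_KERNEL and TMPDIR first.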
On my 1 GHz laptop, with this configuration, UML writes boot messages on stderr (because of the stderr=1 kernel parameter), and within a second or two it starts an xterm as the main console, which also shows the boot messages. The advantage of stderr=1 is that you still see the boot messages if the xterm cannot be started. Since I am using udev, it churns for about 25 secs creating devices for the initrd, and then takes almost 60 secs to create more devices in the real root. The latter step takes about 300 secs if the udev rules allow all registered devices to be created, including 550 totally useless pseudo-ttys. The initrd always uses udev, but if the main system doesn't use it, the latter delay will not happen. If, however, you replace /sbin/udev as detailed previously, device creation in the real root takes five seconds flat.
Watch the kernel messages for signs of problems. If the console xterm fails to start, watch the system load; when it drops, the boot process is over and you may (or may not) be able to get in using slogin and/or the management console.
You can now log in on the console xterm (as any user with a password in the guest's /etc/shadow or network authentication), or you can use slogin to connect if you configured it. If the console driver were not broken and if you had a getty on tty0, you could also log in to stdin-stdout of the UML process itself. This document of course does not cover how to accomplish useful work on your UML machine: basically, treat it like a real machine.
The correct way to shut down UML is to do shutdown -h now
or
/sbin/halt
or telinit 0
(all nearly equivalent), or to give the
cad
(control-alt-delete) command from the management console, which
signals init to do the ctrlaltdel action, which on a SuSE system runs shutdown.
All of these end with a UML kernel panic, which is disconcerting but is not
really harmful if it happens after the discs have been unmounted.
If you just kill it from the outside, it's like pulling the plug on a real
machine: the disc will not be synced and data likely will be lost.
For UML there is no difference between reboot and halt; both kill the UML process. Basically the same is true for a real machine: the kernel terminates by jumping into the BIOS for the reboot.
UML catches a number of signals, but none of them terminate it gracefully. I tested SIGTERM, SIGINT and SIGUSR1 (didn't test USR2). All of them produce a UML kernel panic. With INT, UML just exits after the panic. With TERM it gets a segfault. With USR1 it spews random bytes on stderr before dying. I tested each signal only once, and the behaviors might vary randomly according to data in memory.
When UML starts up, it deposits in $HOME/.uml/$UMID (the latter being the
value of the kernel parameter umid) a PID file and a socket. If you execute
uml_mconsole $UMID
you can give commands over this socket.
sysrq: like the magic sysrq key on a real machine, e.g. to sync the discs, unmount everything, or reboot, except that the provided kernel lacks sysrq (unlike the SuSE kernel for the host).
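A clean shutdown from the host can be scripted around the cad command. The sketch below makes some assumptions: that uml_mconsole accepts the command as an argument after the umid, and that the PID file in $HOME/.uml/$UMID is named pid. The function names and the MCONSOLE override (handy for a dry run) are my own invention.

```shell
#!/bin/sh
# Program used to talk to the guest; override with MCONSOLE=echo for a dry run.
MCONSOLE=${MCONSOLE:-uml_mconsole}

# True if the guest's PID file exists and names a live process.
# The file name "pid" is an assumption.
uml_running() {
    pidfile="$HOME/.uml/$1/pid"
    [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Send cad to the guest, then wait for the UML process to go away.
# Arguments: umid, timeout in seconds (default 120).
# Returns 0 if the guest is gone, nonzero otherwise.
uml_shutdown() {
    umid=$1
    timeout=${2:-120}
    "$MCONSOLE" "$umid" cad || return 1
    while uml_running "$umid" && [ "$timeout" -gt 0 ]; do
        sleep 1
        timeout=$((timeout - 1))
    done
    ! uml_running "$umid"
}

# Usage: uml_shutdown petra 180 || echo "petra is still running" >&2
```

Only if this times out would I consider killing the process from outside, with the data loss that implies.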
arch/um/kernel/reboot.c -- whatever it's doing, the UML kernel panic when you halt should not be happening. The i386 reboot.c seems to also be in the kernel, and if UML is run as root and if this syscall is performed (by /sbin/halt), control actually passes to the BIOS to halt or reboot the host. (So don't do that!) The UML version of reboot.c ought to completely replace the i386 version.
I see that quite a number of signals are being caught, but when you give those signals to a running UML process (from the host), a UML kernel panic ensues, followed in some cases by a segfault or by lots of garbage spewed on stderr. SIGTERM, at least, should do something useful such as sync, unmount and reboot, as if those had been given through the sysrq key.
It should be possible to map I/O on a guest device to I/O on
the same host device, within reasonable limits. This came up for
/dev/random -- since the guest has little entropy production and no
access to the hardware random generator (if on the motherboard), crypto
key generation was hanging for a long time. Jeff Dike suggested that
someone
should implement a pass-through driver that could do this.
The append
option of hostfs is documented, but I'm surprised
that readonly
is not also implemented.
I'm a firm believer in udev, which is why I'm putting it in my real and UML machines before my distro does, but the UML application apparently shows udev in its worst light. In SuSE 9.1 and 9.2 the initrd invariably uses udev to obtain an inode for the real root device, which in important cases may be on a RAID device that ought to be modular in a distro kernel.
First, udev runs incredibly slowly: about 12 seconds per block device, 5 seconds per tty, and 2 seconds per mem device. Here are the overall times (in seconds) to initialize /dev in different conditions, pre-excluding unwanted inodes.
| System | Initrd | Main system | Inodes created |
|---|---|---|---|
| Host | 7 sec | 12 sec | 74 (pruned) |
| UML Guest | 25 sec | 55 sec | 60 (pruned) |
| UML Guest | 25 sec | 300 sec | 636 (all) |
Strace reveals that the 2 seconds are spent sleeping, waiting for files that aren't available for the particular device type, whereas the longer times are spent running scripts in /etc/dev.d which, for the devices involved, ultimately accomplish nothing. Earlier I showed a wrapper script for /sbin/udev that sets UDEV_NO_DEVD and UDEV_NO_SLEEP. It would be a lot less brutal if the program could somehow be smarter about recognizing when these actions were actually going to be useful.
Udev goes to a lot of work to populate /dev on the initrd with all devices known to drivers hardwired in the kernel or loaded in the initrd, including 512 pty-ttys which the initrd can't possibly use, so it can get just one device inode to mount as the root. Better that it should first load the prespecified drivers, and then translate the root device by searching in /sys -- seemingly a very fast operation -- and then use udev to create just that one inode.
If the full set of device inodes actually has to be created in the initrd, the boot scripts should copy /initrd/dev/* to /dev, and udevstart should then do a lot less work on inodes that already exist than on those that need to be created. GNU tar can save and restore device inodes.
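The copy could be done with a pair of tar processes, roughly as below. This is a sketch of the idea, not something taken from SuSE's boot scripts. The function name is hypothetical, and device inodes are only recreated if it runs as root.

```shell
#!/bin/sh
# Copy a directory tree, including any device inodes in it, by piping
# one tar into another. Arguments: source directory, destination directory.
# As an ordinary user, regular files are copied but device nodes
# cannot be created.
copy_dev_tree() {
    src=$1 dst=$2
    mkdir -p "$dst" &&
        tar -C "$src" -cf - . | tar -C "$dst" -xpf -
}

# In a boot script, as root, the call would be something like:
#   copy_dev_tree /initrd/dev /dev
```

The -p switch on the extracting side keeps the original modes, which matters for device inodes whose permissions were set by udev rules.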
In the bad old days of devfs, you would refer to a device inode, and if it did not exist the open would block. Devfs would call modprobe, giving the user-requested filename or an internal equivalent (e.g. block-major-22) as an argument, to look up and load the driver using /etc/modules.conf. Devfsd would then create the device inode, which would then be opened.
Now, if a device gets plugged, specifically when the tun/tap module gets inserted, a long chain of events occurs which takes fully twelve seconds for tun.ko. (It runs the if-up script on /dev/net/tun even though this is not a configurable net interface.) Any application using it, like the initrd's linuxrc, or a UML init script that sets up tun/tap networking, needs to poll the device inode waiting for it to appear.
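The polling can be as simple as the loop below. The function name and the default of 30 tries are arbitrary choices of mine. Given the twelve seconds quoted above for tun.ko, the timeout should be comfortably longer than that.

```shell
#!/bin/sh
# Wait until a device inode (or any file) exists, checking once a second.
# Arguments: path, maximum number of seconds to wait (default 30).
# Returns 0 as soon as the path exists, 1 if it never appears.
wait_for_node() {
    node=$1
    tries=${2:-30}
    while [ ! -e "$node" ]; do
        [ "$tries" -gt 0 ] || return 1
        sleep 1
        tries=$((tries - 1))
    done
    return 0
}

# Usage in an init script, as root:
#   modprobe tun
#   wait_for_node /dev/net/tun 30 || echo "tun device never appeared" >&2
```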
I would be overjoyed if the kernel gurus could make insmod block until module initialization was complete, and specifically until udev had created as many device inodes as its rules allow.
Some drivers recognize minor numbers only for actually existing
minor devices, e.g. specific disc drives and the partitions on them. But
character drivers tend to allocate a fixed large number of minor numbers,
specified when the distro's kernel is compiled.
For example I get eight nvidia devices. Give me a break; I have only one
head on my nVidia Hydra III-Max graphics board. I don't need 16
RAMdiscs, 512 ptys, 63 virtual consoles, etc. But in special cases a user
may need more than the distro gives him. As a gross kludge I have
put in udev rules to prune the minor devices; however, this is
fixing the problem with blunt tools in a place that should not be
involved. Every driver of this type should have a parameter to set the
quantity of minor devices, preferably handled in a generic subroutine
(analogous to tty_io.c) and not coded individually in each driver. Zero
should be a legal quantity, in which case the driver would not be
initialized at all even if hardwired in the kernel or dynamically loaded
during coldplug.
It's a historical crock that various kinds of ttys share major
device 4: virtual consoles, generic serial lines, and the UML
stdio_console. Another command line parameter for ttys should be
minor_start, because the various devices' minor ranges can, and for SuSE
UML, do overlap, with disastrous results. Better that each major device
should have a separate number. There are only eight unassigned major
numbers, but 98% are completely absent from any one host. Sun-Solaris
assigns major numbers sort of on the fly, and having the major
number as a parameter to the driver would make a lot of sense.
Forget /usr/bin/uml-install-suse; encourage users to use the improved yast2-uml.
Often the installation has to be done over repeatedly. Allow the sysop to specify an existing user to own the image file. Let the sysop choose whether to overwrite an existing image from scratch, or to adjust the software content, taking into account a COW file. Use the -U switch of rpm to bypass reinstalling already installed packages.
Let yast2-uml depend on all the packages it's going to use, such as uml-utilities, rather than checking for them at runtime. Include in the documentation of the other packages, a note that the user is encouraged to install yast2-uml, so as to drag in the kernel and all needed utilities, and then to use yast2-uml to set up his UML image.
Speaking of which, uml-utilities needs man pages for the utilities such as uml_mconsole, uml_mkcow, uml_moo and slirp, plus a /usr/share/doc/packages directory.
Somewhere there should be documentation of exactly which version of UML is provided and which patches are applied. The patch archive should actually be in the source RPM, as described in the spec file.
Users, particularly newbies, need a lot of help setting up /etc on the guest during installation. See my long list above of files that needed to be configured. Yast2-uml should make this happen.
Some important applications of UML involve back-version guests. Yast2-uml should be flexible enough to install a back version of SuSE.
There seems to be some confusion about the kernel and the modules. As it is now, you need both the modules and the kernel on both the host and the guest. The modules should end up only in the guest, but the kernel should end up only on the host, under a unique name per version. I suggest putting it in /boot, same as other kernels. If you really need /usr/bin/linux, make it a symlink to /boot/linux which is a symlink to the latest (or most recently installed?) UML kernel.
When the initrd is built (executing on the host), the guest's modules, libraries, config files (/etc/udev/rules.d) and mkinitrd script should be used, so if a back version is installed, everything is self-consistent. But mkinitrd needs to see the kernel to know its version, and the initrd has to be delivered to a directory on the host, not on the guest. Some fancy symlinks will be needed.
The prestuffed initrd (package um-host-install-initrd) is not very useful if you need a module for booting, e.g. if the root filesystem is ext3 or Reiser. You should generate the initrd anew for each installation (perhaps with an option to re-use a customized initrd that is the same for multiple UMLs). Include in yast2-uml's disc form the ability to choose the filesystem type, including swap, and somewhere (options form?) provide a space to specify other miscellaneous modules.
The startup script that SuSE provides seems to use the most difficult networking arrangement, the switch daemon with an auxiliary tap device and an Ethernet bridge, in order to allow the guest to get an IP address with DHCP. Certainly there should be an option to skip the bridge, and I think TUN/TAP (with the setUID uml_net helper to set it up) is much easier for the user. If the setUID involvement is not acceptable, slirp probably would be my second choice. Perhaps the network transport mode should be selectable in yast2-uml.
It's a matter of judgment whether the stderr console driver should be turned on by default. With it off, the first (non-stderr) con device becomes the console by default; with it on, stderr would be the only console, and the user has to explicitly list console=con1 or whatever, but you get the advantage of seeing why UML fails to start, if the console xterm doesn't appear.
Whatever happened to make the minor device range of the real
virtual terminals (4:1 to 4:63) overlap the range of UML stdio_console
(4:0 to 4:15), it's almost certainly SuSE's fault, and badly needs to
be fixed. I'm guessing that the VT's were supposed to have been suppressed
entirely, but weren't.
There are an awful lot of drivers (9.7 MB) provided for UML, but it's not clear how an ordinary user could have the permissions needed to make some of them useful, particularly the RAID drivers.
The UML kernel should have the magic sysrq key configured, same as the kernel for the host.
If the distro can do anything to mitigate the disaster with Thread Local Storage, SuSE should do it. It ought to be possible to suppress v2.6-style TLS with a linker parameter, like LD_RUN_PATH can help with libraries in a nonstandard place, rather than lying about the kernel version in LD_ASSUME_KERNEL at runtime.