I have a laptop (called Xena) with a Wi-Fi network connection, which hosts a virtual machine (called Petra) using libvirtd, qemu and KVM; a second VM is possible. The normal way (for me) to put a VM on the net is to create a bridge on the host, which becomes the host's main NIC, with the wired Ethernet NIC as a member, as well as a virtual network connection from each of the VMs. This does not work when the egress port is Wi-Fi.
Wi-Fi emulation of Ethernet is not complete, and in particular, there are corner cases which don't work. Therefore the Wi-Fi developers have flatly forbidden membership of a Wi-Fi NIC in a bridge or equivalent constructs. Specific problem areas are:
When the Wi-Fi client authenticates to the access point, the Wi-Fi specs interpret the resulting promise as meaning that traffic will be disclosed to or originate from only the entity that authenticated, which is certainly not the case if the client NIC is a member of a bridge. However much we may know about trust relations on our own LAN, the developers have not provided any way to relax this restriction. (The hack to sabotage it is simple.)
Broadcast and particularly multicast packets are emulated poorly in Wi-Fi: they have to be sent individually from the AP to each client, and the AP is not going to take into account multiple clients behind a bridge.
The Wi-Fi packet layer has space for four addresses, two of which are the MAC addresses of the client's Wi-Fi NIC and of the sending station, for use in a mesh or in AP-to-AP forwarding. Normally these two are identical, and (as an optimization?) the sending station's address is normally left blank. The driver can be put into four-address mode, and a four-address NIC will be accepted as a member of a bridge, but it doesn't work well enough to be used in production.
So I'm going to have to put together another network solution for the VM. Here are my requirements, in approximate order of importance.
My net uses both IPv4 and IPv6 (dual stack). Permanent hosts are assigned fixed IPs, which are available by DHCP, and there are DHCP pools for transients (guests). Router advertisements are sent and RFC 4862 addresses get configured in parallel with the fixed IPv6 addresses.
Except for special cases, each host has its own firewall and is safe from infected neighbors on the local LAN. The gateway has additional features to keep wild side peers in their cages.
The guest must be able to originate a (unicast) connection to the KVM host, to other hosts on the local LAN, and to the wild side.
The KVM host and other hosts on the local LAN must be able to originate a (unicast) connection to the guest. Peers on the wild side must also be able to do so, if they would have been able to reach a non-virtual host; this would generally involve a VPN.
The guest should be able to use broadcast and multicast services such as DHCP and mDNS, including when the host is on foreign LANs.
The guest's networking should function when the KVM host is on the local net, and also when it is on a foreign (wild side) net.
Access to the local net from the guest on the wild side should follow the same security and authentication requirements on the guest as on the host.
The KVM host uses NetworkManager, and I want to keep it because of its flexibility and nice GUI support. The guest uses whatever the rest of the hosts use, currently Wicked. I need the guest to be flexible and to have an environment that looks normal, so I can test new network infrastructure such as systemd-networkd.
When the KVM host is on the wild side and has a VPN to home, as it usually will, it would be a nice feature if the guest had automatic access to the local LAN, but this is not required.
I tried several networking schemes without success, detailed below. Now I'm reverting to an earlier concept: Xena as a router.
KVM creates on the VM host a network interface called vnet0 which bridges packets to or from the guest's eth0. In a simple setup the number 0 is fairly consistent, but it can vary randomly at boot time, and you need a setup procedure that works with any interface name.
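For example, a startup script can discover the actual interface name rather than hard-coding vnet0. A minimal sketch, assuming a guest named petra with a single NIC:

virsh domiflist petra
VNET=$(virsh domiflist petra | awk 'NR>2 && NF {print $1; exit}')   # first interface name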
KVM and libvirt have several styles to give network access to the guests.
One is bridged networking; this is what I normally use on non-wireless VM hosts. It actually basically works without a bridge, but libvirt is going to be a lot happier if you create, before starting the guest, the bridge that your VM XML file says vnet0 should be put into.
Off is the default.
brctl is deprecated and its facilities have been added to the ip
command from the iproute2 package.
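For reference, the rough correspondence between the old and new commands, as I understand it:

brctl addbr br0                    # old
ip link add name br0 type bridge   # new
brctl addif br0 vnet0              # old
ip link set dev vnet0 master br0   # new
brctl show                         # old
bridge link show                   # new (the bridge command, also from iproute2)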
NetworkManager has a connection type of bridge, and in the GUI you can configure the bridge to be created at boot. Gotcha: if NetworkManager gets restarted, vnet0 will be evicted from the bridge and will not be reinstated automatically; you will need to add it by hand using one of the above commands.
Normal networking is designed to work with subnets, not individual hosts. I was able to get networking mostly working with an independent vnet0, not in a bridge, but the results are a lot cleaner if I create the bridge.
At present, CouchNet has these subnets. The default gateway for all of them is Jacinth, except as noted.
To this collection I'm going to add:
Do this to activate forwarding. The equivalents of these commands could go in /etc/sysctl.conf or /etc/sysctl.d/01-whatever.conf. When changed, the forwarding switches reset most per-interface parameters to their defaults, so they should be executed early; sysctl.d fragments are executed in lexical order.
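The usual switches, as runtime commands:

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1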
The bridge on the VM host (or the naked vnet0 device) needs these addresses and routes. Although it's legal to use the same IP address for both the bridge and the egress interface (wlan0), I have found that fewer strange things happen and it's less confusing if the bridge has its own IP address (called xenavm). IPv4 is shown but these need to be duplicated for IPv6.
Add the bridge's address with noprefixroute; then make the above route explicitly. If you gave the bridge its own subnet and its own address therein, all that would happen automatically.
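A sketch of what I mean, with hypothetical addresses (192.168.1.9 standing in for xenavm and 192.168.1.10 for the guest):

ip addr add 192.168.1.9/24 dev br0 noprefixroute   # no automatic subnet route
ip route add 192.168.1.10/32 dev br0               # explicit route to the guest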
Xena, my VM host, uses NetworkManager, and I created a NM connection which
creates the bridge automatically with these parameters as fixed IP addresses.
One minor detail, I want to use a fixed MAC address for the bridge, because
the guest's firewall requires that its peer's MAC be registered as trusted.
Also I want to be able to recognize bridge traffic
when seeing tcpdump output or error messages that contain the EUI-64.
As a local convention, for my
VMs I use 52:54:00, which is the assigned OUI (MAC range) for KVM, followed by
the last 3 octets of the interface's fixed IPv4 address. I'm using the same
convention for the bridge. The NetworkManager GUI for Edit Connections
does not have a text box for overriding the MAC address. But see this
blog post about making NetworkManager set a fixed or random MAC address
by Thomas Haller (2016-08-26). He shows how to use nmcli
to modify the
connection file (/etc/NetworkManager/system-connections/br0, on Xena), as well
as how to display the available parameters and the connection name.
For example:
nmcli connection modify br0 bridge.mac-address 52:54:00:09:c8:a9
All other hosts on the local LAN need a route via the VM host to the bridge's subnet:
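Something like this, with hypothetical addresses (192.168.2.0/28 for the bridge's subnet, 192.168.1.9 for Xena on the LAN):

ip route add 192.168.2.0/28 via 192.168.1.9
ip -6 route add fd00:2::/64 via fd00:1::9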
The guest needs these addresses and routes. The goal here is to have this machine be as normal as possible. One component of being normal is to have the address and routes appear by DHCP and/or IPv6 Router Discovery.
At this point, bidirectional communication is achieved between the guest and these peers: VM host, LAN gateway, other LAN neighbor, offsite peer. This is on both IPv4 and IPv6. (The IPv6 commands are analogous but are not shown.)
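A quick check from the guest, with hypothetical addresses:

ping -c3 192.168.1.9    # VM host
ping -c3 192.168.1.1    # LAN gateway
ping -c3 8.8.8.8        # offsite peer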
Normal networking does not require a bunch of manual commands every time you start up your VM. I'm setting up dnsmasq and radvd to provide the required address and routes by DHCPv4, DHCPv6 and IPv6 Router Discovery. Global issues about dnsmasq:
Features for /etc/dnsmasq.d/*.conf (a sketch follows this list):
…guests on this subnet.
…see the Catch-22 above for the reason.
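The sort of fragment I mean, with made-up names and addresses throughout:

# /etc/dnsmasq.d/vmnet.conf -- hypothetical values
interface=br0
dhcp-range=192.168.2.2,192.168.2.14,12h           # pool for transient guests
dhcp-host=52:54:00:09:c8:aa,petra,192.168.2.10    # fixed address for Petra
enable-ra                                         # send IPv6 router advertisements
dhcp-range=::,constructor:br0,ra-names            # IPv6 range from br0's prefix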
The hosts on the local LAN need to be told to send traffic for the VM guest (Petra) via the VM host (Xena). Dnsmasq on the local LAN's gateway sends out this route for IPv4, and I'm using radvd on the VM host to send it for IPv6. Key features in /etc/radvd.conf (a sketch follows):
…the via parameter of the route.
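A sketch, assuming a hypothetical prefix fd00:2::/64 for the bridge's subnet; in a Router Advertisement the sender is implicitly the via of the route, which is why radvd has to run on the VM host:

interface wlan0
{
    AdvSendAdvert on;
    route fd00:2::/64
    {
        AdvRouteLifetime 1800;
    };
};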
Additional flies in the ointment that had to be fixed:
…nmcli.
…solved.
With those miscellaneous fixes, Petra can configure its network autonomously at boot, and can make and accept the connections listed in the requirements. So this project has come to a successful conclusion, if we're not too picky about various kludges.
Avahi-daemon gives endless trouble, losing its addresses for no obvious reason. I've thought about just not having a mDNS service. But systemd-resolved is a new entry in this area and I'm going to try to get it to work before giving up the whole service. Here's what systemd-resolved does. Most of these sub-services can be turned on or off in the configuration file, /etc/systemd/resolved.conf .
DNS stub resolver: it can't be authoritative for an entire zone, but its main purpose is to keep aware of real DNS servers on the local net and to forward queries to them, or to wild-side DNS servers. Signed responses are validated with DNSSEC; unsigned responses can be rejected or accepted without validation. Responses are cached locally until their TTL expires.
LLMNR: Link Local Multicast Name Resolution. This is a Microsoft-ish protocol (RFC 4795) to elicit A and PTR records from hosts using link-local addresses, possibly exclusively. Systemd-resolved can make and respond to such queries.
Named link local addresses are not a major part of my operation, and to avoid waking sleeping dragons, I have turned off this feature.
Systemd-resolved can respond to queries over dbus, the recommended mode of use.
There is an NSS module for the hosts map in glibc that submits such dbus queries. Glibc subroutines like getaddrinfo and gethostbyname can reformat the results and deliver them to the caller.
On my system this module is in use and works fine.
Systemd-resolved listens on 127.0.0.53 port 53 for unicast DNS queries, and it maintains a file similar to /etc/resolv.conf with this server IP and port. It's recommended (but not required) to make a symlink from /etc/resolv.conf to this file.
Hosts on my net have this link, and the content is delivered reliably.
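On current systemd versions the maintained file is /run/systemd/resolve/stub-resolv.conf, so the link would be made like this:

ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf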
For mDNS (Multicast Domain Name Service), systemd-resolved listens on port 5353. Clients can send to this port on the mDNS multicast addresses 224.0.0.251 and ff02::fb, or unicast to the server's own address, and systemd-resolved will get the queries (verified by strace). Whether it will respond is another matter. Responding can be disabled globally, and also has to be enabled explicitly on each desired interface (e.g. not on the wild side) in /etc/systemd/network/xxx.network . Systemd-networkd has to be running to pass this configuration information to systemd-resolved.
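The configuration amounts to a few lines; the .network file name here is hypothetical, and the matched interface is whichever one faces the local LAN:

# /etc/systemd/resolved.conf
[Resolve]
MulticastDNS=yes

# /etc/systemd/network/lan.network
[Match]
Name=eth0

[Network]
MulticastDNS=yes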
However, I was never able to elicit a mDNS response. (The same tester succeeds with avahi-daemon.) Comments in changelogs suggest that OpenSuSE Tumbleweed (systemd-237) may have mDNS responses disabled but I don't see where (or why) this is done in the spec file.
mDNS is only used to query the zone .local., and .local. is only available via mDNS or by the dbus interface, not by conventional DNS on port 53.
Avahi-daemon and systemd-resolved can coexist, with Avahi (and not systemd-resolved) responding to mDNS queries on port 5353. Systemd-resolved still receives registration information so it can serve the .local. domain from its dbus interface. This is the mode I am operating in, successfully.
Systemd-resolved responds to A, AAAA and PTR queries for local addresses or names. If an address is known locally this information is preferred and a trans-net query is not made. When the information sources change, such as /etc/hosts, systemd-resolved updates its cache. These are the addresses handled as local information:
localhost and localhost.localdomain.
_gateway, which is the target(s) of the default route(s).
On my net, all of this local information is available.
As for 1-component names, on my net they are all in /etc/hosts and systemd-resolved can send that authoritative information. But if they weren't in /etc/hosts, systemd-resolved would resolve them by a multicast LLMNR query, if it were enabled, which it isn't on my net.
Queries for multi-component names and IP addresses (not otherwise known) are forwarded to DNS.
DNS Service Discovery (DNS-SD) records are maintained and delivered by systemd-resolved, for services that register themselves with DNS-SD. Presumably registration happens by a dbus protocol. The DNS-SD records are these, where $SERVICE represents e.g. ssh, $PROTO is the protocol used for that service (normally udp or tcp), and $HOST is the host that provides this service. Examples are shown for ssh/tcp, which I use for (successful) testing because all my hosts provide it.
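For illustration, following the RFC 6763 conventions, the records for ssh/tcp on a host jacinth would look roughly like:

_ssh._tcp.local.              PTR  jacinth._ssh._tcp.local.
jacinth._ssh._tcp.local.      SRV  0 0 22 jacinth.local.
jacinth._ssh._tcp.local.      TXT  ""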
In conclusion, I did not succeed in my main goal of replacing Avahi with systemd-resolved. However, I think I have made progress in cleaning up an area of my network infrastructure that was making a lot of trouble on my VM, that I didn't want to deal with while debugging VM networking.
One possible solution involves proxy ARP. (2009-06-24, OP bodhi zazen.) Remember that ARP (specifically proxy ARP) is only defined for IPv4. Here's a summary of his howto:
<interface type='ethernet'>
  <mac address='52:54:00:19:b2:bf'/>
  <script path='no'/>
  <target dev='tap0'/>
  <model type='rtl8139'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
Configure the guest's network normally, e.g. with yast2 lan. The guest should now be on your net.
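As I understand the howto, the host-side commands amount to something like this (the guest address 192.168.1.10 is hypothetical):

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.conf.all.proxy_arp=1
ip route add 192.168.1.10/32 dev tap0   # host route; the host answers ARP for the guest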
It looks like the proxy ARP method won't fly because IPv6 is impossible.
Another family of solutions involves tunnels. The basic design would go like this:
The emulator uses bridge networking. Its emulated NIC appears on the host as a tap device which libvirt creates, and libvirt also puts it into a bridge on the host. (The guest thinks it was given a normal Ethernet NIC.)
The host has a tunnel endpoint which it has also put into the bridge. Thus packets from the guest are copied down the tunnel, and packets emerging from the tunnel are made available to the guest. There are several varieties of tunnel.
Bearer packets are ordinary IP traffic and leave or enter the host via its default route. There are no quibbles about the Wi-Fi NIC being tangled up with bridges.
The other end of the tunnel is going to be on my gateway. It is part of the bridge which includes the wired Ethernet to the local LAN, as well as the gateway's own VM. Thus all kinds of traffic on the local LAN are manifest to the guest, and all the guest's traffic goes out to the local LAN. (The bridge has optimization rules so packets are omitted if they are irrelevant to a particular bridge member. Basically, the guest gets a packet if it's broadcast, or multicast to a group the guest has registered in, or unicast to the guest's MAC address.)
How the bearer packets get to the other end is irrelevant. The route could be direct to the Wi-Fi access point on the gateway, or to a foreign AP and from there, likely after NAT, through the global Internet to my gateway's wild side interface, or through a VPN from the host to the gateway. All three paths require authentication which is not part of the tunnel mechanism.
Some tunnels provide cryptographic privacy and integrity. I'm assuming that the guest tunnel will provide the same kind of protection (i.e. none) that the host's own traffic receives. For complete protection for the host's traffic as well as guest traffic on the tunnel, the host should use a VPN. Another mode is to use application-level security such as TLS or DNSSEC (integrity only) on both the host and the guest. On my net, generic wild-side packets are admitted only after authentication.
Fly in ointment: Both the gateway and the host, if on the wild side, have dynamic DHCP addresses. Fortunately these change rarely, lasting days to months. But when either one changes the tunnel will be broken and will have to be reestablished. Payload packets may be lost, but this is a fact of life on the Internet, and assuming quick reconnection, the payload datastreams will recover with no fuss.
What kind of tunnel might I want to use? These are some key aspects of the tunnel:
Authentication means that the remote end expects the local end (laptop) to prove who it is (so it can decide if the client is authorized), and the local end expects the gateway to prove who it is, versus some Black Hat in the middle.
On the wild side the traffic generally passes through various equipment and agencies that are not chosen by the user and may not even be ascertainable. Any of them may be, or are already known to be, infested by Black Hats. Privacy means that the Black Hats cannot obtain the payload information being transmitted. (But generally the Black Hats can see the IP addresses of the endpoints; ways of obfuscating these, such as TOR, are out of scope.) Impossible is a relative term, and generally if the Black Hats can crack the encryption using a few billion dollars of equipment working together for a year, that is considered to be adequate privacy.
Integrity means that the tunnel is aware if data arrives that was different from what was sent. This can happen due to noise on the communication line, or software errors, or attempts at fraud. There is a range of strength choices for integrity checksums. Not all of them can resist a competent Black Hat.
On the local LAN I can use whatever protocols I please, but on the wild side many hotel nets are very restrictive, e.g. blocking IPSec, or all use of UDP. For this case my laptop uses OpenVPN on TCP port 443: this is far from ideal, but that port is normally used for HTTPS and if it were blocked the hotel's network would be useless. I plan to choose the protocol freely, but if it's blocked I will use the VPN, same as if application protocols on the host were blocked.
I require authentication, so only the authorized laptop can connect to the gateway's endpoint and so it can be sure that the intended endpoint is being connected to. I'm providing for the guest the same privacy and integrity that the host gets, i.e. none. However, some protocols like SSH cannot turn off those features, and I won't reject them just for that reason.
Here is a promiscuous list of varieties of tunnel, with evaluations.
SSH: while normally it is used as a point-to-point link for a single client session, it can be switched to generic tunneling at layer 2 or 3 (link or network, i.e. IP packets).
I'm familiar with SSH tunneling, it is widely trusted including by me, and I already have the authentication infrastructure (public and private keys) in place. But you can't turn off encryption. SSH remains one of the front runners. But beware of TCP meltdown!
OpenSuSE Tumbleweed is using package openssh-7.7p1 at the time of writing.
SSTP (not to be confused with Simple Symmetric Transport Protocol) runs PPP (point to point protocol) over TLS (Transport Layer Security). Authentication is required for both layers.
Assuming PPP can be made to defer to TLS for authentication, this looks like a possible winner. I already have the X.509 certificates needed for authentication. But beware of TCP meltdown!
There is a package NetworkManager-sstp for OpenSuSE Tumbleweed, community contributed (several instances), but it has a missing dependency (libnm-gtk.so.0()(64bit)) that I can't find.
OpenVPN: on CouchNet it is normally used at the network layer (3), but it can be switched to the link layer (2).
I'm familiar with OpenVPN and use it regularly on the host. That's both good and bad: it would have to continue to run at the network layer. I'm a little worried about committing to have OpenVPN running on the host at all times. A more serious complaint is that I still need a tunnel protocol for the guest: OpenVPN provides a tunnel from the host's whole network stack to the local LAN, but I need part of that stack to be a tunnel that carries the guest's traffic to the gateway, and OpenVPN can carry that tunnel but would have trouble being the tunnel itself at the same time.
Another possibility is an OpenVPN tunnel from the guest itself to the gateway, but this would mean that I could not test or develop generic networking on the guest.
IPSec does privacy and integrity, and to establish the Security Association the peers need to mutually authenticate. In IPv6, IPSec is implemented just as another packet header identifying the Security Association and giving the Message Integrity Code (HMAC); all that follows is encrypted, and as part of removing and obeying the header the kernel decrypts subsequent headers and the payload. But IPv4 headers aren't so flexible, and a separate IP protocol (ESP and/or AH) is used. In Linux the payload is considered to be received on the same interface as the bearer was, so for routing purposes IPSec isn't really a tunnel.
Wikipedia article about PPP. It's a link layer (2) protocol designed to work over alien links including ATM, in addition to IP. PPP includes (or could include) authentication, privacy and integrity (just a CRC, not cryptographic). Also compression. It is very modular and includes setup modules (Network Control Protocols) for most known protocol families.
Generally PPP is not an independent tunnel but is used as the stuffing for another tunnel protocol.
Wikipedia article about GRE, q.v. for relevant RFCs. It can do encryption using RC4 which is deprecated. Can do integrity, but it's not too clear how cryptographically robust the checksums are. PPTP uses slightly modified GRE packet headers. GRE was developed by Cisco and they have appliances that use it. GRE is popular in Windows shops.
Given the questionable security and the hassle of PPP/PPTP authentication, I'm not going to waste time trying to set up GRE.
Wikipedia article about L2TP. It's a hybrid of Cisco's L2F and Microsoft's PPTP. It encapsulates PPP (point to point protocol), but L2TPv3 can bear other link-level protocols. The outermost bearer packets use UDP. L2TP doesn't do authentication, privacy or integrity, but prepended IPSec headers can do so, and the interior protocols also can do so.
This protocol looks viable, but probably it has various frustrating issues which will turn up when I try to actually implement it. I will investigate it if none of the front runners pan out.
Wikipedia article about VXLAN (Virtual Extensible LAN), RFC 7348. Its goals are being scalable to large cloud nets. It has the equivalent of VLANs. A lot of vendors and software support it. Open vSwitch is one of these. Its main target is cloud isolation within a multi-tenant datacenter. Bearer packets use UDP. They contain the entire (almost) Ethernet frame that the guest would have sent on a wired connection.
It would be a big commitment to learn how to make this work. I don't see a whole lot of support for authentication. I doubt I will be using this one.
Encapsulates IP packets; does not handle non-IP packets.
The lack of authentication makes this protocol unsuitable.
Two IPv6 transition mechanisms that transmit IPv6 payload packets over an IPv4 network. No IPv6 bearer packets.
I need to handle mixed IPv4 and IPv6 traffic and I already have IPv6 payload capability; these transition solutions will not be helpful.
I wasn't able to find much information about this mode.
So the front runners are SSH and SSTP. I think I'm going to try SSH first.
One way to create the tunnel is by
ip tunnel add NAME mode any? remote ADDR local petraguest pmtudisc dev br0
However, this doesn't include SSTP and so this approach is useless.
There are a lot of companies that publish setup guides for SSTP. The idea apparently is, your client establishes an SSTP tunnel to their server, which is not free, and the result is a VPN. According to the ExpressVPN docs, SSTP is owned directly by Microsoft and is available for Windows only, which doesn't exactly match with jimc's experience.
On Github there exists sstp-server by sorz (Shell Chen). A package is apparently available on Arch Linux but not OpenSuSE. pppd is a prerequisite.
I think that I'm going to follow my original plan and try ssh first.
ssh -w any -o Tunnel=ethernet
-w any means to create a tunnel device (with any available number) on the client host and similarly on the server. You can also specify fixed tap numbers.
It also needs in the server's /etc/ssh/sshd_config:
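Presumably the relevant directive is PermitTunnel, whose default is no:

PermitTunnel ethernet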
The infrastructure manager will want to create a tun/tap device and put it in the relevant bridge. Here's a tutorial on doing this by waldner (2010-03-26). He has discovered an undocumented feature of iproute2 (the ip command); do ip tuntap help for a usage summary.
ip tuntap [add|del|show|list|help] mode [tun|tap] [user U] [group G] [name itsname]
ip tuntap add mode tap name tap8
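Putting the pieces together, the sequence would be roughly this, assuming a persistent tap8 on both ends, br0 as each end's bridge, and root access on the server (jacinth):

ip tuntap add mode tap name tap8             # on both ends
ip link set tap8 up
ip link set tap8 master br0
ssh -w 8:8 -o Tunnel=ethernet root@jacinth   # from the client; attaches both tap8s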
Cutting off the tunnel idea. I got a lot of it working, but one issue killed it: Xena communicates over Wi-Fi and the normal channel ends in Jacinth's br0. Xena also creates a tunnel whose endpoints are in Xena's br0 and Jacinth's br0. This creates a loop. Various maneuvers were used to keep Petra's traffic in the tunnel and Xena's traffic on Wi-Fi (including the tunnel's bearer packets), including turning on the Spanning Tree Protocol on one or the other bridge, but they were either not effective enough or too effective, killing transport between various endpoints.