This is an interim report on changes to routing on CouchNet. Eventually it should be merged into a complete document.
Since early 2016, or maybe before, I have struggled with two behaviors seen only on Xena, the Wi-fi laptop, and only on IPv6. First, the default route would expire at random intervals. Second, it would sometimes get into a mode where SSH would time out during banner exchange; in other words, the connect(2) system call returned OK but no packets came back from the peer.
The basic geometry goes like this:
Jacinth is the main router, bastion host, master server, etc. I know that putting all these functions together is risky for security, but mine is a small home net and there aren't enough resources to split up the functions.
Jacinth hosts the virtual machine Claude, which runs a webserver (HTTP/80). Jacinth does DNAT so this webserver is accessible from the wild side without authentication.
Jacinth has a bridge containing hostapd's wireless NIC, the vnet for Claude, and wired Ethernet for the local LAN. Jacinth also has a point-to-point 6in4 tunnel (protocol 41) via Hurricane Electric (tunnelbroker.net) for IPv6.
Xena is my laptop, which connects via 802.11n Wi-fi to Jacinth. It also has wired Ethernet, which is used only for initial installation and for distro upgrades. It has a virtual machine which is currently unused pending resolution of network routing issues: an 802.11 client is apparently forbidden from being a member of a bridge.
All other bare metal hosts have Ethernet to the local LAN. They all have a bridge containing the Ether NIC, plus the vnet for their virtual machines (if any; some VMs run only for special projects).
There are two virtual machines that run all the time: Oso for general and administrative development, and Claude for the webserver (HTTP 80) accessible from the wild side. All the virtual machines use paravirtual wired Ethernet: no recursive bridges or nested VMs.
There are three VPN services on Jacinth: StrongSwan (IPSec), OpenVPN on port 443, and OpenVPN on port 1194.
All of these services offer IPv6 transport, inner IP addresses (on Jacinth's net), and use of Jacinth's DNS server. However, some clients are unable to use some of these features.
My network has a globally routable IPv6 prefix from Hurricane Electric, and a tunnel through them to the global Internet. For IPv4 it uses a forgotten address range similar to but separate from those specified in RFC 1918. Jacinth's wild side interface has an aleatory IPv4 address from my ISP, and Jacinth does NAT from internal addresses for wild-side connections. My ISP is firmly stuck in the 1900's and does not know about IPv6.
Every machine has an aggressive firewall. Jacinth has a service called OOBA (Out of Band Authentication) which wild-side clients must use to get to internal net services. Only the VPNs, SMTP on 587, HTTP on 80 (DNAT to Claude), and OOBA itself are accessible to an unauthenticated client.
The entire local network uses dual stack IPv4 and IPv6. Route information for IPv6 is sent by radvd on Jacinth, together with the net prefix; for IPv4 it is sent by DHCP.
All permanent hosts including the VMs have fixed IP addresses, both IPv4 and IPv6. The IPv6 address is $PREFIX::$OCTET, where $PREFIX is our assigned network prefix and $OCTET is the last octet (in hex) of the host's fixed IPv4 address. All address resolution services (DNS, LDAP, /etc/hosts) know about these addresses, as does DHCP, which sends them to the clients when asked.
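The fixed-address scheme can be sketched in a few lines of Python. The helper name is mine, and Jacinth's fixed IPv4 address is assumed to end in .193 (0xc1), consistent with its known ::c1 address.

```python
# Sketch of the fixed-address scheme: IPv6 address is $PREFIX::$OCTET,
# where $OCTET is the last octet of the host's fixed IPv4 address, in hex.
PREFIX = "2001:470:1f05:844"

def fixed_ipv6(ipv4):
    """Derive a host's fixed IPv6 address from its fixed IPv4 address."""
    last_octet = int(ipv4.rsplit(".", 1)[1])
    return "%s::%x" % (PREFIX, last_octet)

# Assuming Jacinth's IPv4 ends in .193, this yields its known ::c1 address:
print(fixed_ipv6("192.9.200.193"))
```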
The hosts, if capable, are expected to do RFC 4862 auto-configuration, giving themselves an IPv6 address of $PREFIX:$EUI-64 (see the RFC for how to compute the EUI-64). DNSMasq keeps track of and serves PTR records, but not AAAA records, for these addresses. The RFC 4862 addresses are a holdover from a prior design and are not normally used.
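The EUI-64 computation mentioned above can be sketched like this (function name mine): flip the universal/local bit of the MAC's first octet and insert ff:fe between the two halves.

```python
# Sketch of RFC 4862 address formation: $PREFIX:$EUI-64 from a 48-bit MAC.
def eui64_address(prefix, mac):
    b = [int(x, 16) for x in mac.split(":")]
    b[0] ^= 0x02                          # flip the universal/local bit
    eui = b[:3] + [0xFF, 0xFE] + b[3:]    # insert ff:fe between OUI and NIC halves
    groups = ["%x" % (eui[i] << 8 | eui[i + 1]) for i in range(0, 8, 2)]
    return prefix + ":" + ":".join(groups)

# Jacinth's RA source fe80::201:c0ff:fe12:3044 implies MAC 00:01:c0:12:30:44:
print(eui64_address("2001:470:1f05:844", "00:01:c0:12:30:44"))
```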
Various DHCP-like entities have pools of aleatory addresses in separate ranges. These are for the use of guests and of VPN clients.
The first problem, default route timeouts, occurred because 802.11 is flaky about delivering multicast packets. I had Router Advertisements coming from Jacinth about every 60 secs (randomized); in a 71 minute test I caught 65 RA's going to ff02::1 (all-nodes multicast), but Xena received only 23 of them (65% lost), and there were 6 incidents where no packet was received for over 300 secs, which was then the default route lifetime, so it expired. Long intervals of lost packets seem to occur more often than expected by chance.
It was helpful to max out the lifetime at 1800 secs, but this did not give a complete cure. So I wrote a daemon that, on most hosts, checks if there is a default route, and if not, sends a Router Solicitation, reliably receiving a unicast Router Advertisement. But on Wi-fi hosts it unconditionally sends the RS every 300 secs. I wanted to send the RS only when the default route was about to expire, but I could not find the expiration time in any of the /proc/net/ files. [Update: now I less efficiently do "ip -6 route show", get the expiration, and wake up the daemon just before.]
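The expiry check in that update can be sketched as follows. The function names are mine, and the parsing assumes iproute2's usual output format, in which an RA-learned default route carries an "expires NNNsec" field.

```python
import re
import subprocess

def parse_expiry(route_output):
    """Extract seconds-until-expiry from 'ip -6 route show' output,
    e.g. 'default via fe80::1 dev wlan0 proto ra ... expires 1623sec'.
    Returns None if no expiry is present (route absent or static)."""
    m = re.search(r"expires (\d+)sec", route_output)
    return int(m.group(1)) if m else None

def default_route_expiry(dev="wlan0"):
    """Run 'ip -6 route show default' for one device and parse the expiry."""
    out = subprocess.run(["ip", "-6", "route", "show", "default", "dev", dev],
                         capture_output=True, text=True).stdout
    return parse_expiry(out)
```

The daemon can then sleep until shortly before the returned value and send its Router Solicitation just in time.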
In another incident, Jacinth's DNSMasq was configured to send Router Advertisements, but it didn't, causing the default route to expire on all hosts. When restarted it sent the RA's for about 5 minutes, then went back to not sending them. I reverted to using radvd to send RA's, and the default route was again reliably present. I never found out why DNSMasq stopped sending RA's. This is dnsmasq-2.71.
The SSH timeout turned out to be a routing issue. SSH was not the only affected service; HTTP also would not connect, but the clients successfully failed over to IPv4 after a while, which SSH did not do. Before interventions, these were the routes on all hosts. Where the device is shown as br0, virtual machines had eth0 and Xena had wlan0 -- whichever is the master device on the local LAN. Jacinth's addresses end in c1 (fixed) and 3044 (EUI-64 from RFC 4862).
2001:470:1f05:844::/64 dev br0 (local LAN's prefix)
fe80::/64 dev br0 (link local +EUI-64)
default via fe80::201:c0ff:fe12:3044 dev br0 proto ra (all but Jacinth)
default via 2001:470:1f05:844::c1 dev br0 (static on all but Jacinth, Oso, Xena)
2001:470:1f05:844::/108 via fe80::201:c0ff:fe12:3044 dev wlan0 proto ra (Xena only)
2001:470:1f05:844::c3 dev wlan0 (Xena's own address)
2001:470:1f05:844:42b8:9aff:feb1:9c85 dev wlan0 (Xena's own address)
I'm not sure (after fixing stuff) where 2001:470:1f05:844::/108 came from or why it was only on Xena, but it incorrectly covers the VPN ranges that are on Jacinth. The actual address ranges are:
2001:470:1f05:844::/64     | 192.9.200.128/25 | Entire local LAN |
2001:470:1f05:844::/112    | 192.9.200.192/26 | Static IPs on main net |
2001:470:1f05:844::1:0/112 | 192.9.200.224/27 | DNSMasq dynamic adr (on link) |
2001:470:1f05:844::2:0/112 | 192.9.200.160/29 | StrongSwan (IPSec) dynamic adr |
2001:470:1f05:844::3:0/112 | 192.9.200.128/28 | OpenVPN-443 dynamic adr |
2001:470:1f05:844::4:0/112 | 192.9.200.144/28 | OpenVPN-1194 dynamic adr |
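Python's ipaddress module confirms how the stray /108 blankets all three VPN pools from the table above, and that two of the adjacent /112s could be collapsed into one /111:

```python
import ipaddress

stray = ipaddress.ip_network("2001:470:1f05:844::/108")   # the route seen only on Xena
vpn_ranges = [
    ipaddress.ip_network("2001:470:1f05:844::2:0/112"),   # StrongSwan
    ipaddress.ip_network("2001:470:1f05:844::3:0/112"),   # OpenVPN-443
    ipaddress.ip_network("2001:470:1f05:844::4:0/112"),   # OpenVPN-1194
]
# The /108 spans ::0:0 through ::f:ffff, so every VPN pool falls inside it,
# attracting their traffic (and more) toward the router.
assert all(v.subnet_of(stray) for v in vpn_ranges)

# The ::2:0 and ::3:0 pools differ only in bit 111, so they merge into a /111:
merged = list(ipaddress.collapse_addresses(vpn_ranges[:2]))
print(merged)
```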
I fixed /etc/dnsmasq.d/dhcp.conf to send out
dhcp-option = option:classless-static-route, 0/0, 0.0.0.0, 192.9.200.160/29, 0.0.0.0, 192.9.200.128/28, 0.0.0.0, 192.9.200.144/28, 0.0.0.0
In option:classless-static-route for IPv4 DHCP, the default route must be included (in addition to option:router) and should come first, and each 0.0.0.0 is replaced by the IPv4 address of the net interface on which the DHCP response is being sent out. It would have been neater to use DHCP6 to send out IPv6 routes, but there is no corresponding option for DHCP6. There is, however, an RFC draft for DHCP6 routing which is slowly moving through the standards process.
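On the wire, that option packs each route as a destination descriptor followed by the router address (RFC 3442). A minimal encoder sketch, with 192.9.200.193 assumed for illustration as the serving interface's address:

```python
import ipaddress

def encode_classless_routes(routes):
    """Encode (destination, router) pairs as an RFC 3442 option-121 payload:
    <prefix-len byte> <only the significant destination octets> <4 router octets>."""
    out = bytearray()
    for dest, router in routes:
        net = ipaddress.ip_network(dest)
        out.append(net.prefixlen)
        # A /29 destination needs 4 significant octets, a /0 default needs none.
        out += net.network_address.packed[: (net.prefixlen + 7) // 8]
        out += ipaddress.ip_address(router).packed
    return bytes(out)

# The dnsmasq line above, with 0.0.0.0 already replaced by the (assumed)
# serving interface address:
payload = encode_classless_routes([
    ("0.0.0.0/0",        "192.9.200.193"),   # default route, listed first
    ("192.9.200.160/29", "192.9.200.193"),
    ("192.9.200.128/28", "192.9.200.193"),
    ("192.9.200.144/28", "192.9.200.193"),
])
```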
I fixed /etc/radvd.conf to send out a prefix of 2001:470:1f05:844::/112 (on link, not autonomous because too few bits), and it has routes for the three VPNs. I could combine two of them into a /111 route, but I had a lot of trouble getting the bits to come out right, so I kept all three routes separate. Radvd has a nasty habit of truncating all routes to the length of its prefix: e.g. if your prefix is 2001:470:1f05:844::/64 and you define a route to 2001:470:1f05:844::2:0/112 and dump the Router Advertisement (with radvdump), you see that the route is allegedly to 2001:470:1f05:844::/112; the prefix length is not changed, but the ending :2:0 is lost. This is obviously a bug (or maybe a feature) of radvd. A workaround was to make the prefix length equal to the route length, here 112 bits. That's neither a bug nor a feature; it's a mess! But it works. And the clients ignore AdvAutonomous off and do RFC 4862 autoconfiguration anyway: they truncate the prefix to 64 bits, append the EUI-64, and assign that address to the NIC. That is a welcome development for me, but radically not RFC compliant.
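The radvd.conf described here looks roughly like this. This is a sketch, not the literal file: the interface name and lifetime are assumptions, and the route stanzas rely on radvd's defaults.

```
interface br0
{
    AdvSendAdvert on;
    AdvDefaultLifetime 1800;            # maxed-out default route lifetime
    prefix 2001:470:1f05:844::/112      # length matches the routes (workaround)
    {
        AdvOnLink on;
        AdvAutonomous off;              # ignored by the clients, as noted
    };
    route 2001:470:1f05:844::2:0/112 { };   # StrongSwan
    route 2001:470:1f05:844::3:0/112 { };   # OpenVPN-443
    route 2001:470:1f05:844::4:0/112 { };   # OpenVPN-1194
};
```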
Was the business with route truncation the cause of the SSH timeouts during banner exchange? As a test, I restored the failing route configuration on Xena, but left the routes correct on the target Iris. Again, SSH timed out during banner exchange. Xena's initial SYN packet went through Jacinth and reached Iris. Iris sent a SYN-ACK, repeated 5 times since Xena never replied. Jacinth forwarded the first one to Xena but not the duplicates. Xena sent a different packet to Iris, repeated several times until the timeout, which Iris never received.
The route that should have attracted only VPN traffic through Jacinth instead ran all traffic through Jacinth, including traffic to on-link hosts. But Jacinth has the bridge between Wi-Fi and 802.3 wired Ethernet. There is a difference, though, between a packet routed on-link, with the MAC address of the target (Iris), versus one going via a router, with Jacinth's MAC address. The former packet is handled by the bridge; it is not destined for Jacinth at layer 2, and everything works as if the source (Xena) were on wired Ethernet. But the latter lands on Jacinth, which has to make a routing decision. I never intended for Jacinth to handle this particular route, and either there are options that need to be set to make it work, or (more likely) the packet wanders through code paths that should never be used and which never work together reliably.
I'm concluding that the botched route probably did cause the timeouts because Jacinth sometimes forwarded the traffic and sometimes didn't. I can probably add a firewall rule that detects a packet going out the same net interface it came in on, and rejects it with net unreachable, which will trigger a prompt failover to IPv4, improving usability.
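Such a rule could look something like this. An untested sketch: Jacinth's bridge interface br0 is assumed, and icmp6-no-route is chosen as the closest ip6tables match to "net unreachable".

```
# Reject forwarded packets that would leave on the same interface they
# arrived on; the ICMPv6 error makes clients fail over to IPv4 promptly.
ip6tables -A FORWARD -i br0 -o br0 -j REJECT --reject-with icmp6-no-route
```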
Although the correct Router Advertisements were going out, the client (Xena) failed to install the advertised routes. This was caused, and fixed, by more than one setting in /proc/sys/net/ipv6.
accept_ra_rt_info_max_plen: This value is the maximum length of a route received in a Router Advertisement; routes longer than the value are tossed. (/usr/src/linux/Documentation/networking/ip-sysctl.txt lies, saying that routes equal to or longer than the value are tossed. See /usr/src/linux/net/ipv6/ndisc.c, looking for in6_dev->cnf.accept_ra_rt_info_max_plen.) It was 0, the default, which means to accept the default route but none other. Changing it to 128 made that test accept all routes. The specific values of 0 for the various interfaces do not seem to be honored. But the routes still did not appear.
accept_ra_rtr_pref: This boolean value (0 or 1) must be true to accept any routes (other than the default route?) from Router Advertisements. It was 1 for all, default, and lo; 0 for wlan0 and eth0 (which should have followed the default configuration). Setting it to 1 for wlan0 allowed routes to be accepted. Except…
accept_ra_defrtr: This boolean value (0 or 1) must be true to accept the default route from a Router Advertisement. It is set like accept_ra_rtr_pref: 1 for all, default, and lo; 0 for wlan0 and eth0.
accept_ra: This boolean value (0 or 1) must be true to accept Router Advertisements at all. For me it is set like accept_ra_rtr_pref: 1 for all, default, and lo; 0 for wlan0 and eth0. I have a line in /etc/sysctl.conf to set it to 1 for all at boot, but something else -- the finger of blame points at NetworkManager -- changes it to 0 again.
I put a test in my Router Solicitation daemon that turns on these settings, and now the routes are accepted reliably.
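That test amounts to rewriting the four settings; a sketch (the function name is mine), parameterized so the conf root can be pointed at a scratch directory for testing:

```python
import os

# The four /proc/sys/net/ipv6 settings discussed above, with the values
# needed for routes from Router Advertisements to be installed.
RA_SETTINGS = {
    "accept_ra": "1",
    "accept_ra_defrtr": "1",
    "accept_ra_rtr_pref": "1",
    "accept_ra_rt_info_max_plen": "128",
}

def force_ra_settings(iface, conf_root="/proc/sys/net/ipv6/conf"):
    """(Re)write the RA-related sysctls for one interface, undoing
    whatever NetworkManager may have changed them back to."""
    for name, value in RA_SETTINGS.items():
        with open(os.path.join(conf_root, iface, name), "w") as f:
            f.write(value)
```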
Routing is kind of a black art on IPv6 because the services that send out the Router Advertisement packets, radvd and DNSMasq, appear to each have their own bugs that have to be worked around. However, it is possible to get a route design that radvd will send out without mangling it, and that the clients will accept and install.