Some info about WireGuard, a new VPN:
Project website: www.wireguard.com
Inception 2015.
Lead developer: Jason A. Donenfeld. WireGuard and its service marks are registered trademarks of Jason A. Donenfeld.
Website sponsors: ZX2C4 (Jason A. Donenfeld) and Edge Security.
I have just gone through yet another audit of my VPNs, making sure that
they work for all relevant clients and that the vpn-tester program can
competently report if they are or aren't working. Currently (2021) my two
servers run StrongS/WAN IPSec (strongswan-ipsec-5.9.3 on SuSE Tumbleweed) and
OpenVPN (openvpn-2.5.3 on SuSE Tumbleweed). The clients have Linux (same
versions) and Android: strongSwan VPN Client
(version 2.3.3,
org.strongswan.android) and OpenVPN for Android
(version 0.7.25,
de.blinkt.openvpn). Both VPNs work well when properly configured, but they
have a number of less than wonderful features:
The learning curve is steep, both for routing packets properly through the tunnel (excluding, of course, the bearer packets) and for providing credentials in the required form to authenticate the two ends.
The network paradigm for IPSec is: if an IPv6 packet has an IPSec header, the corresponding Security Association has the information (crypto algo, key, etc.) which the kernel can use to decrypt it. The headers and payload thus revealed are then processed normally. Outgoing packets are selected by a Traffic Selector (different from normal routing) and the inverse transformation is performed, after which the encrypted packet is sent out through normal routing. IPv4 uses ESP and AH protocol packets instead. I found that it was often a challenge to get the Traffic Selector right, and it was also a challenge to extract the cert's Distinguished Name in the format Charon wants. (Using a SAN turns out to be easier.)
OpenVPN uses a tun/tap device, and payload packets pop out of it or are stuffed into it by normal routes, as if it were a physical net interface. It's a lot easier to handle routing in this context, which WireGuard shares.
IPSec connects fairly promptly, initially or after a net disruption, but OpenVPN takes several seconds to do this.
Both IPSec and OpenVPN have a lot of code in the packages: around 400,000 and 600,000 lines respectively. This doesn't affect the end user directly, but I remember a quote from Wietse Venema, the author of the Postfix mail transport agent: he said his code has about one bug per thousand lines, and if you introduce complexity (he was talking about TLS for mail transport) you should think about exploits against the bugs and accidental loss of service.
Responding to shortcomings in existing VPN software, Jason A. Donenfeld in 2015 began to develop WireGuard, a new VPN. The project website describes these features; whether they're scored as good or bad depends on the user's goals.
Drastically reduced complexity; features not absolutely essential were sacrificed for this goal. He claims only 4000 lines of code.
A lot fewer configurable aspects; for example there is [currently, 2024] only one symmetric crypto algo, ChaCha20Poly1305 by Daniel Bernstein, so no configuration and no negotiation with the peer. Negotiation is very complex for both IPSec and OpenVPN.
A Curve25519 (elliptic curve Diffie-Hellman) public key, locally generated, serves as both the authentication credential and the foundation of the tunnel's encryption, similar to the style of SSH.
WireGuard does UDP only, no TCP and no unusual protocols like ESP. Check out udptunnel and udp2raw for add-on layers if you need TCP (which I do).
Very fast connection setup: the initiator sends one handshake packet, the responder sends one back, whereupon they both can infer the symmetric keys to encrypt payloads. The CPU time needed to do this is obviously minimal.
The ChaCha20Poly1305 symmetric crypto algo (AEAD type) is faster than the competitors.
The protocol isn't chatty: the only packets sent are payloads and key establishment (and rekeying). You can configure keepalive packets (zero-length payloads) if your net needs them. The responder doesn't respond to, and doesn't expend resources on, unauthorized initiators.
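A keepalive can be requested per peer in the conf file; a minimal sketch (the key is a placeholder, not a real one):

```
[Peer]
PublicKey = asdfgh...=         # placeholder
PersistentKeepalive = 25       # seconds; sends a zero-length payload when idle
```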
Some details of the Curve25519 Diffie-Hellman key establishment procedure are interesting. See this Wikipedia article about ECDH, which I've summarized, and the Curve25519 article; the related Ed25519 signature scheme is described in the EdDSA (Edwards Curve Digital Signature Algorithm) article.
Parameters for ECDH are agreed on in advance; WireGuard has only one set of parameters built in. It uses a modular field. NSA guidance says that a field of size in the range of 2^256 is sufficient for protecting Top Secret data; that is, the Black Hats would have to run billions of dollars' worth of computers for a year or more to crack one key (and WireGuard re-keys about every 2 minutes). The actual modular field size is 2^255-19.
A private (secret) key for ECDH is a randomly chosen element of the modular field, basically a 255-bit random number, excluding the few values at the top that won't fit below 2^255-19. Call it S. For the public key, a base point G is agreed on, and it is added to itself S times; that is, G is multiplied by S. Call the product Q.
An attacker could recover the private key by dividing Q by G. This would be easy if the operands were integers, but in the modular field this division (the discrete logarithm problem) needs effort similar to cracking a 128-bit symmetric key, half the ECDH bit length. This is the effort level currently considered adequate for protecting Top Secret data.
For each connection (or re-key), an ephemeral Diffie-Hellman key
pair is created by each peer. Generating the private key would
normally include one or more whitening
steps through a
pseudorandom generator, requiring a 256 bit multiplication, plus
another multiplication for the public key, but this is a lot less
effort than generating a prime factor key pair (RSA).
The initiator sends its static (permanent) public key and the ephemeral public key that it just generated; neither is encrypted, so the attackers know them. The responder sends back its ephemeral public key, but the initiator is supposed to have the responder's static public key in a configuration file. Each peer also sends a counter, which prevents replay attacks, and an encrypted dummy payload which, if successfully decrypted, assures each peer that the other one holds the private keys corresponding to both public keys that it proffered or that were configured.
Each peer, for each of the static and ephemeral keys, multiplies the other end's public key by its own private key. Remember that the other end's public key is its private key times G, so the resulting product is the local private key times the other end's private key times G (but done in the opposite order at the other end). Since multiplication in modular rings (including fields) is commutative, both will get the same answer: the Diffie-Hellman shared secret. The peers hash up the shared secrets with an agreed-upon algo to produce the symmetric key which they will use to encrypt or decrypt payloads.
For authentication, the dummy payload includes a MAC: the symmetric encryption algo is AEAD type, which includes an authentication tag, so each end can tell authoritatively whether it decrypted the payload successfully. If so, it knows authoritatively that the other end used the private key corresponding to the public key that was proffered; in other words, each can be sure of the identity of its peer (unless the key was stolen).
For authorization, the initiator has the responder's static public key in a configuration file, so it can be sure that it is talking to the intended responder. The responder has a list of public keys of every initiator authorized to connect. It knows on the first packet who the initiator claims to be, and will only respond if that public key is on its list. The list can be added to or pruned on the fly by the provided wg utility, which would be called by out-of-band facilities that aren't part of WireGuard, analogous to the charon key management daemon of StrongS/WAN (IPsec) and related functions in OpenVPN.
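The on-the-fly list maintenance might look like this (a sketch; the key, subnet and interface name are placeholders, and the commands need root):

```shell
# Authorize a new initiator: its public key and allowed source range.
wg set wg0 peer 'HIghu4placeholderkey...=' allowed-ips 192.9.200.115/32
# Prune it later:
wg set wg0 peer 'HIghu4placeholderkey...=' remove
```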
What are my goals for the VPNs, and how much hassle will it be to make WireGuard deliver what I need, so I can add it to my collection?
Resistance to bit-rot: changes in the system configuration tend to have a bad effect on operation of the VPNs, and I hope WireGuard will be affected less than IPSec and OpenVPN.
While I always use UDP if feasible, avoiding the dreaded TCP Meltdown syndrome, I've found that hotel Wi-Fi often blocks UDP in general and VPN ports in particular. Thus I support and use TCP on port 443, which the Wi-Fi access points have got to pass. (Note: some authoritarian nations block VPN ports nationally. 443/TCP can bypass this, but the Secret Police could recognize that it was not normal HTTPS traffic, even if they can't decrypt the payload, with baleful consequences for the perpetrator.)
OpenVPN can multiplex VPN traffic and another protocol (e.g. HTTPS) on its listen port, so the VPN server can host a normal webserver as well. For WireGuard, it's probably better to not mess with protocol conversion or tunneling from TCP to UDP. Rather, 53/UDP for DNS is another port that the hotel weasels can't block, and I might try to have DNS listen only to localhost:53(UDP) while WireGuard listens to 0.0.0.0:53(UDP), ignoring occasional DNS packets.
There are four VPN routes that need to work:
The segment tunnel, from the main router Jacinth to a host in the cloud. See A Server in the Clouds for what is being accomplished here. Basically the local net's default route for IPv6 goes out via the cloud host.
Xena and Petra to Jacinth: Xena is my laptop, which roams, and Petra is a virtual machine on it for development. See Network for Xena's Virtual Machine. But how do other hosts know where to send packets destined for Xena and Petra? Solution: they always send via Jacinth, and Xena always initiates a VPN to Jacinth even when it's not roaming.
Selen to Jacinth. Selen is a cellphone with Android, and it also roams. To get to the webmail and PIM server from off-site it needs a VPN. See Jimc's Awesome PIM Server for the packages I'm using. Unlike Xena, Selen would prefer to not use the VPN unless it is both roaming and using PIM, or is working with local LAN hosts, or is doing something for which privacy or integrity is critical.
For testing the VPNs, two local LAN hosts are chosen (not necessarily the same every time, depending on which are up or down). Host A connects to Jacinth's VPN server and sends packets to B; the tester checks if the packets go direct (the NoVPN case) or via Jacinth, and whether their content is really encrypted. For this to work, every LAN host has to be able to connect to Jacinth's WireGuard. The test is done daily; most of the time the LAN host is not using WireGuard.
For authorization, VPNs can be set up two ways. In the historic design each connection has individual credentials installed, typically in the form of pre-shared symmetric keys. Modern versions, such as IPSec and OpenVPN (starting in version 2.x), install a credential (normally a X.509 certificate and private key) on the server and on each client; the client certs are all signed by one Certificate Authority (or intermediate cert) which the server requires in the client's trust chain. The server doesn't need the clients' certs individually. I don't really have a big herd of users and I can handle either arrangement, but X.509 certs are what I'm using now. Preview: for WireGuard, each connected pair of peers needs to be configured at each end, but the credentials are Curve25519 Diffie-Hellman public keys of 256 bits (32 octets) that are also used for the crypto, not pre-shared symmetric keys.
There's an issue that makes a lot of trouble for designing a net with VPNs: some clients always use the VPN and some don't. I'm implicitly assuming a central server that all the clients work through. For the "always VPN" case the right routing setup is to assign the client's hostname to a fixed address on the VPN tunnel device. The same address is on the client's LAN interface, and by policy routing its peer(s) send bearer packets to that interface. Normally only the server would have that policy route, but there might also be multiple peers. The server always has, and advertises, a route to this client through its own VPN tunnel endpoint, and the rest of the LAN (non-peers) sends to the client via the server. My laptop and my cloud server operate this way.
The harder case is when the client sometimes uses the VPN, and sometimes doesn't, like my VPN tester and my cellphone. It's a total can of worms to set up a route via the server when the client connects, and to make this route go away when it disconnects, particularly when other LAN members need to originate connections to the VPN client, directly or via the server, depending. The way I'm handling this on the other VPNs is, the client's name is assigned to a fixed IP on its egress interface: Wi-Fi or Ethernet. Peers on the LAN connect to this non-VPN address, except that isn't possible if the cellphone is roaming (because peers don't know the cellular assigned address and I'm not going to mess with dynamic DNS on my cellphone). When the client turns on the VPN, it puts a separate IP address on the VPN endpoint, and the server has a permanent route to this address (or pool) via its own VPN endpoint, which it advertises all the time. Other LAN hosts can originate connections to the client's VPN address, but only when the client and its peer have the VPN turned on.
Let's make this design into something a little more concrete that I can turn into a WireGuard conf file.
At the moment this design is evolving and this section might differ in some points from the previous one.
There are two potential VPN servers: Jacinth, the main router, and Surya, the cloud server. At present Jacinth is the only one used, and Surya is in a standby or hot-spare configuration, acting like a client. When we change locations, Jacinth will be packed up and Surya will become the VPN server that the "always VPN" clients use.
At present, the possibility of VPN connections is configured statically: you have to edit configuration files to turn the VPN on and off between a host and the VPN server. This implies that sometimes "VPN" is a misnomer; all hosts (except the "never VPN" category, e.g. the Roomba cleaning robot) are always able to send encrypted packets to the server. Whether they use this opportunity is another matter.
The three "always VPN" hosts have small differences in their configurations. Xena, the laptop, has its own address on its internal subnet, and its wild-side interface has whatever address it gets from the net it's connected to: preassigned on the LAN, but not predictable when roaming, and I'm not going to mess with dynamic DNS for this address. Xena sends all CouchNet traffic (except to its own VM) through the VPN to the server (and everything else via whatever DHCP default route). Since Xena's address is not on the LAN, clients on the LAN wishing to send to Xena will go via their default route, which is Jacinth, which has a route to reach Xena's internal subnet via the VPN. Jacinth's WireGuard sends bearer packets from Jacinth to whatever IP Xena is currently sending from (wild side or local LAN), and Xena's WireGuard sends bearer packets to whatever Jacinth is using; its wild-side address occasionally changes because we don't have a fixed IP for it, and DNS is updated dynamically, but there's a short period while the TTL times out, and the WireGuard kernel module doesn't consult DNS anyway.
Surya, the cloud server, has its own address, not on the LAN, on the VPN interface. Like Xena, it sends all CouchNet traffic through the VPN, and everything else out its wild side. With OpenVPN as the VPN, Jacinth acts as a client and initiates the connection. With WireGuard, Jacinth's IP is aleatory (from the carrier's DHCP), so Surya can't initiate to it; Jacinth will send the first packet to Surya's fixed IP. After that, dynamic DNS is used to update the published IP of Jacinth.
Selen, the cellphone, puts its LAN address on its VPN interface, while the wild side address is from the carrier's (or CouchNet's) DHCP, so when it's not roaming, both addresses would be equal. Like the others, CouchNet traffic goes over the VPN and everything else goes as plaintext (or intrinsic crypto for HTTPS, SSH, etc) out the wild-side interface. CouchNet DHCP pushes a special route so LAN clients will send to Selen via Jacinth.
Rarely, Xena and Selen need to send wild-side traffic through the server. This will be handled by swapping to a configuration that substitutes the VPN as the client's default route. A pair of split /1 routes overrides the real default without replacing it, so turning the tunnel off removes them automatically: route add 0.0.0.0/1 dev wg0, and similarly for 128.0.0.0/1, ::/1, 8000::/1.
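Spelled out with iproute2, the split default looks like this (a sketch; wg0 is the assumed interface name and the commands need root):

```shell
# Two /1 routes per address family cover the whole address space and are
# more specific than the real default route, which stays in place untouched.
ip -4 route add 0.0.0.0/1   dev wg0
ip -4 route add 128.0.0.0/1 dev wg0
ip -6 route add ::/1        dev wg0
ip -6 route add 8000::/1    dev wg0
# "wg-quick down wg0" (or deleting wg0) removes all four automatically.
```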
So what happens to the "sometimes VPN" clients? All are on the LAN. Their own hostname resolves to a LAN address that goes on their LAN interface, and peers can always initiate a connection to it, using their prefix route to the LAN, or off-LAN peers will transmit via their generic route back to the LAN. The "sometimes VPN" client also has an address and name in a separate subnet, which goes on its WireGuard interface. Bearer packets are exchanged with the server via the LAN interface and address; the server always has a Peer configuration for every host. When there's a special task that needs the VPN, like a VPN test, the client adds a route to its peer (or subnet) via its WireGuard interface; the packets will go to the VPN server, and the server will forward them to the peer, not using WireGuard unless the peer is in the "always VPN" category (and conversely for the replies).
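The "special task" step can be as small as one route each way (a sketch; the peer address is an illustrative pick from my WireGuard range, and the commands need root):

```shell
ip route add 192.9.200.117/32 dev wg0   # send this peer's traffic via the tunnel
# ... run the VPN test or other special task ...
ip route del 192.9.200.117/32 dev wg0   # back to direct LAN delivery
```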
----
*The clients' WireGuard interfaces have fixed IPs in a range which is disjoint from the local LAN. That is, my assigned address ranges (IPv4+6) have one subnet to be the local LAN, and other subnets for the various VPNs to be on. The clients also have fixed IPs in a different subnet for non-VPN traffic like bearer packets. If the client is roaming it will send bearer packets from a wild-side address assigned by the carrier, which changes when the phone changes protocol (LTE, UMTS, Edge).
*If the client uses WireGuard all the time, LAN peers send to its fixed address on WireGuard, because that's what its hostname resolves to. If the client uses WireGuard some of the time, peers send to its non-VPN (ethernet or Wi-Fi) fixed address, which the hostname resolves to.
*The main router, called Jacinth, has the IPv4 default route to the wild side and the tunnel for IPv6 (no native IPv6 yet on Verizon or Frontier FIOS, hiss, boo), and the server instances of all the VPNs.
*The rest of the local LAN hosts use Jacinth as their default route; thus if they need to send a packet to a VPN client's endpoint address, they automatically send it via Jacinth. While some LAN hosts have static routes, Jacinth's DHCP and radvd announce a default route (IPv4+6) through Jacinth. This is the actual implementation of the "route advertisements" mentioned earlier.
*Jacinth has AllowedIPs and matching routes (courtesy of wg-quick) that send and accept traffic to/from each client's WireGuard fixed IP. This is set up at boot time when WireGuard is started, and is supposed to continue forever. If the client has a subnet inside, e.g. my laptop and its VM, the subnet is configured as a special case.
*The client at the very least needs an AllowedIP and route to the server's WireGuard endpoint address. Some clients will want to send their default routes through the tunnel, but in my use case it turns out that the cloud server, the laptop and the cellphone want traffic to the local LAN (plus other VPN clients and the server) to go through the tunnel, but wild-side traffic should go via their default route on the wild side.
*Bearer packets, i.e. the encrypted VPN payloads, need their own route, because they can't go through the tunnel that they are bearing. OpenVPN creates a special route sending traffic destined for the server (i.e. bearer packets) directly there. However, other traffic like SSH and HTTP(S) also goes direct, causing endless customer support issues and information leakage (for the insecure protocols). For WireGuard, if the policy routing table is configured, and if the default route is sent through the tunnel, wg-quick can create a policy route that diverts only the outgoing bearer packets to that table, into which the original default route is transplanted.
*However, that's not quite what I'm doing; my clients don't send the default route through the tunnel except for special hacks. First I save the route that the bearer packets would take before WireGuard messes with routes. After wg-quick sets up the routes, I re-determine the route of the bearer packets, and if they are being sent down the tunnel I divert them back to the original route, copying the method that wg-quick would have used if the default route had been sent down the tunnel.
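A sketch of that save/re-pin sequence, using the documentation address 203.0.113.5 to stand in for the server's endpoint (the interface name, the single-gateway assumption, and the missing error handling are all simplifications of what my wrapper does):

```shell
# Before wg-quick changes routing, note the next hop for bearer packets:
gw=$(ip route get 203.0.113.5 | sed -n 's/.* via \([^ ]*\).*/\1/p')
wg-quick up wg0
# If bearer packets would now be routed into the tunnel they carry,
# pin them back to the original next hop with a host route:
if ip route get 203.0.113.5 | grep -q 'dev wg0'; then
    ip route add 203.0.113.5/32 via "$gw"
fi
```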
The server can statically set up the policy routes to send via normal routing any bearer packets addressed to the client's local IP. The right way to do this is inside-out, i.e. everything that's addressed to the client's IP and isn't a bearer packet is flipped into the policy routing alternate table, which routes such packets down the tunnel. I believe that one WireGuard interface can distinguish packets addressed to members of a set of clients and can send them to the correct one. Bearer packets are not flipped in and are routed as if WireGuard were not operating; they would be sent out to the wild side if the client has connected from the wild side, i.e. is roaming. WireGuard on the server will accept a connection from any IP as long as a known public key is presented, and as long as the packets get through the firewall (normally promiscuous on a server).
There are several detail issues that bit me:
If the client is configured with the alphabetic hostname of the server, wg-quick will resolve that name and will prefer the IPv6 address. But the client, if roaming, probably can't connect on IPv6. Cure #1: use the server's fixed IP4 address. But my server is residential and its wild-side address is aleatory, though it lasts several weeks before changing. Cure #2: a wrapper script will resolve the endpoint name the way I want and insert the IPv4 address in the configuration file, much like wg-quick removes its commands when doing wg setconf.
When the client is actually not roaming, but even so sends bearer packets to the server's wild side address, WireGuard replies to them with the wild side address as the source (correct), but uses that source to determine the interface to send from, which is on the wild side, where the client isn't. Other daemons use normal routing to send from the interface that can reach the client. Actually when the client's IPv6 address is replied to, the packets are not lost, but are sent to the cloud server, which sends them back on the segment tunnel (taking 30 msec for the round trip), whereupon normal routing on the server gets them to the client. But this kind of routing is not possible with IPv4.
Cure #3: the wrapper will have to switch to the server's LAN address when the client is not roaming, and WireGuard will have to be restarted when it changes between home and roaming.
When the wrapper or wg-quick does DNS domain name resolution, the client needs a non-VPN address and interface that the server (or other DNS source) can send to without involving the WireGuard tunnel that hasn't been established yet.
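A sketch of the wrapper's name-resolution step (cure #2): resolve the server's name to IPv4 only, then patch the Endpoint line before handing the conf to wg-quick. Here "localhost" stands in for the server's name, the two-line conf is a sample, and the port 4296 and file names are my choices:

```shell
# Build a sample conf fragment with a placeholder endpoint.
printf '[Peer]\nEndpoint = placeholder:0\n' > /tmp/wg0.conf.in
# Resolve IPv4 only, taking the first address returned.
addr=$(getent ahostsv4 localhost | awk '{print $1; exit}')
# Substitute the resolved address into the Endpoint line.
sed "s|^Endpoint *=.*|Endpoint = ${addr}:4296|" /tmp/wg0.conf.in > /tmp/wg0.conf
cat /tmp/wg0.conf
```

The real wrapper would resolve the server's actual name and write into /etc/wireguard, restarting WireGuard when the best endpoint changes.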
Conclusion after a fair amount of testing: If WireGuard is up on both Jacinth and the client (Oso, on the LAN), and if the client configures Jacinth's wild side address (IPv4) as the endpoint, then communication with osowg (the WireGuard addresses, IPv4+6) from LAN clients including Jacinth is perfect with no dropped packets. Jacinth is sending bearer packets from its LAN (not wild) address, and the client updates its peer endpoint to this address, as the docs describe. I previously tested setting Oso's peer endpoint to Jacinth's IPv6 wild address and got some weird behavior involving sending bearer packets to the wild side; I need to check if this is still happening. Communication tests included ping, ssh and w3m (web), IPv4+6. I have SSHFP set up for Oso, but osowg is not considered to be the same host, and needs either a known_hosts entry or its own SSHFP records.
For the symmetric cipher on the main channel, WireGuard uses only ChaCha20Poly1305, for which hardware acceleration is very rare. On the Intel Core® i5-10210U, jimc's tests score it as half as fast as hardware accelerated AES-256 (Rijndael), and twice as fast as software AES-256. This difference would only be significant for a server with thousands of clients.
https://www.wireguard.com/quickstart/
ip link add dev wg0 type wireguard         # pick a name for the tunnel device
ip address add dev wg0 192.168.2.1/24 [ peer 192.168.2.2 ]   # peer form if only 1 peer
wg setconf wg0 myconfig.conf               # the wg utility is provided
  --or--
wg set wg0 listen-port 51820 private-key /path/to/private-key peer $itsname \
    allowed-ips 192.168.88.0/24 endpoint 209.202.254.14:8172
ip link set up dev wg0
wg                               # with no args, equivalent to "wg show" (all interfaces e.g. wg0)
wg-quick [up|down|etc] ctlfile
WireGuard wants Curve25519 (elliptic curve Diffie-Hellman) private and public keys; each is 256 bits (32 bytes) long, or 44 bytes base64 encoded (43 data characters plus a trailing =). The configuration file may contain the base64 key itself, or the name of a file containing it. The provided wg utility can generate them for you, like this:
wg genkey | tee privatekey | wg pubkey > publickey
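A quick sanity check on those sizes; this isn't WireGuard-specific, since any 32 random bytes have the same base64 shape:

```shell
# 32 bytes base64-encode to 44 characters: 43 carrying data plus one '='.
key=$(head -c 32 /dev/urandom | base64)
echo "${#key}"    # prints 44
```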
WireGuard does not use X.509 certificates to authenticate/authorize the peers; authorized keys are preinstalled for each client-server pair. But they can be installed on the fly by wg.
You may test with their demo server.
So let's try to set something up. For testing, I'm starting this at 2021-10-07 18:00. I'm going to use these basic steps:
Make sure there's a client for Android. Install it first but don't try to use it yet. Yes there is one, called WireGuard, with the serpent logo (®). Inception 2019-10-13, most recent update 11 days ago, 5e5 downloads, offered by "WireGuard Development Team". You could import a configuration from a file, or a QR code (!), or create it by hand. I looked at the required info but didn't create my connection. 7 mins including reading the product info.
https://wiki.archlinux.org/title/WireGuard
How to get the QR code that the Android client can import. This is from the Arch Linux wiki article about WireGuard.
On the Linux desktop host that has the conf file:
qrencode -o outfile -t ansiutf8 -r client.conf
If you omit -o outfile or specify -o -, the result is on standard output, and if this is a terminal that can display ANSI UTF-8 characters (see the -t option), the QR code itself becomes visible. You may need to make the window wider and/or higher to avoid wrapping lines. Suppress long comments; the maximum size is 4000 characters. qrencode is from package qrencode on OpenSuSE Tumbleweed.
The required kernel module is called wireguard.ko and it is in the standard kernel, version 5.14.11 and likely quite a bit earlier. To pass configuration information to it (plus displaying connection info and generating keys) you need wireguard-tools (current version as this is written is 1.0.20210914) from the OpenSuSE Tumbleweed main distro. Older versions are available for Leap 15.3 and 15.2. 72Kb to download, 145Kb installed. No dependent packages; it only requires systemd and libc. The package only contains the wg and wg-quick commands, and documentation.
wg-quick is a wrapper around wg for simple configurations. When either command is given just an interface name such as wg0, the corresponding configuration file is sought in /etc/wireguard/wg0.conf, whereas if an absolute pathname is given the interface is inferred from the basename of the conf file. The interface name may be up to 15 bytes of [a-zA-Z0-9_=+.-]. (You don't specify the interface name inside the conf file.)
On Xena I also installed NetworkManager-wireguard plus NetworkManager-wireguard-gnome (you need both for the GUI). These are "experimental" packages, not in the main distro. Find them with the SuSE package searcher. Depends on wireguard-tools. Most likely you don't have the developer's package signing public key; either get it, or ignore Zypper's security warning. About 20min to install the packages and read the man pages.
A prerequisite is, what port am I going to use? WireGuard doesn't have an IANA port assignment, but documentation, forum posts and howtos usually show 51820. But this port range (everything above 32768) is for aleatory (ephemeral) ports, and a collision could occur. The BSD Daemon whispered in my ear that since OpenVPN has 1197 assigned, WireGuard should use xx96. Unassigned and stealable port numbers are 2196 4196 4296 4496 4696 4796 4896 4996 5096 and most candidates above this. 42xx is completely vacant and appears to be intended for private use, and I have a local policy to put nonstandard ports in this range, so 4296 is what I will use. I will need to set my firewall to pass 4296/udp in the same cases as it passes 1197/udp.
On the other hand, for the initial tests (that might fail) I don't want to mess with the firewall, so I'll use 4886, the unofficial wakeup port for Android, which my firewall passes from+to the local LAN so the Android hosts can wake each other up.
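The eventual firewall change for the permanent port might look like this (a sketch assuming iptables; "eth1" as the wild-side interface is a placeholder, and a firewall generator would normally emit these):

```shell
iptables  -A INPUT -i eth1 -p udp --dport 4296 -j ACCEPT
ip6tables -A INPUT -i eth1 -p udp --dport 4296 -j ACCEPT
```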
Here is the client's configuration file for testing. See the genkey subcommand of wg for producing your keys. The conf file contains your private key (not encrypted), so it should have appropriately restrictive permissions, mode 600. /etc/wireguard is installed with mode 700, but I set the individual conf files to 600 anyway. See the man page for wg for a small number of additional configurable parameters such as the keepalive interval, if your net needs it.
[Interface]
PrivateKey = qwerty...=     # 43 base64 bytes, about 256 bits.  Keep the =.
ListenPort = 4886           # Android wakeup port, which my firewall
                            # allows, but I'll have to change this later.

[Peer]
PublicKey = asdfgh...=      # 43 base64 bytes, about 256 bits.
Endpoint = [2600:3c01:e000:306::8:1]:4886     # IPv6 in [], port after colon
AllowedIPs = 147.75.79.213/32,2604:1380:1:4d00::5/128   # www.zx2c4.com.
# There can be multiple peers.
About 25min + to figure out the conf file.
Starting about 16:10
The SuSE package wireguard-tools does not include the scripts mentioned in the quick start guide for contacting the demo server.
When wg is used to bring up the connection, it loads the wireguard kernel module, nine crypto modules (that the documentation says it actually uses), udp_tunnel and ip6_udp_tunnel.
Debugging Petra's networking took extra time, but once I switched to test on Xena it took about 10 minutes to turn on WireGuard and do the tests.
I repeated these steps on Surya. The two test activities succeeded.
Given how my VPN tester is designed, it's a whole lot easier if every host has WireGuard installed, specifically wireguard-tools. Doing that now.
OpenVPN and StrongS/WAN assign the client an IP address from a pool, similar to DHCP. But my tunnels are very predictable, so instead I'm pre-assigning IPs to potential WireGuard participants, all on the same subnet: new address ranges for WireGuard tunnel endpoints, 192.9.200.112/28 (16 addresses) and 2600:3c01:e000:306::9:0/112. The addresses are assigned according to a pattern, but most likely I will get them into /etc/hosts soon.
Each host gets a key pair and a generic conf file with Jacinth as its peer (server) (except Jacinth itself).
This turned into a long and time-consuming learning experience. I'm condensing a lot of failures and listing the high points:
The Quick Start guide is written for a client using wg-quick to control the interface. To the sample conf file shown under Configuration Files I added an Address line (just on Surya, for now); the value is a comma separated list of the IPv4 and IPv6 addresses to be assigned to Surya's wg0.
Starting up first on Surya: wg-quick up wg0
It prints the commands it is executing.
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63   # It's feeding wg0.conf minus Address etc.
[#] ip -4 address add 192.9.200.118 dev wg0
[#] ip -6 address add 2600:3c01:e000:306::9:8 dev wg0
[#] ip link set mtu 1420 up dev wg0
[#] ip -6 route add 2600:3c01:e000:306::7:0/112 dev wg0
[#] ip -4 route add 192.9.200.176/29 dev wg0
I captured them into a script wg0.up. The routes to Xena (the last 2 lines) needed a lower metric than the existing ones via Jacinth, reached through the OpenVPN segment tunnel.
I did analogous setup at Xena's end.
Now here's a nasty issue which I didn't solve in this step: this all looks fine assuming Xena and Surya have their WireGuard connections running. But suppose Xena is connected somewhere else, like Jacinth where it's supposed to be? How does Surya know to not route via wg0?
Starting the tests: no communication. For several days. Payload packets departed from Xena; bearer packets left Xena and arrived on Surya; payload packets were decrypted on Surya and were emitted from wg0; and they weren't answered. The reason was that the firewall needed to be told that wg0 was a tunnel with security implications similar to being on the local LAN, not a minion of the global hacking community. The payloads were reported by tcpdump, then hit the iptables rules in the firewall, and were tossed. With that fixed, I was able to ping in both directions between Xena and Surya.
This confirms my reading of the man page for wg: for each peer (here Xena), AllowedIPs is a list of subnets. Packets from that peer whose source address is in that range are allowed by Surya's WireGuard to emerge from wg0, and packets on Surya routed to wg0 because their destination address is in that range will be internally routed to that peer and not to some other peer connected at the same time. This is similar to an iroute in OpenVPN.
This interpretation implies that the IP that the peer is using on its wg0 has to be inside the AllowedIPs address range, and the IPs of the other peers have to be outside. If there's a subnet that the peer expects to route through the tunnel, as on Xena, it has to be in AllowedIPs. If other hosts on the local LAN expect to connect to this peer, they need to use an address in AllowedIPs and they need to route the traffic via the server (Surya).
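To make that concrete, here is what a hypothetical Peer stanza for Xena on Surya would contain under this interpretation. The /32 host address is invented for illustration; the /29 is Xena's subnet from the routes above.

```
[Peer]
PublicKey = xenaKey...=
# Xena's own wg0 address (a host route) plus the subnet behind Xena:
AllowedIPs = 192.9.200.117/32, 192.9.200.176/29
```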
It remains to be seen whether two of Surya's peers connected at
the same time can talk to each other, and where the hairpin
routing
occurs: before or after emission from wg0. Both OpenVPN
and IPSec can do this.
Documentation for a robotics class at the high school level. The organization is the FRC 3512 Software Team, based in Orcutt, California, USA, a little north of Vandenberg Air Force Base and the Diablo Canyon nuclear power plant. (Author and date are not obvious.) They show the WireGuard configuration file that the students are supposed to use on their at-home clients. Under [Interface] the Address (for wg-quick) is assigned uniquely per student. Under [Peer] the endpoint has an alphabetic hostname (and numeric port). The AllowedIPs are the address range of the student clients (I think the key occupant is the gateway that lets them off that subnet), and the subnet that contains the servers, VMs, etc. that they're supposed to learn to use. According to them, when the client reports "required key not available", it means that you sent down the tunnel a packet to an address that the peer's AllowedIPs did not include, which the peer reported by an ICMP packet coded for "Destination Host Unreachable" (which is not a lie).
The key lesson for me is: at each end, AllowedIPs (describing the peer, the other end) has to include the address(es) on the peer's tunnel device, from which outgoing traffic is sent; otherwise this end will reject traffic from the peer.
I wrote a script to generate conf files and "up" scripts on each host. It follows the design plans for the special features on particular hosts. This way, issues are not forgotten and chewed-up configurations can be regenerated at will. All hosts now have their proper keys, configurations and "up" scripts.
Petra to Jacinth: no response.
Claude to Jacinth: Routes: 192/26 dev en0; 128/25 dev wg0;
to Surya, pings to $pfx::8:2 are answered but not to $pfx::8:1
Xena to Claude: IPv6 only. Ditto Surya
Jacinth + Iris to Claude: pings IPv4+6
Can't tell if offsite connections are dnatted to Claude via WG or vnet0.
Holly to Jacinth: pinging claude diamond iris jacinth via main LAN: works
pinging petra xena surya via WireGuard: no answer.
xena->holly trcr -6: ov_u_j.cft.ca.us (1:1), holly (i.e. via WG)
xena->holly trcr -4: ov_u_j.cft.ca.us (129), nothing thereafter.
IPv4 on Jacinth sends this via br0.
Got to implement "if client is using WG, route to it; if not, route via br0".
Method 1: every bearer packet on the WG port of type 1 (content inspection)
is cloned with mirred to some netlink socket.
I have two types of clients: those that always use WireGuard, and those
that sometimes use WireGuard. To deal with routing issues, Xena <->
Jacinth and Jacinth <-> Surya always need the VPN, whereas Selen
(Android) uses it only when roaming (and when access to the local LAN is
wanted). The latter scenario is the natural one for OpenVPN and IPSec, so I've
been focused on that so far, but making it work is going to be hard with
WireGuard, so I've decided to switch over to the "always on" paradigm, at least at first. Xena, Jacinth and Surya are the most important hosts on my
net, and it's not acceptable to knock them out with VPN experiments. Among my
other VM's, Claude (the webserver) is also mission-critical, and Petra is
hosted on Xena and is affected by its networking. So to get this project
moving, I revived a disused VM called Oso, hosted on Iris (a leaf node) with
bridge networking, so it is effectively an independent leaf node.
For the first try I'm going to have, for each client, an individual interface (wg-$PEER) with individual addresses from 192.9.200.96/28 and 2600:3c01:e000:306::10:0/112. Later I'll try doing the tunnels on a shared interface like I originally planned.
For the first try on Oso I set up Oso with AllowedIPs = 192.9.200.106, 2600:3c01:e000:306::10:10 (just Jacinth's WireGuard interface addresses for Oso), and Jacinth had AllowedIPs = 192.9.200.122, 2600:3c01:e000:306::9:10 (Oso's WireGuard interface addresses). Oso's firewall was rejecting bearer packets on 4296/udp. With that fixed, I could ping the peer's interface addresses, both families, both directions.
Next try is to add Oso's own addresses to AllowedIPs on Jacinth, and just Xena's subnet on Oso. For reconfiguring I'm going to take down WireGuard on both ends first, rather than trying to run wg-quick with a running configuration, since I'm expecting trouble on this one. Yes, Jacinth and Oso can't ping each other, because Jacinth tries to send the bearer packets to Oso via the tunnel that they're bearing. wg-quick has a limited ability to activate policy routing for the bearer packets, but this configuration is not recognized as needing it.
Next try: Jacinth AllowedIPs = Oso WG addresses + 2600:3c01:e000:306::d4/128 (Oso's own IP); Oso is unchanged with Jacinth's WG addresses + Xena subnet. Jacinth can ping all the Oso AllowedIPs mentioned; so can Oso. Xena and Petra can ping Oso's IPv4+6 WG address, but Xena needs to specify its public IP in the -I option of ping (source address) because that's what's in the AllowedIPs on Oso, vs. the endpoint of Xena's tunnel to Jacinth. For traceroute this would be the -s option.
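The commands for that test look roughly like this. Oso's WireGuard IPv4 address (192.9.200.122) is from the earlier try; Xena's public address here is a placeholder.

```shell
# Xena pings Oso's WireGuard address, forcing the source address (-I)
# to Xena's public, non-WireGuard IP (placeholder .4), since that is
# what's in Oso's AllowedIPs:
ping -c 3 -I 192.9.200.4 192.9.200.122
# traceroute's equivalent of ping's -I is -s:
traceroute -s 192.9.200.4 192.9.200.122
```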
Next try: a script that implements the WireGuard Evolution item for bearer packets down the tunnel. Trying it first on Oso. It works, but didn't solve my problems.
Here are the key principles that I finally worked out, for making a configuration file that gets the packets through.
All participating hosts need a fixed IP address (IPv4+6) that will go on the WireGuard interface, which the other end has to designate as AllowedIPs, by number. This is the Address parameter in the Interface section. On hosts that always use WireGuard, the host's own name will normally resolve to this number.
All participating hosts need another fixed IP which does not go through the VPN, to which bearer packets will be addressed, or which will be the source address of outgoing bearer packets. A host that often omits the VPN should use this number as the referent of its own hostname. The other end will configure this number as the peer's Endpoint. If a host, e.g. the server, never initiates a connection to this peer, specifying the Endpoint is optional and does not have to be accurate, e.g. if the client is roaming, it's impossible for the server to know in advance the client's IP.
I use port 4296 as the ListenPort and Endpoint port on all hosts, to simplify maintenance of the configuration files, and debugging. This should be changed to the official IANA assignment, if one ever materializes.
In addition, the Interface section needs the PrivateKey of this host as a string (44 bytes including ending padding with one '='). The Peer section needs the peer's PublicKey. The configuration file needs to not be publicly readable, to protect the private key. (I wish the private and public keys could be read out of files with mode 600, obviating the restrictive permissions on the conf file, but though this may have been supported in the past, it's not supported now (wireguard-tools-1.0.20210914).)
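The key-generation commands from wg(8) fit here, together with a quick sanity check on the encoding. The genkey line is standard wg usage; it's guarded so the sketch runs even on a machine where wg isn't installed.

```shell
# Standard wg(8) key generation; umask 077 keeps the private key file
# unreadable by others (guarded in case wg isn't installed here):
umask 077
command -v wg >/dev/null && wg genkey | tee privatekey | wg pubkey > publickey

# Sanity check on the format: a Curve25519 key is 32 random bytes,
# and base64 encodes 32 bytes as 44 characters ending in one '=':
key=$(head -c 32 /dev/urandom | base64)
echo "${#key}"    # prints 44
```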
The server's Peer section needs AllowedIPs for the peer's WireGuard address(es), by number. Omit the CIDR bits; this is a host route. If the client is routing for a subnet (like Xena which has a VM, Petra), the server needs to allow the subnet also (with CIDR bits). The non-VPN address on the client must not be an AllowedIP because otherwise bearer packets would be sent down the tunnel that they are bearing; this caused me endless grief.
In the client's Peer section I put AllowedIPs for my LAN address range, including other VPNs' endpoints but excluding WireGuard addresses. This actually worked and didn't need policy routing; bearer packets did not go down the tunnel. Many people's use case involves AllowedIPs for the default route 0.0.0.0/0 and ::/0 but that's not what I'm doing.
Using the newly installed NetworkManager plugin for WireGuard. Get Xena back on the net.
What's required by the WireGuard plugin:
wg2jacinth. In theory in the future I might have a connection to some other service.
Let's think about a design that will apply to all hosts. As set up now on Xena only, /etc/NetworkManager/dispatcher.d/ includes my script that starts OpenVPN. All these scripts are run whenever interfaces go up or down, and the scripts decide if they need to do something about it. This script would have to change to bring up WireGuard. Let's concentrate on a design for the other hosts, then adapt it to work with NetworkManager.
There will be a systemd service that brings up WireGuard, using different configuration files for leaf nodes, for Xena, and for servers (Jacinth and Surya). Likely the service unit itself can be the same for all.
The WireGuard unit will start after network.target, same as OpenVPN and Strongswan.
As part of startup or reload, the configuration file will be
regenerated, if any IP addresses have changed. Jacinth's wild side
IPv4 address is aleatory (DHCP) and changes every few weeks. (Some
ISPs change the IP daily.) The rest never
change.
For leaf nodes (and the others are similar), the configuration will have:
Xena is like a generic leaf node, with these exceptions:
Its peer endpoint is Jacinth's IPv4 wild side, for use when roaming, and Jacinth's local IPv6, which is accessible from the wild side if the firewall permits, which it does. It either doesn't have Surya at all, or it has some route jiggering so Surya is never used as the server.
On Xena, the server peer stanza has AllowedIPs (and corresponding outgoing routes) to send the whole CouchNet address range through the tunnel, whether or not it is roaming. The server peer endpoints are outside the CouchNet address range, so policy routing is not needed to keep bearer packets out of the tunnel.
IPv6 traffic (payloads) from the wild side to Xena's WireGuard address or to its internal subnet (which is the one normally used) is routed via Linode to Surya and from there through the segment tunnel to Jacinth which routes it via the tunnel to Xena. IPv4 payloads (only from the local LAN) go direct to Jacinth which routes them to Xena. Bearer packets are addressed from and to Xena's non-WireGuard address (or if roaming, from the address assigned by the foreign carrier), and from the wild side they similarly go via Jacinth. The firewalls on all involved hosts pass VPN ports including WireGuard.
It's very rare for Xena or Selen roaming on the wild side to be able to use IPv6.
The two servers are configured like leaf nodes, except that they have all the leaf nodes, plus each other, as peers.
Ports open to the wild side:
normalports.
WireGuard is recognized in /etc/firewallJ.d/wild-acc-C5.
What files and scripts do we need to make this all work?
Most of these will be in /etc/wireguard/.
The WireGuard service unit itself, on all hosts, in /etc/systemd/system/.
A table of Peer stanzas, to be selectively included in the configuration file, wg0.conf. It will be hosted on Jacinth. Retrieval won't be fancy; it will reside in /home/httpd/htdocs and all hosts will retrieve it with curl.
The program that creates that table, on Jacinth. It will monitor the Netlink stream and is immediately aware when Jacinth's wild IPv4 address changes. It will send a magic packet to all clients. It doesn't send the content itself, which would be a security problem; spurious magic packets would cause trivial extra work on the client but could not introduce a tampered peer table.
The socket activated service on the client which imports the peer table, generates the new conf file, and reloads WireGuard if actually changed. It should also be run at boot time (before WireGuard starts) and on a timer, in case the magic packet gets lost.
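A sketch of what that import service could run. The URL, table name, and the generate-wg0-conf helper are assumptions, not the actual implementation; the real pieces here are cmp, wg syncconf, and wg-quick strip, all of which exist as described.

```shell
#!/bin/bash
# Hypothetical peer-table refresh: fetch, regenerate, reload if changed.
set -e
cd /etc/wireguard
curl -fsS http://jacinth/peers.table -o peers.new
# generate-wg0-conf is an assumed local script that selects this
# host's Peer stanzas out of the table:
generate-wg0-conf peers.new > wg0.conf.new
if ! cmp -s wg0.conf wg0.conf.new ; then
    mv wg0.conf.new wg0.conf
    # syncconf updates the running interface without tearing it down;
    # wg-quick strip removes the lines wg(8) doesn't understand:
    wg syncconf wg0 <(wg-quick strip wg0)
fi
```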
WireGuard has a big problem if the client sometimes has WireGuard running, and sometimes expects to be contacted on the local LAN. Think of the backup host collecting changed files from the client. Basically, the sysop needs to direct the backup collector to the client's WireGuard or non-WireGuard address, depending on whether or not it's off-site and using the VPN.
This is the tunnel from Jacinth to Surya, currently operated by OpenVPN. Jacinth originates it and Surya acts as a listening server or responder. But if bearer packets go down the tunnel this is a chicken and egg issue and you end up with an omelet. So OpenVPN has an anti-iroute so any packet on the initiator's end addressed to the interface used by the OpenVPN listener will go out by normal routing, not the tunnel, and Surya's firewall will reject it. The anti-iroute is not restricted to OpenVPN's port number; all ports are blocked, such as traffic to the webserver listening on Surya's wild side (www.jfcarter.net).
The WireGuard deployment campaign in 2021 got preempted, but an issue has arisen which returns WireGuard to the front burner. Specifically, my family is moving to Washington state to be closer to our son, and the master site Jacinth is going to be in a packing box for an unknown time. But Surya, our cloud server, won't be in any packing box. Therefore I'm going to transplant a lot of the server software, specifically our public webserver and site, to Surya. But before and after the move when the local LAN is functioning, LAN hosts still need to get to the webserver and other services on Surya. (Yes, there are slave servers too.) I was solving this by creating an ipip or Geneve tunnel within the segment tunnel, but adding kludge upon kludge is not the way to go; the right solution is to finish the WireGuard deployment and to divert bearer packets off the segment tunnel and onto the default route.
Further design and details are below.
The design requirements are:
Every CouchNet host must be able to initiate a WireGuard tunnel
(bidirectional) to either of two listening responders (servers):
Jacinth or Surya. If possible, future "any to any" connections should not be blocked in the design.
Some clients with a WireGuard tunnel must be able to initiate through it a service connection (http, ssh, dns, etc.) to any LAN host, and LAN hosts must be able to initiate in the other direction. There is a coincidental correlation between roaming, the need for both outward and inward connectivity, and needing the tunnel to be up all the time.
Other clients only occasionally need the tunnel. They will initiate service connections, but incoming connections will be rare, and it's not a showstopper if I can't make them work.
There should be automation and app support so the client's tunnel turns on automatically, or if it's not supposed to be on all the time, so turning it on is simple for the user.
After a rather aggressive struggle to get WireGuard working, I learned these points:
I will need only one WireGuard interface per host, conventionally called wg0. It knows which outgoing packets should be sent to which peer. Multiple interfaces may be useful in complicated designs, but it looks to me that one interface will almost always be enough.
In the past I have thought of giving the host one IP to which peers send bearer packets, and another IP to which remote hosts (including the peer) send packets through the tunnel. A big mess ensues if you send bearer packets through the tunnel that they are bearing. However it's a lot simpler and more comprehensible if the same IP is used for bearer packets and for payload services like a webserver. In this case, policy routing (described further below) can keep out the bearer packets.
Each host with WireGuard needs a configuration file with a Peer
stanza for each other host that it can communicate with. The peer's
public key is required; this is the authentication and authorization
token that lets in the peer, and also the foundation of symmetric
encryption on the tunnel. The Endpoint (IP and port) is optional
for the responder, which sends reply packets to the IP and port that
the peer most recently used. But the Endpoint is required for the
initiator, which otherwise would have no idea what IP and port to send
the initial packet to.
Often the initiator and responder will have fixed IPs, e.g. when connecting multiple sites of one enterprise. But if the initiator is roaming, e.g. a cellphone, its IP address and default route will change whenever the infrastructure feels like it, particularly when changing cellular protocols like 5G, LTE, UMTS or EDGE. This is one of my major use cases.
The default route resides in the Main policy routing table. When it changes, the WireGuard infrastructure is not going to be able to detect the change and copy it into other policy routing tables.
If the responder expects to originate payload connections (e.g. http or ssh) to the initiator, then a roaming initiator should configure keepalive packets to announce to the responder which IP it has roamed to. This is PersistentKeepalive in the Peer stanza. A value of 25 (seconds) is generally recommended to deal with commercial NAT routers, and 25sec seems reasonable for the roaming case also.
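In the conf file this is one line in the roaming initiator's Peer stanza; the other values here are placeholders.

```
[Peer]
PublicKey = responderKey...=
Endpoint = vpn.example.org:4296
AllowedIPs = 192.9.200.0/24
PersistentKeepalive = 25    # seconds; survives NAT mapping timeouts
```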
Each Peer stanza needs one or more AllowedIPs assignments. The
value of each one is a comma (and whitespace) separated list of IP
subnets with CIDR bits, e.g. 192.168.3.0/24, fd18::/11. To get a
packet sent out, local routes need to direct these address ranges to
the WireGuard interface. The wg-quick
script picks the ranges
out of the configuration file and creates routes for you. Incoming and
outgoing packets are checked for being in the AllowedIPs ranges; if
not, they are dropped silently, preferably before being sent to the
other end. It's a best practice, and saves CPU work, for only allowed
packets to get routed to the WireGuard interface.
Inside the WireGuard driver, the AllowedIPs are sorted similarly to a normal routing table: the longest prefixes (most CIDR bits) are tried first, and the first match wins; the packet goes to the peer which provided that subnet. If identical AllowedIPs are in multiple Peer stanzas, it's an arbitrary choice which peer the packet goes to, so don't do that.
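You can watch this table directly; wg show has a subcommand for it:

```shell
# Print the kernel's AllowedIPs table for wg0: one line per peer,
# the peer's public key followed by its subnets.
wg show wg0 allowed-ips
```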
OpenVPN has a very similar concept called an iroute.
Almost invariably the peer's own IP is in one of the AllowedIPs ranges. And the local host is going to be sending bearer packets to the peer, which have a route to oblivion down the tunnel that they are bearing. Policy routing will rescue them. The scheme I worked out goes like this:
ip -4 route add 192.168.3.178/32 dev wg0 table 214
ip -4 rule add dport 4296 table main priority 11
ip -4 rule add to all table 214 priority 12
ip -6 rule add to all table 214 priority 13
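Assembled into one place, with the IPv6 route that the priority-13 rule presumes. All addresses are illustrative (2001:db8:: is a documentation prefix), and the IPv6 bearer-packet exception is my assumption, needed only if bearer packets ever travel over IPv6.

```shell
# Table 214 holds host routes to the peer through the tunnel:
ip -4 route add 192.168.3.178/32 dev wg0 table 214
ip -6 route add 2001:db8::178/128 dev wg0 table 214
# Bearer packets (destination port 4296) escape to the main table
# before table 214 is consulted:
ip -4 rule add dport 4296 table main priority 11
ip -6 rule add dport 4296 table main priority 11
# Everything else tries table 214 first:
ip -4 rule add to all table 214 priority 12
ip -6 rule add to all table 214 priority 13
```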
In the stock WireGuard implementation, if the responder is ever able to respond to an initiator's connection, and if it is ever going to initiate a payload connection (ssh, http, etc.) to that initiator, that connection will go over WireGuard, whether or not the initiator has WireGuard running at that moment. In other words, my scenario where some LAN hosts mostly don't encrypt is not going to work: it's going to be always WireGuard, or never WireGuard.
IPSec has an authentication, authorization and key establishment daemon called Charon on UDP ports 500 and 4500. It sends Traffic Selectors to the kernel that specify which endpoints' traffic should be encrypted. This kind of building block fits perfectly with WireGuard's method of operation, and I should be able to write one that's as simple as the rest of WireGuard. Following the underworld theme, I'll call it Anubis. Basic design points:
However, I need WireGuard to be working soon, well before I could commit to finishing Anubis. Therefore the all-or-none design will be deployed promptly: the responders will only be configured with the peers that will always use WireGuard to talk to them.
Translating these into implementation issues:
The real
hosts have addresses on the local LAN in
192.9.200.192/28 and 2600:3c01:e000:306::/112. Excluded hosts which
are still on the local LAN, e.g. appliances like the VoIP ATA, are in
192.9.200.208/28. The whole local LAN has 31 members and /27 bits.
wgtunnel@jacinth.service (or Surya) starts WireGuard in initiator mode, connecting to the selected server. It will create an interface called wg0, and will put this host's address on it, with noprefixroute. Another interface such as en0 or br0 also has the same address, with a prefix route. It will make routes to send all LAN traffic down wg0 but traffic to the wild side will still go via the default route. Except, there will be a policy routing rule and table to send bearer packets via the default route even in the common case that the destination is on the local LAN.
We'll need an option to send all traffic including wild side (except bearer packets) via the tunnel, but this will be rare. Once I was on vacation and wanted to buy an e-book, but my preferred vendor had rights for selling in the United States but not for the country I was in. So I set my default route via CouchNet: problem solved.
This traffic will be routed on Jacinth's end (client) of the segment tunnel: all IPv6, except not if originating from Jacinth's wild IPv6 interface; and Surya's wild and local IPv4 addresses.
wglisten.service starts WireGuard in listening (responder) mode on Jacinth and Surya. Surya's local name resolves to the address on this interface, but Jacinth (and potentially other servers) will have a separate address.
What's already written: this is all in /etc/wireguard and I'm describing what's currently on Jacinth.
try: executing a command line; initialize the WireGuard link; configure routes through the tunnel.
Fly in the ointment: I'm testing the WireGuard configuration on one responder and one initiator. The responder starts first. The initiator starts 2 seconds later. (Both have Peer stanzas with Endpoints.) The responder sends a bearer packet, probably key establishment. The initiator responds with ICMP "UDP port 4296 unreachable". The responder never sends another bearer packet, despite payload packets coming in to wg0, and the initiator never sends any bearer packet.
Investigation #1: Why didn't the Initiator get any payload packets routed to wg0? Because the AllowedIPs is to Jacinth's wild side and all the payloads go to Jacinth's LAN address.
Fix #1: Force the AllowedIPs to the LAN address. Program couldn't handle a FQDN for a peer. Now it can handle the FQDN. Functional test passed (curl to webserver, both directions).
The working endpoints are:
The StrongS/WAN IPSec suite includes a daemon called Charon, formerly
Pluto. The initiator starts a VPN-type connection by signalling their own
Charon to establish a Security Association with the peer's Charon
(authentication credential required) and to send to the peer connection
parameters like which address ranges should go over the tunnel. In OpenVPN the connection setup module isn't a separate daemon but it performs similar functions, including selecting affected traffic (its equivalent of AllowedIPs is called an "iroute").
WireGuard needs a similar gatekeeper which, following the underworld theme, I'm calling Anubis. Its functions are just about one to one equivalent to Charon's, but WireGuard has advantages in simplicity. Here are the basic design points:
Any plan involving modifications to WireGuard's kernel driver is not going to fly.
I want to get away from the paradigm that when peers are authorized, WireGuard always has them configured in the kernel and therefore always communicates with those peers over WireGuard. On my net two of the pairs are in this mode, but most of the associations should be available when needed, and should not be another opportunity for bugs to kill the net, if permanently activated despite not being needed.
Therefore the part-time initiator communicates with the responder's Anubis, authenticating, and both of them add or delete the peer's public key, endpoint and AllowedIPs on the kernel's list, plus routes, activating or turning off WireGuard.
Just like StrongS/WAN's Charon, WireGuard has two collections of Security Associations: generic VPN communication occurs only when activated, whereas (unlike StrongS/WAN) Anubis has a separate WireGuard interface with a permanently configured connection to every authorized peer. That won't scale to hundreds of thousands of peers, but it's something I can implement promptly for my much smaller net.
Or will it scale? How big is the per-peer data structure? Off the top of my head, assuming IPv6, I can think of: peer's public key (32 bytes), IP address (16), port (2), just one AllowedIP with CIDR bits (17), probably not our ephemeral private or public key (0), and the Diffie-Hellman shared secret (possibly further hashed) as the symmetric key (32), and 2 chain links (2x8), for a total structure size of 115 bytes, call it 128. For a million peers we're talking about 1.28e8 bytes. The smallest Raspberry Pi you can buy has 2.14e9 bytes RAM. So ignoring the minor detail of network bandwidth, you actually could serve a million VPN service users on your Raspberry Pi.
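The arithmetic, spelled out in shell just to double-check:

```shell
# 128 bytes per peer, one million peers:
echo $((128 * 1000000))              # prints 128000000, i.e. 1.28e8 bytes
# versus 2 GiB of Raspberry Pi RAM:
echo $((2 * 1024 * 1024 * 1024))     # prints 2147483648, i.e. 2.14e9 bytes
```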
Anubis has a fixed port for bearer packets, different from the production WireGuard, and the firewall needs to allow packets from any wild-side or internal IP to both these ports. Policy routes at both ends limit communication to only Anubis' service port; thus the availability of perpetual WireGuard service to this one port has no effect on generic WireGuard service, or the lack thereof when not wanted.
With this infrastructure handling authentication, authorization and security, all Anubis needs to do is this:
The initiator sends a packet saying "up" or "down", and some public key. To have keys, and therefore to send this packet, the initiator must know its own private key, so the responder is assured that the packet comes from some authorized peer, and there will be no problem when WireGuard cheaply drops random exploit attempts.
But the packet's IP address does not uniquely identify which connection should be brought up or down, because NAT, or network infrastructure for containers or virtual machines, could cause the same service IP and even port to be used for multiple WireGuard instances.
The initiator creates a nonce (meaningless random bits) and hashes it; the hash is included in the packet. The initiator encrypts the nonce with a symmetric key which is the initiator's private key times the responder's public key (its private key times the G factor), which the initiator already knows, and the result goes into the packet.
The responder decrypts the nonce with the public key in the packet (hoped to be the initiator's private key times the G factor) times the responder's private key: the Diffie-Hellman shared secret, which is the same at both ends because multiplication is commutative in modular rings, including the field of size 2^255 - 19. The responder hashes the decrypted nonce and compares with the hash in the packet. If they are equal, the responder knows that the packet came from the specific authorized peer using the public key in the packet.
The responder then can identify the production connection and bring it up or down. Mission accomplished.
A nice addition to the protocol would be a response with a success or error message.
An alert reader will have noticed that a Black Hat can steal the private key by dividing the public key by G. In the ring of integers we have an efficient algorithm to do that, long division, but not so in modular rings: the effort to do the division is similar to doing test decryptions using for the symmetric key each ring member from 2 to 2^128, the square root of the size of the modular ring. This effort level is considered sufficient to protect Top Secret data.
These are the categories of hosts:
For each host, its $host.conf file has Peer stanzas that list AllowedIPs which the WireGuard kernel module allows to emerge from wg0, the WireGuard interface, having originated from that peer or more distant hosts using the peer as a gateway. The startup script turns all the AllowedIPs into routes, so when the host sends traffic to the AllowedIPs via the peer, traffic goes via WireGuard. Except, a policy routing rule diverts bearer packets, recognized by their destination port, to leave via one of the host's non-WireGuard interfaces.
Jacinth (main router) and Surya (cloud server) have a server role: each one's $host.conf has a Peer stanza for each other host including each other. In the stanza, the peer's Endpoint is the one used for bearer packets: the one on the local LAN in most cases. Its AllowedIPs in most cases are just the peer's WireGuard address, the one on wg0 (the WireGuard interface).
Non-servers have a peer stanza to Jacinth, and most of them have AllowedIPs for a split default route via Jacinth. Whether they have corresponding routes is a more complicated story. Their hostnames resolve to their non-WireGuard interface addresses. Jacinth is going to be out of action for an unknown time, probably weeks long, and during that period the remaining functioning clients will switch to Surya for access to the local LAN, whose only member is Surya (providing directory services). Their default route will go through their wild side interface, not Surya.
Xena has an internal subnet with a VM, Petra, on it, as well as its own WireGuard address, and both servers include the whole subnet in Xena's AllowedIPs.
In surya.conf, the Peer stanza for Jacinth has AllowedIPs for the whole local LAN, the competitor VPN endpoint ranges, and Jacinth's wild-side IPv4+6 addresses.
Surya and Xena's names resolve to their WireGuard addresses. Jacinth's name resolves to its local LAN address. It has separate names for its wild-side interfaces and WireGuard.
Jacinth's ISP isn't capable (in 2024) of native IPv6, and Jacinth uses Surya as an IPv6 gateway. In jacinth.conf, the Peer stanza for Surya has AllowedIPs for all IPv6 addresses in a split default route: ::/1, 8000::/1. Firewall rules have a set of authorized ports on each host, specifically Jacinth, and we aren't relying only on AllowedIPs to keep out the global hacking community.
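The IPv6 split default described above might look like this in jacinth.conf (a sketch; the key and endpoint are placeholders):

```ini
# Hypothetical fragment of jacinth.conf; key and endpoint are placeholders.
[Peer]
PublicKey = <Surya's public key>
Endpoint = surya.example.com:51820
# ::/1 plus 8000::/1 together cover all of IPv6, but each /1 is more
# specific than a real default route (::/0), so these routes win over
# an existing default without having to replace it:
AllowedIPs = ::/1, 8000::/1
```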
Xena (the laptop) roams, but most of the time it's at home on the local LAN. To make the internal subnet function, it always uses the VPN, with Jacinth as its (only) peer. It has AllowedIPs and routes to accept the whole CouchNet address range over WireGuard, and its non-WireGuard interface has a split default route through Jacinth.
Selen is a cellphone running LineageOS (Android). It has three modes of operation: roaming with no VPN (it's cut off from CouchNet), roaming with VPN (routing is like Xena), and at home without the VPN (full service because it's on the local LAN). Its configuration when using the VPN is pretty much the same as Xena.
Jacinth's role on OpenVPN and IPSec is as a generic server: potentially a variety of clients could connect at the same time, authenticating with an X.509 certificate with an acceptable trust chain. This isn't going to fly with WireGuard, since the server has to know the client's public key before it can accept a connection from the client.
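The server can at least add and remove peers at runtime without restarting, so one could imagine a registration daemon installing a client's public key on demand. The commands below are real wg(8) syntax, though the key and address are placeholders:

```shell
# Add (or update) a peer on a running interface; placeholders throughout.
wg set wg0 peer 'BASE64_CLIENT_PUBLIC_KEY=' allowed-ips 10.1.1.7/32

# Later, revoke that client's access:
wg set wg0 peer 'BASE64_CLIENT_PUBLIC_KEY=' remove
```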
Brainwave: use tc's mirred egress mirror dev $IFB action, where $IFB is an Intermediate Functional Block (a synthetic interface) which the daemon can listen on. See man tc-mirred for documentation.
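A sketch of the tc plumbing that brainwave implies. Interface names are examples, and it assumes the ifb module and a clsact qdisc are available:

```shell
# Create an Intermediate Functional Block device for the daemon to watch.
ip link add ifb0 type ifb
ip link set ifb0 up

# Attach a clsact qdisc to the real interface and mirror its egress
# traffic to ifb0; see tc-mirred(8) and tc-matchall(8).
tc qdisc add dev eth0 clsact
tc filter add dev eth0 egress matchall \
    action mirred egress mirror dev ifb0
```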
WireGuard needs the equivalent of OpenVPN's explicit-exit-notify. When the kernel module detects that a connection is going down (e.g. ip link del dev wg0) it should notify the peer. The rekey timeout seems to be short, under 1 minute, but the rekey attempt only occurs if the non-dead peer sends a packet, and it's not clear how much state it keeps for the dead peer or how significant that is. It just seems neater to notify the surviving peer when you're closing the connection.
Cryptographic algorithms can't be relied on to last forever, although Rijndael (AES) has survived with only minimally effective attacks from 1997 (inception; anointed in FIPS pub. 197 in 2001) through 2021, and ChaCha20 has been widely deployed from 2008 to 2021. It would be a very smart move to add algorithm negotiation, carrying the needed info in the dummy payload of the initial handshake packet.
In this scenario you have a chicken-and-egg situation that results in an omelet. wg-quick already recognizes when the default route is sent through the tunnel, and puts in a policy route to divert bearer packets to their original (presumably default) route. But a more limited "omelet" route is not recognized, nor is the case where such a policy route has already been set up.
The very first step for wg-quick should be to do ip route get $EndpointIP, with the IP it's actually going to use (IPv4 or 6). This route should lead to the peer's non-tunnel address. When wg-quick finishes setting up routes, including running PostUp and PreDown scripts that might set routes, it should again do ip route get $EndpointIP, and if the route now goes through the WireGuard interface, it should do the policy-routing trick that diverts bearer packets via the route it initially discovered. As much as possible of this route should be preserved, specifically the metric and the source address, if available.
On a server with multiple peers you may need an individual diversion route for some or all of the peers.
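The procedure above might look roughly like this in a shell script. This is a sketch only; $EndpointIP, the table number, and the rule priority are all assumptions:

```shell
# Before bringing up wg0: remember how we'd reach the peer right now.
GW=$(ip route get "$EndpointIP" | sed -n 's/.* via \([^ ]*\).*/\1/p')
DEV=$(ip route get "$EndpointIP" | sed -n 's/.* dev \([^ ]*\).*/\1/p')

# ... wg-quick sets up the interface, routes, PostUp scripts ...

# Afterwards: if the endpoint would now be routed into the tunnel,
# divert bearer packets via the originally discovered next hop.
if ip route get "$EndpointIP" | grep -q 'dev wg0'; then
    ip route add "$EndpointIP" ${GW:+via "$GW"} dev "$DEV" table 51820
    ip rule add to "$EndpointIP" lookup 51820 priority 100
    ip route flush cache
fi
```

A fuller version would also carry over the metric and src address from the original ip route get output, as the text suggests.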
I'm looking carefully again at the network design on my net. I think I need to refactor routes to/via the VPNs (with WireGuard added). In the table below, leaves means all the hosts not explicitly mentioned. $pfx represents the first three octets of the IPv4 address range. See below for Xena's default route, indicated by *. There are analogous addresses and routes for IPv6.
| Host | VPN or Route | Presently | Change To |
|---|---|---|---|
| — Address Ranges — | | | |
| Vacant | | $pfx.0/25 | $pfx.0/26+64/27 |
| Jacinth | OpenVPN 1194/udp | $pfx.128/29 | $pfx.96/29 |
| Jacinth | OpenVPN 443/tcp | $pfx.144/29 | $pfx.104/29 |
| Jacinth | IPSec | $pfx.160/29 | $pfx.112/29 |
| Jacinth | WireGuard | (none) | $pfx.120/29 |
| Surya | OpenVPN 1194/udp | $pfx.136/29 | $pfx.128/29 |
| Surya | OpenVPN 443/tcp | $pfx.152/29 | $pfx.136/29 |
| Surya | IPSec | $pfx.168/29 | $pfx.144/29 |
| Surya | WireGuard | (none) | $pfx.152/29 |
| Surya | Segment tunnel | $pfx.184/29 | $pfx.160/29 |
| Xena | Xena+Petra subnet | $pfx.176/29 | $pfx.168/29 |
| Vacant | | (none) | $pfx.176/28 |
| Leaves | Main LAN | $pfx.192/26 | $pfx.192/26 (same) |
| DHCP | In main LAN | $pfx.240..254 | No change |
| — Routes — | | | |
| Leaves | Default route | Jacinth $pfx.193 | (Same) |
| Jacinth | Default route IPv4 | Its wild side (en1) | (Same) |
| Jacinth | Default route IPv6 | Surya $pfx.185 | Surya $pfx.161 |
| Surya | Default route, both | Its wild side (en0) | (Same) |
| Xena | Default route | Jacinth $pfx.193* | (Same) |
| Petra | Default route | Xena $pfx.177 | Xena $pfx.169 |
| Jacinth | Main LAN | dev br0 | (Same) |
| Jacinth | Jacinth OV 1194/udp | dev tun0 | (Same) |
| Jacinth | Jacinth OV 443/tcp | dev tun1 | (Same) |
| Jacinth | Jacinth IPSec | Already on Jacinth | (Same) |
| Jacinth | Jacinth WireGuard | (none) | dev wg0 |
| Jacinth | Surya VPNs+subnets | (Combined) | dev tun9/wg9 to Surya |
| Jacinth | Surya OV 1194/udp | Surya $pfx.185 | (Combined) |
| Jacinth | Surya OV 443/tcp | Surya $pfx.185 | (Combined) |
| Jacinth | Surya IPSec | Surya $pfx.185 | (Combined) |
| Jacinth | Surya (segment tnl) | dev tun9 (to Surya) | (Combined) |
| Jacinth | Xena + Petra | VPN(Xena) $pfx.130 | VPN(Xena) $pfx.106 |
| Surya | Jacinth VPNs+subnets | (Combined) | dev tun9/wg9 to Jacinth |
| Surya | Jacinth OV 1194/udp | Jacinth $pfx.186 | (Combined) |
| Surya | Jacinth OV 443/tcp | Jacinth $pfx.186 | (Combined) |
| Surya | Jacinth (segment tnl) | dev tun9 to Jacinth | (Combined) |
| Surya | Jacinth IPSec | Jacinth $pfx.186 | (Combined) |
| Surya | Surya OV 1194/udp | dev tun0 | (Same) |
| Surya | Surya OV 443/tcp | dev tun1 | (Same) |
| Surya | Surya IPSec | Already on Surya | (Same) |
| Surya | Xena + Petra | Jacinth $pfx.186 | (Combined) |
| Surya | Main LAN | Jacinth $pfx.186 | (Combined) |
| Xena | (finish this) | | |
WireGuard is not IPSec (StrongS/WAN) or OpenVPN; it has no key agreement mechanism analogous to StrongS/WAN's Charon. The correct way to handle WireGuard is pure pre-shared (static) keys: the server always loads all the authorized public keys, and each client has its own key pair plus the server's public key. For a true point-to-point link, the server would have only one authorized peer.
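Generating the static keys is the easy part; this is the standard wg(8) idiom:

```shell
# Generate a private key and derive its public key; umask keeps the
# private key unreadable by other users.
umask 077
wg genkey | tee privatekey | wg pubkey > publickey
```

The contents of publickey go into the peers' Peer stanzas; privatekey stays in this host's own Interface stanza.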
So let's set this up and be done with it!