Some info about WireGuard, a new VPN:
Project website: www.wireguard.com
Inception 2015.
Lead developer: Jason A. Donenfeld. WireGuard and its service marks are registered trademarks of Jason A. Donenfeld.
Website sponsors: ZX2C4 (Jason A. Donenfeld) and Edge Security.
I have just gone through yet another audit of my VPNs, making sure that
they work for all relevant clients and that the vpn-tester program can
competently report if they are or aren't working. Currently (2021) my two
servers run StrongS/WAN IPSec (strongswan-ipsec-5.9.3 on SuSE Tumbleweed) and
OpenVPN (openvpn-2.5.3 on SuSE Tumbleweed). The clients have Linux (same
versions) and Android: strongSwan VPN Client
(version 2.3.3,
org.strongswan.android) and OpenVPN for Android
(version 0.7.25,
de.blinkt.openvpn). Both VPNs work well when properly configured, but they
have a number of less than wonderful features:
The learning curve is steep, both for routing packets properly through the tunnel (excluding, of course, the bearer packets) and for providing credentials in the required form to authenticate the two ends.
The network paradigm for IPSec is: if an IPv6 packet has an IPSec header, the corresponding Security Association has the information (crypto algo, key, etc.) which the kernel can use to decrypt it. The headers and payload thus revealed are then processed normally. Outgoing packets are selected by a Traffic Selector (different from normal routing) and the inverse transformation is performed, after which the encrypted packet is sent out through normal routing. IPv4 uses ESP and AH protocol packets instead. I found that it was often a challenge to get the Traffic Selector right, and it was also a challenge to extract the cert's Distinguished Name in the format Charon wants. (Using a SAN turns out to be easier.)
OpenVPN uses a tun/tap device, and payload packets pop out of it or are stuffed into it by normal routes, as if it were a physical net interface. It's a lot easier to handle routing in this context, which WireGuard shares.
IPSec connects fairly promptly, initially or after a net disruption, but OpenVPN takes several seconds to do this.
Both IPSec and OpenVPN have a lot of code in the packages: around 400,000 and 600,000 lines respectively. This doesn't affect the end user directly, but I remember a quote from Wietse Venema, the author of the Postfix mail transport agent: he said his code has about one bug per thousand lines, and if you introduce complexity (he was talking about TLS for mail transport) you should think about exploits against the bugs and accidental loss of service.
Responding to shortcomings in existing VPN software, Jason A. Donenfeld in 2015 began to develop WireGuard, a new VPN. The project website describes these features; whether they're scored as good or bad depends on the user's goals.
Drastically reduced complexity; features not absolutely essential were sacrificed for this goal. He claims only 4000 lines of code.
A lot fewer configurable aspects; for example there is [currently, 2024] only one symmetric crypto algo, ChaCha20Poly1305 by Daniel Bernstein, so no configuration and no negotiation with the peer. Negotiation is very complex for both IPSec and OpenVPN.
A Curve25519 (elliptic curve Diffie-Hellman) public key, locally generated, serves as both the authentication credential and the foundation of the tunnel's encryption, similar to the style of SSH.
WireGuard does UDP only, no TCP and no unusual protocols like ESP. Check out udptunnel and udp2raw for add-on layers if you need TCP (which I do).
Very fast connection setup: the initiator sends one handshake packet, the responder sends one back, whereupon they both can infer the symmetric keys to encrypt payloads. The CPU time needed to do this is obviously minimal.
The ChaCha20Poly1305 symmetric crypto algo (AEAD type) is faster than the competitors.
The protocol isn't chatty: the only packets sent are payloads and key establishment (and rekeying). You can configure keepalive packets (zero-length payloads) if your net needs them. The responder doesn't respond to, and doesn't expend resources on, unauthorized initiators.
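A keepalive can be requested per peer in the conf file; a minimal sketch (the key is a placeholder, not a real one):

```
[Peer]
PublicKey = asdfgh...=         # placeholder
PersistentKeepalive = 25       # seconds; sends a zero-length payload when idle
```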
Some details of the Curve25519 Diffie-Hellman key establishment procedure are interesting. See this Wikipedia article about ECDH, which I've summarized, and the Curve25519 article; the related Ed25519 signature scheme is described in the EdDSA (Edwards Curve Digital Signature Algorithm) article.
Parameters for ECDH are agreed on in advance; WireGuard has only one set of parameters built in. It uses a modular field. NSA guidance says that a field of size in the range of 2^256 is sufficient for protecting Top Secret data; that is, the Black Hats would have to run billions of dollars' worth of computers for a year or more to crack one key (and WireGuard re-keys about every 2 minutes). The actual modular field size is 2^255-19.
A private (secret) key for ECDH is a randomly chosen element of the modular field, basically a 255-bit random number, excluding the few values at the top that won't fit below 2^255-19. Call it S. For the public key, a base point G is agreed on, and it is added to itself S times; that is, G is multiplied by S. Call the product Q.
An attacker could recover the private key by dividing Q by G. This would be easy if the operands were integers, but in the modular field this division (the discrete logarithm problem) needs effort similar to cracking a 128-bit symmetric key, half the ECDH bit length. This is the effort level currently considered adequate for protecting Top Secret data.
For each connection (or re-key), an ephemeral Diffie-Hellman key
pair is created by each peer. Generating the private key would
normally include one or more whitening
steps through a
pseudorandom generator, requiring a 256 bit multiplication, plus
another multiplication for the public key, but this is a lot less
effort than generating a prime factor key pair (RSA).
The initiator sends its static (permanent) public key and the ephemeral public key that it just generated; neither is encrypted, so the attackers know them. The responder sends back its ephemeral public key, but the initiator is supposed to have the responder's static public key in a configuration file. Each peer also sends a counter, which prevents replay attacks, and an encrypted dummy payload which, if successfully decrypted, assures each peer that the other one holds the private keys corresponding to both public keys that it proffered or that were configured.
Each peer, for each of the static and ephemeral keys, multiplies the other end's public key by its own private key. Remember that the other end's public key is its private key times G, so the resulting product is the local private key times the other end's private key times G (but done in the opposite order at the other end). Since multiplication in modular rings (including fields) is commutative, both will get the same answer: the Diffie-Hellman shared secret. The peers hash up the shared secrets with an agreed-upon algo to produce the symmetric key which they will use to encrypt or decrypt payloads.
For authentication, the dummy payload includes a MAC: the symmetric encryption algo is AEAD type, which includes an authentication tag, so each end can tell authoritatively whether it decrypted the payload successfully. If so, it knows authoritatively that the other end used the private key corresponding to the public key that was proffered; in other words, each can be sure of the identity of its peer (unless the key was stolen).
For authorization, the initiator has the responder's static public key in a configuration file, so it can be sure that it is talking to the intended responder. The responder has a list of public keys of every initiator authorized to connect. It knows on the first packet who the initiator claims to be, and will only respond if that public key is on its list. The list can be added to or pruned on the fly by the provided wg utility, which would be called by out-of-band facilities that aren't part of WireGuard, analogous to the charon key management daemon of StrongS/WAN (IPsec) and related functions in OpenVPN.
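The on-the-fly list maintenance might look like this (a sketch; the key, subnet and interface name are placeholders, and the commands need root):

```shell
# Authorize a new initiator: its public key and allowed source range.
wg set wg0 peer 'HIghu4placeholderkey...=' allowed-ips 192.9.200.115/32
# Prune it later:
wg set wg0 peer 'HIghu4placeholderkey...=' remove
```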
What are my goals for the VPNs, and how much hassle will it be to make WireGuard deliver what I need, so I can add it to my collection?
Resistance to bit-rot: changes in the system configuration tend to have a bad effect on operation of the VPNs, and I hope WireGuard will be affected less than IPSec and OpenVPN.
While I always use UDP if feasible, avoiding the dreaded TCP Meltdown syndrome, I've found that hotel Wi-Fi often blocks UDP in general and VPN ports in particular. Thus I support and use TCP on port 443, which the Wi-Fi access points have got to pass. (Note: some authoritarian nations block VPN ports nationally. 443/TCP can bypass this, but the Secret Police could recognize that it was not normal HTTPS traffic, even if they can't decrypt the payload, with baleful consequences for the perpetrator.)
OpenVPN can multiplex VPN traffic and another protocol (e.g. HTTPS) on its listen port, so the VPN server can host a normal webserver as well. For WireGuard, it's probably better to not mess with protocol conversion or tunneling from TCP to UDP. Rather, 53/UDP for DNS is another port that the hotel weasels can't block, and I might try to have DNS listen only to localhost:53(UDP) while WireGuard listens to 0.0.0.0:53(UDP), ignoring occasional DNS packets.
There are four VPN routes that need to work:
The segment tunnel, from the main router Jacinth to a host in the cloud. See A Server in the Clouds for what is being accomplished here. Basically the local net's default route for IPv6 goes out via the cloud host.
Xena and Petra to Jacinth: Xena is my laptop, which roams, and Petra is a virtual machine on it for development. See Network for Xena's Virtual Machine. But how do other hosts know where to send packets destined for Xena and Petra? Solution: they always send via Jacinth, and Xena always initiates a VPN to Jacinth even when it's not roaming.
Selen to Jacinth. Selen is a cellphone with Android, and it also roams. To get to the webmail and PIM server from off-site it needs a VPN. See Jimc's Awesome PIM Server for the packages I'm using. Unlike Xena, Selen would prefer to not use the VPN unless it is both roaming and using PIM, or is working with local LAN hosts, or is doing something for which privacy or integrity is critical.
For testing the VPNs, two local LAN hosts are chosen (not necessarily the same every time, depending on which are up or down). Host A connects to Jacinth's VPN server and sends packets to B; the tester checks if the packets go direct (the NoVPN case) or via Jacinth, and whether their content is really encrypted. For this to work, every LAN host has to be able to connect to Jacinth's WireGuard. The test is done daily; most of the time the LAN host is not using WireGuard.
For authorization, VPNs can be set up two ways. In the historic design each connection has individual credentials installed, typically in the form of pre-shared symmetric keys. Modern versions, such as IPSec and OpenVPN (starting in version 2.x), install a credential (normally a X.509 certificate and private key) on the server and on each client; the client certs are all signed by one Certificate Authority (or intermediate cert) which the server requires in the client's trust chain. The server doesn't need the clients' certs individually. I don't really have a big herd of users and I can handle either arrangement, but X.509 certs are what I'm using now. Preview: for WireGuard, each connected pair of peers needs to be configured at each end, but the credentials are Curve25519 Diffie-Hellman public keys of 256 bits (32 octets) that are also used for the crypto, not pre-shared symmetric keys.
There's an issue that makes a lot of trouble for designing a net with VPNs: some clients always use the VPN and some don't. I'm implicitly assuming a central server that all the clients work through. For the "always VPN" case the right routing setup is to assign the client's hostname to a fixed address on the VPN tunnel device. The same address is on the client's LAN interface, and by policy routing its peer(s) send bearer packets to that interface. Normally only the server would have that policy route, but there might also be multiple peers. The server always has, and advertises, a route to this client through its own VPN tunnel endpoint, and the rest of the LAN (non-peers) sends to the client via the server. My laptop and my cloud server operate this way.
The harder case is when the client sometimes uses the VPN, and sometimes doesn't, like my VPN tester and my cellphone. It's a total can of worms to set up a route via the server when the client connects, and to make this route go away when it disconnects, particularly when other LAN members need to originate connections to the VPN client, directly or via the server, depending. The way I'm handling this on the other VPNs is, the client's name is assigned to a fixed IP on its egress interface: Wi-Fi or Ethernet. Peers on the LAN connect to this non-VPN address, except that isn't possible if the cellphone is roaming (because peers don't know the cellular assigned address and I'm not going to mess with dynamic DNS on my cellphone). When the client turns on the VPN, it puts a separate IP address on the VPN endpoint, and the server has a permanent route to this address (or pool) via its own VPN endpoint, which it advertises all the time. Other LAN hosts can originate connections to the client's VPN address, but only when the client and its peer have the VPN turned on.
Let's make this design into something a little more concrete that I can turn into a WireGuard conf file.
At the moment this design is evolving and this section might differ in some points from the previous one.
There are two potential VPN servers: Jacinth, the main router, and Surya, the cloud server. At present Jacinth is the only one used, and Surya is in a standby or hot-spare configuration, acting like a client. When we change locations, Jacinth will be packed up and Surya will become the VPN server that the "always VPN" clients use.
At present, the possibility of VPN connections is configured statically: you have to edit configuration files to turn the VPN on and off between a host and the VPN server. This implies that sometimes "VPN" is a misnomer; all hosts (except the "never VPN" category, e.g. the Roomba cleaning robot) are always able to send encrypted packets to the server. Whether they use this opportunity is another matter.
The three "always VPN" hosts have small differences in their configurations. Xena, the laptop, has its own address on its internal subnet, and its wild-side interface has whatever address it gets from the net it's connected to: preassigned on the LAN, but not predictable when roaming, and I'm not going to mess with dynamic DNS for this address. Xena sends all CouchNet traffic (except to its own VM) through the VPN to the server (and everything else via whatever DHCP default route). Since Xena's address is not on the LAN, clients on the LAN wishing to send to Xena will go via their default route, which is Jacinth, which has a route to reach Xena's internal subnet via the VPN. Jacinth's WireGuard sends bearer packets from Jacinth to whatever IP Xena is currently sending from (wild side or local LAN), and Xena's WireGuard sends bearer packets to whatever Jacinth is using; its wild-side address occasionally changes because we don't have a fixed IP for it, and DNS is updated dynamically, but there's a short period while the TTL times out, and the WireGuard kernel module doesn't consult DNS anyway.
Surya, the cloud server, has its own address, not on the LAN, on the VPN interface. Like Xena, it sends all CouchNet traffic through the VPN, and everything else out its wild side. With OpenVPN as the VPN, Jacinth acts as a client and initiates the connection. With WireGuard, Jacinth's IP is aleatory (from the carrier's DHCP), so Surya can't initiate to it; Jacinth will send the first packet to Surya's fixed IP. After that, dynamic DNS is used to update the published IP of Jacinth.
Selen, the cellphone, puts its LAN address on its VPN interface, while the wild side address is from the carrier's (or CouchNet's) DHCP, so when it's not roaming, both addresses would be equal. Like the others, CouchNet traffic goes over the VPN and everything else goes as plaintext (or intrinsic crypto for HTTPS, SSH, etc) out the wild-side interface. CouchNet DHCP pushes a special route so LAN clients will send to Selen via Jacinth.
Rarely, Xena and Selen need to send wild-side traffic through the server. This will be handled by swapping to a configuration that substitutes the VPN as the client's default route. A pair of split /1 routes overrides the real default without replacing it, so turning the tunnel off removes them automatically: route add 0.0.0.0/1 dev wg0, and similarly for 128.0.0.0/1, ::/1, 8000::/1.
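Spelled out with iproute2, the split default looks like this (a sketch; wg0 is the assumed interface name and the commands need root):

```shell
# Two /1 routes per address family cover the whole address space and are
# more specific than the real default route, which stays in place untouched.
ip -4 route add 0.0.0.0/1   dev wg0
ip -4 route add 128.0.0.0/1 dev wg0
ip -6 route add ::/1        dev wg0
ip -6 route add 8000::/1    dev wg0
# "wg-quick down wg0" (or deleting wg0) removes all four automatically.
```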
So what happens to the "sometimes VPN" clients? All are on the LAN. Their own hostname resolves to a LAN address that goes on their LAN interface, and peers can always initiate a connection to it, using their prefix route to the LAN, or off-LAN peers will transmit via their generic route back to the LAN. The "sometimes VPN" client also has an address and name in a separate subnet, which goes on its WireGuard interface. Bearer packets are exchanged with the server via the LAN interface and address; the server always has a Peer configuration for every host. When there's a special task that needs the VPN, like a VPN test, the client adds a route to its peer (or subnet) via its WireGuard interface; the packets will go to the VPN server, and the server will forward them to the peer, not using WireGuard unless the peer is in the "always VPN" category (and conversely for the replies).
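The "special task" step can be as small as one route each way (a sketch; the peer address is an illustrative pick from my WireGuard range, and the commands need root):

```shell
ip route add 192.9.200.117/32 dev wg0   # send this peer's traffic via the tunnel
# ... run the VPN test or other special task ...
ip route del 192.9.200.117/32 dev wg0   # back to direct LAN delivery
```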
----
*The clients' WireGuard interfaces have fixed IPs in a range which is disjoint from the local LAN. That is, my assigned address ranges (IPv4+6) have one subnet to be the local LAN, and other subnets for the various VPNs to be on. The clients also have fixed IPs in a different subnet for non-VPN traffic like bearer packets. If the client is roaming it will send bearer packets from a wild-side address assigned by the carrier, which changes when the phone changes protocol (LTE, UMTS, Edge).
*If the client uses WireGuard all the time, LAN peers send to its fixed address on WireGuard, because that's what its hostname resolves to. If the client uses WireGuard some of the time, peers send to its non-VPN (ethernet or Wi-Fi) fixed address, which the hostname resolves to.
*The main router, called Jacinth, has the IPv4 default route to the wild side and the tunnel for IPv6 (no native IPv6 yet on Verizon or Frontier FIOS, hiss, boo), and the server instances of all the VPNs.
*The rest of the local LAN hosts use Jacinth as their default route; thus if they need to send a packet to a VPN client's endpoint address, they automatically send it via Jacinth. While some LAN hosts have static routes, Jacinth's DHCP and radvd announce a default route (IPv4+6) through Jacinth. This is the actual implementation of the "route advertisements" mentioned earlier.
*Jacinth has AllowedIPs and matching routes (courtesy of wg-quick) that send and accept traffic to/from each client's WireGuard fixed IP. This is set up at boot time when WireGuard is started, and is supposed to continue forever. If the client has a subnet inside, e.g. my laptop and its VM, the subnet is configured as a special case.
*The client at the very least needs an AllowedIP and route to the server's WireGuard endpoint address. Some clients will want to send their default routes through the tunnel, but in my use case it turns out that the cloud server, the laptop and the cellphone want traffic to the local LAN (plus other VPN clients and the server) to go through the tunnel, but wild-side traffic should go via their default route on the wild side.
*Bearer packets, i.e. the encrypted VPN payloads, need their own route, because they can't go through the tunnel that they are bearing. OpenVPN creates a special route sending traffic destined for the server (i.e. bearer packets) directly there. However, other traffic like SSH and HTTP(S) also goes direct, causing endless customer support issues and information leakage (for the insecure protocols). For WireGuard, if the policy routing table is configured, and if the default route is sent through the tunnel, wg-quick can create a policy route that diverts only the outgoing bearer packets to that table, into which the original default route is transplanted.
*However, that's not quite what I'm doing; my clients don't send the default route through the tunnel except for special hacks. First I save the route that the bearer packets would take before WireGuard messes with routes. After wg-quick sets up the routes, I re-determine the route of the bearer packets, and if they are being sent down the tunnel I divert them back to the original route, copying the method that wg-quick would have used if the default route had been sent down the tunnel.
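A sketch of that save/re-pin sequence, using the documentation address 203.0.113.5 to stand in for the server's endpoint (the interface name, the single-gateway assumption, and the missing error handling are all simplifications of what my wrapper does):

```shell
# Before wg-quick changes routing, note the next hop for bearer packets:
gw=$(ip route get 203.0.113.5 | sed -n 's/.* via \([^ ]*\).*/\1/p')
wg-quick up wg0
# If bearer packets would now be routed into the tunnel they carry,
# pin them back to the original next hop with a host route:
if ip route get 203.0.113.5 | grep -q 'dev wg0'; then
    ip route add 203.0.113.5/32 via "$gw"
fi
```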
The server can statically set up the policy routes to send via normal routing any bearer packets addressed to the client's local IP. The right way to do this is inside-out, i.e. everything that's addressed to the client's IP and isn't a bearer packet is flipped into the policy routing alternate table, which routes such packets down the tunnel. I believe that one WireGuard interface can distinguish packets addressed to members of a set of clients and can send them to the correct one. Bearer packets are not flipped in and are routed as if WireGuard were not operating; they would be sent out to the wild side if the client has connected from the wild side, i.e. is roaming. WireGuard on the server will accept a connection from any IP as long as a known public key is presented, and as long as the packets get through the firewall (normally promiscuous on a server).
There are several detail issues that bit me:
If the client is configured with the alphabetic hostname of the server, wg-quick will resolve that name and will prefer the IPv6 address. But the client, if roaming, probably can't connect on IPv6. Cure #1: use the server's fixed IP4 address. But my server is residential and its wild-side address is aleatory, though it lasts several weeks before changing. Cure #2: a wrapper script will resolve the endpoint name the way I want and insert the IPv4 address in the configuration file, much like wg-quick removes its commands when doing wg setconf.
When the client is actually not roaming, but even so sends bearer packets to the server's wild side address, WireGuard replies to them with the wild side address as the source (correct), but uses that source to determine the interface to send from, which is on the wild side, where the client isn't. Other daemons use normal routing to send from the interface that can reach the client. Actually when the client's IPv6 address is replied to, the packets are not lost, but are sent to the cloud server, which sends them back on the segment tunnel (taking 30 msec for the round trip), whereupon normal routing on the server gets them to the client. But this kind of routing is not possible with IPv4.
Cure #3: the wrapper will have to switch to the server's LAN address when the client is not roaming, and WireGuard will have to be restarted when it changes between home and roaming.
When the wrapper or wg-quick does DNS domain name resolution, the client needs a non-VPN address and interface that the server (or other DNS source) can send to without involving the WireGuard tunnel that hasn't been established yet.
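A sketch of the wrapper's name-resolution step (cure #2): resolve the server's name to IPv4 only, then patch the Endpoint line before handing the conf to wg-quick. Here "localhost" stands in for the server's name, the two-line conf is a sample, and the port 4296 and file names are my choices:

```shell
# Build a sample conf fragment with a placeholder endpoint.
printf '[Peer]\nEndpoint = placeholder:0\n' > /tmp/wg0.conf.in
# Resolve IPv4 only, taking the first address returned.
addr=$(getent ahostsv4 localhost | awk '{print $1; exit}')
# Substitute the resolved address into the Endpoint line.
sed "s|^Endpoint *=.*|Endpoint = ${addr}:4296|" /tmp/wg0.conf.in > /tmp/wg0.conf
cat /tmp/wg0.conf
```

The real wrapper would resolve the server's actual name and write into /etc/wireguard, restarting WireGuard when the best endpoint changes.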
Conclusion after a fair amount of testing: If WireGuard is up on both Jacinth and the client (Oso, on the LAN), and if the client configures Jacinth's wild side address (IPv4) as the endpoint, then communication with osowg (the WireGuard addresses, IPv4+6) from LAN clients including Jacinth is perfect with no dropped packets. Jacinth is sending bearer packets from its LAN (not wild) address, and the client updates its peer endpoint to this address, as the docs describe. I previously tested setting Oso's peer endpoint to Jacinth's IPv6 wild address and got some weird behavior involving sending bearer packets to the wild side; I need to check if this is still happening. Communication tests included ping, ssh and w3m (web), IPv4+6. I have SSHFP set up for Oso, but osowg is not considered to be the same host, and needs either a known_hosts entry or its own SSHFP records.
For the symmetric cipher on the main channel, WireGuard uses only ChaCha20Poly1305, for which hardware acceleration is very rare. On the Intel Core® i5-10210U, jimc's tests score it as half as fast as hardware accelerated AES-256 (Rijndael), and twice as fast as software AES-256. This difference would only be significant for a server with thousands of clients.
https://www.wireguard.com/quickstart/
ip link add dev wg0 type wireguard         # pick a name for the tunnel device
ip address add dev wg0 192.168.2.1/24 [ peer 192.168.2.2 ]   # peer form if only 1 peer
wg setconf wg0 myconfig.conf               # the wg utility is provided
  --or--
wg set wg0 listen-port 51820 private-key /path/to/private-key peer $itsname \
    allowed-ips 192.168.88.0/24 endpoint 209.202.254.14:8172
ip link set up dev wg0
wg                               # with no args, equivalent to "wg show" (all interfaces e.g. wg0)
wg-quick [up|down|etc] ctlfile
WireGuard wants Curve25519 (elliptic curve Diffie-Hellman) private and public keys; each is 256 bits (32 bytes) long, or 44 bytes base64 encoded (43 data characters plus a trailing =). The configuration file may contain the base64 key itself, or the name of a file containing it. The provided wg utility can generate them for you, like this:
wg genkey | tee privatekey | wg pubkey > publickey
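A quick sanity check on those sizes; this isn't WireGuard-specific, since any 32 random bytes have the same base64 shape:

```shell
# 32 bytes base64-encode to 44 characters: 43 carrying data plus one '='.
key=$(head -c 32 /dev/urandom | base64)
echo "${#key}"    # prints 44
```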
WireGuard does not use X.509 certificates to authenticate/authorize the peers; authorized keys are preinstalled for each client-server pair. But they can be installed on the fly by wg.
You may test with their demo server.
So let's try to set something up. For testing, I'm starting this at 2021-10-07 18:00. I'm going to use these basic steps:
Make sure there's a client for Android. Install it first but don't try to use it yet. Yes there is one, called WireGuard, with the serpent logo (®). Inception 2019-10-13, most recent update 11 days ago, 5e5 downloads, offered by "WireGuard Development Team". You could import a configuration from a file, or a QR code (!), or create it by hand. I looked at the required info but didn't create my connection. 7 mins including reading the product info.
https://wiki.archlinux.org/title/WireGuard
How to get the QR code that the Android client can import. This is from the Arch Linux wiki article about WireGuard.
On the Linux desktop host that has the conf file:
qrencode -o outfile -t ansiutf8 -r client.conf
If you omit -o outfile or specify -o -, the result is on standard output, and if this is a terminal that can display ANSI UTF-8 characters (see the -t option), the QR code itself becomes visible. You may need to make the window wider and/or higher to avoid wrapping lines. Suppress long comments; the maximum size is 4000 characters. qrencode is from package qrencode on OpenSuSE Tumbleweed.
The required kernel module is called wireguard.ko and it is in the standard kernel, version 5.14.11 and likely quite a bit earlier. To pass configuration information to it (plus displaying connection info and generating keys) you need wireguard-tools (current version as this is written is 1.0.20210914) from the OpenSuSE Tumbleweed main distro. Older versions are available for Leap 15.3 and 15.2. 72Kb to download, 145Kb installed. No dependent packages; it only requires systemd and libc. The package only contains the wg and wg-quick commands, and documentation.
wg-quick is a wrapper around wg for simple configurations. When either command is given just an interface name such as wg0, the corresponding configuration file is sought in /etc/wireguard/wg0.conf, whereas if an absolute pathname is given the interface is inferred from the basename of the conf file. The interface name may be up to 15 bytes of [a-zA-Z0-9_=+.-]. (You don't specify the interface name inside the conf file.)
On Xena I also installed NetworkManager-wireguard plus NetworkManager-wireguard-gnome (you need both for the GUI). These are "experimental" packages, not in the main distro. Find them with the SuSE package searcher. Depends on wireguard-tools. Most likely you don't have the developer's package signing public key; either get it, or ignore Zypper's security warning. About 20min to install the packages and read the man pages.
A prerequisite is, what port am I going to use? WireGuard doesn't have an IANA port assignment, but documentation, forum posts and howtos usually show 51820. But this port range (everything above 32768) is for aleatory (ephemeral) ports, and a collision could occur. The BSD Daemon whispered in my ear that since OpenVPN has 1197 assigned, WireGuard should use xx96. Unassigned and stealable port numbers are 2196 4196 4296 4496 4696 4796 4896 4996 5096 and most candidates above this. 42xx is completely vacant and appears to be intended for private use, and I have a local policy to put nonstandard ports in this range, so 4296 is what I will use. I will need to set my firewall to pass 4296/udp in the same cases as it passes 1197/udp.
On the other hand, for the initial tests (that might fail) I don't want to mess with the firewall, so I'll use 4886, the unofficial wakeup port for Android, which my firewall passes from+to the local LAN so the Android hosts can wake each other up.
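The eventual firewall change for the permanent port might look like this (a sketch assuming iptables; "eth1" as the wild-side interface is a placeholder, and a firewall generator would normally emit these):

```shell
iptables  -A INPUT -i eth1 -p udp --dport 4296 -j ACCEPT
ip6tables -A INPUT -i eth1 -p udp --dport 4296 -j ACCEPT
```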
Here is the client's configuration file for testing. See the genkey subcommand of wg for producing your keys. The conf file contains your private key (not encrypted), so it should have appropriately restrictive permissions, mode 600. /etc/wireguard is installed with mode 700, but I set the individual conf files to 600 anyway. See the man page for wg for a small number of additional configurable parameters such as the keepalive interval, if your net needs it.
[Interface]
PrivateKey = qwerty...=     # 43 base64 bytes, about 256 bits.  Keep the =.
ListenPort = 4886           # Android wakeup port, which my firewall
                            # allows, but I'll have to change this later.

[Peer]
PublicKey = asdfgh...=      # 43 base64 bytes, about 256 bits.
Endpoint = [2600:3c01:e000:306::8:1]:4886     # IPv6 in [], port after colon
AllowedIPs = 147.75.79.213/32,2604:1380:1:4d00::5/128   # www.zx2c4.com.
# There can be multiple peers.
About 25min + to figure out the conf file.
Starting about 16:10
The SuSE package wireguard-tools does not include the scripts mentioned in the quick start guide for contacting the demo server.
When wg is used to bring up the connection, it loads the wireguard kernel module, nine crypto modules (that the documentation says it actually uses), udp_tunnel and ip6_udp_tunnel.
Debugging Petra's networking took extra time, but once I switched to test on Xena it took about 10 minutes to turn on WireGuard and do the tests.
I repeated these steps on Surya. The two test activities succeeded.
Given how my VPN tester is designed, it's a whole lot easier if every host has WireGuard installed, specifically wireguard-tools. Doing that now.
OpenVPN and StrongS/WAN assign the client an IP address from a pool, similar to DHCP. But my tunnels are very predictable, so instead I'm pre-assigning IPs to potential WireGuard participants, all on the same subnet: new address ranges for WireGuard tunnel endpoints, 192.9.200.112/28 (16 addresses) and 2600:3c01:e000:306::9:0/112. The addresses are assigned according to a pattern, but most likely I will get them into /etc/hosts soon.
Each host gets a key pair and a generic conf file with Jacinth as its peer (server) (except Jacinth itself).
This turned into a long and time-consuming learning experience. I'm condensing a lot of failures and listing the high points:
The Quick Start guide is written for a client using wg-quick to control the interface. To the sample conf file shown under Configuration Files I added an Address line (just on Surya, for now); the value is a comma separated list of the IPv4 and IPv6 addresses to be assigned to Surya's wg0.
Starting up first on Surya: wg-quick up wg0
It prints the commands it is executing.
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63   # It's feeding wg0.conf minus Address etc.
[#] ip -4 address add 192.9.200.118 dev wg0
[#] ip -6 address add 2600:3c01:e000:306::9:8 dev wg0
[#] ip link set mtu 1420 up dev wg0
[#] ip -6 route add 2600:3c01:e000:306::7:0/112 dev wg0
[#] ip -4 route add 192.9.200.176/29 dev wg0
I captured them into a script wg0.up. The routes to Xena (the last 2 lines) needed a lower metric than the existing ones via Jacinth, reached through the OpenVPN segment tunnel.
I did analogous setup at Xena's end.
Now here's a nasty issue which I didn't solve in this step: this all looks fine assuming Xena and Surya have their WireGuard connections running. But suppose Xena is connected somewhere else, like Jacinth where it's supposed to be? How does Surya know to not route via wg0?
Starting the tests: no communication. For several days. Payload packets departed from Xena; bearer packets left Xena and arrived on Surya; payload packets were decrypted on Surya and were emitted from wg0; and they weren't answered. The reason was that the firewall needed to be told that wg0 was a tunnel with security implications similar to being on the local LAN, not a minion of the global hacking community. The payloads were reported by tcpdump, then hit the iptables rules in the firewall, and were tossed. With that fixed, I was able to ping in both directions between Xena and Surya.
This confirms my reading of the man page for wg: for each peer (here Xena), AllowedIPs is a list of subnets. Packets from that peer whose source address is in that range are allowed by Surya's WireGuard to emerge from wg0, and packets on Surya routed to wg0 because their destination address is in that range will be internally routed to that peer and not to some other peer connected at the same time. This is similar to an iroute in OpenVPN.
This interpretation implies that the IP that the peer is using on its wg0 has to be inside the AllowedIPs address range, and the IPs of the other peers have to be outside. If there's a subnet that the peer expects to route through the tunnel, as on Xena, it has to be in AllowedIPs. If other hosts on the local LAN expect to connect to this peer, they need to use an address in AllowedIPs and they need to route the traffic via the server (Surya).
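To make that concrete, here is what a hypothetical Peer stanza for Xena on Surya would contain under this interpretation. The /32 host address is invented for illustration; the /29 is Xena's subnet from the routes above.

```
[Peer]
PublicKey = xenaKey...=
# Xena's own wg0 address (a host route) plus the subnet behind Xena:
AllowedIPs = 192.9.200.117/32, 192.9.200.176/29
```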
It remains to be seen whether two of Surya's peers connected at
the same time can talk to each other, and where the hairpin
routing
occurs: before or after emission from wg0. Both OpenVPN
and IPSec can do this.
Documentation for a robotics class at the high school level. The organization is the FRC 3512 Software Team, based in Orcutt, California, USA, a little north of Vandenberg Air Force Base and the Diablo Canyon nuclear power plant. (Author and date are not obvious.) They show the WireGuard configuration file that the students are supposed to use on their at-home clients. Under [Interface] the Address (for wg-quick) is assigned uniquely per student. Under [Peer] the endpoint has an alphabetic hostname (and numeric port). The AllowedIPs are the address range of the student clients (I think the key occupant is the gateway that lets them off that subnet), and the subnet that contains the servers, VMs, etc. that they're supposed to learn to use. According to them, when the client reports "required key not available", it means that you sent down the tunnel a packet to an address that the peer's AllowedIPs did not include, which the peer reported by an ICMP packet coded for "Destination Host Unreachable" (which is not a lie).
The key lesson for me is: at each end, AllowedIPs (describing the peer, the other end) has to include the address(es) on the peer's tunnel device, from which outgoing traffic is sent; otherwise this end will reject traffic from the peer.
I wrote a script to generate conf files and "up" scripts on each host. It follows the design plans for the special features on particular hosts. This way, issues are not forgotten and chewed-up configurations can be regenerated at will. All hosts now have their proper keys, configurations and "up" scripts.
Petra to Jacinth: no response.
Claude to Jacinth: Routes: 192/26 dev en0; 128/25 dev wg0;
to Surya, pings to $pfx::8:2 are answered but not to $pfx::8:1
Xena to Claude: IPv6 only. Ditto Surya
Jacinth + Iris to Claude: pings IPv4+6
Can't tell if offsite connections are dnatted to Claude via WG or vnet0.
Holly to Jacinth: pinging claude diamond iris jacinth via main LAN: works
pinging petra xena surya via WireGuard: no answer.
xena->holly trcr -6: ov_u_j.cft.ca.us (1:1), holly (i.e. via WG)
xena->holly trcr -4: ov_u_j.cft.ca.us (129), nothing thereafter.
IPv4 on Jacinth sends this via br0.
Got to implement "if client is using WG, route to it; if not, route via br0".
Method 1: every bearer packet on the WG port of type 1 (content inspection)
is cloned with mirred to some netlink socket.
I have two types of clients: those that always use WireGuard, and those
that sometimes use WireGuard. To deal with routing issues, Xena <->
Jacinth and Jacinth <-> Surya always need the VPN, whereas Selen
(Android) uses it only when roaming (and when access to the local LAN is
wanted). The latter scenario is the natural one for OpenVPN and IPSec, so I've
been focused on that so far, but making it work is going to be hard with
WireGuard, so I've decided to switch over to the "always on" paradigm, at least at first. Xena, Jacinth and Surya are the most important hosts on my
net, and it's not acceptable to knock them out with VPN experiments. Among my
other VM's, Claude (the webserver) is also mission-critical, and Petra is
hosted on Xena and is affected by its networking. So to get this project
moving, I revived a disused VM called Oso, hosted on Iris (a leaf node) with
bridge networking, so it is effectively an independent leaf node.
For the first try I'm going to have, for each client, an individual interface (wg-$PEER) with individual addresses from 192.9.200.96/28 and 2600:3c01:e000:306::10:0/112. Later I'll try doing the tunnels on a shared interface like I originally planned.
For the first try on Oso I set up Oso with AllowedIPs = 192.9.200.106, 2600:3c01:e000:306::10:10 (just Jacinth's WireGuard interface addresses for Oso), and Jacinth had AllowedIPs = 192.9.200.122, 2600:3c01:e000:306::9:10 (Oso's WireGuard interface addresses). Oso's firewall was rejecting bearer packets on 4296/udp. With that fixed, I could ping the peer's interface addresses, both families, both directions.
Next try is to add Oso's own addresses to AllowedIPs on Jacinth, and just Xena's subnet on Oso. For reconfiguring I'm going to take down WireGuard on both ends first, rather than trying to run wg-quick with a running configuration, since I'm expecting trouble on this one. Yes, Jacinth and Oso can't ping each other, because Jacinth tries to send the bearer packets to Oso via the tunnel that they're bearing. wg-quick has a limited ability to activate policy routing for the bearer packets, but this configuration is not recognized as needing it.
Next try: Jacinth AllowedIPs = Oso WG addresses + 2600:3c01:e000:306::d4/128 (Oso's own IP); Oso is unchanged with Jacinth's WG addresses + Xena subnet. Jacinth can ping all the Oso AllowedIPs mentioned; so can Oso. Xena and Petra can ping Oso's IPv4+6 WG address, but Xena needs to specify its public IP in the -I option of ping (source address) because that's what's in the AllowedIPs on Oso, vs. the endpoint of Xena's tunnel to Jacinth. For traceroute this would be the -s option.
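The commands for that test look roughly like this. Oso's WireGuard IPv4 address (192.9.200.122) is from the earlier try; Xena's public address here is a placeholder.

```shell
# Xena pings Oso's WireGuard address, forcing the source address (-I)
# to Xena's public, non-WireGuard IP (placeholder .4), since that is
# what's in Oso's AllowedIPs:
ping -c 3 -I 192.9.200.4 192.9.200.122
# traceroute's equivalent of ping's -I is -s:
traceroute -s 192.9.200.4 192.9.200.122
```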
Next try: a script that implements the WireGuard Evolution item for bearer packets down the tunnel. Trying it first on Oso. It works, but didn't solve my problems.
Here are the key principles that I finally worked out, for making a configuration file that gets the packets through.
All participating hosts need a fixed IP address (IPv4+6) that will go on the WireGuard interface, which the other end has to designate as AllowedIPs, by number. This is the Address parameter in the Interface section. On hosts that always use WireGuard, the host's own name will normally resolve to this number.
All participating hosts need another fixed IP which does not go through the VPN, to which bearer packets will be addressed, or which will be the source address of outgoing bearer packets. A host that often omits the VPN should use this number as the referent of its own hostname. The other end will configure this number as the peer's Endpoint. If a host, e.g. the server, never initiates a connection to this peer, specifying the Endpoint is optional and does not have to be accurate, e.g. if the client is roaming, it's impossible for the server to know in advance the client's IP.
I use port 4296 as the ListenPort and Endpoint port on all hosts, to simplify maintenance of the configuration files, and debugging. This should be changed to the official IANA assignment, if one ever materializes.
In addition, the Interface section needs the PrivateKey of this host as a string (44 bytes including ending padding with one '='). The Peer section needs the peer's PublicKey. The configuration file needs to not be publicly readable, to protect the private key. (I wish the private and public keys could be read out of files with mode 600, obviating the restrictive permissions on the conf file, but though this may have been supported in the past, it's not supported now (wireguard-tools-1.0.20210914).)
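The key-generation commands from wg(8) fit here, together with a quick sanity check on the encoding. The genkey line is standard wg usage; it's guarded so the sketch runs even on a machine where wg isn't installed.

```shell
# Standard wg(8) key generation; umask 077 keeps the private key file
# unreadable by others (guarded in case wg isn't installed here):
umask 077
command -v wg >/dev/null && wg genkey | tee privatekey | wg pubkey > publickey

# Sanity check on the format: a Curve25519 key is 32 random bytes,
# and base64 encodes 32 bytes as 44 characters ending in one '=':
key=$(head -c 32 /dev/urandom | base64)
echo "${#key}"    # prints 44
```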
The server's Peer section needs AllowedIPs for the peer's WireGuard address(es), by number. Omit the CIDR bits; this is a host route. If the client is routing for a subnet (like Xena which has a VM, Petra), the server needs to allow the subnet also (with CIDR bits). The non-VPN address on the client must not be an AllowedIP because otherwise bearer packets would be sent down the tunnel that they are bearing; this caused me endless grief.
In the client's Peer section I put AllowedIPs for my LAN address range, including other VPNs' endpoints but excluding WireGuard addresses. This actually worked and didn't need policy routing; bearer packets did not go down the tunnel. Many people's use case involves AllowedIPs for the default route 0.0.0.0/0 and ::/0 but that's not what I'm doing.
Using the newly installed NetworkManager plugin for WireGuard. Get Xena back on the net.
What's required by the WireGuard plugin:
wg2jacinth. In theory in the future I might have a connection to some other service.
Let's think about a design that will apply to all hosts. As set up now on Xena only, /etc/NetworkManager/dispatcher.d/ includes my script that starts OpenVPN. All these scripts are run whenever interfaces go up or down, and the scripts decide if they need to do something about it. This script would have to change to bring up WireGuard. Let's concentrate on a design for the other hosts, then adapt it to work with NetworkManager.
There will be a systemd service that brings up WireGuard, using different configuration files for leaf nodes, for Xena, and for servers (Jacinth and Surya). Likely the service unit itself can be the same for all.
The WireGuard unit will start after network.target, same as OpenVPN and Strongswan.
As part of startup or reload, the configuration file will be
regenerated, if any IP addresses have changed. Jacinth's wild side
IPv4 address is aleatory (DHCP) and changes every few weeks. (Some
ISPs change the IP daily.) The rest never
change.
For leaf nodes (and the others are similar), the configuration will have:
Xena is like a generic leaf node, with these exceptions:
Its peer endpoint is Jacinth's IPv4 wild side, for use when roaming, and Jacinth's local IPv6, which is accessible from the wild side if the firewall permits, which it does. It either doesn't have Surya at all, or it has some route jiggering so Surya is never used as the server.
On Xena, the server peer stanza has AllowedIPs (and corresponding outgoing routes) to send the whole CouchNet address range through the tunnel, whether or not it is roaming. The server peer endpoints are outside the CouchNet address range, so policy routing is not needed to keep bearer packets out of the tunnel.
IPv6 traffic (payloads) from the wild side to Xena's WireGuard address or to its internal subnet (which is the one normally used) is routed via Linode to Surya and from there through the segment tunnel to Jacinth which routes it via the tunnel to Xena. IPv4 payloads (only from the local LAN) go direct to Jacinth which routes them to Xena. Bearer packets are addressed from and to Xena's non-WireGuard address (or if roaming, from the address assigned by the foreign carrier), and from the wild side they similarly go via Jacinth. The firewalls on all involved hosts pass VPN ports including WireGuard.
It's very rare for Xena or Selen roaming on the wild side to be able to use IPv6.
The two servers are configured like leaf nodes, except that they have all the leaf nodes, plus each other, as peers.
Ports open to the wild side:
normalports.
WireGuard is recognized in /etc/firewallJ.d/wild-acc-C5.
What files and scripts do we need to make this all work?
Most of these will be in /etc/wireguard/.
The WireGuard service unit itself, on all hosts, in /etc/systemd/system/.
A table of Peer stanzas, to be selectively included in the configuration file, wg0.conf. It will be hosted on Jacinth. Retrieval won't be fancy; it will reside in /home/httpd/htdocs and all hosts will retrieve it with curl.
The program that creates that table, on Jacinth. It will monitor the Netlink stream and is immediately aware when Jacinth's wild IPv4 address changes. It will send a magic packet to all clients. It doesn't send the content itself, which would be a security problem; spurious magic packets would cause trivial extra work on the client but could not introduce a tampered peer table.
The socket activated service on the client which imports the peer table, generates the new conf file, and reloads WireGuard if actually changed. It should also be run at boot time (before WireGuard starts) and on a timer, in case the magic packet gets lost.
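A sketch of what that import service could run. The URL, table name, and the generate-wg0-conf helper are assumptions, not the actual implementation; the real pieces here are cmp, wg syncconf, and wg-quick strip, all of which exist as described.

```shell
#!/bin/bash
# Hypothetical peer-table refresh: fetch, regenerate, reload if changed.
set -e
cd /etc/wireguard
curl -fsS http://jacinth/peers.table -o peers.new
# generate-wg0-conf is an assumed local script that selects this
# host's Peer stanzas out of the table:
generate-wg0-conf peers.new > wg0.conf.new
if ! cmp -s wg0.conf wg0.conf.new ; then
    mv wg0.conf.new wg0.conf
    # syncconf updates the running interface without tearing it down;
    # wg-quick strip removes the lines wg(8) doesn't understand:
    wg syncconf wg0 <(wg-quick strip wg0)
fi
```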
WireGuard has a big problem if the client sometimes has WireGuard running, and sometimes expects to be contacted on the local LAN. Think of the backup host collecting changed files from the client. Basically, the sysop needs to direct the backup collector to the client's WireGuard or non-WireGuard address, depending on whether or not it's off-site and using the VPN.
This is the tunnel from Jacinth to Surya, currently operated by OpenVPN. Jacinth originates it and Surya acts as a listening server or responder. But if bearer packets go down the tunnel this is a chicken and egg issue and you end up with an omelet. So OpenVPN has an anti-iroute so any packet on the initiator's end addressed to the interface used by the OpenVPN listener will go out by normal routing, not the tunnel, and Surya's firewall will reject it. The anti-iroute is not restricted to OpenVPN's port number; all ports are blocked, such as traffic to the webserver listening on Surya's wild side (www.jfcarter.net).
The WireGuard deployment campaign in 2021 got preempted, but an issue has arisen which returns WireGuard to the front burner. Specifically, my family is moving to Washington state to be closer to our son, and the master site Jacinth is going to be in a packing box for an unknown time. But Surya, our cloud server, won't be in any packing box. Therefore I'm going to transplant a lot of the server software, specifically our public webserver and site, to Surya. But before and after the move when the local LAN is functioning, LAN hosts still need to get to the webserver and other services on Surya. (Yes, there are slave servers too.) I was solving this by creating an ipip or Geneve tunnel within the segment tunnel, but adding kludge upon kludge is not the way to go; the right solution is to finish the WireGuard deployment and to divert bearer packets off the segment tunnel and onto the default route.
Further design and details are below.
The design requirements are:
Every CouchNet host must be able to initiate a WireGuard tunnel
(bidirectional) to either of two listening responders (servers):
Jacinth or Surya. If possible, future "any to any" connections should not be blocked in the design.
Some clients with a WireGuard tunnel must be able to initiate through it a service connection (http, ssh, dns, etc.) to any LAN host, and LAN hosts must be able to initiate in the other direction. There is a coincidental correlation between roaming, the need for both outward and inward connectivity, and needing the tunnel to be up all the time.
Other clients only occasionally need the tunnel. They will initiate service connections, but incoming connections will be rare, and it's not a showstopper if I can't make them work.
There should be automation and app support so the client's tunnel turns on automatically, or if it's not supposed to be on all the time, so turning it on is simple for the user.
After a rather aggressive struggle to get WireGuard working, I learned these points:
I will need only one WireGuard interface per host, conventionally called wg0. It knows which outgoing packets should be sent to which peer. Multiple interfaces may be useful in complicated designs, but it looks to me that one interface will almost always be enough.
In the past I have thought of giving the host one IP to which peers send bearer packets, and another IP to which remote hosts (including the peer) send packets through the tunnel. A big mess ensues if you send bearer packets through the tunnel that they are bearing. However it's a lot simpler and more comprehensible if the same IP is used for bearer packets and for payload services like a webserver. In this case, policy routing (described further below) can keep out the bearer packets.
Each host with WireGuard needs a configuration file with a Peer
stanza for each other host that it can communicate with. The peer's
public key is required; this is the authentication and authorization
token that lets in the peer, and also the foundation of symmetric
encryption on the tunnel. The Endpoint (IP and port) is optional
for the responder, which sends reply packets to the IP and port that
the peer most recently used. But the Endpoint is required for the
initiator, which otherwise would have no idea what IP and port to send
the initial packet to.
Often the initiator and responder will have fixed IPs, e.g. when connecting multiple sites of one enterprise. But if the initiator is roaming, e.g. a cellphone, its IP address and default route will change whenever the infrastructure feels like it, particularly when changing cellular protocols like 5G, LTE, UMTS or EDGE. This is one of my major use cases.
The default route resides in the Main policy routing table. When it changes, the WireGuard infrastructure is not going to be able to detect the change and copy it into other policy routing tables.
If the responder expects to originate payload connections (e.g. http or ssh) to the initiator, then a roaming initiator should configure keepalive packets to announce to the responder which IP it has roamed to. This is PersistentKeepalive in the Peer stanza. A value of 25 (seconds) is generally recommended to deal with commercial NAT routers, and 25sec seems reasonable for the roaming case also.
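In the conf file this is one line in the roaming initiator's Peer stanza; the other values here are placeholders.

```
[Peer]
PublicKey = responderKey...=
Endpoint = vpn.example.org:4296
AllowedIPs = 192.9.200.0/24
PersistentKeepalive = 25    # seconds; survives NAT mapping timeouts
```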
Each Peer stanza needs one or more AllowedIPs assignments. The
value of each one is a comma (and whitespace) separated list of IP
subnets with CIDR bits, e.g. 192.168.3.0/24, fd18::/11. To get a
packet sent out, local routes need to direct these address ranges to
the WireGuard interface. The wg-quick
script picks the ranges
out of the configuration file and creates routes for you. Incoming and
outgoing packets are checked for being in the AllowedIPs ranges; if
not, they are dropped silently, preferably before being sent to the
other end. It's a best practice, and saves CPU work, for only allowed
packets to get routed to the WireGuard interface.
Inside the WireGuard driver, the AllowedIPs are sorted similarly to a normal routing table: the longest prefixes (most CIDR bits) are tried first, and the first match wins; the packet goes to the peer which provided that subnet. If identical AllowedIPs are in multiple Peer stanzas, it's an arbitrary choice which peer the packet goes to, so don't do that.
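You can watch this table directly; wg show has a subcommand for it:

```shell
# Print the kernel's AllowedIPs table for wg0: one line per peer,
# the peer's public key followed by its subnets.
wg show wg0 allowed-ips
```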
OpenVPN has a very similar concept called an iroute.
Almost invariably the peer's own IP is in one of the AllowedIPs ranges. And the local host is going to be sending bearer packets to the peer, which have a route to oblivion down the tunnel that they are bearing. Policy routing will rescue them. The scheme I worked out goes like this:
ip -4 route add 192.168.3.178/32 dev wg0 table 214
ip -4 rule add dport 4296 table main priority 11
ip -4 rule add to all table 214 priority 12
ip -6 rule add to all table 214 priority 13
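Assembled into one place, with the IPv6 route that the priority-13 rule presumes. All addresses are illustrative (2001:db8:: is a documentation prefix), and the IPv6 bearer-packet exception is my assumption, needed only if bearer packets ever travel over IPv6.

```shell
# Table 214 holds host routes to the peer through the tunnel:
ip -4 route add 192.168.3.178/32 dev wg0 table 214
ip -6 route add 2001:db8::178/128 dev wg0 table 214
# Bearer packets (destination port 4296) escape to the main table
# before table 214 is consulted:
ip -4 rule add dport 4296 table main priority 11
ip -6 rule add dport 4296 table main priority 11
# Everything else tries table 214 first:
ip -4 rule add to all table 214 priority 12
ip -6 rule add to all table 214 priority 13
```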
In the stock WireGuard implementation, if the responder is ever able to respond to an initiator's connection, and if it is ever going to initiate a payload connection (ssh, http, etc.) to that initiator, that connection will go over WireGuard, whether or not the initiator has WireGuard running at that moment. In other words, my scenario where some LAN hosts mostly don't encrypt is not going to work: it's going to be always WireGuard, or never WireGuard.
IPSec has an authentication, authorization and key establishment daemon called Charon on UDP ports 500 and 4500. It sends Traffic Selectors to the kernel that specify which endpoints' traffic should be encrypted. This kind of building block fits perfectly with WireGuard's method of operation, and I should be able to write one that's as simple as the rest of WireGuard. Following the underworld theme, I'll call it Anubis. Basic design points:
However, I need WireGuard to be working soon, well before I could commit to finishing Anubis. Therefore the all-or-none design will be deployed promptly: the responders will only be configured with the peers that will always use WireGuard to talk to them.
Translating these into implementation issues:
The real
hosts have addresses on the local LAN in
192.9.200.192/28 and 2600:3c01:e000:306::/112. Excluded hosts which
are still on the local LAN, e.g. appliances like the VoIP ATA, are in
192.9.200.208/28. The whole local LAN has 31 members and /27 bits.
wgtunnel@jacinth.service (or Surya) starts WireGuard in initiator mode, connecting to the selected server. It will create an interface called wg0, and will put this host's address on it, with noprefixroute. Another interface such as en0 or br0 also has the same address, with a prefix route. It will make routes to send all LAN traffic down wg0 but traffic to the wild side will still go via the default route. Except, there will be a policy routing rule and table to send bearer packets via the default route even in the common case that the destination is on the local LAN.
We'll need an option to send all traffic including wild side (except bearer packets) via the tunnel, but this will be rare. Once I was on vacation and wanted to buy an e-book, but my preferred vendor had rights for selling in the United States but not for the country I was in. So I set my default route via CouchNet: problem solved.
This traffic will be routed on Jacinth's end (client) of the segment tunnel: all IPv6, except not if originating from Jacinth's wild IPv6 interface; and Surya's wild and local IPv4 addresses.
wglisten.service starts WireGuard in listening (responder) mode on Jacinth and Surya. Surya's local name resolves to the address on this interface, but Jacinth (and potentially other servers) will have a separate address.
What's already written: this is all in /etc/wireguard and I'm describing what's currently on Jacinth.
try: executing a command line; initialize the WireGuard link; configure routes through the tunnel.
Fly in the ointment: I'm testing the WireGuard configuration on one responder and one initiator. The responder starts first. The initiator starts 2 seconds later. (Both have Peer stanzas with Endpoints.) The responder sends a bearer packet, probably key establishment. The initiator responds with ICMP "UDP port 4296 unreachable". The responder never sends another bearer packet, despite payload packets coming in to wg0, and the initiator never sends any bearer packet.
Investigation #1: Why didn't the Initiator get any payload packets routed to wg0? Because the AllowedIPs is to Jacinth's wild side and all the payloads go to Jacinth's LAN address.
Fix #1: Force the AllowedIPs to the LAN address. Program couldn't handle a FQDN for a peer. Now it can handle the FQDN. Functional test passed (curl to webserver, both directions).
The working endpoints are:
The StrongS/WAN IPSec suite includes a daemon called Charon, formerly
Pluto. The initiator starts a VPN-type connection by signalling their own
Charon to establish a Security Association with the peer's Charon
(authentication credential required) and to send to the peer connection
parameters like which address ranges should go over the tunnel. In OpenVPN the connection setup module isn't a separate daemon but it performs similar functions, including selecting affected traffic (its equivalent of AllowedIPs is called an "iroute").
WireGuard needs a similar gatekeeper which, following the underworld theme, I'm calling Anubis. Its functions are just about one to one equivalent to Charon's, but WireGuard has advantages in simplicity. Here are the basic design points:
Any plan involving modifications to WireGuard's kernel driver is not going to fly.
I want to get away from the paradigm that when peers are authorized, WireGuard always has them configured in the kernel and therefore always communicates with those peers over WireGuard. On my net two of the pairs are in this mode, but most of the associations should be available when needed, and should not be another opportunity for bugs to kill the net, if permanently activated despite not being needed.
Therefore the part-time initiator communicates with the responder's Anubis, authenticating, and both of them add or delete the peer's public key, endpoint and AllowedIPs on the kernel's list, plus routes, activating or turning off WireGuard.
Just like StrongS/WAN's Charon, WireGuard has two collections of Security Associations: generic VPN communication occurs only when activated, whereas (unlike StrongS/WAN) Anubis has a separate WireGuard interface with a permanently configured connection to every authorized peer. That won't scale to hundreds of thousands of peers, but it's something I can implement promptly for my much smaller net.
Or will it scale? How big is the per-peer data structure? Off the top of my head, assuming IPv6, I can think of: peer's public key (32 bytes), IP address (16), port (2), just one AllowedIP with CIDR bits (17), probably not our ephemeral private or public key (0), and the Diffie-Hellman shared secret (possibly further hashed) as the symmetric key (32), and 2 chain links (2x8), for a total structure size of 115 bytes, call it 128. For a million peers we're talking about 1.28e8 bytes. The smallest Raspberry Pi you can buy has 2.14e9 bytes RAM. So ignoring the minor detail of network bandwidth, you actually could serve a million VPN service users on your Raspberry Pi.
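The arithmetic, spelled out in shell just to double-check:

```shell
# 128 bytes per peer, one million peers:
echo $((128 * 1000000))              # prints 128000000, i.e. 1.28e8 bytes
# versus 2 GiB of Raspberry Pi RAM:
echo $((2 * 1024 * 1024 * 1024))     # prints 2147483648, i.e. 2.14e9 bytes
```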
Anubis has a fixed port for bearer packets, different from the production WireGuard, and the firewall needs to allow packets from any wild-side or internal IP to both these ports. Policy routes at both ends limit communication to only Anubis' service port; thus the availability of perpetual WireGuard service to this one port has no effect on generic WireGuard service, or the lack thereof when not wanted.
With this infrastructure handling authentication, authorization and security, all Anubis needs to do is this:
The initiator sends a packet saying "up" or "down", and some public key. To have keys, and therefore to send this packet, the initiator must know its own private key, so the responder is assured that the packet comes from some authorized peer, and there will be no problem when WireGuard cheaply drops random exploit attempts.
But the packet's IP address does not uniquely identify which connection should be brought up or down, because NAT, or network infrastructure for containers or virtual machines, could cause the same service IP and even port to be used for multiple WireGuard instances.
The initiator creates a nonce (meaningless random bits) and hashes it; the hash is included in the packet. The initiator encrypts the nonce with a symmetric key which is the initiator's private key times the responder's public key (its private key times the G factor), which the initiator already knows, and the result goes into the packet.
The responder decrypts the nonce with the public key in the packet (hoped to be the initiator's private key times the G factor) times the responder's private key: the Diffie-Hellman shared secret, which is the same at both ends because multiplication is commutative in modular rings, including the field of size 2^255 - 19. The responder hashes the decrypted nonce and compares with the hash in the packet. If they are equal, the responder knows that the packet came from the specific authorized peer using the public key in the packet.
The responder then can identify the production connection and bring it up or down. Mission accomplished.
A nice addition to the protocol would be a response with a success or error message.
An alert reader will have noticed that a Black Hat can steal the private key by dividing the public key by G. In the ring of integers we have an efficient algorithm to do that, long division, but not so in modular rings: the effort to do the division is similar to doing test decryptions using for the symmetric key each ring member from 2 to 2^128, the square root of the size of the modular ring. This effort level is considered sufficient to protect Top Secret data.
These are the categories of hosts:
For each host, its $host.conf file has Peer stanzas that list AllowedIPs which the WireGuard kernel module allows to emerge from wg0, the WireGuard interface, having originated from that peer or more distant hosts using the peer as a gateway. The startup script turns all the AllowedIPs into routes, so when the host sends traffic to the AllowedIPs via the peer, traffic goes via WireGuard. Except, a policy routing rule diverts bearer packets, recognized by their destination port, to leave via one of the host's non-WireGuard interfaces.
Jacinth (main router) and Surya (cloud server) have a server role: each one's $host.conf has a Peer stanza for each other host including each other. In the stanza, the peer's Endpoint is the one used for bearer packets: the one on the local LAN in most cases. Its AllowedIPs in most cases are just the peer's WireGuard address, the one on wg0 (the WireGuard interface).
Non-servers have a peer stanza to Jacinth, and most of them have AllowedIPs for a split default route via Jacinth. Whether they have corresponding routes is a more complicated story. Their hostnames resolve to their non-WireGuard interface addresses. Jacinth is going to be out of action for an unknown time, probably weeks long, and during that period the remaining functioning clients will switch to Surya for access to the local LAN, whose only member is Surya (providing directory services). Their default route will go through their wild side interface, not Surya.
Xena has an internal subnet with a VM, Petra, on it, as well as its own WireGuard address, and both servers include the whole subnet in Xena's AllowedIPs.
In surya.conf, the Peer stanza for Jacinth has AllowedIPs for the whole local LAN, the competitor VPN endpoint ranges, and Jacinth's wild-side IPv4+6 addresses.
Surya and Xena's names resolve to their WireGuard addresses. Jacinth's name resolves to its local LAN address. It has separate names for its wild-side interfaces and WireGuard.
Jacinth's ISP isn't capable (in 2024) of native IPv6, and Jacinth uses Surya as an IPv6 gateway. In jacinth.conf, the Peer stanza for Surya has AllowedIPs for all IPv6 addresses in a split default route: ::/1, 8000::/1. Firewall rules have a set of authorized ports on each host, specifically Jacinth, and we aren't relying only on AllowedIPs to keep out the global hacking community.
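The IPv6 split default described above might look like this in jacinth.conf (a sketch; the key and endpoint are placeholders):

```ini
# Hypothetical fragment of jacinth.conf; key and endpoint are placeholders.
[Peer]
PublicKey = <Surya's public key>
Endpoint = surya.example.com:51820
# ::/1 plus 8000::/1 together cover all of IPv6, but each /1 is more
# specific than a real default route (::/0), so these routes win over
# an existing default without having to replace it:
AllowedIPs = ::/1, 8000::/1
```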
Xena (the laptop) roams, but most of the time it's at home on the local LAN. To make the internal subnet function, it always uses the VPN, with Jacinth as its (only) peer. It has AllowedIPs and routes to accept the whole CouchNet address range over WireGuard, and its non-WireGuard interface has a split default route through Jacinth.
Selen is a cellphone running LineageOS (Android). It has three modes of operation: roaming with no VPN (it's cut off from CouchNet), roaming with VPN (routing is like Xena), and at home without the VPN (full service because it's on the local LAN). Its configuration when using the VPN is pretty much the same as Xena.
Jacinth's role on OpenVPN and IPSec is as a generic server: potentially a variety of clients could connect at the same time, authenticating with an X.509 certificate with an acceptable trust chain. This isn't going to fly with WireGuard, since the server has to know the client's public key before it can accept a connection from the client.
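The server can at least add and remove peers at runtime without restarting, so one could imagine a registration daemon installing a client's public key on demand. The commands below are real wg(8) syntax, though the key and address are placeholders:

```shell
# Add (or update) a peer on a running interface; placeholders throughout.
wg set wg0 peer 'BASE64_CLIENT_PUBLIC_KEY=' allowed-ips 10.1.1.7/32

# Later, revoke that client's access:
wg set wg0 peer 'BASE64_CLIENT_PUBLIC_KEY=' remove
```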
Brainwave: use tc's mirred egress mirror dev $IFB action, where $IFB is an Intermediate Functional Block (a synthetic interface) which the daemon can listen on. See man tc-mirred for documentation.
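A sketch of the tc plumbing that brainwave implies. Interface names are examples, and it assumes the ifb module and a clsact qdisc are available:

```shell
# Create an Intermediate Functional Block device for the daemon to watch.
ip link add ifb0 type ifb
ip link set ifb0 up

# Attach a clsact qdisc to the real interface and mirror its egress
# traffic to ifb0; see tc-mirred(8) and tc-matchall(8).
tc qdisc add dev eth0 clsact
tc filter add dev eth0 egress matchall \
    action mirred egress mirror dev ifb0
```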
WireGuard needs the equivalent of OpenVPN's explicit-exit-notify. When the kernel module detects that a connection is going down (e.g. ip link del dev wg0) it should notify the peer. The rekey timeout seems to be short, under 1 minute, but the rekey attempt only occurs if the non-dead peer sends a packet, and it's not clear how much state it keeps for the dead peer or how significant that is. It just seems neater to notify the surviving peer when you're closing the connection.
Cryptographic algorithms can't be relied on to last forever, although Rijndael (AES) has survived with only minimally effective attacks from 1997 (inception; anointed in FIPS pub. 197 in 2001) through 2021, and ChaCha20 has been widely deployed from 2008 to 2021. It would be a very smart move to add algorithm negotiation, carrying the needed info in the dummy payload of the initial handshake packet.
In this scenario you have a chicken-and-egg situation that results in an omelet. wg-quick already recognizes when the default route is sent through the tunnel, and puts in a policy route to divert bearer packets to their original (presumably default) route. But a more limited "omelet" route is not recognized, nor is the case where such a policy route has already been set up.
The very first step for wg-quick should be to do ip route get $EndpointIP, with the IP it's actually going to use (IPv4 or 6). This route should lead to the peer's non-tunnel address. When wg-quick finishes setting up routes, including running PostUp and PreDown scripts that might set routes, it should again do ip route get $EndpointIP, and if the route now goes through the WireGuard interface, it should do the policy-routing trick that diverts bearer packets via the route it initially discovered. As much as possible of this route should be preserved, specifically the metric and the source address, if available.
On a server with multiple peers you may need an individual diversion route for some or all of the peers.
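The procedure above might look roughly like this in a shell script. This is a sketch only; $EndpointIP, the table number, and the rule priority are all assumptions:

```shell
# Before bringing up wg0: remember how we'd reach the peer right now.
GW=$(ip route get "$EndpointIP" | sed -n 's/.* via \([^ ]*\).*/\1/p')
DEV=$(ip route get "$EndpointIP" | sed -n 's/.* dev \([^ ]*\).*/\1/p')

# ... wg-quick sets up the interface, routes, PostUp scripts ...

# Afterwards: if the endpoint would now be routed into the tunnel,
# divert bearer packets via the originally discovered next hop.
if ip route get "$EndpointIP" | grep -q 'dev wg0'; then
    ip route add "$EndpointIP" ${GW:+via "$GW"} dev "$DEV" table 51820
    ip rule add to "$EndpointIP" lookup 51820 priority 100
    ip route flush cache
fi
```

A fuller version would also carry over the metric and src address from the original ip route get output, as the text suggests.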
I'm looking carefully again at the network design on my net. I think I need to refactor routes to/via the VPNs (with WireGuard added). In the table below, leaves means all the hosts not explicitly mentioned. $pfx represents the first three octets of the IPv4 address range. See below for Xena's default route, indicated by *. There are analogous addresses and routes for IPv6.
| Host | VPN or Route | Presently | Change To |
|---|---|---|---|
| — Address Ranges — | | | |
| Vacant | | $pfx.0/25 | $pfx.0/26+64/27 |
| Jacinth | OpenVPN 1194/udp | $pfx.128/29 | $pfx.96/29 |
| Jacinth | OpenVPN 443/tcp | $pfx.144/29 | $pfx.104/29 |
| Jacinth | IPSec | $pfx.160/29 | $pfx.112/29 |
| Jacinth | WireGuard | (none) | $pfx.120/29 |
| Surya | OpenVPN 1194/udp | $pfx.136/29 | $pfx.128/29 |
| Surya | OpenVPN 443/tcp | $pfx.152/29 | $pfx.136/29 |
| Surya | IPSec | $pfx.168/29 | $pfx.144/29 |
| Surya | WireGuard | (none) | $pfx.152/29 |
| Surya | Segment tunnel | $pfx.184/29 | $pfx.160/29 |
| Xena | Xena+Petra subnet | $pfx.176/29 | $pfx.168/29 |
| Vacant | | (none) | $pfx.176/28 |
| Leaves | Main LAN | $pfx.192/26 | $pfx.192/26 (same) |
| DHCP | In main LAN | $pfx.240..254 | No change |
| — Routes — | | | |
| Leaves | Default route | Jacinth $pfx.193 | (Same) |
| Jacinth | Default route IPv4 | Its wild side (en1) | (Same) |
| Jacinth | Default route IPv6 | Surya $pfx.185 | Surya $pfx.161 |
| Surya | Default route, both | Its wild side (en0) | (Same) |
| Xena | Default route | Jacinth $pfx.193* | (Same) |
| Petra | Default route | Xena $pfx.177 | Xena $pfx.169 |
| Jacinth | Main LAN | dev br0 | (Same) |
| Jacinth | Jacinth OV 1194/udp | dev tun0 | (Same) |
| Jacinth | Jacinth OV 443/tcp | dev tun1 | (Same) |
| Jacinth | Jacinth IPSec | Already on Jacinth | (Same) |
| Jacinth | Jacinth WireGuard | (none) | dev wg0 |
| Jacinth | Surya VPNs+subnets | (Combined) | dev tun9/wg9 to Surya |
| Jacinth | Surya OV 1194/udp | Surya $pfx.185 | (Combined) |
| Jacinth | Surya OV 443/tcp | Surya $pfx.185 | (Combined) |
| Jacinth | Surya IPSec | Surya $pfx.185 | (Combined) |
| Jacinth | Surya (segment tnl) | dev tun9 (to Surya) | (Combined) |
| Jacinth | Xena + Petra | VPN(Xena) $pfx.130 | VPN(Xena) $pfx.106 |
| Surya | Jacinth VPNs+subnets | (Combined) | dev tun9/wg9 to Jacinth |
| Surya | Jacinth OV 1194/udp | Jacinth $pfx.186 | (Combined) |
| Surya | Jacinth OV 443/tcp | Jacinth $pfx.186 | (Combined) |
| Surya | Jacinth (segment tnl) | dev tun9 to Jacinth | (Combined) |
| Surya | Jacinth IPSec | Jacinth $pfx.186 | (Combined) |
| Surya | Surya OV 1194/udp | dev tun0 | (Same) |
| Surya | Surya OV 443/tcp | dev tun1 | (Same) |
| Surya | Surya IPSec | Already on Surya | (Same) |
| Surya | Xena + Petra | Jacinth $pfx.186 | (Combined) |
| Surya | Main LAN | Jacinth $pfx.186 | (Combined) |
| Xena | (finish this) | | |
WireGuard is not IPSec (StrongS/WAN) or OpenVPN; it has no key agreement mechanism analogous to StrongS/WAN's Charon. The correct way to handle WireGuard is pure pre-shared (static) keys: the server always loads all the authorized public keys, and each client has its own key pair plus the server's public key. For a true point-to-point link, the server would have only one authorized peer.
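Generating the static keys is the easy part; this is the standard wg(8) idiom:

```shell
# Generate a private key and derive its public key; umask keeps the
# private key unreadable by other users.
umask 077
wg genkey | tee privatekey | wg pubkey > publickey
```

The contents of publickey go into the peers' Peer stanzas; privatekey stays in this host's own Interface stanza.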
So let's set this up and be done with it!