I had to deal with several issues involving the wireless network.
The first presenting symptom is that upon boot the laptop never gets an IPv6 address or default route. Strangely, this doesn't happen on the machines with wired Ethernet. The per-machine firewall tosses packets in a weird conntrack state (invalid or untracked); it positively accepts only new, related and established. Apparently ICMPv6 router advertisements and multicast listener queries were getting tossed. I speculate that incoming packets from 802.11 wireless are treated differently from those arriving on 802.3 wired Ethernet.
I put in a special case to accept packets (from anywhere) to the All Nodes multicast address ff02::1, and to the host's solicited-node multicast address, which is the prefix ff02::1:ff00:0/104 followed by the last 24 bits of the host's IPv6 unicast address. (So the intended host will accept the packet, but very few others will be bothered, avoiding one of the bad features of ARP over IPv4 broadcast.) Now the laptop gets its RFC 2464 autoconfigured address and default route.
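To make the per-host multicast address (the solicited-node address) concrete, here is a minimal sketch using Python's standard ipaddress module. The function name and the 2001:db8:: example address are made up for illustration; they are not taken from the firewall rules actually in use.

```python
import ipaddress

# Solicited-node multicast prefix: the low 24 bits are filled in per host.
SOLICITED_NODE_PREFIX = ipaddress.IPv6Network("ff02::1:ff00:0/104")


def solicited_node_address(unicast: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast address for a unicast address."""
    addr = ipaddress.IPv6Address(unicast)
    low24 = int(addr) & 0xFFFFFF  # last 24 bits of the unicast address
    return ipaddress.IPv6Address(
        int(SOLICITED_NODE_PREFIX.network_address) | low24
    )


# Hypothetical host address from the documentation prefix 2001:db8::/32.
print(solicited_node_address("2001:db8::211:22ff:fe33:4455"))
# ff02::1:ff33:4455
```

A firewall rule matching the whole /104 prefix accepts the solicited-node traffic for every possible host, which is why a single special case is enough.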
The second major symptom is that the wireless NIC will not join the bridge device.
This laptop normally connects to the outside world via 802.11 WiFi (wireless networking in the 2.4 GHz ISM band). 802.3 wired Ethernet is used when maximum reliability or higher speed is needed, but this is rare. On the old Xena with kernel 2.6.37, I had a network bridge whose members were the wired Ethernet (eth0), wireless (wlan0), and the virtual interface (virt0) of any virtual machine I was running. The virtual machine was directly on the host network and managed its IP address independently of Xena. This worked out well. However, with the current kernel 3.11.10 the bridge driver refuses to accept the wireless NIC, or the wireless driver refuses to be added to the bridge.
brctl addbr br0
brctl addif br0 wlan0
It says: can't add wlan0 to bridge br0: Operation not supported
I did strace; it opens a PF_LOCAL SOCK_STREAM socket, then does
ioctl(3, SIOCBRADDIF, 0x7fffa9427800) = -1 EOPNOTSUPP
ioctl(3, SIOCDEVPRIVATE, 0x7fffa9427800) = -1 EOPNOTSUPP
This ioctl is handled in /usr/src/linux/net/bridge/br_ioctl.c. The SIOCDEVPRIVATE form is deprecated and feeds into the same code for adding an interface. br_add_if(), defined in br_if.c, tests dev->priv_flags & IFF_DONT_BRIDGE and returns -EOPNOTSUPP if the flag is set; the comment says "No bridging devices that dislike that (e.g. wireless)". So the problem is to be found in the wireless driver.
Looking in /usr/src/linux/net/wireless, I'm pretty sure it's building cfg80211.ko. In core.c, the function cfg80211_netdev_notifier_call() tests wdev->iftype (mode of operation) when the device is registered: if the type is NL80211_IFTYPE_STATION, P2P_CLIENT, or ADHOC, and wdev->use_4addr is not set, it turns on IFF_DONT_BRIDGE. In util.c, cfg80211_change_iface() turns on IFF_DONT_BRIDGE for the same types, but not for P2P_GO, AP, AP_VLAN, WDS or MESH_POINT. P2P_GO means the Group Owner in a WiFi Direct (peer to peer) network. We definitely aren't going to be in any of these bridgeable modes. Conclusion: it isn't going to let the NIC join the bridge without some hacking that actually breaks a clearly intentional design decision.
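The registration-time test described above can be restated as a small model. This is only a Python paraphrase of the logic as summarized here, not kernel code; the IfType enum and the function name are invented for the illustration.

```python
from enum import Enum, auto


class IfType(Enum):
    """Stand-ins for the NL80211_IFTYPE_* values mentioned in the text."""
    STATION = auto()
    P2P_CLIENT = auto()
    ADHOC = auto()
    P2P_GO = auto()
    AP = auto()
    AP_VLAN = auto()
    WDS = auto()
    MESH_POINT = auto()


# Interface types for which the flag is turned on, per the summary above.
CLIENT_LIKE = {IfType.STATION, IfType.P2P_CLIENT, IfType.ADHOC}


def sets_dont_bridge(iftype: IfType, use_4addr: bool) -> bool:
    """Model of the check made when the device is registered."""
    return iftype in CLIENT_LIKE and not use_4addr


print(sets_dont_bridge(IfType.STATION, use_4addr=False))  # True
print(sets_dont_bridge(IfType.AP, use_4addr=False))       # False
```

An ordinary laptop client is the STATION case with use_4addr off, which is exactly the combination that gets refused by the bridge.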
I've seen one explanation, not too clear, that the 802.11 spec states that the client should only send to the AP's MAC address, not to the MAC address of some object that it has not personally authenticated to. Of course this makes bridging impossible.
This calls for a tunnel, which can be bridged. But from where and to where? I was not able to figure that out; see below for the progress I made, such as it is.
Instead I sabotaged the places where cfg80211.ko turns on IFF_DONT_BRIDGE. This was effective and the wireless NIC is now able to join the bridge.
Very annoying: about every 10 minutes the wireless disconnects and takes about 30 seconds to re-establish the connection.
In syslog the kernel reports:
wlan0: deauthenticated from 90:f6:52:ea:18:da (Reason: 2)
In the list of 802.11 reason codes, Reason 2 means "Previous auth no longer valid".
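For watching how often this happens, a short sketch like the following can pull the interface, access point address and reason code out of such a kernel message. The function name and the small reason table are my own; only the first few reason codes are listed, and the message format is assumed to match the line quoted above.

```python
import re

# A few 802.11 deauthentication reason codes (deliberately incomplete).
REASONS = {
    1: "Unspecified reason",
    2: "Previous authentication no longer valid",
    3: "Deauthenticated because sending station is leaving",
}

DEAUTH_RE = re.compile(
    r"(?P<iface>\S+): deauthenticated from "
    r"(?P<bssid>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}) "
    r"\(Reason: (?P<reason>\d+)"
)


def parse_deauth(line: str):
    """Return (iface, bssid, code, meaning), or None if the line differs."""
    match = DEAUTH_RE.search(line)
    if match is None:
        return None
    code = int(match.group("reason"))
    return (match.group("iface"), match.group("bssid"),
            code, REASONS.get(code, "unknown"))


print(parse_deauth("wlan0: deauthenticated from 90:f6:52:ea:18:da (Reason: 2)"))
```

Feeding it the syslog lines together with their timestamps would make a regular interval between disconnects easy to spot.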
Searches on Google reveal that a number of people report this symptom of wireless disconnection at regular repeating intervals. Interventions on the client always fail, with only one exception: one person avoided the disconnects for as long as a media playback node kept originating traffic to his wireless box that had media storage. In the successful fixes, the finger of blame points to the access point.
I am running hostapd on my gateway. Simultaneously with the disconnects, the other stations complete a group re-key (without needing full reauthentication). So I put this Band-Aid on the problem, successfully: in /etc/hostapd.conf I changed the two group rekey settings, wpa_group_rekey and wpa_strict_rekey.
The group key is the one used for broadcast and multicast packets sent to the whole group (BSS). The first parameter, wpa_group_rekey, is the interval in seconds between group rekey events; the default is 600 secs, which matches my former disconnect interval. The second, wpa_strict_rekey, if set to 1 (the default), tells hostapd to re-key whenever a station leaves the BSS, since the departing station could otherwise reveal the group key to hostile hackers. (Of course, a station with jiggered firmware could also reveal the group key before leaving the BSS.) These changes weaken security, but I use encrypted protocols on the wire, and an aggressive firewall on each host, so I judge that in my situation, when balancing security versus usability, security should not win this one.
All the Android, Windows and iOS (Apple) wireless stations manage to replace the group key without doing a full reauth, so I think the problem probably resides in NetworkManager, or possibly in the wireless driver stack. In any event, I'm calling it a bug.
I had the idea to put one end of a tunnel in the bridge, and to somehow connect the other end to or through the wireless NIC. But I was unable to figure out any working combination. The following discussion is kept for historical interest, and in case in the future I can make progress on it.
Let's be clear on the goals for the tunnel:
Xena has to operate on foreign nets with no cooperation from the foreign infrastructure.
In the foreign case it will use IPsec or OpenVPN to make a tunnel to CouchNet.
The local or foreign net will go up and down when the machine wakes and sleeps; configuration must not be lost in that case.
It is preferred if Xena is in the same address range as the rest of the CouchNet machines. But as a second choice, it is acceptable to assign a separate range for the tunnel endpoints.
Now let's try to design the networking.
The wireless NIC is [going to be] outside the bridge and uses whatever address the environment gives it. On a foreign net this likely will not include IPv6. It likely will have a default route.
The bridge has Xena's CouchNet fixed address, IPv4 and IPv6. Its members are eth0, any virtual machines' NICs, and tun0 (the tunnel).
The tunnel is to Jacinth (the gateway). Command line to set up a GRE tunnel:
ip tunnel add cn0 mode gre remote jacinth.jfcarter.net local xena.cft.ca.us dev wlan0
The problem with explicit kernel-level tunneling is that the other end has to create a tunnel for reply packets. This means it needs to know an IPv4 address to send them to, which is not forthcoming in the likely case that the laptop is on a NATted foreign network.
IPsec can overcome NAT, and can handle IPv6, but many foreign networks are known to refuse to pass IPsec protocols.
OpenVPN can be (and is, on CouchNet) configured to use port 443 which hotel networks cannot refuse to transport. Its negative point is that it cannot do IPv6.
OpenSSH can do packet forwarding, but again, a fair number of hotel networks refuse to pass port 22.
Can we cheat? Create a tap device. Add to the bridge. A daemon reads/writes raw packets on wlan0 and copies them to/from tap0. Will this work? See Net::RawIP and/or Net::Write. Net::Pcap or Net::PcapUtils is for capturing (reading) raw packets. Probably better to write actual code rather than messing with Perl.
Is it possible to create a tunnel to ourself? The local tunnel endpoint can then be in the bridge. The idea is that the remote endpoint is going to be routed through the wireless NIC. But making that happen sounds kind of weird and likely impossible.
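The tap-device "cheat" mentioned above could be sketched roughly as follows, in Python rather than Perl. This is an untested outline, not a working solution: the names tap0 and wlan0, the function names, and the buffer size are assumptions, it needs root, and it does nothing about the access point only accepting frames from the MAC address that authenticated, so whether it can work at all remains an open question.

```python
import fcntl
import os
import select
import socket
import struct

TUNSETIFF = 0x400454CA  # ioctl request number from <linux/if_tun.h>
IFF_TAP = 0x0002        # Ethernet-level (layer 2) device
IFF_NO_PI = 0x1000      # no packet-information header before each frame
ETH_P_ALL = 0x0003      # receive frames of every protocol
PACKET_OUTGOING = 4     # packet type of frames this host itself is sending


def build_ifreq(name: str, flags: int) -> bytes:
    """Pack a struct ifreq: 16-byte name, 16-bit flags, padded to 40 bytes."""
    return struct.pack("16sH22x", name.encode(), flags)


def open_tap(name: str) -> int:
    """Create the tap device and return its file descriptor (needs root)."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, IFF_TAP | IFF_NO_PI))
    return fd


def open_raw(ifname: str) -> socket.socket:
    """Open a raw packet socket bound to one interface (needs root)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    sock.bind((ifname, 0))
    return sock


def relay(tap_name: str = "tap0", wifi_name: str = "wlan0") -> None:
    """Copy raw Ethernet frames between the tap device and the wireless NIC."""
    tap_fd = open_tap(tap_name)
    wifi = open_raw(wifi_name)
    while True:
        readable, _, _ = select.select([tap_fd, wifi], [], [])
        for src in readable:
            if src is wifi:
                frame, addr = wifi.recvfrom(65535)
                # Skip frames the host itself is transmitting on wlan0.
                if addr[2] != PACKET_OUTGOING:
                    os.write(tap_fd, frame)
            else:
                wifi.send(os.read(tap_fd, 65535))
```

To try it, one would call relay() as root, then add tap0 to the bridge with brctl addif br0 tap0 and bring the interface up. The tap device exists only while the daemon keeps its file descriptor open.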
(As of 2014-04-20 the bridge network is turning into a time sink and I'm going to defer it. Actually the best strategy is going to be to sabotage cfg80211.ko so that it does not turn on IFF_DONT_BRIDGE.)