I am installing VoIP service (Voice over Internet Protocol), and I want to follow best practices in my house router to get the best call quality and reliability. An important factor is QoS (Quality of Service), referring to the Type of Service byte in the IP header (see RFC 2474), and I need to renovate my network traffic control structures to take RFC 2474 into account. So what exactly should I do for a network traffic control strategy?
For VoIP the client uses SIP (Session Initiation Protocol) to signal to its server or peer whom it wants to talk to, and to negotiate connection parameters like picking a codec. Then each end digitizes the voice signal, i.e. what the speaker is saying, and sends it to the other end, generally using RTP (Real-time Transport Protocol). Both of these are normally carried over UDP (User Datagram Protocol), with a timeout-based retransmission fallback for SIP and no retransmission for lost RTP packets. It is particularly important that SIP packets not be lost, and that RTP packets both are not lost and are sent promptly.
Latency is the jargon for how long it takes for a packet to get from the sender to the recipient, although the round trip time is sometimes incorrectly referred to as the latency. Per ITU-T G.114, a latency of 150 msec or a round trip time of 250 msec is about the maximum before users complain. Jitter refers to how much the latency varies, and a uniform latency (low jitter) is also important.
IPv4 is specified in RFC 791 and IPv6 in RFC 2460, both as amended. Each of them provides a byte in the IP header, historically called ToS (Type of Service). Its current interpretation comes from RFC 2474, plus RFC 3168 for ECN (Explicit Congestion Notification). The 64 code points described in RFC 2474 are referred to as Differentiated Services. Each cues the router to apply a different PHB (Per Hop Behavior) to a packet with that DiffServ code point. RFC 2474 does not define what the PHBs should be; in fact the RFC assumes that different enterprise networks may have different implemented PHBs and different code points representing them. But RFC 2474 says that PHBs should be numbered so that higher numbers imply a higher probability that the packet will go out in a timely manner. (Dropped packets are not timely.)
However, RFC 2598 recommends (not requires) a code point 0x2e (0xb8 shifted) that requests Expedited Forwarding, i.e. the packet should go out quickly and reliably, as wanted for RTP. RFC 2597 recommends a set of code points for Assured Forwarding (try not to drop), of which 0x1a (0x68 shifted, class AF31) is the one that seems most appropriate for SIP. A code point of 0 is the default, and such packets get no special treatment. RFC 1349 (deprecated) was the previous incarnation of ToS, and some applications may still use its code points, which have limited backward compatibility with RFC 2474.
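The shift between a 6-bit DSCP value and the full ToS byte trips people up, so here is the arithmetic (a sketch; the variable names are mine):

```shell
# EF (Expedited Forwarding) DSCP is 0x2e; shifting left 2 bits positions
# it in the upper 6 bits of the ToS byte, above the 2 ECN bits.
EF_DSCP=0x2e
EF_TOS=$(( EF_DSCP << 2 ))
# AF31 DSCP is 0x1a, the value suggested above for SIP.
AF31_DSCP=0x1a
AF31_TOS=$(( AF31_DSCP << 2 ))
printf 'EF ToS byte: 0x%02x\n' "$EF_TOS"     # 0xb8
printf 'AF31 ToS byte: 0x%02x\n' "$AF31_TOS" # 0x68
```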
The DiffServ scheme gives a different QoS (Quality of Service) to differently marked packets, and the whole scheme is usually referred to as just QoS.
Let's step aside for some unofficial terminology. A flow refers to a sequence of packets (one way) between particular host/port pairs: for example a file download or upload, or a VoIP call (one flow in each direction). A flow that can saturate the network, i.e. which always has a packet needing to go through, is called a hog. Typically downloading a file or an advertisement on the web is a hog. A frog sends packets at a modest rate limited by the data source or sink, and almost never has a packet waiting, but usually has an operational requirement that when there is a packet it should go out right away. VoIP is the archetypical frog; interactive terminal sessions and network games are also frogs. Streaming audio or video is also in the frog category, in that the effect is spoiled if a saturated network link limits packets to less than real time, but their data rates tend to be higher.
TCP (Transmission Control Protocol) regulates the rate of data flow by a sliding window. When a packet arrives, the recipient sends back an ACK (acknowledgement) packet announcing how much of the data stream has been received error-free, and how much more the recipient can accept. When the client process, e.g. a web browser, reads some of the incoming payload, that many more bytes can be added to the window, while when a packet comes in, its size is subtracted from the window. The client's operating system adjusts the window size for its convenience in disposing of the packets, and the sender automatically adjusts for how many packets are flying through the Internet, sent but not yet received. Packets can possibly arrive out of order, but if the sender sees in the ACKs that the recipient is not making progress, it will retransmit the missing packet(s). The retransmission also tells the sender that the net is congested, and it shrinks its congestion window to reduce the flow. It is common practice for routers to drop packets if the data link's maximum rate (or the SLA (Service Level Agreement) rate) is exceeded. The ECN bits (Explicit Congestion Notification, RFC 3168) give the same effect without wasting a packet that the net has already done some work on.
A TCP connection is bidirectional, and the packets in one direction contain ACKs for the other direction. But almost invariably, one direction has a much higher data rate. For example in HTTP (web), the desired URL is sent in the forward direction and the much larger text or image comes back. The client needs to continue sending ACKs until the whole text or image is received, after which it will send more data in the forward direction requesting another component of the web page. The incoming web page is a hog flow while the forward stream is froggy, and if the webserver is going to fill the available bandwidth it needs to promptly see the ACKs from the client, so it knows it can send more payload.
In the shadowy past, 2009 or 2010, my son was playing a networked game and I started downloading the new version of my Linux distro. He justifiably complained, but this motivated me to learn about and implement traffic control. A major issue is that routers between the source and sink, not under my administrative control, often have big buffers, and if there is a hog flow it will fill the buffer, so frog packets have to percolate through the buffer. In the game vs. distro situation the measured round trip time (2x latency) went up to over 2 seconds while it was normally more like 10msec (to a geographically nearby server). With traffic control, in the simultaneous hog and frog situation the frog's round trip time returned to the normal value of 10 to 20 msec despite the hog flow running flat out.
When preparing for this upgrade I did an empirical test to see which QoS
code points were in actual use, and particularly, what our partners send us.
Actually RFC 2474 allows intermediate routers to translate
QoS code
points, and in some cases this means setting QoS to zero. These services
always had a QoS code point of zero: incoming SMTP (mail), incoming and
outgoing HTTP and HTTPS (web), outgoing FTP, DNS, IPv6 tunnel, SSH-slogin.
These outgoing services had nonzero QoS but in all cases the replies had
a QoS of zero: SIP (0x68), RTP (0xb8), scp (0x10, should be 0).
Conclusion: None of our remote partners was ever seen to use QoS. Although attention to QoS on our router is a best practice, it will have almost no effect on service quality as seen by me and my users. My ATA (Analog Telephone Adapter) for VoIP is setting the normal QoS. SSH-slogin-scp needs to be configured to set the right QoS code points.
I have positive confirmation that my ATA turned on QoS in outgoing packets, but QoS had been cleared to zero when it arrived at the server. (This is legal according to RFC 2474.) I would be surprised if the server left out QoS, but I don't have positive confirmation that the bits are being cleared in the incoming direction. In any case, QoS is useless for picking the PHB to use on a packet.
Then I did a functional test with one hog, testing both directions, and several frog services, using several traffic control structures. The frogs were also tested without the hog. The hog task was for either the remote site or the router machine to download a giant file using http (wget). The frog tests were:
Ping the remote partner, in the same city, to estimate latency (actually round trip time) and jitter. Except for one test the average round trip time ranged from 11.4ms to 12.0ms, and the jitter ranged from 7.7ms to 9.6ms. These are well within normal recommendations for VoIP.
VoIP subjective voice quality in both directions. In all cases it was good, and there was no audible sign of dropped or delayed packets.
SSH-slogin session responsiveness. There was no sign of delayed packets.
Refresh a web page that has a fair amount of CSS and images. Same time, 2 secs, in all conditions.
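For repeatability, the averaging and jitter arithmetic from the ping test can be scripted; the times below are made-up stand-ins for one run of `ping -c 10`, and jitter is taken here as worst minus best:

```shell
# Hypothetical ping round trip times in ms (stand-ins, not measured data).
TIMES="11.6 11.9 12.0 11.4 18.3 11.7 11.5 12.1 11.8 11.6"
# Average round trip time.
AVG=$(echo $TIMES | awk '{s=0; for(i=1;i<=NF;i++) s+=$i; printf "%.1f", s/NF}')
# Jitter, defined as worst minus best.
JITTER=$(echo $TIMES | awk '{mn=$1; mx=$1; for(i=1;i<=NF;i++){if($i<mn)mn=$i; if($i>mx)mx=$i} printf "%.1f", mx-mn}')
echo "avg=${AVG}ms jitter=${JITTER}ms"
```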
The only condition noticeably different from the others was sending out the hog flow with no traffic control at all: ping round trip time went up to 48.8ms and jitter was 21.0ms, but no other effect was seen on the frogs. We achieved 96% of our SLA speed (Service Level Agreement). If I were using DSL I would have seen a lot higher round trip time. Almost certainly our FIOS ONT is enforcing the SLA speed, and is doing a lot more: it is actively putting the frog packets ahead of the hogs. This is not surprising because a part of the product package is streaming video on demand, and frog abuse would kill the video quality.
So should I implement QoS or an alternative on my own router? Or should I rely on traffic control in the ONT? In security the rule is, don't trust unknown outsiders; see to your own security. This policy is probably a best practice for QoS also. More important, each router individually needs to avoid detaining frog packets among buffered hog flows traversing it. So I should continue to implement traffic control on my router. But I should not try to set up a complicated optimum design when I won't be able to test it because the ONT will compensate for any of my features that are inoperative.
Let's go through conventional wisdom about VoIP as seen in a lot of blog postings and product hype for professionally installed VoIP systems.
I don't see QoS code points being set by partners where they would make a difference. I'm going to ignore QoS, i.e. the DiffServ field, and I'm going to achieve quality service by providing enough capacity that the VoIP flows are frogs (which is easy: 1 kbyte/sec with the G.729 codec) and by helping all the frogs to hop quickly.
Conventional wisdom says you need to allocate a fixed slice of bandwidth to VoIP. No you don't; a rigid allocation can choke one category when another is not using all of its allocation. You need to identify the flows that need to go out promptly and give them priority. Hog flows whose bandwidth is limited by network capacity clearly have to be given the lowest priority; otherwise they can and do squeeze out the frog flows. And this priority is independent of the unpredictable bandwidth of the categories or of the whole net segment.
They're really saying that VoIP traffic has to be isolated on its own VLAN. But this is the same fallacy as allocating bandwidth specifically for that VLAN (if your switches support that), with the added problem that a clog-up on the real LAN affects all the VLANs.
The value of VLANs is for administration, so you can manage hosts and apply policies more easily in smaller groups, and for defense in depth (security), so a compromised host has fewer neighbors in a broadcast domain that it can attack.
Another recommendation is to keep bandwidth in reserve for VoIP. No you don't: you should not leave bandwidth unused; you should send hog flows on it, just so long as the frog packets get priority to go out.
How should I design my traffic control? To achieve good quality of service, clearly I will accomplish very little if I pay attention to the QoS field because remote partners don't set it or it gets cleared in transit. Instead I should distinguish frogs from hogs and help them get through promptly, whatever QoS they have requested (if any).
My actual network is very small, but there aren't too many example traffic control scripts posted, and they seem to get copied from one blog post to the next. When reasonable, I want to make my solution apply to larger nets, and so I will be discussing net situations that will not actually occur (and be tested) on my own net.
Definitely I'm going to use fair queueing on the hogs. This means that they take turns sending packets in a round robin pattern, so they will all tend to the same data rate: SLA/N if there are N hogs. (SLA here means the network capacity or the service level agreement rate.)
On my net there will almost always be zero or one hog flow, and frog flows will turn up at random intervals. But the natural state of a network segment in the enterprise is to be saturated with multiple hog flows, with multiple frogs trying to squeeze between them. Clearly the first step is to distinguish frogs from hogs. I'm setting a rate per flow of 1/4 of the SLA rate as the upper bound for the frogs. In an enterprise net this upper bound has to be set substantially less than SLA/N, lest the N hogs be treated mistakenly as frogs.
The TCP slow start algorithm starts the connection with a small window (and resulting data rate) and then increases it gradually. Thus a future hog flow will start out slow enough that it won't squeeze out the real frogs. But as the flow rate rises the local rate estimator should recognize quickly enough that it's grown into a true hog, before it gets fat enough to damage the frogs. In an enterprise net you will need to tune the time constant on the estimator, and set the frog upper bound low enough that the growing hog flow gets flipped into the hog category before squeezing out the actual frogs.
Since systemd-217 (released 2014-10-28, though distros adopted it somewhat later), fq_codel (with default parameters) is the default queue discipline whenever an interface is newly brought up; formerly it was pfifo_fast. The Arch Linux wiki article on advanced traffic control mentions this new default.
I'm going to use ECN on hog flows, which is negotiated with the partner. So ECN has to be turned on by setting /proc/sys/net/ipv4/tcp_ecn. Values are 0 = never use ECN, 1 = always ask the partner to use it, or 2 = don't ask, but use it if the partner asks. At least in OpenSuSE Leap 42.1, the default is 2. The standard way to set one of the /proc/sys controls is to put a line in /etc/sysctl.conf like net.ipv4.tcp_ecn = 1. This setting controls both IPv4 and IPv6. Then one of the boot scripts executes /bin/sysctl -p.
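A minimal sketch of the commands involved, assuming the standard sysctl locations:

```shell
# Apply immediately (requires root); this one key covers IPv6 too.
sysctl -w net.ipv4.tcp_ecn=1
# Persist across reboots; a drop-in under /etc/sysctl.d/ also works.
echo 'net.ipv4.tcp_ecn = 1' >> /etc/sysctl.conf
# Verify the current value (0, 1, or 2).
sysctl -n net.ipv4.tcp_ecn
```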
A very simple design for the outbound traffic control goes like this:
This traffic control set is applied to several network devices.
The root queue discipline is a Token Bucket Filter that limits the overall rate to 95% of the SLA speed. This avoids the queue at the router (the ONT) that enforces the SLA. The TBF chokes the flow by not dequeuing packets, versus dropping them. Excess packets are dropped in a lower level.
There could also be queues on core routers when aggregate flows exceed capacity, so radically increased latency is possible even though my traffic control is doing its job. Reducing my egress rate would have a miniscule effect on the queue length, recognizing the situation is hard, and reducing my rate is not worth the effort.
The intermediate queue discipline is strict priority (prio) with two bands (subclasses). Number 1 is for frogs, with a policing filter cueing on a rate of SLA/4. Number 2 gets all packets from flows with a higher rate.
This simple priority plan has a defect: tunnelled IPv6. If I had native IPv6 that traffic would be handled the same as IPv4, but the tunnel carries a mixture of hog and frog flows, and it appears to traffic control as a single flow: froggy only if there are no IPv6 hogs. So what happens to an IPv6 frog if there are both IPv6 hogs and IPv4 hogs? The tunnel device has this generic traffic control, so the frog packet will get on the tunnel right away, but the tunnel flow will be a hog, and it will have to take turns with the IPv4 hog(s). Large numbers of hog flows are unlikely on my small net, but let's assume the least unlikely scenario of one each IPv6 frog, IPv6 hog and IPv4 hog. The IPv6 frog packet may have to wait for one IPv4 hog packet, which at 50 Mbit/sec would take 240 microseconds to send. This delay is insignificant. But on a core router of an enterprise net there could be more hog flows. The best cure is to use an ISP that has native IPv6.
The inner queue discipline for band 1 (frogs) is plain codel, since a frog packet is always sent out next, and it will be very rare for a frog packet to be waiting when the next packet comes through, so that a choice between them is relevant. For band 2 (hogs) the queue discipline is Fair Queuing with Controlled Delay Active Queue Management (fq_codel). Fair queuing means that each flow gets to send one packet round robin; thus the flows get an equal fraction of the net bandwidth. Each individual hog flow is managed by its own codel.
The CoDel algorithm monitors the queue of packets in its category waiting to go out. If packets stay in the queue too long (default 5ms), which is inevitable for a hog flow, then something has to be done. Packets are either dropped, or in my case, they have their ECN (Explicit Congestion Notification) bits set. In either case, on a TCP hog flow the far end echoes the congestion signal back to the traffic source, which shrinks its congestion window and sends slower, letting the queue on this router drain out, until the sender gradually grows the window again. There is a hard limit on the queue length, set to give a little over the targeted latency, in case the endpoints ignore or botch ECN.
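As a concrete (hypothetical, standalone) invocation, the CoDel knobs that matter here can be spelled out explicitly: target is the 5 ms queueing-delay threshold mentioned above, and interval is the window over which CoDel looks for a persistent standing queue:

```shell
# Illustration only, not part of the script below: codel with its
# defaults written out, marking via ECN instead of dropping.
tc qdisc add dev eth1 root codel target 5ms interval 100ms ecn
```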
Token Bucket Filter: (Mind the units.)
NET=eth1 # Net device being controlled
MTU=1500 # bytes, size of one packet
LATEN=50 # ms (milliseconds), maximum time to get through queue
SLA=25 # Mbit/sec, maximum allowed net capacity
HZ=1000 # clock ticks per second; use HZ=1000 on a tickless kernel
RATE=$(((SLA*19000)/20)) # 95% of SLA data rate in kbit/sec
BURST=$(((RATE*1000)/HZ)) # bucket size in bytes; (RATE*1000)/HZ is the bits sent per tick, and using that count as bytes gives an 8x margin over the minimum of rate/HZ bytes
tc qdisc add dev $NET root handle 1: \
tbf rate ${RATE}kbit burst ${BURST} latency ${LATEN}ms
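Plugging in the example values above (SLA = 25 Mbit/sec, HZ = 1000) shows how the arithmetic works out:

```shell
# Same integer arithmetic as the script variables above.
SLA=25                     # Mbit/sec
HZ=1000                    # ticks per second
RATE=$(((SLA*19000)/20))   # 95% of 25000 kbit/sec
BURST=$(((RATE*1000)/HZ))  # per-tick allotment
echo "rate=${RATE}kbit burst=${BURST}"
```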
Command line options of the TBF:
Priority Queue Discipline:
tc qdisc add dev $NET parent 1:0 handle 2: \
prio bands 2 priomap 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Command line options of the prio queue:
CODEL For Frogs:
# local LIMIT=$(( 1.5 * (LATEN/1000) * (SLA*1000000) / (8*MTU) ))
# -- Simplify to:
local LIMIT=$(((LATEN*SLA*187)/MTU))
tc qdisc add dev $NET parent 2:1 handle 11: \
codel limit $LIMIT ecn
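With the example values used earlier (LATEN = 50 ms, SLA = 25 Mbit/sec, MTU = 1500 bytes), the exact formula gives 1.5 × 0.050 × 25×10^6 / (8 × 1500) = 156.25 packets, and the integer shortcut lands close by:

```shell
# Integer version of 1.5 * (LATEN/1000) * (SLA*10^6) / (8*MTU);
# the constant 187 approximates 1.5 * 10^6 / (1000*8) = 187.5.
LATEN=50; SLA=25; MTU=1500
LIMIT=$(((LATEN*SLA*187)/MTU))
echo "codel limit: $LIMIT packets"
```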
Command line options of the CODEL:
Filter to Collect Frogs
local RMAX=$(((SLA*1000)/4)) # 1/4 SLA in kbit/sec
tc filter add dev $NET parent 2: pref 1 protocol IP \
estimator 250ms 4sec \
u32 match u8 0 0 at 0 \
police avrate ${RMAX}kbit conform-exceed continue/pass classid 2:1
Command line options for the filter:
FQ_Codel For Hogs
tc qdisc add dev $NET parent 2:2 handle 12: \
fq_codel limit $LIMIT ecn
Command line options are the same as for the plain CODEL for the frogs, but here each hog flow gets its private CODEL, and the FQ part dequeues them round robin.
Filter to Collect Hogs
tc filter add dev $NET parent 2: pref 2 protocol IP \
u32 match u8 0 0 at 0 classid 2:2
Command line options are a subset of the ones for the frog filter. Here it captures every packet passed to it, i.e. not accepted as a frog.
The generic queue discipline code is nice but it only works on outgoing packets and I'm not doing traffic control on the fast local LAN, just the slower wild side interface. What about incoming packets? Incoming frogs are going to get stuck in the ONT's queue too, if not protected, and the higher data rates are expected on ingress (downloads). Back in 2009 when I first wrote this script, there was a motif which I copied which attaches a simple policing filter to the ingress stream, and drops packets that exceed the configured data rate (95% of SLA). That's better than nothing, but far from ideal when VoIP is involved.
Since 2009 a lot of advances have been made. In particular, you can create an Intermediate Functional Block (fake net interface), flip the incoming packets onto it, and put the generic egress queue discipline on it. Here is how to make that happen.
Create the IFB. It's not a problem to load the module and bring up the IFB multiple times.
IFB=ifb0
modprobe ifb
ip link set $IFB up
Call the subroutine that puts the generic egress queue discipline on the IFB.
Create an ingress tap on the wild side net interface. I don't see this motif in any of the man pages for the tc family. Various blog and wiki posts copy it verbatim.
tc qdisc add dev $NET handle ffff:0 ingress
Command line options:
ingress is the name of a pseudo queue discipline that captures the incoming packets.
Flip the incoming packets over to the IFB, where they can be queued and filtered properly.
tc filter add dev $NET parent ffff:0 u32 \
match u8 0 0 \
action mirred egress redirect dev $IFB
Command line options for mirred:
I have IPv6 service via a tunnel to Hurricane Electric. As previously noted, traffic control would be a lot simpler with native IPv6, which would be handled by the generic traffic control. But my ISP is firmly rooted in the 1900's, so this is not an option. And my family does quite a lot of traffic with IPv6-capable sites, so I need traffic control on the tunnel.
As it turns out, I can just call the subroutines to put the generic egress and ingress traffic control on the tunnel endpoint and a second IFB for ingress. Problem solved.
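The overall shape looks like this, assuming the commands from the previous sections are wrapped in subroutines (the function and device names here are illustrative, not taken from the actual script):

```shell
egress_tc() {   # generic egress control: tbf + prio + codel/fq_codel + filters
    local NET=$1
    # ... the tc commands from the sections above, applied to $NET ...
}
ingress_tc() {  # flip ingress onto an IFB carrying the same egress control
    local NET=$1 IFB=$2
    modprobe ifb
    ip link set "$IFB" up
    egress_tc "$IFB"
    tc qdisc add dev "$NET" handle ffff:0 ingress
    tc filter add dev "$NET" parent ffff:0 u32 match u8 0 0 \
        action mirred egress redirect dev "$IFB"
}
egress_tc eth1; ingress_tc eth1 ifb0   # wild-side interface
egress_tc sit1; ingress_tc sit1 ifb1   # IPv6 tunnel endpoint
```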
Should there be more than one band for the frogs? No, I've decided. In this design the frog packets go out right away, and there is supposed to be plenty of capacity to handle them. Thus if there were multiple frog bands an incoming packet would go in one of them, and go right out; very rarely would multiple frog packets be waiting so a choice between them would be relevant. So I'm using only one frog band. But if you are sending out streaming video, which is a frog flow of higher bandwidth, the probability of multiple packets waiting might be higher, and it might make sense to put in a fair queueing module for the frogs, so flows of a similar rate would make similarly even progress.
I debated whether to post the complete traffic control script, or to make the reader write the script himself. Too often people act like script kiddies, drop a pre-made admin script or security recommendation into their system, get burned, and complain. It's very important to understand what the script is trying to accomplish, and to adjust the parameters and strategies according to what's appropriate on your network. In particular, not everyone will have tunneled IPv6 service (I hope that means most people will have native IPv6). Anyway, here are links to the script and a systemd unit file.
Enable it for starting at boot.
I did a final test of hog versus frog performance with this traffic control script. The hog task was to download (In) or upload (Out) a file of about 80 Mbytes; the partner is in the same city (Los Angeles). The frog task was to ping the same host 10 times using the same family (IPv4 or IPv6) as the hog. The average round trip time is reported, as well as the jitter, defined here as the difference between the best and worst ping time. There were no lost ping packets. For the hog flow, MiB/sec means 2^20 bytes/sec, whereas Mbit/sec means 10^6 bits/sec. The Service Level Agreement is 50 Mbit/sec and the Token Bucket Filter is set for 95% of that, or 47.5 Mbit/sec.
Family | MiB/sec | Mbit/s | Ping ms | Jitter ms |
---|---|---|---|---|
IPv4 None | 0 | 0 | 10.3 | 3.8 |
IPv4 In | 5.42 | 45.5 | 13.5 | 6.0 |
IPv4 Out | 5.41 | 45.4 | 12.3 | 3.5 |
IPv6 None | 0 | 0 | 29.2 | 6.0 |
IPv6 In | 1.97 | 16.5 | 27.7 | 4.5 |
IPv6 Out | 5.08 | 42.7 | 29.1 | 3.1 |
The frog flow (ping) was not affected at all by the hog; differences (within a family) are not significant. IPv6 takes longer because the packets run through the tunnel up to Fremont (near San Francisco) and back to the partner in Los Angeles via native IPv6. On IPv4 the hog achieved 91% of SLA. On IPv6 the upload speed was almost as good, but the download speed was only about 1/3 SLA. Almost certainly the issue is that the Hurricane Electric free tunnel service is shared among quite a lot of users, and we're seeing a capacity limit, not a defect in my traffic control. It's notable that the frog flow was unaffected despite cloud clogging. Hurricane Electric must limit their buffer size and may have their own traffic control that identifies frogs and gives them priority.
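As a sanity check on the units, one row of the table (IPv4 In) can be reconverted:

```shell
# 5.42 MiB/sec -> Mbit/sec: multiply by 2^20 bytes and 8 bits, divide by 10^6.
MBIT=$(awk 'BEGIN { printf "%.1f", 5.42 * 1048576 * 8 / 1000000 }')
echo "5.42 MiB/sec = ${MBIT} Mbit/sec"
```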
Unfortunately in my distro (OpenSuSE Leap 42.1) I have iproute2-4.2, but the documentation has not caught up with the functions available in the code. (On Google, search for tc police man page.) In particular, in a filter with a police action you can specify precisely what is going to happen if a packet does or doesn't exceed the data rate limit. You specify this with the option conform-exceed EXCEEDACT[/NOTEXCEEDACT]. The conform action is optional, and the exceed action comes first, even though it appears second in the keyword. Remember that one filter could have several actions. I did some empirical tests to see what the outcome choices actually do; they were tested alone, not with multiple actions.
From the man page, the available actions are:
Classy means the queue discipline can have several classes with different priorities. All of these queue disciplines are classy. They need an inner queue discipline without classes.
The purpose of QoS in our use case is to send out VoIP packets promptly, delaying competing high volume traffic like Youtube videos and big downloads.
Their script uses:
$TC filter add dev ${DEV} parent 1: prio 1 protocol ip u32 \
    match ip tos 0x68 0xff \
    match ip protocol 0x11 0xff \
    flowid 1:1
This is attracting TOS 0x68 (on UDP, protocol 0x11), and there is another one for 0xb8. Minimum delay 0x10 is put in the next lower level.
Before deploying VoIP, assess whether QoS actually does something. Send synthetic VoIP with and without QoS turned on and see if it really helps. Determine your maximum expected load and generate this amount of data. Measure VoIP metrics (latency, jitter, packet loss) and save the result for later comparison. Monitor regularly after deployment. Compare with your baseline. Recognize developing problems and fix them right away.
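A sketch of such a synthetic test using iputils ping (the host is illustrative; -Q sets the ToS byte, the size and interval roughly mimic a G.711 stream, and sub-200ms intervals may require root):

```shell
# EF-marked synthetic VoIP-like stream: ~172-byte payloads every 20 ms.
ping -c 500 -i 0.02 -s 172 -Q 0xb8 voip.example.net
# The same stream unmarked, as a baseline for comparison.
ping -c 500 -i 0.02 -s 172 -Q 0x00 voip.example.net
# Compare the rtt min/avg/max/mdev lines of the two summaries.
```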
What the DSCP code points mean; this is for IANA (ICANN?) registration.
List of typically implemented PHBs (per hop behaviors). The code points given here are what you would OR into the ToS octet, but in documentation they are frequently given relative to the 6-bit DS field and thus have to be shifted left 2 bits for use in actual code. E.g. documentation specifies 0x2e; a traffic shaping rule needs 0xb8; the latter is shown below.