About
This article describes how to set up WHLE-LS1046A, using the standard upstream DPAA driver, as a router that gives strict priority to ssh packets. This makes the router's non-ssh workload nearly transparent to any ssh connections going through it.
The article aims to showcase the practical use of the DPAA hardware-offloaded Multiqueue Priority queueing discipline (mqprio qdisc) in conjunction with iptables, so the setup focuses only on congestion at a network interface. Controlling access to other resources, such as the CPU, which would also be required to make the router's workload truly transparent to ssh connections, is outside the scope.
Connection diagram
The setup is similar to the one used in Router/Bridge Mode: PC + WHLE Setup, where the whle_ls1046 board acts as a router between two links connected to the testing PC. The difference is that one of the links is 1 Gb/s instead of both being 10 Gb/s. This allows the physical link to be saturated with little load on WHLE's processing power, which simplifies the setup and eliminates other factors that could influence the outcome of the ssh throughput experiments.
The speed of the ssh connection will be measured between the enxc84d4423262e and ens1f0 interfaces on PC. isolated_ns denotes the network namespace in which the ens1f0 interface had to be enclosed to force PC to send the packets through whle_ls1046 instead of short-circuiting them to the local interface.
Three scenarios for the ssh connection will be considered:
- no traffic other than ssh,
- ssh connection over a link saturated with iperf3 traffic:
  - without using DPAA's priority queues,
  - with DPAA's priority queues.
Network setup
PC
root@PC:~# ip netns add isolated_ns
root@PC:~# ip link set ens1f0 netns isolated_ns
root@PC:~# ip netns exec isolated_ns ip addr flush ens1f0
root@PC:~# ip netns exec isolated_ns ip addr add 192.168.10.1/24 dev ens1f0
root@PC:~# ip netns exec isolated_ns ip link set dev ens1f0 up
root@PC:~# ip netns exec isolated_ns ip route delete 192.168.3.0/24
root@PC:~# ip netns exec isolated_ns ip route add 192.168.3.0/24 via 192.168.10.2
root@PC:~# ip addr flush enxc84d4423262e
root@PC:~# ip address add 192.168.3.1/24 dev enxc84d4423262e
root@PC:~# ip link set dev enxc84d4423262e up
root@PC:~# ip route delete 192.168.10.0/24
root@PC:~# ip route add 192.168.10.0/24 via 192.168.3.2
whle_ls1046a
root@whle-ls1046a:~# ip address flush eth1
root@whle-ls1046a:~# ip address flush eth5
root@whle-ls1046a:~# ip addr add 192.168.3.2/24 dev eth1
root@whle-ls1046a:~# ip addr add 192.168.10.2/24 dev eth5
root@whle-ls1046a:~# ip link set dev eth1 up
root@whle-ls1046a:~# ip link set dev eth5 up
root@whle-ls1046a:~# echo 1 > /proc/sys/net/ipv4/ip_forward
By default, the network interfaces on WHLE are controlled by the NetworkManager service, which periodically overwrites the effects of the ip commands above with its own configuration. It may be necessary to temporarily stop the service:
root@whle-ls1046a:~# systemctl stop NetworkManager
or to configure it to ignore the eth1 and eth5 interfaces with a configuration such as:
root@whle-ls1046a:~# echo '
[main]
plugins=ifupdown,keyfile

[keyfile]
unmanaged-devices=interface-name:eth1,interface-name:eth5
' > /etc/NetworkManager/NetworkManager.conf
root@whle-ls1046a:~# systemctl restart NetworkManager
Services setup
PC
root@PC:~# ip netns exec isolated_ns iperf3 --server --daemon
Keep in mind that starting the iperf3 server within the isolated network namespace isolated_ns makes it reachable only through the 192.168.10.1 address. Attempts to connect a client through a different address result in a cryptic "Bad file descriptor" error:
root@whle-ls1046a:~# iperf3 --client 192.168.3.1
iperf3: error - unable to send control message: Bad file descriptor
It is assumed that an ssh daemon is already running on PC.
Tests
Control case: scp transfer through an empty network
To measure the ssh throughput, the scp program will be used to copy a reasonably large file (~700 MB), assumed to be at /home/user/files/download.xz on PC. It will be sent to /home/user on the same machine.
PC
root@PC:~# time ip netns exec isolated_ns scp /home/user/files/download.xz user@192.168.3.1:
download.xz 100% 706MB 111.7MB/s 00:06

real 0m6,757s
user 0m3,617s
sys 0m1,786s
Root access is needed to execute the ip netns command. Transferring the whole file through the empty network takes around 7 seconds.
The direction of the transfer actually matters in this experiment. Queue prioritization in the DPAA architecture (or in any other mqprio-capable architecture, for that matter) applies only to egress traffic. Sending the local file /home/user/files/download.xz to the "remote" location 192.168.3.1 from the isolated namespace implies the following order of processing for the majority of the ssh traffic:
1. PC's CPU,
2. PC's ens1f0 interface (egress),
3. whle_ls1046's eth5 interface (ingress),
4. whle_ls1046's CPU,
5. whle_ls1046's eth1 interface (egress),
6. PC's enxc84d4423262e interface (ingress),
7. PC's CPU.
Given that a maximum throughput of 1 Gb/s for the whole connection leaves plenty of headroom on whle_ls1046's CPU (let alone PC's), and that the ens1f0 - eth5 link runs at 10 Gb/s, the 1 Gb/s eth1 - enxc84d4423262e link becomes the bottleneck. Packets congest at the eth1 funnel, which is exactly where DPAA prioritization can come into play. If the transfer went the other way, e.g. with
root@PC:~# time ip netns exec isolated_ns scp user@192.168.3.1:/home/user/files/download.xz .
the funnel would instead form at the egress of the testing machine's enxc84d4423262e interface, outside the reach of whle_ls1046's DPAA queues.
Test case: scp transfer on saturated link, no prioritization
Start the iperf3 flow to saturate the 1 Gb/s link.
PC
user@PC:~$ iperf3 --client 192.168.10.1 --time 0 --reverse
Connecting to host 192.168.10.1, port 5201
Reverse mode, remote host 192.168.10.1 is sending
[ 5] local 192.168.3.1 port 53244 connected to 192.168.10.1 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 112 MBytes 942 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 942 Mbits/sec
[ 5] 2.00-3.00 sec 112 MBytes 942 Mbits/sec
...
Once again, the direction of iperf3's flow is important: it must match the direction of scp's transfer, or there would be no contention between them to arbitrate. By default, iperf3 sends data from the client to the server. The --reverse flag reverses this, ensuring that the data traverses ens1f0 (egress) → eth5 (ingress) → eth1 (egress) → enxc84d4423262e (ingress).
Perform the scp transfer in another console.
PC
root@PC:~# time ip netns exec isolated_ns scp /home/user/files/download.xz user@192.168.3.1:
download.xz 100% 706MB 55.8MB/s 00:12

real 0m13,106s
user 0m6,229s
sys 0m2,324s
The time to transfer the file doubled. Meanwhile, in iperf3's log:
...
[ 5] 29.00-30.00 sec 112 MBytes 941 Mbits/sec
[ 5] 30.00-31.00 sec 112 MBytes 942 Mbits/sec
[ 5] 31.00-32.00 sec 112 MBytes 942 Mbits/sec
[ 5] 32.00-33.00 sec 71.8 MBytes 602 Mbits/sec <-- scp transfer start
[ 5] 33.00-34.00 sec 56.9 MBytes 477 Mbits/sec
[ 5] 34.00-35.00 sec 57.6 MBytes 483 Mbits/sec
[ 5] 35.00-36.00 sec 57.6 MBytes 483 Mbits/sec
[ 5] 36.00-37.00 sec 57.6 MBytes 483 Mbits/sec
[ 5] 37.00-38.00 sec 57.2 MBytes 480 Mbits/sec
[ 5] 38.00-39.00 sec 55.7 MBytes 468 Mbits/sec
[ 5] 39.00-40.00 sec 55.2 MBytes 463 Mbits/sec
[ 5] 40.00-41.00 sec 55.2 MBytes 463 Mbits/sec
[ 5] 41.00-42.00 sec 55.2 MBytes 463 Mbits/sec
[ 5] 42.00-43.00 sec 55.2 MBytes 463 Mbits/sec
[ 5] 43.00-44.00 sec 55.2 MBytes 463 Mbits/sec
[ 5] 44.00-45.00 sec 58.7 MBytes 493 Mbits/sec <-- scp transfer finish
[ 5] 45.00-46.00 sec 112 MBytes 942 Mbits/sec
[ 5] 46.00-47.00 sec 112 MBytes 941 Mbits/sec
[ 5] 47.00-48.00 sec 112 MBytes 942 Mbits/sec
[ 5] 48.00-49.00 sec 112 MBytes 942 Mbits/sec
[ 5] 49.00-50.00 sec 112 MBytes 941 Mbits/sec
...
This shows that, with the default queueing discipline, the 1 Gb/s link is shared roughly evenly between iperf3 and scp, which is the expected behavior when neither flow has a higher priority than the other.
Test case: scp transfer on saturated link, with prioritization
Setting iptables
Configure iptables to assign the highest skb priority to ssh packets.
whle_ls1046a
root@whle-ls1046a:~# iptables -t mangle -F
root@whle-ls1046a:~# iptables -t mangle -A POSTROUTING -p tcp --dport 22 -j CLASSIFY --set-class 0:f
root@whle-ls1046a:~# iptables -t mangle -A POSTROUTING -p tcp --sport 22 -j CLASSIFY --set-class 0:f
This should result in the following table:
root@whle-ls1046a:~# iptables -t mangle --list -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 CLASSIFY tcp -- any any anywhere anywhere tcp dpt:ssh CLASSIFY set 0:f
0 0 CLASSIFY tcp -- any any anywhere anywhere tcp spt:ssh CLASSIFY set 0:f
The configuration makes use of the mangle table, which is designed for packet modification. While the packets themselves aren't modified in this scenario, the socket buffer structure (skb) the kernel uses to represent them is: namely, its priority field.
The first command simply flushes the mangle table to make sure no other rules apply. The second command assigns priority 15 to any TCP packet whose destination port is 22. The third command does the same for packets whose source port is 22. Together they cover both directions of all ssh connections on the standard port.
The actual priority assignment is done, indirectly, by the --set-class 0:f fragment. From the iptables-extensions manual:
CLASSIFY
This module allows you to set the skb->priority value (and thus classify the packet into a specific CBQ class).
--set-class major:minor
Set the major and minor class value. The values are always interpreted as hexadecimal even if no 0x prefix is given.
Unfortunately, the documentation doesn't state the actual correspondence between the major:minor class specification and the resulting skb->priority value. It can be found in the iptables source (iptables-1.8.7/extensions/libxt_CLASSIFY.c):
static int CLASSIFY_string_to_priority(const char *s, unsigned int *p)
{
	unsigned int i, j;

	if (sscanf(s, "%x:%x", &i, &j) != 2)
		return 1;

	*p = TC_H_MAKE(i<<16, j);
	return 0;
}
and in the kernel's source (include/uapi/linux/pkt_sched.h):
#define TC_H_MAJ_MASK (0xFFFF0000U)
#define TC_H_MIN_MASK (0x0000FFFFU)
...
#define TC_H_MAKE(maj,min) (((maj)&TC_H_MAJ_MASK)|((min)&TC_H_MIN_MASK))
From this it can be concluded that, as long as major is 0 and minor < 0x10000, skb->priority is simply the value of minor. To use priority 10 instead, for example, one would have to use --set-class 0:a. Values of skb->priority higher than 0xF aren't recognized by the mqprio qdisc anyway, since its map covers only priorities 0 to 15.
Using the POSTROUTING chain means that the priority is assigned right before the packet is handed to the network interface. It isn't strictly required to do it at the last moment; the FORWARD chain could be used as well. The OUTPUT chain, however, applies only to packets generated by whle_ls1046 itself, so routed packets would remain unaffected, while the iptables command doesn't even accept the CLASSIFY target in the PREROUTING and INPUT chains.
Setting tc
Set up the queueing discipline for the eth1 interface.
whle_ls1046a
root@whle-ls1046a:~# tc qdisc del dev eth1 root handle 1:
root@whle-ls1046a:~# tc qdisc add dev eth1 \
    root handle 1: mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
The first command deletes any qdisc that may already have been assigned to eth1. It may return an error when there is none; that is not a problem.
The second command sets up 1024 DPAA queues in 4 different classes, each with a different DPAA priority (called so to distinguish it from the skb priority that iptables is concerned with).
Below is a detailed description of the command's fragments and the setup they result in.
mqprio
Use the "Multiqueue Priority Qdisc (Offloaded Hardware QOS)" (see man tc-mqprio).
num_tc 4
Use 4 queue classes, identified by the numbers 0, 1, 2 and 3. A queue's class (the Packet Queue's Traffic Class in DPAA's nomenclature) maps directly to the Work Queue in eth1's Direct Connected Channel on which the queue is placed whenever it holds at least one packet (see Upstream DPAA driver and Channels):
Traffic Class | Work Queue |
---|---|
0 | WQ6 |
1 | WQ2 |
2 | WQ1 |
3 | WQ0 |
DPAA arranges Work Queues into 4 groups, ordered by their increasing DPAA priority:
Work Queue Group | DPAA Priority Name | DPAA Priority Num |
---|---|---|
WQ5, WQ6, WQ7 | Low | 1 |
WQ2, WQ3, WQ4 | Medium | 2 |
WQ1 | High | 3 |
WQ0 | Highest | 4 |
The groups are governed by a strict priority rule: the group with priority number n must be emptied of all packets before any packet from a group with a lower number k < n can be serviced (k, n in {1, 2, 3, 4}). Because WQ0, WQ1, WQ2 and WQ6, which correspond to the four traffic classes, all belong to different groups, the strict priority rule effectively applies to traffic classes 0, 1, 2 and 3 as well, with class 3 having the highest priority.
Each class has exactly 256 queues, resulting in a total of 1024 queues in this case. This number cannot be changed with tc's queues argument: the DPAA driver silently ignores it, including the provided offsets. No more than 4 classes can be used. When fewer than 4 classes are used, the queues are trimmed from the higher-priority end. For example, num_tc 3 would result in 768 queues (3 * 256) belonging to traffic classes 0, 1 and 2, using work queues WQ6, WQ2 and WQ1.
map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
This argument maps skb priorities 0 .. 15 to traffic classes 0 .. 3: the n-th value in the sequence (counting from 0) is the traffic class assigned to skb priority n.
skb->priority | Traffic Class |
---|---|
0, 1, 2, 3 | 0 |
4, 5, 6, 7 | 1 |
8, 9, 10, 11 | 2 |
12, 13, 14, 15 | 3 |
hw 1
Tells tc to actually use the hardware offloading implemented by the DPAA architecture instead of emulating this queueing discipline in the kernel.
The created queueing discipline can be displayed with:
root@whle-ls1046a:~# tc qdisc show dev eth1
qdisc mqprio 1: root tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
    queues:(0:255) (256:511) (512:767) (768:1023)
    mode:dcb
    shaper:dcb
qdisc pfifo_fast 0: parent 1:400 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:3ff bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:3fe bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:3fd bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:3fc bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
...
qdisc pfifo_fast 0: parent 1:3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
The qdisc pfifo_fast 0: parent ... entries are sub-qdiscs which correspond directly to the DPAA frame queues listed under queues.
The full relation between the entities described so far is as follows:
skb->priority | Traffic Class | Work Queue | DPAA Priority Name | DPAA Priority Num | DPAA Frame Queue | qdisc pfifo_fast |
---|---|---|---|---|---|---|
0, 1, 2, 3 | 0 | WQ6 | Low | 1 | (0:255) | 1:1 … 1:100 |
4, 5, 6, 7 | 1 | WQ2 | Medium | 2 | (256:511) | 1:101 … 1:200 |
8, 9, 10, 11 | 2 | WQ1 | High | 3 | (512:767) | 1:201 … 1:300 |
12, 13, 14, 15 | 3 | WQ0 | Highest | 4 | (768:1023) | 1:301 … 1:400 |
Keep in mind that the "DPAA Frame Queue" indexes displayed in tc's output are not the same as the Tx Frame Queue IDs, which for eth1 are listed as Tx: 3337 - 4360 by:
root@whle-ls1046a:~# cat /sys/class/net/eth1/fqids
Rx error: 2181
Rx default: 2182
Rx PCD: 2304 - 2431
Tx confirmation (mq): 2183 - 2303
Tx confirmation (mq): 2432 - 3334
Tx error: 3335
Tx default confirmation: 3336
Tx: 3337 - 4360
The Frame Queue IDs are low-level DPAA identifiers which must be globally unique across all network interfaces. The (0:255) (256:511) (512:767) (768:1023) ids are tc-specific and describe only the queues assigned to the interface provided in the argument, in this case eth1.
Although it isn't enforced by the configuration, it can be established empirically that iperf3's packets fall into classes 0 and 1. Assuming that the iptables configuration properly assigns ssh packets skb priority 15 before they are handed to eth1 for transmission, they should all fall into traffic class 3 and be enqueued on the highest-priority Work Queue WQ0, to be serviced before any iperf3 packets. This should stop iperf3's traffic completely for the duration of scp's transfer.
Performing the test
Start the iperf3 flow to saturate the link.
PC
user@PC:~$ iperf3 --client 192.168.10.1 --time 0 --reverse
Connecting to host 192.168.10.1, port 5201
Reverse mode, remote host 192.168.10.1 is sending
[ 5] local 192.168.3.1 port 41978 connected to 192.168.10.1 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 112 MBytes 941 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 942 Mbits/sec
[ 5] 2.00-3.00 sec 112 MBytes 941 Mbits/sec
...
Perform the scp transfer in another console.
PC
root@PC:~# time ip netns exec isolated_ns scp /home/user/files/download.xz user@192.168.3.1:
download.xz 100% 706MB 111.7MB/s 00:06

real 0m6,773s
user 0m3,766s
sys 0m1,534s
The file transfer time is practically the same as when no other data was transferred over the link. Meanwhile, in iperf3's log:
...
[ 5] 17.00-18.00 sec 112 MBytes 941 Mbits/sec
[ 5] 18.00-19.00 sec 112 MBytes 942 Mbits/sec
[ 5] 19.00-20.00 sec 112 MBytes 942 Mbits/sec
[ 5] 20.00-21.00 sec 112 MBytes 941 Mbits/sec
[ 5] 21.00-22.00 sec 70.2 MBytes 589 Mbits/sec <-- scp transfer start
[ 5] 22.00-23.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 23.00-24.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 24.00-25.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 25.00-26.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 26.00-27.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 27.00-28.00 sec 5.87 MBytes 49.3 Mbits/sec <-- scp transfer finish
[ 5] 28.00-29.00 sec 112 MBytes 942 Mbits/sec
[ 5] 29.00-30.00 sec 112 MBytes 942 Mbits/sec
[ 5] 30.00-31.00 sec 112 MBytes 942 Mbits/sec
[ 5] 31.00-32.00 sec 112 MBytes 942 Mbits/sec
...
The iperf3 flow ceased completely during the scp transfer, showcasing the strict priority rule in action.