About

This article describes how to set up the WHLE-LS1046A, using the standard upstream DPAA driver, as a router that gives strict priority to ssh packets, making the router's non-ssh workload nearly transparent to ssh connections passing through it.

The article aims to showcase the practical use of the DPAA hardware-offloaded Multiqueue Priority Discipline (mqprio qdisc) in conjunction with iptables, so the setup focuses only on congestion of the network interface. Controlling access to other resources, such as the CPU, which would be required to make the router's workload truly transparent to ssh connections, is outside the scope.

Connection diagram

The setup is similar to the one used in Router/Bridge Mode: PC + WHLE Setup: the whle_ls1046 board acts as a router between two links connected to the test PC. The difference is that one of the links is 1 Gb/s instead of both being 10 Gb/s. This makes it possible to saturate the physical link with little load on WHLE’s processing power, thus simplifying the setup and eliminating other factors that could influence the outcome of the ssh throughput measurements.

The speed of ssh connection will be measured between enxc84d4423262e and ens1f0 interfaces on PC. The isolated_ns denotes network namespace in which the ens1f0 interface had to be enclosed to force PC to send the packets through whle_ls1046 instead of short-circuiting to the local interface.

Three scenarios for ssh connection will be considered:

  • no other traffic than ssh,

  • ssh connection over a link saturated with iperf3 traffic:

    • without using DPAA’s priority queues,

    • with the usage of DPAA’s priority queues.

Network setup

PC
root@PC~# ip netns add isolated_ns
root@PC~# ip link set ens1f0 netns isolated_ns
root@PC~# ip netns exec isolated_ns ip addr flush ens1f0
root@PC~# ip netns exec isolated_ns ip addr add 192.168.10.1/24 dev ens1f0
root@PC~# ip netns exec isolated_ns ip link set dev ens1f0 up
root@PC~# ip netns exec isolated_ns ip route delete 192.168.3.0/24
root@PC~# ip netns exec isolated_ns ip route add 192.168.3.0/24 via 192.168.10.2

root@PC~# ip addr flush enxc84d4423262e
root@PC~# ip address add 192.168.3.1/24 dev enxc84d4423262e
root@PC~# ip link set dev enxc84d4423262e up
root@PC~# ip route delete 192.168.10.0/24
root@PC~# ip route add 192.168.10.0/24 via 192.168.3.2
whle_ls1046a
root@whle-ls1046a:~# ip address flush eth1
root@whle-ls1046a:~# ip address flush eth5
root@whle-ls1046a:~# ip addr add 192.168.3.2/24 dev eth1
root@whle-ls1046a:~# ip addr add 192.168.10.2/24 dev eth5
root@whle-ls1046a:~# ip link set dev eth1 up
root@whle-ls1046a:~# ip link set dev eth5 up
root@whle-ls1046a:~# echo 1 > /proc/sys/net/ipv4/ip_forward

By default the network interfaces on WHLE are controlled by the NetworkManager service, which periodically overwrites the effects of the ip commands above with its own configuration. It may be necessary to temporarily stop the service:

root@whle-ls1046a:~# systemctl stop NetworkManager

or to configure it to ignore the eth1 and eth5 interfaces with a configuration like:

root@whle-ls1046a:~# echo '
[main]
plugins=ifupdown,keyfile

[keyfile]
unmanaged-devices=interface-name:eth1,interface-name:eth5
' > /etc/NetworkManager/NetworkManager.conf
root@whle-ls1046a:~# systemctl restart NetworkManager
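
The command above replaces the whole NetworkManager.conf file. As a less intrusive alternative, the following sketch writes only the [keyfile] fragment to a separate drop-in file. It assumes that the installed NetworkManager reads drop-in files from /etc/NetworkManager/conf.d; the function name write_nm_unmanaged and the file name 90-unmanaged.conf are hypothetical.

```shell
# Hypothetical helper: write a NetworkManager drop-in that marks the given
# interfaces as unmanaged. The first argument is the drop-in directory,
# the remaining arguments are interface names.
write_nm_unmanaged() {
    conf_dir=$1; shift
    specs=
    for ifname in "$@"; do
        # Join the device specs with commas, as in the configuration above.
        specs="${specs:+$specs,}interface-name:$ifname"
    done
    printf '[keyfile]\nunmanaged-devices=%s\n' "$specs" \
        > "$conf_dir/90-unmanaged.conf"
}
```

On the board this could be called as write_nm_unmanaged /etc/NetworkManager/conf.d eth1 eth5, followed by the same systemctl restart NetworkManager as above.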

Services setup

PC
root@PC:~# ip netns exec isolated_ns iperf3 --server --daemon

Keep in mind that starting the iperf3 server within the isolated network namespace isolated_ns makes it reachable only through the 192.168.10.1 address. Attempts to connect the client through a different address will result in a cryptic Bad file descriptor error.

root@whle-ls1046a:~# iperf3 --client 192.168.3.1
iperf3: error - unable to send control message: Bad file descriptor

It’s assumed that an ssh daemon is already running on the PC.

Tests

Control case: scp transfer through empty network

To measure the ssh throughput, the scp program will be used to copy a reasonably large file (~700 MB), assumed to be at /home/user/files/download.xz on the PC. It will be sent to /home/user on the same machine.

PC
root@PC:~# time ip netns exec isolated_ns scp /home/user/files/download.xz user@192.168.3.1:
download.xz                                                           100%  706MB 111.7MB/s   00:06    
real	0m6,757s
user	0m3,617s
sys	    0m1,786s

Root access is needed to execute the ip netns command. Transferring the whole file through the empty network takes around 7 seconds.

The direction of the transfer matters in this experiment. Queue prioritization in the DPAA architecture (or any other mqprio implementation, for that matter) applies only to egress traffic. Sending the local file /home/user/files/download.xz from the isolated namespace to the “remote” location 192.168.3.1 implies the following order of processing for the majority of ssh traffic:

  1. PC’s CPU,

  2. PC’s ens1f0 interface (egress),

  3. whle_ls1046’s eth5 interface (ingress),

  4. whle_ls1046’s CPU,

  5. whle_ls1046’s eth1 interface (egress),

  6. PC’s enxc84d4423262e interface (ingress),

  7. PC’s CPU.

Given that the 1 Gb/s maximum throughput of the whole connection leaves plenty of headroom on whle_ls1046’s CPU (let alone the PC’s) and that the ens1f0 - eth5 link is 10 Gb/s, the 1 Gb/s enxc84d4423262e - eth1 link becomes the bottleneck, with packets congesting at the eth1 funnel where the DPAA prioritization can come into play. If the transfer went the other way, e.g. with

root@PC:~# time ip netns exec isolated_ns scp user@192.168.3.1:/home/user/files/download.xz .

the funnel would form at the testing machine’s enxc84d4423262e interface.

Test case: scp transfer on saturated link, no prioritization

Start the iperf3 flow to saturate the 1 Gb/s link.

PC
user@PC:~$ iperf3 --client 192.168.10.1 --time 0 --reverse
Connecting to host 192.168.10.1, port 5201
Reverse mode, remote host 192.168.10.1 is sending
[  5] local 192.168.3.1 port 53244 connected to 192.168.10.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec                  
...

Once again the direction of iperf3’s flow is important: it must match the direction of scp’s transfer, otherwise there would be no contention between them to arbitrate. By default iperf3 sends data from client to server. The --reverse flag reverses this, ensuring that the data traverses ens1f0 (egress) → eth5 (ingress) → eth1 (egress) → enxc84d4423262e (ingress).

Perform the scp transfer in another console.

PC
root@PC:~# time ip netns exec isolated_ns scp /home/user/files/download.xz user@192.168.3.1:
download.xz                                                           100%  706MB  55.8MB/s   00:12    
real	0m13,106s
user	0m6,229s
sys	    0m2,324s

The time to transfer the file doubled. Meanwhile in iperf3’s logs:

...
[  5]  29.00-30.00  sec   112 MBytes   941 Mbits/sec                  
[  5]  30.00-31.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  31.00-32.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  32.00-33.00  sec  71.8 MBytes   602 Mbits/sec                  <-- scp transfer start
[  5]  33.00-34.00  sec  56.9 MBytes   477 Mbits/sec                  
[  5]  34.00-35.00  sec  57.6 MBytes   483 Mbits/sec                  
[  5]  35.00-36.00  sec  57.6 MBytes   483 Mbits/sec                  
[  5]  36.00-37.00  sec  57.6 MBytes   483 Mbits/sec                  
[  5]  37.00-38.00  sec  57.2 MBytes   480 Mbits/sec                  
[  5]  38.00-39.00  sec  55.7 MBytes   468 Mbits/sec                  
[  5]  39.00-40.00  sec  55.2 MBytes   463 Mbits/sec                  
[  5]  40.00-41.00  sec  55.2 MBytes   463 Mbits/sec                  
[  5]  41.00-42.00  sec  55.2 MBytes   463 Mbits/sec                  
[  5]  42.00-43.00  sec  55.2 MBytes   463 Mbits/sec                  
[  5]  43.00-44.00  sec  55.2 MBytes   463 Mbits/sec                  
[  5]  44.00-45.00  sec  58.7 MBytes   493 Mbits/sec                  <-- scp transfer finish
[  5]  45.00-46.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  46.00-47.00  sec   112 MBytes   941 Mbits/sec                  
[  5]  47.00-48.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  48.00-49.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  49.00-50.00  sec   112 MBytes   941 Mbits/sec                  
...

This shows that with the default queuing discipline the 1 Gb/s link is shared evenly between iperf3 and scp, an expected behavior where neither flow has higher priority than the other.
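
The even split can be cross-checked by converting scp’s reported rate to the unit used by iperf3. The sketch below assumes that scp counts 1 MB as 1048576 bytes while iperf3 counts 1 Mbit as 1000000 bits; neither unit is stated in this article, and the function name scp_to_mbit is hypothetical.

```shell
# Hypothetical helper: convert an scp rate in MB/s to Mbit/s,
# ASSUMING 1 MB = 1048576 bytes and 1 Mbit = 1000000 bits.
scp_to_mbit() {
    LC_ALL=C awk -v mbs="$1" \
        'BEGIN { printf "%.0f\n", mbs * 1048576 * 8 / 1000000 }'
}

scp_to_mbit 111.7   # control case, prints 937
scp_to_mbit 55.8    # saturated link, prints 468
```

Under these assumptions the scp flow takes about 468 Mbit/s next to iperf3’s 463 to 483 Mbit/s, which together roughly add up to the 942 Mbit/s that iperf3 reaches alone.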

Test case: scp transfer on saturated link, with prioritization

Setting iptables

Configure iptables to assign the highest skb priority to ssh packets.

whle_ls1046a
root@whle-ls1046a:~# iptables -t mangle -F 
root@whle-ls1046a:~# iptables -t mangle -A POSTROUTING -p tcp --dport 22 -j CLASSIFY --set-class 0:f
root@whle-ls1046a:~# iptables -t mangle -A POSTROUTING -p tcp --sport 22 -j CLASSIFY --set-class 0:f

This should result in the following table:

root@whle-ls1046a:~# iptables -t mangle --list -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source     destination    

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source     destination    

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source     destination    

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source     destination    

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source     destination    
    0     0 CLASSIFY   tcp  --  any    any     anywhere   anywhere       tcp dpt:ssh CLASSIFY set 0:f
    0     0 CLASSIFY   tcp  --  any    any     anywhere   anywhere       tcp spt:ssh CLASSIFY set 0:f

The configuration uses the mangle table, which is designed for packet modification. While the packets themselves aren’t modified in this scenario, the socket buffer structures the kernel uses to represent them are, namely their priority field.

The first command simply flushes the mangle table’s configuration to make sure no other rules apply. The second command assigns priority 15 to any TCP packet whose destination port is 22. The third command does the same for the source port. This effectively covers all standard ssh connections.

The actual priority assignment is done, indirectly, by the --set-class 0:f fragment. From iptables-extensions manual:

CLASSIFY
This module allows you to set the skb->priority value (and thus classify the packet into a specific CBQ class).

--set-class major:minor
Set the major and minor class value. The values are always interpreted as hexadecimal even if no 0x prefix is given.

Unfortunately the documentation doesn’t state how the major:minor class specification translates to the skb->priority value. This can be found in the iptables source (iptables-1.8.7/extensions/libxt_CLASSIFY.c):

static int CLASSIFY_string_to_priority(const char *s, unsigned int *p)
{
	unsigned int i, j;
	if (sscanf(s, "%x:%x", &i, &j) != 2)
		return 1;
	*p = TC_H_MAKE(i<<16, j);
	return 0;
}

and kernel’s source (include/uapi/linux/pkt_sched.h):

#define TC_H_MAJ_MASK (0xFFFF0000U)
#define TC_H_MIN_MASK (0x0000FFFFU)
...
#define TC_H_MAKE(maj,min) (((maj)&TC_H_MAJ_MASK)|((min)&TC_H_MIN_MASK))

From this it can be concluded that as long as major is 0 and minor < 0x10000, skb->priority is simply the value of minor. To use a different priority, for example 10, one would use --set-class 0:a. Values of skb->priority higher than 0xF aren’t recognized by the mqprio qdisc anyway.
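
The conversion can be reproduced with shell arithmetic. This is a minimal sketch that mirrors the two source fragments above; the function name classify_to_priority is hypothetical and not part of iptables.

```shell
# Hypothetical helper mirroring CLASSIFY_string_to_priority() and TC_H_MAKE():
# both halves of "major:minor" are parsed as hexadecimal numbers.
classify_to_priority() {
    maj=${1%%:*}   # text before the colon
    min=${1#*:}    # text after the colon
    echo $(( ((0x$maj << 16) & 0xFFFF0000) | (0x$min & 0x0000FFFF) ))
}

classify_to_priority 0:f    # prints 15
classify_to_priority 0:a    # prints 10
classify_to_priority 0:10   # prints 16, hexadecimal even without a 0x prefix
```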

Using the POSTROUTING chain means that the prioritization occurs right before the packet is sent to the network interface. It’s not strictly required to do it at the last moment, and the FORWARD chain could be used as well. The OUTPUT chain, however, applies only to packets generated by whle_ls1046 itself, so the routed packets would remain unaffected, while the PREROUTING and INPUT chains aren’t even accepted by the iptables command together with the CLASSIFY target.
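
To try a different chain, port or class without retyping both rules, the commands can be wrapped in a small function. This is only a sketch: the function name classify_port and the IPTABLES override variable are hypothetical conveniences, while the generated rules have the same form as the ones used in this article.

```shell
# Hypothetical wrapper. IPTABLES can be overridden, e.g. with
# IPTABLES="echo iptables", to print the commands instead of running them.
IPTABLES=${IPTABLES:-iptables}

# Usage: classify_port CHAIN PORT CLASS
classify_port() {
    chain=$1 port=$2 class=$3
    # One rule for each direction of the connection.
    for opt in --dport --sport; do
        # $IPTABLES is intentionally unquoted so that an override such as
        # "echo iptables" is split into a command and its first argument.
        $IPTABLES -t mangle -A "$chain" -p tcp "$opt" "$port" \
            -j CLASSIFY --set-class "$class" || return 1
    done
}
```

For example, classify_port FORWARD 22 0:f would add the same two ssh rules to the FORWARD chain instead of POSTROUTING.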

Setting tc

Set up the queueing discipline for the eth1 interface.

whle_ls1046a
root@whle-ls1046a:~# tc qdisc del dev eth1 root handle 1:
root@whle-ls1046a:~# tc qdisc add dev eth1 \
    root handle 1: mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1

The first command deletes any qdisc that may already be assigned to eth1. It may return an error when there is none; that’s not a problem.

The second command creates 1024 DPAA queues in 4 different classes, each having a different DPAA priority (so named to distinguish it from the skb priority that iptables is concerned with).
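
When the setup has to be repeated, the two commands can be combined into a function that tolerates the harmless deletion error. This is a sketch only: the function name setup_mqprio and the TC override variable are hypothetical, and the deletion is written without the handle argument.

```shell
# Hypothetical wrapper. TC can be overridden, e.g. with TC="echo tc",
# to print the commands instead of running them.
TC=${TC:-tc}

# Usage: setup_mqprio DEVICE
setup_mqprio() {
    dev=$1
    # Deleting fails when no root qdisc has been set yet; ignore that case.
    $TC qdisc del dev "$dev" root 2>/dev/null || true
    $TC qdisc add dev "$dev" root handle 1: mqprio num_tc 4 \
        map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
}
```

On the board this would be run as root with setup_mqprio eth1.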

Below is the detailed description of the command’s fragments and what setup they result in.

mqprio

Use the “Multiqueue Priority Qdisc (Offloaded Hardware QOS)” described in the tc-mqprio man page.

num_tc 4
Use 4 queue classes, identified by the numbers 0, 1, 2, 3. A queue’s class (the Packet Queue’s Traffic Class in DPAA’s nomenclature) maps directly to the Work Queue in eth1’s Direct Connected Channel on which the queue is placed whenever it holds at least one packet (see Upstream DPAA driver and Channels):

Traffic Class    Work Queue
0                WQ6
1                WQ2
2                WQ1
3                WQ0

DPAA arranges Work Queues into 4 groups, ordered by their increasing DPAA priority:

Work Queue Group    DPAA Priority Name    DPAA Priority Num
WQ5, WQ6, WQ7       Low                   1
WQ2, WQ3, WQ4       Medium                2
WQ1                 High                  3
WQ0                 Highest               4

They are governed by a strict priority rule: a group with priority number n must be emptied of all packets before any packet from the group with number k lower than n can be serviced (k, n in {1,2,3,4}). Because WQ0, WQ1, WQ2, WQ6 corresponding to different traffic classes all belong to different groups, the strict priority rule effectively applies to the traffic classes 0, 1, 2, 3 as well, with 3 having the highest priority.

Each class has exactly 256 queues, resulting in a total of 1024 queues in this case. This number cannot be changed with tc’s queues argument: DPAA’s driver silently ignores it, including the provided offsets. No more than 4 classes can be used. When fewer than 4 classes are used, the queues are trimmed from the higher-priority end. For example, using num_tc 3 would result in 768 queues (3 * 256) belonging to traffic classes 0, 1, 2, using work queues WQ6, WQ2, WQ1.
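
The resulting layout for a given num_tc can be tabulated with a few lines of shell. This sketch only restates the rules above (256 queues per class, class 0 on WQ6 up to class 3 on WQ0); it assumes the queue ranges stay contiguous for fewer than 4 classes, and the function name mqprio_layout is hypothetical.

```shell
# Hypothetical helper: print the class layout described in this article
# for a given num_tc (1..4).
mqprio_layout() {
    num_tc=$1
    [ "$num_tc" -ge 1 ] && [ "$num_tc" -le 4 ] || {
        echo "num_tc must be 1..4" >&2
        return 1
    }
    tc_id=0
    # Work queues in traffic class order 0, 1, 2, 3.
    for wq in WQ6 WQ2 WQ1 WQ0; do
        [ "$tc_id" -lt "$num_tc" ] || break
        first=$(( tc_id * 256 ))
        echo "tc $tc_id: $wq queues ($first:$(( first + 255 )))"
        tc_id=$(( tc_id + 1 ))
    done
    echo "total: $(( num_tc * 256 ))"
}
```

Calling mqprio_layout 3 lists classes 0, 1, 2 on WQ6, WQ2, WQ1 and ends with the line total: 768.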

map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3

This argument maps skb priorities 0 .. 15 to traffic classes 0 .. 3: the position of a value in the sequence is the skb priority, and the value itself is the traffic class assigned to that priority.

skb->priority     Traffic Class
0, 1, 2, 3        0
4, 5, 6, 7        1
8, 9, 10, 11      2
12, 13, 14, 15    3
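
The lookup performed by the map can be sketched in shell as follows. The variable MQPRIO_MAP and the function prio_to_tc are hypothetical names used only for illustration; the map itself is the one passed to tc above.

```shell
# Position in the list = skb priority, value at that position = traffic class.
MQPRIO_MAP="0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3"

# Hypothetical helper: print the traffic class for an skb priority 0..15.
prio_to_tc() {
    prio=$1
    [ "$prio" -ge 0 ] && [ "$prio" -le 15 ] || return 1
    # Intentionally unquoted: split the map into positional parameters.
    set -- $MQPRIO_MAP
    shift "$prio"      # drop the entries that belong to lower priorities
    echo "$1"
}

prio_to_tc 15   # prints 3, the class used for ssh packets
prio_to_tc 0    # prints 0
```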

hw 1

Tells tc to actually use the hardware offloading implemented by the DPAA architecture instead of emulating this queueing discipline in the kernel.

The created queueing discipline can be displayed with:

root@whle-ls1046a:~# tc qdisc show dev eth1
qdisc mqprio 1: root tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 
             queues:(0:255) (256:511) (512:767) (768:1023) 
             mode:dcb
             shaper:dcb
qdisc pfifo_fast 0: parent 1:400 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:3ff bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:3fe bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:3fd bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:3fc bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
...
qdisc pfifo_fast 0: parent 1:3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

The qdisc pfifo_fast 0: parent ... entries are sub-qdiscs that correspond directly to the DPAA frame queues listed under queues.

The full relation between the entities described so far is as follows:

skb->priority     Traffic Class    Work Queue    DPAA Priority Name    DPAA Priority Num    DPAA Frame Queue    qdisc pfifo_fast
0, 1, 2, 3        0                WQ6           Low                   1                    (0:255)             1:1 … 1:100
4, 5, 6, 7        1                WQ2           Medium                2                    (256:511)           1:101 … 1:200
8, 9, 10, 11      2                WQ1           High                  3                    (512:767)           1:201 … 1:300
12, 13, 14, 15    3                WQ0           Highest               4                    (768:1023)          1:301 … 1:400
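
Note that the pfifo_fast handles are hexadecimal while the frame queue indexes are decimal, so 1:400 is the 1024th sub-qdisc. The following sketch derives the handle range of a traffic class from the 256 queues per class; the function name tc_handles is hypothetical.

```shell
# Hypothetical helper: print the first and last pfifo_fast handle of a
# traffic class, given 256 queues per class and handles counted from 1.
tc_handles() {
    first=$(( $1 * 256 + 1 ))
    last=$(( ($1 + 1) * 256 ))
    # %x prints the minor number in hexadecimal, as tc does.
    printf '1:%x ... 1:%x\n' "$first" "$last"
}

tc_handles 0   # prints 1:1 ... 1:100
tc_handles 3   # prints 1:301 ... 1:400
```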

Keep in mind that the “DPAA Frame Queue” indexes displayed in tc’s output are not the same as the Tx Frame Queue IDs, which for eth1 are reported as Tx: 3337 - 4360 by

root@whle-ls1046a:~# cat /sys/class/net/eth1/fqids
Rx error: 2181
Rx default: 2182
Rx PCD: 2304 - 2431
Tx confirmation (mq): 2183 - 2303
Tx confirmation (mq): 2432 - 3334
Tx error: 3335
Tx default confirmation: 3336
Tx: 3337 - 4360

The Frame Queue IDs are low-level DPAA identifiers which must be globally unique across all network interfaces. The (0:255) (256:511) (512:767) (768:1023) ids are tc-specific and describe only the queues assigned to the interface provided in the argument, in this case eth1.
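
Although the numbering differs, the size of the Tx range can be compared with the number of tc queues. The sketch below assumes the fqids format shown above and reads it from standard input; the function name tx_fq_count is hypothetical.

```shell
# Hypothetical helper: print the number of frame queues in the "Tx:" range
# of an fqids listing read from standard input.
tx_fq_count() {
    # Matches only the line starting with "Tx: ", e.g. "Tx: 3337 - 4360";
    # fields are: "Tx:", first ID, "-", last ID.
    awk '/^Tx: / { print $4 - $2 + 1 }'
}
```

With the listing above, tx_fq_count < /sys/class/net/eth1/fqids yields 1024, which matches the 4 * 256 queues created by mqprio.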

Although it’s not enforced by the configuration, it can be established empirically that packets from iperf3’s traffic fall into classes 0 and 1. Assuming that the iptables configuration properly assigns skb priority 15 to ssh packets before they are sent to eth1, they should all fall into traffic class 3 and be enqueued on the highest-priority Work Queue WQ0, to be serviced before all iperf3 packets. This should result in iperf3’s traffic being stopped completely for the duration of scp’s transfer.

Performing the test

Start the iperf3 flow to saturate the link.

PC
user@PC:~$ iperf3 --client 192.168.10.1 --time 0 --reverse
Connecting to host 192.168.10.1, port 5201
Reverse mode, remote host 192.168.10.1 is sending
[  5] local 192.168.3.1 port 41978 connected to 192.168.10.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec                  
...

Perform the scp transfer in another console.

PC
root@PC:~# time ip netns exec isolated_ns scp /home/user/files/download.xz user@192.168.3.1:
download.xz                                                           100%  706MB 111.7MB/s   00:06    
real	0m6,773s
user	0m3,766s
sys	0m1,534s

The file transfer time is basically the same as if there was no other data transferred on the link. Meanwhile in iperf3’s logs:

...
[  5]  17.00-18.00  sec   112 MBytes   941 Mbits/sec                  
[  5]  18.00-19.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  19.00-20.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  20.00-21.00  sec   112 MBytes   941 Mbits/sec                  
[  5]  21.00-22.00  sec  70.2 MBytes   589 Mbits/sec                  <-- scp transfer start
[  5]  22.00-23.00  sec  0.00 Bytes  0.00 bits/sec                  
[  5]  23.00-24.00  sec  0.00 Bytes  0.00 bits/sec                  
[  5]  24.00-25.00  sec  0.00 Bytes  0.00 bits/sec                  
[  5]  25.00-26.00  sec  0.00 Bytes  0.00 bits/sec                  
[  5]  26.00-27.00  sec  0.00 Bytes  0.00 bits/sec                  
[  5]  27.00-28.00  sec  5.87 MBytes  49.3 Mbits/sec                  <-- scp transfer finish
[  5]  28.00-29.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  29.00-30.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  30.00-31.00  sec   112 MBytes   942 Mbits/sec                  
[  5]  31.00-32.00  sec   112 MBytes   942 Mbits/sec                  
...

The iperf3 flow ceased completely during the scp transfer, showcasing the strict priority rule in action.
