...
This article discusses the Multiqueue Priority Queue Discipline (mqprio qdisc) hardware offloading implemented by the standard kernel DPAA driver, how to set it up with the tc command, and how to monitor it.
...
tc Command Analysis
The driver’s documentation mentions the following command:
Code Block |
---|
tc qdisc add dev ‹int› root handle 1: \
mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1 |
Set up the queue discipline for the eth1 interface.
whle_ls1046a
Code Block |
---|
root@whle-ls1046a:~# tc qdisc del dev eth1 root handle 1:
root@whle-ls1046a:~# tc qdisc add dev eth1 \
root handle 1: mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1 |
The first command deletes any qdisc that may already have been assigned to eth1. It may return an error when there is none; that is not a problem.
The second command encapsulates the following concepts:
traffic classes,
the packets’ skb priority,
the mapping between skb priority and traffic classes,
DPAA Frame Queues,
DPAA Work Queues,
the device’s channel.
It initializes 1024 DPAA queues in 4 different classes, each having a different DPAA priority (named so to distinguish it from the skb priority that iptables is concerned with).
Below is the detailed description of the command’s fragments and what setup they result in.
The command involves explicitly:
the choice of queue discipline,
definition of network traffic classes,
mapping between skb priority and traffic classes,
and implicitly:
the packets’ skb priority itself,
DPAA Frame Queues,
DPAA Work Queues,
the device’s Direct Connected Channel.
Each of these elements and the relation between them will be explained below, by analyzing the command’s parts.
dev ‹int›
The tc command defines how egress traffic is prioritized when a specific interface is congested, that is, when the operating system produces more packets to send than the link can carry. As such, it operates on a network device ‹int›. For WHLE boards it can be any of eth0, eth1, eth2, eth3, eth4, eth5. When there is no congestion, traffic control loses its meaning: any packet arriving on the interface from the operating system gets transferred immediately.
mqprio
The “queue discipline”, or “qdisc” for short, is a method of handling the congestion condition. The tc command provides multiple queue disciplines to choose from, identified by short names like fq, choke, sfb, or mqprio (see man:tc). Using mqprio tells tc to pick the “Multiqueue Priority Qdisc (Offloaded Hardware QOS)” (man:tc-mqprio). While any of the qdiscs provided by tc can be used on the WHLE-LS1046A/26A board (provided that the kernel was compiled with the option enabling it), the mqprio qdisc is special in that the logic of deciding which packet to serve next is handled by the LS1046A/26A hardware as an integral part of the DPAA architecture, freeing the CPU from the cycles otherwise required to handle the traffic.
The multiqueue priority qdisc is based on the existence of multiple hardware packet queues, all associated with a single network interface. In the case of the discussed command, a total of 1024 DPAA Frame Queues are initialized. The queues are divided into classes with different priorities. Packets from a single network flow are enqueued on the same queue, which helps preserve their ordering. All queues within the same class are treated equally. The reason for having more than one queue per class is to smooth out the sharing of a link between many flows.
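The idea of keeping one flow on one queue inside its class can be sketched in shell. This is only an illustration under stated assumptions: the helper name pick_queue is hypothetical, the 256 queues per class come from the configuration discussed in this article, and cksum merely stands in for whatever flow hash the kernel really uses.

```shell
# Hypothetical sketch: pick a queue for a flow inside its traffic class.
# cksum is a stand-in for the kernel's flow hash, NOT the real algorithm.
pick_queue() {
    class=$1      # traffic class 0..3
    flow=$2       # any string identifying the flow, e.g. its address/port tuple
    hash=$(printf '%s' "$flow" | cksum | cut -d ' ' -f 1)
    # Each class owns a contiguous block of 256 queues.
    echo $(( class * 256 + hash % 256 ))
}

# The same flow description always yields the same queue index.
pick_queue 2 "tcp 192.168.10.1:5201 192.168.3.1:41978"
```

Because the hash depends only on the flow description, all packets of that flow land on one queue, while different flows spread over the 256 queues of the class.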
num_tc 4
Use 4 queue classes, identified with numbers 0, 1, 2, 3. The queue’s class (Packet Queue’s Traffic Class in DPAA’s nomenclature) maps directly to the Work Queue in ‹int›’s Direct Connected Channel that the queue is put on whenever it holds at least one packet (see Upstream DPAA driver and Channels):
Traffic Class | Work Queue |
---|---|
0 | WQ6 |
1 | WQ2 |
2 | WQ1 |
3 | WQ0 |
DPAA arranges Work Queues into 4 groups, ordered by their increasing DPAA priority:
Work Queue Group | DPAA Priority Name | DPAA Priority Num |
---|---|---|
WQ5, WQ6, WQ7 | Low | 1 |
WQ2, WQ3, WQ4 | Medium | 2 |
WQ1 | High | 3 |
WQ0 | Highest | 4 |
They are governed by a strict priority rule: a group with priority number n must be emptied of all packets before any packet from a group with a lower number k < n can be serviced (k, n in {1,2,3,4}). Because WQ0, WQ1, WQ2 and WQ6, which correspond to the four traffic classes, all belong to different groups, the strict priority rule effectively applies to the traffic classes 0, 1, 2, 3 as well, with 3 having the highest priority.
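The strict priority rule can be illustrated with a small shell sketch. The function name next_class and the idea of passing per-class backlog counts as arguments are assumptions made purely for illustration; the real arbitration is done by the DPAA hardware, not by a script.

```shell
# Hypothetical helper: given the number of packets waiting in traffic
# classes 0, 1, 2, 3 (in that order), print the class that is served next.
# Class 3 (WQ0) always wins over class 2 (WQ1), then 1 (WQ2), then 0 (WQ6).
next_class() {
    if   [ "$4" -gt 0 ]; then echo 3
    elif [ "$3" -gt 0 ]; then echo 2
    elif [ "$2" -gt 0 ]; then echo 1
    elif [ "$1" -gt 0 ]; then echo 0
    else echo none
    fi
}

next_class 120 40 0 1   # class 3 is picked although it holds a single packet
next_class 120 40 0 0   # with classes 3 and 2 empty, class 1 gets its turn
```

A lower class is only ever selected when every higher class is empty, which is why a saturating high-priority flow can starve the lower ones completely.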
Each class has exactly 256 queues, resulting in a total of 1024 queues in this case. This number cannot be changed with tc’s queues argument: the DPAA driver silently ignores it, including the provided offsets. No more than 4 classes can be used. When fewer than 4 classes are used, the queues are trimmed from the higher priority end. For example, using num_tc 3 would result in 768 queues (3 * 256) belonging to traffic classes 0, 1, 2, using work queues WQ6, WQ2, WQ1.
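The resulting queue ranges follow directly from the fixed 256 queues per class. The sketch below computes them for a given num_tc; the helper name class_ranges is hypothetical, and the lower bound of 1 class is an assumption of this sketch.

```shell
# Hypothetical helper: print the queue ranges for a given num_tc,
# assuming the fixed 256 queues per class described above.
QUEUES_PER_CLASS=256

class_ranges() {
    num_tc=$1
    if [ "$num_tc" -lt 1 ] || [ "$num_tc" -gt 4 ]; then
        echo "num_tc must be between 1 and 4" >&2
        return 1
    fi
    tc_idx=0
    out=""
    while [ "$tc_idx" -lt "$num_tc" ]; do
        first=$(( tc_idx * QUEUES_PER_CLASS ))
        last=$(( first + QUEUES_PER_CLASS - 1 ))
        out="$out($first:$last) "
        tc_idx=$(( tc_idx + 1 ))
    done
    # Drop the trailing space before printing.
    echo "${out% }"
}

class_ranges 4
class_ranges 3
```

For 4 classes the printed ranges have the same shape as the queues: field reported by tc for this qdisc.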
map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
This argument maps the “skb priority” values 0 .. 15 to the traffic classes 0 .. 3: the position of a value in the sequence (counted from 0) is the skb priority, and the value itself is the traffic class assigned to it.
skb priority | Traffic Class |
---|---|
0, 1, 2, 3 | 0 |
4, 5, 6, 7 | 1 |
8, 9, 10, 11 | 2 |
12, 13, 14, 15 | 3 |
The “skb priority”, often written as “skb->priority” in documentation, is a field in the “socket buffer” structure associated with an IP packet, used by the kernel throughout the packet’s processing pipeline. It’s related to the IP TOS field, although it can be changed with the use of cgroups or iptables. Control over the skb priority of packets is key to the effective use of LS1046A’s hardware prioritization feature and is discussed at length in Direct Connection (cgroups) and Ssh Prioritization (iptables).
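The positional lookup defined by the map argument can be sketched as follows. The variable MAP and the helper tc_for_prio are hypothetical names used only for this illustration; the map value is the one from the command analyzed here.

```shell
# The map from the command above: position = skb priority, value = traffic class.
MAP="0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3"

# Hypothetical helper: print the traffic class for an skb priority 0..15.
tc_for_prio() {
    prio=$1
    if [ "$prio" -lt 0 ] || [ "$prio" -gt 15 ]; then
        echo "skb priority outside the 0..15 range covered by the map" >&2
        return 1
    fi
    # cut counts fields from 1, while priorities start at 0.
    echo "$MAP" | cut -d ' ' -f "$(( prio + 1 ))"
}

tc_for_prio 6
tc_for_prio 15
```

With this map, a packet marked with skb priority 15 ends up in traffic class 3, the class served from WQ0.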
hw 1
Tells tc to actually use the hardware offloading implemented by the DPAA architecture instead of emulating this queue discipline in the kernel.
Example
Queues Definition
Code Block |
---|
root@whle-ls1046a:~# tc qdisc add dev eth1 root handle 1: \
mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1 |
The created queue discipline can be displayed with
...
Code Block |
---|
Rx error: 2181
Rx default: 2182
Rx PCD: 2304 - 2431
Tx confirmation (mq): 2183 - 2303
Tx confirmation (mq): 2432 - 3334
Tx error: 3335
Tx default confirmation: 3336
Tx: 3337 - 4360 |
The Frame Queue IDs are low-level DPAA identifiers which must be globally unique across all network interfaces. The (0:255) (256:511) (512:767) (768:1023)
ids are tc
-specific and describe only the queues assigned to the interface provided in the argument, in this case eth1
.
Although it’s not enforced by the configuration, it can be established empirically that packets from iperf3’s traffic fall into classes 0 and 1. Assuming that the iptables configuration properly assigns ssh packets the skb priority 15 before sending them to eth1 for transfer, they should all fall into traffic class 3 and be enqueued on the highest priority Work Queue WQ0, to be serviced before all iperf3 packets. This should result in iperf3’s traffic being stopped completely for the duration of scp’s transfer.
Performing the test
Start the iperf3 flow to saturate the link.
PC
Code Block |
---|
user@PC:~$ iperf3 --client 192.168.10.1 --time 0 --reverse |
Code Block |
---|
Connecting to host 192.168.10.1, port 5201
Reverse mode, remote host 192.168.10.1 is sending
[ 5] local 192.168.3.1 port 41978 connected to 192.168.10.1 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 112 MBytes 941 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 942 Mbits/sec
[ 5] 2.00-3.00 sec 112 MBytes 941 Mbits/sec
... |
Perform the scp transfer in another console.
PC
Code Block |
---|
root@PC:~# time ip netns exec isolated_ns scp /home/user/files/download.xz user@192.168.3.1: |
Code Block |
---|
download.xz 100% 706MB 111.7MB/s 00:06
real 0m6,773s
user 0m3,766s
sys 0m1,534s |
The file transfer time is basically the same as if there was no other data transferred on the link. Meanwhile in iperf3’s logs:
Code Block |
---|
...
[ 5] 17.00-18.00 sec 112 MBytes 941 Mbits/sec
[ 5] 18.00-19.00 sec 112 MBytes 942 Mbits/sec
[ 5] 19.00-20.00 sec 112 MBytes 942 Mbits/sec
[ 5] 20.00-21.00 sec 112 MBytes 941 Mbits/sec
[ 5] 21.00-22.00 sec 70.2 MBytes 589 Mbits/sec <-- scp transfer start
[ 5] 22.00-23.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 23.00-24.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 24.00-25.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 25.00-26.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 26.00-27.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 27.00-28.00 sec 5.87 MBytes 49.3 Mbits/sec <-- scp transfer finish
[ 5] 28.00-29.00 sec 112 MBytes 942 Mbits/sec
[ 5] 29.00-30.00 sec 112 MBytes 942 Mbits/sec
[ 5] 30.00-31.00 sec 112 MBytes 942 Mbits/sec
[ 5] 31.00-32.00 sec 112 MBytes 942 Mbits/sec
... |
...
Queues Monitoring
It’s sometimes very useful to display the usage of all the defined Frame Queues. This can be done with the -statistics option:
Code Block |
---|
root@whle-ls1046a:~# tc -statistics qdisc show dev eth1 |
Code Block |
---|
qdisc mqprio 1: root tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
queues:(0:255) (256:511) (512:767) (768:1023)
mode:dcb
shaper:dcb
Sent 738655392 bytes 487904 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc pfifo_fast 0: parent 1:400 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc pfifo_fast 0: parent 1:3ff bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
...
qdisc pfifo_fast 0: parent 1:2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc pfifo_fast 0: parent 1:1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0 |
The rather impractically long output can be reduced to just the used queues statistics with
Code Block |
---|
root@whle-ls1046a:~# tc -statistics qdisc show dev eth1 \
| tail -n +7 \
| grep -C 1 -e " Sent [^0]" |
Code Block |
---|
qdisc pfifo_fast 0: parent 1:2e4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 888 bytes 9 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
--
qdisc pfifo_fast 0: parent 1:2df bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 731852460 bytes 483390 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
--
qdisc pfifo_fast 0: parent 1:1b1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1016 bytes 11 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
--
qdisc pfifo_fast 0: parent 1:166 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1895528 bytes 1252 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
--
qdisc pfifo_fast 0: parent 1:10b bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 4905360 bytes 3240 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
--
qdisc pfifo_fast 0: parent 1:76 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 140 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0 |
or even just the list of used queues with:
Code Block |
---|
root@whle-ls1046a:~# tc -statistics qdisc show dev eth1 \
| tail -n +7 \
| grep -C 1 -e " Sent [^0]" \
| grep parent |
Code Block |
---|
qdisc pfifo_fast 0: parent 1:2e4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:2df bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:1b1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:166 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:100 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 1:76 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 |
This particular output indicates that, so far, 3 classes were used to transfer data:
class 2: queues 2e4, 2df,
class 1: queues 1b1, 166,
class 0: queues 100, 76.
If some traffic was expected to be classified into the highest priority class 3, then this list would point to a problem at either the skb priority assignment level or the classification level itself.
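Reading the traffic class off a parent handle can be automated with a short sketch. It assumes that the hexadecimal minor part of the handle equals the queue index plus one (consistent with the listing above, where 1:400 is the last of the 1024 queues) and that each class holds 256 queues; the helper name class_of_handle is hypothetical.

```shell
# Hypothetical helper: print the traffic class of a queue, given the minor
# part of its "parent 1:<minor>" handle (hexadecimal, as printed by tc).
# Assumes minor = queue index + 1 and 256 queues per class.
class_of_handle() {
    minor=$(( 0x$1 ))
    echo $(( (minor - 1) / 256 ))
}

for h in 2e4 2df 1b1 166 100 76; do
    echo "1:$h -> class $(class_of_handle "$h")"
done
```

Feeding it the handles from the list of used queues gives the per-class grouping directly, so a missing class 3 becomes visible at a glance.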