Control groups, or cgroups, are a Linux mechanism for controlling processes' hardware resource utilization: resource limits are defined, grouped in a hierarchical structure, and processes are assigned to them. In particular, cgroups can be used to specify the skb priority of all network packets generated by a specific process. This provides a convenient way to prioritize network traffic generated in communication with the WHLE board itself (as opposed to the traffic passing through it when it is used as a router, a case described in Ssh Prioritization (iptables)).
Connection Diagram
The network used is very straightforward and consists of a single 1 Gb/s link between a testing machine (PC) and a WHLE board (whle_ls1046). Two iperf3 streams sending data from whle_ls1046 to PC will compete for the link's throughput. Different traffic classes will be assigned to the associated iperf3 processes using the cgroups mechanism, and the resulting changes in data transfer speed will be observed.
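Schematically:

PC (enxc84d4423262e, 192.168.3.1) <--- 1 Gb/s ---> whle_ls1046 (eth1, 192.168.3.2)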
Setup
Network Setup
PC
root@PC~# ip addr flush enxc84d4423262e
root@PC~# ip address add 192.168.3.1/24 dev enxc84d4423262e
root@PC~# ip link set dev enxc84d4423262e up
whle_ls1046
root@whle-ls1046a:~# ip address flush eth1
root@whle-ls1046a:~# ip addr add 192.168.3.2/24 dev eth1
root@whle-ls1046a:~# ip link set dev eth1 up
By default the network interfaces on WHLE are controlled by the NetworkManager service, and the effects of the ip commands above will periodically be overwritten by its own configuration. It may be necessary to temporarily stop the service
root@whle-ls1046a:~# systemctl stop NetworkManager
or to configure it to ignore the eth1 interface with a configuration like
root@whle-ls1046a:~# echo '
[main]
plugins=ifupdown,keyfile
[keyfile]
unmanaged-devices=interface-name:eth1
' > /etc/NetworkManager/NetworkManager.conf
root@whle-ls1046a:~# systemctl restart NetworkManager
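With both ends configured it is worth verifying the link before proceeding, e.g. with a quick ping from the testing machine:

root@PC~# ping -c 3 192.168.3.2

If NetworkManager was left running with the unmanaged-devices configuration above, nmcli device status executed on the board should report eth1 as unmanaged.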
Cgroups Hierarchy Preparation
The cgroups hierarchy can be defined in many ways. Instead of creating a minimal hierarchy specific to the given scenario, a more generic directory tree will be used, allowing for a convenient assignment of an skb priority from the 0 .. 15 range (thus covering all priority levels recognized by the tc command) to all network packets generated by a process with a given PID, in a straightforward fashion like

echo ‹pid› > /sys/fs/cgroup/net_prio/prio-‹skb-priority›/cgroup.procs

for example:

echo 730 > /sys/fs/cgroup/net_prio/prio-4/cgroup.procs
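The assignment can then be verified by reading the same file back:

root@whle-ls1046a:~# cat /sys/fs/cgroup/net_prio/prio-4/cgroup.procs
730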
The script creating this hierarchy is as follows:
cgroups-setup.sh:
#!/usr/bin/bash
mkdir /sys/fs/cgroup/net_prio
mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio
mkdir /sys/fs/cgroup/net_prio/prio-{0..15}
for p in {0..15}; do
    for if in $(cd /sys/class/net/; ls); do
        echo "${if} ${p}" > /sys/fs/cgroup/net_prio/prio-${p}/net_prio.ifpriomap
    done
done
mkdir /sys/fs/cgroup/net_prio
This command creates the root directory for the network priority hierarchy inside /sys/fs/cgroup, which should already be present on the system. The name net_prio is arbitrary; it was chosen to reflect the name of the module used to mount the cgroups filesystem there.

mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio
This command mounts the virtual filesystem used to communicate the PIDs' priority assignments to the kernel. The -t cgroup signifies cgroups V1. Unfortunately the more modern cgroups V2 cannot be used in this case, as the net_prio module is not defined for it yet. Upon mounting, the following listing should appear:

root@whle-ls1046a:~# ls -1 /sys/fs/cgroup/net_prio
cgroup.clone_children
cgroup.procs
cgroup.sane_behavior
net_prio.ifpriomap
net_prio.prioidx
notify_on_release
release_agent
tasks
Of these files only the following are relevant to the further discussion:

net_prio.ifpriomap
The default priorities per network interface. More details below.

cgroup.procs
List of all PIDs whose packets' priority isn't modified in any way.
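Whether the hierarchy is mounted can also be double-checked in /proc/mounts (the exact mount options may differ slightly between kernel versions):

root@whle-ls1046a:~# grep net_prio /proc/mounts
none /sys/fs/cgroup/net_prio cgroup rw,relatime,net_prio 0 0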
mkdir /sys/fs/cgroup/net_prio/prio-{0..15}
Create directories prio-0, prio-1, …, prio-15 inside /sys/fs/cgroup/net_prio. Each of them will be automatically populated with files:

root@whle-ls1046a:~# ls -1 /sys/fs/cgroup/net_prio/prio-13
cgroup.clone_children
cgroup.procs
net_prio.ifpriomap
net_prio.prioidx
notify_on_release
tasks
Again, only two are of concern here:
net_prio.ifpriomap
The mapping of network interfaces to skb priorities, like

root@whle-ls1046a:~# cat /sys/fs/cgroup/net_prio/prio-13/net_prio.ifpriomap
lo 0
eth0 0
eth1 4
eth2 4
eth3 8
eth4 0
eth5 0

While the initial discussion of cgroups mentioned assigning skb priority to PIDs, the actual subject of a priority assignment is the (PID, interface) pair. This file covers the second part.
cgroup.procs
List of all PIDs whose packets are assigned the priority according to the map given in net_prio.ifpriomap.
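Conversely, the net_prio cgroup a given PID currently belongs to can be looked up in /proc/‹pid›/cgroup (the hierarchy number at the start of the line may differ):

root@whle-ls1046a:~# grep net_prio /proc/730/cgroup
3:net_prio:/prio-4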
echo "${if} ${p}" > /sys/fs/cgroup/net_prio/prio-${p}/net_prio.ifpriomap
This line, executed for each network interface if, results in a uniform mapping in prio-‹p›/net_prio.ifpriomap like

eth0 ‹p›
eth1 ‹p›
eth2 ‹p›
eth3 ‹p›
eth4 ‹p›
eth5 ‹p›
for example:
root@whle-ls1046a:~# cat /sys/fs/cgroup/net_prio/prio-13/net_prio.ifpriomap
lo 0
eth0 13
eth1 13
eth2 13
eth3 13
eth4 13
eth5 13
This abstracts away the per-interface prioritization granularity, which isn't needed here.
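Should per-interface granularity ever be needed, a single interface's entry can be overwritten without touching the others, as net_prio.ifpriomap accepts one ‹interface› ‹priority› pair per write:

root@whle-ls1046a:~# echo "eth1 7" > /sys/fs/cgroup/net_prio/prio-13/net_prio.ifpriomap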
Save the script in the cgroups-setup.sh file and run it on a WHLE-LS1046A board.
whle_ls1046
root@whle-ls1046a:~# chmod +x cgroups-setup.sh
root@whle-ls1046a:~# ./cgroups-setup.sh
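A quick sanity check is to read back one of the generated maps; the interface list should reflect the contents of /sys/class/net at the time the script ran:

root@whle-ls1046a:~# cat /sys/fs/cgroup/net_prio/prio-7/net_prio.ifpriomap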
Iperf3 Setup
PC
Two iperf3 streams will be created, with servers launched on PC and clients on whle_ls1046, with the default client → server data flow direction.
The direction of the transfer is important in this experiment. The notion of queue prioritization in the DPAA architecture (or any other mqprio architecture, for that matter) is only applicable to the egress traffic. Sending data to the remote location 192.168.3.1 implies the following order of processing for the majority of iperf3 traffic:
1. whle_ls1046’s CPU,
2. whle_ls1046’s eth1 interface (egress),
3. PC’s enxc84d4423262e interface (ingress),
4. PC’s CPU.
Given that the maximum throughput of 1 Gb/s for the whole connection leaves plenty of headroom on whle_ls1046’s CPU (let alone PC’s), the enxc84d4423262e - eth1 link becomes the bottleneck, with packets congesting at the eth1 funnel, where the DPAA prioritization can come into play. Had the transfer gone the other way, e.g. with clients run on PC and servers on whle_ls1046, the funnel would form at the testing machine’s enxc84d4423262e interface.
Given the peculiarities of setting up the iperf3 process’s priority on whle_ls1046, it’s easier to track the data transfer speed on PC’s side by launching iperf3 in blocking mode instead of as a daemon.
PC, console 1
user@PC~$ iperf3 --server --port 5201
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
PC, console 2
user@PC~$ iperf3 --server --port 5202
-----------------------------------------------------------
Server listening on 5202
-----------------------------------------------------------
whle_ls1046
Launching iperf3 clients on the WHLE board must follow a specific protocol:

1. Clean any mqprio qdiscs on the eth1 interface.
2. Run the iperf3 client.
3. Obtain the client’s PID.
4. Assign a specific net_prio priority to the given PID, using cgroups.
5. Define the mqprio qdisc on the eth1 interface.
While the (2) < (3) < (4) ordering is pretty obvious, the rest may not be. Practice has shown that changing the packet priority of a process with an ongoing connection while the mqprio qdisc is already set up leads to inconsistent results, with the change sometimes reflected on the wire and sometimes not. In contrast, setting the mqprio qdisc while all the traffic is already set up and running results in consistent behavior.
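At any point the qdisc currently installed on eth1 can be inspected to confirm the state of the interface (the exact output depends on the tc version):

root@whle-ls1046a:~# tc qdisc show dev eth1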
Because of this ordering requirement it’s useful to define some bash procedures that implement it. First a launch_iperf_with_priority function will be defined, which starts the iperf3 client and assigns it a specific priority.
whle_ls1046
root@whle-ls1046a:~# launch_iperf_with_priority() {
    local port=$1
    local prio=$2
    local iperf_time=$3
    echo "Launching iperf3, port ${port}, priority ${prio}"
    # Step 2: run the client in the background.
    iperf3 --port "${port}" --client 192.168.3.1 --time "${iperf_time}" > /dev/null &
    # Step 3: obtain the client's PID by matching its command line.
    local pid=$(pgrep -f "iperf3 --port ${port}")
    # Step 4: assign the PID to the cgroup carrying the requested skb priority.
    echo "${pid}" > "/sys/fs/cgroup/net_prio/prio-${prio}/cgroup.procs"
}
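Since the client is started in the background by the same shell, its PID could alternatively be captured with bash’s $! parameter instead of the pgrep pattern match, e.g.:

iperf3 --port "${port}" --client 192.168.3.1 --time "${iperf_time}" > /dev/null &
local pid=$!   # $! expands to the PID of the most recent background job

The pgrep variant has the advantage of also finding a client started from another shell.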
The opposite operation will be realized by the kill_iperf procedure.
whle_ls1046
root@whle-ls1046a:~# kill_iperf() {
    local port=$1
    pkill -f "iperf3 --port ${port}"
}
Example usage:
whle_ls1046
root@whle-ls1046a:~# launch_iperf_with_priority 5201 0 10
Launching iperf3, port 5201, priority 0
[1] 493
root@whle-ls1046a:~# kill_iperf 5201
This would create a connection with the server at 192.168.3.1, port 5201, for 10 seconds, with the packets sent having skb priority 0. The connection is then killed without waiting for it to finish.
Building on this, a third and final procedure will be defined, which coordinates launching two iperf3 streams with different priorities for the same time period.
whle_ls1046
root@whle-ls1046a:~# test_iperf() {
    local port1=$1
    local prio1=$2
    local port2=$3
    local prio2=$4
    local iperf_time=$5
    # Step 1: stop any leftover clients and clean the mqprio qdisc.
    kill_iperf "${port1}"
    kill_iperf "${port2}"
    tc qdisc del dev eth1 root handle 1:
    # Steps 2-4: start both clients and assign their priorities.
    launch_iperf_with_priority "${port1}" "${prio1}" "${iperf_time}"
    launch_iperf_with_priority "${port2}" "${prio2}" "${iperf_time}"
    # Step 5: only now install the mqprio qdisc.
    tc qdisc add dev eth1 root handle 1: mqprio num_tc 4 \
        map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
    sleep ${iperf_time}
}
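The map argument of mqprio lists the traffic class assigned to each of the 16 skb priorities in order, so the mapping used here groups the priorities into four classes:

# skb priority:  0 1 2 3 | 4 5 6 7 | 8 9 10 11 | 12 13 14 15
# traffic class:    0    |    1    |     2     |      3

This is why skb priorities 0, 4, 8 and 12 are enough to exercise all four traffic classes in the tests below.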
Example usage will be given below.
Tests
Same Priorities
Assuming that iperf3 servers at ports 5201, 5202 are running on PC, run the following command on WHLE:
whle_ls1046
root@whle-ls1046a:~# test_iperf 5201 4 5202 4 6
Launching iperf3, port 5201, priority 4
[1] 735
Launching iperf3, port 5202, priority 4
[2] 738
This would create two iperf3 streams with the same skb priority 4, mapping to traffic class 1. Meanwhile, on the PC side:
PC, console 1
Accepted connection from 192.168.3.2, port 54202
[ 5] local 192.168.3.1 port 5201 connected to 192.168.3.2 port 54208
[ ID] Interval           Transfer     Bitrate
[ 5]   0.00-1.00   sec  55.0 MBytes   462 Mbits/sec
[ 5]   1.00-2.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   2.00-3.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   3.00-4.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   4.00-5.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   5.00-6.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   6.00-6.04   sec  2.46 MBytes   467 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[ 5]   0.00-6.04   sec   338 MBytes   469 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
PC, console 2
Accepted connection from 192.168.3.2, port 46446
[ 5] local 192.168.3.1 port 5202 connected to 192.168.3.2 port 46458
[ ID] Interval           Transfer     Bitrate
[ 5]   0.00-1.00   sec  53.9 MBytes   452 Mbits/sec
[ 5]   1.00-2.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   2.00-3.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   3.00-4.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   4.00-5.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   5.00-6.00   sec  56.1 MBytes   471 Mbits/sec
[ 5]   6.00-6.04   sec  3.69 MBytes   793 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[ 5]   0.00-6.04   sec   338 MBytes   470 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5202
-----------------------------------------------------------
The experiment shows that the link’s throughput is shared evenly between streams in the same traffic class. Similar results would be obtained with the calls:
test_iperf 5201 0 5202 0 6
test_iperf 5201 8 5202 8 6
test_iperf 5201 12 5202 12 6
(That would cover all 4 traffic classes defined by tc, with skb priorities other than 0, 4, 8, 12 resulting in the same set of classes.)
Different Priorities
Run test_iperf with different skb priorities, making sure that they map to different traffic classes, for example:
whle_ls1046
root@whle-ls1046a:~# test_iperf 5201 0 5202 4 6
Launching iperf3, port 5201, priority 0
[1] 774
Launching iperf3, port 5202, priority 4
[2] 776
Meanwhile, on the PC side:
PC, console 1
Accepted connection from 192.168.3.2, port 48344
[ 5] local 192.168.3.1 port 5201 connected to 192.168.3.2 port 48350
[ ID] Interval           Transfer     Bitrate
[ 5]   0.00-1.00   sec  8.84 MBytes  74.1 Mbits/sec
[ 5]   1.00-2.00   sec  0.00 Bytes   0.00 bits/sec
[ 5]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec
[ 5]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec
[ 5]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec
[ 5]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[ 5]   0.00-6.08   sec  8.84 MBytes  12.2 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
PC, console 2
Accepted connection from 192.168.3.2, port 42704
[ 5] local 192.168.3.1 port 5202 connected to 192.168.3.2 port 42720
[ ID] Interval           Transfer     Bitrate
[ 5]   0.00-1.00   sec   100 MBytes   839 Mbits/sec
[ 5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec
[ 5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec
[ 5]   3.00-4.00   sec   112 MBytes   942 Mbits/sec
[ 5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec
[ 5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
[ 5]   6.00-6.04   sec  4.71 MBytes   937 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[ 5]   0.00-6.04   sec   666 MBytes   925 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5202
-----------------------------------------------------------
This shows that traffic class 1 (skb priority 4) has strict priority over traffic class 0 (skb priority 0). Similar results would be obtained with any of the calls:
test_iperf 5201 0 5202 8 6
test_iperf 5201 0 5202 12 6
test_iperf 5201 4 5202 8 6
test_iperf 5201 4 5202 12 6
test_iperf 5201 8 5202 12 6
(That would cover all pairs of the 4 traffic classes defined by tc, with skb priorities other than 0, 4, 8, 12 resulting in one of the class pairs above.)