Control groups, or cgroups, are a Linux mechanism for controlling the hardware resources used by processes: resource limits are defined, arranged in a hierarchical structure, and processes are assigned to them. In particular, cgroups can be used to set the skb priority of all network packets generated by a specific process. This provides a convenient way to prioritize network traffic generated in communication with the WHLE board itself (as opposed to traffic passing through it when it is used as a router, a case described in Ssh Prioritization (iptables)).

Connection Diagram

Setup

Network Setup

PC
root@PC~# ip addr flush enxc84d4423262e
root@PC~# ip address add 192.168.3.1/24 dev enxc84d4423262e
root@PC~# ip link set dev enxc84d4423262e up
whle_ls1046
root@whle-ls1046a:~# ip address flush eth1
root@whle-ls1046a:~# ip addr add 192.168.3.2/24 dev eth1
root@whle-ls1046a:~# ip link set dev eth1 up

By default the network interfaces on WHLE are controlled by the NetworkManager service, and the effects of the ip commands above will be periodically overwritten with its own configuration. It may be necessary to temporarily stop the service:

root@whle-ls1046a:~# systemctl stop NetworkManager

or to configure it to ignore the eth1 interface with a configuration like

root@whle-ls1046a:~# echo '
[main]
plugins=ifupdown,keyfile

[keyfile]
unmanaged-devices=interface-name:eth1
' > /etc/NetworkManager/NetworkManager.conf
root@whle-ls1046a:~# systemctl restart NetworkManager

Cgroups Hierarchy Preparation

The cgroups hierarchy can be defined in many ways. Instead of the minimal hierarchy needed for this scenario, a more generic directory tree will be used. It allows any skb priority from the 0 .. 15 range (all priority levels recognized by the tc command) to be assigned to all network packets generated by a process with a given PID, in a straightforward fashion like

echo ‹pid› > /sys/fs/cgroup/net_prio/prio-‹skb-priority›/cgroup.procs

for example:

echo 730 > /sys/fs/cgroup/net_prio/prio-4/cgroup.procs

The script is as follows:

cgroups-setup.sh:
#!/usr/bin/bash
mkdir /sys/fs/cgroup/net_prio
mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio
mkdir /sys/fs/cgroup/net_prio/prio-{0..15}
for p in {0..15}; do
    for if in $(cd /sys/class/net/; ls); do
        echo "${if} ${p}" > /sys/fs/cgroup/net_prio/prio-${p}/net_prio.ifpriomap
    done
done
  • mkdir /sys/fs/cgroup/net_prio
    This command creates the root directory for the network priority hierarchy inside /sys/fs/cgroup, which should already be present on the system. The name net_prio is arbitrary; it was chosen to match the name of the cgroup controller mounted there.

  • mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio
    This command mounts the virtual filesystem through which the PID priority assignments are communicated to the kernel. The -t cgroup option selects cgroups V1. Unfortunately the more modern cgroups V2 cannot be used in this case, as the net_prio controller has not been defined for it. After mounting the filesystem the following listing should appear:

    root@whle-ls1046a:~# ls -1 /sys/fs/cgroup/net_prio
    cgroup.clone_children
    cgroup.procs
    cgroup.sane_behavior
    net_prio.ifpriomap
    net_prio.prioidx
    notify_on_release
    release_agent
    tasks

    Of these files only the following are relevant in further discussion:

    • net_prio.ifpriomap
      The default priorities per network interface. More details below.

    • cgroup.procs
      List of all PIDs whose packet priority is not modified in any way.

  • mkdir /sys/fs/cgroup/net_prio/prio-{0..15}
    Create directories prio-0, prio-1, …, prio-15 inside the /sys/fs/cgroup/net_prio. Each of them will be automatically populated with files:

    root@whle-ls1046a:~# ls -1 /sys/fs/cgroup/net_prio/prio-13
    cgroup.clone_children
    cgroup.procs
    net_prio.ifpriomap
    net_prio.prioidx
    notify_on_release
    tasks

    Again, only two are of concern here:

    • net_prio.ifpriomap
      The mapping of network interfaces to skb priorities, like

      root@whle-ls1046a:~# cat /sys/fs/cgroup/net_prio/prio-13/net_prio.ifpriomap
      lo 0
      eth0 0
      eth1 4
      eth2 4
      eth3 8
      eth4 0
      eth5 0

      While the initial discussion of cgroups mentioned assigning skb priority to PIDs, the actual priority assignment’s subject is the (PID, interface) pair. This file covers the second part.

    • cgroup.procs
      List of all PIDs whose packets are assigned the priority according to the map given in net_prio.ifpriomap.

  • echo "${if} ${p}" > /sys/fs/cgroup/net_prio/prio-${p}/net_prio.ifpriomap
    This line, executed for each network interface if, results in a uniform mapping in prio-‹p›/net_prio.ifpriomap like

    eth0 ‹p›
    eth1 ‹p›
    eth2 ‹p›
    eth3 ‹p›
    eth4 ‹p›
    eth5 ‹p›

    for example:

    root@whle-ls1046a:~# cat /sys/fs/cgroup/net_prio/prio-13/net_prio.ifpriomap
    lo 0
    eth0 13
    eth1 13
    eth2 13
    eth3 13
    eth4 13
    eth5 13

    This makes the priority uniform across interfaces, hiding the per-interface granularity, which is not needed here.
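
Once the hierarchy exists, moving a process into one of the prio-‹N› groups is a single write to cgroup.procs. The sketch below wraps that write with basic argument checks. The function name assign_prio and its optional third argument (an alternative hierarchy root, useful for dry runs against a scratch directory) are assumptions made for illustration and are not part of cgroups-setup.sh.

```shell
# Hypothetical helper, not part of the original setup script.
# Usage: assign_prio <pid> <skb-priority> [hierarchy-root]
assign_prio() {
    local pid="$1"
    local prio="$2"
    # The root defaults to the mount point used in this document.
    local root="${3:-/sys/fs/cgroup/net_prio}"

    # Reject anything that is not a plain decimal number.
    case "${pid}" in
        ''|*[!0-9]*) echo "assign_prio: invalid PID '${pid}'" >&2; return 1 ;;
    esac
    case "${prio}" in
        ''|*[!0-9]*) echo "assign_prio: invalid priority '${prio}'" >&2; return 1 ;;
    esac
    # Only prio-0 .. prio-15 are created by the setup script.
    if [ "${prio}" -gt 15 ]; then
        echo "assign_prio: priority must be in 0..15, got '${prio}'" >&2
        return 1
    fi

    local procs="${root}/prio-${prio}/cgroup.procs"
    if [ ! -e "${procs}" ]; then
        echo "assign_prio: ${procs} not found, was the hierarchy set up?" >&2
        return 1
    fi
    # The kernel moves the process into the group when its PID is written here.
    echo "${pid}" > "${procs}"
}
```

With the hierarchy mounted as described, assign_prio 730 4 would be equivalent to the echo 730 > /sys/fs/cgroup/net_prio/prio-4/cgroup.procs example given earlier.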

Save the script in the cgroups-setup.sh file and run it on a WHLE-LS1046A board.

whle_ls1046a
root@whle-ls1046a:~# chmod +x cgroups-setup.sh
root@whle-ls1046a:~# ./cgroups-setup.sh
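
After the script has run, it may be worth confirming that every prio-‹N› group really maps its interfaces to priority N. The following is a minimal sketch of such a check. The function name check_priomaps and its optional root argument are hypothetical, and skipping the lo entry is an assumption based on the listing above, where loopback stays at 0.

```shell
# Hypothetical check, not part of the original setup script.
# Usage: check_priomaps [hierarchy-root]
check_priomaps() {
    local root="${1:-/sys/fs/cgroup/net_prio}"
    local p=0
    local status=0
    local map
    while [ "${p}" -le 15 ]; do
        map="${root}/prio-${p}/net_prio.ifpriomap"
        if [ ! -r "${map}" ]; then
            echo "prio-${p}: ${map} is missing" >&2
            status=1
        # Every line is "<interface> <priority>"; lo is ignored (assumption).
        elif ! awk -v want="${p}" \
                '$1 != "lo" && $2 != want { bad = 1 } END { exit bad }' "${map}"; then
            echo "prio-${p}: some interface is not mapped to ${p}" >&2
            status=1
        fi
        p=$((p + 1))
    done
    return "${status}"
}
```

The function returns 0 only when all sixteen maps are uniform, so it can be chained as ./cgroups-setup.sh && check_priomaps.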

Iperf Setup

PC

Two iperf3 streams will be created, with servers launched on PC and clients on whle_ls1046, with the default client → server data flow direction.

The direction of the transfer is important in this experiment. Queue prioritization in the DPAA architecture (or in any other mqprio architecture, for that matter) applies only to egress traffic. Sending data from the board to the remote address 192.168.3.1 implies the following order of processing for the majority of iperf3 traffic:

  1. whle_ls1046’s CPU,

  2. whle_ls1046’s eth1 interface (egress),

  3. PC’s enxc84d4423262e interface (ingress),

  4. PC’s CPU.

Given that the 1 Gb/s maximum throughput of the whole connection leaves plenty of headroom on whle_ls1046’s CPU (let alone the PC’s), the enxc84d4423262e - eth1 link becomes the bottleneck, with packets congesting at the eth1 funnel, where the DPAA prioritization can come into play. If the transfer went the other way, e.g. with clients run on PC and servers on whle_ls1046, the funnel would form at the testing machine’s enxc84d4423262e interface instead.

Given the peculiarities of setting up iperf3 process' priority on whle_ls1046 it’s easier to track data transfer speed on PC’s side by launching iperf3 in blocking mode instead of as a daemon.

PC, console 1
user@PC~$ iperf3 --server --port 5201
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
PC, console 2
user@PC~$ iperf3 --server --port 5202
-----------------------------------------------------------
Server listening on 5202
-----------------------------------------------------------

whle_ls1046

Launching iperf3 clients on the WHLE board must follow a specific protocol:

  1. Clean any mqprio qdiscs on the eth1 interface.

  2. Run iperf3 client.

  3. Obtain the client’s PID.

  4. Assign a specific net_prio priority to the given PID, using cgroups.

  5. Define the mqprio qdisc on the eth1 interface.

While the (2) < (3) < (4) ordering is fairly obvious, the placement of the remaining steps may not be. Practice has shown that changing the packet priority of a process with an ongoing connection while mqprio is already set up leads to inconsistent results, with the change sometimes reflected on the wire and sometimes not. In contrast, setting up the mqprio qdisc once all the traffic is already running results in consistent behavior.

Because of this it’s useful to define some bash procedures that would implement the above ordering. First a launch_iperf_with_priority function will be defined which starts the iperf3 client and assigns it a specific priority.

whle_ls1046
root@whle-ls1046a:~#
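launch_iperf_with_priority() {
    local port=$1
    local prio=$2
    local iperf_time=$3
    echo "Launching iperf3, port ${port}, priority ${prio}"
    # Start the client in the background. The leading "iperf3 --port <port>"
    # is the pattern kill_iperf matches on, so the argument order matters.
    iperf3 --port "${port}" --client 192.168.3.1 --time "${iperf_time}" > /dev/null &
    # $! is the PID of the job just started; unlike pgrep it cannot miss
    # the process or match an unrelated one.
    local pid=$!
    # Move the client into the cgroup that carries the requested priority.
    echo "${pid}" > "/sys/fs/cgroup/net_prio/prio-${prio}/cgroup.procs"
}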

The opposite operation will be realized by the kill_iperf procedure.

whle_ls1046
root@whle-ls1046a:~#
kill_iperf() {
    local port=$1
    pkill -f "iperf3 --port ${port}"
}

Example usage:

whle_ls1046
root@whle-ls1046a:~# launch_iperf_with_priority 5201 0 10
Launching iperf3, port 5201, priority 0
[1] 493
root@whle-ls1046a:~# kill_iperf 5201

This creates a connection to the server at 192.168.3.1, port 5201, for 10 seconds, with the sent packets having skb priority 0. The client is then killed without waiting for it to finish.
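
To confirm that a launched client actually landed in the intended group, the kernel's per-process cgroup listing can be consulted. Under cgroups V1 each line of /proc/‹pid›/cgroup has the form hierarchy-ID:controller-list:cgroup-path. The sketch below prints the path for the hierarchy that carries net_prio. The function name net_prio_group_of and its optional second argument (an alternative /proc root, handy for trying the parser on a saved file) are assumptions made for illustration.

```shell
# Hypothetical helper, not part of the original procedures.
# Usage: net_prio_group_of <pid> [proc-root]
net_prio_group_of() {
    local pid="$1"
    local proc="${2:-/proc}"
    # Fields are separated by ':'; the second one is a comma-separated
    # list of controllers attached to that hierarchy.
    awk -F: '
        {
            n = split($2, ctrl, ",")
            for (i = 1; i <= n; i++)
                if (ctrl[i] == "net_prio") { print $3; found = 1 }
        }
        END { exit !found }
    ' "${proc}/${pid}/cgroup"
}
```

For a client started with launch_iperf_with_priority 5201 4 10, the expectation is that the function prints /prio-4, since the path is reported relative to the root of the mounted hierarchy. A non-zero exit status means that no net_prio hierarchy was listed for the process.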

Building on this a third, final procedure will be defined, which coordinates launching two iperf3 streams with different priorities, for the same time period.

whle_ls1046
root@whle-ls1046a:~#
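test_iperf() {
    local port1=$1
    local prio1=$2
    local port2=$3
    local prio2=$4
    local iperf_time=$5
    # Stop clients left over from a previous run.
    kill_iperf "${port1}"
    kill_iperf "${port2}"
    # Step 1: remove the existing mqprio qdisc. tc reports an error when
    # there is nothing to delete, which is harmless here.
    tc qdisc del dev eth1 root handle 1:
    # Steps 2-4: start both clients and assign their priorities.
    launch_iperf_with_priority "${port1}" "${prio1}" "${iperf_time}"
    launch_iperf_with_priority "${port2}" "${prio2}" "${iperf_time}"
    # Step 5: only now define mqprio, with 4 traffic classes and
    # skb priorities 0-3, 4-7, 8-11, 12-15 mapped to classes 0, 1, 2, 3.
    tc qdisc add dev eth1 root handle 1: mqprio num_tc 4 \
       map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
    # Wait for both clients to finish.
    sleep "${iperf_time}"
}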

The example usage will be given below.

Tests

Same priority

Assuming that iperf3 servers on ports 5201 and 5202 are running on PC, run the following command on WHLE:

whle_ls1046
root@whle-ls1046a:~# test_iperf 5201 4 5202 4 6
Launching iperf3, port 5201, priority 4
[1] 735
Launching iperf3, port 5202, priority 4
[2] 738

This creates two iperf3 streams with the same skb priority 4, which maps to traffic class 1. Meanwhile, on the PC side:

PC, console 1
Accepted connection from 192.168.3.2, port 54202
[  5] local 192.168.3.1 port 5201 connected to 192.168.3.2 port 54208
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  55.0 MBytes   462 Mbits/sec                  
[  5]   1.00-2.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   2.00-3.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   3.00-4.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   4.00-5.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   5.00-6.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   6.00-6.04   sec  2.46 MBytes   467 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-6.04   sec   338 MBytes   469 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
PC, console 2
Accepted connection from 192.168.3.2, port 46446
[  5] local 192.168.3.1 port 5202 connected to 192.168.3.2 port 46458
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  53.9 MBytes   452 Mbits/sec                  
[  5]   1.00-2.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   2.00-3.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   3.00-4.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   4.00-5.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   5.00-6.00   sec  56.1 MBytes   471 Mbits/sec                  
[  5]   6.00-6.04   sec  3.69 MBytes   793 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-6.04   sec   338 MBytes   470 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5202
-----------------------------------------------------------

The experiment shows that the link’s throughput is shared evenly for traffic in the same class. Similar results would be obtained with calls:

test_iperf 5201 0 5202 0 6
test_iperf 5201 8 5202 8 6
test_iperf 5201 12 5202 12 6

(These calls, together with the one above, cover all 4 traffic classes defined by tc; skb priorities other than 0, 4, 8, 12 fall into the same set of classes.)
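
The relationship between skb priorities and traffic classes follows directly from the map argument passed to mqprio in test_iperf: the value at position N of the map is the traffic class for skb priority N. A minimal sketch of that lookup is given below. The function name tc_for_prio is hypothetical, and the default map is simply copied from the tc command used in this document.

```shell
# Hypothetical helper illustrating the mqprio "map" lookup.
# Usage: tc_for_prio <skb-priority> [map entries...]
tc_for_prio() {
    local prio="$1"
    shift
    # Without explicit entries, fall back to the map used by test_iperf.
    [ "$#" -gt 0 ] || set -- 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
    case "${prio}" in
        ''|*[!0-9]*) echo "tc_for_prio: invalid priority '${prio}'" >&2; return 1 ;;
    esac
    # The priority indexes the map, so it must be smaller than its length.
    if [ "${prio}" -ge "$#" ]; then
        echo "tc_for_prio: priority ${prio} is outside the map" >&2
        return 1
    fi
    # Drop the first <prio> entries; the next one is the traffic class.
    shift "${prio}"
    echo "$1"
}
```

For instance, tc_for_prio 4 prints 1, matching the traffic class stated for the test above, while priorities 0, 8 and 12 yield classes 0, 2 and 3 respectively.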