Setup
For the test a single WHLE board was used, identified as whle_ls1046_1, and a PC equipped with a network card having two 10G ports. The single PC was set up to emulate two separate machines serving as endpoints for the WHLE board to route between. Note that the actual speed test results depend on the CPU power of the PC: for the tests to measure only WHLE performance, undisturbed, the PC must be able to handle traffic at both ends at a level exceeding the capability of a single WHLE board, with a good margin. In the tests carried out here, a 12-core 4.5 GHz x86_64 machine was more than sufficient for the task.
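A quick way to confirm that the PC has enough headroom is a loopback run of iperf3, which on a machine of this class should report well above 10 Gb/s. This is only a rough sanity check, not part of the measurement itself:
PC
root@PC:~# iperf3 -s -D          # start a server in the background (daemon mode)
root@PC:~# iperf3 -c 127.0.0.1 -t 10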
Before proceeding, please make sure you have followed the common setup described in WHLE-LS1046 kernel DPAA drivers.
Router
Connection diagram
The connection speed from the ens1f1 interface to ens1f0 on the PC will be measured, with whle_ls1046_1 configured as a router. isolated_ns denotes a network namespace in which the ens1f0 interface has to be enclosed to force the PC to send packets through whle_ls1046_1 instead of short-circuiting through the local interface.
Network Setup
PC
root@PC:~# ip netns add isolated_ns
root@PC:~# ip link set ens1f0 netns isolated_ns
root@PC:~# ip netns exec isolated_ns ip addr flush ens1f0
root@PC:~# ip netns exec isolated_ns ip addr add 192.168.10.1/24 dev ens1f0
root@PC:~# ip netns exec isolated_ns ip link set dev ens1f0 up
root@PC:~# ip netns exec isolated_ns ip route add 192.168.30.0/24 via 192.168.10.2
root@PC:~# ip addr flush ens1f1
root@PC:~# ip address add 192.168.30.2/24 dev ens1f1
root@PC:~# ip link set dev ens1f1 up
root@PC:~# ip route add 192.168.10.0/24 via 192.168.30.1
whle_ls1046_1
root@whle-ls1046a:~# ip address flush eth1
root@whle-ls1046a:~# ip address flush eth2
root@whle-ls1046a:~# ip address flush eth3
root@whle-ls1046a:~# ip address flush eth5
root@whle-ls1046a:~# ip address flush eth4
root@whle-ls1046a:~# ip addr add 192.168.10.2/24 dev eth5
root@whle-ls1046a:~# ip addr add 192.168.30.1/24 dev eth4
root@whle-ls1046a:~# ip link set dev eth4 up
root@whle-ls1046a:~# ip link set dev eth5 up
root@whle-ls1046a:~# echo 1 > /proc/sys/net/ipv4/ip_forward
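Before starting the measurements it is worth verifying that the routed path works in both directions, using the addresses configured above:
PC
root@PC:~# ping -c 3 192.168.10.1
root@PC:~# ip netns exec isolated_ns ping -c 3 192.168.30.2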
Tests
Iperf servers
On the PC, launch four instances of the iperf3 server, listening on ports 5201-5204. The ip netns exec command requires root access.
PC
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5201 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5202 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5203 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5204 &
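Optionally, confirm that all four servers are listening inside the namespace:
PC
root@PC:~# ip netns exec isolated_ns ss -tln | grep -E ':520[1-4]'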
Iperf clients
Launch four instances of iperf3 simultaneously.
PC
root@PC:~# ( iperf3 -c 192.168.10.1 --port 5201 --cport 55000 --time 0 --omit 5 --title A &
             iperf3 -c 192.168.10.1 --port 5202 --cport 55002 --time 0 --omit 5 --title B &
             iperf3 -c 192.168.10.1 --port 5203 --cport 55006 --time 0 --omit 5 --title C &
             iperf3 -c 192.168.10.1 --port 5204 --cport 55001 --time 0 --omit 5 --title D & )
A: Connecting to host 192.168.10.1, port 5201
B: Connecting to host 192.168.10.1, port 5202
C: Connecting to host 192.168.10.1, port 5203
D: Connecting to host 192.168.10.1, port 5204
C: [ 5] local 192.168.30.2 port 55006 connected to 192.168.10.1 port 5203
B: [ 5] local 192.168.30.2 port 55002 connected to 192.168.10.1 port 5202
D: [ 5] local 192.168.30.2 port 55001 connected to 192.168.10.1 port 5204
A: [ 5] local 192.168.30.2 port 55000 connected to 192.168.10.1 port 5201
B: [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
C: [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
B: [ 5]  0.00-1.00  sec   268 MBytes  2.25 Gbits/sec   56    337 KBytes  (omitted)
C: [ 5]  0.00-1.00  sec   285 MBytes  2.39 Gbits/sec  152    433 KBytes  (omitted)
A: [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
D: [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
D: [ 5]  0.00-1.00  sec   258 MBytes  2.16 Gbits/sec  195    370 KBytes  (omitted)
A: [ 5]  0.00-1.00  sec   285 MBytes  2.39 Gbits/sec  162    584 KBytes  (omitted)
B: [ 5]  1.00-2.00  sec   294 MBytes  2.46 Gbits/sec   15    741 KBytes  (omitted)
...
The port numbers are picked in such a way that every iperf3 flow is handled by a different core on whle_ls1046_1 - cores 0, 2, 1, 3, respectively. (The iperf3 calls fix the source ports used for the data transfer connections with the --cport parameter. There is a small chance that some of these ports are already in use on the system; in that case it is necessary to locate the offending processes with netstat -tnp and kill them. See the sketch below.)
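A minimal sketch of both checks: the port pattern matches the --cport values used above, and watching /proc/softirqs on the board during the test shows whether the NET_RX load is indeed spread across all four cores:
PC
root@PC:~# netstat -tnp | grep -E ':5500[0-6]'   # should print nothing if the ports are free
whle_ls1046_1
root@whle-ls1046a:~# watch -d -n 1 'grep -e CPU0 -e NET_RX /proc/softirqs'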
Stop all the clients after some time.
PC
root@PC:~# kill $(ps a | grep 'iperf3 -[c]' | awk '{ print $1; }')
...
B: [ 5] 53.00-53.34  sec   125 MBytes  3.13 Gbits/sec    0   1014 KBytes
A: [ 5] 53.00-53.33  sec   124 MBytes  3.13 Gbits/sec    0    732 KBytes
C: [ 5] 53.00-53.34  sec  62.5 MBytes  1.56 Gbits/sec    0    472 KBytes
B: - - - - - - - - - - - - - - - - - - - - - - - - -
D: [ 5] 53.00-53.33  sec  61.2 MBytes  1.55 Gbits/sec    0    454 KBytes
A: - - - - - - - - - - - - - - - - - - - - - - - - -
C: - - - - - - - - - - - - - - - - - - - - - - - - -
B: [ ID] Interval           Transfer     Bitrate         Retr
D: - - - - - - - - - - - - - - - - - - - - - - - - -
C: [ ID] Interval           Transfer     Bitrate         Retr
A: [ ID] Interval           Transfer     Bitrate         Retr
B: [ 5]  0.00-53.34  sec  14.1 GBytes  2.29 Gbits/sec  2306          sender
D: [ ID] Interval           Transfer     Bitrate         Retr
C: [ 5]  0.00-53.34  sec  14.1 GBytes  2.37 Gbits/sec  2890          sender
A: [ 5]  0.00-53.33  sec  15.2 GBytes  2.33 Gbits/sec  2889          sender
B: [ 5]  0.00-53.34  sec  0.00 Bytes   0.00 bits/sec                 receiver
D: [ 5]  0.00-53.33  sec  14.1 GBytes  2.35 Gbits/sec  2636          sender
C: [ 5]  0.00-53.34  sec  0.00 Bytes   0.00 bits/sec                 receiver
A: [ 5]  0.00-53.33  sec  0.00 Bytes   0.00 bits/sec                 receiver
D: [ 5]  0.00-53.33  sec  0.00 Bytes   0.00 bits/sec                 receiver
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated
Sum the values from the lines with "sender" at the end.
B: [ 5]  0.00-53.34  sec  14.1 GBytes  2.29 Gbits/sec  2306  sender
...
C: [ 5]  0.00-53.34  sec  14.1 GBytes  2.37 Gbits/sec  2890  sender
...
A: [ 5]  0.00-53.33  sec  15.2 GBytes  2.33 Gbits/sec  2889  sender
...
D: [ 5]  0.00-53.33  sec  14.1 GBytes  2.35 Gbits/sec  2636  sender
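The summation can also be scripted. A minimal sketch, assuming the client output shown above was captured to a file (results.log is an illustrative name) and that all rates are reported in Gbits/sec; with the --title prefix present, the bitrate is the eighth whitespace-separated field:
PC
root@PC:~# grep 'sender$' results.log | awk '{ sum += $8 } END { printf "%.2f Gbits/sec total\n", sum }'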
The total bandwidth achieved is 2.29 + 2.37 + 2.33 + 2.35 = 9.34 Gb/s. This is effectively the upper limit for TCP on a 10 Gb/s physical link, proving that the WHLE-LS1046A board is able to handle routing at its network interfaces' limit using standard kernel drivers.
WHLE work analysis
Consider the snapshot from the top command run on whle_ls1046_1 during the performance test:
The si column shows the CPU time spent in software interrupts, in this case almost exclusively network interrupts. The near-zero time spent in system or user mode shows that the routing task is carried out in interrupt context alone. The load, spread evenly at ~73% across all cores, stems from picking the right parameters (IP source address, IP destination address, TCP source port, TCP destination port) defining four data flows assigned by the driver's RSS to four separate CPUs. The idle time id at ~25% shows that WHLE operates at 75% capacity, providing a decent margin to account for more realistic routing tasks, with bigger routing tables and less than perfectly CPU-even traffic.
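A comparable per-CPU view can be captured non-interactively, reusing the ^%Cpu filter from the bridge section below; this assumes top is configured to display the per-core summary lines:
whle_ls1046_1
root@whle-ls1046a:~# top -b -n 1 | grep -e ^%Cpu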
L2 Bridge
Connection diagram
Network Setup
PC
root@PC:~# ip netns add isolated_ns
root@PC:~# ip link set ens1f0 netns isolated_ns
root@PC:~# ip netns exec isolated_ns ip addr flush ens1f0
root@PC:~# ip netns exec isolated_ns ip addr add 192.168.30.1/24 dev ens1f0
root@PC:~# ip addr flush ens1f1
root@PC:~# ip address add 192.168.30.2/24 dev ens1f1
whle_ls1046_1
root@whle-ls1046a:~# ip address flush eth4
root@whle-ls1046a:~# ip address flush eth5
root@whle-ls1046a:~# ip link set dev eth4 down
root@whle-ls1046a:~# ip link set dev eth5 down
root@whle-ls1046a:~# brctl addbr br0
root@whle-ls1046a:~# brctl addif br0 eth4
root@whle-ls1046a:~# brctl addif br0 eth5
root@whle-ls1046a:~# ip link set dev br0 up
root@whle-ls1046a:~# ip link set dev eth4 up
root@whle-ls1046a:~# ip link set dev eth5 up
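The bridge membership can be verified before testing; both eth4 and eth5 should be listed as ports of br0:
whle_ls1046_1
root@whle-ls1046a:~# brctl show br0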
Tests
Iperf servers
On the PC, launch four instances of the iperf3 server, listening on ports 5201 to 5204. The ip netns exec command requires root access.
PC
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5201 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5202 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5203 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5204 &
Iperf clients
Run four clients simultaneously:
PC
root@PC:~# ( iperf3 -c 192.168.30.1 --port 5201 --cport 55000 --time 0 --title A &
             iperf3 -c 192.168.30.1 --port 5202 --cport 55002 --time 0 --title B &
             iperf3 -c 192.168.30.1 --port 5203 --cport 55004 --time 0 --title C &
             iperf3 -c 192.168.30.1 --port 5204 --cport 55003 --time 0 --title D & )
A: Connecting to host 192.168.30.1, port 5201
B: Connecting to host 192.168.30.1, port 5202
C: Connecting to host 192.168.30.1, port 5203
D: Connecting to host 192.168.30.1, port 5204
B: [ 5] local 192.168.30.2 port 55002 connected to 192.168.30.1 port 5202
D: [ 5] local 192.168.30.2 port 55003 connected to 192.168.30.1 port 5204
A: [ 5] local 192.168.30.2 port 55000 connected to 192.168.30.1 port 5201
C: [ 5] local 192.168.30.2 port 55004 connected to 192.168.30.1 port 5203
B: [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
B: [ 5]  0.00-1.00  sec   243 MBytes  2.04 Gbits/sec  148    386 KBytes
C: [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
C: [ 5]  0.00-1.00  sec   382 MBytes  3.21 Gbits/sec  243    331 KBytes
D: [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
D: [ 5]  0.00-1.00  sec   251 MBytes  2.11 Gbits/sec  214    250 KBytes
A: [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
A: [ 5]  0.00-1.00  sec   249 MBytes  2.09 Gbits/sec   83    370 KBytes
B: [ 5]  1.00-2.00  sec   210 MBytes  1.76 Gbits/sec  404    454 KBytes
A: [ 5]  1.00-2.00  sec   470 MBytes  3.95 Gbits/sec  173    551 KBytes
C: [ 5]  1.00-2.00  sec   224 MBytes  1.88 Gbits/sec    5    539 KBytes
D: [ 5]  1.00-2.00  sec   218 MBytes  1.83 Gbits/sec   23    362 KBytes
B: [ 5]  2.00-3.00  sec   229 MBytes  1.92 Gbits/sec  422    609 KBytes
...
The addresses and ports are picked in such a way that each iperf3 flow is handled by a different core on whle_ls1046_1 - cores 3, 1, 0, 2, respectively.
Stop all the clients after some time.
root@PC:~# kill $(ps a | grep 'iperf3 -[c]' | awk '{ print $1; }')
...
D: [ 5] 139.00-140.00  sec   280 MBytes  2.35 Gbits/sec  168    611 KBytes
D: [ 5] 140.00-140.95  sec   348 MBytes  3.06 Gbits/sec  108    617 KBytes
B: [ 5] 140.00-140.96  sec   272 MBytes  2.39 Gbits/sec  940    516 KBytes
D: - - - - - - - - - - - - - - - - - - - - - - - - -
D: [ ID] Interval           Transfer     Bitrate         Retr
B: - - - - - - - - - - - - - - - - - - - - - - - - -
A: [ 5] 140.00-140.95  sec   246 MBytes  2.17 Gbits/sec  754    598 KBytes
B: [ ID] Interval           Transfer     Bitrate         Retr
D: [ 5]   0.00-140.95  sec  40.3 GBytes  2.45 Gbits/sec  32702          sender
A: - - - - - - - - - - - - - - - - - - - - - - - - -
A: [ ID] Interval           Transfer     Bitrate         Retr
B: [ 5]   0.00-140.96  sec  37.4 GBytes  2.28 Gbits/sec  56664          sender
D: [ 5]   0.00-140.95  sec  0.00 Bytes   0.00 bits/sec                  receiver
A: [ 5]   0.00-140.95  sec  37.0 GBytes  2.25 Gbits/sec  64981          sender
B: [ 5]   0.00-140.96  sec  0.00 Bytes   0.00 bits/sec                  receiver
C: [ 5] 140.00-140.95  sec   195 MBytes  1.72 Gbits/sec  290    461 KBytes
C: - - - - - - - - - - - - - - - - - - - - - - - - -
C: [ ID] Interval           Transfer     Bitrate         Retr
C: [ 5]   0.00-140.95  sec  38.9 GBytes  2.37 Gbits/sec  34875          sender
C: [ 5]   0.00-140.95  sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
A: [ 5]   0.00-140.95  sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated
Sum the values from the lines with "sender" at the end.
D: [ 5]  0.00-140.95  sec  40.3 GBytes  2.45 Gbits/sec  32702  sender
...
B: [ 5]  0.00-140.96  sec  37.4 GBytes  2.28 Gbits/sec  56664  sender
...
A: [ 5]  0.00-140.95  sec  37.0 GBytes  2.25 Gbits/sec  64981  sender
...
C: [ 5]  0.00-140.95  sec  38.9 GBytes  2.37 Gbits/sec  34875  sender
The total bandwidth achieved is 2.45 + 2.28 + 2.25 + 2.37 = 9.35 Gb/s. This is effectively the upper limit for TCP on a 10 Gb/s physical link, proving that the WHLE-LS1046A board is able to handle bridging at its network interfaces' limit using standard kernel drivers.
WHLE work analysis
Consider the snapshot from the top command run on whle_ls1046_1 during the performance test:
Just like in the case of the router (see WHLE work analysis under Router above), the only meaningful columns are id (idle) and si (software interrupt). Unlike with the router, however, the CPU load in bridge mode has a high variance, so a single top snapshot can be misleading. It is useful to record the numbers for a minute or so:
whle_ls1046_1
top -d 0.5 -b \
    | grep -e ^%Cpu \
    | sed -e 's/[,:]/ /g' \
    | awk '{print $1 "\t" $8 "\t" $14}' \
    | tee cpu-load-id-si-per-5-ds.log
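Once recorded, the per-core averages can be extracted directly from the log. A minimal sketch, assuming the three-column format produced by the pipeline above (core label, id, si):
whle_ls1046_1
awk '{ id[$1] += $2; si[$1] += $3; n[$1]++ }
     END { for (c in id) printf "%s  avg id %.1f%%  avg si %.1f%%\n", c, id[c]/n[c], si[c]/n[c] }' \
    cpu-load-id-si-per-5-ds.log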
Plotting them, along with the averages, yields a graph similar to this one:
From this graph it is clear that every core's idle time oscillates around a 30% average, leaving a healthy margin to account for more realistic bridging scenarios with less than perfectly CPU-even traffic.
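Such a plot can be reproduced with standard tools. A minimal sketch, assuming the log has been copied to the PC and gnuplot is available (the .dat file names are illustrative):
PC
# split the recorded log into one data file per core: time [s], id, si
for cpu in 0 1 2 3; do
    grep "^%Cpu$cpu" cpu-load-id-si-per-5-ds.log \
        | awk '{ print NR * 0.5, $2, $3 }' > "cpu$cpu.dat"
done
# plot the idle percentage of each core over time
gnuplot -p -e 'set xlabel "time [s]"; set ylabel "id [%]"; plot for [c=0:3] sprintf("cpu%d.dat", c) using 1:2 with lines title sprintf("CPU%d", c)'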