
Setup

For the test a single WHLE board was used, identified as whle_ls1046_1, together with a PC equipped with a network card with two 10 Gb/s ports. The single PC was set up to emulate two separate machines serving as endpoints for the WHLE board to route between. Please note that the actual speed test results depend on the CPU power of the PC: for the tests to measure WHLE performance alone, the PC must be able to handle traffic at both ends at a level exceeding the capability of a single WHLE board, with a good margin. In the tests carried out here, a 12-core 4.5 GHz x86_64 machine was more than sufficient for the task.

Before proceeding, please make sure you have followed the common setup described in WHLE-LS1046 kernel DPAA drivers.
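
Before running the tests it is also worth confirming that both PC ports have negotiated a 10 Gb/s link. A minimal check, assuming ethtool is installed and the interface names match those used below (each command should report a Speed of 10000Mb/s):

PC
root@PC:~# ethtool ens1f0 | grep Speed
root@PC:~# ethtool ens1f1 | grep Speed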

Router

Connection diagram

The connection speed from the ens1f1 interface to ens1f0 on the PC will be measured, with whle_ls1046_1 configured as a router. isolated_ns denotes the network namespace in which the ens1f0 interface has to be enclosed to force the PC to send packets through whle_ls1046_1 instead of short-circuiting them to the local interface.

bridge_whle-pc_1.jpg

Network Setup

PC
root@PC:~# ip netns add isolated_ns
root@PC:~# ip link set ens1f0 netns isolated_ns
root@PC:~# ip netns exec isolated_ns ip addr flush ens1f0
root@PC:~# ip netns exec isolated_ns ip addr add 192.168.10.1/24 dev ens1f0
root@PC:~# ip netns exec isolated_ns ip link set dev ens1f0 up
root@PC:~# ip netns exec isolated_ns ip route add 192.168.30.0/24 via 192.168.10.2
root@PC:~# ip addr flush ens1f1
root@PC:~# ip address add 192.168.30.2/24 dev ens1f1
root@PC:~# ip link set dev ens1f1 up
root@PC:~# ip route add 192.168.10.0/24 via 192.168.30.1
whle_ls1046_1
root@whle-ls1046a:~# ip address flush eth1
root@whle-ls1046a:~# ip address flush eth2
root@whle-ls1046a:~# ip address flush eth3
root@whle-ls1046a:~# ip address flush eth5
root@whle-ls1046a:~# ip address flush eth4
root@whle-ls1046a:~# ip addr add 192.168.10.2/24 dev eth5
root@whle-ls1046a:~# ip addr add 192.168.30.1/24 dev eth4
root@whle-ls1046a:~# ip link set dev eth4 up
root@whle-ls1046a:~# ip link set dev eth5 up
root@whle-ls1046a:~# echo 1 > /proc/sys/net/ipv4/ip_forward
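
Before starting the measurements it can be verified that traffic actually traverses whle_ls1046_1. A minimal sanity check (optional, not part of the procedure above) is to ping the PC's ens1f1 address from inside the isolated namespace; replies prove that forwarding on the WHLE board works:

PC
root@PC:~# ip netns exec isolated_ns ping -c 3 192.168.30.2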

Tests

Iperf servers

On the PC, launch four iperf3 server instances, listening on ports 5201-5204. The ip netns exec command requires root access.

PC
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5201 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5202 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5203 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5204 &

Iperf clients

Launch four instances of iperf3 simultaneously.

PC
root@PC:~# (
    iperf3 -c 192.168.10.1 --port 5201 --cport 55000 --time 0 --omit 5 --title A &
    iperf3 -c 192.168.10.1 --port 5202 --cport 55002 --time 0 --omit 5 --title B &
    iperf3 -c 192.168.10.1 --port 5203 --cport 55006 --time 0 --omit 5 --title C &
    iperf3 -c 192.168.10.1 --port 5204 --cport 55001 --time 0 --omit 5 --title D &
)
A:  Connecting to host 192.168.10.1, port 5201
B:  Connecting to host 192.168.10.1, port 5202
C:  Connecting to host 192.168.10.1, port 5203
D:  Connecting to host 192.168.10.1, port 5204
C:  [  5] local 192.168.30.2 port 55006 connected to 192.168.10.1 port 5203
B:  [  5] local 192.168.30.2 port 55002 connected to 192.168.10.1 port 5202
D:  [  5] local 192.168.30.2 port 55001 connected to 192.168.10.1 port 5204
A:  [  5] local 192.168.30.2 port 55000 connected to 192.168.10.1 port 5201
B:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
C:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
B:  [  5]   0.00-1.00   sec   268 MBytes  2.25 Gbits/sec   56    337 KBytes       (omitted)
C:  [  5]   0.00-1.00   sec   285 MBytes  2.39 Gbits/sec  152    433 KBytes       (omitted)
A:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
D:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
D:  [  5]   0.00-1.00   sec   258 MBytes  2.16 Gbits/sec  195    370 KBytes       (omitted)
A:  [  5]   0.00-1.00   sec   285 MBytes  2.39 Gbits/sec  162    584 KBytes       (omitted)
B:  [  5]   1.00-2.00   sec   294 MBytes  2.46 Gbits/sec   15    741 KBytes       (omitted)
...

The port numbers are picked so that every iperf3 flow is handled by a different core on whle_ls1046_1 - cores 0, 2, 1, 3, respectively. The iperf3 calls fix the source port used for the data transfer connection (the --cport parameter). There is a small chance some of these ports are already in use on the system; in that case it's necessary to locate the owning processes with netstat -tnp and kill them.
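
To check up front whether any of the chosen source ports is already taken, a quick sketch using the same netstat call (ss from iproute2 works equally well):

PC
root@PC:~# netstat -tnp | grep -E ':5500[0126] '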

Stop all the clients after some time.

PC
root@PC:~# kill $(ps a | grep 'iperf3 -[c]' | awk '{ print $1; }')
...
B:  [  5]  53.00-53.34  sec   125 MBytes  3.13 Gbits/sec    0   1014 KBytes       
A:  [  5]  53.00-53.33  sec   124 MBytes  3.13 Gbits/sec    0    732 KBytes       
C:  [  5]  53.00-53.34  sec  62.5 MBytes  1.56 Gbits/sec    0    472 KBytes       
B:  - - - - - - - - - - - - - - - - - - - - - - - - -
D:  [  5]  53.00-53.33  sec  61.2 MBytes  1.55 Gbits/sec    0    454 KBytes       
A:  - - - - - - - - - - - - - - - - - - - - - - - - -
C:  - - - - - - - - - - - - - - - - - - - - - - - - -
B:  [ ID] Interval           Transfer     Bitrate         Retr
D:  - - - - - - - - - - - - - - - - - - - - - - - - -
C:  [ ID] Interval           Transfer     Bitrate         Retr
A:  [ ID] Interval           Transfer     Bitrate         Retr
B:  [  5]   0.00-53.34  sec  14.1 GBytes  2.29 Gbits/sec  2306             sender
D:  [ ID] Interval           Transfer     Bitrate         Retr
C:  [  5]   0.00-53.34  sec  14.1 GBytes  2.37 Gbits/sec  2890             sender
A:  [  5]   0.00-53.33  sec  15.2 GBytes  2.33 Gbits/sec  2889             sender
B:  [  5]   0.00-53.34  sec  0.00 Bytes  0.00 bits/sec                  receiver
D:  [  5]   0.00-53.33  sec  14.1 GBytes  2.35 Gbits/sec  2636             sender
C:  [  5]   0.00-53.34  sec  0.00 Bytes  0.00 bits/sec                  receiver
A:  [  5]   0.00-53.33  sec  0.00 Bytes  0.00 bits/sec                  receiver
D:  [  5]   0.00-53.33  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated

Sum the values from the lines ending with "sender".

B:  [  5]   0.00-53.34  sec  14.1 GBytes  2.29 Gbits/sec  2306             sender
...
C:  [  5]   0.00-53.34  sec  14.1 GBytes  2.37 Gbits/sec  2890             sender
...
A:  [  5]   0.00-53.33  sec  15.2 GBytes  2.33 Gbits/sec  2889             sender
...
D:  [  5]   0.00-53.33  sec  14.1 GBytes  2.35 Gbits/sec  2636             sender

The total bandwidth achieved is 2.29 + 2.37 + 2.33 + 2.35 = 9.34 Gb/s. This is effectively the upper limit for TCP on a 10 Gb/s physical link, proving that the WHLE-LS1046A board is able to handle routing at its network interfaces' limit using standard kernel drivers.
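
For longer runs the summation can be scripted instead of done by hand; a minimal sketch, assuming the client output shown above was captured to a file (the name iperf-router.log is only an example):

PC
root@PC:~# grep sender iperf-router.log \
    | awk '{ for (i = 1; i <= NF; i++) if ($i == "Gbits/sec") sum += $(i-1) }
           END { printf "%.2f Gbits/sec total\n", sum }'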

WHLE work analysis

Consider a snapshot from the top command run on whle_ls1046_1 during the performance test:

whle-load.png

The si column shows the CPU time spent in software interrupts - in this case almost exclusively network interrupts. The near-zero system and user times show that the routing task is carried out in the interrupts alone. The load spread evenly at ~73% across all cores stems from picking the right parameters (IP source address, IP destination address, TCP source port, TCP destination port) so that the four data flows are assigned by the driver's RSS to four separate CPUs. The idle time (id) at ~25% shows that the WHLE operates at 75% of capacity, providing a decent margin to account for more realistic routing tasks, with bigger routing tables and less than perfectly CPU-even traffic.
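
A complementary way to confirm that the four flows really land on four different CPUs is to watch the per-CPU NET_RX softirq counters grow during the test; a minimal sketch, assuming the watch utility is available on the board (the counters are cumulative since boot, so only their growth matters):

whle_ls1046_1
root@whle-ls1046a:~# watch -n 1 'grep -E "CPU|NET_RX" /proc/softirqs'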

L2 Bridge

Connection diagram

bridge_whle-pc.drawio.png

Network Setup

PC
root@PC:~# ip netns add isolated_ns
root@PC:~# ip link set ens1f0 netns isolated_ns
root@PC:~# ip netns exec isolated_ns ip addr flush ens1f0 
root@PC:~# ip netns exec isolated_ns ip addr add 192.168.30.1/24 dev ens1f0
root@PC:~# ip addr flush ens1f1
root@PC:~# ip address add 192.168.30.2/24 dev ens1f1
whle_ls1046_1
root@whle-ls1046a:~# ip address flush eth4
root@whle-ls1046a:~# ip address flush eth5
root@whle-ls1046a:~# ip link set dev eth4 down
root@whle-ls1046a:~# ip link set dev eth5 down
root@whle-ls1046a:~# brctl addbr br0
root@whle-ls1046a:~# brctl addif br0 eth4
root@whle-ls1046a:~# brctl addif br0 eth5
root@whle-ls1046a:~# ip link set dev br0 up
root@whle-ls1046a:~# ip link set dev eth4 up
root@whle-ls1046a:~# ip link set dev eth5 up
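
If brctl (from the bridge-utils package) is not available on the board, the same bridge can be set up with iproute2 alone; a sketch of the equivalent commands (the interfaces and the bridge are then brought up with ip link set ... up exactly as above):

whle_ls1046_1
root@whle-ls1046a:~# ip link add name br0 type bridge
root@whle-ls1046a:~# ip link set dev eth4 master br0
root@whle-ls1046a:~# ip link set dev eth5 master br0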

Tests

Iperf servers

On the PC, launch four iperf3 server instances, listening on ports 5201-5204. The ip netns exec command requires root access.

PC
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5201 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5202 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5203 &
root@PC:~# ip netns exec isolated_ns iperf3 -s -p 5204 &

Iperf clients

Run four clients simultaneously:

PC
root@PC:~# (
    iperf3 -c 192.168.30.1 --port 5201 --cport 55000 --time 0 --title A &
    iperf3 -c 192.168.30.1 --port 5202 --cport 55002 --time 0 --title B &
    iperf3 -c 192.168.30.1 --port 5203 --cport 55004 --time 0 --title C &
    iperf3 -c 192.168.30.1 --port 5204 --cport 55003 --time 0 --title D &
)
A:  Connecting to host 192.168.30.1, port 5201
B:  Connecting to host 192.168.30.1, port 5202
C:  Connecting to host 192.168.30.1, port 5203
D:  Connecting to host 192.168.30.1, port 5204
B:  [  5] local 192.168.30.2 port 55002 connected to 192.168.30.1 port 5202
D:  [  5] local 192.168.30.2 port 55003 connected to 192.168.30.1 port 5204
A:  [  5] local 192.168.30.2 port 55000 connected to 192.168.30.1 port 5201
C:  [  5] local 192.168.30.2 port 55004 connected to 192.168.30.1 port 5203
B:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
B:  [  5]   0.00-1.00   sec   243 MBytes  2.04 Gbits/sec  148    386 KBytes       
C:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
C:  [  5]   0.00-1.00   sec   382 MBytes  3.21 Gbits/sec  243    331 KBytes       
D:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
D:  [  5]   0.00-1.00   sec   251 MBytes  2.11 Gbits/sec  214    250 KBytes       
A:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
A:  [  5]   0.00-1.00   sec   249 MBytes  2.09 Gbits/sec   83    370 KBytes       
B:  [  5]   1.00-2.00   sec   210 MBytes  1.76 Gbits/sec  404    454 KBytes       
A:  [  5]   1.00-2.00   sec   470 MBytes  3.95 Gbits/sec  173    551 KBytes       
C:  [  5]   1.00-2.00   sec   224 MBytes  1.88 Gbits/sec    5    539 KBytes       
D:  [  5]   1.00-2.00   sec   218 MBytes  1.83 Gbits/sec   23    362 KBytes       
B:  [  5]   2.00-3.00   sec   229 MBytes  1.92 Gbits/sec  422    609 KBytes       
...

The addresses and ports are picked so that each iperf3 flow is handled by a different core on whle_ls1046_1 - cores 3, 1, 0, 2, respectively.

Stop all the clients after some time.

root@PC:~# kill $(ps a | grep 'iperf3 -[c]' | awk '{ print $1; }')
...
D:  [  5] 139.00-140.00 sec   280 MBytes  2.35 Gbits/sec  168    611 KBytes       
D:  [  5] 140.00-140.95 sec   348 MBytes  3.06 Gbits/sec  108    617 KBytes       
B:  [  5] 140.00-140.96 sec   272 MBytes  2.39 Gbits/sec  940    516 KBytes       
D:  - - - - - - - - - - - - - - - - - - - - - - - - -
D:  [ ID] Interval           Transfer     Bitrate         Retr
B:  - - - - - - - - - - - - - - - - - - - - - - - - -
A:  [  5] 140.00-140.95 sec   246 MBytes  2.17 Gbits/sec  754    598 KBytes       
B:  [ ID] Interval           Transfer     Bitrate         Retr
D:  [  5]   0.00-140.95 sec  40.3 GBytes  2.45 Gbits/sec  32702             sender
A:  - - - - - - - - - - - - - - - - - - - - - - - - -
A:  [ ID] Interval           Transfer     Bitrate         Retr
B:  [  5]   0.00-140.96 sec  37.4 GBytes  2.28 Gbits/sec  56664             sender
D:  [  5]   0.00-140.95 sec  0.00 Bytes  0.00 bits/sec                  receiver
A:  [  5]   0.00-140.95 sec  37.0 GBytes  2.25 Gbits/sec  64981             sender
B:  [  5]   0.00-140.96 sec  0.00 Bytes  0.00 bits/sec                  receiver
C:  [  5] 140.00-140.95 sec   195 MBytes  1.72 Gbits/sec  290    461 KBytes       
C:  - - - - - - - - - - - - - - - - - - - - - - - - -
C:  [ ID] Interval           Transfer     Bitrate         Retr
C:  [  5]   0.00-140.95 sec  38.9 GBytes  2.37 Gbits/sec  34875             sender
C:  [  5]   0.00-140.95 sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
A:  [  5]   0.00-140.95 sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated
iperf3: interrupt - the client has terminated

Sum the values from the lines ending with "sender".

D:  [  5]   0.00-140.95 sec  40.3 GBytes  2.45 Gbits/sec  32702             sender
...
B:  [  5]   0.00-140.96 sec  37.4 GBytes  2.28 Gbits/sec  56664             sender
...
A:  [  5]   0.00-140.95 sec  37.0 GBytes  2.25 Gbits/sec  64981             sender
...
C:  [  5]   0.00-140.95 sec  38.9 GBytes  2.37 Gbits/sec  34875             sender

The total bandwidth achieved is 2.45 + 2.28 + 2.25 + 2.37 = 9.35 Gb/s. This is effectively the upper limit for TCP on a 10 Gb/s physical link, proving that the WHLE-LS1046A board is able to handle bridging at its network interfaces' limit using standard kernel drivers.

WHLE work analysis

Consider a snapshot from the top command run on whle_ls1046_1 during the performance test:

whle-load-bridge-x4.png

Just like in the case of the router (https://conclusive.atlassian.net/wiki/spaces/CW/pages/edit-v2/398721025#WHLE-work-analysis), the only meaningful columns are id (idle) and si (software interrupts). Unlike with the router, however, the CPU load in bridge mode has a high variance, so a single top snapshot can be misleading. It's useful to record the numbers for a minute or so:

whle_ls1046_1
top -d 0.5 -b \
    | grep -e ^%Cpu \
    | sed -e 's/[,:]/ /g' \
    | awk '{print $1 "\t" $8 "\t" $14}' \
    | tee cpu-load-id-si-per-5-ds.log

Plotting them, along with the averages, yields a graph similar to this one:

cpu-load-plot.png

From this graph it's clear that every core's idle time oscillates around a 30% average, leaving a healthy margin to account for more realistic bridging scenarios with less than perfectly CPU-even traffic.
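
The per-core averages can also be computed directly from the recorded log without plotting; a minimal sketch, relying on the log format produced above (tab-separated core label, id and si values):

whle_ls1046_1
awk -F'\t' '{ id[$1] += $2; si[$1] += $3; n[$1]++ }
    END { for (c in id) printf "%s  avg id %.1f%%  avg si %.1f%%\n", c, id[c]/n[c], si[c]/n[c] }' \
    cpu-load-id-si-per-5-ds.log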
