Has anybody noticed the following behavior after installing that fix on R80.30 with the Gaia 2.6 kernel?
Good:
- VPN load is now spread smoothly across the available SND cores (no longer only on the first SND core)
- There are now as many cphwd_q_init_ke PIDs among the top CPU consumers as there are fw_worker instances. Before, there was just one.
Bad:
- All cphwd_q_init_ke PIDs visible in top with high CPU load are assigned to the same CPU core as fw_worker_0.
Let's take a closer look at that:
As an example, here we have an open server with 12 cores. The license allows 8 cores: 2 are assigned as SND cores and 6 as firewall workers, which is pretty much the default split.
# fw ctl affinity -l -r -a -v
CPU 0: eth0 (irq 147)
CPU 1: eth1 (irq 163) eth2 (irq 179)
CPU 2: fw_5
lpd dtlsd fwd wsdnsd in.acapd mpdaemon rad vpnd rtmd in.asessiond usrchkd pepd dtpsd cprid cpd
CPU 3: fw_4
lpd dtlsd fwd wsdnsd in.acapd mpdaemon rad vpnd rtmd in.asessiond usrchkd pepd dtpsd cprid cpd
CPU 4: fw_3
lpd dtlsd fwd wsdnsd in.acapd mpdaemon rad vpnd rtmd in.asessiond usrchkd pepd dtpsd cprid cpd
CPU 5: fw_2
lpd dtlsd fwd wsdnsd in.acapd mpdaemon rad vpnd rtmd in.asessiond usrchkd pepd dtpsd cprid cpd
CPU 6: fw_1
lpd dtlsd fwd wsdnsd in.acapd mpdaemon rad vpnd rtmd in.asessiond usrchkd pepd dtpsd cprid cpd
CPU 7: fw_0
lpd dtlsd fwd wsdnsd in.acapd mpdaemon rad vpnd rtmd in.asessiond usrchkd pepd dtpsd cprid cpd
CPU 8:
CPU 9:
CPU 10:
CPU 11:
All:
The current license permits the use of CPUs 0, 1, 2, 3, 4, 5, 6, 7 only.
Interface eth4: has multi queue enabled
Interface eth5: has multi queue enabled
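For completeness: the same 2 SND / 6 worker split can also be cross-checked against the CoreXL instance listing (output omitted here; I am assuming the command behaves the same on this build):
# fw ctl multik stat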
Let's take a look at the affinity map for all cphwd_q_init_ke PIDs:
# for p in `ps -eo pid,psr,comm | grep "cphwd_q_init_ke" | cut -d" " -f2`; do taskset -p $p; done
pid 8290's current affinity mask: 80
pid 8291's current affinity mask: 80
pid 8292's current affinity mask: 80
pid 8293's current affinity mask: 80
pid 8294's current affinity mask: 80
pid 8295's current affinity mask: 80
pid 8296's current affinity mask: 80
pid 8297's current affinity mask: 80
pid 8298's current affinity mask: 80
pid 8299's current affinity mask: 80
pid 8300's current affinity mask: 80
pid 8301's current affinity mask: 80
pid 8302's current affinity mask: fff
pid 8330's current affinity mask: 80
pid 8331's current affinity mask: 80
pid 8332's current affinity mask: 80
pid 8333's current affinity mask: 80
pid 8334's current affinity mask: 80
pid 8335's current affinity mask: 80
pid 8336's current affinity mask: 80
pid 8337's current affinity mask: 80
pid 8338's current affinity mask: 80
pid 8339's current affinity mask: 80
pid 8340's current affinity mask: 80
pid 8341's current affinity mask: 80
pid 8342's current affinity mask: fff
So we have 26 cphwd_q_init_ke PIDs (why 26?). Two of them have no CPU restriction at all (mask fff covers all 12 cores), while the other 24 are pinned to core 7 (hex 80 is binary 10000000, and cores are counted from right to left in this bitmask, so only bit 7 is set).
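By the way, if the taskset build on Gaia supports the -c flag (standard util-linux taskset does, I just have not verified it on this particular build), it can print the CPU list directly, so no manual hex-to-binary conversion is needed:
# for p in `ps -eo pid,comm | grep cphwd_q_init_ke | awk '{print $1}'`; do taskset -cp $p; done
That should print something like "current affinity list: 7" for the pinned threads instead of the 80 mask.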
After installing this patch, top on a highly loaded VPN gateway looks like this:
top - 09:56:01 up 2 days, 20:52, 1 user, load average: 7.45, 7.21, 6.94
Tasks: 233 total, 7 running, 226 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 43.1%id, 0.0%wa, 2.0%hi, 54.9%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 45.1%id, 0.0%wa, 2.0%hi, 52.9%si, 0.0%st
Cpu2 : 1.9%us, 3.8%sy, 0.0%ni, 59.6%id, 0.0%wa, 0.0%hi, 34.6%si, 0.0%st
Cpu3 : 1.9%us, 3.8%sy, 0.0%ni, 52.8%id, 0.0%wa, 0.0%hi, 41.5%si, 0.0%st
Cpu4 : 0.0%us, 4.0%sy, 0.0%ni, 66.0%id, 0.0%wa, 0.0%hi, 30.0%si, 0.0%st
Cpu5 : 1.9%us, 1.9%sy, 0.0%ni, 63.5%id, 0.0%wa, 0.0%hi, 32.7%si, 0.0%st
Cpu6 : 2.0%us, 2.0%sy, 0.0%ni, 66.7%id, 0.0%wa, 0.0%hi, 29.4%si, 0.0%st
Cpu7 : 0.0%us, 2.0%sy, 0.0%ni, 9.8%id, 0.0%wa, 0.0%hi, 88.2%si, 0.0%st
Cpu8 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32680048k total, 10670220k used, 22009828k free, 465576k buffers
Swap: 33551744k total, 0k used, 33551744k free, 3085604k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
6754 admin 15 0 0 0 0 S 43 0.0 103:28.34 3 fw_worker_4
6755 admin 15 0 0 0 0 R 37 0.0 97:36.02 2 fw_worker_5
6750 admin 15 0 0 0 0 R 33 0.0 92:26.70 7 fw_worker_0
6753 admin 15 0 0 0 0 R 33 0.0 90:05.07 4 fw_worker_3
6752 admin 15 0 0 0 0 S 31 0.0 98:53.03 5 fw_worker_2
6751 admin 15 0 0 0 0 S 29 0.0 94:25.99 6 fw_worker_1
8298 admin 15 0 0 0 0 S 25 0.0 27:53.63 7 cphwd_q_init_ke
8297 admin 15 0 0 0 0 R 14 0.0 22:37.70 7 cphwd_q_init_ke
8292 admin 15 0 0 0 0 S 8 0.0 20:27.46 7 cphwd_q_init_ke
8296 admin 15 0 0 0 0 S 8 0.0 31:20.98 7 cphwd_q_init_ke
16558 admin 15 0 772m 216m 43m R 8 0.7 88:07.54 3 fw_full
17016 admin 15 0 327m 101m 31m S 6 0.3 22:11.44 3 vpnd
8290 admin 15 0 0 0 0 S 2 0.0 28:00.28 7 cphwd_q_init_ke
8291 admin 15 0 2248 1196 836 R 4 0.0 71:34.38 5 top
8290 admin 15 0 0 0 0 R 2 0.0 28:00.27 7 cphwd_q_init_ke
So we see only 6 of these 26 PIDs for cphwd_q_init_ke here.
Why?
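My guess is that the remaining threads are simply idle at that moment, so they fall below what fits on the top screen. To verify that they still exist and to see which core each of them last ran on (the psr column), something like this should do:
# ps -eo pid,psr,pcpu,comm | grep cphwd_q_init_ke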
Of course, we could use our root privileges here to reassign these PIDs to other cores with taskset, but I guess that would be considered unsupported, and maybe we would break something by doing it. Maybe the devs had a good reason for binding these PIDs to the same core as fw_worker_0.
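Just to illustrate what that reassignment could look like (purely an example, not a recommendation; I have not tried this on a production gateway, and the target cores are arbitrary): a single thread could be moved with taskset -pc, or all of them could be spread across the worker cores:
# taskset -pc 2 8290
# for p in `ps -eo pid,comm | grep cphwd_q_init_ke | awk '{print $1}'`; do taskset -pc 2-7 $p; done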
Any thoughts on that?
@PhoneBoy Maybe you can ask R&D whether this is the desired behaviour with this patch? That would really help.