Hello to all,
This is my first post to this community; it's a privilege to be here, and I'm confident the high level of technical expertise around will be of great assistance.
I recently updated our R80.30 cluster to JHF Take 219, both because several private fixes we had were rolled into it and to improve overall performance. The cluster consists of 2x 23900 appliances in Active/Standby, with multiple 10G interfaces, Multi-Queue enabled (plus some static affinity configured) and Hyper-Threading disabled.
This translates to 6 CPUs for SND and 30 for CoreXL firewall workers. Of the 6 SND cores, 4 are allocated to Multi-Queue and 2 handle the remaining physical interfaces:
[Expert@WALL1.2:0]# cpmq get
Active ixgbe interfaces:
eth1-01 [On]
eth1-02 [Off]
eth4-01 [On]
eth4-02 [Off]
Active igb interfaces:
Mgmt [Off]
Sync [Off]
eth2-01 [Off]
eth2-02 [Off]
eth2-03 [On]
eth2-04 [On]
[Expert@WALL1.2:0]# fw ctl affinity -l -r | grep -e CPU\ 4 -e CPU\ 5
CPU 4: eth4-02 Sync eth2-01
CPU 5: Mgmt eth1-02 eth2-02
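For reference, the split above can be cross-checked with the standard CoreXL views (nothing box-specific, listed just for completeness):

fw ctl multik stat     # one line per fw_worker instance: its CPU, current connections and peak
fw ctl affinity -l     # affinity per interface and per fw kernel instance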
We had also configured 4 RX queues for Multi-Queue. Since yesterday morning, however, we have been seeing heavy utilization on CPU cores 4 and 5, which in the end impacted all traffic traversing the interfaces pinned to them. After a lot of troubleshooting, together with CP Support, what we ended up doing was changing the interfaces participating in Multi-Queue (since there is a limitation of only 5 interfaces on which it can be enabled), reducing CoreXL from 30 to 28 instances, and statically assigning affinity for eth1-01 and eth4-01 to the newly freed SND cores (roughly the steps sketched after the outputs below):
[Expert@WALL1.1:0]# sim affinity -l
eth4-01 : 6
eth2-01 : 4
eth1-01 : 7
eth2-02 : 5
Mgmt : 5
Sync : 4
Multi queue interfaces: eth1-02 eth4-02 eth2-03 eth2-04
[Expert@WALL1.1:0]# fw ctl affinity -l -r
CPU 0:
CPU 1:
CPU 2:
CPU 3:
CPU 4: Sync eth2-01
CPU 5: Mgmt eth2-02
CPU 6: eth4-01
CPU 7: eth1-01
[Expert@WALL1.1:0]# cpmq get
Active ixgbe interfaces:
eth1-01 [Off]
eth1-02 [On]
eth4-01 [Off]
eth4-02 [On]
Active igb interfaces:
Mgmt [Off]
Sync [Off]
eth2-01 [Off]
eth2-02 [Off]
eth2-03 [On]
eth2-04 [On]
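For completeness, this is roughly the sequence we went through to reach the layout above. cpmq set, cpconfig and sim affinity -s are interactive dialogs rather than one-liners, so treat the lines below as a sketch of the steps, not exact syntax (the cpmq queue-count syntax is from memory of sk98348, please verify):

cpmq set                   # choose which ixgbe/igb interfaces have Multi-Queue enabled (max 5 interfaces)
cpmq set rx_num ixgbe 4    # keep 4 RX queues per Multi-Queue interface
cpconfig                   # CoreXL menu: reduce firewall instances from 30 to 28 (reboot required)
sim affinity -s            # statically pin eth1-01 and eth4-01 to the freed cores 6 and 7
sim affinity -l            # verify, as shown above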
Now, even though performance is much better, the SND cores are still fairly busy, and we have not yet reached the day's peak, e.g.:
[Expert@WALL1.1:0]# cpstat -f multi_cpu os
Processors load
---------------------------------------------------------------------------------
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
---------------------------------------------------------------------------------
| 1| 0| 38| 62| 38| ?| 87408|
| 2| 0| 52| 48| 52| ?| 87409|
| 3| 0| 35| 65| 35| ?| 87409|
| 4| 0| 31| 69| 31| ?| 87410|
| 5| 0| 26| 74| 26| ?| 87411|
| 6| 0| 42| 58| 42| ?| 87411|
| 7| 0| 17| 83| 17| ?| 87412|
| 8| 0| 47| 53| 47| ?| 87413|
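To see where that time actually goes on the SND cores, I have been watching them live with the usual tools; listing them here mainly so nothing obvious is missed (on the SND cores the load typically shows up as softirq rather than user space):

top        # press 1 for per-CPU lines; on the SND cores look at the si (softirq) column
cpview     # live per-CPU, per-interface and acceleration counters in one place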
Also, SecureXL seems to do a decent job:
[Expert@WALL1.1:0]# fwaccel stats -s
Accelerated conns/Total conns : 5212/40836 (12%)
Accelerated pkts/Total pkts : 466598460/846769916 (55%)
F2Fed pkts/Total pkts : 87904870/846769916 (10%)
F2V pkts/Total pkts : 4047817/846769916 (0%)
CPASXL pkts/Total pkts : 0/846769916 (0%)
PSLXL pkts/Total pkts : 292266586/846769916 (34%)
QOS inbound pkts/Total pkts : 0/846769916 (0%)
QOS outbound pkts/Total pkts : 0/846769916 (0%)
Corrected pkts/Total pkts : 0/846769916 (0%)
[Expert@WALL1.1:0]# fwaccel stat
+-----------------------------------------------------------------------------+
|Id|Name |Status |Interfaces |Features |
+-----------------------------------------------------------------------------+
|0 |SND |enabled |Mgmt,Sync,eth2-01, |
| | | |eth2-02,eth2-03,eth2-04, |
| | | |eth4-01,eth4-02,eth1-01, |
| | | |eth1-02,pimreg0 |Acceleration,Cryptography |
| | | | |Crypto: Tunnel,UDPEncap,MD5, |
| | | | |SHA1,NULL,3DES,DES,CAST, |
| | | | |CAST-40,AES-128,AES-256,ESP, |
| | | | |LinkSelection,DynamicVPN, |
| | | | |NatTraversal,AES-XCBC,SHA256 |
+-----------------------------------------------------------------------------+
Accept Templates : enabled
Drop Templates : enabled
NAT Templates : enabled
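For the acceleration picture itself, the usual breakdowns I know of (beyond stats -s above) are these; I am listing them in case someone has a smarter way to correlate the F2F/PSLXL share with actual connections:

fwaccel stats -p             # F2F (slow path) packets broken down by violation reason
fw tab -t connections -s     # size and peak of the connections table, to spot connection-count spikes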
So, after all the above, my question is: since this looks like a connection-handling issue, how do I find out what is causing all this CPU utilization? How can I identify which traffic could be causing it? During the high load I ran a tcpdump and a cpmonitor analysis to try to identify traffic that is new (?) or could break acceleration, but to no avail. I have also added some fast_accel rules for backup traffic and CIFS (CIFS was already in place), but could not measure any performance gain from them. And yes, this gateway cluster is indeed terminating VPNs, and I have already configured cphwd_medium_path_qid_by_mspi=0, which really skyrocketed VPN performance.
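In case it matters for the answer, this is roughly what those last items looked like on our side: the capture analysis, the fast_accel rules (per sk156672; argument order quoted from memory, so please verify against the SK) and the kernel parameter, set both on the fly and persistently. IPs, interface and file names below are examples only, not our real values:

tcpdump -nni eth1-01 -w /var/log/spike.pcap             # capture taken during the high load
cpmonitor /var/log/spike.pcap                           # offline report: top talkers, services, connection rates
fw ctl fast_accel enable
fw ctl fast_accel add 10.10.10.0/24 10.20.20.5 445 6    # example rule: src, dst, dport, IP proto (6 = TCP)
fw ctl fast_accel show_table                            # list the currently active fast_accel rules
fw ctl set int cphwd_medium_path_qid_by_mspi 0          # on the fly
echo "cphwd_medium_path_qid_by_mspi=0" >> $FWDIR/boot/modules/fwkern.conf    # persistent across reboot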
Thanks to anyone who just read through all of the above.