High CPU utilization causing slowness
Last week we faced an application slowness issue, and while checking we found one of the firewall's CPU cores reaching 100% utilization. We temporarily resolved the problem with a traffic failover, but even on the secondary firewall the same CPU cores reach 80-90% at peak times.
The gateways are open servers (DL380 G9) with 16 cores (and 16-core licenses). CoreXL is currently enabled with 14 firewall instances. SecureXL is enabled, but acceleration breaks at a rule in the middle of the policy. We have not enabled Multi-Queue yet.
# fw ctl affinity -l -r
CPU 0: eth2 eth9
CPU 1: fw_13
CPU 2: fw_11
CPU 3: fw_9
CPU 4: fw_7
CPU 5: fw_5
CPU 6: fw_3
CPU 7: fw_1
CPU 8: eth1 eth11
CPU 9: fw_12
CPU 10: fw_10
CPU 11: fw_8
CPU 12: fw_6
CPU 13: fw_4
CPU 14: fw_2
CPU 15: fw_0
All: fwd mpdaemon in.geod cpd cprid
# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 15 | 8110 | 24285
1 | Yes | 7 | 3651 | 7632
2 | Yes | 14 | 3079 | 10424
3 | Yes | 6 | 4763 | 16841
4 | Yes | 13 | 4788 | 22256
5 | Yes | 5 | 3046 | 10010
6 | Yes | 12 | 4332 | 10062
7 | Yes | 4 | 4389 | 12022
8 | Yes | 11 | 5155 | 21808
9 | Yes | 3 | 4062 | 21996
10 | Yes | 10 | 4062 | 15100
11 | Yes | 2 | 18330 | 26722
12 | Yes | 9 | 8003 | 31327
13 | Yes | 1 | 3604 | 15908
# cpmq get
Active ixgbe interfaces:
eth1 [Off]
eth2 [Off]
Active igb interfaces:
eth11 [Off]
eth9 [Off]
Should I adjust the CoreXL settings (there are already 14 firewall instances)? Is there a way to distribute the eth1 and eth2 load across more CPUs? Should I enable Multi-Queue?
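A quick way to confirm which cores are saturating is a per-core CPU view; a minimal sketch, assuming Gaia expert mode (core roles below are taken from the affinity output above):
# top
(press 1 to expand the summary into one line per CPU; on this box, CPU 0 and CPU 8 are the SND cores handling eth2/eth9 and eth1/eth11)
# cpview
(CPView's CPU section shows the same per-core load with Check Point context)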
Hi @LowJ,
Can you also provide the following outputs? Then we can say more:
top (press 1 for the per-core view)
fwaccel stats -s
enabled_blades
If you use 10 Gbit/s interfaces you should use MQ; with 1 Gbit/s it is not always necessary.
Hi @HeikoAnkenbrand,
Please find the outputs below.
# fwaccel stats -s
Accelerated conns/Total conns : 59761/76014 (78%)
Accelerated pkts/Total pkts : 15656842377/16281837058 (96%)
F2Fed pkts/Total pkts : 602750666/16281837058 (3%)
PXL pkts/Total pkts : 22244015/16281837058 (0%)
QXL pkts/Total pkts : 0/16281837058 (0%)
# enabled_blades
fw ips
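On the earlier point that SecureXL acceleration breaks in the middle of the policy: fwaccel stat (without -s) reports the accelerator status and which rule disables accept templates; a minimal extra check, assuming the same expert-mode shell as the outputs above:
# fwaccel stat
(look for the Accept Templates line; it typically names the rule from which templating is disabled)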
Accepted Solution
Hi @LowJ,
It is good to see that most of the packets take the acceleration path (96%).
I would reduce the number of CoreXL instances and use more SNDs. In your case I would start with a 6/10 (SND/CoreXL) split instead of 2/14. If that is not enough, you can change to 8/8.
I would also activate Multi-Queue for the 10 Gbit/s interfaces (ixgbe driver).
More reading here:
R80.x - Performance Tuning Tip - Multi Queue
R80.x - Security Gateway Architecture (Logical Packet Flow)
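For reference, a minimal sketch of how both changes are made on a 2.6.18-kernel gateway like this one (menu wording varies slightly by version, and each step needs its own reboot):
# cpconfig
(choose the Check Point CoreXL menu and set the number of firewall instances to 10; the remaining 6 cores become SNDs after a reboot)
# cpmq set
(interactive; enable Multi-Queue on the ixgbe interfaces eth1 and eth2, then reboot)
# cpmq get
(after the reboot, the interfaces should show [On])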
Hi @HeikoAnkenbrand,
As I understand it, changing the number of CoreXL instances requires a reboot, so it has to be done in a maintenance window; I may not be able to do that soon.
Does enabling Multi-Queue on the 10 Gbit/s interfaces depend on the SND/CoreXL assignment? In other words, could I keep the current 2/14 (SND/CoreXL) split for now (changing it needs a maintenance window) and enable Multi-Queue first, which I understand can be activated via the WebUI with no reboot required?
Thanks for your help.
Agree with Heiko here, I'd move to a 6/10 split by reducing firewall workers/kernel instances from 14 to 10. Enabling Multi-Queue before adjusting the split will NOT help and will probably make things worse by putting even more load on the SND cores. Adjust the split and reboot. Then enable Multi-Queue on your busy interfaces and reboot. Do not try to adjust the split and enable Multi-Queue with only one reboot as that will cause all kinds of problems.
What code version and Gaia kernel (2.6.18 vs. 3.10) are you using?
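Both can be read from expert mode; a small sketch using standard commands:
# fw ver
(the installed Check Point version)
# uname -r
(the running kernel: 2.6.18-based vs. 3.10-based Gaia)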
Still on the 2.6.18 kernel.
Noted on those recommendations.
We will plan to adjust the SND/CoreXL split first, and only then enable Multi-Queue on the interfaces.
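Once both maintenance windows are done, the commands already used in this thread can confirm the end state; a sketch, with expected values assuming the 6/10 split is applied:
# fw ctl affinity -l -r
(should now show ten fw instances, with the interface queues spread across the six SND cores)
# fw ctl multik stat
(ten workers, IDs 0-9)
# cpmq get
(eth1 and eth2 reported as [On])
# fwaccel stats -s
(the ~96% accelerated-packet ratio should hold or improve)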
