EnriqueGD
Participant

High CPU in only one core R77.30

Hello,

 

We are having performance issues on one machine, and we believe they are due to high CPU utilization on one of the cores:

 

1.- CPU 3 varies between 80% and 100% under normal conditions

 

| CPU: |
| |
| Num of CPUs: 4 |
| |
| CPU Used |
| 3 99% | <<<<<<<<<<<<<<<<
| 1 47% |
| 0 36%

 

[Expert@fw-extra-jc-02:0]# cpstat -f multi_cpu os

Processors load
---------------------------------------------------------------------------------
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
---------------------------------------------------------------------------------
| 1| 4| 10| 85| 15| ?| 49|
| 2| 3| 2| 95| 5| ?| 49|
| 3| 3| 2| 96| 4| ?| 49|
| 4| 0| 94| 6| 94| ?| 49|  <<<<<<<<<<
---------------------------------------------------------------------------------
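For anyone who wants to spot a saturated core automatically, here is a minimal Python sketch that parses the table above. The function names and the 80% threshold are illustrative assumptions, not part of any Check Point tooling. Note that cpstat appears to number CPUs from 1, so its CPU# 4 seems to correspond to CPU 3 in the other outputs.

```python
# Sample copied from the post above; the "<<<<" marker is the poster's own highlight.
SAMPLE = """\
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
| 1| 4| 10| 85| 15| ?| 49|
| 2| 3| 2| 95| 5| ?| 49|
| 3| 3| 2| 96| 4| ?| 49|
| 4| 0| 94| 6| 94| ?| 49|  <<<<<<<<<<
"""


def parse_multi_cpu(text):
    """Return {cpu_number: usage_percent} from `cpstat -f multi_cpu os` output."""
    usage = {}
    for line in text.splitlines():
        # Keep only the cells between the first and last pipe, which also
        # discards anything typed after the closing pipe.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if len(cells) == 7 and cells[0].isdigit():
            usage[int(cells[0])] = int(cells[4])  # column 5 is Usage(%)
    return usage


def hot_cpus(usage, threshold=80):
    """Return the CPU numbers whose usage is at or above the (assumed) threshold."""
    return sorted(cpu for cpu, pct in usage.items() if pct >= threshold)


if __name__ == "__main__":
    usage = parse_multi_cpu(SAMPLE)
    print(usage)
    print(hot_cpus(usage))
```

Run against periodic samples, a check like this shows whether the same core stays hot over time or whether the load moves around.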

 

2.- We can see that the FW drops traffic due to the high CPU utilization:

Drops: |
| |
| Software Blades 2,121,406,315 |
| Interface incoming drops 5,107 |
| Instance high CPU 293,267 | <<<<<<<<<<<<<<<<<<<<<<<<<
| Rulebase 26,058,776 |
| Capacity 0 |
| SecureXL 0 |
| Drop out of state TCP enabled

 

3.- The affinity is as follows:

[Expert@fw-extra-jc-02:0]# fw ctl affinity -l -r
CPU 0: eth2 eth3 eth6 eth7 eth8 eth12 eth13
CPU 1: fw_2
CPU 2: fw_1
CPU 3: fw_0
All: fwpushd rtmd mpdaemon fwd vpnd cprid cpd
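To tie a busy CPU back to the interface or firewall instance bound to it, the affinity listing can be turned into a lookup table. This is only a sketch with hypothetical helper names, assuming the line layout shown above.

```python
SAMPLE = """\
CPU 0: eth2 eth3 eth6 eth7 eth8 eth12 eth13
CPU 1: fw_2
CPU 2: fw_1
CPU 3: fw_0
All: fwpushd rtmd mpdaemon fwd vpnd cprid cpd
"""


def parse_affinity(text):
    """Return {cpu_number: [names]} from `fw ctl affinity -l -r` output."""
    cpu_map = {}
    for line in text.splitlines():
        label, sep, names = line.partition(":")
        # Lines such as "All: ..." are not pinned to one CPU and are skipped.
        if sep and label.startswith("CPU "):
            cpu_map[int(label.split()[1])] = names.split()
    return cpu_map


def cpu_of(cpu_map, name):
    """Return the CPU that an interface or fw instance is pinned to, or None."""
    for cpu, names in cpu_map.items():
        if name in names:
            return cpu
    return None


if __name__ == "__main__":
    cpu_map = parse_affinity(SAMPLE)
    print(cpu_of(cpu_map, "fw_0"))
```

With the listing above, the lookup confirms that the busy CPU 3 hosts fw_0, while all interfaces are handled on CPU 0.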

 

4.- The output of fwaccel is as follows:

[Expert@fw-extra-jc-02:0]# fwaccel stat
Accelerator Status : on
Accept Templates : disabled by Firewall 
disabled from rule #1427 <<<<<<<<<<<<<<<<<<<<<<<<<<<
Drop Templates : disabled
NAT Templates : disabled by user <<<<<<<<<<<<<<<<<<<<<<<<<<<

Accelerator Features : Accounting, NAT, Cryptography, Routing,
HasClock, Templates, Synchronous, IdleDetection,
Sequencing, TcpStateDetect, AutoExpire,
DelayedNotif, TcpStateDetectV2, CPLS, McastRouting,
WireMode, DropTemplates, NatTemplates,
Streaming, MultiFW, AntiSpoofing, Nac,
ViolationStats, AsychronicNotif, ERDOS,
NAT64, GTPAcceleration, SCTPAcceleration,
McastRoutingV2
Cryptography Features : Tunnel, UDPEncapsulation, MD5, SHA1, NULL,
3DES, DES, CAST, CAST-40, AES-128, AES-256,
ESP, LinkSelection, DynamicVPN, NatTraversal,
EncRouting, AES-XCBC, SHA256

 

<<< I can't understand why it says that "Accept Templates" are disabled from rule #1427, because we don't have that many rules defined.

<<< Is it advisable to enable NAT Templates?

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

 

5.- Most of the packets that the FW receives go to the slow path. Is that percentage normal?

[Expert@fw-extra-jc-02:0]# fwaccel stats -s
Accelerated conns/Total conns : 357/4899 (7%)
Accelerated pkts/Total pkts : 60146117/347837617 (17%)
F2Fed pkts/Total pkts : 286984245/347837617 (82%) <<<<<<<<<<<
PXL pkts/Total pkts : 707255/347837617 (0%)
QXL pkts/Total pkts : 0/347837617 (0%)
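The percentages above can be recomputed from the raw counters, which is handy when comparing samples taken at different times. The sketch below is an illustration with made-up function names. It uses integer division because the displayed figures look truncated rather than rounded (the F2F ratio is about 82.5%, shown as 82%).

```python
import re

SAMPLE = """\
Accelerated conns/Total conns : 357/4899 (7%)
Accelerated pkts/Total pkts : 60146117/347837617 (17%)
F2Fed pkts/Total pkts : 286984245/347837617 (82%) <<<<<<<<<<<
PXL pkts/Total pkts : 707255/347837617 (0%)
QXL pkts/Total pkts : 0/347837617 (0%)
"""

# label : part/total, ignoring whatever follows (percentage, hand-added markers)
COUNTER = re.compile(r"^\s*(?P<label>.+?)\s*:\s*(?P<part>\d+)/(?P<total>\d+)")


def parse_fwaccel_stats_s(text):
    """Return {label: (part, total)} from `fwaccel stats -s` output."""
    stats = {}
    for line in text.splitlines():
        match = COUNTER.match(line)
        if match:
            stats[match.group("label")] = (
                int(match.group("part")),
                int(match.group("total")),
            )
    return stats


def percent(part, total):
    """Whole-number percentage, truncated; 0 when there is no traffic."""
    return part * 100 // total if total else 0


if __name__ == "__main__":
    for label, (part, total) in parse_fwaccel_stats_s(SAMPLE).items():
        print(label, percent(part, total))
```

In this sample the accelerated, F2F and PXL packet counters add up exactly to the total, so the three paths account for all traffic seen.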

 

Thanks a lot in advance.

 

Regards,

Enrique.

Timothy_Hall
Champion

Need to also see output of enabled_blades, free -m, netstat -ni, and fw ctl multik stat.

Given that CPU #3 where the high load is being encountered also is the lowest-numbered Firewall Worker instance (fw_0), my guess is you have a lot of VPN and/or VoIP traffic.  In R77.30 and earlier this type of traffic may only be processed in fw_0 (the "IPSec core") and may not be balanced among the other Firewall Workers, even if the Dynamic Dispatcher is enabled.  This limitation was lifted in R80.10 (sk118097: MultiCore Support for IPsec VPN in R80.10 and above).

First off, make sure the Dynamic Dispatcher is active as it is not enabled by default on R77.30.  This won't directly help your VPN/VoIP problem but will keep the Firewall Workers more balanced in general.

To alleviate the bottleneck on CPU #3, there are two courses of action for version R77.30 which will help a little bit, but upgrading to at least R80.10 will help the most:

1) Quoted from the first edition of my book:

However there is a kernel variable called fwmultik_dispatch_skip_global that,
when changed from its default of 0 to a value of 1, will instruct the IPSec Firewall
Worker Core to stop acting as a “generic” Firewall Worker Core and focus exclusively
on IPSec operations, and nothing else. Needless to say this can dramatically reduce the
overall utilization of the IPSec core and free up resources for more timely IPSec
processing.

This new kernel variable was actually introduced in R77 as part of Hyperspect
(covered in Chapter 8), so the information about this kernel variable is located here:
sk93000: SMT (HyperThreading) Feature Guide. Do NOT attempt to set this kernel
variable “on the fly” with the fw ctl set command; doing so is not supported. This
kernel variable must be set in the fwkern.conf file and the firewall rebooted for the
change to take effect. Keep in mind though that you are essentially reducing the
effective number of Firewall Worker Cores by one, so the general recommendation is to
only consider setting this variable if you have at least 6 Firewall Worker Cores and the
aggregate idle percentage of all Firewall Worker Cores prior to the change is greater than
30% during the firewall’s busiest period. You can verify the new behavior after the
reboot by running fw ctl multik stat again and noting the low connection count
passing through the lead IPSec Firewall Worker Core; the only connections it processes
now are associated with IPSec VPN tunnels (and VoIP as mentioned earlier).
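The rule of thumb in the quote (at least 6 Firewall Worker Cores and more than 30% aggregate idle during the busiest period) can be written down as a small check. This is a sketch, not an official tool: the function name is hypothetical, and "aggregate idle percentage" is read here as the average idle percentage across the worker cores, which is an assumption.

```python
def skip_global_worth_considering(worker_idle_percents, min_workers=6, min_idle=30.0):
    """Apply the quoted rule of thumb for fwmultik_dispatch_skip_global.

    worker_idle_percents: idle % of each Firewall Worker core, sampled
    during the busiest period. Averaging them is this sketch's assumption.
    """
    if len(worker_idle_percents) < min_workers:
        return False
    mean_idle = sum(worker_idle_percents) / len(worker_idle_percents)
    return mean_idle > min_idle


if __name__ == "__main__":
    # Three workers, as on the gateway in this thread: too few cores.
    print(skip_global_worth_considering([95, 96, 6]))
    # A hypothetical six-worker box with plenty of headroom.
    print(skip_global_worth_considering([40, 45, 50, 55, 60, 35]))
```

A gateway with only three workers fails the first condition regardless of how idle the other cores are.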

2) Tune your configuration such that more IPSec VPN traffic can be accelerated by SecureXL and get handled on Core #0.  To what degree you will be able to accomplish this will depend on the results of the four commands I requested at the start of the post.  You do have a very high F2F percentage, so this may be a productive course of action.

Bottom line: this is a well-known issue in R77.30 and earlier and you really should upgrade; R77.30 isn't even supported anymore.

 

"Max Capture: Know Your Packets" Video Series
now available at http://www.maxpowerfirewalls.com
EnriqueGD
Participant

Hello Timothy,

 

Thanks for your answer, I really appreciate it.

 

You are totally right that the main function of this device is establishing VPNs. Here is the output of the commands you asked for.

===

[Expert@fw-extra-jc-02:0]# enabled_blades
fw vpn cvpn
[Expert@fw-extra-jc-02:0]# free -m
             total       used       free     shared    buffers     cached
Mem:          3976       3849        126          0        275       1279
-/+ buffers/cache:       2294       1682
Swap:         9470          0       9470

===
[Expert@fw-extra-jc-02:0]# netstat -ni
Kernel Interface table
Iface        MTU Met      RX-OK RX-ERR RX-DRP RX-OVR      TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0       1500   0 1251166100      0   5107      0 2644862534      0      0      0 BMmRU
bond0.3953  1500   0          0      0      0      0     268461      0      0      0 BMmRU
bond0.4060  1500   0 1238189811      0      0      0 2638452065      0      0      0 BMmRU
bond0.4063  1500   0          0      0      0      0          0      0      0      0 BMmRU
bond0.4078  1500   0      15256      0      0      0      18670      0      0      0 BMmRU
bond0.4091  1500   0    6039554      0      0      0    6065573      0      0      0 BMmRU
eth2        1500   0 2733612021      0   2569      0 2632359481      0      0      0 BMsRU
eth3        1500   0 2812521375      0   2538      0   12503053      0      0      0 BMsRU
eth6        1500   0    1957444      0      0      0   35326750      0      0      0 BMRU
eth7        1500   0   30039231      0      0      0   52396317      0      0      0 BMRU
eth8        1500   0   29855038      0      0      0   52201893      0      0      0 BMRU
eth12       1500   0          0      0      0      0          0      0      0      0 BMU
eth13       1500   0          0      0      0      0          0      0      0      0 BMU
lo         16436   0   41494040      0      0      0   41494040      0      0      0 LRU
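As a side note on reading the interface counters: a short Python sketch like the one below can express RX-DRP relative to RX-OK per interface. The helper names are illustrative, and no particular "acceptable" ratio is implied. In this output the 5107 drops on bond0 equal the sum of the drops on eth2 and eth3 (which appear to be the bond members) and match the "Interface incoming drops" figure from the first post.

```python
# A few rows copied from the output above.
SAMPLE = """\
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0 1500 0 1251166100 0 5107 0 2644862534 0 0 0 BMmRU
eth2 1500 0 2733612021 0 2569 0 2632359481 0 0 0 BMsRU
eth3 1500 0 2812521375 0 2538 0 12503053 0 0 0 BMsRU
eth12 1500 0 0 0 0 0 0 0 0 0 BMU
"""


def parse_netstat_ni(text):
    """Return {iface: {"rx_ok": int, "rx_drp": int}} from `netstat -ni` output."""
    header = None
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Iface":
            header = fields
        elif header and len(fields) == len(header):
            row = dict(zip(header, fields))
            rows[row["Iface"]] = {
                "rx_ok": int(row["RX-OK"]),
                "rx_drp": int(row["RX-DRP"]),
            }
    return rows


def rx_drop_ratio(counters):
    """Dropped receive packets relative to RX-OK; 0.0 for idle interfaces."""
    if counters["rx_ok"] == 0:
        return 0.0
    return counters["rx_drp"] / counters["rx_ok"]


if __name__ == "__main__":
    for iface, counters in parse_netstat_ni(SAMPLE).items():
        print(iface, f"{rx_drop_ratio(counters):.7%}")
```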

===
[Expert@fw-extra-jc-02:0]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 3 | 2670 | 11746
1 | Yes | 2 | 51 | 1216
2 | Yes | 1 | 36 | 854

===
[Expert@fw-extra-jc-02:0]# fw ctl multik get_mode
Current mode is On

===
[Expert@fw-extra-jc-02:0]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 3 | 2592 | 11746
1 | Yes | 2 | 41 | 1216
2 | Yes | 1 | 24 | 854
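To put a number on how lopsided the distribution is, the connection counts can be converted into a share per Firewall Worker instance. A minimal sketch with hypothetical function names, assuming the table layout shown above:

```python
SAMPLE = """\
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 3 | 2592 | 11746
1 | Yes | 2 | 41 | 1216
2 | Yes | 1 | 24 | 854
"""


def parse_multik_stat(text):
    """Return a list of per-instance dicts from `fw ctl multik stat` output."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Header and separator lines do not start with a numeric ID.
        if len(cells) == 5 and cells[0].isdigit():
            rows.append({
                "id": int(cells[0]),
                "active": cells[1] == "Yes",
                "cpu": int(cells[2]),
                "connections": int(cells[3]),
                "peak": int(cells[4]),
            })
    return rows


def connection_share(rows):
    """Return {instance_id: percent of current connections}."""
    total = sum(row["connections"] for row in rows)
    if total == 0:
        return {row["id"]: 0.0 for row in rows}
    return {row["id"]: 100.0 * row["connections"] / total for row in rows}


if __name__ == "__main__":
    for instance, share in connection_share(parse_multik_stat(SAMPLE)).items():
        print(f"fw_{instance}: {share:.1f}%")
```

For the second sample above, instance 0 (running on CPU 3) holds 2592 of 2657 connections, which is well over 90% of the total.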

===

 

In the meantime I will look into sk93000, as it could bring some relief for that CPU.

 

Best Regards,

Enrique.

Timothy_Hall
Champion
Accepted Solution

Your firewall seems to be doing OK health-wise other than the bottleneck on CPU #3 (fw_0).  No need to adjust CoreXL or SecureXL.  A few points:

1) Setting fwmultik_dispatch_skip_global to 1 will only help a little, since the primary purpose of this box is VPNs.

2) I see you have cvpn/MAB enabled which also must always be handled on fw_0 by default in R77.30.  However it is possible to spread out the cvpn/MAB load across all the workers in your code version, see sk101223: MultiCore Support for SSL in R77.20 and above.  Depending on how much of the fw_0 load is cvpn-related (probably quite a bit given the current work-from-home situation), enabling this might really help a lot.  MultiCore SSL is enabled by default starting in R80.10.

3) Your F2F/slowpath percentage is quite high for the limited features you have enabled.  But I think cvpn/MAB traffic is always handled F2F so if you have a lot of that traffic type such behavior would be expected.  But a few things to check anyway:

  • Are you using SHA-384 in any of your VPNs?  This will cause that VPN's traffic to always go F2F.
  • Are you using 3DES in any of your VPNs (including cvpn/MAB)?  Converting them to only use AES (if possible) will dramatically drop CPU load, especially if your firewall hardware supports the AES-NI processor extensions (dmesg | grep -i AES-NI to check for this).  Note that in your old code version the GCM flavors of AES cannot be handled by SecureXL (and will always go F2F) so don't use those.  See sk73980: Relative speeds of algorithms for IPsec and SSL and sk98950: Slow traffic speed (high latency) when transferring files over VPN tunnel with 3DES encrypt... for more info.
  • Do you have a lot of fragmented packets present?  These will always go F2F in your old code version, run fwaccel stats -p to check this.

Once again, practically all these known issues would be immediately solved by upgrading to R80.30.  🙂

EnriqueGD
Participant

Hello again,

Your information is very useful. I will apply sk101223 next Monday and will let you know the result.

We are currently using AES, and this is something we cannot modify because it would impact all the VPNs. Regarding the fragmented packets, here is the output:

fwaccel_stats_p.JPG

Best regards,

Enrique.

Timothy_Hall
Champion

AES is the algorithm you want to be using instead of 3DES (but not the GCM variants for your old code version) for better performance, so you are all set there.

Number of fragments doesn't seem excessive and is probably not the cause of high F2F.  Pretty sure high F2F is due to cvpn/MAB traffic and is unavoidable.

Just to mention that on some appliances you need to grep for 'sha-ni' instead of 'aes-ni':

# dmesg | grep -i sha

[Wed Apr 15 22:52:55 2020] sha1_ssse3: Using SHA-NI optimized SHA-1 implementation
[Wed Apr 15 22:52:55 2020] sha256_ssse3: Using SHA-256-NI optimized SHA-256 implementation

Timothy_Hall
Champion

Hmm interesting, this might explain what I was seeing here:

https://community.checkpoint.com/t5/SecureKnowledge/Tip-of-the-Week-VPN-Performance-Best-Practices/b...

Are you referring to a 6000 series appliance?

Tim, this is on a 3600 appliance. CPU flags:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch epb cat_l2 intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt sha_ni xsaveopt xsavec xgetbv1 dtherm arat pln pts md_clear spec_ctrl intel_stibp arch_capabilities
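Since the dmesg wording differs between appliances, checking the CPU flags directly may be more reliable. In Linux's /proc/cpuinfo the `aes` flag indicates AES-NI support and `sha_ni` indicates the SHA extensions. Here is a small sketch with a hypothetical function name; the sample is a shortened excerpt of the flags line above.

```python
# Shortened excerpt of the flags line posted above.
SAMPLE = "flags : fpu vme sse4_2 popcnt aes xsave rdrand sha_ni xsaveopt"


def crypto_extensions(flags_line):
    """Report whether the `aes` and `sha_ni` CPU flags are present."""
    _, sep, flags = flags_line.partition(":")
    if not sep:
        # No "flags :" prefix, so treat the whole string as the flag list.
        flags = flags_line
    present = set(flags.split())  # exact token match, not substring match
    return {"aes": "aes" in present, "sha_ni": "sha_ni" in present}


if __name__ == "__main__":
    print(crypto_extensions(SAMPLE))
```

On a live system the input would be the `flags` line from /proc/cpuinfo. Both flags are present in the 3600 output above.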

EnriqueGD
Participant

Hello,

We applied sk101223 yesterday with great results. Now the CPU utilization is properly distributed across all cores and the high CPU drops have disappeared.

Thank you very much for your time and knowledge.

Best Regards,

Enrique.

Timothy_Hall
Champion

Excellent, thanks for the followup.  Now you can schedule an upgrade of your gateway to R80.30+ to save yourself any future problems.  😀