lucafabbri365
Collaborator

Check Point Node Freeze

Hello,

we have an Open Server with Check Point R80.10 ClusterXL (two nodes) with these enabled blades: IPSec VPN, Mobile Access, Application Control, URL Filtering, IPS, Anti-Virus, Identity Awareness, Monitoring. The server has 16 CPUs but was licensed for 3.

Occasionally, and seemingly at random, the active node freezes and the console (SSH) becomes unavailable. Before forcibly resetting it, I managed to launch cpview and take this screenshot from iLO (it is an HPE server):

I understand that a deeper investigation is needed, but what could be the root cause of all CPUs exceeding 90% usage? Do you think a 3-processor license is enough for all the enabled blades?

Thank you,

Luca

12 Replies
Vladimir
Champion

Please read the last post by Timothy Hall here, which likely describes your situation perfectly:

Multiple questions (licensing, number of cores) that starts with:

"Any time the number of licensed cores differs from the number of physical cores on open hardware gateways, watch out for what I call the licensing "core crunch" in the second edition of my book."

lucafabbri365
Collaborator

Hello Vladimir,

here is the current output of fw ctl affinity -l -r:

The "Core Crunch" behavior doesn't seem to occur here, does it? We have 3 Firewall Workers, each assigned to its own CPU.

Bye,

Luca

Vladimir
Champion

You are licensed for 4 cores, not 3 and are using all of them.

Please go with Danny's suggestions to get more info.

lucafabbri365
Collaborator

Sorry, you are right. My mistake: 4 CPUs licensed.

Danny
Champion

Could you please show me a screenshot of the main menu of our ccc script running?

Let's find the root cause.

  • Perform a Gaia health check and show us the resulting html output
  • Generate a 30-day eval license so your system can run on all 16 cores for now, which makes further debugging possible
  • Check the system history statistics within SmartView Monitor
  • Check your system logs /var/log/messages, $FWDIR/log/fwd.elg etc. for errors and warnings
  • Use our ccc script to run Tim's Super7 commands and further check your system settings
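For the log check in the list above, a minimal Python sketch of a keyword scan might look like the following. It is only an illustration: the `error`/`warning` pattern and the helper names `scan_log` and `summarize` are assumptions, not part of the ccc script or any Check Point tool, and it can be run against copies of the logs on any machine with Python.

```python
import re
from pathlib import Path

# Assumed pattern: whole-word "error" or "warning", case-insensitive.
PATTERN = re.compile(r"\b(error|warning)\b", re.IGNORECASE)


def scan_log(path, pattern=PATTERN):
    """Return (line_number, line) pairs for every line matching the pattern."""
    hits = []
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


def summarize(paths):
    """Map each existing file to its count of matching lines; skip missing files."""
    summary = {}
    for path in paths:
        if Path(path).is_file():
            summary[str(path)] = len(scan_log(path))
    return summary
```

A call such as `summarize(["/var/log/messages", os.path.expandvars("$FWDIR/log/fwd.elg")])` (after `import os`) would then give a quick per-file count before reading the matching lines in detail.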
lucafabbri365
Collaborator

Hello Danny,

this is the screenshot from affected node (now standby):

This is from active node:

I will check your other points...

Bye,

Luca

Timothy_Hall
Legend

As Danny mentioned, Super Seven outputs would be helpful here. Your screenshot shows cores 1-3, which are all your Firewall Workers, getting very busy, which indicates a lot of PXL or even F2F path traffic. Some tuning will probably help, but I would imagine not as much as licensing another 4 cores, as the only thing cores 4-15 can do is handle generic Gaia/Linux processes.

There is not a "core crunch" present as there is only one Firewall Worker assigned to cores 1-3 each.
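To make the check above concrete, here is a small Python sketch that flags cores hosting more than one Firewall Worker, which is how the "core crunch" is read in this thread. The worker names (`fw_0` and so on) and both layouts are hypothetical examples, not output taken from this system, and the function does not parse `fw ctl affinity` output itself.

```python
from collections import defaultdict


def shared_cores(worker_to_core):
    """Return {core: [workers]} for every core that hosts more than one worker."""
    by_core = defaultdict(list)
    for worker, core in worker_to_core.items():
        by_core[core].append(worker)
    return {core: sorted(ws) for core, ws in by_core.items() if len(ws) > 1}


# Hypothetical layout matching the description: one worker on each of cores 1-3.
one_per_core = {"fw_0": 3, "fw_1": 2, "fw_2": 1}
print(shared_cores(one_per_core))  # {}

# Hypothetical contrasting layout where two workers land on the same core.
crowded = {"fw_0": 1, "fw_1": 1, "fw_2": 2}
print(shared_cores(crowded))  # {1: ['fw_0', 'fw_1']}
```

An empty result corresponds to the situation described here: busy workers, but no two of them competing for the same core.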

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Ofir_Shikolski
Employee Alumnus

Firewall Priority Queues in R77.30 / R80.10 and above 

Packets can be dropped by the Firewall when the CPU cores on which the Firewall runs are fully utilized. Such packet loss might occur regardless of the connection type (for example, local SSH or a connection to the Security Management Server).

To help mitigate this issue, the Firewall Priority Queues feature was introduced in the R77.30 Security Gateway.

  • Explanation about Control Connections

    Firewall R77.30 / R80.10 and above assigns higher priority to control connections than to other connections.

    By default, the following services are considered by Firewall R77.30 / R80.10 and above as control connections:

    • Check Point CPMI
    • Policy installation/fetch (CPD daemon)
    • Check Point Remote Installation (CPRID daemon)
    • SSH
    • DHCP
    • OSPF
    • BGP
    • VRRP

    Not all control services get the same priority. Firewall R77.30 / R80.10 and above prioritizes some control services over the other control services.
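The general idea of serving control connections ahead of other traffic can be illustrated with a toy two-tier queue in Python. This is only a sketch of the concept and not Check Point's implementation: the class name `TwoTierQueue`, the short service labels, and the single shared tier for all control services are simplifying assumptions (as noted above, the real feature ranks some control services above others).

```python
import heapq
from itertools import count

# Short labels for the default control services listed above (labels are assumptions).
CONTROL_SERVICES = {"CPMI", "CPD", "CPRID", "SSH", "DHCP", "OSPF", "BGP", "VRRP"}

CONTROL, OTHER = 0, 1  # lower tier number is dequeued first


class TwoTierQueue:
    """Toy queue: control-service packets are always dequeued before the rest."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker preserves arrival order within a tier

    def push(self, service, packet):
        tier = CONTROL if service in CONTROL_SERVICES else OTHER
        heapq.heappush(self._heap, (tier, next(self._order), service, packet))

    def pop(self):
        _tier, _seq, service, packet = heapq.heappop(self._heap)
        return service, packet

    def __len__(self):
        return len(self._heap)


queue = TwoTierQueue()
for service, packet in [("HTTP", "p1"), ("SSH", "p2"), ("HTTP", "p3"), ("BGP", "p4")]:
    queue.push(service, packet)

# Control traffic (SSH, BGP) leaves first, then the rest in arrival order.
print([queue.pop() for _ in range(4)])
```

With a scheme like this, an SSH session to a saturated gateway still gets served even when ordinary traffic has filled the queue, which is the behavior the feature is meant to provide.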

lucafabbri365
Collaborator

Hello all,

just an update regarding this issue, so it could be helpful for other users.

After optimizing the IPS and Anti-Virus rules, overall performance improved (we still have 4 CPUs licensed, but we plan to upgrade to 8 CPUs).

Thank you for your support.

Regards,

Luca

Vladimir
Champion

Can you share data on roughly how many hosts are behind this cluster, whether you are running HTTPS inspection, and what the Internet bandwidth utilization looks like?

I am interested in these data points to be able to size appliances more accurately based on real-world experience.

Thank you,

Vladimir

lucafabbri365
Collaborator

Hello Vladimir,

here is some overall information regarding our Check Point environment:

  • 2 Open Servers as Security Gateway, one VM as Security Management
  • IPS, Anti-Virus/Anti-Bot, Identity Awareness, Application Control & URL Filtering, IPSec VPN, Mobile Access, ClusterXL, Monitoring blades enabled
  • Application Control & URL Filtering enabled for all
  • HTTPS Inspection enabled, but all traffic is in Bypass mode with the exception of my client computer, which is in Inspect (I'm just testing it; before enabling it for all, we need to upgrade the CPU license to 8)
  • 465 Network Objects
  • 29 Site-to-Site VPNs (all Interoperable Devices: Juniper, Cisco, Fortigate, SonicWall, Cisco Meraki)
  • 28 - inbound (x2 - outbound) manual NATs
  • About 1100 clients behind defined networks (VLANs)

I hope this is enough information.

Bye,

Luca

Vladimir
Champion

Thank you for the data!

If I may trouble you some more:

1. What is the average Internet bandwidth consumption you are seeing?

2. Were you using the "Optimized" IPS profile before encountering high utilization?

3. If you were using customized IPS profiles, were any particular protections found to be responsible for the bulk of the impact on CPU utilization?

Vladimir
