Lesley
Mentor

Let's discuss: Important Notification to Customers with SecureXL User Mode (UPPAK) enabled, on R81.2

We received the following message today, and I assume others did as well. Let's discuss this topic.

Has anyone experienced this issue yet? I have some setups running this take but have not seen it.

I see the SK has already been updated with more information, which is good. I assume the fix will be included in upcoming takes? And when it crashes, does the gateway do a full reboot, or do some processes just restart and cause a "hiccup"?

We would like to bring the following to your immediate attention. We have identified that configuration of your environment with SecureXL User Mode (UPPAK) enabled on your Security Gateway may have compatibility issues with Take 96 / Take 98.

Action required:
Please follow sk183181 for remediation steps.

Symptoms

  • Security Gateway with the R81.20 Jumbo Hotfix Accumulator Takes 96 and Take 98 may crash frequently when SecureXL works in the User Mode (UPPAK).

Cause

Race condition may occur in the ADP acceleration driver when updating network routes for SecureXL in the User Mode (UPPAK):

  • In Take 96, the process "usim_x86" crashes with a core dump file in the /var/log/dump/usermode/ directory.
  • In Take 98, the Security Gateway crashes with a VMcore dump file in the /var/log/crash/<DATE>/vmcore/ directory.
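To see whether a gateway has already left dumps in the two locations named above, something like the following could be used. This is only a sketch: the function name `find_dumps` and the `root` parameter are made up for illustration, and it simply lists whatever files exist under the two directories from the SK without assuming how the dump files are named.

```python
from pathlib import Path


def find_dumps(root="/"):
    """List files under the two dump locations mentioned in the SK.

    `root` is a hypothetical parameter so the function can be tried
    against a copied directory tree instead of a live gateway.
    """
    base = Path(root)
    usermode_dir = base / "var/log/dump/usermode"  # Take 96: usim_x86 core dumps
    crash_dir = base / "var/log/crash"             # Take 98: <DATE>/vmcore/ dumps

    def files_under(directory):
        # A missing directory just means no dumps were written there.
        if not directory.is_dir():
            return []
        return sorted(p for p in directory.rglob("*") if p.is_file())

    return {
        "usermode": files_under(usermode_dir),
        "kernel": files_under(crash_dir),
    }


if __name__ == "__main__":
    for kind, paths in find_dumps().items():
        print(kind, [str(p) for p in paths])
```

An empty result only means no dump files are present in those directories right now; it does not prove the gateway is unaffected.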

Affected versions:

  • R81.20 Jumbo Hotfix Accumulator Take 96
  • R81.20 Jumbo Hotfix Accumulator Take 98

How to check if your Security Gateway may be affected?

  1. Connect to the command line on the Security Gateway.
  2. Check the current R81.20 Jumbo Hotfix Accumulator Take:
    cpinfo -y all
  3. Check the current SecureXL mode:
    1. Run:
      fwaccel stat
    2. Examine the column "Name":
      • "KPPAK" means Kernel Mode - this SK article does not apply.
      • "UPPAK" means User Mode - this SK article applies.
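For anyone with many gateways to check, the steps above could be scripted roughly like this. It is a sketch under assumptions, not something from the SK: the function names are made up, and it assumes the Jumbo line in the `cpinfo -y all` output looks like `HOTFIX_R81_20_JUMBO_HF_MAIN   Take:  96` and that the `fwaccel stat` table contains the literal word `UPPAK` or `KPPAK` in the "Name" column. Verify against the real output on your version before relying on it.

```python
import re

# Takes named as affected in the notification.
AFFECTED_TAKES = {96, 98}


def jumbo_take(cpinfo_output):
    """Return the R81.20 Jumbo take from `cpinfo -y all` output, or None."""
    # Assumed line format: "HOTFIX_R81_20_JUMBO_HF_MAIN   Take:  96"
    match = re.search(r"HOTFIX_R81_20_JUMBO_HF\S*\s+Take:\s*(\d+)", cpinfo_output)
    return int(match.group(1)) if match else None


def securexl_mode(fwaccel_output):
    """Return "UPPAK" or "KPPAK" from `fwaccel stat` output, or None."""
    # Assumed: the mode appears as a standalone word in the "Name" column.
    match = re.search(r"\b(UPPAK|KPPAK)\b", fwaccel_output)
    return match.group(1) if match else None


def may_be_affected(cpinfo_output, fwaccel_output):
    """True only if an affected take is combined with User Mode (UPPAK)."""
    return (
        jumbo_take(cpinfo_output) in AFFECTED_TAKES
        and securexl_mode(fwaccel_output) == "UPPAK"
    )
```

The two command outputs would be collected on the gateway (for example saved to files over SSH) and passed in as strings, so the parsing can be checked offline against saved outputs first.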

Solution

Contact Check Point Support to get a Hotfix for this issue.

-------
If you like this post please give a thumbs up(kudo)! 🙂
10 Replies
the_rock
Legend

Thanks for that, Lesley. Let's hope the fix is rolled into the next Jumbo release.

Andy

Thomas_Eichelbu
Advisor

Hello Team,

Yes, I just saw it ...
The question I have ...

If you run Quantum Force or QLS gateways with Lightspeed NICs, what happens to the Lightspeed functionality, and most importantly to the speed you gain from the Lightspeed cards, if you disable UPPAK for SXL?
Is the Lightspeed performance boost also available when SXL runs in KPPAK, or only in UPPAK?
When you check the limitations of UPPAK, you will be surprised how much is not supported.
https://sc1.checkpoint.com/documents/Appliances/100G_Ports_AdminGuide/Content/Topics-100G-Card-AG/Kn...

For Maestro, UPPAK has been supported since HFA 89.
But some TAC engineers say they don't recommend UPPAK for Maestro ... the reasons are not yet known to me ... does anyone have any clues?

Thomas_Eichelbu
Advisor

Aha, people who can read have an advantage ... it is written:

uppak.PNG

So only UPPAK delivers the performance advertised in the datasheet; without UPPAK that is not the case ...

Timothy_Hall
Legend

Interesting that this notification came out on the same day I delivered my "Be Your Own TAC: Part Deux" presentation at CPX Vegas, which extensively discusses the limitations imposed by UPPAK. I think @PhoneBoy will be posting my slide deck today; the list of UPPAK limitations I compiled is quite extensive. F2F/slowpath performance is definitely lower than with KPPAK. I believe Medium Path and Fastpath performance is better, but I can't quantify the difference because I do not have access to the Quantum Force or Lightspeed hardware that is required to work with UPPAK.

My current understanding is that UPPAK does improve performance mainly by using poll mode instead of being interrupt-driven for packet handling on the SND cores.  The UPPAK framework will also permit hardware-based acceleration in the Mellanox/Nvidia SmartNICs but I do not believe that part is actually in use yet.

Attend my 60-minute "Be your Own TAC: Part Deux" Presentation
Exclusively at CPX 2025 Las Vegas Tuesday Feb 25th @ 1:00pm
the_rock
Legend

Excellent! Just had a quick look, and I find the table on page 51 super useful.

Andy

OxO
Explorer

Hi,

I think I had this problem...

A few weeks ago I had to replace about nine clusters, moving them to R81.20 JHF96.

Appliance types that were replaced:

6x00 -> 9x00
6x00 -> 6900
3x00 -> 6x00

Affected rate:

9000 clusters: affected: 100% 
6000 clusters: affected: 0%

...and always/only at the same cluster change/implementation step:

Cluster Node1: Replaced and Running as HA Status: Active (Policy: Access + TP)
Cluster Node2: Replaced and Running as HA Status: Standby (Policy: Access)


So we can say the cluster is fully replaced and connected at layer 2/3 (reachability of the router gateways and the ARP tables were checked). What is still missing is the last policy install, so that the second node also gets the TP policy...

...and only with this "last" policy install did I get the core dumps (vmcore_zero64.gz).

Impact (varies):

9000-cluster#1: both new members died at the same time!
9000-cluster#2: "only" the new active node died (it had already been running as active for 2 hours; the cluster nodes are in different locations)


After the dump(s), both cluster nodes were rebooted one after the other, and then the last policy install worked. Since then, policy installs (whether full, TP only, or access only) have worked without new dumps.

But now jhf98 is installed everywhere...

 

I therefore had several core dumps and CP (Case# 6-0004180684) said:

PRJ-59644, PRHF-38332
----------------------
jhf96: ontop fix -> accel_HOTFIX_R81_20_JHF_T96_828_MAIN_GA_FULL.tar
jhf98: informed me that the fix would be integrated and it was released immediately


That's why I'm surprised that JHF98 is listed here again...

I have reopened my case and am waiting for an answer on whether this is a typo in the SK or whether there are further problems in this area.

Best
Michael

Alex-
Leader

I seem to have encountered that issue on new Force appliances, and the IKE crash as well. Fun times.

Take 98 was released and set to recommended in a very short timeframe, for what turns out to be a serious issue. I don't see how much QA and production feedback can be obtained in just a few days.

Probably a tough call to make for the release group, who must be under quite some pressure to articulate all of this. Anyway, we're stopping new deployments of Take 98 and staying on Take 92, which seems stable for now.

the_rock
Legend

A few customers also asked me about it, since Jumbo 98 came out on Feb 12th and was already the recommended take on Feb 19th, just 1 week later. That seems way too fast, in my humble opinion.

Andy

Lesley
Mentor

Take 96 was already affected by this bug. It is true that 98 was made recommended quickly, but it contained only one fix, so it makes sense to recommend it more quickly than a big patch.

Release notes take 98:

PRJ-59644, PRHF-38332

Security Gateway

Security Gateway may crash when route lookups encounter an unresolved next hop.

See the Important Notes section.
