R80.40 Jumbo Hotfix Accumulator - New GA Take #154

eranzo · ‎2022-03-08

Hi All,

R80.40 Jumbo HF Take #154 is now our GA take (replacing take 139) and is available for download to all via CPUSE (as recommended) and via sk165456.

Full list of resolved issues can be found in sk165456.

New: Starting from R80.40, Central Deployment allows you to perform a batch deployment of Hotfixes on your Security Gateways and clusters from SmartConsole!!

For more information, see sk168597.

Thanks,

Release Operation group

JohAicher · ‎2022-03-15

We found that R80.40 HFA 154 dumps core many times after the upgrade.

MatanYanay · ‎2022-03-15

Hi @JohAicher

We have a known issue that occurs with SIP re-INVITE traffic when SIP is configured in the rulebase (SIP multicore is enabled, which is the default mode). We are handling this now and will provide a solution during next week.

Thanks

Matan.

Thomas_Eichelbu · ‎2022-03-16

Hello,

Yes, I have just discovered the same ...
Take 154 has problems with SIP ...
TAC advised us to use Take 139 instead ...

We also saw httpd crashing every 5 min ... this goes hand in hand with packet loss/soft lockups ...
but we did not see any core dumps whatsoever ...
However, the installation I am referring to does not have many blades enabled.

😞

PhoneBoy · ‎2022-03-16

SIP enforcement is done in the firewall blade, which is why you are seeing this even with few blades enabled @Thomas_Eichelbu

Baumi77 · ‎2022-03-16

We had to roll back to Take 125 (what we had before), as the update was causing the route daemon to die continuously.

Whole network died because of this - poor QA

Thomas_Eichelbu · ‎2022-03-16

Yes, correct.

Is somebody aware of how to disable SIP Multicore support?
Via a kernel parameter or something?

It is perhaps worth trying.

MatanYanay · ‎2022-03-16

Hi @Thomas_Eichelbu

Besides the SIP issue, which we are handling now and will provide a solution for during next week, Take 154 adoption is high and it is a very good take.
Customers who are not facing any issue can keep using Take 154.

Thanks

Matan.

MatanYanay · ‎2022-03-16

Hi @Thomas_Eichelbu

SIP MultiCore allows distribution of SIP traffic across all FW instances; disabling it will cause a performance impact on instance fw_0.

The crash can occur when SIP is configured in the rulebase and, as it is a race condition, it may not reproduce on every GW.

Due to the above we do not recommend disabling it. We are working to fix the issue and will provide a solution early next week.

If needed, you can contact TAC and ask for a HF on top of the take you have.

Thanks Matan

MatanYanay · ‎2022-03-20

Hi all

We just released a new ongoing Jumbo with a fix for the SIP issue:

https://community.checkpoint.com/t5/Product-Announcements/R80-40-Jumbo-Hotfix-Accumulator-New-Ongoin...

Thanks

Matan.

David_C1 · ‎2022-03-21

Take 150 had this SIP issue as well, which we learned the hard way (not a good day in the office). I opened a TAC case and they have provided patches for Takes 139 and 150 (and I am expecting one for Take 154). Given the new ongoing Take 156, I will likely wait a bit and install Take 156. For those who have experienced the SIP issue, I will be interested to hear if the problem is solved with Take 156.

Dave

Duane_Toler · ‎2022-03-26

For R80.40 HFA 154, I had a customer with AWS IPsec VPNs that were failing to pass packets (with NAT-T enabled, too), with or without permanent tunnels. Route-based VPN (albeit static routes), as well. Fought the issue all week last week. Eventually turned into Remote Access VPN issues, as well; users disconnected after 10 seconds, and no packets being passed, even with office mode IP and topology download. VPN debug looked ok. I have T3 case open with TAC for all of it. Eventually gave up and changed Grub boot menu to reboot back into R80.30 volume. 😞

I'll coordinate with the customer to reboot into R80.40 and rollback HFA154 to HFA 139.

Terribly unfortunate for HFA 150/154... Reminds me of R80.30 HFA 216/219... ouch.

R80.40 HFA 139 has been excellent, however!

Naama_Specktor · ‎2022-03-26

@Duane_Toler

Hi,

My name is Naama Specktor and I am a Check Point employee.

I would appreciate it if you would share the TAC SR #, here or in a PM.

thank you!

Naama

Duane_Toler · ‎2022-03-27

@Naama_Specktor SR sent as PM. Thanks!

idants · ‎2022-03-29

Hi,

We reviewed the case and we don't see any evidence of a degradation in Take 154.

As I understand, the problem exists also in R80.30. We believe that we have a solution for this case and we shared a possible solution in the SR.

Anyway, we will be happy to discuss it offline with you and the customer to investigate it and assist.

Thanks,

Idan Tsarfati

IPsec VPN & HTTPSi R&D group manager.

Duane_Toler · ‎2022-03-29

Yes, I did have a very similar issue in R80.30. After a policy install, the AWS VPN would fail to pass packets. The SA is reported as established. In R80.30, I could always delete the SA with "vpn tu" and the VPN session would restore. However, in R80.40, the behavior has changed. The same behavior occurs (one-way packets to AWS via IPsec NAT-T and no replies), but now I cannot delete the SA with "vpn tu". I expect the SAs may renegotiate after the configured timeout (3600 seconds for Phase 2), but I could never wait that long due to the outage. The only recourse was either "cpstop;cpstart" or "vpn drv reset" (yes, both are terrible). I tried with and without permanent tunnels, but the effect was the same either way.

Additionally, in R80.40, the issue eventually expanded to cause problems for Remote Access VPN users (Endpoint VPN IPsec, not MAB/SSL-VPN). In R80.30, RA VPN users never had a problem. In R80.30, I never had to use "vpn drv reset".

For comparison, I have many other customers on R80.40 JHF 139 and none of them have this issue. This customer was the first one to use JHF 154 (someone has to go first!).

I do not yet see any updates in my SR, but I do have a call scheduled today to speak to someone about this SR. Thank you for your time to look into this!

Chris_Atkinson · ‎2022-03-29

@Duane_Toler Do you already have the solution outlined here in place for your AWS setup?

https://community.checkpoint.com/t5/Connect/2-Tunnels-Active-Backup-inside-the-same-Site-to-Site-VPN...

Duane_Toler · ‎2022-03-29

Hey @Chris_Atkinson ! Thanks for the links. sk142355 looks to match my scenario. I searched the VPN debug and see the debug line it mentions. The veritable needle in the haystack! 🙂

vpnd.elg.5:[vpnd 32511 4074985408]@nutmeg[22 Mar 23:41:56][tunnel] SAdeleteAll: keep_IKE_SAs is not set
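
For anyone else hunting for the same needle, here is a minimal sketch of that search in Python. It only looks for the message text quoted above; the helper names, the log directory you pass in, and the `vpnd.elg*` file pattern are assumptions for illustration, not anything documented in the SK.

```python
from pathlib import Path

# Marker text copied from the debug line quoted above.
MARKER = "keep_IKE_SAs is not set"


def find_marker_lines(lines, marker=MARKER):
    """Return (line_number, text) pairs for every line containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


def scan_log_dir(log_dir, pattern="vpnd.elg*"):
    """Scan all files in log_dir matching pattern (both are assumptions).

    Returns a dict mapping file name to the list of matching lines.
    """
    hits = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a debug file has odd bytes.
        with path.open(errors="replace") as handle:
            found = find_marker_lines(handle)
        if found:
            hits[path.name] = found
    return hits
```

Calling `scan_log_dir()` with the directory that holds your VPN debug files returns the file names and line numbers where the message appears; an empty result only means the text was not found in those files.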

I'm going to discuss this on my call today and discuss it with the customer afterwards. I hope this fixes it!

(BTW - should we move this to a new Check Mates post? Can Val make that happen? Don't want to pollute this one if it's unnecessary.)

Thanks again!

Duane_Toler · ‎2022-03-30

Hey all,

Thanks to @Chris_Atkinson @idants @eranzo and the escalation manager I spoke to yesterday, indeed our issue came down to "keep_IKE_SAs"! Wow... crazy that a simple checkbox changes/fixes everything! 🙂 I enabled that and it resolved our issue for the AWS VPNs (in R80.30). I expect this will also fix the issue in R80.40 (and HFA 154). I am coordinating a time with the customer to switch back to R80.40 and make the same change.

idants · ‎2022-03-30

Great news 🙂

It should work on R80.40 154 as well.

Duane_Toler · ‎2022-04-12

FYI - this change in SmartConsole Advanced config to keep_IKE_SAs indeed worked to stabilize the AWS VPN! I've done numerous installs since then and all has been stable.
