Solved: Re: R81.10 VPN site-2-site Palo Alto

BjornErichsen · 2024-08-30

EDIT:
Sorry guys, I was misinformed: it now turns out that the remote peer is in fact a Cisco C8500-12X, not Palo Alto firewalls... They are not making it easy on me 🙂

History:
I am managing a CP R81.10 secure GW (VSX) with several VPNs to different vendors.
In late April we created yet another site-2-site VPN tunnel - towards Cisco IOS XE (for the first time), and it worked flawlessly.
In early July we deployed the most recent (at that time) Jumbo Hotfix, Take 152.

Issue:
Since the JHF was deployed in July, we appear to have had problems whenever IPsec SA keys are renegotiated (at the default interval of 3600 seconds).
Note that the tunnel works the vast majority of the time, and the tunneled subnets do re-establish communication eventually without manual intervention, but we do see traffic impact.

The VPN blade logs Rejects of various types, generally in this sequence:

From remote Cisco IOS XE to CP:
Child SA exchange: Ended with error
Initial exchange: Sending notification to peer: Invalid Key Exchange payload

Then from CP to remote Cisco IOS XE:
Child SA exchange: Received notification from peer: No proposal chosen MyMethods Phase2: AES-GCM-256 + HMAC-SHA2-384, No IPComp, No ESN, Group 20 (384-bit random ECP group)

And from Cisco IOS XE to CP
Informational exchange: Ended with error
Initial exchange: Sending notification to peer: Invalid Key Exchange payload
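Some context on the "Invalid Key Exchange payload" lines. In IKEv2 (RFC 7296, sections 1.2 and 1.3), the initiator sends a KE payload for one DH group that it guesses up front. If the responder selects a different group, it replies INVALID_KE_PAYLOAD naming the group it wants, and the initiator retries with that group.

The Python sketch below is a simplified model of that exchange, not Check Point or Cisco code. It makes three assumptions:
- The Cisco initiator guesses the first group in its proposal list (24, then 21, then 20).
- The CP side accepts only group 20.
- The responder picks the first of its own groups that the initiator also offers. The RFC leaves this choice to local policy.

```python
from dataclasses import dataclass

@dataclass
class Reply:
    ok: bool
    group: int  # group accepted, or the group requested via INVALID_KE_PAYLOAD

def respond(offered_groups, ke_group, responder_groups):
    """Responder (simplified): select the first of its own groups that the initiator also offers."""
    common = [g for g in responder_groups if g in offered_groups]
    if not common:
        raise ValueError("NO_PROPOSAL_CHOSEN")
    chosen = common[0]
    if ke_group != chosen:
        return Reply(ok=False, group=chosen)  # INVALID_KE_PAYLOAD naming the wanted group
    return Reply(ok=True, group=chosen)

def negotiate(initiator_groups, responder_groups, max_tries=2):
    """Initiator (simplified): guess the first listed group, retry with the group the responder asks for."""
    ke_group = initiator_groups[0]
    transcript = []
    for _ in range(max_tries):
        reply = respond(initiator_groups, ke_group, responder_groups)
        transcript.append((ke_group, "OK" if reply.ok else "INVALID_KE_PAYLOAD"))
        if reply.ok:
            return reply.group, transcript
        ke_group = reply.group
    raise RuntimeError("negotiation did not converge")

cisco_groups = [24, 21, 20]  # order from the Cisco IKEv2 proposals
checkpoint_groups = [20]     # CP community: Group 20 only
print(negotiate(cisco_groups, checkpoint_groups))
# (20, [(24, 'INVALID_KE_PAYLOAD'), (20, 'OK')])
```

In this model, one INVALID_KE_PAYLOAD followed by a successful retry is normal protocol behavior, not a failure. Those log lines alone therefore do not prove a defect. An initiator whose first guess is already group 20 skips the extra round trip.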

Actions:
We will be upgrading to the latest Jumbo Hotfix next week (it claims to fix some VPN issues, though none appear directly related), but in case that does not solve the issue, any help would be greatly appreciated.

We already have our eye on DPD being configured on the Cisco IOS XE side while the CP side does not have the tunnel configured as "Permanent", but I doubt that would cause IPsec renegotiation to fail periodically.

We have also asked the Cisco IOS XE side to first try an IKEv2 proposal that exactly matches the CP configuration. This has not been implemented yet, as these proposals are "global", but they are looking into it.

I'd really like to hear whether anybody has fixed an identical issue.

CP VPN community config:
We have a VPN community ("policy based") tunnel with verified encryption domains (subnets) at both ends.
Only allow encrypted traffic
IKEv2 only
Phase 1: AES-256, SHA384, Group 20
Phase 2: AES-GCM-256, PFS group 20
Not permanent; one VPN tunnel per subnet pair
Shared secret (which of course works)
Renegotiate IKE 1440 (minutes)
Renegotiate IPsec 3600 (seconds)
Anything not mentioned should be at default values for R81.10 (initial deployment for this VSX cluster was on R80.40)

Cisco IOS XE config (which I do not control):
#show crypto ikev2 proposal
IKEv2 proposal: VPN_XXXX_PROPOSAL_AES_CBC
     Encryption : AES-CBC-256 AES-CBC-192 AES-CBC-128
     Integrity : SHA512 SHA384 SHA256
     PRF        : SHA512 SHA384 SHA256
     DH Group   : DH_GROUP_2048_256_MODP/Group 24 DH_GROUP_521_ECP/Group 21 DH_GROUP_384_ECP/Group 20
IKEv2 proposal: VPN_XXXX_PROPOSAL_AES_GCM
     Encryption : AES-GCM-256 AES-GCM-128
     Integrity : none
     PRF        : SHA512 SHA384 SHA256
     DH Group   : DH_GROUP_2048_256_MODP/Group 24 DH_GROUP_521_ECP/Group 21 DH_GROUP_384_ECP/Group 20
IKEv2 proposal: default Disabled
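One rough way to check which Cisco proposal the CP Phase 1 settings can land on is to model each IKEv2 proposal as a list of allowed transforms per type. A proposal is accepted only if every transform type has at least one value in common, since RFC 7296 §3.3 has one transform of each type chosen from a single proposal.

This sketch uses the names from the show output above and the CP community settings. CP's "AES-256" is written as AES-CBC-256. Setting the CP PRF to SHA384 (following the SHA384 integrity setting) is an assumption.

```python
from typing import Optional

# Transform lists copied from "show crypto ikev2 proposal" above
CISCO_PROPOSALS = {
    "VPN_XXXX_PROPOSAL_AES_CBC": {
        "encr": ["AES-CBC-256", "AES-CBC-192", "AES-CBC-128"],
        "integ": ["SHA512", "SHA384", "SHA256"],
        "prf": ["SHA512", "SHA384", "SHA256"],
        "dh": [24, 21, 20],
    },
    "VPN_XXXX_PROPOSAL_AES_GCM": {
        "encr": ["AES-GCM-256", "AES-GCM-128"],
        "integ": ["none"],
        "prf": ["SHA512", "SHA384", "SHA256"],
        "dh": [24, 21, 20],
    },
}

# CP Phase 1 from the community settings; the PRF value is an assumption
CP_PHASE1 = {"encr": ["AES-CBC-256"], "integ": ["SHA384"], "prf": ["SHA384"], "dh": [20]}

def match(offer: dict, local: dict) -> Optional[dict]:
    """Pick one transform per type, or None if any type has no overlap."""
    chosen = {}
    for ttype, offered in offer.items():
        common = [t for t in local[ttype] if t in offered]
        if not common:
            return None
        chosen[ttype] = common[0]
    return chosen

def select(offer: dict, proposals: dict):
    """Return (proposal name, chosen transforms) for the first proposal that matches."""
    for name, local in proposals.items():
        chosen = match(offer, local)
        if chosen is not None:
            return name, chosen
    return None

print(select(CP_PHASE1, CISCO_PROPOSALS))
# ('VPN_XXXX_PROPOSAL_AES_CBC', {'encr': 'AES-CBC-256', 'integ': 'SHA384', 'prf': 'SHA384', 'dh': 20})
```

In this model, the CP Phase 1 offer can only land on VPN_XXXX_PROPOSAL_AES_CBC. It gets there only via group 20, the last DH group in Cisco's list.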

#show crypto ipsec profile VPN_XXXX_PROFILE_10029
IPSEC profile VPN_XXXX_PROFILE_10029
        IKEv2 Profile: VPN_XXXX_PROFILE_10029
        Kilobyte Volume Rekey has been disabled.
        Security association lifetime:3600 seconds
        Responder-Only (Y/N): N
        PFS (Y/N): Y
        DH group: group20
        Mixed-mode : Disabled
        Transform sets={
                TS_XXXX_AES_GCM256: { esp-gcm 256 } ,
        }

#show crypto ikev2 profile VPN_XXXX_PROFILE_10029

IKEv2 profile: VPN_XXXX_PROFILE_10029
Ref Count: 5
Match criteria:
  Fvrf: INFRA
  Local address/interface:
   yyy.zzz.xxx.ooo
  Identities:
   address vvv.uuu.ddd.qqq 255.255.255.255
  Certificate maps: none
Local identity: none
Remote identity: none
Local authentication method: pre-share
Remote authentication method(s): pre-share
EAP options: none
Keyring: VPN_XXXX_KEYRING_10029
Trustpoint(s): none
Lifetime: 86400 seconds
no lifetime certificate
DPD: interval 10, retry-interval 5, periodic
NAT-keepalive: disabled
Ivrf: none
Virtual-template: none
mode auto: none
AAA AnyConnect EAP authentication mlist: none
AAA EAP authentication mlist: none
AAA Accounting: none
AAA group authorization: none
AAA user authorization: none
PPK Dynamic: 0 PPK Required : 0 PPK Instance ID:
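A quick sanity check for static mismatches is to put the two sides' lifetime and Phase 2 values next to each other. The values below are copied from the CP community settings and the Cisco show output above (1440 minutes × 60 = 86400 seconds). The dictionary layout and field names are only illustrative.

Note that IKEv2 does not negotiate lifetimes (RFC 7296 §2.8). Each end enforces its own lifetime, so equal values only mean that both ends expect to rekey at roughly the same time.

```python
# Values as shown in this thread; field names are illustrative only.
CHECK_POINT = {
    "ike_lifetime_s": 1440 * 60,      # "Renegotiate IKE 1440 (minutes)"
    "ipsec_lifetime_s": 3600,         # "Renegotiate IPsec 3600 (seconds)"
    "esp_encryption": "AES-GCM-256",  # "Phase 2: AES-GCM-256"
    "pfs_group": 20,                  # "PFS group 20"
}
CISCO = {
    "ike_lifetime_s": 86400,          # IKEv2 profile: "Lifetime: 86400 seconds"
    "ipsec_lifetime_s": 3600,         # IPsec profile: "Security association lifetime:3600 seconds"
    "esp_encryption": "AES-GCM-256",  # transform set: "esp-gcm 256"
    "pfs_group": 20,                  # "PFS (Y/N): Y", "DH group: group20"
}

def mismatches(a: dict, b: dict) -> dict:
    """Return {setting: (a_value, b_value)} for every setting that differs."""
    return {key: (a[key], b.get(key)) for key in a if a[key] != b.get(key)}

print(mismatches(CHECK_POINT, CISCO))  # {}
```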

BjornErichsen · 2024-11-08

Just for your information: Check Point finally delivered a hotfix last week that solved the issue for us. Long journey 🙂
fw1_wrapper_HOTFIX_R81_10_JHF_T158_356_MAIN_GA_FULL.tar

the_rock · 2024-08-30

See if my post below helps and how we got it working.

Andy

Might not be exact same issue, but you may find it useful.

VPN route based query - Check Point CheckMates

BjornErichsen · 2024-09-01

Thanks - but IKEv1 (as I read your solution) is not really an option for us. Audit will be on our backs 🙂

CaseyB · 2024-08-30

This sounds like a familiar issue I had with a set of Palos. Looking at my old e-mails, I found this snippet:

These are the errors I have been seeing:

Child SA exchange: Sending notification to peer: No proposal chosen MyMethods Phase2: AES-GCM-256 + HMAC-SHA2-256, No IPComp, No ESN, Group 19 (256-bit random ECP group)
Initial exchange: Sending notification to peer: Invalid Key Exchange payload
Child SA exchange: Exchange failed: timeout reached.

The Palo side was using this:

Phase 1 has these configured:

DH Groups 21, 20, 19, 14, 5, 2

Encryption AES-256-GCM, AES-256-CBC

Authentication sha512, sha384, sha256, non-auth

Lifetime: 24 hours

Phase 2 has these configured:

DH Group 19

Lifetime 1 hour

Lifesize 4608 MB

Encryption: aes-256-gcm, aes-256-cbc, aes-192-cbc, aes-128-gcm, aes-128-ccm, aes-128-cbc

Authentication: sha512, sha384, sha256

The resolution: Palo side created a profile specifically for our tunnel to use the same encryption ciphers we were sending instead of using a global profile with several ciphers enabled.

the_rock · 2024-08-30

That's an excellent point! I know the PAN guy did that in our case (the post I referenced), though there the main issue was an ID he needed from the CP side to make it work fully. I believe things like what you advised, @CaseyB, are less relevant for, say, Cisco or Fortinet, but for PAN I know it matters more.

Andy

BjornErichsen · 2024-09-01

Thanks - yes, I read somewhere that this solved the issue (maybe an older comment by you?), and this is actually what I meant in the OP when I said we had asked the Cisco IOS XE side to try an IKEv2 proposal that exactly matches the CP configuration.
Specifically, put this at the top of the global proposals, or as a proposal specific to this tunnel:
IKEv2 proposal: VPN_XXXX_PROPOSAL_AES_GCM_CP
     Encryption : AES-GCM-256
     Integrity : none
     PRF        : SHA384
     DH Group   : DH_GROUP_384_ECP/Group 20

This gives me hope that it might actually solve the issue.

It just sickens me that the issue apparently arose after a CP JHF deployment, because it leaves little confidence in the future stability of this VPN. Thankfully it's mostly used for mgmt access, but still...

the_rock · 2024-09-02

I think that may have been coincidental, I really do.

Andy

the_rock · 2024-09-02

Forgot to ask: did the tunnel work consistently without any issues BEFORE the jumbo install?

Andy

BjornErichsen · 2024-09-03

Yes - at least no users reported errors, and the firewall log contains none of these Rejects prior to the reload after the JHF was applied. If there were any issues, they were insignificant. Edit: or not logged...

JozkoMrkvicka · 2024-09-02

Which JHF was installed on CP when everything seemed to work?

Which JHF did you update CP to when you noticed the issue?

Did you uninstall the problematic JHF to confirm that the issue is 100% related to the JHF update? If so, TAC should be involved and provide a root cause.

Kind regards,
Jozko Mrkvicka

BjornErichsen · 2024-09-03

We upgraded from JHF Take 129 to JHF Take 152.

We did not try to uninstall JHF Take 152, since it took a long time before the issue was discovered (due to summer vacation), and we need the security fixes in 152 more than we need this mgmt access to be stable.

We just installed the latest JHF, Take 158, but it does not appear to have solved anything. The next step is to have the Cisco IOS XE IKEv2 proposal changed.

the_rock · 2024-09-03

Hey @BjornErichsen

Did you check out the post I referenced? I can even send you a screenshot of the ID I'm referring to. Again, not sure if it's 100% applicable in your situation, but it's worth checking.

Andy

CaseyB · 2024-09-03

FYI - I have an open TAC case with R&D: many, if not all, of my IKEv2 tunnels started generating IKE failures on rekeys in R81.10 JHF 152+. I had to revert to JHF 150 to resolve the errors.

Most of my tunnels are IKEv2: AES-256, SHA256, AES-GCM-256.

Alex- · 2024-09-03

We have solved a lot of VPN issues by going to R81.20 Take 65+, on both VSX and ClusterXL, with all sorts of VPN configurations.

The VPN feature seems really improved in that version of the OS.

BjornErichsen · 2024-09-03

Thanks - good to know.
Unfortunately, I am stuck on R81.10 management servers a little while longer because of the appliances, which are soon to be hardware refreshed.

BjornErichsen · 2024-09-04

Oh well - that sounds ominous.
You can probably add this post to the case then. It looks like VPN blade errors are not limited to Cisco IOS XE on my end either, but that specific VPN just has more error logs than all the others combined, and we don't have user complaints about the other VPNs (yet).
Currently running R81.10 JHF Take 158 (I know JHF Take 156 is the recommended take for the time being).

Edit: All errors are Rejects; 94% of the errors are inbound to the Check Point firewall.

Edit-2: Ended up creating a TAC case of my own - will post the results once it's concluded.

Radu_Ciobanu · 2024-09-17

Greetings,

We're facing the same thing after deploying the latest JHF on R81.10. What was the outcome of the ticket in your case?

The issue isn't restricted to Cisco peers; we can see tunnels with all sorts of vendors exhibiting the same behavior - full re-establishment every ~2 minutes due to "Invalid Key Exchange payload".

Thank you.

BjornErichsen · 2024-09-18

Unfortunately, our access to Check Point support is not direct, but we finally managed to get them to build a hotfix as per
Troubleshooting the "no proposal chosen" error (checkpoint.com)
This has not been delivered yet.
I'll keep you posted.

VSX_Bernie · 2024-09-18

Hello @BjornErichsen,

Did you have any conclusion on your TAC case?
We are experiencing the exact same issue on all IKEv2 tunnels on a VSX cluster with 45 VS, after having upgraded from R81.10 Take 139 to Take 156.

It seems very likely that a change to the behavior of phase 2 renegotiation for IKEv2 tunnels has been made in the code, and that it has not been adequately (or at all) described in the Take 152 release notes.

The only thing I can find in the release notes that sounds like something that could cause this is the following:
"
PRJ-53366/PRHF-32706

VPN IKEv2 negotiation with a third party peer may fail when the peer offers multiple combined encryption algorithms in one proposal. For example, AWS by default offers AES-GCM and AES-GCM-256. The issue triggers an IKE failure log.
"
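The release note above is about a peer putting several encryption algorithms into one proposal. Under RFC 7296 §3.3 that is legal: a proposal may list several transforms of the same type, and the responder picks exactly one of each type.

The same section says combined-mode ciphers such as AES-GCM go in a proposal with no integrity algorithm (or only NONE), separate from normal ciphers like AES-CBC. That matches how the Cisco side above splits its AES_CBC and AES_GCM proposals.

Below is a minimal Python sketch of both rules. It assumes transform names are plain strings and that the responder follows the initiator's order. It is an illustration, not how Check Point implements selection.

```python
COMBINED_MODE = {"AES-GCM-128", "AES-GCM-256"}

def is_valid(proposal: dict) -> bool:
    """Check the RFC 7296 section 3.3 rule for combined-mode (AEAD) ciphers."""
    encr = set(proposal["encr"])
    aead = encr & COMBINED_MODE
    if aead and aead != encr:
        return False  # GCM and non-GCM ciphers mixed in one proposal
    if aead and proposal.get("integ", []) not in ([], ["none"]):
        return False  # AEAD ciphers must not carry a separate integrity algorithm
    return True

def choose(proposal: dict, supported: dict):
    """Pick exactly one transform of each type, or None if any type has no overlap."""
    chosen = {}
    for ttype, offered in proposal.items():
        common = [t for t in offered if t in supported.get(ttype, [])]
        if not common:
            return None
        chosen[ttype] = common[0]  # assumption: first acceptable transform in the initiator's order
    return chosen

# One proposal offering two GCM key sizes, similar to the AWS example in the release note
peer = {"encr": ["AES-GCM-128", "AES-GCM-256"], "dh": ["20"]}
local = {"encr": ["AES-GCM-256"], "dh": ["20"]}
print(is_valid(peer), choose(peer, local))
# True {'encr': 'AES-GCM-256', 'dh': '20'}
```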

This though is noted as a fix in Take 139 - so it does not really add up.

I am quite confident that changing to IKEv1 would solve the issue.
I have not tried this yet, however, as it is not really a solution to the problem - only a "hotfix", so to speak.

I am contemplating creating my own TAC case because of this.

BjornErichsen · 2024-09-18

We are still awaiting the hotfix, or at least more detailed information from Check Point.
I think you should definitely open your own TAC case, as it will probably be faster than mine (which is "by proxy"), and the more TAC cases concerning this, the more likely Check Point is to take it seriously.

Limiting VPN encryption options at remote sites (or even going with IKEv1) is not at all a viable option in my humble opinion. The Check Point peer should be able to negotiate common ground from the available options.

VSX_Bernie · 2024-09-18

All right - thanks for sharing.
I already created a TAC case (we are partners) and referred to this CheckMates post.

I would be a bit wary of the proposed solution by your TAC Engineer.
The way I interpret sk114834, this is only an issue when using AES-GCM-256 as the phase 2 encryption.

I just inspected three of my GWs experiencing this issue - and they are all configured for AES-CBC-256.
And they all experience the same symptoms that you are describing.

Admittedly - I do not know if the peers are configured with both - I will investigate this.
Have you tried removing GCM on both sides, to verify if it is indeed the issue described in sk114834?

You would have to remove all proposals from the peer sides that contain AES-GCM-256, and only leave the ones with AES-CBC-256.
The article describes it as only for phase 2 - but I would do this for both phases just to be on the safe side.

Then you would change your Phase 2 in the drop-down menu for the community from AES-GCM-256 to AES-256.
AES-256 is AES-CBC-256 - Check Point just does not write the CBC part for some reason.
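The naming difference is easiest to see by mapping each vendor's label onto the IANA IKEv2 encryption transform IDs: ENCR_AES_CBC = 12 and ENCR_AES_GCM_16 = 20, with the key length carried as a separate attribute. The label tables in this sketch are written by hand from the names used in this thread. It is an assumption that Check Point's "AES-GCM-256" and Cisco's "esp-gcm 256" both mean the 16-octet-ICV GCM variant; neither vendor states it here.

```python
# IANA "Transform Type 1 - Encryption Algorithm Transform IDs" (IKEv2)
ENCR_AES_CBC = 12
ENCR_AES_GCM_16 = 20  # assumption: GCM with a 16-octet ICV on both sides

# (transform ID, key length in bits) for labels as they appear in this thread
CHECK_POINT_LABELS = {
    "AES-256": (ENCR_AES_CBC, 256),       # Check Point omits "CBC"
    "AES-GCM-256": (ENCR_AES_GCM_16, 256),
}
CISCO_LABELS = {
    "AES-CBC-256": (ENCR_AES_CBC, 256),
    "AES-CBC-128": (ENCR_AES_CBC, 128),
    "AES-GCM-256": (ENCR_AES_GCM_16, 256),
    "AES-GCM-128": (ENCR_AES_GCM_16, 128),
    "esp-gcm 256": (ENCR_AES_GCM_16, 256),
}

def same_cipher(cp_label: str, cisco_label: str) -> bool:
    """True if both labels resolve to the same transform ID and key length."""
    return CHECK_POINT_LABELS[cp_label] == CISCO_LABELS[cisco_label]

print(same_cipher("AES-256", "AES-CBC-256"))  # True
print(same_cipher("AES-256", "AES-GCM-256"))  # False
```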

It could also be that our issues are different and not related.
Admittedly - we have not seen the "No proposal chosen" entries that I know of.

Ours is:
Local: Informational exchange: Exchange failed: timeout reached
Peer: Child SA exchange: Ended with error

It would just be quite the coincidence, seeing as our symptoms are the same, we have all updated to Take 152 or higher, and @CaseyB mitigated the issue by downgrading to Take 150.

I also think that the majority of my peers are actually Palo Alto - maybe that is why I do not get the "No proposal chosen" that you receive from the Ciscos.

I will keep you posted on my findings and my TAC case (it was just created, so TAC may not have anything for us today).

VSX_Bernie · 2024-09-18

Hello Again,

Just letting you know that I tried reaching out to two of my failing peers.
None of them had any proposals configured for GCM - only CBC proposals.

Funny thing is - with one peer, a Cisco ASA, we tried to adjust a few things.
Permanent tunnels were configured on this side, and DPD was configured on the Cisco ASA side.

I have seen this cause issues before, so I asked them to remove it.
We were then really close to a rekey (10 minutes away), so we decided to wait it out.

Cisco opted to re-key 1 or 2 minutes beforehand.
CP did not see this at all - it thought the tunnel was still up.

On Cisco side it looked as though phase 1 went fully down.
I think CP did not do anything until phase 2 actually timed out on its own.

A theory could be that the re-key packet from Cisco was dropped in a firewall chain module before it reached the VPN module.
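On the timing point: in IKEv2 each end enforces its own SA lifetime and rekeys on its own timer (RFC 7296 §2.8). The RFC also recommends jittering rekey requests so that both ends do not fire at once (§2.8.1). With a 3600-second lifetime on both sides, whichever end's rekey timer fires first becomes the initiator, and the other end must handle a rekey it did not start. The margins below are hypothetical values chosen to mirror the "1 or 2 minutes beforehand" observation. They are not taken from Cisco or Check Point documentation.

```python
import random

def rekey_at(lifetime_s: float, margin_s: float, jitter_s: float, rng: random.Random) -> float:
    """Seconds after SA creation at which this end starts a rekey (soft lifetime)."""
    return lifetime_s - margin_s - rng.uniform(0, jitter_s)

rng = random.Random(42)
# Hypothetical policies: Cisco rekeys 60-120 s early, CP waits until the lifetime is reached.
cisco = rekey_at(3600, margin_s=60, jitter_s=60, rng=rng)
check_point = rekey_at(3600, margin_s=0, jitter_s=0, rng=rng)

initiator = "Cisco" if cisco < check_point else "Check Point"
print(f"Cisco rekeys at {cisco:.0f} s, Check Point at {check_point:.0f} s -> initiator: {initiator}")
```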

CaseyB · 2024-09-18

I have been testing this issue on a pair of backup firewalls that we have, so generating the error is not really a problem for us.

Upgrading to R81.10 JHF 156 did not resolve the issue; however, I did receive a hotfix for JHF 156 that appears to have resolved it. Unfortunately, this test scenario is only Check Point to Check Point, so I cannot validate third-party vendors. No news on if / when this hotfix would make it to a jumbo.

Personally, I won't upgrade past JHF 150 on my production cluster until this has been added to a jumbo, I do not want to be managing hotfixes.

Radu_Ciobanu · 2024-09-18

Hi @CaseyB , can you please share some details regarding this "hotfix for JHF 156 that appears to have resolved it" ?

What SK is it related to? Or how can I reference it for TAC to know what I'm talking about?

Thank you for your time.

VSX_Bernie · 2024-09-19

Hello @CaseyB,

Thank you so much for sharing this.
I am with @Radu_Ciobanu on this one - could you please share the file name of the .tar file that you received?

Something like "cvpn_HOTFIX_R81_10_JS_THROTTLING_V3_289_MAIN_GA_FULL.tar" (example is a previous hotfix we received).
I bet that would be enough for us to let TAC know what we are on about - if an SK does not exist.

This really would be the best solution for me, as we are dependent on the fix PRJ-52047/PRHF-31811 that was released for Take 152, for this cluster.

Thanks for helping out - it is really appreciated.

VSX_Bernie · 2024-09-19

@CaseyB - or better yet - share with us the SR number

CaseyB · 2024-09-19

@VSX_Bernie @Radu_Ciobanu @BjornErichsen

SR#6-0003995456
fw1_wrapper_HOTFIX_R81_10_JHF_T156_162_MAIN_GA_FULL.tgz

VSX_Bernie · 2024-09-19

Perfect - thank you for this.
I am actually sitting in a Zoom waiting for TAC, so your timing could not be better.

VSX_Bernie · 2024-09-19

Received the hotfix, and will install it tonight.
Relying on users to test during tomorrow, and then I will get back to you guys with our results.

the_rock · 2024-09-19

Fingers crossed...hope it works!

R81.10 VPN site-2-site to Cisco C8500-12X IOS XE (not Palo Alto as previously stated)