Albert_Chang
Explorer

Packets from an IPSec tunnel are dropped. It seems there is an issue with the CoreXL connections table

Our security gateway sometimes drops packets from an IPSec tunnel. The usual workaround is to reinstall policy, which fixes the issue for a few days.

Capturing the drop message with "fw ctl zdebug drop" shows "failed to resolve SA (VPN Error code 01)".

But in the kernel debug, it looks like the gateway cannot find the connection in the connections table.

Has anyone encountered a similar issue and found a solution? Thanks in advance!

 

;20Jun2019  3:30:27.466084;[cpu_1];[fw4_2];fwconn_lookup: not found in connections table;
;20Jun2019  3:30:27.466088;[cpu_1];[fw4_2];forward_if_not_mine: forwarded to another instance (rc=0);
....
;20Jun2019  3:30:27.466102;[cpu_1];[fw4_2];fwconn_key_lookup_ex: conn 10.13.1.29:0 IPP 10,0,0,0,0,UUID: 00000000-0000-0000-00-0-0-0-0-0-0-0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0> not found in connections table;
.....
;20Jun2019  3:30:27.466268;[cpu_1];[fw4_2];fwconn_key_lookup_ex: conn 172.28.0.126:15 IPP 10,0,0,0,0,UUID: 00000000-0000-0000-00-0-0-0-0-0-0-0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0> not found in connections table;
;20Jun2019  3:30:27.466282;[cpu_1];[fw4_2];vpnk_conn_log: in the kernel  - calling fwchainlog_delayed_rulebase_log with alert -1 ;
;20Jun2019  3:30:27.466284;[cpu_1];[fw4_2];
action = 0
schemename = IKE
user =
methods = ESP: AES-256 + SHA384 + PFS (group 2)
fail_reason = Encryption/Decryption failure, failed to resolve SA (VPN Error code 01)
xpo_loghandle = 0
community_loghandle = 0
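
For anyone wanting to sift a long kernel debug capture for these lookup misses, here is a minimal Python sketch. It assumes every debug line follows the `;<date>  <time>;[cpu_N];[fwN_N];<message>;` layout seen in the excerpt above, which is inferred from this post only and is not a documented format. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the excerpt in this post:
#   ;<date>  <time>;[cpu_N];[fwN_N];<message>;
LINE_RE = re.compile(
    r"^;(?P<date>\S+)\s+(?P<time>[\d:.]+);"
    r"\[(?P<cpu>cpu_\d+)\];\[(?P<inst>fw\d+_\d+)\];"
    r"\s*(?P<msg>.*?);?\s*$"
)

def parse_debug_line(line):
    """Return a dict of fields for one debug line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None

def count_lookup_misses(lines):
    """Count 'not found in connections table' messages per firewall instance."""
    misses = Counter()
    for line in lines:
        rec = parse_debug_line(line)
        if rec and "not found in connections table" in rec["msg"]:
            misses[rec["inst"]] += 1
    return misses

sample = [
    ";20Jun2019  3:30:27.466084;[cpu_1];[fw4_2];fwconn_lookup: not found in connections table; ",
    ";20Jun2019  3:30:27.466088;[cpu_1];[fw4_2];forward_if_not_mine: forwarded to another instance (rc=0); ",
]
print(count_lookup_misses(sample))
```

Grouping by instance may help show whether the misses cluster on one CoreXL instance or are spread across all of them.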

  

PhoneBoy
Admin

Have you opened a TAC case on this by chance?
Albert_Chang
Explorer

Yes, I have opened several cases for this issue in the past. The last one I opened is SR# 6-0001657403. The suggested solutions included installing Jumbo takes, adjusting IKE connections, and others, but none of them solved the issue. When the issue happens, the pepd process shows high CPU usage. I am not sure which is the cause and which is the effect.
Ryan_Ryan
Advisor

Did you ever get a solution to this? We have the exact same problem on R80.20

 

We followed sk122532, which did not solve it.

Timothy_Hall
Legend

Assuming you have at least Jumbo HFA 47 installed, try disabling SecureXL acceleration for VPN with the vpn accel off command. Note that doing so will disrupt all current VPN tunnels, so read sk151114 ("fwaccel off" does not affect disabling acceleration of VPN tunnels in R80.20 and above) thoroughly before doing anything.

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Ryan_Ryan
Advisor

Thanks, yes we are JHF 47.

 

I will give this a try. Unfortunately it is a bit difficult to test whether it was successful, since the issue happens only once every 2 weeks or so. I like doing it on a per-peer basis, as only 2 route-based VPNs are affected, whereas my dozen policy-based VPNs have never had a hitch in years.

Mrigen_Sane
Explorer

Hello Ryan,

Did anyone provide an update? We are seeing the exact same issue on our Check Point gateways running R80.10 Take 272.

vpn_drop_and_log Reason: Encryption/Decryption failure, failed to resolve SA (VPN Error code 01);

vpn_encrypt_chain Reason: Could not change connection vpn interface.;

We also have static route-based VPN tunnels integrated with AWS.

Regards

Mrigen

Ryan_Ryan
Advisor

Hi Timothy,

 

Unfortunately, turning VPN accel off has not solved the issue. I made that change for two peer IPs last week, but we had another recurrence of the issue after that.

TAC has not been able to assist; they just told me to try my luck with the latest Jumbo. (I will patch the gateways so they continue to investigate.)

 

I have a debug taken when the issue occurred, if you are interested in taking a look.

 

One new message I hadn't noticed before was this:

dropped by vpn_encrypt_chain Reason: Could not change connection vpn interface.;

That message is showing for every session.
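
To see how often each drop reason appears in a saved capture, something like the following Python sketch could help. It assumes each drop line contains `<function> Reason: <text>;` as in the two messages quoted in this thread; the surrounding fields of real `fw ctl zdebug drop` output are not modeled, and `tally_drop_reasons` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches "<function> Reason: <text>;" as it appears in the messages quoted
# in this thread. Anything before the function name is ignored.
REASON_RE = re.compile(r"(?P<func>\w+)\s+Reason:\s*(?P<reason>.+?);?\s*$")

def tally_drop_reasons(lines):
    """Count occurrences of each (function, reason) pair."""
    counts = Counter()
    for line in lines:
        m = REASON_RE.search(line)
        if m:
            counts[(m.group("func"), m.group("reason").strip())] += 1
    return counts

sample = [
    "dropped by vpn_encrypt_chain Reason: Could not change connection vpn interface.;",
    "vpn_drop_and_log Reason: Encryption/Decryption failure, failed to resolve SA (VPN Error code 01);",
    "dropped by vpn_encrypt_chain Reason: Could not change connection vpn interface.;",
]
for (func, reason), n in tally_drop_reasons(sample).most_common():
    print(n, func, reason)
```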

Timothy_Hall
Legend

Is this a route-based VPN using VTIs? If so, check this out:

sk119143: "encryption fail reason: Cannot change dynamic vpn interface - new interface not accepted ...

 

Ryan_Ryan
Advisor

Hello, yes, it's route-based with VTIs (static routing only, though).

 

Interestingly, I did have to delete and re-create the interfaces for a separate reason (and installed JHF 91) and have not had a recurrence of the issue... so far 🙂

 

Ryan_Ryan
Advisor

I should give an update: the issue still recurs roughly once a week. Previously it was every day.

 

The only notable thing about this VPN is that the remote side gets switched off every night to save money on Azure. The live ones that don't get switched off have never once had this issue.

 

@Albert_Chang did you ever get a fix for this?

Timothy_Hall
Legend

OK, so if turning off SecureXL had no effect, it is not an issue with IPSec but with IKE. Interoperable VPNs have had a longstanding problem with not handling "Delete SA" notifications correctly when one side of the tunnel goes down prior to the SA Lifetime expiration. See if you have any interesting error messages getting logged to $FWDIR/log/vpnd.elg around the time of the issue.

Looks like Azure supports DPD in a route-based configuration, and enabling that would be the best way to deal with this issue. See sk108600: VPN Site-to-Site with 3rd party, scenario 5, and be sure to enable DPD on the Azure end too. Alternatively, you could try significantly shortening your IKE Phase 1 and Phase 2 SA Lifetimes on both ends so the problem is detected sooner and the tunnel recovers from it.
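
To illustrate why DPD could help when a peer disappears without sending "Delete SA", here is a toy Python sketch of the general idea behind dead peer detection (RFC 3706): probe a silent peer and clear its SAs after several unanswered probes. This is not how Check Point or Azure implement it; the class name, timer values, and action strings are all made up for illustration.

```python
class DpdMonitor:
    """Toy dead-peer-detection state machine (illustrative only)."""

    def __init__(self, idle_timeout=10.0, retry_interval=2.0, max_retries=3):
        self.idle_timeout = idle_timeout      # silence allowed before probing
        self.retry_interval = retry_interval  # wait between unanswered probes
        self.max_retries = max_retries        # probes before declaring peer dead
        self.last_heard = 0.0
        self.last_probe = None
        self.probes_sent = 0
        self.peer_alive = True

    def on_peer_traffic(self, now):
        # Any valid traffic or probe acknowledgement proves liveness.
        self.last_heard = now
        self.last_probe = None
        self.probes_sent = 0
        self.peer_alive = True

    def tick(self, now):
        """Return 'wait', 'probe', or 'delete_sas' for the current time."""
        if not self.peer_alive:
            return "delete_sas"
        if now - self.last_heard < self.idle_timeout:
            return "wait"
        if self.last_probe is not None and now - self.last_probe < self.retry_interval:
            return "wait"
        if self.probes_sent >= self.max_retries:
            # Peer never answered: stale SAs should be removed so that
            # the next packet triggers a fresh negotiation.
            self.peer_alive = False
            return "delete_sas"
        self.probes_sent += 1
        self.last_probe = now
        return "probe"

mon = DpdMonitor()
mon.on_peer_traffic(0)
actions = [mon.tick(t) for t in range(17)]  # peer goes silent after t=0
print(actions)
```

Without such a mechanism, the surviving side keeps using SAs the peer no longer has until the SA Lifetime runs out.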

Ryan_Ryan
Advisor

Thanks for the reply, that is not a bad idea. I had tried the opposite approach and set the lifetimes to the longest supported values, which seemed to make it a bit better, but we still had occurrences weekly.

 

I've turned it down to a 10-minute P1 and a 5-minute P2. Will see if that helps. If not, I'll look into DPD.

 

cheers

 

Ryan_Ryan
Advisor

Unfortunately, the 5-minute lifetime has not solved the issue. The tunnel still went down and doesn't come back up until a policy push is completed.

 

I'll look into DPD.

 

 

Albert_Chang
Explorer

Hi Ryan,

We had DPD enabled, but that did not fix the issue. We are still working with Check Point support to investigate it.

Timothy_Hall
Legend

Bummer, sounds like you may be in bug territory now.

 

Andrew_Tonna
Explorer

I had the same error (Encryption/Decryption failure, failed to resolve SA (VPN Error code 01)), and it does seem to be related to a Check Point bug.

I can get the VPN tunnel up and running again by just publishing any change associated with the gateway and installing the policy. In my case I change the VPN interface description.
