Sundar_Ramanath
Explorer

R80.10 gateways drop traffic after policy install

We are having issues with R80.10 gateways, which drop traffic after a policy install. Re-installing the policy brings everything back to normal. The issue is specific to the R80.10 gateways; our R77.30 gateways are working fine. I would appreciate any input on troubleshooting this further.

Thanks

15 Replies
Danny
Champion

Which R80.10 Jumbo Hotfix do you have installed?

What is the output of: fw ctl zdebug drop | grep <IP of dropped traffic>

What is the output of: fw monitor

Did you use any of the troubleshooting commands our ccc script provides?
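The zdebug check suggested above can be scripted so the capture survives for later filtering. A minimal sketch, assuming a placeholder host 10.1.1.10 and a sample drop line standing in for real gateway output (on a live gateway you would redirect `fw ctl zdebug + drop` into the file instead):

```shell
# Seed a capture file with a sample drop line (placeholder for real output of:
#   fw ctl zdebug + drop > /tmp/drops.txt     # run on the gateway, Ctrl-C after the policy install)
cat > /tmp/drops.txt <<'EOF'
@;12345;[fw4_0];fw_log_drop_ex: Packet proto=6 10.1.1.10:443 -> 10.2.2.2:55000 dropped by fw_handle_old_conn_recovery Reason: Other protocol packet that belongs to an old connection;
EOF

# Pull out only the drops involving the host we care about:
grep '10.1.1.10' /tmp/drops.txt
```

Saving the full capture first (rather than piping straight into grep) lets you re-filter for other hosts or drop reasons afterwards.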

Kiran_Naidu1
Participant

Please check the connection persistence settings of the gateway where you are installing the policy.

Let me know which of the following is selected:
1: Keep all connections
2: Keep data connections
3: Rematch all connections

If "Rematch all connections" is enabled, try selecting "Keep all connections" and check whether you are still facing any issues.

If that does not work, follow the procedure given above to isolate the issue.

Beverley_Cudd
Contributor

We have just upgraded our R77.30 gateways to R80.10, and now whenever we install a policy our AWS and Amazon VPNs go down and will only come back up if we use 'vpn tu' option 7.

Any idea why? This has become a big issue, as we have 20+ VPNs running on the gateway in question.

Timothy_Hall
Legend

By default, all IKE Phase 1 tunnels are invalidated every time policy is installed, which can sometimes cause this behavior. Try checking keep_IKE_SAs in SmartConsole under Global Properties...Advanced...Configure...VPN Advanced Properties...VPN IKE Properties.

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
PBC_Cyber
Contributor

I have seen NAT-T cause issues similar to this; you might want to try disabling it and testing your policy push.

Beverley_Cudd
Contributor

Thank you to all who replied. We did enable keep_IKE_SAs, but this didn't fully fix the issue. In the end we had to have the remote peer VPNs reset. This appears to have resolved the issues we had following the upgrade.

Steve_Vandegaer
Contributor

Dear,

What do you mean by getting the other peer to reset the VPN? Just have them clear the tunnel?

Kind regards

Steve

Maik
Advisor

Hello,

 

I am currently experiencing the same issue, related to IKE traffic that is sent through the firewall. The related VPN tunnel does not terminate at the firewall; however, the tunnel seems to get killed after a policy install. We are running Gaia R80.20, Jumbo Take 47, in a VSX cluster in VSLS mode [I don't think the software is related to this].

As mentioned in sk103598, Scenario 3, I have tried the first solution: overriding the global domain settings for ESP and checking the option "Keep connections open after the policy has been installed". Still, after a policy install I am seeing the following message via "fw ctl zdebug + drop | grep <related_ip>" (output obfuscated):

 

[Expert@FIREWALL:2]# fw ctl zdebug + drop | grep 'x.x.x.x'
@;2413646;[vs_2];[tid_0];[fw4_0];fw_log_drop_ex: Packet proto=50 y.y.y.y:57794 -> x.x.x.x:17788 dropped by fw_handle_old_conn_recovery Reason: Other protocol packet that belongs to an old connection;
@;94946841;[kern];[tid_24];[SIM-206973856];simi_reorder_enqueue_packet: reached the limit of maximum enqueued packets for conn:<y.y.y.y,0,x.x.x.x,0,50>, fw_key:<x.x.x.x,0,y.y.y.y,0,50> !;
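When a capture like this contains many lines, tallying the drop reasons shows which mechanism is killing the traffic. A small sketch, with sample lines standing in for real zdebug output:

```shell
# Sample capture (placeholder for real "fw ctl zdebug + drop" output):
cat > /tmp/zdebug_vpn.txt <<'EOF'
fw_log_drop_ex: Packet proto=50 dropped by fw_handle_old_conn_recovery Reason: Other protocol packet that belongs to an old connection;
fw_log_drop_ex: Packet proto=50 dropped by fw_handle_old_conn_recovery Reason: Other protocol packet that belongs to an old connection;
EOF

# Count how often each "dropped by <function>" reason appears:
grep -o 'dropped by [A-Za-z_0-9]*' /tmp/zdebug_vpn.txt | sort | uniq -c
```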

 

Now I am thinking about the second solution described in the mentioned SK:

 

"2) Allow all traffic to persist past a policy push. Open the firewall (cluster) objects properties, expand Advanced and select Connection Persistence. Select "Keep all connections"."

 

Here my question is: what exactly is the outcome of this change? Are old connections allowed even after the rule that originally allowed them has been deleted? Or is the security policy still checked, so that a connection is kept only when the rule that allowed it has not been touched?

 

Thanks for any reply.

Regards,

Maik

Timothy_Hall
Legend Legend
Legend

Setting "Keep all connections" disables the rematching of connections that is normally performed every time a policy is installed.  When rematch is disabled, connections existing at the time of policy install are allowed to continue even if the new policy would not allow a "new" connection with the same attributes.  When rematch is enabled, it ensures that existing connections which would not be allowed by the newly installed policy are immediately killed.  On an overloaded firewall, the rematch operation can cause a sharp, prolonged rise in latency or even packet loss; as such, disabling rematch can prevent traffic disruptions during policy install.
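The difference between the two modes can be illustrated with plain text files. This is only a model of the behavior described above, not a Check Point tool; the file names and connection entries are made up:

```shell
# Existing connections at the moment of policy install (made-up entries):
cat > /tmp/existing_conns.txt <<'EOF'
10.1.1.10:443
10.1.1.20:22
EOF

# Connections the newly installed policy would still allow as "new":
cat > /tmp/new_policy_allows.txt <<'EOF'
10.1.1.10:443
EOF

# Rematch enabled: only connections the new policy would allow survive.
grep -Fxf /tmp/new_policy_allows.txt /tmp/existing_conns.txt > /tmp/survivors_rematch.txt

# "Keep all connections": every existing connection survives regardless.
cp /tmp/existing_conns.txt /tmp/survivors_keepall.txt
```

In this model the SSH connection (10.1.1.20:22) is killed under rematch but kept under "Keep all connections", which is exactly the trade-off to weigh before changing the setting.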

Maik
Advisor

Hey Tim,

Thanks for your reply. It seems my assumption was correct in that case. Still, I am wondering why "sk103598 => Scenario 3 => Solution 1" does not work in my case:

1) Select only this type of traffic to persist through a policy push. Open the TCP Service for the packet that is being dropped, and select "Keep connections open after policy has been installed".

Even with this setting, my ESP packets are getting dropped when I initiate a policy install. I talked to a few colleagues, who told me that this odd behaviour is fairly common with Check Point, and that only allowing all traffic to persist past a policy push helps in this case. It seems to be an issue related only to IPsec tunnels that pass through the firewall, where the actual persistence setting in the IKE service object does not help. (The solution mentions "TCP Service" objects - does this mean it only works for TCP-based services...?) One colleague assumes this behaviour is related to CPU issues, CoreXL to be precise. Unfortunately I was not able to check the load on the specific cores, nor the actual setup with workers and SNDs. I will make sure to check that when I'm on-site again.

Regards
Ilya_Yusupov
Employee

Hi Maik,

 

Please install the latest JHF of R80.20, as I think you have hit the same issue described in the following SK:

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

which was fixed in the latest JHF.

Please update us if the issue is resolved.

 

Thanks,

Ilya 

Timothy_Hall
Legend

If the firewall through which the VPN is transiting is performing any kind of NAT, the two endpoints will detect this and double-encapsulate the VPN traffic.  This is called NAT Traversal (service IKE_NAT_TRAVERSAL), which wraps ESP in UDP/4500.  Have you set "keep" for this service too?

Disabling Connection Persistence rematch does significantly reduce the firewall CPU load during policy install which may help with various problems encountered during policy load.
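One quick way to confirm NAT-T is in play is to look for UDP-encapsulated ESP on port 4500 in a packet capture, e.g. with tcpdump filtering on 'udp port 4500 or esp'. A sketch, using a canned capture summary in place of real tcpdump output:

```shell
# Sample capture summary (placeholder lines standing in for tcpdump output):
cat > /tmp/capture_summary.txt <<'EOF'
10.0.0.1.4500 > 10.0.0.2.4500: UDP-encap: ESP(spi=0x1234)
10.0.0.1 > 10.0.0.2: ESP(spi=0x5678)
EOF

# Count NAT-T (ESP-in-UDP) packets vs. plain ESP:
grep -c 'UDP-encap' /tmp/capture_summary.txt
```

If NAT-T packets show up, the "keep" setting needs to cover the NAT-Traversal service as well, not just IKE and ESP.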

 

Igor_Prokopinsk
Explorer

I have seen the simi_reorder queue drops for NAT-T VPN traffic before, where tunnels were being dropped after policy pushes. I resolved it by disabling that UDP queue altogether. You can try it on the fly and see if it fixes your drops:

# fw ctl set int simi_reorder_hold_udp_on_f2v 0 -a

If that does the trick, you can make the change permanent:

# echo simi_reorder_hold_udp_on_f2v=0 > $PPKDIR/conf/simkern.conf

If you already have something in that simkern.conf file, append with >> instead of overwriting with >.

If that doesn't fix the issue, undo the on-the-fly change by re-running the same command above with the 0 changed to 1, and obviously skip the "make it permanent" step.
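The append-vs-overwrite caveat above is worth being careful with. A sketch of the safe pattern, using a temporary stand-in directory since $PPKDIR is only set on a real gateway:

```shell
# Stand-in for $PPKDIR (on a real gateway this variable already exists):
PPKDIR=/tmp/ppk_demo
mkdir -p "$PPKDIR/conf"
CONF="$PPKDIR/conf/simkern.conf"

# Pretend a setting is already present in simkern.conf:
echo 'some_existing_param=1' > "$CONF"

# Append (>>) rather than truncate (>) so the existing setting survives:
echo 'simi_reorder_hold_udp_on_f2v=0' >> "$CONF"

cat "$CONF"
```

Using > here instead of >> would silently wipe whatever tuning was already persisted in the file.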

Timothy_Hall
Legend

Yeah, not sure why the firewall would attempt to reorder UDP packets by default in the first place, since by its very nature the UDP protocol does not assume, or frankly care, that datagrams will be delivered in order.

 

Maik
Advisor

Thanks for your replies.

Changing the setting for all related service objects [NAT-T in particular] did not change the behaviour. I'm not sure sk148432 applies in this case, as I couldn't find the message

"simi_reorder_enqueue_packet: reached the limit of maximum enqueued packets for conn"

... in the /var/log/messages file [checked shortly after initiating a policy install]. Nevertheless, I'm not allowed to install a jumbo greater than Take 47 at this time, as we have a special hotfix installed as well - one that is only compatible with Take 47. As soon as I am able to use "standard" releases again, I will test the latest take and verify whether the behaviour changes.

For now, changing the "global" settings in the firewall object itself did the trick.
