Martijn
Advisor

NFS issues after upgrade to R81.20

Hi all,

One of our customers has a VSX cluster running R81.20 Take 26.
Since the upgrade from R81.10 Take 95 they have issues with NFS traffic through the firewall.
This traffic passes through two VS's on the same VSX cluster, running on 16200 hardware.

In the Data Center there is a storage environment where a lot of VM's get their data from.
The VM's are Jenkins servers they use for deploying and automating projects.
These Jenkins servers rely on the NFS share; when the share is unavailable, deployments stop until NFS is back.

This worked fine on R81.10 for two years, but on R81.20 we suddenly see a lot of out-of-state drops for NFS traffic.
One way to work around this is to restart the Jenkins servers so they initiate a new connection to the storage.
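
To illustrate why restarting the clients helps, here is a minimal, hypothetical sketch of stateful inspection (my own simplification, not Check Point code): a TCP packet without SYN that matches no connection-table entry is dropped as out-of-state, while a fresh handshake repopulates the table.

```python
# Toy model of a stateful firewall's connection table.
connections = set()

def inspect(conn, tcp_flags):
    if "SYN" in tcp_flags:       # first packet of a new connection
        connections.add(conn)
        return "accepted"
    if conn in connections:      # mid-stream packet of a known connection
        return "accepted"
    return "dropped (out of state: first packet isn't SYN)"

nfs = ("jenkins", 40000, "storage", 2049)
print(inspect(nfs, {"ACK"}))     # entry was lost -> dropped out-of-state
print(inspect(nfs, {"SYN"}))     # client restarted -> new handshake, accepted
print(inspect(nfs, {"ACK"}))     # connection is tracked again -> accepted
```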

To investigate and solve the issue, the customer has done the following.

1.
Created custom TCP and UDP services for NFS, configured the timeout to 24 hours, and disabled Aggressive Aging for those services.
This was OK for some VM's, but not all of them.

2.
Increased the connection limit on the affected VS's because they were getting close to their limits.

3.
Enabled fast_accel for port 2049 (NFS).

4.
Disabled Smart Connection Reuse, but this made things worse, so we enabled it again.

5.
Performed a fail-over to the other VSX cluster member for both VS's.

6.
Excluded NFS from Threat Prevention.
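
The reasoning behind step 1 can be reduced to simple arithmetic (a hypothetical sketch; the 3600-second default TCP session timeout and the idle times are illustrative assumptions, not measured values):

```python
# Toy check, not Check Point code: an idle connection stays in the
# connection table only while its idle time is below the session
# timeout of the matching service.

def survives_idle(idle_seconds, session_timeout_seconds):
    return idle_seconds < session_timeout_seconds

overnight_idle = 8 * 3600                        # NFS mount idle for 8 hours
print(survives_idle(overnight_idle, 3600))       # assumed default timeout -> False
print(survives_idle(overnight_idle, 24 * 3600))  # custom 24-hour service -> True
```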

None of the steps above solved the issue. We do not see an increase in load on the system; CPU and memory are normal.
Even now we are seeing out-of-state drops for NFS, but at the moment no issues are reported.

In the end, the customer created an extra interface on the storage environment so it is in the same network as the VM's, bypassing the firewalls.

The customer has a strict security policy and is not allowed to share site-related data with Check Point support.
So our first step is to try to get help from the CheckMates community.

Why did this work on R81.10 but cause problems on R81.20? Has something changed in the code that handles NFS traffic?

Any help is appreciated. Thanks.

Martijn

10 Replies
G_W_Albrecht
Legend

I would open an SR with TAC - I cannot imagine that you will get much help here...

CCSE CCTE CCSM SMB Specialist
Chris_Atkinson
Employee

Note sk60768 may provide a workaround rather than restarting servers to force new connections.

CCSM R77/R80/ELITE
Tobias_Moritz
Advisor

You have already tried a lot to narrow it down, but I did not read anything about connection persistence. Just in case the connection persistence setting on the relevant gateways is not "Keep all connections": have you tried enabling "Keep connections open after the policy has been installed" on the service objects you use?

Otherwise, the way "Rematch connections after policy installation" is implemented can break client-server connections.

I have to admit that I do not really think this is your problem, because nothing changed there between R81.10 and R81.20. But as you have already tried so much, it is just one more thing to try.

If no one comes up with a great idea, it may be necessary to capture traffic (at least layer 4 headers) at all three points (client, server, and between the two VS's (sk167462)) in a ring buffer. Stop the capture as soon as you see out-of-state drops, then check the captures to understand why the VS is treating the traffic as out-of-state.

Example:

Some years ago, I had a TCP problem that could be tracked down this way.

  1. Connection is established and packets are exchanged without problems.
  2. Policy was installed. The relevant rule was not changed. Connection persistence was set to "Rematch connections".
  3. Gateway removed that connection from its connection table / marked it as a candidate for a connection rematch.
  4. Server closed the connection after a server-side timeout with RST ACK.
  5. Gateway dropped that packet, because server-to-client packets are not eligible to reinstate a connection marked as a candidate for rematch.
  6. Client sent another packet for that connection, because it did not know that the server had closed it.
  7. Gateway reinstated the connection into the live connection table and forwarded that packet to the server.
  8. Server no longer had an established connection and dropped that packet (no RST sent from the server now).
  9. Client resent its packet until it finally gave up.
  10. Application logic did not reopen a new connection and the application failed.
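
The sequence above can be sketched as a toy state machine (a simplified model of my own, not Check Point's actual implementation):

```python
# Simplified model of "Rematch connections after policy installation".
LIVE, CANDIDATE = "live", "rematch-candidate"

class Gateway:
    def __init__(self):
        self.table = {}                  # 5-tuple -> connection state

    def add(self, conn):
        self.table[conn] = LIVE

    def install_policy_rematch(self):
        # Step 3: existing entries become rematch candidates.
        for conn in self.table:
            self.table[conn] = CANDIDATE

    def forward(self, conn, direction):
        state = self.table.get(conn)
        if state == LIVE:
            return "forwarded"
        if state == CANDIDATE:
            if direction == "c2s":
                # Step 7: only client-to-server packets reinstate.
                self.table[conn] = LIVE
                return "forwarded"
            return "dropped"             # Step 5: server's RST is dropped.
        return "dropped"                 # Unknown connection: out-of-state.

gw = Gateway()
conn = ("client", 51234, "server", 2049)
gw.add(conn)                             # step 1
gw.install_policy_rematch()              # steps 2-3
print(gw.forward(conn, "s2c"))           # step 5 -> dropped
print(gw.forward(conn, "c2s"))           # step 7 -> forwarded
```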

Solutions:

  • Enable keep-alives on the client side, or
  • Enable RST packets on the server side for out-of-state packets, or
  • Enable "keep connections open after the policy has been installed" at the service or gateway level
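
For the first option, TCP keep-alives can be enabled from the application side, for example in Python (the interval values below are illustrative assumptions, and TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific socket options):

```python
import socket

def enable_keepalive(sock, idle=600, interval=60, count=5):
    """Ask the kernel to probe an idle TCP connection periodically, so a
    middlebox such as a firewall keeps seeing traffic on it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs: idle time before the first probe, interval
    # between probes, and number of unanswered probes before giving up.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
s.close()
```

The idle value should stay well below the firewall's TCP session timeout for the keep-alives to have the intended effect.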
CheckPointerXL
Advisor

"Keep all connections" should impact/improve only UDP connections, right?

the_rock
Legend

It applies to any sort of connection.

Best regards,

Andy

the_rock
Legend

I'm with @Tobias_Moritz on the "keep all connections" setting... that definitely may help.

Best regards,

Andy

CheckPointerXL
Advisor

Which TCP flag is on the out-of-state drops? Did you try disabling the dropping of out-of-state packets?

Tobias_Moritz
Advisor

From a neighbor thread:

https://community.checkpoint.com/t5/General-Topics/R81-20-T26-Traffic-disruption-during-policy-insta...

Author wrote:

a lot of problems came up:

- during policy installation, telnet/VNC sessions are sometimes dropped

- randomly, some connections are dropped during the day

- "First packet isn't SYN" drops are firing (TCP flag is ACK), with no asymmetric routing

- "keep all connections" is flagged

Martijn wrote:

Had one customer on R81.20 Take 24 with this issue. VDI connections got disconnected during a policy installation.

We got a custom fix for this problem that we needed to install on top of Take 26.
Installed Take 26 and the custom fix, and the problem was solved.

From support we got this fix ID: PRHF-31092. The fix will be included in future HFAs. ETA unknown.

Maybe this is also something to look into.

 

PS:

@CheckPointerXL: "keep all connections it should impact/improve only UDP connections, right?" --> No, not only UDP.

Timothy_Hall
Champion

You may want to consider enabling TCP state logging, which will log information about why a particular connection is considered "ended" by the firewall: sk101221: TCP state logging

However, the extra data it logs is a bit tricky to interpret; see here for guidance: tcp state logging and SecureXL

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
AaronCP
Advisor

Hey @Martijn,

 

When you say the traffic goes through two VS's - are they two separate firewalls?

 

I experienced a similar NFS issue with a previous company when NFS traffic traversed two gateways, although they weren't VS's and were running R80.40 at the time.

 

See the response from Alexander_Wilke at https://community.checkpoint.com/t5/Maestro/NFS-Issues/m-p/165151/highlight/true#M1295 that explains the issue very well. It appears it was a bug in the Smart Connection Reuse feature. It might be worth checking whether PRHF-26493 is included in your current HFA version.

