Create a Post
cancel
Showing results for 
Search instead for 
Did you mean: 
Mark_Shaw
Explorer

NFS Issues

Does Checkpoint and Unix NFS mounts generally not work well together?

 

Reason I ask is we continually see issues if we have to failover or reboot our firewalls that it breaks NFS connections between hosts and we have to manually clear the connection table, which involves converting decimal to HEX, etc so not ideal. This has been happening with our on-prem firewalls that are running Maestro and R80.20SP for a while. Well I say a while the issues disappeared and has now reared its head again. We have a case open with TAC and they have a load of debug data.

 

Now the issues have occurred with our Azure IaaS Firewalls, we had a memory spike yesterday which we think is down to us under spec’ing the VM’s but it broke some NFS connectivity and last night when we added more memory to the VM’s and installed the latest on going HFA it broke NFS again when failing the firewalls over. Again we had to manually clear the connections table.

It just seems strange that we have seen the issue on two different firewall architectures

On-Prem – Running Maestro and R80.20SP latest HFA

Azure IaaS – Running a IaaS HA Cluster on R81 latest ongoing HFA (take 29).

0 Kudos
2 Replies
Timothy_Hall
Champion
Champion

Any chance the associated RPC protocol is using TCP for the transport instead of the standard UDP?  See here:

sk44982: TCP based RPC traffic is dropped by the Security Gateway.

Also what is the L4 protocol in use for NFS itself, TCP or UDP?  UDP itself is stateless of course, but the firewall still tracks "state" for that traffic for purposes of stateful inspection.  The Virtual Timeout enforced for NFS over UDP is quite short compared to TCP and may be part of your problem.  If possible can you check the L4 transport for your NFS mounts and see if there is any difference in the problematic behavior between TCP and UDP mounts?  If only UDP is in use, can you create a test mount with TCP and see if it experiences the same issues?

The fact that this seems to happen upon failover suggests that not everything is being sync'ed between the members; this is expected behavior to some degree as any inspection operations occurring in process-based security servers on the firewall are generally not synced between the cluster members. 

Threat Prevention does perform some of its operations in security server processes, whose inspection state will be lost upon failover.  So the next thing I would suggest if you are using any of the 5 TP blades is to create a "null" TP profile rule at the top of your TP policy (the null profile has all 5 TP blades unchecked), and apply it to all traffic to and from your NFS server in the Protected Scope.  Once that is in place see if the failover situation for NFS traffic improves, if it does you may need to tweak your TP policy and how it handles NFS traffic as leaving the null profile in place indefinitely is not a good idea.

"Max Capture: Know Your Packets" Video Series
now available at http://www.maxpowerfirewalls.com
Mark_Shaw
Explorer

Hi Timothy

Many thanks for you reply.

All of the NFS mounts are using TCP/2049 for the connections.

Tonight we implemented the null TP policy with the NFS servers in the protected scope. We performed a failover by issuing clusterxl admin down and everything was fine when we failed to/from the primary gateway.

We then performed a hard reboot of the primary firewall and this is when things broke again, NFS mounts for three hosts stopped working. We had to convert IP to hex and dec to hex to clear the connections table.

Any further ideas? Not getting anywhere quickly with TAC and these Unix guys are wanting me to look at Azure firewalls instead.

0 Kudos