Hi,
We have a problem where a connection gets stuck in the FW state table and we can't get rid of it. I am looking for ideas on how to deal with this scenario.
The problem is between an NFS Client and the NFS Server. It occurs when we have some kind of network issue, like a switch reboot during a change window, or, last week, an IPS failing closed for 2 hours.
What we end up with is a connection in the state table in state DST_FIN. The order of events that gets us into this problem isn’t 100% clear, aside from the network failure, but regardless, the connection in DST_FIN state is blocking NFS from recovering, so I have focused on that.
This is the setup: a Client and a Server with two FWs between them, both running R80.30. I'll use these names: Client --> FW_C --> FW_S --> Server.
After the network outage the Client has no connection to the Server and continuously tries to reconnect using the same source port as the original connection, port 1023. The Server is listening on port 2049. So the Client is sending SYN, SYN, SYN, SYN, SYN, SYN, RST … and then starts over.
Those packets arrive at FW_C, which has this state table entry: inbound, src=[Client,1023], dest=[Server,2049], 3600/3605, state=DST_FIN.
Here are my observations about what happens as the SYN and RST packets arrive at FW_C from the Client.
- The SYN is changed to an ACK (Smart Connection Reuse) and sent on. Confirmed with packet captures. The ACK arrives at FW_S, which drops it as out of state. FW_S doesn’t send an RST when it drops the ACK.
- The SYN, or the newly minted ACK, is resetting the idle timer on the connection. Confirmed with multiple state table dumps; we haven’t seen the idle time drop below 60s.
- The RST sent after six SYNs does not cause the connection table entry to be removed.
The pattern continues; last time it went on for days.
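To make the loop easier to reason about, here is a toy Python model of what I think is happening on FW_C. It is only a sketch of my observations, not how the firewall is actually implemented. The 3605s timeout comes from the state table entry above; the gaps between SYNs (1, 2, 4, 8, 16, 32 seconds) are an assumption on my part, and the names Entry and simulate are made up for illustration.

```python
from dataclasses import dataclass

TIMEOUT = 3605  # seconds, taken from the state table entry (3600/3605)


@dataclass
class Entry:
    """Toy stand-in for the FW_C state table entry."""
    state: str = "DST_FIN"
    remaining: int = TIMEOUT

    def tick(self, seconds: int) -> None:
        # Age the entry; it is considered expired when remaining hits 0.
        self.remaining = max(0, self.remaining - seconds)

    @property
    def alive(self) -> bool:
        return self.remaining > 0


def simulate(total_seconds: int, refresh_on_syn: bool,
             gaps=(1, 2, 4, 8, 16, 32)) -> bool:
    """Return True if the entry is still present after total_seconds.

    gaps are ASSUMED intervals between the six SYNs of one retry cycle.
    """
    entry = Entry()
    elapsed = 0
    while elapsed < total_seconds and entry.alive:
        for gap in gaps:  # six SYNs per cycle
            if refresh_on_syn and entry.alive:
                # Observed: the SYN (or the rewritten ACK) resets the timer.
                entry.remaining = TIMEOUT
            entry.tick(gap)
            elapsed += gap
        # The RST at the end of the cycle was observed to leave the
        # entry in place, so it is deliberately a no-op here.
    return entry.alive


three_days = 3 * 24 * 3600
print(simulate(three_days, refresh_on_syn=True))
print(simulate(three_days, refresh_on_syn=False))
```

In this model the entry only ages out if the Client's retries stop refreshing the timer, which matches what we see: as long as the Client keeps retrying from port 1023, the DST_FIN entry never expires on its own.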
I am thinking that disabling Smart Connection Reuse is the best option. Having said that, we haven’t done that before and can’t really describe what the downsides might be in our environment.
Thoughts?
Thanks,
-Wes