WesEvernden
Participant

Stuck entry in the FW state table

Hi,

We have a problem where a connection gets stuck in the FW state table and we can't get rid of it. I am looking for ideas on how to deal with this scenario.

The problem is between an NFS Client and the NFS Server. It occurs when we have some kind of network issue, like a switch reboot during a change window, or, last week, an IPS failing closed for 2 hours.

What we end up with is a connection in the state table in state DST_FIN. The order of events that gets us into this problem isn't 100% clear, aside from the network failure, but regardless, the connection in DST_FIN state is blocking NFS from recovering, so I have focused on that.

This is the setup: the Client and the Server, with two FWs between them, both running R80.30. I'll use these names: Client --> FW_C --> FW_S --> Server.

After the network outage the Client has no connection to the Server and continuously tries to reconnect to the server using the same source port as the original connection, port 1023. The server is listening on port 2049. So, the Client is sending SYN, SYN, SYN, SYN, SYN, SYN, RST … start over.

Those packets arrive at FW_C, which has this state table entry: inbound, src=[Client,1023], dest=[Server,2049], 3600/3605, state=DST_FIN.

Here are my observations about what happens as the SYN and RST packets arrive at FW_C from the Client.

  1. The SYN is changed to an ACK (Smart Connection Reuse) and sent on. Confirmed with packet captures. The ACK arrives at FW_S, which drops it as out of state. FW_S doesn't send an RST when it drops the ACK.
  2. The SYN, or the newly minted ACK, is resetting the idle timer on the connection. Confirmed with multiple state table dumps; we haven't seen the idle time drop below 60s.
  3. The RST sent after six SYNs does not cause the connection table entry to be removed.
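The loop described in the observations above can be sketched as a toy simulation. This is only an illustration of the reported behaviour, not the firewall's implementation: the `StateEntry` class, the `run_retry_loop` helper, and the retry gaps between SYNs are all assumptions made for the example. Only the 3600s idle timeout comes from the post.

```python
from dataclasses import dataclass


@dataclass
class StateEntry:
    """Toy model of one state table entry; not the firewall's real data structure."""
    timeout: int = 3600    # configured idle timeout in seconds (from the post)
    remaining: int = 3600  # seconds left before the entry would expire

    def tick(self, seconds: int) -> None:
        self.remaining = max(0, self.remaining - seconds)

    def refresh(self) -> None:
        self.remaining = self.timeout


def run_retry_loop(entry: StateEntry, gaps: list[int], duration: int) -> tuple[int, bool]:
    """Replay the client's retry cycle for `duration` seconds.

    Returns (lowest remaining idle time seen, whether the entry expired).
    """
    elapsed, lowest, i = 0, entry.remaining, 0
    while elapsed < duration:
        gap = gaps[i % len(gaps)]
        entry.tick(gap)
        elapsed += gap
        lowest = min(lowest, entry.remaining)
        if entry.remaining == 0:
            return lowest, True
        entry.refresh()  # observed behaviour: every arriving SYN resets the timer
        i += 1
    return lowest, False


# Assumed exponential backoff between the client's SYNs (hypothetical values).
ASSUMED_GAPS = [1, 2, 4, 8, 16, 32]
lowest, expired = run_retry_loop(StateEntry(), ASSUMED_GAPS, duration=24 * 3600)
print(lowest, expired)
```

With these assumed gaps, the timer never falls below 3568s over a simulated day and the entry never expires, which matches the "stuck" symptom: as long as the client retries more often than the idle timeout, the entry lives forever.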

The pattern continues, last time for days.

I am thinking that disabling Smart Connection Reuse is the best option. Having said that, we haven't done that before and can't really describe what the downsides might be in our environment.

Thoughts?

Thanks,

-Wes

4 Replies
Wolfgang
Mentor

@WesEvernden having two firewalls between source and destination with Smart Connection Reuse enabled can sometimes be very tricky to handle. In the past we had such a problem with communication between two web proxies. Changing the SYN to an ACK on gatewayA was followed by a drop on gatewayB with "out of state". This is normal behaviour; it is how it works.

The connection never timed out on gatewayA because the source proxy tried too fast to re-establish a connection on the same source and destination ports.

We checked more than once the timeout values on both gateways for TCP connection states (start timeout, end timeout, session timeout, etc., and also the timeouts of the service objects used in the matching rules). They must be equal on both gateways. Additionally, we disabled Smart Connection Reuse for port TCP/8080, which was our communication port for the proxy<=>proxy connection.

Be aware that you have to set these values twice, in the kernel and in SIM. Follow the instructions in: "Smart Connection Reuse" feature modifies some SYN packets

WesEvernden
Participant

Thanks for the feedback Wolfgang. 

Alexander_Wilke
Collaborator

Hi,

the problem here is not that the client is trying to re-establish the connection too fast. The problem is that the firewall FW_C is resetting the idle timeout back to 3600/3600 after every SYN from the client. This is not correct.

 

If the Smart Connection Reuse feature modifies a SYN to an ACK, it does so because it wants to check whether this SYN is valid. But this only works if the server responds. Depending on the server's response, the firewall can make a decision:

a) delete the existing connection and allow a new connection based on same src.port

b) keep the existing session and do not allow to establish a new connection

This is totally fine and secure, and it works, BUT ONLY if the server responds.

 

If the server is not able to respond because some device in between is dropping the packets, then the firewall cannot make a decision and must, for security reasons, keep the existing session.
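The three outcomes described above can be written down as a small decision function. This is a sketch only: the names `ServerReply`, `Verdict`, and `decide` are made up for illustration, and the mapping of a server RST to option (a) and a server ACK to option (b) is an assumption based on ordinary TCP behaviour (a host with no matching connection answers an unexpected ACK with RST), not on documented firewall internals.

```python
from enum import Enum, auto
from typing import Optional


class ServerReply(Enum):
    RST = auto()  # assumed: server no longer knows the old connection
    ACK = auto()  # assumed: server still holds the old connection


class Verdict(Enum):
    DELETE_AND_ALLOW_NEW = auto()  # option (a) in the post
    KEEP_EXISTING = auto()         # option (b) in the post


def decide(reply: Optional[ServerReply]) -> Verdict:
    """Map the server's answer to the probe ACK onto the firewall's choice.

    `None` means no reply arrived, e.g. because a device in between
    dropped the probe. In that case the existing session is kept.
    """
    if reply is ServerReply.RST:
        return Verdict.DELETE_AND_ALLOW_NEW
    return Verdict.KEEP_EXISTING


for reply in (ServerReply.RST, ServerReply.ACK, None):
    print(reply, "->", decide(reply).name)
```

In the two-firewall setup from the original post, FW_S drops the probe silently, so FW_C always ends up in the `None` branch.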

Now here is the problem.

The SYN to ACK conversion resets the idle timeout on the existing session, and this is wrong. The SYN to ACK conversion feature is there to check with the server whether the connection is valid and whether the client is valid. If there is no response, the firewall MUST NOT touch anything of the existing connection.

 

Unfortunately, every time a SYN is converted to an ACK, the firewall resets the idle timer back to the configured default idle timeout of 3600/3600. The firewall must ignore the "SYN to ACK" packet for the idle timeout reset. The timer must stay as it is until there is a valid, verified packet for the existing connection.

 

As a result, the session in the firewall will time out in the regular way once the idle timer reaches 0/3600. Then the client can re-establish the connection.
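The difference between the observed and the proposed timer handling can be compared with a short sketch. It is a model of the argument, not of the product: the function `expiry_time`, the `refresh_unverified` switch, and the probe interval of 30s are hypothetical. Only the 3600s idle timeout is taken from the thread.

```python
def expiry_time(timeout: int, packets: list[tuple[int, bool]], refresh_unverified: bool) -> int:
    """Return the second at which the state table entry would expire.

    `packets` is a time-sorted list of (arrival_second, verified) pairs.
    Verified packets always refresh the idle timer. Unverified ones
    (such as a SYN rewritten to a probe ACK that got no answer) refresh
    it only when `refresh_unverified` is True.
    """
    deadline = timeout
    for arrival, verified in packets:
        if arrival >= deadline:
            break  # entry already expired before this packet arrived
        if verified or refresh_unverified:
            deadline = arrival + timeout
    return deadline


# Assumed traffic: one unverified probe every 30 s for a full day.
probes = [(t, False) for t in range(30, 86400, 30)]

observed = expiry_time(3600, probes, refresh_unverified=True)
proposed = expiry_time(3600, probes, refresh_unverified=False)
print(observed, proposed)
```

Under the observed behaviour the expiry keeps moving forward (89970s here, one timeout after the last probe), while under the proposed behaviour the entry expires at 3600s regardless of how often the client retries.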

 

Whatever the client or an attacker is doing, and whatever the server or something in between is doing, the firewall MUST NOT touch an existing connection if the packet it receives is not fully verified and valid for this existing session.

 

TL;DR

If the Smart Connection Reuse feature converts a SYN to an ACK packet to verify whether the SYN is valid, the Smart Connection Reuse feature MUST NOT modify the idle timeout of the existing state table entry.

WesEvernden
Participant

Thanks Alexander. I agree completely. The idle timeout should not be reset.
