marki
Contributor

Session won't establish: "SYN packet on established connection"

Hello,

I have a strange issue here.[tm]

So we have a flow traversing several firewalls, looks like this:

Client -> FG1 -> CheckpointGaia -> FG2 -> Server

FG2 reported dropping a packet on ingress because it had ACK set without an established connection.

We found this is due to sk24960 "Smart Connection Reuse" on Check Point Gaia, which turns a SYN into an ACK if it thinks the session is already established.

After disabling "Smart Connection Reuse" we get "SYN packet on established connection" in SmartView Tracker.

The client state at that point is
SYN-SENT 0 1 172.31.3.127:55962 172.22.160.1:443 users:(("java",pid=15151,fd=131)) timer:(on,1.356ms,2)

To find out more, I was logging the connections table in a loop every few minutes:
fw tab -t connections -f | grep 172.31.3.127

But a connection like that is nowhere to be found around the time of the error.

Note acceleration is switched off for the time being (fwaccel off)

How do I get to the bottom of this?
Could the connections table just be corrupt and the entire cluster need a reboot?

Thanks.

UPDATE: "fw ctl conntab" seems to show the culprit, unlike "fw tab -t connections -f".

# fw ctl conntab | grep 172.31.3.127 | grep 53932
<(inbound, src=[172.31.3.127,53932], dest=[172.22.160.1,443], TCP); 14158/14400, rule=73, tcp state=DST_FIN, service=649, Ifncin=55, Ifncout=55, Ifnsin=43, Ifnsout=43, conn modules: Authentication, FG-1>
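
For anyone scripting this check, here is a minimal Python sketch that pulls the interesting fields out of an entry like the one above. The regular expression is written against this single sample line only, `parse_conntab_entry` is a hypothetical helper name, and reading "14158/14400" as seconds remaining over the configured timeout is an assumption that this thread does not confirm.

```python
import re

# Pattern written against the single sample line in this post; other
# entries (NAT, non-TCP) may well need adjustments.
ENTRY_RE = re.compile(
    r"src=\[(?P<src_ip>[\d.]+),(?P<src_port>\d+)\], "
    r"dest=\[(?P<dst_ip>[\d.]+),(?P<dst_port>\d+)\], "
    r"(?P<proto>\w+)\); "
    r"(?P<remaining>\d+)/(?P<timeout>\d+), "
    r".*?tcp state=(?P<state>\w+)"
)


def parse_conntab_entry(line):
    """Return the fields of one conntab line as a dict, or None if no match."""
    match = ENTRY_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("src_port", "dst_port", "remaining", "timeout"):
        fields[key] = int(fields[key])
    # Assumption: the pair means "seconds remaining / configured timeout".
    fields["idle_seconds"] = fields["timeout"] - fields["remaining"]
    return fields


SAMPLE = (
    "<(inbound, src=[172.31.3.127,53932], dest=[172.22.160.1,443], TCP); "
    "14158/14400, rule=73, tcp state=DST_FIN, service=649, Ifncin=55, "
    "Ifncout=55, Ifnsin=43, Ifnsout=43, conn modules: Authentication, FG-1>"
)

if __name__ == "__main__":
    print(parse_conntab_entry(SAMPLE))
```

Under that reading, the sample entry sits in state DST_FIN with the full 14400-second timeout and was last refreshed a few minutes before the dump.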

UPDATE 2: More findings.

tshark on client 172.31.3.127 using filter "host 172.22.160.1 and port 443"

....
15:00:06.638692144 172.22.160.1 → 172.31.3.127 TCP 1514 443 → 55894 [ACK] Seq=72429 Ack=27310 Win=88448 Len=1460 [TCP segment of a reassembled PDU]
15:00:06.638747232 172.22.160.1 → 172.31.3.127 TCP 4434 443 → 55894 [ACK] Seq=73889 Ack=27310 Win=88448 Len=4380 [TCP segment of a reassembled PDU]
15:00:06.638797565 172.22.160.1 → 172.31.3.127 TLSv1.2 5894 Application Data [TCP segment of a reassembled PDU]
15:00:06.638842896 172.22.160.1 → 172.31.3.127 TCP 1514 443 → 55894 [ACK] Seq=84109 Ack=27310 Win=88448 Len=1460 [TCP segment of a reassembled PDU]
15:00:06.638892434 172.22.160.1 → 172.31.3.127 TCP 1514 443 → 55894 [ACK] Seq=85569 Ack=27310 Win=88448 Len=1460 [TCP segment of a reassembled PDU]
15:00:06.658279721 172.22.160.1 → 172.31.3.127 TLSv1.2 1514 Application Data [TCP segment of a reassembled PDU]
15:00:06.669679081 172.31.3.127 → 172.22.160.1 TCP 54 55894 → 443 [ACK] Seq=27310 Ack=88489 Win=130432 Len=0
15:00:06.670310044 172.22.160.1 → 172.31.3.127 TLSv1.2 1239 Application Data, Encrypted Alert
15:00:06.711199624 172.31.3.127 → 172.22.160.1 TCP 54 55894 → 443 [ACK] Seq=27310 Ack=89675 Win=129408 Len=0

The client (172.31.3.127) entered CLOSE-WAIT at some point between 15:00:01 and 15:01:01.

15:51:05.333458718 172.31.3.127 → 172.22.160.1 TCP 54 55894 → 443 [RST, ACK] Seq=1 Ack=1 Win=1171 Len=0
18:13:53.745065284 172.31.3.127 → 172.22.160.1 TCP 74 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623469574 TSecr=0 WS=128
18:13:54.767213091 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623470596 TSecr=0 WS=128
18:13:56.783213562 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623472612 TSecr=0 WS=128
18:14:01.039219839 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623476867 TSecr=0 WS=128
18:14:09.231207734 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623485059 TSecr=0 WS=128
18:14:25.359233329 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623501186 TSecr=0 WS=128
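
The spacing of those SYNs is worth a quick look. The sketch below (plain Python, hypothetical helper names) computes the gap between consecutive SYN timestamps copied from the capture above. It assumes all timestamps fall on the same day.

```python
# Timestamps of the initial SYN and its retransmissions, from the capture.
SYN_TIMES = [
    "18:13:53.745065284",
    "18:13:54.767213091",
    "18:13:56.783213562",
    "18:14:01.039219839",
    "18:14:09.231207734",
    "18:14:25.359233329",
]


def to_seconds(stamp):
    """Convert HH:MM:SS.fraction into seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def gaps(stamps):
    """Return the gap in seconds between each consecutive pair of timestamps."""
    secs = [to_seconds(s) for s in stamps]
    return [round(later - earlier, 2) for earlier, later in zip(secs, secs[1:])]


if __name__ == "__main__":
    print(gaps(SYN_TIMES))
```

The gaps come out at roughly 1, 2, 4, 8 and 16 seconds, i.e. each wait is about double the previous one. That is the usual pattern for a SYN that gets no answer at all (no SYN-ACK and no RST), which fits a silent drop somewhere on the path.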

So Gaia thinks this is an existing session and won't open a new one, because our TCP session timeout is 14400 seconds instead of the default 3600 seconds. But what about the RST,ACK at 15:51? Argh.
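
To put numbers on that, here is a small Python sketch of the arithmetic. It assumes the capture is from a single day and that Gaia's idle timer for this connection was last refreshed by one of the two client packets listed. Whether the RST,ACK ever reached Gaia is not known from the capture, so both cases are computed. The helper names are hypothetical.

```python
def to_seconds(stamp):
    """Convert HH:MM:SS.fraction into seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Timestamps copied from the client-side capture.
LAST_ACK_FROM_CLIENT = "15:00:06.711199624"
RST_FROM_CLIENT = "15:51:05.333458718"
FIRST_NEW_SYN = "18:13:53.745065284"

CONFIGURED_TIMEOUT = 14400  # our TCP session timeout, in seconds
DEFAULT_TIMEOUT = 3600      # default TCP session timeout, in seconds


def entry_still_present(idle_seconds, timeout_seconds):
    """True if an entry idle for idle_seconds has not yet timed out."""
    return idle_seconds < timeout_seconds


def report():
    """Return (label, idle seconds, present at 14400?, present at 3600?) rows."""
    rows = []
    for label, stamp in (("last ACK", LAST_ACK_FROM_CLIENT),
                         ("RST,ACK", RST_FROM_CLIENT)):
        idle = to_seconds(FIRST_NEW_SYN) - to_seconds(stamp)
        rows.append((label,
                     round(idle),
                     entry_still_present(idle, CONFIGURED_TIMEOUT),
                     entry_still_present(idle, DEFAULT_TIMEOUT)))
    return rows


if __name__ == "__main__":
    for row in report():
        print(row)
```

In both cases the idle time is well above 3600 seconds and below 14400 seconds, so with the default timeout the stale entry would already have expired by the time the port was reused.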

5 Replies
PhoneBoy
Admin

14400 is a non-default timer, which means you overrode the default for that service or set its timeout higher than the default.
Is the client changing the TCP source port on each new connection?

In any case, I think a TAC case might be necessary to get to the bottom of this.

marki
Contributor

That's right, it's not the default setting. It's not the main issue, though it certainly contributes to the problem.

We are still investigating this internally. I suspect something is going wrong with the connection closure on either the client or the server side.

That would lead to the FW thinking a session that is long gone is still open, and hence refusing a new session on this port. Our teams tell me this happens when "a lot is going on". So my assumption is that they somehow don't properly close their connections on either the client or the server side, and when they come to reuse the same port a few hours later, this happens.

I'll continue collecting more info: 1) client-side tcpdump, 2) server-side tcpdump, 3) a cronjob on the FW dumping the sessions and their state.
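
For the third item, a rough Python sketch of what one cron run could do: dump the table once, keep only the lines mentioning the client, and prefix each with a timestamp. It is only an illustration. It assumes a Python 3 interpreter is available wherever the command runs, it uses "fw ctl conntab" simply because that is the command that showed the entry earlier in this thread, and the helper names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def filter_entries(dump, needle):
    """Return the lines of a table dump that contain needle (like grep)."""
    return [line for line in dump.splitlines() if needle in line]


def snapshot(needle, command=("fw", "ctl", "conntab")):
    """Run the dump command once and return timestamped matching lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return [f"{stamp} {line}" for line in filter_entries(result.stdout, needle)]


if __name__ == "__main__":
    # Append the output of each run to a log file from cron, for example.
    for entry in snapshot("172.31.3.127"):
        print(entry)
```

Keeping the timestamp on every line makes it easier to line the snapshots up against the tcpdump captures afterwards.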

That leads me to an intermediate question: why does "fwaccel conns" show sessions when acceleration is currently disabled? "fwaccel stat" gives:

# fwaccel stat
+---------------------------------------------------------------------------------+
|Id|Name     |Status     |Interfaces               |Features                      |
+---------------------------------------------------------------------------------+
|0 |SND      |disabled   |eth1,eth5,eth2,eth6,eth3,|
|  |         |           |eth7,eth4,Sync,Mgmt      |Acceleration,Cryptography     |
|  |         |           |                         |Crypto: Tunnel,UDPEncap,MD5,  |
|  |         |           |                         |SHA1,NULL,3DES,DES,AES-128,   |
|  |         |           |                         |AES-256,ESP,LinkSelection,    |
|  |         |           |                         |DynamicVPN,NatTraversal,      |
|  |         |           |                         |AES-XCBC,SHA256,SHA384        |
+---------------------------------------------------------------------------------+

Accept Templates : enabled
Drop Templates : disabled
NAT Templates : enabled

 

 What would be the correct way to retrieve the actual sessions?

PhoneBoy
Admin

Because in R80.20 and above, fwaccel off only disables templating for new connections.
There is no way to entirely disable SecureXL any longer.

WesEvernden
Participant

Perhaps sk174323 can help in your situation. We have a similar problem.

marki
Contributor

Thanks.

I forgot to post our diagnosis. 😉

* Server wants to close connection (FIN,(PSH,),ACK)
* Client is now in state CLOSE-WAIT (Half-closed connection)
* Gaia is in state DST-FIN (destination has closed connection). Timeout is 14400 secs.
* FG1 is in state FIN-WAIT for 120 secs.
* Client only actually closes the connection (sends FIN) after roughly 30 minutes.
* Gaia never sees this FIN: FG1's 120-second timeout has long expired by then (120 secs is far less than 30 minutes), so FG1 drops it.
* Connection remains in Gaia connection table for 14400 secs = 4 hrs.
* The next connection within an interval of 4 hrs using that source port gets dropped by Gaia.

The problem was the application on the client, which needed to be patched.
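
The diagnosis can be written down as a tiny model. This is a simplified Python sketch with hypothetical names, not a description of how either firewall is implemented. It takes t=0 as the moment the server sends its FIN, assumes nothing refreshes Gaia's entry after that, and assumes Gaia would clear the entry promptly if the client's FIN did get through.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    """Timers from the diagnosis, in seconds after the server's FIN."""
    fg1_fin_wait_timeout: int = 120   # FG1 keeps the half-closed session this long
    client_fin_delay: int = 30 * 60   # client sends its FIN after roughly 30 minutes
    gaia_tcp_timeout: int = 14400     # Gaia keeps the entry this long

    def client_fin_reaches_gaia(self):
        """FG1 only forwards the client's FIN while its own session still exists."""
        return self.client_fin_delay < self.fg1_fin_wait_timeout

    def syn_dropped(self, reuse_after):
        """True if a SYN reusing the source port after reuse_after seconds is dropped."""
        if self.client_fin_reaches_gaia():
            # Assumption: Gaia sees both FINs and removes the entry promptly.
            return False
        return reuse_after < self.gaia_tcp_timeout


if __name__ == "__main__":
    observed = Timeline()
    print(observed.client_fin_reaches_gaia())
    print(observed.syn_dropped(reuse_after=11627))
```

With the observed values the client's FIN arrives long after FG1 has forgotten the session, so any reuse of the source port within 14400 seconds is dropped. If the client closed within FG1's 120 seconds instead, the model predicts no drop, which matches the fix of patching the client application.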
