Hello,
I have a strange issue here.
We have a flow traversing several firewalls; the path looks like this:
Client -> FG1 -> CheckpointGaia -> FG2 -> Server
FG2 reported dropping a packet on ingress because it had ACK set without an established connection.
We found this is due to "Smart Connection Reuse" (sk24960) on Check Point Gaia, which rewrites a SYN into an ACK if it thinks the session is already established.
After disabling "Smart Connection Reuse" we instead get "SYN packet on established connection" in SmartView Tracker.
The client socket state at that point is:
SYN-SENT 0 1 172.31.3.127:55962 172.22.160.1:443 users:(("java",pid=15151,fd=131)) timer:(on,1.356ms,2)
To find out more, I logged the connections table in a loop every few minutes:
fw tab -t connections -f | grep 172.31.3.127
But no matching connection shows up in those logs around the time of the error.
Note that acceleration is switched off for the time being (fwaccel off).
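A tighter, timestamped polling loop makes it less likely that an entry is missed between snapshots. This is only a minimal sketch, assuming Python 3.7+ is available wherever the command runs. The command is the one quoted above; the helper names matching_lines and poll, the 5-second interval and the log format are illustrative choices, not anything provided by Check Point.

```python
import subprocess
import time
from datetime import datetime


def matching_lines(text, needle):
    """Return the lines of `text` containing `needle` (plain substring match, like grep -F)."""
    return [line for line in text.splitlines() if needle in line]


def poll(cmd, needle, interval=5.0, rounds=3, out=print):
    """Run `cmd` `rounds` times, `interval` seconds apart, and emit timestamped matches."""
    for i in range(rounds):
        result = subprocess.run(cmd, capture_output=True, text=True)
        stamp = datetime.now().isoformat(timespec="seconds")
        hits = matching_lines(result.stdout, needle)
        if not hits:
            # Record empty snapshots too, so gaps in the log are explainable later.
            out(f"{stamp} no match for {needle}")
        for line in hits:
            out(f"{stamp} {line}")
        if i < rounds - 1:
            time.sleep(interval)


# Example call (commented out because it needs the firewall CLI):
# poll(["fw", "tab", "-t", "connections", "-f"], "172.31.3.127", interval=5, rounds=720)
```

Logging the "no match" case as well makes it possible to tell afterwards whether the entry was really absent at a given second or the snapshot simply was not taken.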
How do I get to the bottom of this?
Could the connections table simply be corrupt, so that the entire cluster needs a reboot?
Thanks.
UPDATE: "fw ctl conntab" seems to show the culprit, unlike "fw tab -t connections -f":
# fw ctl conntab | grep 172.31.3.127 | grep 53932
<(inbound, src=[172.31.3.127,53932], dest=[172.22.160.1,443], TCP); 14158/14400, rule=73, tcp state=DST_FIN, service=649, Ifncin=55, Ifncout=55, Ifnsin=43, Ifnsout=43, conn modules: Authentication, FG-1>
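To compare many such entries, the line above can be split into fields. The sketch below is written only against this one sample line, so the regular expression is an assumption about the format rather than a documented grammar. It also assumes that the pair 14158/14400 means "seconds remaining / configured timeout"; under that reading the entry was last refreshed 242 seconds before the command ran.

```python
import re

# The conntab line from the post, verbatim.
ENTRY = ('<(inbound, src=[172.31.3.127,53932], dest=[172.22.160.1,443], TCP); '
         '14158/14400, rule=73, tcp state=DST_FIN, service=649, Ifncin=55, '
         'Ifncout=55, Ifnsin=43, Ifnsout=43, conn modules: Authentication, FG-1>')

# Pattern derived from the single sample above (assumed, not an official format).
PATTERN = re.compile(
    r"src=\[(?P<src>[\d.]+),(?P<sport>\d+)\], "
    r"dest=\[(?P<dst>[\d.]+),(?P<dport>\d+)\], "
    r"(?P<proto>\w+)\); "
    r"(?P<left>\d+)/(?P<total>\d+), "
    r".*?tcp state=(?P<state>\w+)"
)


def parse_entry(line):
    """Return a dict of fields from one conntab line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("sport", "dport", "left", "total"):
        fields[key] = int(fields[key])
    # Assumption: left/total are seconds remaining and the configured timeout.
    fields["elapsed"] = fields["total"] - fields["left"]
    return fields


entry = parse_entry(ENTRY)
print(entry["sport"], entry["state"], entry["elapsed"])
```

Note that this entry is for source port 53932 in state DST_FIN, not for port 55962 from the SYN-SENT socket shown earlier.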
UPDATE 2: More findings.
tshark on the client (172.31.3.127), using the filter "host 172.22.160.1 and port 443":
....
15:00:06.638692144 172.22.160.1 → 172.31.3.127 TCP 1514 443 → 55894 [ACK] Seq=72429 Ack=27310 Win=88448 Len=1460 [TCP segment of a reassembled PDU]
15:00:06.638747232 172.22.160.1 → 172.31.3.127 TCP 4434 443 → 55894 [ACK] Seq=73889 Ack=27310 Win=88448 Len=4380 [TCP segment of a reassembled PDU]
15:00:06.638797565 172.22.160.1 → 172.31.3.127 TLSv1.2 5894 Application Data [TCP segment of a reassembled PDU]
15:00:06.638842896 172.22.160.1 → 172.31.3.127 TCP 1514 443 → 55894 [ACK] Seq=84109 Ack=27310 Win=88448 Len=1460 [TCP segment of a reassembled PDU]
15:00:06.638892434 172.22.160.1 → 172.31.3.127 TCP 1514 443 → 55894 [ACK] Seq=85569 Ack=27310 Win=88448 Len=1460 [TCP segment of a reassembled PDU]
15:00:06.658279721 172.22.160.1 → 172.31.3.127 TLSv1.2 1514 Application Data [TCP segment of a reassembled PDU]
15:00:06.669679081 172.31.3.127 → 172.22.160.1 TCP 54 55894 → 443 [ACK] Seq=27310 Ack=88489 Win=130432 Len=0
15:00:06.670310044 172.22.160.1 → 172.31.3.127 TLSv1.2 1239 Application Data, Encrypted Alert
15:00:06.711199624 172.31.3.127 → 172.22.160.1 TCP 54 55894 → 443 [ACK] Seq=27310 Ack=89675 Win=129408 Len=0
The client (172.31.3.127) went into CLOSE-WAIT at some point between 15:00:01 and 15:01:01.
15:51:05.333458718 172.31.3.127 → 172.22.160.1 TCP 54 55894 → 443 [RST, ACK] Seq=1 Ack=1 Win=1171 Len=0
18:13:53.745065284 172.31.3.127 → 172.22.160.1 TCP 74 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623469574 TSecr=0 WS=128
18:13:54.767213091 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623470596 TSecr=0 WS=128
18:13:56.783213562 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623472612 TSecr=0 WS=128
18:14:01.039219839 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623476867 TSecr=0 WS=128
18:14:09.231207734 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623485059 TSecr=0 WS=128
18:14:25.359233329 172.31.3.127 → 172.22.160.1 TCP 74 [TCP Retransmission] 55894 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2623501186 TSecr=0 WS=128
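The spacing of the SYNs above can be checked quickly. This sketch only does arithmetic on the timestamps copied from the capture; the helper names are made up for illustration.

```python
def seconds_of_day(stamp):
    """Convert 'HH:MM:SS.fraction' to seconds since midnight as a float."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Timestamps of the initial SYN and its five retransmissions, from the capture.
SYN_TIMES = [
    "18:13:53.745065284",
    "18:13:54.767213091",
    "18:13:56.783213562",
    "18:14:01.039219839",
    "18:14:09.231207734",
    "18:14:25.359233329",
]


def gaps(stamps):
    """Return the time in seconds between each consecutive pair of timestamps."""
    secs = [seconds_of_day(s) for s in stamps]
    return [later - earlier for earlier, later in zip(secs, secs[1:])]


for gap in gaps(SYN_TIMES):
    print(f"{gap:.3f} s")
```

The gaps come out at roughly 1, 2, 4, 8 and 16 seconds, i.e. the retransmission timer doubles each time. That is what an unanswered SYN looks like from the client side and fits the SYN-SENT state shown earlier: nothing at all comes back, not even a RST.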
So Gaia thinks this is an existing session and won't let the new SYN through, since the client reuses source port 55894 and our TCP session timeout is 14400 seconds instead of the default 3600 seconds. But what about the RST, ACK at 15:51...? Argh
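The timing can be sanity-checked against both timeout values. The sketch below assumes all capture timestamps fall on the same day and that the firewall's idle timer for port 55894 was last refreshed either by the final ACK at 15:00:06 or by the RST at 15:51:05. The function and constant names are illustrative, and real connection aging on Gaia may follow other rules (for example after a FIN or RST).

```python
from datetime import datetime


def parse(stamp):
    """Parse 'HH:MM:SS.nnnnnnnnn', truncating nanoseconds to microseconds."""
    head, _, frac = stamp.partition(".")
    return datetime.strptime(f"{head}.{frac[:6] or '0'}", "%H:%M:%S.%f")


def elapsed_seconds(start, end):
    """Whole seconds between two same-day timestamps."""
    return int((parse(end) - parse(start)).total_seconds())


LAST_ACK = "15:00:06.711199624"   # last ACK sent by the client on port 55894
RST = "15:51:05.333458718"        # RST, ACK sent by the client
NEW_SYN = "18:13:53.745065284"    # first SYN reusing port 55894

CONFIGURED_TIMEOUT = 14400  # seconds, our setting
DEFAULT_TIMEOUT = 3600      # seconds, the default mentioned above


def still_tracked(idle, timeout):
    """True if an entry idle for `idle` seconds has not yet aged out (simplified model)."""
    return idle < timeout


for label, start in (("last ACK", LAST_ACK), ("RST", RST)):
    idle = elapsed_seconds(start, NEW_SYN)
    print(label, idle,
          still_tracked(idle, CONFIGURED_TIMEOUT),
          still_tracked(idle, DEFAULT_TIMEOUT))
```

Either way the new SYN arrives 8568 to 11627 seconds after the last packet of the old connection: inside the 14400-second window, but well past 3600 seconds. The arithmetic does not explain why the RST at 15:51 did not clear the entry; that part is still open.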