Solved: Re: R80.20 MTU and SecureXL Problem

Jan_Kleinhans · ‎2019-06-21

Hello,

We have an Ethernet link (no Check Point VPN) to a network where the MTU is 1422. If we set the MTU on the interface and disable SecureXL, the clients (with the default MTU of 1500) receive the ICMP Fragmentation Needed packet and start sending smaller packets.

When we re-enable SecureXL, the clients start sending 1500-byte packets again and no longer receive an ICMP Fragmentation Needed packet from the firewall.

We are using a Check Point 5600 cluster running R80.20 with the latest HFA.

Has anybody had the same problem?
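
For anyone wanting to reproduce this, here is a minimal Python sketch of the size arithmetic behind path MTU discovery for the 1422-byte MTU described above. It assumes IPv4 with a 20-byte IP header, a 20-byte TCP header without options, and an 8-byte ICMP echo header; the function names are made up for illustration.

```python
# Header sizes are assumptions: IPv4 without options, TCP without options.
IPV4_HEADER = 20
TCP_HEADER = 20
ICMP_ECHO_HEADER = 8


def max_tcp_payload(path_mtu: int) -> int:
    """Largest TCP payload (MSS) that fits in one unfragmented packet."""
    return path_mtu - IPV4_HEADER - TCP_HEADER


def max_ping_payload(path_mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return path_mtu - IPV4_HEADER - ICMP_ECHO_HEADER


if __name__ == "__main__":
    # Compare the client default MTU with the smaller link MTU from the post.
    for mtu in (1500, 1422):
        print(mtu, max_tcp_payload(mtu), max_ping_payload(mtu))
```

Under these assumptions, a Linux client could probe the path with something like `ping -M do -s 1394 <host>`: that size should pass, while anything larger should trigger the ICMP Fragmentation Needed reply if the firewall is sending it.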

Jan

Timothy_Hall · ‎2019-06-21

Yep, see this SK: sk98070: Traffic sent over a VPN tunnel does not reach its destination because SecureXL does not sta...

For additional information: sk98074: MTU and Fragmentation Issues in IPsec VPN

Gaia 4.18 (R82) Immersion Tips, Tricks, & Best Practices Video Course
Now Available at https://shadowpeak.com/gaia4-18-immersion-course

Jan_Kleinhans · ‎2019-06-21

Thanks for your quick answer.

But in our case the firewall is not using IPsec. It is only an interface with a smaller MTU. So setting the parameter sim_ipsec_dont_fragment=1 should not make any difference, or am I wrong?

Timothy_Hall · ‎2019-06-21

How are you disabling SecureXL in R80.20? fwaccel off? vpn accel off?

If you disable SecureXL selectively for the involved IP addresses as specified here:

sk104468: How to disable SecureXL for specific IP addresses

but leave SecureXL active otherwise, do the ICMP Frag Needed packets still get sent?

Jan_Kleinhans · ‎2019-06-23

Hello,

I disabled it with fwaccel off. This seems to work without problems.
I cannot exclude specific IP addresses, as this seems to affect all IP addresses behind this network interface (possible IPs: 192.168.0.0/16).
For now I will leave acceleration disabled and open a TAC case.
I'm surprised that I am the only one to have this problem with R80.20.

Thanks,

Jan

Matt_Killeen · ‎2019-06-26

I don't think you are the only one with this problem. I think we have it as well.

We recently upgraded from R77.30 to R80.20 on a 15600 cluster. Prior to the upgrade, communications from a public internet source to one of our web servers on TCP port 443 (HTTPS) completed as expected.

Following the upgrade, HTTPS communications stall: TLS v1.2 Server Hello messages are sent with a length of 1514, the client then sends ICMP Type 3 Code 4 (Destination Unreachable, Fragmentation Needed) messages, and we appear to ignore them.

The initial TCP SYN and SYN-ACK also contain the TCP option Maximum Segment Size: 1460 bytes, so we shouldn't really be sending 1514-byte packets back to the client.
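
A side note on these numbers, as a hedged sketch: if the 1514 figure is the frame length reported by the capture tool (an assumption; the post does not say), it includes a 14-byte Ethernet header. Under that assumption, and with no IP or TCP options, a 1514-byte frame is exactly what an MSS of 1460 produces, so the packet is too large for a smaller MTU further along the path and not for the negotiated MSS. The helper names are hypothetical.

```python
# Assumed header sizes: untagged Ethernet II (no FCS in the capture),
# IPv4 without options, TCP without options.
ETHERNET_HEADER = 14
IPV4_HEADER = 20
TCP_HEADER = 20


def frame_length_for_mss(mss: int) -> int:
    """Captured frame length of a full-sized TCP segment for a given MSS."""
    return ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER + mss


def ip_length_from_frame(frame_length: int) -> int:
    """IP packet size (what the MTU limits) for a captured frame length."""
    return frame_length - ETHERNET_HEADER


if __name__ == "__main__":
    print(frame_length_for_mss(1460))   # full-sized segment at MSS 1460
    print(ip_length_from_frame(1514))   # IP size to compare against the MTU
```

The value to compare against any MTU on the path is the IP length, not the captured frame length.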

I have a rolling packet capture on our public interfaces and there is evidence to confirm that the client was sending ICMP TYPE 3 CODE 4 Destination Unreachable, Fragmentation messages prior to the CheckPoint upgrade.

At the moment I’ve no idea whether R77.30 ignored the DF flag and fragmented the packets as requested, whether R77.30 forwarded the ICMP message to the server, or whether it was handled differently elsewhere. So we've escalated to vendor support.

Without a resolution to hand, we had to set the MTU to 1400 on the Web Server to take the firewall out of the equation and allow us to investigate further.

Over the next couple of days it became apparent that some private wan traffic was also affected and that turning off fwaccel resolved the issue and allowed the traffic to complete.

I'll keep a close watch on this thread and update if I have any further information.

Matt_Killeen · ‎2019-06-27

UPDATE: Turning SecureXL off caused intermittent connectivity problems for at least one identified web service for multiple public clients. We followed sk104468 to disable SecureXL for traffic sent from/to specific IP addresses. This allowed us to address the issue for specific private WAN traffic and switch SecureXL back on for everything else whilst we continue to troubleshoot.

Matt_Killeen · ‎2019-07-05

UPDATE: I have been able to recreate this issue using a lab client against the live service and compare R80.20 to R77.30 behaviour.

As we still have our workaround in place to protect our client, where the web server MTU is set at 1400, the numbers are reduced but the effect is the same.

The lab is an Ubuntu client with an MTU of 1500 and a Cisco router with the LAN interface MTU set at 1360 (this is significant, as it is this interface that doesn't want to transmit anything larger than 1360).

The web server MTU is set at 1400 and the server is behind a Check Point R80.20 15600 cluster.

During tests, the packet analysis shows that the SYN and SYN-ACK negotiate their preferred Maximum Segment Size (MSS) and that when the Cisco router receives a packet larger than it likes (due to the MTU set at 1360), the packet is dropped and an ICMP Type 3 Code 4 message is created and transmitted. Check Point R80.20 appears to ignore these ICMP messages, and packets are retransmitted until the HTTPS connection times out.
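
For reference when reading such captures, here is a small Python sketch that extracts the next-hop MTU from an ICMP Type 3 Code 4 message. It follows the RFC 1191 layout (type, code, checksum, two unused bytes, then a 16-bit next-hop MTU). The function name and the hand-built sample header are made up for illustration; this is not taken from any capture in this thread.

```python
import struct

ICMP_DEST_UNREACHABLE = 3
ICMP_FRAG_NEEDED = 4


def parse_frag_needed(icmp: bytes):
    """Return the next-hop MTU from an ICMP Type 3 Code 4 message, else None."""
    if len(icmp) < 8:
        return None  # too short to hold the 8-byte ICMP header
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack(
        "!BBHHH", icmp[:8]
    )
    if icmp_type != ICMP_DEST_UNREACHABLE or code != ICMP_FRAG_NEEDED:
        return None
    # Routers that predate RFC 1191 may leave this field at 0.
    return next_hop_mtu


if __name__ == "__main__":
    # Hand-built header for illustration; the checksum is left at zero.
    sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1360)
    print(parse_frag_needed(sample))
```

If the sender honours this message, its following packets should be no larger than the reported MTU, which matches the R77.30 behaviour described in this post.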

As we also have another Check Point 15600 cluster still running R77.30, the same lab test was run against a live web service NAT'd behind that cluster (the MTU of the live web server is 1500 here, but the effect is the same).

In this test, the packet analysis shows that the SYN and SYN-ACK again negotiate their preferred Maximum Segment Size (MSS) and that when the Cisco router receives a packet larger than it likes (due to the MTU set at 1360), the packet is dropped and an ICMP Type 3 Code 4 message is created and transmitted. The difference is that, from that point on, packets sent are smaller than the Cisco's 1360 MTU and the HTTPS transaction completes successfully.

These results have been passed to Check Point support, and we are scheduling further debug and testing with Check Point early next week in an out-of-hours window.

Will keep this thread updated.

Matt.

Matt_Killeen · ‎2019-07-10

Following a remote session with Check Point, we were able to determine that our issue is most likely patched in the latest GA hotfix (see https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...).

In order to resolve this we will be scheduling the installation of the latest hotfix version (87 is GA as of yesterday).

Once scheduled, installed and tested, I will update this thread.

Matt.

HeikoAnkenbrand · ‎2019-07-05

SecureXL has been significantly revised in R80.20.

There are new fw monitor chain (SecureXL) objects that do not run in the virtual machine.

SecureXL offloading chain modules

# fw ctl chain

The new fw monitor chain modules (SecureXL) do not run in the virtual machine (vm).

SecureXL inbound (sxl_in) > Packet received in SecureXL from network
SecureXL inbound CT (sxl_ct) > Accelerated packets moved from inbound to outbound processing (post routing)
SecureXL outbound (sxl_out) > Accelerated packet starts outbound processing
SecureXL deliver (sxl_deliver) > SecureXL transmits accelerated packet

For more, see:
- R80.x Security Gateway Architecture (Logical Packet Flow)
- R80.20 - New FW Monitor inspection points

If there are problems with the MTU size, you should open a TAC ticket.

@Timothy_Hall

CUT>>>
If you disable SecureXL selectively for the involved IP addresses as specified here:

sk104468: How to disable SecureXL for specific IP addresses
<<<CUT

This solution does not resolve MTU size problems. The MTU size always applies to the interface as a whole, not to the individual IP addresses that sk104468 deals with.

➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips

Jan_Kleinhans · ‎2019-07-16

Update:

We have already installed the HFA Take 87. It does not help with our problem.

Also we got a Hotfix from Checkpoint for Take 87 that does not work at the moment.

Checkpoint is working on a solution.

Vladimir · ‎2019-07-16

@PhoneBoy , can we get someone from Check point to weigh-in on this issue?

Additionally, I'd like to know if those running R80.30 GA are experiencing the same.

I suspect that at least one of my larger clients was battling the issues caused by this behavior for months without resolution after the upgrade to R80.20.

PhoneBoy · ‎2019-07-16

It would help for those who have the issue to send me the SR number in a PM or via email to my username AT checkpoint DOT com.

Hellen5394 · ‎2019-07-17

Did you get a proper solution? I am also facing the same issue.

tellthebell

Peter_Lyndley · ‎2019-07-18

Hi all, this may be slightly off topic, but we are also seeing something similar on the latest versions of embedded Gaia (R77.20.70+). We haven't seen it before that version, but don't know exactly when it came in. Turning off SecureXL works, as does lowering the external interface MTU.

Khalid_Aftas · ‎2019-07-31

We encountered the same issue after upgrading to R80.20.

CAPWAP tunnel traffic between a WLC and the anchor is impacted; the firewall drops packets because the fragmentation table is full.

Check Point suggested following sk65074 to increase the IP fragment table size from 200 to 2000.

That fixed the issue for only a few hours. The next day we escalated the SR and they told us to set it to 4000, which has fixed the issue for now, but I think it will happen again.

Something has changed in R80.20, that's for sure. Support could not provide more info, as they need to check with R&D.

Any news from your side ?

Timothy_Hall · ‎2019-07-31

What changed in R80.20 is the ability of the SecureXL driver to perform virtual defragmentation of packets. In R80.10 and earlier, any fragmented packets received by SecureXL would be instantly sent to the F2F path on a Firewall Worker core for handling. It is possible that the new SecureXL virtual defragmentation code is somehow leaking table entries and not freeing them back up in a timely fashion (or ever) once the virtual defrag timer has expired. If that is indeed the case, increasing the size as specified in sk65074 will only delay the inevitable. This assumes of course that changing that value from the IPS signature actually applies to the SecureXL virtual defragmentation code, and not just the F2F virtual defragmentation code, which is what that setting was originally created to modify.
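
To illustrate why a bigger table would only buy time if entries are leaking, here is a deliberately simple Python sketch. It is a toy model, not a description of how SecureXL works: the constant leak rate is a purely hypothetical number, and the table sizes are the ones mentioned earlier in this thread (200, 2000, 4000).

```python
def seconds_until_full(capacity: int, leaked_per_second: float) -> float:
    """Time until leaked (never freed) entries alone fill the table."""
    if leaked_per_second <= 0:
        return float("inf")  # no leak, so the table never fills this way
    return capacity / leaked_per_second


if __name__ == "__main__":
    LEAK_RATE = 0.05  # hypothetical: one stuck entry every 20 seconds
    for capacity in (200, 2000, 4000):
        hours = seconds_until_full(capacity, LEAK_RATE) / 3600
        print(f"{capacity:>5} entries: full after about {hours:.1f} h")
```

In this model the time to exhaustion scales linearly with the table size, so each increase postpones the drops without removing the cause.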

Khalid_Aftas · ‎2019-08-01

Thank you Timothy for your valuable explanation.

Applying a new value instantly stopped the drops for some time. I was indeed thinking the same, that it will only delay the problem, and I suspect table entries are getting stuck.

That's a bummer for us. We need to see what solution Check Point will provide; in the meantime R&D is not even involved.

Is there any way to bypass this by disabling SecureXL for specific IPs?

Matt_Killeen · ‎2019-08-01

Follow sk104468 in order to disable SecureXL for traffic sent from/to specific IP addresses

Ilya_Yusupov · ‎2019-08-01

Hi all,

We found the issue and we have a fix that will be included in the next JHF.

If anyone wants the fix immediately, please open a case with TAC and we will provide the fix.

Thanks,

Ilya

Khalid_Aftas · ‎2019-08-01

Hi Ilya, we already have a SR opened SR#6-0001716363, can you please use it to deliver us the fix ? thx

Jan_Kleinhans · ‎2019-08-05

Hello,
is it the fix fw1_wrapper_HOTFIX_R80_20_JHF_T87_190_MAIN_GA_FULL.tgz?

This one is not working in our environment.

Best regards,

Jan

Yifat_Chen · ‎2019-08-12

Fix is planned to be part of the next Jumbo; Ballpark ETA - end of Sep.

Thanks

Release Management Groups

Khalid_Aftas · ‎2019-08-12

The Hotfix fixed the issue 🙂 thx

Matt_Killeen · ‎2019-08-12

@Khalid_Aftas - please can you confirm the HotFix that fixed this issue? Many thanks.

Khalid_Aftas · ‎2019-08-12

Well, I said yes, but unfortunately the issue happened again; it just took more time as the table was increased to 8000.

Even with the fix, the issue is still there...
This is becoming critical.
@;133267895;[vs_3];[tid_3];[fw4_3];fw_log_drop_ex: Packet proto=97 192.168.10.23:34166 -> 172.16.89.21:58658 dropped by fwfrag_error Reason: Virtual defragmentation error: fragment table is full;
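
If you want to count how often this drop occurs, here is a rough Python sketch that pulls the fields out of a drop line like the one above. The regular expression is an assumption based only on this single quoted line; other drop messages may be formatted differently, and `parse_drop` is a hypothetical helper name.

```python
import re

# Pattern derived from the one log line quoted in this post (assumption).
DROP_RE = re.compile(
    r"Packet proto=(?P<proto>\d+)\s+"
    r"(?P<src>[\d.]+):(?P<sport>\d+)\s+->\s+"
    r"(?P<dst>[\d.]+):(?P<dport>\d+)\s+"
    r"dropped by (?P<module>\w+)\s+Reason:\s*(?P<reason>[^;]+)"
)


def parse_drop(line: str):
    """Return a dict of fields from a drop line, or None if it does not match."""
    match = DROP_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = (
        "@;133267895;[vs_3];[tid_3];[fw4_3];fw_log_drop_ex: Packet proto=97 "
        "192.168.10.23:34166 -> 172.16.89.21:58658 dropped by fwfrag_error "
        "Reason: Virtual defragmentation error: fragment table is full;"
    )
    print(parse_drop(line))
```

Grouping the parsed results by source and destination would show whether the drops are limited to the tunnel endpoints or affect other fragmented traffic as well.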

Timothy_Hall · ‎2019-08-12

Assuming that you have "Number of seconds after which discarding incomplete packets" still set to the default 1 second under Inspection Settings...IP Fragments...Advanced, no way there could be 8,000 frags legitimately pending for virtual reassembly. Got to be some kind of failure to free frag table entries...

Khalid_Aftas · ‎2019-08-12

Yes, 1 second.
Well for the "bug" i think we know it already, but the fix was supposed to fix it.

Timothy_Hall · ‎2019-08-12

There may well be more than one vector into the issue, fixing one vector obviously doesn't fix them all. 🙂

Timothy_Hall · ‎2019-08-12

Please provide output of following commands:

fwaccel tab -t frag_table -s

and

first few lines of: fwaccel tab -f -t frag_table

Curious to see:

1) If upping Inspection Settings frag signature to 8000 actually propagated into SecureXL table frag_table

2) Using the first command it should be interesting to see if the table is accumulating entries over time and slowly increasing, this may indicate leakage of table entries, especially with the watch command

3) It doesn't look like the -x option (clear table) is supported for fwaccel tab, would be nice to be able to flush/clear this table

4) Might also be interesting to do a fwaccel tab -f -m 10000 -t frag_table and massage the output to see if there is a common entry shown that seems to be "stuck" in the table, might give some clues about possibly how to work around it.
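
For point 2, a rough way to judge whether the table is slowly accumulating entries is to note the entry count at regular intervals and check the trend. The Python sketch below takes the counts as plain integers collected by hand; it does not parse the command output, since its exact format is not shown in this thread. The function name and the 0.8 threshold are arbitrary choices for illustration.

```python
def looks_like_leak(samples, min_rising_fraction=0.8):
    """Heuristic: True if the count ends higher and rises on most intervals."""
    if len(samples) < 3:
        return False  # too few samples to call it a trend
    rises = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    rising_fraction = rises / (len(samples) - 1)
    return samples[-1] > samples[0] and rising_fraction >= min_rising_fraction


if __name__ == "__main__":
    # Hypothetical counts noted every few minutes, not real measurements.
    print(looks_like_leak([10, 25, 40, 61, 90, 130]))
    print(looks_like_leak([50, 12, 47, 9, 55, 14]))
```

A count that bounces up and down with traffic is expected; a count that only climbs toward the configured limit would support the theory that entries are not being freed.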
