Solved: Strange ICMP issues in R80.40 with hotfixes greater than Take 94

Thomas_Eichelbu · ‎2021-06-09

Hello fellow CheckMates.

We have encountered some strange issues after upgrading R80.40 to a take above Take 93/Take 94.
ICMP does NOT pass through the gateway; it starts to work ONLY after a TCP packet has been sent ...
This happens in locally attached networks, over routed networks and also over VPN ...
It doesn't matter whether SecureXL is ON or OFF ...
It also doesn't matter whether the gateway is an open server or an appliance.

What we see:
only an echo request / never a reply

[vs_0][fw_4] eth4:i[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30804
ICMP: type=8 code=0 echo request id=64388 seq=0
[vs_0][fw_4] eth4:I[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30804
ICMP: type=8 code=0 echo request id=64388 seq=0
[vs_0][fw_5] eth4:i[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30829
ICMP: type=8 code=0 echo request id=64388 seq=1
[vs_0][fw_5] eth4:I[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30829
ICMP: type=8 code=0 echo request id=64388 seq=1

We see only small "i" and big "I" ... never small "o" or big "O".
We know this destination is ALIVE.
When we send a TCP packet, an ARP request is made immediately, an ARP entry is created, and then the ICMP works!!!
This also happens over VPN!
On the DESTINATION IP we checked with tcpdump: NOTHING was received until the first TCP SYN was sent, then the ICMP followed!
No drops seen with fw ctl zdebug / no drops seen in SmartLog.
When the ping works, it sometimes stops after 60 seconds! (ARP timeout = 60?)
This mostly affects "silent" devices which do not have permanent TCP sessions running, because TCP "heals" the connection.
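
For anyone who wants to check their own captures for the same pattern, here is a minimal Python sketch that groups fw monitor lines by packet and lists the packets that were seen at "i" but never at "o" or "O". It assumes the one-line-per-inspection-point text format pasted above; the function names are made up for illustration, and the regular expression may need adjusting for other fw monitor output variants.

```python
import re
from collections import defaultdict

# Matches header lines such as:
# [vs_0][fw_4] eth4:i[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30804
LINE_RE = re.compile(
    r"^\[vs_\d+\]\[fw_\d+\]\s+(?P<iface>\S+?):(?P<point>[iIoO])\[\d+\]:\s+"
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+\((?P<proto>\w+)\)\s+"
    r"len=\d+\s+id=(?P<ipid>\d+)"
)

def inspection_points(lines):
    """Map (src, dst, proto, ip_id) to the inspection points seen, in order."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # detail lines like "ICMP: type=8 ..." do not match and are skipped
            key = (m["src"], m["dst"], m["proto"], int(m["ipid"]))
            seen[key].append(m["point"])
    return dict(seen)

def never_forwarded(lines):
    """Packets that entered at 'i' but never showed up at 'o' or 'O'."""
    return [
        key
        for key, points in inspection_points(lines).items()
        if "i" in points and not {"o", "O"} & set(points)
    ]

# The capture from this post (addresses masked as in the original).
SAMPLE = """\
[vs_0][fw_4] eth4:i[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30804
ICMP: type=8 code=0 echo request id=64388 seq=0
[vs_0][fw_4] eth4:I[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30804
ICMP: type=8 code=0 echo request id=64388 seq=0
[vs_0][fw_5] eth4:i[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30829
ICMP: type=8 code=0 echo request id=64388 seq=1
[vs_0][fw_5] eth4:I[44]: 172.XX.66.228 -> 172.ZZ.10.43 (ICMP) len=96 id=30829
ICMP: type=8 code=0 echo request id=64388 seq=1
""".splitlines()

for packet in never_forwarded(SAMPLE):
    print("seen inbound only:", packet)
```

Both echo requests from the capture are reported, which matches what we see by eye: the packets pass the inbound chain and then vanish.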

Several CP cases are ongoing, and a lot of R80.40 installations are affected ...
We have had numerous remote sessions with TAC to prove the issue is real and not a hoax.

Yes, there is this SK, for example ...
sk152093: When SecureXL is enabled, no ARP is sent and traffic fails (checkpoint.com)
but it describes the exact opposite ...

what is your experience from the field?

best regards
Thomas

Tobias_Moritz · ‎2021-06-10

We experienced the same problem (only on one of our multiple clusters) and opened a TAC case on April 9th. It took a long time, but R&D finally said they found the root cause:

May 28th:

Good news, we managed to find out the root cause of the issue which was an update for the PBR and ABR functionality that got integrated into take_92. You can refer to this documentation sk165456 "Jumbo Hotfix Accumulator for R80.40 (R80_40_jumbo_hf)" for further information.

A fix for this issue is already under development and should be integrated into the coming Jumbos.

To monitor the fix implementation, you can use this fix ID "PRJ-26756" to know whether it will be integrated.

June 8th:

The fix was compiled successfully and the fix will be integrated into all the affected versions (R80.40 and R81).
A port fix for R80.40 take_118 has been already requested and I will keep you informed as soon as possible whether it is ready.

Timothy_Hall · ‎2021-06-10

That explanation makes perfect sense as packets disappearing after iI and not reentering o means that the Gaia OS itself "ate" the packet, and since PBR/ABR is part of the Gaia OS that tracks. I mentioned this in my speech at CPX 2018 and called it the "roach motel" scenario, and also covered troubleshooting this extensively in my Max Capture video series.

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization

Thomas_Eichelbu · ‎2021-06-11

Hi Timothy,

It seems the Packet Injector from sk110865 only works on R80.10 and not on versions like R80.30 and up?
Is there a newer version?

Timothy_Hall · ‎2021-06-11

pinj is not supported past R80.10 due to the SecureXL overhaul in R80.20. Alternative packet generators that are built-in to Gaia are tcptraceroute and hping2.
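
If neither tool is at hand, a plain TCP connect from a host behind the gateway also puts a SYN on the wire, which is the kind of packet that "healed" ICMP in the original report. This is only a minimal Python sketch, not a replacement for pinj or hping2; the function name and the example address and port are hypothetical.

```python
import socket

def tcp_probe(host, port, timeout=2.0):
    """Attempt a TCP connect so that a SYN is sent towards host:port.

    Returns "open" if the handshake completed, "refused" if the target
    answered with a reset, and "no-answer" for timeouts and other errors.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:  # includes socket.timeout and unreachable-network errors
        return "no-answer"

# Hypothetical usage: probe the silent device, then retry the ping.
# print(tcp_probe("192.0.2.43", 443))
```

Even a "refused" result means the SYN reached the target, so it is enough to test whether ICMP starts working afterwards.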

Thomas_Eichelbu · ‎2021-06-11

Hi Tobias,

Yes, Check Point TAC said a custom hotfix on top of HFA118 is on its way ... it should be available by the end of this week.
In the meantime we were told to

+ switch to Usermode FW
+ create static ARP entries

Well, I have not tried this so far, as most customer environments are not meant as a playground for guessing games ...
We will see!
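
Before adding static ARP entries, it can help to see which of the "silent" destinations are actually missing from the neighbor table. The Python sketch below parses the text of /proc/net/arp as exposed by the Linux kernel (flag 0x2 = complete, 0x4 = permanent/static). The function names and all addresses in the sample are made up for illustration, and it assumes the standard six-column layout.

```python
ATF_COM = 0x02   # entry is complete (MAC address resolved)
ATF_PERM = 0x04  # entry is permanent (static)

def parse_proc_net_arp(text):
    """Parse the contents of /proc/net/arp into a dict keyed by IP address."""
    entries = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        bits = int(flags, 16)
        entries[ip] = {
            "mac": mac,
            "device": device,
            "complete": bool(bits & ATF_COM),
            "permanent": bool(bits & ATF_PERM),
        }
    return entries

def missing_neighbors(text, wanted_ips):
    """Return the wanted IPs that have no complete ARP entry."""
    table = parse_proc_net_arp(text)
    return [ip for ip in wanted_ips if not table.get(ip, {}).get("complete")]

# Hypothetical table: one resolved, one incomplete, one static entry.
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.0.2.10       0x1         0x2         00:11:22:33:44:55     *        eth4
192.0.2.43       0x1         0x0         00:00:00:00:00:00     *        eth4
192.0.2.50       0x1         0x6         00:11:22:33:44:66     *        eth4
"""

print(missing_neighbors(SAMPLE_ARP, ["192.0.2.10", "192.0.2.43", "192.0.2.99"]))
```

On a live system the text would come from `open("/proc/net/arp").read()`; destinations reported as missing are the candidates for a static entry.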

Tobias_Moritz · ‎2021-06-14

Hotfix is available through TAC now.

Hotfix information:

Name: fw1_wrapper_HOTFIX_R80_40_JHF_T118_865_MAIN_GA_FULL.tgz
MD5SUM: c90b532396928d2b37ec0a0f0b9e4ed5
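
To confirm that the package received from TAC matches the checksum above before installing it, something like the following Python sketch can be used (running md5sum on the gateway does the same job). The helper names are made up for illustration; the file is read in chunks so a large .tgz does not have to fit in memory.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_md5(path, published):
    """Compare the file's MD5 with a published checksum, ignoring case and spaces."""
    return md5_of_file(path) == published.strip().lower()

# Example with the values from this post (path assumes the current directory):
# matches_published_md5(
#     "fw1_wrapper_HOTFIX_R80_40_JHF_T118_865_MAIN_GA_FULL.tgz",
#     "c90b532396928d2b37ec0a0f0b9e4ed5",
# )
```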

Thomas_Eichelbu · ‎2021-06-14

Hello Tobias,

Yes, true, I received the same information today ... tomorrow we will try it.
Then we will see if it resolves all issues!

Thomas_Eichelbu · ‎2021-06-15

Hello

Check Point TAC came with more information:
"Indeed the hotfix should be integrated into the upcoming Jumbos, currently, we don't have an exact ETA but you can follow sk165456 for PMTR-69435.
sk173933 was created for this issue, just in case you wish to follow up further."

So finally the hotfix "fw1_wrapper_HOTFIX_R80_40_JHF_T118_865_MAIN_GA_FULL.tgz" really solved the issue!
Some final words from TAC about the root cause would be fantastic, to understand the issue more precisely!
