
route flipping on R80.40

We have upgraded from R80.10 to R80.40 (HF48) and have a route flipping issue:

# ip route get a.b.c.d

a.b.c.d via <correct next hop ip> dev eth5 src <correct source>

 

# ip route get a.b.c.d

a.b.c.d via <correct next hop ip> dev eth2 src <correct source>

 

For whatever reason the interface in the routing table is changed from eth5 to eth2.
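
To catch the flip while it happens, the lookup above can be polled in a loop. This is only a minimal sketch, assuming bash and iproute2 in expert mode; route_dev and watch_route are hypothetical helper names, and the one-second default interval is arbitrary.

```shell
#!/bin/bash
# Print the interface that follows the "dev" keyword in `ip route get` output.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Poll the route lookup for one destination and log every interface change.
# Usage: watch_route a.b.c.d [interval_seconds]
watch_route() {
    local dest="$1" interval="${2:-1}" prev="" cur
    while true; do
        cur="$(ip route get "$dest" 2>/dev/null | route_dev)"
        if [ -n "$cur" ] && [ "$cur" != "$prev" ]; then
            echo "$(date '+%F %T') $dest now via dev $cur (was: ${prev:-unknown})"
            prev="$cur"
        fi
        sleep "$interval"
    done
}
```

Running watch_route a.b.c.d in a spare expert-mode session prints one timestamped line per change, which can then be lined up against the fw monitor capture.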

 

In fw monitor you can see it as well: the first packet correctly goes out eth5, while the second one, after a route flip, goes out eth2, which is wrong.

[vs_0][fw_2] eth5:O[44]: ****** -> ***** (UDP) len=200 id=61683 UDP: 2464 -> 49910

[vs_0][fw_2] eth2:O[44]: ****** -> ****** (UDP) len=200 id=49885 UDP: 2464 -> 49910
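
For a longer capture, the outbound interface of each packet can be tallied instead of read by eye. A rough sketch, assuming the fw monitor output was saved to a text file and that every line has the same layout as the two lines above; fwmon_out_ifaces is a hypothetical helper name.

```shell
#!/bin/bash
# Count post-outbound ("O") packets per interface in fw monitor output.
# Reads fw monitor lines on stdin; assumes the second whitespace-separated
# field looks like "eth5:O[44]:" as in the capture above.
fwmon_out_ifaces() {
    awk '{
        split($2, part, ":")              # part[1] = interface, part[2] = "O[44]"
        if (part[2] ~ /^O/) count[part[1]]++
    }
    END { for (ifc in count) print ifc, count[ifc] }' | sort
}
```

Feeding the saved capture into fwmon_out_ifaces on stdin prints one line per interface with its packet count. Any count for an interface other than the expected one points to a flip during the capture.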

 

Has anyone had similar problems?

TAC case is opened.

24 Replies

Is your firewall statically or dynamically routed?  What does the actual routing table (netstat -rn) show for the destination network?  

Book "Max Power 2020: Check Point Firewall Performance Optimization" Third Edition
Now Available at www.maxpowerfirewalls.com

It is completely statically routed.

Employee+

Hi,

Do you happen to have this route duplicate on both interfaces?
This sounds like a weird issue networking-wise. I would check the Gaia configuration in clish (show configuration), in config/active, and in the kernel (ip route, ifconfig).

Also, is there any chance that physical interface flapping is going on?

Thanks,
Yair

No interface flapping.
The problem occurs on both cluster nodes.

Only one static route to the destination host:
set static-route a.b.c.d/32 nexthop gateway address ****** on

And in active:
routed:instance:default:static:network:a.b.c.d t
routed:instance:default:static:network:a.b.c.d:masklen:32 t
routed:instance:default:static:network:a.b.c.d:masklen:32:gateway t
routed:instance:default:static:network:a.b.c.d:masklen:32:gateway:address:****** t
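
With about 130 static routes, a duplicate is easy to miss by eye. The sketch below lists every prefix that has more than one nexthop gateway configured. It assumes the clish configuration lines have the same word order as the set static-route line above; multi_nexthop_routes is a hypothetical helper name.

```shell
#!/bin/bash
# Report static-route prefixes that have more than one nexthop gateway.
# Reads `show configuration` output on stdin and prints "<prefix> <count>".
multi_nexthop_routes() {
    awk '$1 == "set" && $2 == "static-route" && $4 == "nexthop" && $5 == "gateway" {
        seen[$3]++
    }
    END { for (prefix in seen) if (seen[prefix] > 1) print prefix, seen[prefix] }'
}

# Example (expert mode):
#   clish -c "show configuration" | multi_nexthop_routes
```

Empty output means no prefix has two gateways, which matches what was found here.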



We have only seen it for UDP and ICMP, never for TCP.


"We have only seen it for UDP and ICMP, never for TCP."

This statement makes no sense unless you are using Policy Based Routing (PBR). Are you?  Routing is performed by the Linux IP Driver and is not generally influenced by anything in Check Point's code that would be distinguishing service or protocol, unless a feature such as ISP Redundancy is in use.

Regular IP routing only looks at the destination IP address and couldn't care less about service or protocol.  Please provide the output of netstat -renv from expert mode for the route in question, once when it is showing eth2 and once when it is showing eth5.  We need to see the live routing table and associated flags/use.
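
To pull just the relevant entry out of a long routing table, something like the following could be used. It is a sketch only, assuming the usual netstat column order (Destination first, Flags fourth, Iface last); netstat_route_iface is a hypothetical helper name.

```shell
#!/bin/bash
# Print the flags and interface of the routing-table entry for one destination.
# Reads `netstat -rn` or `netstat -renv` output on stdin.
# Usage: netstat -renv | netstat_route_iface a.b.c.d
netstat_route_iface() {
    awk -v dest="$1" '$1 == dest { print $4, $NF }'
}

# Example: append a timestamped snapshot every time it is run (expert mode):
#   echo "$(date '+%F %T') $(netstat -renv | netstat_route_iface a.b.c.d)" >> /var/tmp/route_snap.log
```

The log path /var/tmp/route_snap.log is just a placeholder.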


No PBR is used:

show pbr
PBR Summary

PBR has 0 tables
PBR has 0 rules

 

netstat shows:

a.b.c.d ******  255.255.255.255 UGH 0 0 0 eth5 in the correct case

Since the node is not in production right now, I of course cannot provide the incorrect one.

 

Just for info, there are about 130 static routes on the gateway.

 

 


Without being able to see the unredacted value of the route next hop address and the full unredacted eth2 and eth5 interface configuration it is difficult to surmise what is wrong.  Feel free to PM this information to me without redacting the IP addresses.  If you aren't comfortable with doing that you'll need to work through TAC.


Did you receive the PM?


No, I don't see your PM.


PM sent.

 

Employee+

Hi @Steffen_Appel ,

 

Do you have ISP Redundancy (ISPR) configured on the system?


No, ISP Redundancy is not configured.

Employee+

Strange. Did you open a TAC case? If so, can you share the number so I can follow it?


Yes, a TAC case is open; I will PM you the number.

We had a two-day debugging session over the weekend.

 


R&D is still investigating.

Still no solution, but additional customers have the same problem.


By chance, does

show route all

from clish show anything strange?

I would enable tracing (set trace kernel all on, set trace static all on, etc.) and then check /var/log/routed.log for hints on what is going on. routed is the only process that should be making routing changes, so it is the place to debug.


Actually I've received some inside information about this case, and it appears to be a problem with the Linux route cache (ip route show cache) which is somehow getting cached route entries that are associated with the wrong interface.  The main routing table (ip route show or netstat -rn) always shows the problematic route associated with the correct interface.  So this would appear to be a Gaia/Linux bug, and I find it interesting that the IP route cache functionality was abandoned in the 3.6 version of the Linux kernel, but unfortunately there is no apparent way to disable it permanently.  A temporary workaround is to flush the route cache with the ip route flush cache command but the problem just comes back later.
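
The comparison described above (main table versus route cache) and the temporary flush could be scripted roughly as follows. This is a sketch under assumptions: it relies on the iproute2 commands ip route show, ip route show cache and ip route flush cache behaving on Gaia as they do on stock Linux, and dev_of, devs_agree and check_route_cache are hypothetical helper names.

```shell
#!/bin/bash
# Print the interface that follows the "dev" keyword in `ip route` output.
dev_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Succeed when there is no cached entry or when both devices match.
# Usage: devs_agree <main_table_dev> <cached_dev>
devs_agree() {
    [ -z "$2" ] || [ "$1" = "$2" ]
}

# Compare main table and route cache for one host route; flush on mismatch.
# Usage: check_route_cache a.b.c.d
check_route_cache() {
    local dest="$1" main cached
    main="$(ip route show "$dest" | dev_of)"
    cached="$(ip route show cache "$dest" | dev_of)"
    if devs_agree "$main" "$cached"; then
        echo "$dest: cache consistent (dev ${main:-none})"
    else
        echo "$dest: main table says $main, cache says $cached; flushing cache"
        ip route flush cache
    fi
}
```

Since the problem comes back after a flush, this would only be a stopgap, for example run periodically for the affected destinations until a fix is available.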


It seems that if you have VPNs on both interfaces (even though the impacted traffic is non-VPN traffic), the FW sometimes decides to switch to MAC-based routing and inserts an incorrect route into the route cache.

 

Setting cphwd_enable_ecmp to 1 seems to fix it. We are now waiting for confirmation and a fix.
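
How the parameter gets set is not spelled out in this thread. The sketch below assumes cphwd_enable_ecmp is an ordinary firewall kernel parameter, so that fw ctl set int changes it on the fly and a name=value line in $FWDIR/boot/modules/fwkern.conf makes it survive a reboot. That is an assumption, so please confirm it with TAC before relying on it. set_conf_param is a hypothetical helper name.

```shell
#!/bin/bash
# Idempotently set "name=value" in a kernel parameter file:
# replace an existing line for that name, otherwise append a new one.
# Usage: set_conf_param <file> <name> <value>
set_conf_param() {
    local file="$1" name="$2" value="$3"
    touch "$file"
    if grep -q "^${name}=" "$file"; then
        sed -i "s/^${name}=.*/${name}=${value}/" "$file"
    else
        echo "${name}=${value}" >> "$file"
    fi
}

# Assumed usage in expert mode (not confirmed in this thread):
#   fw ctl get int cphwd_enable_ecmp        # show the current value
#   fw ctl set int cphwd_enable_ecmp 1      # change it until the next reboot
#   set_conf_param "$FWDIR/boot/modules/fwkern.conf" cphwd_enable_ecmp 1
```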


Hmm I guess MAC based routing would be needed for Equal Cost MultiPath.  The presence of cphwd in that variable name would imply that it is implemented in SecureXL, so I imagine selectively disabling SecureXL for the problematic interfaces/addresses would solve this as well.


OK, so that is a scary bug. Any idea what is triggering it and which versions are affected?


I guess it happens if you have more than one interface that terminates VPN sessions (BTW, the affected traffic is not VPN traffic).

 

The affected version for us is R80.40 with any current JHFA. I assume the bug will happen in R80.30/3.10 as well.


No, disabling SecureXL did not help.
