Steffen_Appel
Advisor

route flipping on R80.40

We have upgraded from R80.10 to R80.40 (HF48) and have a route flipping issue:

# ip route get a.b.c.d

a.b.c.d via <correct next hop ip> dev eth5 src <correct source>

 

# ip route get a.b.c.d

a.b.c.d via <correct next hop ip> dev eth2 src <correct source>

 

For whatever reason, the outgoing interface for the route changes from eth5 to eth2.
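
To catch the moment of the flip, something like the following could be left running in expert mode. It is only a minimal sketch: the function names route_dev and watch_route and the one-second default interval are made up for illustration, and the only command taken from the post is ip route get.

```shell
#!/bin/sh
# Print the egress interface named in `ip route get` output read from stdin
# (the word that follows "dev").
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Poll the routing decision for one destination and log every change.
# $1 = destination IP, $2 = poll interval in seconds (hypothetical default: 1)
watch_route() {
    dest=$1
    interval=${2:-1}
    last=""
    while :; do
        cur=$(ip route get "$dest" 2>/dev/null | route_dev)
        if [ -n "$cur" ] && [ "$cur" != "$last" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $dest egress is now $cur (was ${last:-unknown})"
            last=$cur
        fi
        sleep "$interval"
    done
}
```

Running watch_route a.b.c.d 1 in the background and redirecting its output to a file would give a timestamp for each flip that can be lined up with the fw monitor capture.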

 

You can see it in fw monitor as well: the first packet correctly goes out eth5, while the second one, after a route flip, goes out eth2, which is wrong.

[vs_0][fw_2] eth5:O[44]: ****** -> ***** (UDP) len=200 id=61683 UDP: 2464 -> 49910

[vs_0][fw_2] eth2:O[44]: ****** -> ****** (UDP) len=200 id=49885 UDP: 2464 -> 49910

 

Has anyone had similar problems?

A TAC case is open.

30 Replies
Timothy_Hall
Champion

Is your firewall statically or dynamically routed?  What does the actual routing table (netstat -rn) show for the destination network?  

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Steffen_Appel
Advisor

It is completely statically routed.

Yair_Shahar
Employee

Hi,

Do you happen to have this route duplicated on both interfaces?
This sounds like a weird issue networking-wise. I would check the Gaia configuration in clish (show configuration), in config/active, and in the kernel (ip route, ifconfig).

Also, any chance there is physical interface flapping going on?

Thanks,
Yair
Steffen_Appel
Advisor

No interface flapping.
The problem occurs on both cluster nodes.

Only one static route to the destination host:
set static-route a.b.c.d/32 nexthop gateway address ****** on

And in active:
routed:instance:default:static:network:a.b.c.d t
routed:instance:default:static:network:a.b.c.d:masklen:32 t
routed:instance:default:static:network:a.b.c.d:masklen:32:gateway t
routed:instance:default:static:network:a.b.c.d:masklen:32:gateway:address:****** t


Steffen_Appel
Advisor

We have only seen it for UDP and ICMP never for TCP.

Timothy_Hall
Champion

We have only seen it for UDP and ICMP never for TCP.

This statement makes no sense unless you are using Policy Based Routing (PBR). Are you?  Routing is performed by the Linux IP driver and is not generally influenced by anything in Check Point's code that would distinguish service or protocol, unless a feature such as ISP Redundancy is in use.

Regular IP routing only looks at the destination IP address and couldn't care less about service or protocol.  Please provide the output of netstat -renv from expert mode for the route in question, once when it is showing eth2 and once when it is showing eth5.  We need to see the live routing table and the associated flags/use.

Steffen_Appel
Advisor

No PBR is in use:

show pbr
PBR Summary

PBR has 0 tables
PBR has 0 rules

 

netstat shows:

a.b.c.d ******  255.255.255.255 UGH 0 0 0 eth5 in the correct case

Since the node is not in production right now, I cannot provide the incorrect one.

 

For information, there are about 130 static routes on the gateway.

 

 

Timothy_Hall
Champion

Without being able to see the unredacted value of the route next hop address and the full unredacted eth2 and eth5 interface configuration it is difficult to surmise what is wrong.  Feel free to PM this information to me without redacting the IP addresses.  If you aren't comfortable with doing that you'll need to work through TAC.

Steffen_Appel
Advisor

Did you receive the PM?

Timothy_Hall
Champion

No I don't see your PM.

Steffen_Appel
Advisor

PM sent.

 

Ilya_Yusupov
Employee

Hi @Steffen_Appel ,

 

Do you have ISP Redundancy (ISPR) configured on the system?

Steffen_Appel
Advisor

No, ISP Redundancy is not configured.

Ilya_Yusupov
Employee

Strange. Did you open a TAC case? If so, can you share the number so I can follow it?

Steffen_Appel
Advisor

Yes, a TAC case is open. I will PM you the number.

We had a two-day debugging session over the weekend.

 

Steffen_Appel
Advisor

R&D is still investigating.
Steffen_Appel
Advisor

Still no solution, but there are additional customers with the same problem.

John_Fleming
Advisor

By chance, does

show route all

from clish show anything strange?

I would enable tracing (set trace kernel all on, set trace static all on, etc.) and then check /var/log/routed.log for hints about what is going on. routed is the only process that should be making routing changes, so it is the place to debug.

Timothy_Hall
Champion

Actually I've received some inside information about this case, and it appears to be a problem with the Linux route cache (ip route show cache) which is somehow getting cached route entries that are associated with the wrong interface.  The main routing table (ip route show or netstat -rn) always shows the problematic route associated with the correct interface.  So this would appear to be a Gaia/Linux bug, and I find it interesting that the IP route cache functionality was abandoned in the 3.6 version of the Linux kernel, but unfortunately there is no apparent way to disable it permanently.  A temporary workaround is to flush the route cache with the ip route flush cache command but the problem just comes back later.
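
The comparison between the route cache and the main routing table can be sketched as a small filter. This is an illustration only: stale_cache_devs is a hypothetical helper name, and it assumes each cached entry is printed with the destination as its first field followed somewhere by a "dev <interface>" pair, which is how ip route show cache prints entries on kernels that still have a route cache.

```shell
#!/bin/sh
# Read `ip route show cache` style output on stdin and print the interface of
# every cached entry for destination $1 that does NOT use the expected
# interface $2. No output means no mismatching cache entry was seen.
stale_cache_devs() {
    awk -v dest="$1" -v want="$2" '
        $1 == dest {
            for (i = 2; i < NF; i++)
                if ($i == "dev" && $(i + 1) != want) print $(i + 1)
        }'
}
```

For example, ip route show cache | stale_cache_devs a.b.c.d eth5 would print eth2 while the bad entry is present. Running ip route flush cache at that point clears it, but as noted, only until the problem comes back.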

Steffen_Appel
Advisor

It seems that if you have VPNs on both interfaces (even though the impacted traffic is non-VPN traffic), the firewall sometimes decides to switch to MAC-based routing and inserts an incorrect route into the route cache.

 

Setting cphwd_enable_ecmp to 1 seems to fix it. We are now waiting for confirmation and a fix.
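
For reference, kernel integers on a gateway are usually read and changed from expert mode with fw ctl get int and fw ctl set int. The sketch below assumes that cphwd_enable_ecmp is exposed that way and that the get command prints a "name = value" line. kernel_int_value and ensure_ecmp_enabled are hypothetical helper names. A value set like this does not survive a reboot, so it is only suitable for testing. Anything permanent should follow TAC guidance.

```shell
#!/bin/sh
# Extract the value from a line such as "cphwd_enable_ecmp = 0" read on stdin.
kernel_int_value() {
    awk -F' *= *' 'NF == 2 { print $2; exit }'
}

# Set cphwd_enable_ecmp to 1 if it is not already 1.
# Assumption: the parameter can be read and set on the fly with fw ctl.
ensure_ecmp_enabled() {
    cur=$(fw ctl get int cphwd_enable_ecmp | kernel_int_value)
    if [ "$cur" != "1" ]; then
        fw ctl set int cphwd_enable_ecmp 1
    fi
}
```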

Timothy_Hall
Champion

Hmm I guess MAC based routing would be needed for Equal Cost MultiPath.  The presence of cphwd in that variable name would imply that it is implemented in SecureXL, so I imagine selectively disabling SecureXL for the problematic interfaces/addresses would solve this as well.

John_Fleming
Advisor

OK, so that is a scary bug. Any idea what is triggering it, and which versions are affected?

Steffen_Appel
Advisor

I guess it happens if you have more than one interface that terminates VPN sessions (BTW, the affected traffic is not VPN traffic).

 

The affected version for us is R80.40 with any current JHFA. I assume the bug will happen in R80.30 with the 3.10 kernel as well.

Steffen_Appel
Advisor

No, disabling SecureXL did not help.

John_Fleming
Advisor

Where is this at? Any resolution?

Steffen_Appel
Advisor

We received a hotfix for it, but as it requires production downtime, we cannot implement it before a weekend in September.

We were promised that it will become part of the JHFA pretty soon.

John_Fleming
Advisor

Thanks for the update.

Steffen_Appel
Advisor

There is now an SK about the issue as well: SK168881.

Steffen_Appel
Advisor

The hotfix fixed the issue. Now we have to wait for it to be included in the Jumbo.
