Chinmaya_Naik
Advisor

Logs show "Missing OS Route" after placing a new CP appliance (suspected ARP issue)

Hi Team,

Diagram.png

 

Previously we used an L3 switch (name: POC L3). Four interfaces from two internal switches (ABC and XYZ) terminate on the internal VLAN interface of this upstream L3 switch.

➡️  The two internal switches (ABC and XYZ) each have one bond interface in LACP mode.

➡️  On the Check Point, the external interface is also a bond interface, on network 192.168.100.0/22.

➡️  Our requirement is to place a Check Point firewall (23500) between the internal switches (ABC and XYZ) and the L3 switch (POC L3).

We face the challenges below.

Scenario 1:

Our requirement rules out putting two bonds on one VLAN: adding both would create two separate networks, for example bond1.100 and bond2.100 for VLAN 100.

➡️  So we added all 4 interfaces to a single bond (bond1) in LACP mode and created the VLAN on bond1. This did not work, because in LACP mode all member interfaces are active.

➡️  In this scenario, if we unplug the cables to one internal switch (for example ABC), the other switch (XYZ) works fine: we can reach the servers, including over SSH, from LAN/WAN.

Scenario 2:

Because the first scenario did not work as expected, we changed the configuration.

➡️  We created two bonds, bond1 and bond2, one per internal switch, and then bridged bond1 and bond2. In this scenario we assign only one IP address, 172.16.100.1/22, on the bridge interface, which meets our requirement.

In this scenario we are able to ping the internal servers behind the ABC and XYZ switches.

Issue:

➡️  We checked the logs and found packets being dropped on the Check Point internal bond interface with the error "missing OS route".

➡️  Based on the logs, I suspect the ARP cache is overflowing. The most likely reason is too much traffic on the network (generated by some application, by some hosts, or by related factors) (sk108587).

➡️  By default the ARP cache has 4096 slots, and we can increase it to a maximum of 4096*4 = 16384 slots.
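
For reference, a minimal sketch of how the cache size could be raised on Gaia from expert mode. It assumes `clish` is reachable from the shell, that 16384 is the supported maximum on the installed version (as stated above), and that the change has been approved; the exact commands should be verified against the Gaia administration guide for your release.

```shell
# Assumption: run in Gaia expert mode on the gateway, not on a generic Linux host.
clish -c "show arp table cache-size"        # display the current limit (default 4096)
clish -c "set arp table cache-size 16384"   # raise the limit to the stated maximum
clish -c "save config"                      # persist the change across reboots
```

Raising the limit only buys headroom; it does not reduce the number of hosts the gateway has to resolve on the flat /22 network.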

Possible Solution

1. If we use two Check Point devices and assign 172.16.100.1/22 as the cluster VIP, we will not face the bond interface configuration challenges.

2. If we place the Check Point gateway above the POC L3 switch, we will not face the bond-related challenges.

3. We need to increase the ARP cache size up to 16384 slots. Before that, we need to run the commands below to verify whether the number of ARP entries increases after the Check Point gateway goes live.

arp -an | wc -l (to count the entries in the ARP cache)

Check the messages file for "neighbour table overflow":

command: dmesg | grep -i "table overflow"
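
The two checks above can be wrapped into small helpers so they are easy to run repeatedly (for example from cron) once the gateway is live. This is only a sketch: the function names are made up for illustration, and it assumes each ARP entry is printed by `arp -an` as one line containing " at ", with the limit (4096 or 16384) supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helpers; they read command output from stdin so they can be
# fed from `arp -an` and `dmesg` on the gateway.

# Count ARP entries in `arp -an` style text read from stdin.
count_arp_entries() {
    awk '/ at / { n++ } END { print n + 0 }'
}

# Print entry count, configured limit and percentage of the limit in use.
# The limit is passed as the first argument.
arp_cache_usage() {
    limit=$1
    count=$(count_arp_entries)
    echo "entries=$count limit=$limit used=$(( count * 100 / limit ))%"
}

# Succeed if the kernel log text on stdin reports a neighbour table overflow.
has_table_overflow() {
    grep -qi 'table overflow'
}
```

Typical use on the gateway would be `arp -an | arp_cache_usage 4096` and `dmesg | has_table_overflow && echo "overflow seen"`. Logging the first command's output at intervals keeps a record of the cache size even if the messages file is later overwritten.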

From the Max Power book: One interesting side effect of not having a private transit network between the internal interface of the firewall and your internal core router is that the firewall will have to maintain far more IP address to MAC address mappings in its ARP cache than it otherwise would. This is especially true if there are hundreds or thousands of workstations/servers located on VLANs directly attached to the firewall’s various interfaces.

Such a large network will, of course, have far more broadcast traffic flooded to all stations on the network including the firewall, resulting in reduced overall network throughput.

However, one situation that can occur in this case will severely damage the firewall’s performance (almost to the point of looking like rolling outages in some cases): an overflow of the firewall’s ARP cache.

Please suggest what the possible reason could be for the "missing OS route" error.

NOTE: The messages file has been overwritten, so we are unable to find "neighbour table overflow"; we also do not have any output of the ARP cache size from the time of the issue.

We now plan to deploy the Check Point again with a proper plan, so we need all your help.

 

Regards,

@Chinmaya_Naik 

10 Replies
FedericoMeiners
Advisor

Hello,

It's a fun problem to solve for sure. I don't have any magic commands to suggest, however: have you tried to solve the architecture with VSX?

Also, what I'm seeing is that your gateway is acting as a core router for the network. Maybe that is too much for it to handle and it would be a better idea to deploy a core router. This strategy will depend on what you intend to inspect and on budget, obviously.

Going back to VSX: create two separate virtual systems, one for network A and another for network B. This should limit the ARP table size, since each VS maintains its own table. For routing between the two networks you may need to create a virtual switch / virtual router, or give that role to your POC switch if it has L3 capabilities.

Hope that my 2 cents help,

Federico Meiners

____________
https://www.linkedin.com/in/federicomeiners/
Vladimir
Champion

@Chinmaya_Naik, the problem as I see it is the presence of a huge flat network that you are trying to split, possibly for lateral traffic inspection.

To address the "missing OS route" errors, try going into the WebUI: Advanced Routing / Routing Options, select "Kernel Routes" and click "Apply".

If you must allow broadcast traffic propagation between networks behind different interfaces/bonds while simultaneously attempting to route traffic to other networks, you have to use a bridge and will, inevitably, suffer from the huge ARP table, as well as the associated performance deterioration.

Since you are still experimenting and seem to have two gateways, try using one unit in the 3-way bridge and another one as a routing gateway. Drop the broadcast traffic on the bridge towards the routing gateway to avoid ARP overload and see if it works.

 

 

Chinmaya_Naik
Advisor

Hi @FedericoMeiners and @Vladimir, thank you so much for the update. I will look into this as well.

Given our network architecture, it seems to be an ARP issue, because the internal network is one broadcast domain (172.16.100.0/22) and we also configured a bridge interface containing two bonds (4 interfaces in total).

So we changed our plan.

New Plan v2

We revised the current diagram and introduced a new switch between the Check Point gateway and the internal switches (ABC and XYZ), where:

The new switch will handle the flat network traffic (broadcast).

We make the Check Point gateway inside interface 172.16.100.1/22 the default gateway, for inspection of all north-south traffic.

However, the internal network is still one broadcast domain, so the Check Point still has to maintain the ARP entries; we only avoid the bond-interface challenges.

So if I change the ARP cache to the maximum of 16384 slots, are there any challenges?

New architecture.png

 

Thank You

Regards

@Chinmaya_Naik 

Chinmaya_Naik
Advisor

Hi Team,

New Plan v3

We made a new plan, because in our existing setup the internal network is still one broadcast domain, the Check Point still needs to maintain the ARP entries, and in that situation we would also need to increase the ARP cache. So we plan to place a new L3 switch and make the new L3 switch the default gateway.

One additional configuration: Add a static route for incoming traffic.

command: set static-route 172.16.100.0/22 nexthop gateway address <ip address of New L3 switch external interface> on
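
As a sanity check on what this route covers: 172.16.100.0/22 spans 172.16.100.0 through 172.16.103.255, so any internal host outside that range would not match the static route. A small sketch for testing individual addresses follows; the helper names `ip_to_int` and `in_cidr` are made up for illustration, and it assumes plain dotted-quad IPv4 input without leading zeros.

```shell
#!/bin/sh
# Hypothetical helpers for checking whether an IPv4 address falls inside a prefix.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed if address $1 lies inside prefix $2 (for example 172.16.100.0/22).
in_cidr() {
    net=${2%/*}    # network part, before the slash
    bits=${2#*/}   # prefix length, after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, `in_cidr 172.16.103.254 172.16.100.0/22 && echo covered` reports the host as covered, while 172.16.104.1 falls outside the prefix and would need its own route.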

 

New architecture v 3.png

We have only enabled the Firewall blade for now, to check the traffic. Which rule is better?

"any any accept" or "internal_zone , external_zone allow" 

Hi team, any suggestions?

Thank You

Regards

@Chinmaya_Naik 

FedericoMeiners
Advisor

If you are using zone objects you need to add the interfaces to their corresponding zones. For POC purposes, keep things simple by creating an any any accept rule.

It may be a good idea to add a management and stealth rule if you want to be sure that only desired users can reach your gateways:

Management

Allowed_Networks (Or hosts) | Check_Point_Objects | Accept

Stealth

Any | Check_Point_Objects | Drop

Allow

Any | Any | Accept

Make sure to check if the customer is expecting a more hardened rulebase and remember that the any any accept is just for POC purposes, never go into production like this.

Regards,

Federico Meiners

Chinmaya_Naik
Advisor

Thanks for the update, @FedericoMeiners / team.

I plan to use "any any accept" only.

Do you have any idea what happens if the customer is using multicast traffic? As far as I know, "any any accept" will not work for it and a dedicated rule needs to be created.

This is for POC purposes, so I use "any any accept" only. We also disabled Anti-Spoofing and the out-of-state drops for TCP, ICMP, and SCTP under Global Properties, Stateful Inspection, because our aim is to check the load on the appliance.

Thank You

regards

@Chinmaya_Naik 

Vladimir
Champion

@Chinmaya_Naik, how will disabling anti-spoofing and out-of-state drops help you get a realistic load on the appliances?

Chinmaya_Naik
Advisor

@Vladimir 

Thanks for the update. Yes, we will definitely enable anti-spoofing and TCP out-of-state drops, but first, when we go live, we will check connectivity with basic tests such as accessing the internal servers.

regards

@Chinmaya_Naik 

 

Denver
Explorer

Any update on this topic, please? I have the same issue, with many "Missing OS Route" error messages.

 

Thank you very much

oledesma
Participant

Hi Denver. We had the same issue as described here. It was caused by a bug in the garbage collector that is supposed to remove old entries from the arp_cache table. We are using the scalable platform (chassis). It is a known issue and Check Point has a fix for it, so you have to open a support ticket to get the fix.

Regards
