GeorgeF
Contributor

How does a Gaia Gateway cluster learn ARP entries and update its ARP table?

Hi Experts

 

We recently ran into an interesting behavior. The environment is as follows:

Two Gaia Security Gateways (R81.20, Take 26) in ClusterXL Active/Standby mode. The VIP is 10.217.81.1 and the active member is 10.217.81.2.

The interface acts as the gateway for Wi-Fi users, and DHCP runs on the Wi-Fi access switches. Because DHCP changes the ARP entries, we set the parameter to 1 as described in sk175603 (https://support.checkpoint.com/results/sk/sk175603): fw ctl set int cphwd_refresh_nh 1 -a

In the captures below, frames 533 to 536 show the client (33:35:0c) repeatedly asking for the gateway's MAC address.

 

[Images: Capture1.png, Capture2.png]

In frame 537, the gateway (d2:c3:20) replies to the client.

 

My question is: why didn't the Gaia gateway learn the ARP entry for the client 10.217.81.87 from frame 536?

Then, starting at frame 539, the gateway keeps asking for the client's MAC address, but the request says to tell 10.217.81.2 rather than the VIP 10.217.81.1. Is that expected? This behavior looks a lot like ARP spoofing, and if the downstream switch has an anti-ARP-spoofing feature turned on, these broadcasts might be treated as spoofing. (For example, A asks for B's MAC address but says "tell Z". If A asks for the MACs of C, D, E ... M and directs every answer to Z, then Z receives a flood of replies.)

This is our actual issue: the client does not receive these ARP request broadcasts, so the gateway has no ARP entry for the client 10.217.81.87. So if a frame reaches the Gaia gateway, but the gateway doesn't have the destination MAC address, will the frame be dropped?

 

Does anyone have insight into the ARP learning process of the Gaia gateway? I know it is Linux based and may behave quite differently from Cisco switches. I believe a Cisco switch would be able to learn the ARP entry from the ARP requests sent to the gateway.
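To make the question concrete, here is a minimal Python sketch of the two learning policies being compared. It is a toy model only, not Gaia or Linux kernel code: the class `ArpCache` and its `learn_from_requests` flag are hypothetical names invented for illustration, and the client MAC is a placeholder because the capture only shows the fragment 33:35:0c.

```python
from dataclasses import dataclass, field


@dataclass
class ArpCache:
    """Toy neighbor table. `learn_from_requests` is a hypothetical switch,
    not a real Gaia or Linux parameter."""
    learn_from_requests: bool
    entries: dict = field(default_factory=dict)

    def on_arp(self, op: str, sender_ip: str, sender_mac: str) -> None:
        # Replies are always learned; requests only when the policy allows it.
        if op == "reply" or (op == "request" and self.learn_from_requests):
            self.entries[sender_ip] = sender_mac


# The client 10.217.81.87 asks for the gateway's MAC (like frames 533-536).
passive = ArpCache(learn_from_requests=True)
strict = ArpCache(learn_from_requests=False)
for cache in (passive, strict):
    cache.on_arp("request", "10.217.81.87", "00:00:5e:00:53:87")  # placeholder MAC

print(passive.entries)  # the passive cache now knows the client
print(strict.entries)   # the strict cache must still send its own who-has
```

With the strict policy, the gateway depends entirely on its own who-has broadcast being delivered to the client and answered.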

 

Thanks very much.

 

 

0 Kudos

19 Replies
AkosBakos
Advisor

Hi GeorgeF,

It seems you should open a TAC case. Before you open it, install the latest Jumbo take; this will save you a round of back-and-forth with TAC.

BR

Akos

I searched the resolved issues for the string "ARP". Maybe one of them can help, but judging by the descriptions, none of them fits this issue.

----------------
\m/_(>_<)_\m/
0 Kudos
the_rock
Legend

Good point about jumbo hotfix, I agree 🙂

Andy

0 Kudos
AmirArama
Employee

Hi,

So let's put things in order.
I assume this packet capture was taken on the GW; please let me know if that's not the case.

1. It's possible that passive ARP learning is not enabled on Gaia.
2. In cluster HA, the active GW's MAC address is used for both its physical IP and the VIP.
2.a - When the member sends an ARP request, it uses its own physical IP as the source IP, together with its own MAC address.
2.b - I don't see why this would be MAC spoofing. This is how the cluster operates: the MAC is shared between the physical IP and the VIP, so packets can be sent sometimes from the VIP and sometimes from the physical IP with the same MAC.
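As a rough illustration of point 2.b, here is a small Python sketch of an IP-to-MAC binding check, the general kind of test an anti-spoofing feature might apply to ARP sender fields. Everything in it is an assumption made for illustration: the `bindings` table, the `is_consistent` function, and the MAC addresses are placeholders (the capture only shows the fragment d2:c3:20), and it does not represent any specific switch feature.

```python
# Hypothetical binding table of the kind an ARP inspection feature might keep.
# The MAC is a placeholder, not the real gateway address.
GATEWAY_MAC = "00:00:5e:00:53:01"
bindings = {
    "10.217.81.1": GATEWAY_MAC,  # cluster VIP
    "10.217.81.2": GATEWAY_MAC,  # active member's physical IP
}


def is_consistent(sender_ip: str, sender_mac: str) -> bool:
    """True when the ARP sender IP/MAC pair matches the expected binding."""
    return bindings.get(sender_ip) == sender_mac


# "who has 10.217.81.87? tell 10.217.81.2", sent with the shared MAC
print(is_consistent("10.217.81.2", GATEWAY_MAC))
# The same IP claimed by a different MAC is what spoofing would look like
print(is_consistent("10.217.81.2", "00:00:5e:00:53:99"))
```

Two IPs mapping to one MAC is consistent under such a check; the suspicious case is one IP being claimed by a MAC that does not match its binding.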

From what you describe, it sounds like the broadcast ARP request sent from the GW is not reaching the host (did you verify that with Wireshark on the host?), and because of that the GW is not able to learn the host's MAC address. That is what needs to be investigated; my guess is that the cause is at the switch level.

"So if a Frame reached Gaia gateway, but it doesn't have the destination MAC address, then the Frame will be dropped?" - Actually, if you look at the Ethernet layer of this packet you will see that its destination is the broadcast address, so the switch should send it out of all its ports in the broadcast domain. If you took this packet capture on the GW, then the packet reached the GW and was not dropped, and as you can see the GW even replies to the host. (Does that reply reach the host? Does the host have an ARP entry for .1?)
In the ARP layer the destination MAC is empty because the host sending the request doesn't yet know the MAC of its destination.
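The difference between the two layers can be seen by building and decoding an ARP request by hand. The following Python sketch uses only the standard library; the helper names are made up for illustration, the client MAC is a placeholder, and the IPs are the ones from this thread.

```python
import socket
import struct

ARP_FMT = "!HHBBH6s4s6s4s"  # 28-byte ARP payload for Ethernet/IPv4


def mac_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Ethernet II header (14 bytes) + ARP request for IPv4 (28 bytes)."""
    # Ethernet layer: destination is the broadcast address, EtherType 0x0806.
    eth = b"\xff" * 6 + sender_mac + struct.pack("!H", 0x0806)
    arp = struct.pack(
        ARP_FMT,
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # hardware / protocol address lengths
        1,            # opcode 1 = request
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6,  # ARP target MAC: unknown, so zero-filled
        socket.inet_aton(target_ip),
    )
    return eth + arp


def parse(frame: bytes) -> dict:
    fields = struct.unpack(ARP_FMT, frame[14:42])
    return {
        "eth_dst": mac_str(frame[0:6]),
        "arp_sender_ip": socket.inet_ntoa(fields[6]),
        "arp_target_mac": mac_str(fields[7]),
        "arp_target_ip": socket.inet_ntoa(fields[8]),
    }


# Placeholder client MAC; the client asks for the VIP's MAC.
frame = build_arp_request(bytes.fromhex("00005e005387"), "10.217.81.87", "10.217.81.1")
info = parse(frame)
print(info)
```

The Ethernet destination is what a switch uses to flood the frame; the zero-filled ARP target MAC is just the unanswered question inside the payload.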

Thanks

0 Kudos
GeorgeF
Contributor

Hi AmirArama,

Thank you so much for your detailed reply.

 

so let's put things in order, 
i assume this packet capture was taken on the GW, please let me know if it's not the case.

--- Actually, it was captured on the WLC (which is connected directly to the GW and is in the same VLAN). We captured on the GW as well and got the same result.

 


1. it's possible that passive arp learning is not enabled on GAIA.

--- Could you please tell me how to check and turn on passive ARP learning? Then I can test whether it helps.


2. when working with cluster HA, the same MAC address of the active GW is used for both the physical IP and the VIP.
2.a - when the member is reaching out to get arp, it uses it's own physical IP as src IP, and it's MAC address.
2.b - i don't see why there should be mac spoofing, as this is how cluster is operates, as the mac shared between physical and vip ip addresses. and packets can be sent sometimes from vip and sometimes from physical IP with the same MAC.

--- Yes, I checked the broadcast frame. It was sent with the active member's IP as the source, but with the same MAC as the VIP.



from what you describe, it sounds like the issue is that BC arp request sent from the GW is not reaching the host (did you verify that with wireshark on the host?), and because of that the GW is not able to learn the arp of this host. so this what needs to be investigated, my guess would be on switch level.

--- Yes, the BC ARP request did reach the WLC but didn't reach the host. Below is the capture from the host.

I noticed that the host sent only one broadcast ARP announcement packet after it got its DHCP address, so I wonder why the GW didn't learn the host's MAC from it. (I'm not sure it reached the GW: only one packet was sent, so it is hard to capture.)

 

[Image: HOST packet capture.png]


"So if a Frame reached Gaia gateway, but it doesn't have the destination MAC address, then the Frame will be dropped?" - actually if you will look at the ethernet layer in this packet you will see that it has dst broadcast address, which the switch should send via all it's ports on the broadcast domain, and in case you took this packet capture on the GW, than the packet has reached the GW and didn't drop, and as you can see the GW even respond to that back to the HOST (is that reaching the HOST, is the host have the arp of the .1?)
in the ARP layer the dst MAC is empty because the host asking for the request still don't know the mac of it's dst.

-- That is what I suspect. I can see that the DNS traffic and web authentication traffic from the host has already reached the DNS server and the web authentication server, but no reply traffic reaches the host, so DNS and web authentication fail. I suspect the GW drops the reply packets destined for the host because it has no ARP entry for the host.

How can I check whether those reply packets are dropped by the GW?

 

 

 

 

Thanks again, I really appreciate your patient reply.

 

Best regards

George




0 Kudos
the_rock
Legend

Hey George,

Just throwing this out there: proxy ARP, once added, needs a policy install. As for regular ARP, if it keeps saying who-has, that tells us it's not able to "discover" the device or host in question. Why? That's another question. Have you tried running zdebug to see if the firewall is indeed dropping it?

Say the IP is 10.10.10.10; you can run: fw ctl zdebug + drop | grep 10.10.10.10
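One caveat with a plain grep on an IP: the dots act as wildcards and the match is a substring match, so 10.10.10.10 also matches lines mentioning 10.10.10.100 or 110.10.10.10. Below is a minimal Python sketch of a whole-address filter for saved output. The function name `mentions_ip` is made up, and the sample lines are invented placeholders, not real zdebug output.

```python
import re


def mentions_ip(line: str, ip: str) -> bool:
    """True if `line` contains `ip` as a complete IPv4 address."""
    # Escape the dots, and reject matches that are part of a longer address.
    pattern = r"(?<!\d)(?<!\d\.)" + re.escape(ip) + r"(?!\d|\.\d)"
    return re.search(pattern, line) is not None


# Invented placeholder lines, only to exercise the filter.
lines = [
    "drop 10.10.10.10:1234 -> 6.6.6.6:80",
    "drop 10.10.10.100:1234 -> 6.6.6.6:80",
    "drop 110.10.10.10:1234 -> 6.6.6.6:80",
]
matches = [line for line in lines if mentions_ip(line, "10.10.10.10")]
print(matches)
```

On the command line, grep -F (fixed string) at least stops the dots from acting as wildcards.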

Best,

Andy

0 Kudos
GeorgeF
Contributor

Hi 

Thanks very much for your reply.

I am wondering whether it is possible to have the GW send the ARP request to the WLC directly. DHCP runs on the WLC, so it should know the IP-to-MAC entries for all hosts.

That is, the GW sends a unicast to the WLC: "who has 10.217.81.87, please tell 10.217.81.2". That would be another way to solve the issue.

 

Thanks again

Best regards

George

 

 

0 Kudos
the_rock
Legend

Right, but the issue is that if that request does not know how to get there, it won't work. I suspect it could be a routing issue.

Andy

0 Kudos
GeorgeF
Contributor

That shouldn't be a routing issue, as they are in the same VLAN and the same broadcast domain. As I understand it, everything ARP related is a layer 2 issue.

0 Kudos
the_rock
Legend

Yes, agree 100%, I did not realize it was the same subnet. That's odd then, as there would be no routing involved. What is the .87 IP? Just a wireless client?

0 Kudos
GeorgeF
Contributor

.87 is the IP of a wireless host, assigned by the DHCP server (the DHCP service runs on the WLC, the wireless controller).

0 Kudos
the_rock
Legend

Here is my suggestion: if this issue happens ONLY with that machine, why not just reboot it?

0 Kudos
GeorgeF
Contributor

Some machines have this issue and some don't, and they have already been rebooted many times.

I'm not sure whether the WLC drops the broadcast packets, but if the GW could learn the ARP entry from the ARP request, that would solve the issue.

0 Kudos
the_rock
Legend

So if that's the case, I would try rebooting the WLC as the next step, if you have not already. Otherwise, please run the fw monitor command @AmirArama indicated.

Best,

Andy

0 Kudos
AmirArama
Employee

Yes, to see the routing behavior you can run 'fw monitor' 

So for example if your client IP is 10.10.10.10 and web server is 6.6.6.6 run this:
fw monitor -F "10.10.10.10,0,6.6.6.6,0,0" -F "6.6.6.6,0,10.10.10.10,0,0"

What I expect you to see is that 10.10.10.10 > 6.6.6.6 (assuming it is not dropped by policy and such) shows i, I, o, O (maybe not O if you are going through hide NAT), while in the return direction you see only i, I without any o, because the GW has no next hop for this packet: it is missing the host's MAC address.
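As a small aid for reading the result, here is a Python sketch that compares the inspection points noted for each direction against the expected i, I, o, O sequence. It does not parse fw monitor output; the `observed` dictionary is a hypothetical, hand-written record of what was seen, and the function name is made up for illustration.

```python
EXPECTED = ["i", "I", "o", "O"]  # pre/post inbound, then pre/post outbound


def missing_points(seen: list[str]) -> list[str]:
    """Return the expected inspection points that were not observed."""
    return [p for p in EXPECTED if p not in seen]


# Hypothetical observations written down by hand from a capture.
# Per the post above, a missing O alone can be legitimate with hide NAT.
observed = {
    "10.10.10.10 -> 6.6.6.6": ["i", "I", "o", "O"],
    "6.6.6.6 -> 10.10.10.10": ["i", "I"],
}

for flow, seen in observed.items():
    gaps = missing_points(seen)
    status = "forwarded" if not gaps else f"stops before {gaps[0]}"
    print(f"{flow}: {status}")
```

A return flow that stops before o is the pattern described above for a missing next-hop MAC.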

Also, you can add a static ARP entry in the OS to be 100% sure this is the issue.

If I were you, I would configure a mirror port for the switch port connected to this PC and capture the output on another PC with Wireshark, to see whether the switch actually passes the GW's ARP broadcast to this port. If it doesn't, you need to figure out why: check the port configuration and so on. If it does, the packet is being dropped at the NIC socket or somewhere similar, and that's another question (but unlikely).

I'm not familiar with any official way to change the ARP behavior in Gaia so that it learns ARP passively, but maybe there is one.
Either way, it would only be a workaround, and you would still have an issue in your L2 network.

Thanks

0 Kudos
GeorgeF
Contributor

Thanks very much Amir. Yes, when I added the static ARP entry the issue was resolved, which confirms it is ARP related.

In the end we upgraded the wireless controller firmware, and that solved the issue.

It was not the firewall's fault, but it was a good chance to learn more about the L2 mechanisms on the Gaia platform.

 

AmirArama
Employee

Great

I'm glad to hear everything is sorted out, and that it was as we thought.

Thanks 

0 Kudos
the_rock
Legend

Great job!

Andy

0 Kudos
Chinmaya_Naik
Advisor

Dear Team,

Let me summarize this post (correct me if I missed any points or got something wrong):

  1. The thread is about ARP-related issues in a network with two Gaia Security Gateways in ClusterXL Active/Standby mode.
  2. The gateway does not learn ARP entries for some clients, which may lead to frames being dropped.
  3. One reply suggests investigating passive ARP learning on Gaia, explains that cluster HA uses the same MAC address for both the physical IP and the VIP, and suggests checking whether broadcast ARP requests from the gateway reach the host.
  4. Another reply recommends checking for dropped packets with fw ctl zdebug and suggests rebooting the Wireless Controller (WLC).
  5. GeorgeF provides additional details, confirming that the packet capture was taken on the WLC, not directly on the GW. He is concerned that the GW does not learn ARP entries from broadcast packets and suspects that the GW might be dropping reply packets.
  6. He proposes having the GW send ARP requests directly to the WLC, since the DHCP service runs on the WLC.
  7. The replies again suggest rebooting the WLC and using fw monitor to analyze routing behavior.
  8. They also suggest configuring a mirror port to check whether the switch passes ARP broadcasts to the PC.
  9. Adding a static ARP entry is proposed as a way to confirm the cause and as a workaround.

The ARP-related issue was successfully resolved by adding a static ARP entry and upgrading the firmware of the Wireless Controller.

Hence the problem was not attributed to the firewall.

@Chinmaya_Naik 

 

 

0 Kudos
the_rock
Legend

I think that's a very good summary; it seems right to me.

Andy

0 Kudos
