Matlu
Advisor

No Internet after GW reboot

Hello everyone.

We had a maintenance window to apply hotfixes to all our machines running R81.10.

During the window something went wrong: the client lost Internet access on site.

Their architecture is something like this:

LAN -> ClusterINT -> ClusterEXT -> Internet.

When the update of ClusterEXT started and the machines were restarted, the client simply lost Internet access.

Upon further review, we realized that the reboot, for some reason, deleted an entry in the ARP table (local.arp).

ClusterEXT is the only cluster that handles the public IPs, and the only ARP entry it lost was the one for the IP that gives the client Internet access.

Does anyone know if this is normal? Or why this kind of thing can happen?

Greetings.

8 Replies
the_rock
Legend

Bro, though I spent enough time in South America to understand Spanish, just keep it consistent; maybe better to write this in English. Btw, no, it's NOT normal for this to happen. What is the current cluster state?

Andy

Matlu
Advisor

Sorry,

I didn't realize it was in Spanish, LOL. 😄
Actually, ClusterXL is working fine.
The problem occurred when we updated the Hotfix on both members. The Internet simply went down.

After checking for a while, we realized that the ARP table of both Cluster members was missing "one line" which was the client's IP address.

Something totally strange and atypical.

I have no idea why the Internet could have been down, and why that line was deleted from the arp tables of both members.

the_rock
Legend

It's fine, I understood all you wrote. Well, I can't say why it happened; maybe check the messages files for any indication. We had a weird thing with the default route for a customer, and it turned out to be related to the ISP redundancy script. Not certain of the cause here, but an upgrade is not supposed to wipe out any settings.

Matlu
Advisor

It is strange.

First we updated the standby members of each Cluster.

LAN -> ClusterINT -> ClusterEXT -> Internet.

Once the Hotfix was installed on the Standby members of each Cluster, we failed over ClusterEXT first. Up to that point, everything worked fine.

Once we moved on to the failover of ClusterINT, we simply lost the Internet connection, and everything went down 😞

Too weird.

When we started to check, we realized that ClusterEXT, for some reason, had deleted from its ARP table the public IP that gives the client access to the Internet.

I'm waiting for an update from TAC, but so far, they can't find anything 😞

PhoneBoy
Admin

Just to clarify: an entry was removed from the local.arp file?
Did you check this file before and after to confirm?
Because I’ve never heard of anything like that happening.
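
A before/after check like the one asked about here can be scripted. The following is only a minimal sketch, assuming each line of local.arp starts with an IP address followed by whitespace-separated fields, and that blank lines or lines starting with "#" can be ignored (both are assumptions, not confirmed in this thread). The function names and the sample addresses are hypothetical placeholders.

```python
def parse_local_arp(text):
    """Map the first field of each line (the IP) to the rest of the line.

    Assumption: one entry per line, IP first; blank lines and lines
    starting with '#' are skipped.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        entries[fields[0]] = " ".join(fields[1:])
    return entries


def diff_local_arp(before_text, after_text):
    """Report which IPs were removed, added, or changed between two snapshots."""
    before = parse_local_arp(before_text)
    after = parse_local_arp(after_text)
    removed = sorted(ip for ip in before if ip not in after)
    added = sorted(ip for ip in after if ip not in before)
    changed = sorted(
        ip for ip in before if ip in after and before[ip] != after[ip]
    )
    return {"removed": removed, "added": added, "changed": changed}


if __name__ == "__main__":
    # Placeholder data using documentation-range addresses, not real ones.
    before = (
        "203.0.113.10 00:1c:7f:00:00:01\n"
        "203.0.113.20 00:1c:7f:00:00:01\n"
    )
    after = "203.0.113.10 00:1c:7f:00:00:01\n"
    print(diff_local_arp(before, after))
```

In practice the two inputs would be copies of $FWDIR/conf/local.arp saved from each member before and after the maintenance window, so that a missing line can be confirmed rather than inferred afterwards.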

Matlu
Advisor

Hello,

Exactly. Only one line of the ARP table was deleted.

Unfortunately, the table was not checked before the Hotfix update.

The deleted line is the Cluster VIP. This IP is exactly the one that gives Internet access to the client.

For some reason, it was deleted, and we don't know "why".

the_rock
Legend

You may need a TAC case to find the root cause.

Andy

PhoneBoy
Admin

Was it the actual $FWDIR/conf/local.arp file or just the ARP table?
Either way, this is probably going to require TAC to get to the root cause.
