Chris_Thuys
Nickel

OSPF failure after cluster upgrade

I recently tried to upgrade the hardware for a cluster of 5200 gateways running R77.30. This was done by purchasing a pair of new 5600 devices, installing R80.20, doing a basic first-time-wizard configuration, and then cutting and pasting the output of show config from the production cluster into the new cluster (new management IP addresses and host names were used). These clusters have only static routes; they use OSPF to receive routes from another gateway and do not redistribute any of their own routes.

The same interfaces were used on both clusters. The devices were added as a new cluster in SmartConsole and assigned the existing production policy, which was then pushed to the new firewalls. All looked well and the cluster status was OK. OSPF could not be checked because the new cluster was connected only to the management network, not the production network.

On the night of the change, the patch cables were transferred from the old 5200 cluster firewalls to the new 5600 cluster firewalls. Everything looked fine except that the standby node was in the Down state with a routed pnote.

OSPF showed only the active node participating. After much time with TAC, we discovered we needed to add a rule allowing the cluster nodes to connect to each other on port 2010. Unfortunately, this was discovered after we had already backed out the change at the end of the change window.

Testing in a more comprehensive test environment after the change shows that, if done the right way, the connection between the nodes on port 2010 uses an implied rule and does not need an explicit rule to allow the connection.
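For anyone chasing the same symptom, a quick way to check from the standby whether its peer is reachable on TCP 2010 (the port mentioned above for the routed/FIBMGR connection) is a plain socket probe. This is a minimal sketch, not a Check Point tool; the peer address shown is a placeholder you would replace with the other cluster member's real interface IP:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or dropped by policy along the path.
        return False

# Hypothetical peer address -- substitute the other member's IP on a real cluster.
# port_reachable("192.0.2.2", 2010)
```

If this returns False from the standby while no explicit rule exists, the drop is likely happening in policy (missing implied rule) rather than in routed itself, which matches what we saw during the change window.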

To get the implied rule to work, I had to first install and configure the new cluster with R77.30 and get it managed and operational in SmartConsole (i.e. policy pushed and SIC established). Then I performed a connectivity upgrade as per the R80.20 admin guide: fresh install on the standby, configure, set SIC, install policy, run cphacu start, fail over from the R77.30 node to the R80.20 node, then upgrade the R77.30 node using the same fresh-install procedure.

 

Does anyone have any idea what might have gone wrong to require an explicit rule in the first scenario?

 

3 Replies
Admin

Re: OSPF failure after cluster upgrade

Sometimes there are changes in implied rules (or bugs in them) between versions or Jumbo Hotfix levels, or under different circumstances.
Logs from that timeframe might provide a clue.
Vladimir
Pearl

Re: OSPF failure after cluster upgrade

Had a similar experience with a VSX VS routed pnote post-upgrade (with OSPF). I believe rebooting the standby VS after the policy was pushed to the cluster cleared the issue.

 

Chris_Thuys
Nickel

Re: OSPF failure after cluster upgrade

Rebooted, tried cpstop/cpstart and cphastop/cphastart; nothing fixes it.

Strangely enough, when you list the implied rules there are no rules related to OSPF, the FIBMGR service, or port 2010, even in the working scenario.
