R80.10 OSPF issues

Chris_Phillips · ‎2022-07-06

I'm seeing some issues with OSPF on vSEC R80.10.

We see regular OSPF reconvergence on the vSEC cluster, which drops routes for a short time and is causing some instability.

In the logs I see lots of connections dropped due to antispoofing, with the secondary cluster member as the origin. The dropped traffic is from the VIP addresses to the associated OSPF neighbor, for service OSPF.

We have an antispoofing group configured on the interfaces that includes the IP ranges of both the source and destination OSPF neighbors, and the antispoofing action is set to detect and log, so I'm not sure why the secondary Check Point is dropping.

show ospf neighbors shows errors incrementing for one neighbor.

Has anyone seen anything like this with Check Point and OSPF?
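A quick way to double-check that an address really falls inside the ranges of an antispoofing group is to test it offline. This is only a sketch: the ranges and addresses below are hypothetical documentation prefixes, not values from this cluster, and the helper name `in_group` is made up for illustration.

```python
import ipaddress

def in_group(address, group_networks):
    """Return True if address falls inside any network of the group."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in group_networks)

# Hypothetical group contents; replace with the real ranges from the policy.
antispoof_group = ["192.0.2.0/28", "198.51.100.0/29"]

for candidate in ("192.0.2.5", "192.0.2.20"):
    print(candidate, "in group:", in_group(candidate, antispoof_group))
```

If the VIP or a neighbor address comes back as outside the group, the group definition is the first thing to revisit.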

When I look at show ospf neighbors on both Check Points, I see they have exactly the same details, including interface details. I would have thought each cluster member would use its own IP to form neighbor relationships, but I understand why they use the VIP: it ensures the neighbors send to the active IP. That leads me to a few questions:

Is the standby/secondary Check Point dropping OSPF from the neighbor intentionally, with the drop showing as antispoofing in the logs?
Does the active Check Point sync the OSPF database to the secondary, so the secondary never needs to actively form neighbor relationships?
Are the drops in the logs normal behaviour for the secondary?

I appreciate we need to upgrade, but you already know why that hasn't happened.

Thanks

Chris_Atkinson · ‎2022-07-06

sk95968 discusses the OSPF sync in a cluster.

Only the ACTIVE member should be actively participating in OSPF.

Does your policy allow for IGMP traffic & the OSPF multicast addresses (see sk39960)?

Are the OSPF Router IDs configured the same on both members of the cluster?

CCSM R77/R80/ELITE

Chris_Phillips · ‎2022-07-07

Thanks @Chris_Atkinson

That sk95968 answers a number of my questions.

It is a ClusterXL cluster, and the doc explains that the OSPF database is synced from the master, which is fine.
The OSPF router IDs are identical on both cluster members.

Our OSPF rule did not permit IGMP or 224.0.0.1. I have added a new rule beneath the old rule permitting both of those with the neighbors, but I do not see any hits or logs on it.

I did see drops from the gateways to 224.0.0.22, but the firewall accepted the same from the neighbors under implied rules.
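For reference when reading the drops above, here is a small lookup of the well-known link-local multicast groups that tend to show up around OSPF and IGMP. The group assignments are the standard IANA ones; the function name `describe_group` is just illustrative, and which of these a given SK asks you to permit is not restated here.

```python
import ipaddress

# Well-known link-local multicast groups (standard IANA assignments).
WELL_KNOWN_GROUPS = {
    "224.0.0.1": "All Systems on this subnet",
    "224.0.0.2": "All Routers on this subnet",
    "224.0.0.5": "OSPF AllSPFRouters",
    "224.0.0.6": "OSPF AllDRouters",
    "224.0.0.22": "IGMPv3 membership reports",
}

def describe_group(address):
    """Name a destination address if it is a well-known multicast group."""
    ip = ipaddress.ip_address(address)
    if not ip.is_multicast:
        return "not a multicast address"
    return WELL_KNOWN_GROUPS.get(str(ip), "other multicast group")

for dst in ("224.0.0.1", "224.0.0.5", "224.0.0.6", "224.0.0.22"):
    print(dst, "->", describe_group(dst))
```

Note that OSPF hellos themselves go to 224.0.0.5, so a rule covering only 224.0.0.1 would not be expected to match them.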

Chris_Atkinson · ‎2022-07-07

@Chris_Phillips Which Jumbo is installed on this cluster? Is it higher than T288?

Do you see FIBMGR (TCP/2010) between the gateways being accepted?

What does the interface/network config look like on the other side? Is it Nexus with vPC or something else?

(Note: the P2P network type is not supported with OSPF, per sk116500.)

CCSM R77/R80/ELITE

Chris_Phillips · ‎2022-07-11

@Chris_Atkinson

It's unloved and running T112.

I see traffic accepted for TCP/2010, but it is not directed at the OSPF gateways in question, only at others.

The other side is NSX-T.

I'm wondering if RFC 1583 compatibility (on) could be an issue here?

Chris_Atkinson · ‎2022-07-11

You could test that if you wish, but I would try and update the jumbo.

Is Graceful restart used/configured on either side currently?

CCSM R77/R80/ELITE

Chris_Phillips · ‎2022-07-22

@Chris_Atkinson
Just to update this in case I come across it again in the future!

So we upped the OSPF timers from 1 / 3 to 10 / 40 and things seemed to stabilise, but overnight it fell over again.
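To illustrate why longer timers can mask a stalling gateway for a while, here is a rough sketch of the dead-interval check. It assumes the 1 / 3 and 10 / 40 values are hello / dead intervals in seconds (10 / 40 being the usual OSPF defaults on broadcast networks); the arrival times are invented for the example and the function name is hypothetical.

```python
def adjacency_survives(hello_arrivals, dead_interval):
    """Return True if no gap between consecutive hellos reaches dead_interval.

    hello_arrivals: sorted arrival times of hellos from one neighbor, in seconds.
    """
    gaps = (later - earlier
            for earlier, later in zip(hello_arrivals, hello_arrivals[1:]))
    return all(gap < dead_interval for gap in gaps)

# Invented example: hellos arrive every second, then the sender stalls for 5 s.
arrivals = [0, 1, 2, 7, 8]

print("dead interval 3 s :", adjacency_survives(arrivals, 3))
print("dead interval 40 s:", adjacency_survives(arrivals, 40))
```

A 5 second stall breaks the adjacency with a 3 second dead interval but not with a 40 second one, so raising the timers only hides stalls shorter than the new dead interval.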

My colleague Nathan tried to download a hotfix on one gateway, which failed because the directory was full.
/var/log/auth was 25+ GB, so we deleted it on both gateways, created a new auth file with the correct permissions, then restarted syslogd (lsof showed /var/log/auth was in use by it). We then checked /var/log/auth to confirm new logs were being entered.

So far so good.

From ~20 OSPF events per hour, it has been 0 over the last 5 hours.

Last day on the account, so hopefully it'll be OK going forward.

Of course the real fix should be an update, but that scares everyone.
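For anyone hitting the same thing, a minimal sketch for spotting a full partition and the files responsible. It assumes Python 3 is available wherever you run it (not verified on the gateways themselves), and the helper names are made up for illustration; `du` and `df` would give the same information from the shell.

```python
import os
import shutil

def usage_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def largest_files(directory, count=5):
    """Return up to `count` (size_in_bytes, path) pairs, biggest first."""
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:count]

if __name__ == "__main__":
    log_dir = "/var/log"
    if os.path.isdir(log_dir):
        print(f"{log_dir} filesystem is {usage_percent(log_dir):.1f}% full")
        for size, path in largest_files(log_dir):
            print(f"{size / 1e9:8.2f} GB  {path}")
```

Worth remembering that space held by a deleted file is not released while a process still has it open, which is why restarting syslogd after removing the file matters.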

Chris_Atkinson · ‎2022-07-22

Can you clarify which hotfix was installed, or was none installed in the end, so that the only changes were clearing the space and the timer change?

CCSM R77/R80/ELITE
