Hi,
I am currently looking into an issue with a few VSs running OSPF to a Cisco core router. There are 6 virtual systems, all configured pretty much the same (100% the same when it comes to OSPF). One of the systems is having issues with OSPF flapping.
We think it might be related to load, but the load really is not that big. CPU and memory, along with session counts and a general overview of bandwidth, show that there are absolutely no issues with any of these. Everything runs smoothly, except the one OSPF process on the one VS.
The problem "comes and goes" throughout the day, causing downtime due to lost routing when it occurs.
The Cisco OSPF logs (newest entries first) tell me that:
XX XX XX:28:24.839066 ospf 200 [30470]: (vrf-XXX) : Nbr XX.XX.XX.XX: INIT --> TWOWAY, event TWOWAYRCVD
XX XX XX:28:24.838981 ospf 200 [30470]: [30525]: Walking neighbor XX.XX.XX.XX (0xd5202914), state TWOWAY
XX XX XX:28:24.838913 ospf 200 [30470]: [30525]: Compare done, new current bdr XX.XX.XX.XX
XX XX XX:28:24.838901 ospf 200 [30470]: [30525]: Walking neighbor XX.XX.XX.XX (0xd5202914), state TWOWAY
XX XX XX:28:24.838640 ospf 200 [30470]: (vrf-XXX) : Nbr XX.XX.XX.XX: DOWN --> INIT, event HELLORCVD
XX XX XX:28:16.423676 ospf 200 [30470]: [30525]: Walking neighbor XX.XX.XX.XX (0xd5202914), state DOWN
XX XX XX:28:16.423607 ospf 200 [30470]: [30525]: Walking neighbor XX.XX.XX.XX (0xd5202914), state DOWN
XX XX XX:25:44.839356 ospf 200 [30470]: (vrf-XXX) : Nbr XX.XX.XX.XX: INIT --> TWOWAY, event TWOWAYRCVD
XX XX XX:25:44.839271 ospf 200 [30470]: [30525]: Walking neighbor XX.XX.XX.XX (0xd5202914), state TWOWAY
XX XX XX:25:44.839202 ospf 200 [30470]: [30525]: Compare done, new current bdr XX.XX.XX.XX
XX XX XX:25:44.839191 ospf 200 [30470]: [30525]: Walking neighbor XX.XX.XX.XX (0xd5202914), state TWOWAY
XX XX XX:25:44.838935 ospf 200 [30470]: (vrf-XXX) : Nbr XX.XX.XX.XX: DOWN --> INIT, event HELLORCVD
XX XX XX:25:06.597013 ospf 200 [30470]: [30525]: Walking neighbor XX.XX.XX.XX (0xd5202914), state DOWN
XX XX XX:25:06.596939 ospf 200 [30470]: [30525]: Walking neighbor XX.XX.XX.XX (0xd5202914), state DOWN
(XX.XX.XX.XX being the IP of the firewall.)
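For anyone who wants to pull timings out of a longer capture of this log, here is a minimal Python sketch. It is only an illustration: the function names are made up, it assumes every line falls within the same (masked) hour, and it only knows about the two line shapes shown in the excerpt above. For each DOWN --> INIT recovery it reports how long before that the neighbor was last seen in state DOWN, and how long it was since the previous recovery.

```python
import re

# Assumption: timestamps look like "XX XX XX:28:24.839066", i.e. the hour is
# masked and only minute:second.microsecond can be parsed.
TS = re.compile(r"(\d{2}):(\d{2}\.\d+)\s")
TRANSITION = re.compile(r"Nbr \S+: (\w+) --> (\w+), event (\w+)")
WALK_DOWN = re.compile(r"Walking neighbor \S+ \(0x[0-9a-f]+\), state DOWN")


def seconds_in_hour(line):
    """Return the timestamp as seconds since the start of the (masked) hour."""
    m = TS.search(line)
    if m is None:
        return None
    return int(m.group(1)) * 60 + float(m.group(2))


def flap_report(lines):
    """Return one (recovery_time, since_last_down, since_prev_recovery) per flap.

    since_last_down is the gap between the most recent "state DOWN" line and
    the hello that moved the neighbor DOWN --> INIT. It is a lower bound on
    the outage, not the full outage length.
    """
    down_seen, recovered = [], []
    for line in lines:
        t = seconds_in_hour(line)
        if t is None:
            continue
        tr = TRANSITION.search(line)
        if tr and tr.group(1) == "DOWN" and tr.group(2) == "INIT":
            recovered.append(t)
        elif WALK_DOWN.search(line):
            down_seen.append(t)

    # The router prints newest entries first, so sort before comparing.
    down_seen.sort()
    recovered.sort()

    report = []
    previous = None
    for t in recovered:
        earlier = [d for d in down_seen if d <= t]
        since_down = round(t - earlier[-1], 3) if earlier else None
        since_prev = round(t - previous, 3) if previous is not None else None
        report.append((t, since_down, since_prev))
        previous = t
    return report


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python flaps.py cisco_ospf.log
    with open(sys.argv[1]) as fh:
        for row in flap_report(fh):
            print(row)
```

On the excerpt above this gives gaps of roughly 38 s and 8 s between the last DOWN line and the recovering hello, with the two recoveries almost exactly 160 s apart.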
I have checked the dead timer, hello interval, router ID, etc. All the settings have been double-checked and look good! They also match the settings on the other VSs.
Do note that the desired state, as it is for the other VSs, is TWOWAY/DROTHER.
The routed.d logs tell me pretty much only that a "2WAY" was received. I can't find any errors or troublesome logs at all.
I have now set the OSPF priority on the VS to "0", just to take it out of the whole BDR/DR election. But as we are not having any issues right now, I have no idea whether that will help as a workaround.
Does anyone have any tips, besides the sk describing OSPF debugging on VSX? Anyone with experience of OSPF issues like this on VSX on top of Maestro? I do feel there is a Maestro bug lurking in the background here, but that's just a feeling. (This is R80.20.)
Any input is appreciated. 🙂