Maestro BGP - Route Sync Question/Problem

HansKazan

Hello CheckMates!

Thank you once again for taking the time to entertain my question. I am currently having problems getting the other SGMs to learn the routes from the SMO-Master and would like to verify whether this is normal behavior.

In the output of netstat -rn, I can verify that the SMO-Master has received all BGP routes. However, the other SGMs only show their directly connected and static routes. Is this normal behavior? When testing with traceroute, we can see that every connection handled by an SGM other than the SMO-Master is sent towards the default route, whereas connections handled by the SMO-Master are routed according to the routing table learned via BGP.
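For anyone who wants to make the same comparison less manually, here is a minimal sketch that diffs the destinations in two saved netstat -rn outputs. It assumes the standard Linux eight-column layout. The two sample tables, the function names, and the interface name bond1 are hypothetical placeholders using documentation address ranges, not output from this system.

```python
# Hypothetical sample tables (documentation address ranges), standing in for
# the text of `netstat -rn` saved from the SMO-Master and from another SGM.
SMO_TABLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG        0 0          0 bond1
192.0.2.0       0.0.0.0         255.255.255.0   U         0 0          0 bond1
198.51.100.0    192.0.2.254     255.255.255.0   UG        0 0          0 bond1
"""

OTHER_SGM_TABLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG        0 0          0 bond1
192.0.2.0       0.0.0.0         255.255.255.0   U         0 0          0 bond1
"""


def destinations(netstat_output):
    """Collect (destination, genmask) pairs from `netstat -rn` text."""
    routes = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Route rows have 8 columns and start with an IPv4 address;
        # the title and header lines fail one of these two checks.
        if len(fields) == 8 and fields[0][0].isdigit():
            routes.add((fields[0], fields[2]))
    return routes


def missing_routes(reference, other):
    """Routes present in `reference` but absent from `other`, sorted."""
    return sorted(destinations(reference) - destinations(other))


if __name__ == "__main__":
    for dest, mask in missing_routes(SMO_TABLE, OTHER_SGM_TABLE):
        print(f"missing on other SGM: {dest} / {mask}")
```

With real data, the two strings would be replaced by the captured output from each member. Anything the sketch reports is a route the SMO-Master has in its kernel table that the other SGM does not.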

I have rebooted the SGMs, taken them out of the SG, and added them back in, but the issue persists. The Gaia configuration includes ip-reachability-detection and local-address, in case this is an obscure issue. The SD-WAN connection is using MD5 authentication.

example:
set bgp external remote-as 65300 peer 10.38.243.73 on
set bgp external remote-as 65300 peer 10.38.243.73 local-address 10.38.243.74 on
set bgp external remote-as 65300 peer 10.38.243.73 ip-reachability-detection on
set bgp external remote-as 65300 peer 10.38.243.81 on
set bgp external remote-as 65300 peer 10.38.243.81 local-address 10.38.243.82 on
set bgp external remote-as 65300 peer 10.38.243.81 ip-reachability-detection on

I hope this is sufficient information to send me in the right direction before opening a support ticket.

Thank you for your input!

emmap

Any routes learned via BGP should be sent to all the other SGMs, so this isn't normal behaviour.

That said, the routing admin guide states that if you run BGP in a cluster, you must not configure the local address. So that might be breaking something.

HansKazan

Thank you for your response. Unfortunately, this did not resolve the issue.

Perhaps the following output is of some use?

[Global] x-FW-IT-ch01-02> fw stat
Fetching data from blades...
formatting data...

HOST    POLICY          DATE
1_01    x_TEST_Policy        6May2024 12:11:40 : [>Sync] [<Sync] [>bond30.906] [<bond30.906] [>bond30.903] [<bond30.903] [>bond10.901] [<bond10.901] [>bond30.905] [<bond30.905] [>bond40.940] [<bond40.940] [>bond30.902] [<bond30.902] [>bond30.904] [<bond30.904]
1_02    x_TEST_Policy        6May2024 12:11:27 : [>Sync] [<Sync] [>bond10.901] [<bond10.901] [>bond30.905] [>bond40.940]

I lack the understanding to know what this means.

Thank you!

EDIT: I have just noticed that the racking team has slotted the MHO-1 ports into BP 2-4 and the MHO-2 ports into BP 1-3, with MHO-1 having been staged as the primary orchestrator. Everything else appears operational. Could it be that it's trying to sync its routes as the active SMO to the SGM on BP 2-4, which currently is itself?

EDIT2: Other appliance links that are bonded to both MHOs are unstable and are frequently being physically disconnected. Could this disturb the route syncing process?

emmap

MHO 1 must be plugged into ports 1 and 3 on the line card and MHO 2 into ports 2 and 4. Having this wrong certainly won't help, so I'd suggest fixing it as a priority before spending too much time on the routing issue.

Uplinks going up and down shouldn't affect route syncing, but if an interface is down, its associated routes may not be reflected in the kernel routing table.
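To make the difference between the two SGMs in the fw stat output posted earlier easier to see, here is a minimal sketch that parses the two lines and reports which interfaces one member lacks relative to the other. The two input lines are copied from that output. The function names are my own, and I am assuming the > and < markers simply denote the two traffic directions; the sketch only compares them and does not interpret them.

```python
import re

# The two member lines from the fw stat output posted earlier in this thread.
FW_STAT_LINES = [
    "1_01    x_TEST_Policy        6May2024 12:11:40 : [>Sync] [<Sync] "
    "[>bond30.906] [<bond30.906] [>bond30.903] [<bond30.903] "
    "[>bond10.901] [<bond10.901] [>bond30.905] [<bond30.905] "
    "[>bond40.940] [<bond40.940] [>bond30.902] [<bond30.902] "
    "[>bond30.904] [<bond30.904]",
    "1_02    x_TEST_Policy        6May2024 12:11:27 : [>Sync] [<Sync] "
    "[>bond10.901] [<bond10.901] [>bond30.905] [>bond40.940]",
]


def parse_fw_stat_line(line):
    """Return (member_id, {interface_name: set of direction markers})."""
    member = line.split()[0]
    interfaces = {}
    # Each entry looks like [>name] or [<name].
    for direction, name in re.findall(r"\[([<>])([^\]]+)\]", line):
        interfaces.setdefault(name, set()).add(direction)
    return member, interfaces


def compare_members(lines):
    """For each member, list interfaces it lacks and one-direction entries."""
    parsed = dict(parse_fw_stat_line(line) for line in lines)
    all_interfaces = set().union(*(ifs.keys() for ifs in parsed.values()))
    report = {}
    for member, ifs in parsed.items():
        report[member] = {
            # Interfaces some other member lists but this one does not.
            "missing": sorted(all_interfaces - ifs.keys()),
            # Interfaces listed with only one of the two markers.
            "one_direction_only": sorted(
                name for name, dirs in ifs.items() if dirs != {"<", ">"}
            ),
        }
    return report


if __name__ == "__main__":
    for member, result in compare_members(FW_STAT_LINES).items():
        print(member, result)
```

Run against the lines above, this shows that 1_02 does not list bond30.902, bond30.903, bond30.904, or bond30.906 at all, and lists bond30.905 and bond40.940 with only the > marker, while 1_01 lists every interface with both markers. That lines up with the cabling and link problems being worth fixing first.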