HansKazan
Contributor

Maestro BGP - Route Sync Question/Problem

Hello CheckMates!

 

Thank you once again for taking the time to entertain my question. I am currently having trouble getting the other SGMs to learn the routes from the SMO-Master, and I would like to verify whether this is normal behavior.

In the output of netstat -rn, I can verify that the SMO-Master has received all BGP routes. However, the other SGMs only show their directly connected and static routes. Is this normal behavior? When testing with traceroute, we can see that every connection handled by an SGM other than the SMO-Master is sent towards the default route, whereas connections handled by the SMO-Master follow the routing table learned via BGP.
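The comparison described above can be made less error-prone by saving the netstat -rn output from each SGM and diffing it with a small script. The following is a minimal sketch, assuming the usual Linux eight-column netstat -rn layout. The function names are hypothetical, and the sample routes use documentation addresses rather than anything from this environment.

```python
def parse_routes(netstat_output):
    """Return the set of (destination, genmask, gateway) tuples found in `netstat -rn` text."""
    routes = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns and begin with an IPv4 address; header rows do not.
        if len(fields) == 8 and fields[0][0].isdigit():
            destination, gateway, genmask = fields[0], fields[1], fields[2]
            routes.add((destination, genmask, gateway))
    return routes


def missing_routes(master_output, member_output):
    """Routes present on the SMO-Master but absent on another SGM, sorted."""
    return sorted(parse_routes(master_output) - parse_routes(member_output))


# Hypothetical sample data, for illustration only.
MASTER = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG        0 0          0 bond10.901
192.0.2.0       0.0.0.0         255.255.255.0   U         0 0          0 bond10.901
198.51.100.0    192.0.2.254     255.255.255.0   UG        0 0          0 bond10.901
"""

MEMBER = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG        0 0          0 bond10.901
192.0.2.0       0.0.0.0         255.255.255.0   U         0 0          0 bond10.901
"""

if __name__ == "__main__":
    for route in missing_routes(MASTER, MEMBER):
        print("missing on member:", route)
```

Any route printed here exists in the SMO-Master's kernel table but not in the other SGM's, which is the symptom described in this post.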

I have rebooted the SGMs and taken them out of the SG and back in, but the issue persists. The GAIA configuration includes ip-reachability-detection and local-address, in case this is an obscure issue. The SD-WAN connection uses MD5 authentication.

example:
set bgp external remote-as 65300 peer 10.38.243.73 on
set bgp external remote-as 65300 peer 10.38.243.73 local-address 10.38.243.74 on
set bgp external remote-as 65300 peer 10.38.243.73 ip-reachability-detection on
set bgp external remote-as 65300 peer 10.38.243.81 on
set bgp external remote-as 65300 peer 10.38.243.81 local-address 10.38.243.82 on
set bgp external remote-as 65300 peer 10.38.243.81 ip-reachability-detection on

I hope this is sufficient information to send me in the right direction before opening a support ticket.

Thank you for your input!

 

emmap
Employee

Any routes learned via BGP should be sent to all the other SGMs, so this isn't normal behaviour.

That said, the routing admin guide states that if you are running BGP in a cluster, you must not configure the local address. So that might be breaking something.
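To check a saved configuration for that setting, something like the following could be used. It is only a sketch: the function name is made up, and it assumes the config lines have the same "set bgp ... peer <ip> local-address <ip> on" shape as the example in the original post.

```python
def peers_with_local_address(config_text):
    """Return BGP peer addresses whose config lines set a local-address, in order of appearance."""
    peers = []
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens[:2] == ["set", "bgp"] and "peer" in tokens and "local-address" in tokens:
            peer = tokens[tokens.index("peer") + 1]
            if peer not in peers:
                peers.append(peer)
    return peers


# Configuration lines as posted at the top of this thread.
CONFIG = """\
set bgp external remote-as 65300 peer 10.38.243.73 on
set bgp external remote-as 65300 peer 10.38.243.73 local-address 10.38.243.74 on
set bgp external remote-as 65300 peer 10.38.243.73 ip-reachability-detection on
set bgp external remote-as 65300 peer 10.38.243.81 on
set bgp external remote-as 65300 peer 10.38.243.81 local-address 10.38.243.82 on
set bgp external remote-as 65300 peer 10.38.243.81 ip-reachability-detection on
"""

if __name__ == "__main__":
    print(peers_with_local_address(CONFIG))
```

For the posted example this lists both peers, 10.38.243.73 and 10.38.243.81, as having a local address configured.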

 

HansKazan
Contributor

Thank you for your response. Unfortunately, this did not resolve the issue.

Perhaps the following output is of some use?

[Global] x-FW-IT-ch01-02> fw stat
Fetching data from blades...
formatting data...

HOST    POLICY          DATE
1_01    x_TEST_Policy        6May2024 12:11:40  :  [>Sync] [<Sync] [>bond30.906] [<bond30.906] [>bond30.903] [<bond30.903] [>bond10.901] [<bond10.901] [>bond30.905] [<bond30.905] [>bond40.940] [<bond40.940] [>bond30.902] [<bond30.902] [>bond30.904] [<bond30.904]
1_02    x_TEST_Policy        6May2024 12:11:27  :  [>Sync] [<Sync] [>bond10.901] [<bond10.901] [>bond30.905] [>bond40.940]


I lack the understanding to know what this means.
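One way to make the difference between the two SGMs easier to see is to compare the bracketed interface tokens line by line. The sketch below does only that. Reading ">" as inbound and "<" as outbound is an assumption here, and the function names are hypothetical. The two input strings are the interface lists from the fw stat output above.

```python
import re

# Matches tokens such as [>bond30.906] or [<Sync]: a direction marker plus an interface name.
TOKEN = re.compile(r"\[([<>])([^\]]+)\]")


def interfaces_by_direction(fw_stat_line):
    """Group the interface names in one fw stat line by their direction marker."""
    found = {">": set(), "<": set()}
    for direction, name in TOKEN.findall(fw_stat_line):
        found[direction].add(name)
    return found


def missing_on_member(reference_line, member_line):
    """Interfaces listed for the reference SGM but not for the other SGM, per direction."""
    ref = interfaces_by_direction(reference_line)
    mem = interfaces_by_direction(member_line)
    return {d: sorted(ref[d] - mem[d]) for d in (">", "<")}


SGM_1_01 = (
    "[>Sync] [<Sync] [>bond30.906] [<bond30.906] [>bond30.903] [<bond30.903] "
    "[>bond10.901] [<bond10.901] [>bond30.905] [<bond30.905] [>bond40.940] "
    "[<bond40.940] [>bond30.902] [<bond30.902] [>bond30.904] [<bond30.904]"
)
SGM_1_02 = "[>Sync] [<Sync] [>bond10.901] [<bond10.901] [>bond30.905] [>bond40.940]"

if __name__ == "__main__":
    for direction, names in missing_on_member(SGM_1_01, SGM_1_02).items():
        print(direction, names)
```

For the output above, 1_02 lacks bond30.902, bond30.903, bond30.904 and bond30.906 in the ">" direction, and additionally bond30.905 and bond40.940 in the "<" direction.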

Thank you!

 

EDIT: I have just noticed that the racking team has slotted the MHO-1 ports into BP 2-4 and the MHO-2 ports into BP 1-3, with MHO-1 having been staged as the primary orchestrator. Everything else appears operational. Could it be that it's trying to sync its routes, as the active SMO, to the SGM on BP 2-4, which is currently itself?

 

EDIT2: Other appliance links that are bonded to both MHOs are unstable and are frequently being physically disconnected. Could this disturb the route syncing process?

emmap
Employee

MHO 1 must be plugged into ports 1 and 3 on the line card and MHO 2 into ports 2 and 4. This being wrong certainly won't be helping things, so I'd suggest fixing that as a priority before spending too much time on this.

Uplinks going up and down shouldn't affect the route sync'ing but if an interface is down its associated routes may not be reflected in the kernel routing table. 

HansKazan
Contributor

Everything is cabled properly now, and the entire network works. However, when I add another SGM to the cluster, it does not receive the learned BGP routes from the SMO Master.

I have tried restarting BGP, but this did not resolve the issue. I left the SGM in the cluster in the active state for over 20 minutes before removing it because of the network impact.

This is the configuration output with some redactions:

> show configuration bgp
set bgp graceful-restart restart-time 360
set bgp graceful-restart selection-deferral-time 360
set bgp external remote-as 65509 on
set bgp external remote-as 65509 peer 10.38.243.65 on
set bgp external remote-as 65509 peer 10.38.243.65 authtype md5 secret already_scrambled_
set bgp external remote-as 65509 peer 10.38.243.65

This is Check Point CPinfo Build 914000239 for GAIA
[IDA]
No hotfixes..
[CPFC]
No hotfixes..
[FW1]
HOTFIX_GOT_TPCONF_AUTOUPDATE
HOTFIX_R81_20_JUMBO_HF_MAIN Take: 53
HOTFIX_R80_40_MAAS_TUNNEL_AUTOUPDATE

FW1 build number:
This is Check Point's software version R81.20 - Build 024
kernel: R81.20 - Build 032
[SecurePlatform]
HOTFIX_R81_20_JUMBO_HF_MAIN Take: 53
HOTFIX_ENDER_V17_AUTOUPDATE
[SMO]
HOTFIX_R81_20_JUMBO_HF_MAIN Take: 53
[CPinfo]
No hotfixes..
[PPACK]
HOTFIX_R81_20_JUMBO_HF_MAIN Take: 53
[AutoUpdater]
No hotfixes..
[DIAG]
No hotfixes..
[CVPN]
HOTFIX_R81_20_JUMBO_HF_MAIN Take: 53
HOTFIX_ESOD_CSHELL_AUTOUPDATE
[cpsdc_wrapper]
HOTFIX_CPSDC_AUTOUPDATE
[CPUpdates]
BUNDLE_CPOTELCOL_AUTOUPDATE Take: 77
BUNDLE_INFRA_AUTOUPDATE Take: 65
BUNDLE_R81_20_JUMBO_HF_MAIN Take: 53
BUNDLE_CPSDC_AUTOUPDATE Take: 34
BUNDLE_DEP_INSTALLER_AUTOUPDATE Take: 27
BUNDLE_HCP_AUTOUPDATE Take: 70
BUNDLE_CORE_FILE_UPLOADER_AUTOUPDATE Take: 21
BUNDLE_R80_40_MAAS_TUNNEL_AUTOUPDATE Take: 60
BUNDLE_ENDER_V17_AUTOUPDATE Take: 26
BUNDLE_GENERAL_AUTOUPDATE Take: 18
BUNDLE_ESOD_CSHELL_AUTOUPDATE Take: 20
BUNDLE_GOT_TPCONF_AUTOUPDATE Take: 128
BUNDLE_CPVIEWEXPORTER_AUTOUPDATE Take: 34
[CPotelcol]
HOTFIX_OTLP_GA
[CPviewExporter]
HOTFIX_OTLP_GA
[hcp_wrapper]
HOTFIX_HCP_AUTOUPDATE
[core_uploader]
HOTFIX_CHARON_HF
[CPDepInst]
No hotfixes..

emmap
Employee

What's your route redistribution configuration? Do you have a local AS and router-id configured?

HansKazan
Contributor
 

No router ID has been configured, so it is using the default management IP assigned by the SG. See the BGP redistribution settings below.

bgp-redistro.jpg

bgp-settings.jpg

 

EDIT: This still appears to be an issue even after installing the policy from SmartConsole.

+------------------------------------------------------------------------------+
|Summary                                                                       |
+------------------------------------------------------------------------------+
|Policy Verification completed with the following errors:                      |
|1. [1_01:0]: Policy signature doesn't match on all SGMs                       |
|2. [1_02:0]: Policy signature doesn't match on all SGMs                       |
|                                                                              |
+------------------------------------------------------------------------------+

 

[Expert@ch01-01:0]# asg policy verify -a

+----------------------------------------------------------------------+
|Policy Verification                                                   |
+-------+-------------------+---------------+-----------------+--------+
|SGM    |Policy Name        |Policy Date    |Policy Signature |Status  |
+-------+-------------------+---------------+-----------------+--------+
|1_01   |TEST_Policy        |29May24 11:26  |26176e256        |Failed  |
|1_02   |TEST_Policy        |29May24 11:26  |a561d4fa0        |Failed  |
+-------+-------------------+---------------+-----------------+--------+
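For anyone who wants to check this kind of output automatically, here is a small sketch that parses the table and reports whether all SGMs share one policy signature. It assumes the five-column layout shown above. The function names are hypothetical and not part of any Check Point tooling.

```python
COLUMNS = ("sgm", "policy", "date", "signature", "status")


def parse_policy_rows(table_text):
    """Extract the data rows from an `asg policy verify -a` style table."""
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Keep only 5-column rows that are not the header row.
        if len(cells) == len(COLUMNS) and cells[0] != "SGM":
            rows.append(dict(zip(COLUMNS, cells)))
    return rows


def signatures_match(rows):
    """True when every SGM reports the same policy signature."""
    return len({row["signature"] for row in rows}) <= 1


# Table content as posted in this thread.
TABLE = """\
+----------------------------------------------------------------------+
|Policy Verification |
+-------+-------------------+---------------+-----------------+--------+
|SGM |Policy Name |Policy Date |Policy Signature |Status |
+-------+-------------------+---------------+-----------------+--------+
|1_01 |TEST_Policy |29May24 11:26 |26176e256 |Failed |
|1_02 |TEST_Policy |29May24 11:26 |a561d4fa0 |Failed |
+-------+-------------------+---------------+-----------------+--------+
"""

if __name__ == "__main__":
    rows = parse_policy_rows(TABLE)
    for row in rows:
        print(row["sgm"], row["signature"], row["status"])
    print("signatures match:", signatures_match(rows))
```

Here the two SGMs report the same policy name and date but different signatures, so the check returns False.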

 

I have also tried installing the policy without policy acceleration, without success.

 

emmap
Employee

Seems like this is a sync issue more than anything else. Maybe have a look through $FWDIR/log/blade_config on both SGMs and see if you can find anything useful. It's a big file, so look for timestamps around the times of the policy installs.

HansKazan
Contributor

Thank you for all your help. I have resolved the problem. Another administrator had turned off SMO auto-cloning. After enabling auto-cloning, all SGMs work perfectly fine upon joining the cluster.

emmap
Employee

It should work with auto-clone disabled, but it may be that, with auto-clone on, the second SGM also grabbed patches or something else that resolved the issue. For the record, we typically don't recommend leaving auto-clone enabled once SGMs are introduced into the group and working.

HansKazan
Contributor

Thank you for your follow-up recommendation!
