gemechis
Contributor

SGM in DOWN state

Hello,

We are facing an issue with one of our Maestro Security Gateway Members (SGM) that is unable to properly join the Security Group and remains in DOWN state.

Issue Summary:
The affected SGM fails during cluster initialization and is unable to retrieve the cluster state information from the Maestro Security Group. As a result, the gateway cannot fetch the Security Policy and remains unavailable.

Observed Errors:

Fetching Security Policy from localhost failed
Error: Failed to retrieve cluster state
Waiting for cluster to start...

Additional logs observed:

Failed to initialize dxl configuration
SecureXL disabled, cannot use affinity commands

We also observed continuous internal communication drops on UDP port 8116:

Packet proto=17 0.1.0.x:8116 -> 192.0.2.0:8116 dropped Reason: Rulebase drop - DEFAULT POLICY
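
For anyone who wants to tally these drops from a saved capture, here is a minimal Python sketch. It assumes every drop line has the same shape as the single message quoted above; the regex and the function name `summarize_drops` are illustrative and not part of any Check Point tool.

```python
import re
from collections import Counter

# Assumed line shape, taken only from the single drop message quoted above.
DROP_RE = re.compile(
    r"proto=(?P<proto>\d+)\s+"
    r"(?P<src>\S+):(?P<sport>\d+)\s+->\s+"
    r"(?P<dst>\S+):(?P<dport>\d+)\s+dropped\s+"
    r"Reason:\s*(?P<reason>.+)$"
)

def summarize_drops(lines, port=8116):
    """Count drops on the given destination port, grouped by (src, dst, reason)."""
    counts = Counter()
    for line in lines:
        m = DROP_RE.search(line)
        if m and int(m.group("dport")) == port:
            key = (m.group("src"), m.group("dst"), m.group("reason").strip())
            counts[key] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "Packet proto=17 0.1.0.x:8116 -> 192.0.2.0:8116 dropped "
        "Reason: Rulebase drop - DEFAULT POLICY",
    ]
    for (src, dst, reason), n in summarize_drops(sample).items():
        print(f"{n} drop(s) {src} -> {dst}: {reason}")
```

Grouping by source, destination and reason makes it easier to see whether all port 8116 drops come from the same peer and the same policy.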

Analysis:

  • Internal Maestro/cluster communication traffic appears to be dropped by the firewall policy.
  • The SGM is unable to complete cluster initialization.
  • Cluster state retrieval fails repeatedly.
  • Security Policy installation cannot complete successfully.

Current Behavior:

  • Gateway remains in DOWN state
  • SGM continuously waits for cluster startup
  • Policy fetch from localhost fails
  • Internal communication traffic on port 8116 is being dropped

Background Information:

  • The device was recently reformatted/reinitialized.
  • Jumbo Hotfix installation was performed afterward.
  • The issue occurs during or after the SGM joining process.

Initial Troubleshooting Performed:

  • Verified interface connectivity
  • Checked orchestrator connections and ports
  • Attempted policy installation multiple times
  • Confirmed other SGMs in the Security Group are operational

We would appreciate your assistance in identifying the root cause and providing recovery recommendations.

6 Replies
Gennady
Contributor

Good day!

On the problematic SGM, try deleting the $CPDIR/conf/cp.license file and rebooting.
You will need to catch the moment when the bad message "waiting for cluster to start" appears and log in from the MHO to the SGM. The good message is "waiting for cluster to stabilize".

This happens almost every time for me in a Lab.

gemechis
Contributor

@Gennady, but have you seen the attached images? They mostly show that the SGM has the default policy loaded, which is dropping the SYNC traffic from the Active SGM.

Gennady
Contributor

As far as I understand, the main problem is that the new SGM hits some sort of glitch when it tries to merge the cp.license and cp.license.smo files. This leaves it without a valid license, which prevents it from loading the Initial Policy that would allow CCP communication. Instead, the Default Filter is loaded, which drops everything. For the same reason, neither ClusterXL nor SecureXL starts.

Try deleting the cp.license file. This is harmless because the file will be re-created after the next boot. If this doesn't help, you will need to open a TAC case.

You can also check /var/log/merge_license_file.log for errors, specifically for the situation described in this SK:
sk183985 - Security Group Member in a Dual Site Maestro deployment remains down after a reboot
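
To skim /var/log/merge_license_file.log for problem lines, a minimal Python sketch could look like the one below. The keyword list is an assumption, since the log's actual wording is not shown in this thread, and `find_suspect_lines` is just an illustrative name; adjust both to what the file really contains.

```python
from pathlib import Path

# Assumed keywords; the real log wording is not shown in this thread.
KEYWORDS = ("error", "fail")

def find_suspect_lines(path, keywords=KEYWORDS):
    """Return (line_number, text) for lines containing any keyword, ignoring case."""
    hits = []
    with Path(path).open(errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            lowered = line.lower()
            if any(k in lowered for k in keywords):
                hits.append((lineno, line.rstrip("\n")))
    return hits

if __name__ == "__main__":
    log = Path("/var/log/merge_license_file.log")
    if log.exists():
        for lineno, text in find_suspect_lines(log):
            print(f"{lineno}: {text}")
    else:
        print(f"{log} not found")
```

It only reads the file, so it can be run against a copy of the log taken off the SGM.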

emmap
MVP Gold CHKP

Definitely check for a license as Gennady says; that's a pretty common reason for new SGMs not joining.

Generally you shouldn't install a JHF before joining the SGM to the group; let the group auto-clone it onto the SGM instead. Installing it beforehand is not the QA'd method and thus isn't the supported way.

When you say that it's in the DOWN state, it sounds like it has joined the group and is then throwing a pnote? Which pnote is it throwing to be down?

gemechis
Contributor

Below is the screenshot of cphaprob state and cphaprob list.

emmap
MVP Gold CHKP

After you've checked the license, have a look through the $FWDIR/log/blade_config log file for sync issues.
