If you run Quantum Maestro in production, you’ve probably seen the pattern: issues that “look like VPN” or “look like policy” often turn out to be Security Group health, a single divergent SGM, a physical/link problem (cable/port/optics), or an unstable uplink. The key to reducing MTTR is discipline: evidence + commands, without skipping layers.
Below is a practical “copy-and-run” runbook, with good vs bad interpretation.
MHO (Orchestrator): controls the Security Group (inventory, health, ports, fabric).
SGMs: run the dataplane (sessions, inspection, VPN, state).
Typical symptom patterns:
Unhealthy SG → everything becomes a symptom (policy/VPN/traffic).
Unhealthy single SGM → intermittent behavior (“sometimes it works”).
clish: local node context.
Useful for point inspection, but risky for configuration in Maestro because it can introduce drift (one member behaving differently).
gclish: global Security Group context. Commands run here are propagated and validated across all SGMs in the Security Group.
Operational rule:
use gclish when the intent is global consistency (uniform validation/collection/adjustment);
use clish only when you need to inspect/act on a specific member in a controlled way.
A recurring field root cause: a change made with clish on a single member → the SGM starts handling traffic differently → intermittent symptoms that are hard to reproduce.
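One quick way to catch this kind of drift is to compare a single setting across members and check that every member reports the same value. The sketch below is illustrative only: the sample lines stand in for per-member output you might collect with `g_all` plus a show command, and the member names and the `mtu` field are assumptions.

```shell
#!/bin/sh
# Illustrative drift check: do all members agree on one setting?
# The sample stands in for per-member output collected via g_all;
# member names (1_01...) and the "mtu" field are assumptions.
sample='1_01: mtu 1500
1_02: mtu 1500
1_03: mtu 9000'

# Count distinct values for the setting; more than one means drift.
distinct=$(printf '%s\n' "$sample" | awk '{print $3}' | sort -u | wc -l | tr -d ' ')
if [ "$distinct" -gt 1 ]; then
  echo "DRIFT: members disagree on the inspected setting"
  printf '%s\n' "$sample"
fi
```

In a real environment, the divergent member (1_03 in this sample) would be the first candidate to inspect with clish in a controlled way.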
On the MHO:
orch_stat -all
What this proves:
whether all SGMs are present/operational
whether any member is degraded/missing
signals of port/fabric issues
Good: all members OK, stable links, no critical port down.
Bad: missing/degraded member, unstable links → fix the foundation before analyzing VPN/policy.
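As a sanity aid, the member states can be reduced to a one-line verdict before moving on. This is only a sketch: real `orch_stat -all` output is formatted differently, so the sample lines and field positions below are assumptions you would adapt.

```shell
#!/bin/sh
# Illustrative sketch: flag members that are not UP. The sample lines
# are fabricated stand-ins for orchestrator status output; adjust the
# awk fields to match the real format of your version.
sample='1_01 UP
1_02 UP
1_03 DOWN'

down=$(printf '%s\n' "$sample" | awk '$2 != "UP" {print $1}')
echo "Members not UP: ${down:-none}"
```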
asg diag verify
What this proves: high-level SG consistency and quick integrity checks.
Bad: critical alerts → return to orch_stat -all and isolate the failing member/port.
asg perf -v
What this proves: whether the SG has enough headroom (CPU/memory) to absorb load during isolation/actions.
Bad: SG near its limits → avoid disruptive actions.
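This rule is worth encoding as an explicit guardrail in any playbook. A minimal sketch, assuming the CPU figure has already been parsed from `asg perf -v`; both the sample value and the 80% threshold are assumptions, not Check Point guidance.

```shell
#!/bin/sh
# Illustrative guardrail: refuse disruptive actions without headroom.
# cpu_pct would come from parsing `asg perf -v` in practice; the sample
# value and the 80% threshold are assumptions for this sketch.
cpu_pct=86
threshold=80

if [ "$cpu_pct" -ge "$threshold" ]; then
  echo "ABORT: CPU at ${cpu_pct}%, no headroom for disruptive actions"
else
  echo "OK: proceed, but keep watching utilization"
fi
```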
hcp -r all
Note: commonly used in playbooks to recover internal state/handshakes, but it should not be the first “blind” step.
When you see intermittency, “traffic disappears,” or only some users/flows fail, first prove whether there is physical/L1–L2 instability.
orch_stat -p
or
cat /etc/maestro.json
Use this to confirm interface/port mapping in the Maestro context.
g_all netstat -ni
What to look for: increasing RX-ERR/TX-ERR/drops.
If these counters climb, they often explain VPN flapping, broken sessions, and “policy is OK but traffic fails.”
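To see at a glance which interfaces carry errors, the counter columns can be filtered. A sketch only: the sample table mimics `netstat -ni` output, and the column positions ($4 for RX-ERR, $7 for TX-ERR in this layout) must be verified against the real header before use.

```shell
#!/bin/sh
# Illustrative filter for error counters in netstat -ni-style output.
# The sample table is fabricated; verify column positions against the
# real header ($4 = RX-ERR, $7 = TX-ERR in this sample layout).
sample='Iface MTU RX-OK RX-ERR RX-DRP TX-OK TX-ERR
eth1  1500 91823 0      0      88211 0
eth2  1500 45012 317    42     43999 5'

bad=$(printf '%s\n' "$sample" | awk 'NR>1 && ($4+0 > 0 || $7+0 > 0) {print $1}')
echo "Interfaces with errors: ${bad:-none}"
```

Remember the key signal is counters that climb between two samples, so take the reading twice, a few minutes apart.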
ethtool -S <interface_name>
Good: no CRC/errors increasing.
Bad: CRC/symbol errors → treat as L1/L2 (cable/optics/port/switch) before focusing on VPN.
asg_ifconfig | grep carrier | grep -v "carrier: 0"
Bad: carrier oscillation → intermittent behavior is highly likely.
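Oscillation only shows up when you compare two snapshots over time. The sketch below assumes `asg_ifconfig`-style "carrier: N" lines (the exact format is an assumption); a counter that grew between snapshots means the link flapped.

```shell
#!/bin/sh
# Illustrative flap detector: compare two "carrier: N" snapshots taken
# a few minutes apart. The line format mimics asg_ifconfig output and
# is an assumption; a counter that grew means the link flapped.
snap1='eth1 carrier: 0
eth2 carrier: 3'
snap2='eth1 carrier: 0
eth2 carrier: 7'

flapping=$( { printf '%s\n' "$snap1"; printf '%s\n' "$snap2"; } \
  | awk '{ if ($1 in seen) { if ($3+0 > seen[$1]+0) print $1 } else seen[$1]=$3 }')
echo "Flapping interfaces: ${flapping:-none}"
```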
g_all cpstat -f sensors os
What this proves: whether thermal/power/fan sensor readings are healthy; abnormal values can cause instability and erratic behavior.
show maestro port <port>
Confirms the port’s state/configuration in the Maestro domain.
This step quickly separates “problem before the gateway” from “problem inside the gateway.”
Example (intentionally generic IPs):
asg search -v 10.10.40.25 \* 203.0.113.50 443 tcp
Interpretation:
No output: traffic likely is not reaching the SG (or it’s taking a different path). Return to L1/L2/L3 and capture at the correct point.
Output present: traffic exists in the dataplane; you now have a basis to correlate with NAT, routing, policy, and VPN.
If the connection does not “exist” for the SG, changing policy/VPN is usually wasted effort.
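That decision can be written down as an explicit branch so nobody skips it under pressure. The snippet is a sketch: `capture` stands in for the output of the `asg search` call above and is left empty here to simulate the "no output" case.

```shell
#!/bin/sh
# Illustrative triage branch on dataplane evidence. `capture` stands in
# for the output of the asg search command; it is empty here to
# simulate the "no output" case.
capture=''

if [ -z "$capture" ]; then
  verdict="recheck-path"
  echo "No output: re-check L1/L2/L3 and capture closer to the source"
else
  verdict="correlate"
  echo "Connection seen: correlate with NAT, routing, policy and VPN"
fi
```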
Typical symptom: intermittent failures, “some flows drop,” “works after some time.”
On the suspected SGM:
clusterXL_admin down
clusterXL_admin up
Risk: medium (sessions anchored to that member can be impacted).
Pre-condition: confirm headroom with asg perf -v.
cphaprob list
tail $FWDIR/log/blade_config
What to look for:
cphaprob list: HA/cluster participation/state signals and inconsistencies
blade_config: alerts and errors that indicate configuration drift
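When scanning the log, it helps to surface only warnings and errors. A sketch, assuming simple `ERROR`/`WARN` line prefixes; the sample lines are fabricated and the real blade_config format may differ, so adapt the pattern accordingly.

```shell
#!/bin/sh
# Illustrative log triage: keep only WARN/ERROR lines. The sample
# stands in for `tail $FWDIR/log/blade_config`; prefixes and messages
# are assumptions, so adapt the pattern to the real log format.
sample='INFO  blade sync ok
ERROR policy fingerprint mismatch on member 1_03
WARN  retrying handshake'

issues=$(printf '%s\n' "$sample" | grep -E '^(ERROR|WARN)')
printf '%s\n' "$issues"
```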
Maestro troubleshooting requires discipline: start with SG health, then prove traffic exists, then validate physical stability, and only then go deeper. If you follow this sequence with objective commands, “phantom incidents” drop sharply—and troubleshooting becomes engineering, not guesswork.
Very good, very useful. Thanks, Israel.
Throughout my career I’ve learned to start with the fundamentals, because 90% of problems are solved there. In Maestro’s case, it’s no different — most of the issues I’ve resolved happened because the analyst didn’t know how to differentiate between clish and gclish, which ended up causing misconfigurations.
Awesome work. By the way, I could not agree more with what you said. I can't even count how many times I've been on calls where, in the end, it turned out to be something so simple that solved the issue.
I’ve already seen troubleshooting cases that lasted for days turn out to be just a simple VLAN issue. Usually, people miss the basics, focus on the more complex aspects, and forget to check the fundamentals.
Good and useful guideline in general. I'd just like to point out that a few of your commands (asg diag, asg perf, asg search) no longer exist in R82 and have been moved to insights or cluster-cli.
Check the release notes for more changes regarding Maestro.
Thank you very much for the tip. I still haven’t had the opportunity to work with R82 on Maestro, so I’ll take a look. It will be good for me to understand the differences between them and, perhaps, even update the title of this topic to R81.20, since it may indeed be obsolete in R82.
very good content with practical examples.
This guide seems to conflate the MHOs and the SMO. The SMO is not an orchestrator, it's an SGM.
I don't think there's a file called /etc/maestro.json. For a port inventory at the MHO you would use orch_stat -p, or the MHO WebUI in R82+.
That 'last resort' of just deleting the security group with no follow up is terrible advice. What's going on there, you're just going to remove the group entirely and give up? Please review this and make sure you're not suggesting steps that will cause massive problems. There are many other things that can be attempted in a troubleshooting context before going nuclear here.
@emmap the /etc/maestro.json file is mentioned in sk164712.
About removing a security group, I agree that would be a very bad move in a production environment.
My understanding is, @WiliRGasparetto is writing this based on his lab trials.
Yep, you're right that is a file, my mistake.
I’m going to remove that step, and I’ll look for better approaches. I included it only as a last-resort option when there was truly no solution and in coordination with Check Point TAC, but presenting it as a standard solution was a bad idea. Thank you very much for the feedback.
I also added the command `orch_stat -p` as the first option and then the verification with `cat /etc/maestro.json`. I found your point very helpful, @emmap .
Nice, thanks!