Hello,
I am running R81.20 + JumboHFA Take 99. I use the Skyline packages:
BUNDLE_CPVIEWEXPORTER_AUTOUPDATE Take: 67
BUNDLE_CPOTLPAGENT_AUTOUPDATE Take: 92
BUNDLE_CPOTELCOL_AUTOUPDATE Take: 179
I created my script based on the following documentation, which is not entirely complete or correct:
https://sc1.checkpoint.com/documents/Appliances/Skyline/Content/Topics-AG/Custom-Metrics.htm#:~:text....
Idea:
There are some metrics that show the cluster_xl status, but in a Maestro environment they only show the status of the individual SGMs. Since all SGMs are generally "ACTIVE", there is no indicator/metric that shows which chassis is the ACTIVE one, or whether a chassis is ACTIVE at all, down, admin down, standby, etc.
So I wrote a simple bash script that collects this information from "asg stat -v" and exports it as OTLP metrics. Below are my script, my commands, and the notes that helped me understand what to do.
### create the script which collects the information we need:
vi /config/skyline_custom_metrics/chassis_state.sh
____________________________________________________________________________________________________________________________________________________
#!/bin/bash
### include Check Point environment variables
source /opt/CPshrd-R81.20/tmp/.CPprofile.sh
### include the Skyline OTLP agent helper functions (set_ot_object and script_exit used below)
. /opt/CPotlpAgent/cs_data_handler_is.bash
## script part to check the values
# run "asg stat -v" only once and keep the line below the "SGM ID" header
sgm_state_line=$(asg stat -v | grep "SGM ID" -A1 | grep -v "SGM ID")
# status of chassis 1 (2nd column)
chassis_1=$(echo "$sgm_state_line" | awk '{print $2}')
# status of chassis 2 (3rd column)
chassis_2=$(echo "$sgm_state_line" | awk '{print $3}')
# reset variable "metric_value"
metric_value=0
# check if both chassis are ACTIVE (split brain) and set metric_value
if [[ "$chassis_1" == *ACTIVE* && "$chassis_2" == *ACTIVE* ]]; then
metric_value=3
# check chassis_1
elif [[ "$chassis_1" == *ACTIVE* ]]; then
metric_value=1
# check chassis_2
elif [[ "$chassis_2" == *ACTIVE* ]]; then
metric_value=2
# no chassis is ACTIVE: set 0
else
metric_value=0
fi
## build the OTLP metric with its labels and values
# value of the metric itself; if the metric is not a Gauge or Counter, set it to 1 or another fixed value
set_ot_object new value "${metric_value}"
# add "host_name" as a label if it is needed for identification (commented out here)
# set_ot_object last label host_name ${host_name}
# label "chassis_1" carries the state string of chassis 1 read above
set_ot_object last label chassis_1 "${chassis_1}"
# label "chassis_2" carries the state string of chassis 2 read above
set_ot_object last label chassis_2 "${chassis_2}"
### mandatory: end the script with script_exit
script_exit "Finished running" 0
____________________________________________________________________________________________________________________________________________________
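If you want to check the state mapping without a Maestro system at hand, the if/elif chain from chassis_state.sh can be moved into a function and tested with plain strings. This is a minimal sketch; the function name chassis_metric_value is hypothetical and not part of Skyline or the Check Point tools.

```shell
# Hypothetical helper: same decision logic as the if/elif chain in chassis_state.sh.
# Prints 3 = both chassis ACTIVE (split brain), 1 = only chassis 1 ACTIVE,
#        2 = only chassis 2 ACTIVE, 0 = no chassis ACTIVE
chassis_metric_value() {
    local c1="$1" c2="$2"
    if [[ "$c1" == *ACTIVE* && "$c2" == *ACTIVE* ]]; then
        echo 3
    elif [[ "$c1" == *ACTIVE* ]]; then
        echo 1
    elif [[ "$c2" == *ACTIVE* ]]; then
        echo 2
    else
        echo 0
    fi
}
```

Inside the script, the if/elif block could then become metric_value=$(chassis_metric_value "$chassis_1" "$chassis_2"). Note that *ACTIVE* is a substring match, so any state string that contains ACTIVE counts as active.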
### create the JSON file that describes this script:
vi /config/skyline_custom_metrics/chassis_state.json
### the documentation does not mention the "secured" parameter and does not explain what it means, but sklnctl complains if it is missing
{
"state" : "enabled",
"command" : "/config/skyline_custom_metrics/chassis_state.sh",
"desc" : "Chassis status in Maestro",
"name" : "chassis.state",
"type" : "Gauge",
"unit" : "{bool}",
"interval" : 15,
"secured" : "false"
}
____________________________________________________________________________________________________________________________________________________
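Because sklnctl complains about a missing "secured" key, a quick pre-flight check of the JSON file can save a round trip. The sketch below uses only grep and sed and does not fully parse JSON. The key list is taken from the chassis_state.json example in this post, not from official documentation, and check_metric_json is a hypothetical helper name.

```shell
# Hypothetical helper: checks a Skyline custom metric JSON file before registering it.
# Assumption: the required keys are the ones used in chassis_state.json in this post.
check_metric_json() {
    local json_file="$1" key cmd missing=0
    if [[ ! -r "$json_file" ]]; then
        echo "cannot read $json_file" >&2
        return 2
    fi
    for key in state command desc name type unit interval secured; do
        # look for "key" followed by optional whitespace and a colon
        if ! grep -Eq "\"${key}\"[[:space:]]*:" "$json_file"; then
            echo "missing key: $key" >&2
            missing=1
        fi
    done
    # the script referenced by "command" must exist and be executable
    cmd=$(sed -n 's/.*"command"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$json_file")
    if [[ -n "$cmd" && ! -x "$cmd" ]]; then
        echo "command not executable: $cmd" >&2
        missing=1
    fi
    return "$missing"
}
```

Example: check_metric_json /config/skyline_custom_metrics/chassis_state.json prints nothing and returns 0 when all keys are present.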
### as a test, make the script executable and run it manually; it should return JSON output that tells you whether your script worked:
chmod 775 /config/skyline_custom_metrics/*
/config/skyline_custom_metrics/chassis_state.sh
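Before running the script by hand, a few cheap checks catch common mistakes: missing execute permission, Windows line endings (CRLF) from editing on a PC, syntax errors, and a forgotten script_exit. bash -n parses the script without executing it. A minimal sketch; precheck_metric_script is a hypothetical name.

```shell
# Hypothetical helper: static checks on a custom metric script (nothing is executed).
precheck_metric_script() {
    local script="$1"
    if [[ ! -x "$script" ]]; then
        echo "not executable: $script" >&2
        return 1
    fi
    # CR characters usually mean the file was saved with Windows line endings
    if grep -q $'\r' "$script"; then
        echo "CRLF line endings in: $script" >&2
        return 1
    fi
    # parse only, do not run
    if ! bash -n "$script"; then
        echo "syntax error in: $script" >&2
        return 1
    fi
    # the scripts in this post must end via script_exit
    if ! grep -q '^[[:space:]]*script_exit' "$script"; then
        echo "no script_exit call found in: $script" >&2
        return 1
    fi
    return 0
}
```

Example: precheck_metric_script /config/skyline_custom_metrics/chassis_state.sh && /config/skyline_custom_metrics/chassis_state.sh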
### copy the script and the JSON file to all members of the Maestro cluster
asg_cp2blades /config/skyline_custom_metrics/ -r
## --name is the script name, not the filename; the documentation is wrong here
## --path is the path to the JSON file, not to the shell script (the JSON file contains the path to the shell script); the documentation is wrong here as well
## "secured" must be added to the JSON file; it is missing from the documentation
## piping "yes" answers the confirmation prompt automatically
gexec -b all -c 'yes | sklnctl otlp add --name /config/skyline_custom_metrics/chassis_state --path /config/skyline_custom_metrics/chassis_state.json'
## the script must be enabled first; this step is missing from the documentation
## --name is the name you defined earlier
## "script" is the object type; this command can also enable and disable "collectors"
gexec -b all -c 'sklnctl otlp enable --name chassis_state script'
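If you add more custom scripts later, the two sklnctl steps can be wrapped in a small function. This is a sketch under assumptions: it simply mirrors the naming used in the commands above (--name with the full path minus ".sh" for add, the basename for enable) and assumes gexec is callable from the shell where you define the function. register_custom_metric is a hypothetical name.

```shell
# Hypothetical helper: registers and enables a custom metric script on all SGMs.
# Argument: path without extension, e.g. /config/skyline_custom_metrics/chassis_state
register_custom_metric() {
    local base="$1" name
    name=$(basename "$base")            # e.g. chassis_state
    if [[ ! -f "${base}.sh" || ! -f "${base}.json" ]]; then
        echo "expected ${base}.sh and ${base}.json" >&2
        return 1
    fi
    # same commands as above: add (auto-confirmed), then enable as type "script"
    gexec -b all -c "yes | sklnctl otlp add --name ${base} --path ${base}.json" || return 1
    gexec -b all -c "sklnctl otlp enable --name ${name} script"
}
```

Example: register_custom_metric /config/skyline_custom_metrics/chassis_state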
### restart the otlp and otelcol processes and wait for the metrics.
g_all /opt/CPotlpAgent/CPotlpagentCli.sh stop; sleep 2; g_all /opt/CPotlpAgent/CPotlpagentCli.sh start
g_all /opt/CPotelcol/CPotelcolCli.sh stop; sleep 2; g_all /opt/CPotelcol/CPotelcolCli.sh start
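The same stop/start sequence is needed after every configuration change, so it can be kept in one function. A minimal sketch, assuming g_all is callable from the shell where the function is defined; restart_skyline_agents and RESTART_DELAY are hypothetical names, and the default delay of 2 seconds matches the commands above.

```shell
# Hypothetical helper: restarts the OTLP agent and otelcol on all members via g_all.
# RESTART_DELAY (seconds between stop and start) defaults to 2.
restart_skyline_agents() {
    local delay="${RESTART_DELAY:-2}" cli
    for cli in /opt/CPotlpAgent/CPotlpagentCli.sh /opt/CPotelcol/CPotelcolCli.sh; do
        g_all "$cli" stop
        sleep "$delay"
        g_all "$cli" start
    done
}
```

Usage: restart_skyline_agents, or RESTART_DELAY=5 restart_skyline_agents for a longer pause.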
### to monitor additional processes, add them with the following commands
## add additional system processes to the monitoring
## to check whether a process is monitored, look at the metric "process_cpu_usage"
## I added pepd, pdpd and rsyslogd; I used "ps -ef | sort" to list the processes on the system
gexec -b all -c 'sklnctl otlp process --add pepd,pdpd,rsyslogd'
### shows the list of all monitored processes
sklnctl otlp process --show
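Before passing names to "sklnctl otlp process --add", it can help to confirm they are actually running, to catch typos in process names. A minimal sketch that reads /proc/<pid>/comm directly (Linux only, no extra tools); check_processes_running is a hypothetical name, and it checks only the local member.

```shell
# Hypothetical helper: takes a comma-separated list like "pepd,pdpd,rsyslogd".
# Note: /proc/<pid>/comm holds at most 15 characters of the process name.
check_processes_running() {
    local list="$1" name missing=0
    local -a names
    IFS=',' read -r -a names <<< "$list"
    for name in "${names[@]}"; do
        # exact, fixed-string match against every process name on the system
        if grep -qxF -- "$name" /proc/[0-9]*/comm 2>/dev/null; then
            echo "running: $name"
        else
            echo "not running: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Example: check_processes_running pepd,pdpd,rsyslogd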
### restart the otlp and otelcol processes
g_all /opt/CPotlpAgent/CPotlpagentCli.sh stop; sleep 2; g_all /opt/CPotlpAgent/CPotlpagentCli.sh start
g_all /opt/CPotelcol/CPotelcolCli.sh stop; sleep 2; g_all /opt/CPotelcol/CPotelcolCli.sh start