Alexander_Wilke
Advisor

Custom Metrics - Failed to execute script - /bin/bash exceeded the CPU threshold

Hello,

I created a script (or rather, an AI did) to collect the output of "orch_stat -p" from my MHO 140.

 

This is the script:

#!/bin/bash

# =========================
# RX Metrics Script for MHO
# =========================

# Load Maestro environment profiles (required for Maestro scripts)
source /opt/CPshrd-R81.20/tmp/.CPprofile.sh
. /opt/CPotlpAgent/cs_data_handler_is.bash

# Check if this is an MHO system; exit if not
if [[ ! -f /etc/.scalable_platform_mho ]]; then
    script_exit "system is no MHO" 0
fi

# Use process substitution to avoid subshells
while IFS= read -r line; do
    # Read the 12 tab-separated fields into named variables
    IFS=$'\t' read -r Physical_Port Interface_Name Type SG QSFP_Mode Admin_State Link_State Transceiver_State Operating_Speed MTU RX_Frames TX_Frames <<< "$line"

    # Skip lines with missing critical fields (should not happen, but for safety)
    if [[ -z "$Physical_Port" || -z "$RX_Frames" ]]; then
        continue
    fi

    # Set RX_Frames as the metric value
    set_ot_object new value "${RX_Frames}"

    # Set all other columns as labels (explicitly, order is guaranteed)
    set_ot_object last label "Physical_Port"      "${Physical_Port}"
    set_ot_object last label "Interface_Name"     "${Interface_Name}"
    set_ot_object last label "Type"               "${Type}"
    set_ot_object last label "SG"                 "${SG}"
    set_ot_object last label "QSFP_Mode"          "${QSFP_Mode}"
    set_ot_object last label "Admin_State"        "${Admin_State}"
    set_ot_object last label "Link_State"         "${Link_State}"
    set_ot_object last label "Transceiver_State"  "${Transceiver_State}"
    set_ot_object last label "Operating_Speed"    "${Operating_Speed}"
    set_ot_object last label "MTU"                "${MTU}"
done < <(
    orch_stat -p | awk '
    BEGIN {FS="|"}
    /^\+/ { next }                              # Skip separator lines
    /^\|[ ]*Physical Port/ { next }             # Skip header line
    /^\|/ {
        if (NF != 14) next                      # Only process lines with 14 fields (12 data fields)
        row=""
        for(i=2; i<=13; i++) {                  # Extract fields 2 to 13 (data columns)
            f=$i
            gsub(/^[ \t]+|[ \t]+$/, "", f)      # Trim leading/trailing whitespace
            row = row f "\t"
        }
        sub(/\t$/, "", row)                     # Remove trailing tab
        print row
    }
    '
)

# Exit successfully
script_exit "Finished running" 0

 

Running it manually in the shell works. It takes approximately 10 seconds to finish. That is a long time, but I am still testing; maybe it can be improved, maybe not.

However, when I add it with "sklnctl otlp add" and it runs after the service restart (it should run every 60s), I get this error:

[Expert@yyyy-mho1_01:0]# tail -n 1 /opt/CPotlpAgent/otlp_agent.log
ts=2025-06-16T00:07:43.461+02:00 caller=level.go:63 ts=2025-06-16T00:07:43.461+02:00 caller=level.go:63 level=info msg="Collector: /config/skyline_custom_metrics/skyline_custom_orch_stat_p_rxhas disabled due to: " Script:/var/log/CPotlpAgent/backup/scripts/skyline_custom_orch_stat_p_rx.shchangethestatetodisableddueto:TheCommand:/bin/bashexceededtheCPUthreshold=(MISSING)


It looks like a CPU limit is in place. Where can I check it? How can I adjust or disable it? Are there alternatives?

Danny
Champion

AI generated bash scripts? 😅

My brain 🎓 generated this oneliner you might want to check out and modify for your own needs:
Maestro MHO Ports Dump - Sorted & Colored

By looking at this part of your script:

    orch_stat -p | awk '
    BEGIN {FS="|"}
    /^\+/ { next }                              # Skip separator lines
    /^\|[ ]*Physical Port/ { next }             # Skip header line
    /^\|/ {
        if (NF != 14) next                      # Only process lines with 14 fields (12 data fields)
        row=""
        for(i=2; i<=13; i++) {                  # Extract fields 2 to 13 (data columns)
            f=$i
            gsub(/^[ \t]+|[ \t]+$/, "", f)      # Trim leading/trailing whitespace
            row = row f "\t"
        }
        sub(/\t$/, "", row)                     # Remove trailing tab
        print row
    }
    '

I instantly had to shrink and optimize it into this oneliner:

orch_stat -p|awk '/\// {gsub(/\s*\|\s*/, "\t"); gsub(/^\t|\t$/, ""); print}'


As you'll see, it runs slightly faster.
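
For anyone who wants to try this kind of parsing without an MHO at hand, here is a minimal sketch. The sample table is hypothetical (three made-up columns), and real "orch_stat -p" output may differ in columns and spacing. The helper name parse_rows is illustrative only. The sketch uses [ \t] instead of \s so it does not depend on gawk, and gsub so that both the leading and the trailing tab are stripped.

```bash
#!/bin/bash
# Hypothetical excerpt shaped like a pipe-delimited table; the real
# orch_stat -p output has more columns and may be spaced differently.
sample='+-------+---------+------+
| Port  | Name    | RX   |
+-------+---------+------+
| 1/1/1 | eth1-05 | 1234 |
+-------+---------+------+'

# Keep only rows that contain a slash (the port IDs), turn every
# "|" with its surrounding blanks into one tab, then strip the
# leading and trailing tab.
parse_rows() {
    awk '/\// { gsub(/[ \t]*\|[ \t]*/, "\t"); gsub(/^\t|\t$/, ""); print }'
}

parsed=$(printf '%s\n' "$sample" | parse_rows)
printf '%s\n' "$parsed"
```

The separator and header lines are dropped because they contain no slash, so only the data row survives, as three tab-separated fields.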

It's generally good practice to avoid unnecessary subshells, so I suggest continuing with |while read line; do, just as Sven did here.

Alexander_Wilke
Advisor

Hello @Danny 

I think you are on the wrong track. This is the Skyline sub-forum. The idea is to create OpenTelemetry / Prometheus metrics based on the output that "orch_stat -p" provides. I am not talking about running this command manually on the MHO. I want 24/7/365 monitoring of these stats via the OpenTelemetry Agent.

For that reason I need to parse every single line, add every column as a label value, and use RX/TX as the metric value. This generates one metric for RX and one for TX, with the column headers as label names and the per-port values from each line as label values. This is how metrics work and what they should look like.
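
To illustrate the mapping, here is a minimal sketch that renders one parsed row as two Prometheus-style text lines. Everything in it is an assumption for illustration: the sample values are made up, the metric names orch_port_rx_frames and orch_port_tx_frames and the helper render_row are hypothetical, and only three of the ten label columns are emitted to keep it short. It does not use the Skyline helper functions; it only shows the intended shape of the result.

```bash
#!/bin/bash
# Hypothetical row in the 12-column, tab-separated layout used by the
# script above; the last two columns are the RX and TX frame counters.
row=$'1/1/1\teth1-05\tdownlink\t1\t4x10G\tup\tup\tplugged\t10G\t1500\t123456\t654321'

# Render one row as two Prometheus-style lines: the counters become
# the metric values, the other columns become labels.
render_row() {
    local port name type sg qsfp admin link xcvr speed mtu rx tx
    IFS=$'\t' read -r port name type sg qsfp admin link xcvr speed mtu rx tx <<< "$1"
    local labels="Physical_Port=\"${port}\",Interface_Name=\"${name}\",Link_State=\"${link}\""
    printf 'orch_port_rx_frames{%s} %s\n' "$labels" "$rx"
    printf 'orch_port_tx_frames{%s} %s\n' "$labels" "$tx"
}

render_row "$row"
```

Each port therefore yields one RX sample and one TX sample that share the same label set, which is what allows per-port graphs and filters later on.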

 

As for your comments about unnecessary subshells: I am not familiar with that, which is why I used the AI. But according to this thread, your suggestion not to use < <(...) may be wrong:
https://community.checkpoint.com/t5/OpenTelemetry-Skyline/Custom-Metric-Behaves-different-CLI-vs-skl...
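
The difference between the two loop styles can be seen in a minimal bash sketch that does not depend on Skyline at all. With a pipe, the while loop runs in a subshell, so variables set inside it are lost when the loop ends; with < <(...), the loop runs in the current shell. Whether the Skyline helper functions keep their state in shell variables is an assumption here, but if they do, a piped loop would lose it in the same way.

```bash
#!/bin/bash
# Pipeline: the loop body runs in a subshell, so the increments
# are not visible after the loop has finished.
count_pipe=0
printf 'a\nb\nc\n' | while IFS= read -r line; do
    count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell, so the
# variable keeps the value it was given inside the loop.
count_procsub=0
while IFS= read -r line; do
    count_procsub=$((count_procsub + 1))
done < <(printf 'a\nb\nc\n')

echo "pipe: ${count_pipe}, process substitution: ${count_procsub}"
```

Run as a script with bash defaults (lastpipe off), the piped counter stays at 0 while the process substitution counter reaches 3.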

So I do not see how your oneliner helps here. Hopefully I am missing something important, or you are missing the point, which is Skyline custom metrics and not colored shell scripts.

However, maybe you can provide an efficient Skyline custom metrics script that is faster than the one I posted and delivers the same amount of information. That would be valuable for this thread. Otherwise not.

Danny
Champion

According to the updated Skyline Admin Guide, Check Point R82 supports built-in metrics for the Maestro Orchestrator, so you don't need to work on your own custom metrics anymore. Just upgrade to R82, as it's widely recommended for all deployments. Also, a new OpenTelemetry Collector build #192 is available (sk180522).

Sven_Glock
Advisor

I love the new metrics for orchestrators.
It would be nice if Check Point could provide some sample dashboards for the new metrics.
SK178566 provides some nice samples, but those still work with the R81.10 metrics.

@Elad_Chomsky would it be possible to take this inside CP? A separate orchestrator dashboard would be nice.

Thanks in advance!

Regards
Sven
