Dorian
Participant

Skyline - Interface Utilization Alerts

Hi team, I am trying to set up an alert for our interface utilization in Skyline; one example would be utilization >50%. Below you can see how we did this on our previous Grafana instance (running with InfluxDB & Telegraf). First, we had to distinguish between 1G and 10G interfaces, then create two alerting conditions to cover the different types of NICs. Not pretty, but it works for our almost 1800 interfaces.

[Screenshot: Dorian_0-1706236829812.png]

Alert:

[Screenshot: Dorian_1-1706236829816.png]

In Skyline I have yet to find how to implement something similar. I found that we can distinguish by "speed" in the metrics browser, so I feel like something similar should be possible, but I'm missing how to put it all together. How do you alert on interface utilization?

Cheers Dorian

Alexander_Wilke
Advisor

Hello,

For this you have to find the proper PromQL query.

system_traffic_connections{speed} shows you the interface speed.

You probably have to combine it or compare it with:

system_traffic_io_transmit

system_traffic_io_receive

I personally do not monitor the interface speed because in my case it is irrelevant. In general I am more interested in throughput:


# Firewall throughput of at least 8000 (replace with your own threshold), averaged over the last 15 minutes.
# Converts from bits to Mbit (/ 1024 / 1024) and rounds up so there are no decimals.
ceil((avg_over_time(system_network_interface_io_transmit_rate{host_name=~".*", interface="TOTAL"}[15m]) / 1024 / 1024) >= 8000)
 
I only use "transmit" because everything that is sent came in at some point.
 
 
count by (host_name) (    # count the matching CPU cores, grouped by host_name
  sum by (cpu, host_name) (
    100 - avg_over_time(system_cpu_utilization{host_name=~".*", state="idle"}[5m])    # 100% minus average CPU idle over 5 minutes = CPU usage over 5 minutes
  ) >= 70    # keep only the CPU cores (per host_name) whose load is at least 70 percent
) >= 3    # keep only systems where at least 3 CPU cores are at 70% or more at the same time
 
 
For PromQL questions I would suggest:
Prometheus Users - Google Groups
 
Dorian
Participant

Thanks @Alexander_Wilke, much appreciated. I haven't played around with PromQL yet but suspected that might be the answer to my question.
Interface speed is irrelevant for us too. We only used it in our old environment to distinguish between alerts/thresholds for 1G & 10G interfaces. I will try what you've described above for the interface utilization. When you get a chance, would you mind sharing a screenshot of the query & alert setup? Thanks again for your help.

Alexander_Wilke
Advisor

Hello,

 

Originally I used this dashboard from @Kaspars_Zibarts, then slightly modified it and fixed some queries (e.g. top CPU):

Cluster Dashboard - Skyline alternative - Check Point CheckMates

 

 

Dorian
Participant

Good morning Alex & sorry for the delayed reply. Thank you very much for sending that through, many thanks, fellow German CheckMate 🙂
I finally managed to spend some time on this. I replicated your alerting query, which works perfectly, but I might be missing something: using the interface "Total" and setting a threshold of 5000 will give a >5 Gbit/s alert for the total throughput of the firewall, correct? I've set a wildcard for the interface instead, to query all of our 1800 interfaces. However, a single threshold of 500 or 5000 only gives 50% alerting for either 1G or 10G interfaces, one or the other.

system_network_interface_state allows me to select the label speed, which I could use to query 1G or 10G interfaces. There must be a way to combine this with transmit.

Before Skyline we had a Grafana panel with two queries: the first identified throughput for all 1G interfaces and the second for the 10G interfaces. We then had two alerting conditions, one per query and therefore one per interface type/speed (i.e. 500 for 1G & 5000 for 10G, resulting in a 50% interface utilization alert). I hope this explanation makes sense. Apologies if I'm missing something obvious here.

Thanks again for your time mate. 
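
For the per-speed thresholds described above, a single alert expression along the following lines might work. This is an untested sketch, not a confirmed Skyline recipe. It assumes that system_network_interface_state shares the host_name and interface labels with system_network_interface_io_transmit_rate, and the speed label values "1000" and "10000" are placeholders; check the real values in the metrics browser first.

```promql
# Assumption: both metrics carry matching host_name and interface labels.
# Assumption: speed="1000" / speed="10000" are placeholder label values.
(
  (
    # 15-minute average transmit rate per interface, converted as in the query above
    avg_over_time(system_network_interface_io_transmit_rate{interface!="TOTAL"}[15m]) / 1024 / 1024
    # keep only interfaces that the state metric reports as 1G
    and on (host_name, interface)
    system_network_interface_state{speed="1000"}
  ) >= 500      # 50% of a 1G interface
)
or
(
  (
    avg_over_time(system_network_interface_io_transmit_rate{interface!="TOTAL"}[15m]) / 1024 / 1024
    # keep only interfaces that the state metric reports as 10G
    and on (host_name, interface)
    system_network_interface_state{speed="10000"}
  ) >= 5000     # 50% of a 10G interface
)
```

The `and on (host_name, interface)` operator keeps only the throughput series whose host_name and interface also appear in the state metric with the chosen speed, so each threshold applies to one interface class. The `or` then merges both results into one alert condition, replacing the two separate queries from the old Grafana panel.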
