Skyline - Interface Utilization Alerts

Dorian · ‎2024-01-25

Hi team, I am trying to set up an alert for our interface utilization in Skyline; one example would be utilization >50%. Below you can see how we did this on our previous Grafana instance (running with InfluxDB & Telegraf). First, we had to distinguish between 1G and 10G interfaces, then create two alerting conditions to cover the different types of NICs. Not pretty, but it works for our almost 1800 interfaces.

[Screenshot: Alert]

In Skyline I have yet to find a way to implement something similar. I found that we can distinguish by "speed" in the metrics browser, so I feel something similar should be possible, but I'm missing how to put it all together. How do you alert on interface utilization?

Cheers Dorian

Alexander_Wilke · ‎2024-01-26

Hello,

for this you have to find the proper PromQL query.

system_traffic_connections{speed} shows you the interface Speed.

You probably have to combine it or compare it with:

system_traffic_io_transmit

system_traffic_io_receive

I personally do not monitor the Interface speed because in my case this is irrelevant. In general I am more interested in throughput:

ceil((avg_over_time(system_network_interface_io_transmit_rate{host_name=~".*", interface="TOTAL"}[15m]) / 1024 / 1024 ) >= 8000) ## >= XXX: firewall throughput greater than X; averaged over the last 15 minutes, converted from bits to Mbit and rounded up to a whole number

I only use "transmit" because everything which is sent came in at some time.

count by (host_name)( ### counts the matching cores, grouped by host_name
sum by (cpu,host_name) (
100 - avg_over_time(system_cpu_utilization{host_name=~".*", state="idle"}[5m]) ## 100% - CPU idle averaged over 5 minutes = CPU usage over 5 minutes
)>= 70 ### keeps all CPU cores and hostnames whose load is higher than 70 percent
) >= 3 ### only shows systems where at least 3 CPU cores are above 70% at the same time

For PromQL questions I would suggest:
Prometheus Users - Google Groups

PromLabs | PromQL Cheat Sheet

Dorian · ‎2024-01-29

Thanks @Alexander_Wilke, much appreciated. I haven't played around with PromQL yet but suspected that might be the answer to my question.
Interface speed is irrelevant for us too; we only used it in our old environment to distinguish between alerts/thresholds for 1G & 10G interfaces. I will try what you've described above for the interface utilization. When you get a chance, would you mind sharing a screenshot of the query & alert setup? Thanks again for your help.

Alexander_Wilke · ‎2024-01-30

Hello,

Originally I used this dashboard from @Kaspars_Zibarts, slightly modified it, and fixed some queries (e.g. top CPU):

Cluster Dashboard - Skyline alternative - Check Point CheckMates

Dorian · ‎2024-02-15

Good morning Alex & sorry for the delayed reply. Thank you very much for sending that through - vielen Dank fellow German Checkmate 🙂
I finally managed to spend some time on this. I replicated your alerting query, which works perfectly, but I might be missing something: using the interface "TOTAL" and setting a threshold of 5000 will give a >5 Gbit/s alert on the total throughput of the firewall, correct? I've set a wildcard for the interface instead to query all of our 1800 interfaces; however, a threshold of 500 or 5000 only gives 50% alerting for either 1G or 10G interfaces, one or the other.

system_network_interface_state allows me to select the label speed, which I could use to query 1G or 10G interfaces. There must be a way to combine this with transmit.

Pre-Skyline we had a Grafana panel with two queries: the first to identify throughput for all 1G interfaces and the second for the 10G interfaces. Two alerting conditions then covered the two queries, one per interface type/speed (i.e. 500 for 1G & 5000 for 10G, resulting in a 50% interface utilization alert). I hope this explanation makes sense. Apologies if I'm missing something obvious here.
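
To make the idea concrete, the two old conditions might translate into a single PromQL expression roughly like the sketch below. It is untested and rests on several assumptions: that system_network_interface_state carries the same host_name and interface labels as the transmit metric, that the speed label values are "1000" and "10000" (the real values need checking in the metrics browser), and that the transmit rate converts to Mbit the same way as in the query above.

```promql
# Sketch only. Assumed: the speed label values "1000" / "10000" and the
# shared host_name + interface labels on both metrics.
(
    # 1G interfaces: keep only series whose interface has speed="1000",
    # then alert at 500 (50% of 1000 Mbit)
    (
      avg_over_time(system_network_interface_io_transmit_rate{host_name=~".*", interface!="TOTAL"}[15m]) / 1024 / 1024
      and on (host_name, interface)
      system_network_interface_state{speed="1000"}
    ) >= 500
  or
    # 10G interfaces: same pattern, alert at 5000 (50% of 10000 Mbit)
    (
      avg_over_time(system_network_interface_io_transmit_rate{host_name=~".*", interface!="TOTAL"}[15m]) / 1024 / 1024
      and on (host_name, interface)
      system_network_interface_state{speed="10000"}
    ) >= 5000
)
```

The `and on (...)` operator only filters: it keeps a throughput series when a state series with the same host_name and interface exists, so the result still carries the throughput value. The `or` then merges both speed classes into one alert condition.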

Thanks again for your time mate.
