Alexander_Wilke
Advisor

64k metric "system_network_interface_io_receive_rate" different than asg perf or cpview

Hello,

I think I am using a dashboard that was provided here in the community by @Kaspars_Zibarts or that came from an SK.

I am not sure whether I modified the query, but it currently looks like this:

 

system_network_interface_io_receive_rate{environment=~"Default", host_name=~"l999gnfw0101q-ch01-01", interface="TOTAL"}
 
The result I get from this query does not match the output of:
asg perf -v -p on blade 1_01
cpview on blade 1_01
 
The query does not represent the overall chassis throughput either.
 
Unfortunately it is hard to compare the results exactly, because asg perf and cpview update their values more frequently than Skyline sends data to Prometheus (every 15 s).
 
I think "system_network_interface_io_receive_rate" is roughly 2x the throughput that cpview or asg perf show for the SGM.
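One way to check a double-counting theory (this is only a sketch; the label matchers are copied from the query above, and the assumption that backplane interfaces are named BPEth* is mine) is to compare the sum of the individual non-backplane interfaces against TOTAL:

sum by (host_name) (system_network_interface_io_receive_rate{host_name=~"l999gnfw0101q-ch01-01", interface!~"TOTAL|lo|BPEth.*"})
/
sum by (host_name) (system_network_interface_io_receive_rate{host_name=~"l999gnfw0101q-ch01-01", interface="TOTAL"})

A ratio close to 0.5 would indicate that TOTAL carries roughly twice the front-panel traffic; a ratio close to 1 would indicate TOTAL is a plain sum of the listed interfaces.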
 
I use the same query in 16200 and 26000 ClusterXL environments, and there the values from Skyline and cpview appear to match.
 
 
PS:
In addition I receive metrics with the name "system_network_interface_rx_throughput_bs". They represent exactly the same value under a different name. The documentation does list this metric.
 
 
 
Any observations on your side?
 
 
 
 

Accepted Solutions
skeutgen
Explorer

Hi,

 

We had a similar issue in our VSX environment. I investigated a bit and found that if we just use the default query, which includes all VSs, VS ID 0 delivers the same data as some of the VSs, so that traffic is counted twice. We therefore created a query that excludes the data from VS 0, like so:

sum by (interface) (system_network_interface_io_receive_rate{environment=~"$d_environment", service_namespace!="vs_id_0", interface!~"TOTAL|lo"})

 

This solved our issues and allowed for correct graphs.

 

Cheers,

Sascha


Alexander_Wilke
Advisor

Hello,

I think I found the root cause for non-VSX environments. On Scalable Platforms, "TOTAL" seems to include the backplane interfaces BPEth0 and BPEth1 as well.
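If that is the case, subtracting the backplane interfaces from TOTAL should bring the Skyline value back in line with cpview. A sketch (the host_name matcher is an assumption, borrowed from the alert rules below in this thread):

sum by (host_name) (system_network_interface_io_receive_rate{host_name=~".*64k.*", interface="TOTAL"})
-
sum by (host_name) (system_network_interface_io_receive_rate{host_name=~".*64k.*", interface=~"BPEth.*"})

Both operands are aggregated to the same label set (host_name), so the subtraction matches one-to-one per SGM.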


For non-Maestro/64k devices I use this alert rule, which fires if the throughput per SGM/device exceeds 6 Gbit/s:

ceil(
    (avg_over_time(system_network_interface_io_transmit_rate{host_name!~".*64k.*|.*maestro.*", interface="TOTAL"}[15m]) / 1024 / 1024 )
     >= 6000)


For Maestro/64k devices we use only the BPEthX interfaces:
 
ceil(
        sum by (host_name)(avg_over_time(system_network_interface_io_transmit_rate{host_name=~".*64k.*|.*maestro.*", interface=~"BPEth.*"}[15m])) / 1024 / 1024
    ) > 6000
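A note on the units, under the assumption (not confirmed in this thread) that the metric is exported in bits per second: dividing by 1024 twice yields Mibit/s, so a threshold of 6000 corresponds to 6000 × 1024 × 1024 bit/s ≈ 6.29 Gbit/s, slightly above a nominal 6 Gbit/s. With decimal units the same rule would read:

ceil(
        sum by (host_name)(avg_over_time(system_network_interface_io_transmit_rate{host_name=~".*64k.*|.*maestro.*", interface=~"BPEth.*"}[15m])) / 1000 / 1000
    ) > 6000

Here 6000 Mbit/s is exactly 6 Gbit/s. If the metric is actually exported in bytes per second, both variants would need an additional factor of 8.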
