Maestro Connections Rate Alerts

scordy · 2024-09-08

Hi All,

We recently upgraded from R81 to R81.20 on the SGMs (and the MHOs, but this effect was not evident prior to the gateway upgrades) on our Maestro kit and are now seeing oddities in the output from the chassis "Alert Events", specifically the connection rate event alerts. Periodically (about 2-3 times per day) ConnRateEvent alerts are triggered (and emailed) indicating a dramatic increase in connections/sec, followed by another set of alerts indicating the rate has fallen back to our "normal" level, all inside a ~10 second timeframe.

Usual conn.rate as indicated by an "asg perf -v" would be around 1,000. I have never observed this supposed spike in connection rate while monitoring with asg perf. The emailed alert indicates that the connection rate is jumping somewhere between one-thousand-fold and three-thousand-fold, which seems a bit extreme. That is, it isn't uncommon for the alert to report a connections/sec jump from around 1,000 to several million.

So my question from here is: has anyone seen anything like this before with R81.20, JHF Take 76 on a Maestro? Is this something quirky I might be able to do something about, or should I go straight to TAC with this one?

Thanks for any info anyone might be able to contribute. If not, then TAC I shall hassle! 🙂

Regards,
scordy

the_rock · 2024-09-08

Hey Scordy,

I'm not a Maestro expert by any means, but let me ask you this. How is optimization set in the SmartConsole object? Do you have it configured manually or automatically? Usually, it's best to leave it as automatic, as that lets the gateway calculate the amount of RAM/CPU used based on the number of connections.

Andy

Best,
Andy
"Have a great day and if its not, change it"

scordy · 2024-09-08

Hi @the_rock ,

This is more visible via GCLISH on the Maestro SMO using the "show chassis alert_threshold <alert>" command. The exact alerts in play would be conn_rate_threshold_high, conn_rate_threshold_low_ratio, conn_rate_total_threshold_high and conn_rate_total_threshold_low_ratio.

As stated, this didn't feature as a problem in the previous OS release. We were using R81 until recently. The variation during the "spike" events seems so dramatic that it's difficult to believe the number being recorded at the high watermark is real.

Scordy

the_rock · 2024-09-08

OK... I'm sure the Maestro gurus will help with this, but I did a quick search and found the link below. To me it appears to show how to set those limits, so it might not be super useful here.

Andy

https://sc1.checkpoint.com/documents/R81/WebAdminGuides/EN/CP_R81_Maestro_AdminGuide/Topics-Maestro-...

scordy · 2024-09-08

Yep - the limits are set. But they don't really account for 3-orders-of-magnitude shifts in metrics. That is, it's easy enough to set a high watermark, set a low watermark at, say, ~30% of the high, and watch the host purr along forever between these two values. It doesn't readily allow for the high threshold to be many hundreds of times higher than the low, and it probably shouldn't: that would be a very odd network environment. But this is the feedback I'm getting about my network, which was not the case until the OS upgrade. It's not catastrophic or even problematic really. I'm just fairly curious about what's causing this analysis feedback from the host.
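
For anyone following along, the high/low watermark idea can be sketched as a generic hysteresis check. This is only an illustration and not Check Point's implementation: the class and function names, the 5,000 high threshold and the sample values are all made up, and the 0.3 ratio just mirrors the "~30% of the high" figure mentioned in this thread. It shows how one outlying sample among otherwise normal readings would produce a raise/clear pair back to back, which is the pattern the emailed alerts show. It says nothing about whether the outlying sample is real or a measurement artifact.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HysteresisAlert:
    """Generic high/low watermark alert (hypothetical names and semantics)."""
    high: float            # alert is raised when the rate reaches this value
    low_ratio: float       # alert clears at or below high * low_ratio
    active: bool = False

    def update(self, rate: float) -> Optional[str]:
        """Feed one sample; return "raise", "clear", or None if nothing changed."""
        if not self.active and rate >= self.high:
            self.active = True
            return "raise"
        if self.active and rate <= self.high * self.low_ratio:
            self.active = False
            return "clear"
        return None


def events(samples: List[float], high: float, low_ratio: float) -> List[Tuple[int, str]]:
    """Return (sample index, event) pairs for every state change."""
    alert = HysteresisAlert(high=high, low_ratio=low_ratio)
    out = []
    for i, rate in enumerate(samples):
        event = alert.update(rate)
        if event is not None:
            out.append((i, event))
    return out


if __name__ == "__main__":
    # Made-up samples: steady ~1,000 conn/sec with one multi-million outlier.
    samples = [1000, 1100, 2_500_000, 1000, 950]
    print(events(samples, high=5000, low_ratio=0.3))
```

With these made-up numbers the alert raises on the outlier and clears on the very next sample, so a single bad reading is enough to generate both emails within one polling interval.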

emmap · 2024-09-11

I'm not aware of this being an issue. It's probably best to get a TAC case raised so we can deep dive into any logs that might shine a light on what's occurring here.