Hello,
I noticed something odd when comparing my Maestro environment (R81.20 + Jumbo HFA Take 99) with Dynamic Balancing disabled and with Dynamic Balancing enabled.
After I enabled Dynamic Balancing, the total CPU usage across all CPUs and all SGMs increased by approximately 80%, while throughput stayed the same or even dropped. The traffic mix was comparable, since this is a production environment.

I opened a ticket with my Diamond support team. We have not found the reason so far and are still investigating. We suspected an issue with the Skyline metrics.
In parallel, I looked for indicators in Skyline that could explain this behavior, and I think I found the root cause. Skyline sends metrics to Prometheus every 15 s. Without Dynamic Balancing, we receive the CPU usage for that 15 s interval, and each CPU has exactly one type: CoreXL_FWD or CoreXL_SND. Because the CPUs are assigned statically via cpconfig, they cannot change their type. As a result, a given CPU has only one type within each 15 s metric.
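To make the static case concrete, here is a minimal Python model of one 15 s push interval. It assumes each sample is a (cpu, type, usage) tuple and that the dashboard sums across type labels per CPU, similar to a `sum by (cpu)`; the values and label layout are hypothetical, not the actual Skyline metric names.

```python
from collections import defaultdict

# Hypothetical samples for one 15 s push interval with static CoreXL
# assignment (no Dynamic Balancing): (cpu_id, type_label, usage_percent).
static_window = [
    (0, "CoreXL_SND", 35.0),
    (1, "CoreXL_SND", 40.0),
    (2, "CoreXL_FWD", 62.0),
    (3, "CoreXL_FWD", 58.0),
]

def usage_per_cpu(samples):
    """Sum usage across all type labels per CPU (assumed dashboard behavior)."""
    totals = defaultdict(float)
    for cpu, _type_label, usage in samples:
        totals[cpu] += usage
    return dict(totals)

# One type per CPU per window, so the sum equals the single reported value.
print(usage_per_cpu(static_window))  # {0: 35.0, 1: 40.0, 2: 62.0, 3: 58.0}
```

With a single type per CPU, summing over the type label is harmless because there is nothing else to add.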

However, after enabling Dynamic Balancing, the type of a CPU is no longer static. It can change, and this is expected. A CPU can now have multiple types within one 15 s period: CoreXL_FWD, CoreXL_SND, BOTH, PPE_MGR and others. As a result, in a busy environment with varying traffic loads, several results are returned for the same CPU, one per type. In the following screenshot you can see CPU 2 with different states within one 15 s period. The "sum" then adds these CPU percentages together, and because we don't know how long CPU 2 was FWD and how long it was SND within those 15 s, we cannot build a correct average.
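The effect described above can be sketched with the same hypothetical (cpu, type, usage) model, now with CPU 2 reporting three types in one 15 s window. The per-type durations are an assumption: they are exactly the information the metrics do not carry, so two made-up splits are shown to illustrate why no single correct average can be derived.

```python
from collections import defaultdict

# Hypothetical samples for CPU 2 in one 15 s window with Dynamic Balancing:
# the type label changed during the window, so one value per type exists.
dynamic_window = [
    (2, "CoreXL_FWD", 70.0),
    (2, "CoreXL_SND", 55.0),
    (2, "BOTH", 30.0),
]

def usage_per_cpu(samples):
    """Sum usage across all type labels per CPU (assumed dashboard behavior)."""
    totals = defaultdict(float)
    for cpu, _type_label, usage in samples:
        totals[cpu] += usage
    return dict(totals)

def time_weighted_usage(usages, seconds):
    """Correct 15 s average, if we knew how long each type was active."""
    return sum(u * s for u, s in zip(usages, seconds)) / sum(seconds)

usages = [u for _, _, u in dynamic_window]
print(usage_per_cpu(dynamic_window))            # {2: 155.0}, impossible
# Two different (unknown in practice) splits of the 15 s window:
print(time_weighted_usage(usages, [9, 3, 3]))   # 59.0
print(time_weighted_usage(usages, [3, 3, 9]))   # 43.0
```

The summed value exceeds 100%, and the real value could be anywhere between the per-type readings depending on durations that are not exported.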

Here is another example, which shows that CPU 2 can have a CPU usage of more than 100% within a 15 s period.
In the following screenshot, covering 90 s (6 x 15 s), the usage of this CPU is always above 100%, which is impossible. This leads me to conclude that the default Skyline behavior cannot handle Dynamic Balancing properly.

Another question that came up is how the cpview exporter exports the metrics and how they are aggregated into the 15 s push interval. If the cpview exporter exported the CPU usage only once every 15 s, there would be only a single CPU type per interval. But we see multiple types, which means the cpview exporter exports the CPU metrics multiple times within the 15 s interval. In the screenshot above we can see at least 4 types per 15 s, which makes me assume that the cpview exporter exports the metrics 4 times and then builds an average?
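The averaging hypothesis above can be modeled in a few lines. This sketch assumes (unconfirmed) that the exporter takes 4 sub-samples per 15 s interval and that the aggregation averages within each label set, i.e. per type; the sub-sample values are made up. It compares the sum of per-type averages with an average that ignores the type label.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical: 4 sub-samples of CPU 2 within one 15 s push interval,
# with the type label changing between sub-samples (Dynamic Balancing).
sub_samples = [
    ("CoreXL_FWD", 80.0),
    ("CoreXL_FWD", 60.0),
    ("CoreXL_SND", 40.0),
    ("PPE_MGR", 20.0),
]

def per_type_averages(samples):
    """Average within each type label (assumed label-keyed aggregation)."""
    groups = defaultdict(list)
    for type_label, usage in samples:
        groups[type_label].append(usage)
    return {t: mean(v) for t, v in groups.items()}

per_type = per_type_averages(sub_samples)
print(per_type)                          # {'CoreXL_FWD': 70.0, 'CoreXL_SND': 40.0, 'PPE_MGR': 20.0}
print(sum(per_type.values()))            # 130.0, what a sum over types shows
print(mean(u for _, u in sub_samples))   # 50.0, average ignoring the type label
```

Under this assumption, equally spaced sub-samples act as a rough time weighting, so averaging all of them while ignoring the type label would give a plausible per-CPU value, whereas summing the per-type averages inflates it.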
PS:
In the first screenshot you may have noticed 2 days with high CPU spikes. These were caused by the Take 92 bug described in sk183251.
How do you reliably measure CPU usage with Skyline and Dynamic Balancing?