R80.30 management server endless loop

John_Fleming · ‎2019-10-05

I recently upgraded my GNS3 lab and found that after a mgmt server had been running for a few hours, basically doing nothing, it would become unreachable, even after a reboot. After much troubleshooting I ran into a known issue for R80.30 (granted, the entry says R80.20). The lab VM would boot and java would spike to 100% on a single CPU.

PMTR-21441,
PMTR-22269

On oVirt virtual platforms, it is not supported to use the "kvm-clock" as the clock source (/sys/devices/system/clocksource/clocksource0/current_clocksource). This can cause the management processes to get stuck in an endless loop with very high CPU usage (almost 100%).

R80.20

My hypervisor is Linux KVM / Qemu.

I changed the clocksource to hpet inside the VM (sure enough, it was running kvm-clock), did a cpstop/cpstart, and was able to log in to the management server again.
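For anyone who wants to script the check, here is a minimal Python sketch of reading and switching the clock source through sysfs. The directory is the one quoted in the known issue; `available_clocksource` is the standard sibling file on Linux. The function names are my own, and I'm assuming a Python 3 interpreter is available on the box, which may not be the case on a stock Gaia install. Writing the file needs root and does not survive a reboot.

```python
from pathlib import Path

# Standard Linux sysfs directory for the kernel clock source
# (same location as the path quoted in the known issue).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def current_clocksource(base=CLOCKSOURCE_DIR):
    """Return the clock source the running kernel is using, e.g. 'kvm-clock'."""
    return (Path(base) / "current_clocksource").read_text().strip()


def available_clocksources(base=CLOCKSOURCE_DIR):
    """Return the clock sources the kernel offers, e.g. ['kvm-clock', 'tsc', 'hpet']."""
    return (Path(base) / "available_clocksource").read_text().split()


def set_clocksource(name, base=CLOCKSOURCE_DIR):
    """Switch the running kernel to `name`.

    Needs root on a real system, and the change is lost at the next reboot.
    """
    offered = available_clocksources(base)
    if name not in offered:
        raise ValueError(f"{name!r} is not offered by this kernel: {offered}")
    (Path(base) / "current_clocksource").write_text(name + "\n")
```

Typical use as root would be `print(current_clocksource())` followed by `set_clocksource("hpet")`. The `base` parameter is only there so the functions can be pointed at a scratch directory for testing.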

I checked the clock source the hypervisor was using and found it was tsc, which I've had issues with in the past on other systems. FYI, the hardware is a Dell R720 with 2 x E-2697v2 and ddr-1600. After changing the hypervisor to hpet, the VM seems to be working with kvm-clock again. My theory is that the tsc clock isn't high-resolution enough (or is just terrible for some reason) and its problems are being passed down to kvm-clock, which is making java angry.
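One rough way to sanity-check the resolution theory from inside a guest is to sample the monotonic clock in a tight loop and look at the smallest step it ever reports. The sketch below is only an illustration: `clock_report` and its dictionary keys are names I made up, and a clean result would not prove the clock source is healthy, since the problem showed up only after hours of uptime.

```python
import time


def clock_report(samples=100_000):
    """Sample the monotonic clock repeatedly and summarise how it behaves."""
    info = time.get_clock_info("monotonic")
    smallest = None   # smallest non-zero step seen between two reads, in ns
    backwards = 0     # reads that went backwards (should never happen)
    prev = time.monotonic_ns()
    for _ in range(samples):
        now = time.monotonic_ns()
        delta = now - prev
        if delta < 0:
            backwards += 1
        elif delta > 0 and (smallest is None or delta < smallest):
            smallest = delta
        prev = now
    return {
        "implementation": info.implementation,
        "reported_resolution_s": info.resolution,
        "smallest_step_ns": smallest,
        "backward_steps": backwards,
    }
```

Running `print(clock_report())` before and after a clock source change, on both the hypervisor and the guest, gives numbers that can be compared side by side.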

So far this seems resolved. I'll do more testing and report back if kvm-clock continues to have issues after the hypervisor change to hpet.

PhoneBoy · ‎2019-10-05

I assume this is the 3.10 variant of the kernel, if this is management?

John_Fleming · ‎2019-10-07

Yes, 3.10 kernel. Stress testing looks fine. Changing the clock source to hpet on the hypervisor fixed the issue.
