Hello Community,
We are currently in the middle of a hardware refresh, moving from an older cluster (16200) to a new 19200 appliance setup. However, we’ve hit a significant roadblock that has left the new appliances unusable in production. I’m curious whether anyone else has encountered this specific combination of issues between UPPAK and KPPAK modes on the 19000 series.
The Migration Path:
We followed the standard cluster replacement procedure (similar to this guide). After swapping the first node, we immediately ran into two issues:
- ICMP over VPN Failure: With the default UPPAK mode, ICMP traffic through the VPN stopped working entirely. This mirrors the issue discussed in this thread.
- High Idle CPU: The appliance was idling at roughly 20% CPU with almost no load.
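For anyone who wants to reproduce the ICMP symptom, here is a minimal sketch of the kind of reachability check we mean. It is generic Linux/Python, not Check Point tooling; the target address 10.20.0.1, the probe count, and the interval are placeholders for illustration, not values from our environment.

```python
#!/usr/bin/env python3
"""Minimal ICMP reachability probe through a VPN tunnel.

Assumptions (placeholders, adjust to your environment):
  - run from a Linux host inside the local encryption domain
  - TARGET is a host behind the remote VPN peer that answers ping
"""
import subprocess
import time

TARGET = "10.20.0.1"   # placeholder: host behind the remote VPN peer
PROBES = 10            # number of probe rounds
INTERVAL = 5           # seconds between rounds

def ping_once(target: str) -> bool:
    """Send a single ICMP echo request; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

if __name__ == "__main__":
    failures = 0
    for i in range(PROBES):
        ok = ping_once(TARGET)
        failures += 0 if ok else 1
        print(f"[{time.strftime('%H:%M:%S')}] probe {i + 1}: {'reply' if ok else 'NO reply'}")
        time.sleep(INTERVAL)
    print(f"{failures}/{PROBES} probes failed")
```

In our case, probes like this fail every time while the node runs UPPAK and start succeeding as soon as the node is switched to KPPAK.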
The KPPAK Attempt:
Because we had experienced stability issues with UPPAK on our previous hardware, we decided to switch the new 19200 to KPPAK mode.
- The Good: Switching to KPPAK immediately fixed the ICMP VPN issue and the CPU stabilized. We moved the node into production for testing.
- The Bad: The next morning, as user load increased, performance tanked. It turns out our Intel 10/25/40/100G QSFP28 NICs use the ICE driver, which is known to have major performance limitations when running in KPPAK mode on these appliances (as per sk183525).
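If you are not sure which driver your ports are bound to, one quick way to check is to read the driver symlink the kernel exposes under /sys/class/net. The sketch below is plain Linux sysfs and makes no Check Point-specific assumptions; interface names will of course differ per appliance.

```python
#!/usr/bin/env python3
"""List the kernel driver bound to each network interface via sysfs.

Generic Linux lookup; no Check Point-specific tooling assumed.
Interfaces bound to the Intel 'ice' driver are the ones affected by
the KPPAK limitation described in sk183525.
"""
import os

SYS_NET = "/sys/class/net"

def driver_of(iface: str) -> str:
    """Resolve the driver symlink for an interface, or '-' if none (e.g. loopback, bond)."""
    link = os.path.join(SYS_NET, iface, "device", "driver")
    if not os.path.islink(link):
        return "-"
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    for iface in sorted(os.listdir(SYS_NET)):
        drv = driver_of(iface)
        flag = "  <-- ice driver" if drv == "ice" else ""
        print(f"{iface:<12} {drv}{flag}")
```

The same information is also shown per interface by `ethtool -i <interface>` if you prefer a one-off check from the shell.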
Current Status:
We are essentially stuck:
- In UPPAK: VPN traffic (ICMP) is broken.
- In KPPAK: The ICE driver causes severe performance issues.
We have two open TAC cases; TAC is currently reviewing debug output for the UPPAK ICMP issue, but in the meantime we are effectively unable to use the new hardware.
Has anyone successfully resolved the ICMP/UPPAK issue on the 19000 series? Or, has anyone found a workaround for the ICE driver performance bottleneck in KPPAK mode?
Any insights or similar experiences would be greatly appreciated!
Best regards,
Kuba