2 independent 23800 clusters 80.40 (jhk 91-94) standby node crashed during policy installation

SerB · ‎2021-07-23

Hello Guys,

I have two independent 23800 clusters running R80.40 (jhk 91-94). During policy installation, the standby member crashed on both of them. The clusters are completely independent; even the MGMTs are different.

Has anyone faced a similar issue recently?

I opened TAC cases, of course, but I can't share the core dump files that were created during these crashes, and I can't extract traces from them with cp_makedumpfile64 the way I did a long time ago. By the way, does anyone have a fresh version of this script?

Tommy_Forrest · ‎2021-07-23

Hi SerB.

Yes, we had a similar issue back in January. In our case, the primary gateway failed during policy install, and the policy took a VERY long time to install. Generally it takes about 2-3 minutes. We saw it take upwards of 7-8 minutes, if I recall correctly.

TAC found 3 very specific issues. One required a private hotfix, and the other two, to the best of my understanding, were incorporated into the JHT stream. The private HF, last I checked, hadn't yet been incorporated.

I don't want to go handing out case numbers here. Maybe Phoneboy or someone else can help get our case details to your engineer.

They'll probably want a crash dump to validate your issue is the same or different. We're on 26k's.

SerB · ‎2021-07-23

Hello Tommy, thank you for your reply. Yesterday we tried with TAC to get traces from the core dump files with cp_makedumpfile64, but vmcore_analysis.txt was empty. I can't share the dumps even with TAC; it's the customer's security policy. Our case is a little different, though: only standby members crashed. Two standby members crashing on different clusters is too much for a coincidence. Let's wait and see whether someone else has had this issue. No worries about sharing case numbers. If someone has had this issue, I'll share this topic with TAC, and I hope they'll find something in the R&D cases.
