GianniPapetti
Contributor

CP 6200Plus R80.40 JHA 156 - red flashing of the ALARM LED after update

Hi Checkmates!

Yesterday I updated our pair of 6200 Plus gateways, which form a ClusterXL cluster, from R80.40 JHA take 139 to R80.40 JHA take 156. The update took surprisingly long, so I wanted to check for error messages on the virtualized console via LOM.

During the upgrade I could not access the LOM web interface and had to wait for the update to finish before I could reach the LOM console.

After the update, I noticed the ALARM LED flashing red intermittently on the front panel of both gateways.

Schermata 2022-04-02 alle 13.01.13.png

I'm sure the LEDs weren't flashing before the update!

Everything is working as expected, and the appliance health check does not detect anything abnormal. I opened a TAC case and was instructed to run the Hardware Diagnostic Tool, even though I'm quite confident it is an update issue.

Have you ever seen a similar case? To avoid having to stop the machines and access them via the physical console, do you think it is possible to restart a node and start the diagnostic software via the LOM virtualized console (obviously after preparing the bootable USB pen drive)?

 

Best regards and thanks for your feedback,

Gianni.

26 Replies
the_rock
Champion

Hm, that's a tricky issue... Are you able to uninstall the new Jumbo and see if the problem is still there after the reboot?

Andy

Chris_Atkinson
Employee

Does anything unexpected show in the hardware health section of the GAiA portal for either appliance?

GianniPapetti
Contributor

Everything is fine

Schermata 2022-04-02 alle 17.22.29.png

sharonab
Employee

Unless I am mistaken, the output above is from LOM and not from Gaia.

LOM sensors and Gaia sensors are somewhat different.

Gaia bases the activation of its LED on the result of the "show sysenv all" command.

Please post the output of that command. One example of a possible cause that cannot be seen in LOM: the BIOS might have moved to secondary, which might also explain the time the upgrade took to finish and possibly the LOM access issue.
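
For anyone who wants to pull just the relevant line out of that output, here is a minimal sketch. It assumes the command is run from expert mode through `clish -c`; `bios_sensor_lines` is a hypothetical helper name, and the filter deliberately makes no assumption about the column layout of the output.

```shell
# Minimal sketch: keep only the BIOS-related lines of "show sysenv all".
# bios_sensor_lines is a hypothetical helper; it reads the command output
# on stdin and does a case-insensitive match on the word "bios".
bios_sensor_lines() {
    grep -i 'bios' || echo "no BIOS line found"
}

# On the gateway (expert mode), assuming clish is in the PATH:
#   clish -c "show sysenv all" | bios_sensor_lines
```

If the BIOS sensor is the one raising the alarm, the filtered line should be the one to post here or attach to the SR.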

GianniPapetti
Contributor

you got it! 

Schermata 2022-04-03 alle 10.02.32.png

sharonab
Employee

The BIOS value is invalid; this is what is causing the alarm. I will check with the HW team whether they wish to look into why this machine moved to the secondary BIOS.

GianniPapetti
Contributor

The weird thing is that both gateways show an invalid BIOS; I'll pass the command output on to TAC.

Best regards,

Gianni.

Chris_Atkinson
Employee

Please refer to sk108517 further to the above.

G_W_Albrecht
Legend

And did you ask for an RMA already?

CCSE CCTE SMB Specialist
GianniPapetti
Contributor

Not yet, hope to find other solutions.

TKS,

Gianni.

Dolev
Employee

Hi,

 

Following Sharon's comments, can you kindly open an SR with TAC and post the SR number here?

We will contact you offline with some troubleshooting steps and a mitigation to address that error.

 

Regards,

Dolev

GianniPapetti
Contributor

Thanks, guys, for your support!

The SR number is SR#6-0003221846.

Gianni.

GianniPapetti
Contributor

for sure SR#6-0003221846

the_rock
Champion

Please keep us posted on how it gets solved; I would really like to know.

Cheers,

Andy

GianniPapetti
Contributor

Hi there, I am really upset about this case.

Check Point will proceed with an RMA of both devices.
Excellent Easter holidays are expected 😞

Thanks to you all,

Gianni.

the_rock
Champion

But wait... why RMA if the boxes are working properly? Can you ask them whether there is any way to clear that message without RMA-ing the devices?

GianniPapetti
Contributor

I asked exactly that.

As TAC said:

You can try totally powering down the appliance to see if the BIOS error clears. If it does not clear, we would need to RMA the device.

 

Dolev
Employee

@the_rock , @GianniPapetti 

There is an official SK that recommends an RMA of the unit upon seeing these errors.

At this point, after a power cycle the unit will boot with the primary BIOS, so you will no longer see the errors; however, we would then not be able to investigate the cause.

The unit is fully functional and shouldn't be RMAed. The SR will be assigned to the relevant team, and we will shortly arrange a session with RnD primes for an investigation.

We appreciate your cooperation, and I'll personally explain the situation better during the session.

the_rock
Champion

Re-reading the whole post again, it certainly seems like a somewhat difficult issue to investigate, that's for sure. Let's hope that @GianniPapetti won't have to RMA the boxes.

G_W_Albrecht
Legend

See sk108517 - three readings have to be bad for RMA:

1. The "Hardware Health" section in Gaia Portal shows for Sensor "BIOS":

  • Value "Invalid"
  • Status "Off"

2. Query for SNMP OID / SNMP Trap for OID .1.3.6.1.4.1.2620.1.3000.0.12.0 returns: "the BIOS has failed, using recovery BIOS"

3. The dmidecode command shows the BIOS is booting up from the Secondary or recovery BIOS:

[Expert@GHost:0]# dmidecode -t 11
# dmidecode 2.7
SMBIOS 2.8 present.

Handle 0x0022, DMI type 11, 5 bytes.
OEM Strings
        String 1: Secondary BIOS
OR:
        String 1: Recovery BIOS 

 

CCSE CCTE SMB Specialist
the_rock
Champion

I read the SK, but I'm not sure all three match in @GianniPapetti's case. I believe it's only one, but I could be mistaken.

Dolev
Employee

Hi,

 

It seems my request has not yet reached the case owner. Please hold off on that; there is no need to RMA at this point.

 

Regards,

Dolev

GianniPapetti
Contributor

Sure, let's hope for the best.
We'll await further investigation; I'll feel more confident knowing there is BIOS redundancy.

Gianni

Martin_Raska
Advisor

keep us updated 🙂

GianniPapetti
Contributor

Hi guys,
As discussed on the call this morning, I shut down the secondary node using the halt command from the Gaia Portal.

After the halt command the gateway restarted spontaneously (strange behaviour) and the error disappeared.
I can confirm that the ALARM LED on the front panel is no longer flashing either.
I then repeated the same procedure on the primary node, with the same result.
Schermata 2022-04-05 alle 21.06.30.png
 
So no RMA needed 🙂
Thanks,
Gianni.
the_rock
Champion

Awesome news, tx for sharing @GianniPapetti 
