Solved: Re: Quantum Spark 1600 HA down problem

ikafka · ‎2023-06-23

I configured two Quantum Spark 1600 appliances in HA. The device that should be passive shows as down. When I check its interfaces, some of them appear to be missing. The screenshots are below.

HA configuration list on the active device

Network interfaces of the active device

List of network interfaces on the down device

When I log in to the down device, I get this web GUI.

I reset the secondary device a couple of times and then added it as the HA secondary again, and this happens every time: not all of the interface IPs come through.

ikafka · ‎2023-07-13

Hi,

The problem is solved.

TAC suggested an upgrade.
After upgrading both the primary and the secondary member, the problem persisted. As you can see in the picture above, there are VLAN interfaces on the primary member that are not passed to the secondary member. (When the secondary is added, the VLAN interfaces are not passed at all. Normally they are.)
So I manually added the VLAN interface IP addresses that had not been passed. After waiting a while, the problem was fixed. I had tried this on the previous version and it did not work. I think the secondary device has trouble getting the VLAN interface IP addresses when clustering on SMB devices.
As a result, my problem is solved. Thank you.
related upgrade package link
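
The manual check described above (finding which interfaces exist on the primary but never reached the secondary) can be sketched in a few lines of Python. This is only an illustration: the interface names and addresses below are hypothetical placeholders, not values from this cluster, and the two dictionaries are assumed to have been filled in by hand from each member's interface list.

```python
def missing_interfaces(primary, secondary):
    """Return the interface names present on the primary but absent on the secondary."""
    # Compare by name only: each cluster member has its own IP on every interface.
    return sorted(name for name in primary if name not in secondary)


# Hypothetical interface lists, copied by hand from each member's interface page.
primary_ifaces = {
    "LAN1": "192.0.2.1",
    "LAN1.10": "198.51.100.1",
    "LAN1.20": "203.0.113.1",
}
secondary_ifaces = {
    "LAN1": "192.0.2.2",
}

missing = missing_interfaces(primary_ifaces, secondary_ifaces)
for name in missing:
    print(f"{name} is configured on the primary but missing on the secondary")
```

Any name printed here would be a candidate for adding manually on the secondary, as was done in this case.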


Chris_Atkinson · ‎2023-06-23

For context which firmware version/build is used here?

CCSM R77/R80/ELITE

ikafka · ‎2023-06-23

Both devices run the same version: R81.10.05 (996001301)

RS_Daniel · ‎2023-06-23

Hello,

According to sk167453, traffic from the standby member goes through the sync interface. In our case the active member dropped traffic from the standby member. Try creating a rule with the two members as source, any as destination, and action accept.

If that doesn't fix the issue, try changing the "OS advanced settings - Use unique ICMP ID" value to true so that both members can do monitoring independently.

Regards

ikafka · ‎2023-06-23

The result has not changed. The device is still down. I rebooted it, and it showed LOST during the reboot. Once the device is back up, it appears as down again. And when I try to access it from the web, I get an ERR_CONNECTION_TIMED_OUT error. This Quantum series is strange.

ikafka · ‎2023-06-23

Here is the cphaprob state output on the down device:

Cluster Mode:   High Availability (Active Up) with IGMP Membership

ID         Unique Address  Assigned Load   State

1          10.231.149.1    100%            ACTIVE(!)
2 (local)  10.231.149.2    0%              DOWN


Active PNOTEs: LPRB, IAC, COREXL

Last member state change event:
   Event Code:                 CLUS-110600
   State change:               INIT -> DOWN
   Reason for state change:    Incorrect configuration - Sync interface has not been detected
   Event time:                 Fri Jun 23 18:35:00 2023

Cluster failover count:
   Failover counter:           0
   Time of counter reset:      Fri Jun 23 21:32:33 2023 (reboot)

But when I check, the sync interface status is up, and ping to the active device is successful. And here is the cphaprob state of the active device.
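
For anyone who wants to compare this output across members or across reboots, here is a minimal Python sketch that pulls the member states, the active pnotes, and the state-change reason out of the text. It assumes the layout shown in this post; other versions may format the output differently. The function name and the regular expression are my own and not part of any Check Point tool.

```python
import re

# Sample text copied from the output posted above.
SAMPLE = """\
Cluster Mode:   High Availability (Active Up) with IGMP Membership

ID         Unique Address  Assigned Load   State

1          10.231.149.1    100%            ACTIVE(!)
2 (local)  10.231.149.2    0%              DOWN

Active PNOTEs: LPRB, IAC, COREXL

Last member state change event:
   Event Code:                 CLUS-110600
   State change:               INIT -> DOWN
   Reason for state change:    Incorrect configuration - Sync interface has not been detected
"""

# One member row: ID, optional "(local)", IPv4 address, load percentage, state.
MEMBER_RE = re.compile(
    r"^\s*(?P<id>\d+)\s*(?P<local>\(local\))?\s+"
    r"(?P<addr>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<load>\d+)%\s+(?P<state>\S+)\s*$"
)


def parse_cphaprob_state(text):
    """Extract members, active pnotes and the last state-change reason."""
    members, pnotes, reason = [], [], None
    for line in text.splitlines():
        match = MEMBER_RE.match(line)
        if match:
            members.append({
                "id": int(match.group("id")),
                "local": match.group("local") is not None,
                "address": match.group("addr"),
                "load": int(match.group("load")),
                "state": match.group("state"),
            })
            continue
        stripped = line.strip()
        if stripped.startswith("Active PNOTEs:"):
            values = stripped.split(":", 1)[1]
            pnotes = [p.strip() for p in values.split(",") if p.strip()]
        elif stripped.startswith("Reason for state change:"):
            reason = stripped.split(":", 1)[1].strip()
    return {"members": members, "pnotes": pnotes, "reason": reason}


result = parse_cphaprob_state(SAMPLE)
for member in result["members"]:
    tag = " (local)" if member["local"] else ""
    print(f"member {member['id']}{tag}: {member['state']}")
print("pnotes:", ", ".join(result["pnotes"]))
print("reason:", result["reason"])
```

Running the same parser on the output of both members makes it easy to see at a glance which pnotes are raised where.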

RS_Daniel · ‎2023-06-26

Hello,

Yes, Quantum Spark appliances have more issues/bugs than regular Gaia appliances. You have the COREXL pnote, so I would start there. Compare the output of "fw ctl multik stat" on both members. You can also check cpview > cpu > overview; both members should have the same number of CoreXL_FW and OTHER CPUs. Do they match? If not, you need to check this with TAC.
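
As a rough illustration of that comparison, the Python sketch below takes the CPU counts you read off each member and reports any role whose count differs. The numbers are hypothetical and entered by hand; the script does not parse "fw ctl multik stat" or cpview output, and the function name is made up for this example.

```python
def corexl_mismatches(member_a, member_b):
    """Return {role: (count_a, count_b)} for every CPU role whose count differs."""
    roles = sorted(set(member_a) | set(member_b))
    return {
        role: (member_a.get(role, 0), member_b.get(role, 0))
        for role in roles
        if member_a.get(role, 0) != member_b.get(role, 0)
    }


# Hypothetical counts, read manually from cpview > cpu > overview on each member.
primary = {"CoreXL_FW": 3, "OTHER": 1}
secondary = {"CoreXL_FW": 2, "OTHER": 2}

diff = corexl_mismatches(primary, secondary)
if diff:
    for role, (count_a, count_b) in diff.items():
        print(f"{role}: primary={count_a}, secondary={count_b}")
else:
    print("CoreXL layout matches on both members")
```

An empty result means the two members agree; any printed line is a mismatch worth raising with TAC.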

Regards

ikafka · ‎2023-07-10

TAC suggested an upgrade to version R81.10.07. I will build the cluster again after the upgrade and post whether the problem is solved.

Chris_Atkinson · ‎2023-07-10

sk174423 provides further guidance on CoreXL configuration for Spark appliances and how to align the members if they differ.

Is this the only Pnote remaining?