Danny
Champion

Maestro boot-loop & MAC address verifier issues

I'm experiencing an interesting issue.

Single-Site Maestro Environment:

  • 2x MHO-140 running on R80.20SP JHF 315
  • 2x 7000 appliance SGMs (R81 JHF 44) running as Security Group 1 (SG1)

After I added another 7000 appliance to SG1 as SGM3, it always ends up in a boot loop until the maximum auto-restart count is reached:

[Dec 17 16:53:13]: pulling configuration from SMO (192.0.2.1)
INIT: Sending processes the TERM signal

/var/log/reboot.log shows:

Fri Dec 17 16:32:30 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
Fri Dec 17 16:37:44 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
Fri Dec 17 16:42:57 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
Fri Dec 17 16:48:10 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
Fri Dec 17 16:53:22 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
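
The entries above are roughly five minutes apart and all carry the same reason, which is what makes this look like a loop rather than isolated reboots. A minimal sketch for checking that on a copy of the log is below. The line format is inferred only from the five lines shown here. Since the log carries no year, the `year` argument is an assumption supplied by the caller, and the thresholds in `looks_like_boot_loop` are hypothetical values, not anything Check Point defines.

```python
import re
from datetime import datetime

# Format inferred from the reboot.log lines quoted in this post, e.g.:
# Fri Dec 17 16:32:30 Reason: reboot_with_log : ... Type: configuration
LINE_RE = re.compile(
    r"^\w{3} (?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"Reason: (?P<reason>.*?) Type: (?P<type>\S+)$"
)

SAMPLE_LOG = """\
Fri Dec 17 16:32:30 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
Fri Dec 17 16:37:44 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
Fri Dec 17 16:42:57 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
Fri Dec 17 16:48:10 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
Fri Dec 17 16:53:22 Reason: reboot_with_log : Rebooting local blade (global context database was modified) Type: configuration
"""


def parse_reboots(text, year):
    """Return (timestamp, reason, type) for every line matching LINE_RE."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a reboot entry
        # The log has no year, so the caller has to supply one (assumption).
        ts = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["reason"], m["type"]))
    return events


def reboot_intervals(events):
    """Seconds between consecutive reboot entries."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(events, events[1:])]


def looks_like_boot_loop(events, max_gap_s=600, min_reboots=3):
    """Hypothetical heuristic: several reboots, same reason, short gaps."""
    if len(events) < min_reboots:
        return False
    same_reason = len({reason for _, reason, _ in events}) == 1
    return same_reason and all(gap <= max_gap_s for gap in reboot_intervals(events))


if __name__ == "__main__":
    events = parse_reboots(SAMPLE_LOG, year=2021)  # year is an assumption
    print(reboot_intervals(events))
    print(looks_like_boot_loop(events))
```

Run against the sample, the gaps all come out at a little over five minutes, so the SGM is rebooting at about the same point in every boot.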

The SGM eventually starts and goes straight into 'Down' state:

[Expert@sg-ch01-03:0]# cphaprob stat
Cluster Mode:   HA Over LS

ID         Unique Address  Assigned Load   State          Name
1          192.0.2.1       50%             ACTIVE         sg-ch01-01
2          192.0.2.2       50%             ACTIVE         sg-ch01-02
3 (local)  192.0.2.3       0%              DOWN           sg-ch01-03

There appears to be an issue with the configuration:

[Expert@sg-ch01-03:0]# cphaprob -l list
Registered Devices:

Device Name: Configuration
Registration number: 11
Timeout: none
Current state: problem
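
For anyone who wants to watch for this state across reboots, here is a minimal sketch that pulls the devices in 'problem' state out of saved `cphaprob -l list` output. It assumes the output keeps the `Key: value` layout shown above. The function names are my own, not part of any Check Point tooling.

```python
SAMPLE_OUTPUT = """\
Registered Devices:

Device Name: Configuration
Registration number: 11
Timeout: none
Current state: problem
"""


def parse_devices(text):
    """Group 'Key: value' lines into one dict per 'Device Name' block."""
    devices = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and anything without a colon
        key, value = key.strip(), value.strip()
        if key == "Device Name":
            current = {"Device Name": value}
            devices.append(current)
        elif current is not None:
            # Lines before the first device (e.g. the header) are ignored.
            current[key] = value
    return devices


def problem_devices(text):
    """Names of all devices whose 'Current state' is 'problem'."""
    return [
        d["Device Name"]
        for d in parse_devices(text)
        if d.get("Current state") == "problem"
    ]


if __name__ == "__main__":
    print(problem_devices(SAMPLE_OUTPUT))
```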

asg diag verify shows only an issue with the MAC address verifier:

[Expert@sg-ch01-03:0]# asg diag verify
--------------------------------------------------------------------------------
| Tests Status                                                                 |
--------------------------------------------------------------------------------
| ID | Title              | Result     | Reason                                |
--------------------------------------------------------------------------------
| Networking                                                                   |
--------------------------------------------------------------------------------
| 19 | MAC Setting        | Failed (!) | (1)Inconsistent Firewall value and MA |
|    |                    |            | C Address                             |
--------------------------------------------------------------------------------

Output of mac_verifier -v:

[Expert@sg-ch01-03:0]# mac_verifier -v
--------------------------------------------------------------------------------
Collecting information from SGMs...
--------------------------------------------------------------------------------
Verifying FW1 mac magic value on all SGMs...
FW1 mac magic value on all SGMs:
Command completed successfully

Success
--------------------------------------------------------------------------------
Verifying IPV4 and IPV6 kernel values...
FW1 mac magic values are the same on SGM 1_01 for IPv4 and IPv6 kernels.
FW1 mac magic values are the same on SGM 1_02 for IPv4 and IPv6 kernels.
Success
--------------------------------------------------------------------------------
Verifying FW1 mac magic value in /etc/smodb.json...
FW1 mac magic value and /etc/smodb.json value are the same (1)
Success
--------------------------------------------------------------------------------
Verifying MAC address on local chassis (Chassis 1)...
-*- 2 blades: 1_01 1_02 -*-
BPEth0      MAC address of BPEth0 is correct

-*- 2 blades: 1_01 1_02 -*-
BPEth1      MAC address of BPEth1 is correct

MAC address inconsistency found on interface bond1 (FW1 value is different)
-*- 2 blades: 1_01 1_02 -*-
bond 00:1c:7f:xx:yy:zz

Nothing seems to solve this issue.
I tried re-attaching SGM3 to SG1 as well as running cpha_blade_config pull_config all 192.0.2.1, to no avail.

Guess what, I had the same issue before when I was adding SGM2 to SG1. I fixed it back then by rebooting SGM1 and SGM2 together. This time I'd like to avoid rebooting all SGMs at once, because it would cause a production outage and risks introducing other issues.

I've been able to set SGM3 to 'Active' with this trick:

cphaconf set_pnote -d Configuration -s ok report

After "solving" the configuration issue 😅 even the MAC address verifier no longer reports any issues and everything looks fine... until the next reboot.

Any ideas? @Laszlo_Csosza, @Lari_Luoma, @Jan_Irani, @Jochen_Hoechner, @Anatoly, @Tom_Kendrick, @Christian_Hofma

4 Replies
Tom_Kendrick
Employee

Hi Danny, ping me the SR # offline, and I will ask. I assume you checked sk170158?

Danny
Champion

sk170158 does not apply, as the management IP ends in .217.

I'll send you the SR # offline. I had the same issue when I added SGM2. CP sent us an RMA appliance back then. Of course that didn't help: it's a configuration issue, so the boot loop also appeared on the RMA appliance. I was then able to fix it by rebooting the entire SG.

Danny
Champion

Hmm, maybe mac_verifier can't yet handle being run on a 'Down' member in an Active-Active environment, and therefore shows an error that vanishes as soon as I set SGM3 to 'Active' with the trick shown above? In that case, the only issue left to debug and resolve is the boot-loop behavior on system start.

Danny
Champion

Issue solved. ITAC advised editing $FWDIR/boot/modules/fwkern.conf and moving the line nac_max_enforced_identities=90000 from the middle of the file to the end (sk176371). After rebooting all SGMs, the boot loop is gone.
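
In case it helps anyone who has to repeat this on several members, here is a rough sketch of the "move the line to the end" edit as a text transformation. It covers only what is described in this post, not the full procedure in sk176371, so treat the SK as the authoritative source and back up fwkern.conf before changing it. The `placeholder_*` parameters in the sample are made up for illustration and are not real kernel parameters.

```python
def move_param_to_end(conf_text, param):
    """Return conf_text with every '<param>=...' line moved to the end.

    The text is returned unchanged if the parameter is not present.
    """
    def is_param(line):
        return line.split("=", 1)[0].strip() == param

    lines = conf_text.splitlines()
    matching = [line for line in lines if is_param(line)]
    if not matching:
        return conf_text
    rest = [line for line in lines if not is_param(line)]
    # Drop trailing blank lines so the moved line really ends up last.
    while rest and not rest[-1].strip():
        rest.pop()
    return "\n".join(rest + matching) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents; only the nac_* line comes from the post.
    before = (
        "placeholder_a=1\n"
        "nac_max_enforced_identities=90000\n"
        "placeholder_b=2\n"
    )
    print(move_param_to_end(before, "nac_max_enforced_identities"), end="")
```

The function is idempotent, so running it on a file that already has the line at the end leaves the content unchanged.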
