Joe_Kanaszka
Advisor

Issues with MVC mode during R80.40 Take 173 to R81.20 Take 53 upgrade

Hey mates!

 

We have a two-node active/standby cluster running on a pair of 5100 appliances.  One security gateway node is on R81.20 Take 53 and the other node is on R80.40 Take 173.  

MVC is enabled on the R81.20 node.  We enabled MVC to allow us to test the new R81.20 OS in our environment before upgrading the other node.  We had only planned to be in MVC mode for a short while, with the intention of upgrading the remaining node the following week.
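
For reference, the MVC state on a member is checked from expert mode; the command below reflects my understanding of the ClusterXL tooling (a sketch from memory, so please verify against the admin guide for your version):

    # Show whether the Multi-Version Cluster (MVC) mechanism is enabled on this member
    cphaprob mvc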

This upgrade initially occurred at the beginning of April.  Due to an emergency medical issue, I needed to take a month off and am just now getting back to work. 

Shortly after the upgrade, we noticed the R81.20 node was rebooting on its own, and afterward the cluster would fail over to the R80.40 node.  After this happened a few times, we decided to leave the R80.40 node as the active node, and I would look at the issue when I came back to the office.
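
For reference, this is roughly how I have been confirming the reboots and failovers from expert mode on the members (command names as I recall them, so treat this as a sketch rather than an exact transcript):

    # Current state of both cluster members (Active / Standby / Down)
    cphaprob state

    # Recent failover history, if your version supports it
    cphaprob show_failover

    # Kernel crash dumps left behind after each unexpected reboot
    ls -lh /var/log/crash/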

After opening a ticket with TAC and having them examine my kernel crash files in /var/log/crash for the R81.20 node, they made this determination:

******************************************************************************

Primarily, the issue is happening because the members are on different versions. The stack trace generated matched a previous internal task, TM-63720, which identifies the crash as the result of a change in table format.

The kernel table in question is: SEP_my_IKE_packet_gtid.
The table values were changed from {local ip, peer ip} to {local id, peer ip} between versions R80.40 and R81.20.

The issue occurs when trying to sync the table between R80.40 and R81.20:
we receive the owner member as an IP and not as an ID.

Recommendation:

Please make sure that both members are running R81.20; that will resolve the issue.

********************************************************************
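
For anyone who wants to look at the same table on their own members, something along these lines should display it (my own sketch based on the table name in TAC's note; I have not compared the output against TM-63720):

    # Summary of the SEP_my_IKE_packet_gtid kernel table (number of entries, limits)
    fw tab -t SEP_my_IKE_packet_gtid -s

    # Formatted dump of the entries, to compare the key format between the two members
    fw tab -t SEP_my_IKE_packet_gtid -f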

I understand TAC's determination, and I agree with it.

However, the whole point of MVC mode in this case was to be able to test the new OS, which I really can't do.  The reboots are too frequent (sometimes daily).  The R81.20 node reboots even while in standby mode.

What are our options if we want to test before upgrading the other node?

Can we disable MVC?  Will this break the cluster and cause weirder things to happen?
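
My understanding, which I would like confirmed, is that it would be something along these lines from expert mode on the R81.20 member (untested on my side, so only a sketch):

    # Disable the Multi-Version Cluster mechanism on this member
    cphaconf mvc off

    # Re-enable it later if we decide to go back
    cphaconf mvc on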

Can we manually edit the above-referenced table causing the issue on the R80.40 node, thereby getting rid of the sync issue?

I suppose we can break the cluster and simply run on the new node, but this makes me nervous.  I really would rather not do this.

 

 

Thanks, everyone!  Interested in hearing your thoughts.

 

Joe 

 

Adam276
Contributor

@emmap wrote:

I'm not sure what kind of scenario that statement is talking about, but it might be warning that if MVC is enabled before you do the initial policy install, the gateway will immediately join the cluster and may take over the active role if the cluster is configured that way. Disabling MVC would prevent this takeover if the other cluster member is still on a lower version.


It seems like two different terms are being used, and it has me a bit confused about which meaning is intended by your "first policy".  The documentation uses "initial policy" install, which to me means there exists no policy at all on the member yet that would have put it into a cluster (fresh install, etc.).  To me, "first policy install" could mean the first policy install after an upgrade of a member that previously had policy installed.  I assume that is open to interpretation, though.

The way I interpreted the document is that if, for any reason, you don't want a cluster member to go active *after* an initial policy install (going active before the initial policy seems possible only if there has never been a policy installed configuring it into a cluster), then you can disable MVC beforehand, before the initial policy install.  With no policy ever pushed to a member before, I don't see how it could join an existing cluster (I hope that wouldn't ever be possible anyway).

I am just trying to clarify this to make sure my understanding is accurate.  I am not sure whether your intended meaning of "first policy" is the same as the documentation's "initial policy".

Different scenarios...

1. Upgrading an existing secondary HA member with an HFA or a major version.
2. Installing fresh, or putting a freshly installed member (no policy yet) in place.

It seems to me the documentation is referencing the second scenario.  Am I interpreting this all correctly?
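
If it is the second scenario, my rough reading of the order of operations would be something like this (just how I read the docs, not something I have tested):

    # On the freshly installed member, before the first policy install:
    cphaconf mvc off       # keep it from taking over once policy is installed

    # Install policy from management, then confirm the member stays in the expected state:
    cphaprob state

    # Enable MVC only when you actually want it syncing with the lower-version member:
    cphaconf mvc on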

the_rock
Legend

Glad you asked for clarification, but logic would indicate that's exactly what it is...

Andy

Joe_Kanaszka
Advisor

Cool.  Thank you.
