Solved: Re: ClusterXL different hardware

Andy_N · ‎2020-11-18

Hello,

Does anybody have experience bringing a failed cluster node back online on different hardware?

E.g. node 1 (online) has 2 cores per CPU and 5 network ports; the new node has 10 cores per CPU and 4 network ports.

We don't know the exact hardware configuration of the failed node.

It was a 5200 appliance, but we have a Dell PE R630 server and want to bring it online as the second node.

The cluster runs in HA mode.

Regards

G_W_Albrecht · ‎2020-11-18

sk93306: ATRG: ClusterXL

In order to avoid unexpected behaviour, ClusterXL is supported only between machines with identical CPU characteristics.

In addition, in order to avoid unexpected fail-overs due to issues with CCP packets on cluster interfaces, it is strongly recommended to pair only identical physical interfaces as cluster interfaces.

Number of CoreXL FW instances: number of instances on all members must be identical.
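
The three requirements quoted above can be compared mechanically before a replacement member is added. This is a minimal sketch in Python, assuming the facts for each member were collected by hand (for example from the output of `fw ctl multik stat` and `cphaprob -a if`). The `Member` class, its field names, the interface names, and the CoreXL instance counts in the sample are hypothetical; only the core and port counts come from this thread. Nothing here is a Check Point API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hand-collected facts about one cluster member (hypothetical structure)."""
    name: str
    cpu_cores: int
    corexl_instances: int
    interfaces: frozenset


def compatibility_issues(a: Member, b: Member) -> list:
    """Return a human-readable list of mismatches between two members."""
    issues = []
    if a.cpu_cores != b.cpu_cores:
        issues.append(f"CPU cores differ: {a.name}={a.cpu_cores}, {b.name}={b.cpu_cores}")
    if a.corexl_instances != b.corexl_instances:
        issues.append(
            f"CoreXL instances differ: {a.name}={a.corexl_instances}, "
            f"{b.name}={b.corexl_instances}"
        )
    only_a = a.interfaces - b.interfaces
    only_b = b.interfaces - a.interfaces
    if only_a:
        issues.append(f"interfaces only on {a.name}: {sorted(only_a)}")
    if only_b:
        issues.append(f"interfaces only on {b.name}: {sorted(only_b)}")
    return issues


# Sample values: core and port counts follow the original question;
# CoreXL instance counts and interface names are assumptions for illustration.
node1 = Member("node1", 2, 2, frozenset({"eth0", "eth1", "eth2", "eth3", "eth4"}))
node2 = Member("node2", 10, 8, frozenset({"eth0", "eth1", "eth2", "eth3"}))

for issue in compatibility_issues(node1, node2):
    print(issue)
```

An empty list only means the compared fields match; it does not by itself make a mixed-hardware cluster a supported configuration.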

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist

Bob_Zimmerman · ‎2020-11-18

For sync to work, the CoreXL config must match between the two nodes. That's the only real, functional concern, though. I've done this dozens of times.

Each system should ideally have at least as many interfaces as you are actually using. It's possible to define an interface only on one member, if you're willing to accept that traffic not working when you run on the other member.
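
To make the trade-off concrete, here is a rough sketch in Python of which in-use interfaces would carry no traffic while each member is active. The function name, member names, and interface names are hypothetical; the 5-port and 4-port counts mirror the original question.

```python
def unreachable_when_active(in_use, member_interfaces):
    """Map each member name to the in-use interfaces that member lacks."""
    return {
        name: sorted(set(in_use) - set(ifaces))
        for name, ifaces in member_interfaces.items()
    }


# Hypothetical layout: five interfaces in use, the new member has only four.
in_use = ["eth0", "eth1", "eth2", "eth3", "eth4"]
members = {
    "node1": ["eth0", "eth1", "eth2", "eth3", "eth4"],
    "node2": ["eth0", "eth1", "eth2", "eth3"],
}

print(unreachable_when_active(in_use, members))
# prints {'node1': [], 'node2': ['eth4']}
```

In this example, anything behind eth4 would be unreachable after a failover to node2.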

I haven't personally tried clustering between an untagged interface on one firewall and a tagged interface on another, but I don't know of a reason it definitely would not work. That might let you get around the interface count concern above.

NOTE: there is a difference between "will work" and "is supported". Doing weird things like this will probably work, but the TAC may not be able to help with problems. As a result, I would not run a cluster like this for long.

PhoneBoy · ‎2020-11-18

One other factor to consider is whether this is R80.40/R81 with Dynamic Workloads enabled.
In that situation, the cluster members will definitely have different CoreXL configs (it's dynamic, after all), but sync still works (assuming the same CPU characteristics).

No question this is an unsupported configuration.
I would do thorough testing on this prior to even considering doing this in production.

_Val_ · ‎2020-11-19

Dynamic Workloads leverages Multi-Queue for both HW and SW interrupts. That allows the FW to maintain a constant number of instances, with a variable number of CPUs serving them. For that reason, Dynamic Workloads does not cause any clustering issue, but it still works only with exactly the same number of CPUs on all cluster members.

Heather_Lewis · ‎2022-10-05

I have a client with a 15400 cluster (R80.40) looking to refresh to a 16200 cluster, replacing one member at a time to avoid downtime. So at one point the cluster will have one 15400 member and one 16200 member. Is this something you would comfortably recommend?

PhoneBoy · ‎2022-10-05

I would never recommend that anyone attempt to cluster two different hardware types together, even if they have the same core count.
Perhaps you can get it to work, but as I said elsewhere in this thread, I would thoroughly test it in the lab before even attempting to do it in production.

Andy_N · ‎2020-11-18

Thanks for the answer.

E.g. if I restrict CoreXL on the new node to 2 cores, will the cluster work?

Actually, when I tried to bring the new node online (without matching CoreXL between the 2 nodes), the new node went into a reboot loop with no login prompt after policy was installed.

The only fix was to boot the node into maintenance mode and disable cluster membership.

Wolfgang · ‎2020-11-18

You are spending so much time trying to get an unsupported solution running. I would prefer to get a replacement for the failed node.

In the meantime you can use your open server as a cold standby. Copy your configuration from one node to the other. In case of a failure you can switch to the new node, reset SIC, and you have a running node.

Wolfgang

Andy_N · ‎2020-11-18

Hello, Wolfgang

We are thinking about it, and that is probably the right way.

You're right, so much time has been spent…

Thanks everyone for answers

Regards

_Val_ · ‎2020-11-19

As already mentioned, you have to run two identical machines in a cluster. Anything else is an unsupported config and most probably will not work. This covers not just the number of CPUs but other HW as well.
