Alex_Birkovsky
Participant

ClusterXL Different Subnet Configuration

Hi,

I'm trying to do a test upgrade from an R77 ClusterXL on SecurePlatform to R80.10 Gaia. I've upgraded the Management Server and set up two new gateways with R80.10. The Management Server imported all the old rules and pushes them successfully to the new gateways. My problem is that I can't get routing to work properly once I make the test platform live. My network config is as follows (some IPs are changed):

Firewall IP from provider:              111.111.251.26

Firewall Gateway from provider:   111.111.251.25

Internal Network:    111.111.74.0/24

Internal Gateway IP: 111.111.74.1

Sync: 10.0.0.1 and 10.0.0.2

I've set up the firewall IP (111.111.251.26) as a virtual IP between the two cluster members, whose own addresses are 10.10.10.1 and 10.10.10.2. I've set up the internal gateway as a virtual IP (111.111.74.1) on 111.111.74.3 and 111.111.74.4.
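The external side is the "cluster addresses on different subnets" case, while the internal side is not. A minimal Python sketch with the standard ipaddress module makes this visible; the TOPOLOGY dictionary and the function name are illustrative only, filled with the (partly changed) addresses from this post:

```python
import ipaddress

# Illustrative only: interface -> (cluster VIP, member addresses with masks).
TOPOLOGY = {
    "eth0": ("111.111.251.26", ["10.10.10.1/24", "10.10.10.2/24"]),
    "eth1": ("111.111.74.1", ["111.111.74.3/24", "111.111.74.4/24"]),
}


def vip_on_member_subnet(vip: str, members: list[str]) -> bool:
    """True if the VIP lies inside every member's own interface subnet."""
    addr = ipaddress.ip_address(vip)
    return all(addr in ipaddress.ip_interface(m).network for m in members)


for ifname, (vip, members) in TOPOLOGY.items():
    if vip_on_member_subnet(vip, members):
        print(f"{ifname}: VIP {vip} is on the members' subnet")
    else:
        print(f"{ifname}: VIP {vip} is on a different subnet")
```

Running it reports eth0 as "different subnet" and eth1 as "members' subnet", which is why only the external side needs an extra route toward the VIP network.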

On both Gateway Servers in the GUI I set the IPv4 routing as follows:

Default   111.111.251.25 eth0

Static 111.111.74.0/24 LOCAL eth1

Without clustering the firewall works fine, but with ClusterXL enabled the routing fails and I'm not sure where. I tried copying the routing tables from the R77 members, but routing still doesn't work when I turn the old boxes off and plug the new ones in. I thought it could be the ARP cache, so I tried clearing it on the main switch and the firewall, but that didn't resolve it. I also tried spoofing the MAC addresses of the old servers on the new ones.

Any clues on where I'm going wrong with this would be appreciated!

Thank you!

Daniel_Taney
Advisor

Does the output of cphaprob -a if reveal any issues where it thinks certain interfaces are down?

Also, how does cphaprob stat report the cluster health when the issue arises?


Is the topology defined correctly in R80.10 where the ISP LAN and Internal LAN Interfaces are defined as Cluster Interfaces and the Sync is explicitly defined as a Sync interface?

R80 CCSA / CCSE
Alex_Birkovsky
Participant

cphaprob -a if shows all interfaces as up on both gateways, along with the virtual IPs:

eth0 UP non sync(non secured), multicast

eth1 UP non sync(non secured), multicast

eth2 UP sync(secured), multicast

Virtuals

eth0 111.111.251.26

eth1 111.111.74.1

cphaprob state shows both cluster members, one Active and one Standby (the opposite on the second gateway):

1  (local) 10.0.0.1  100% Active

2             10.0.0.2  0%     Standby

I believe the cluster properties in Management are set up correctly:

eth0       External         111.111.251.26/24       10.10.10.1/24       10.10.10.2/24

eth1       This Network 111.111.74.1/29           111.111.74.3/24   111.111.74.4/24

(Hmm, this could be wrong, as I think it should be /24. I'm not sure how it ended up as /29.)

eth2       This Network  Sync      10.0.0.1/24    10.0.0.2/24

Even with that /29, I still think it's strange that the active gateway can't ping or reach anything outside.
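The /29 versus /24 discrepancy on eth1 is easy to confirm with Python's ipaddress module. The helper below is hypothetical, not a Check Point tool. Note that both member addresses still fall inside 111.111.74.0/29, which may be why correcting the mask alone did not change the behaviour:

```python
import ipaddress


def mask_mismatch(vip_with_mask: str, member_with_mask: str) -> bool:
    """True if the VIP and a member on the same side use different prefix lengths."""
    vip = ipaddress.ip_interface(vip_with_mask)
    member = ipaddress.ip_interface(member_with_mask)
    return vip.network.prefixlen != member.network.prefixlen


vip_net = ipaddress.ip_interface("111.111.74.1/29").network
print(vip_net)                                              # 111.111.74.0/29
print(mask_mismatch("111.111.74.1/29", "111.111.74.3/24"))  # True
# Both members still sit inside the smaller /29 (111.111.74.0 to 111.111.74.7).
print(all(ipaddress.ip_address(m) in vip_net
          for m in ("111.111.74.3", "111.111.74.4")))       # True
```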

Thank you for the help!

Daniel_Taney
Advisor

I agree. I think the /29 should be fixed if it is supposed to be a /24. 

You're using the same Policy for R77 and R80.10, right?

If you try to ping something upstream of the Firewall, do you see ARP complete? That would help rule out whether this is somehow a Layer 2 issue.

R80 CCSA / CCSE
Alex_Birkovsky
Participant

I changed the /29 to /24 but that didn't help.

The original cluster was imported from the R77 export, and I'm using all the policies from it. They're very basic. I tried removing and recreating the cluster in the management console and installing the policy, but still no go.

I'm not sure how to check whether ARP completes. I can ping the firewall gateway and everything on the local network, but that's it.

Thanks!

Maarten_Sjouw
Champion

Check on your active member whether you get anything with the command:

   fw ctl arp

This should give you the proxy arp for the VIP.

Did you enable VMAC on the ClusterXL page as well? This will always improve failover when it occurs, as there is no need for gratuitous ARPs.

Regards, Maarten
Alex_Birkovsky
Participant

Thanks Maarten! Running "fw ctl arp" always says "No proxy ARP entries". I tried the command even when everything started to work as I mentioned to Aleksei in the thread.

I ran the command when the firewall was working (for some reason) and it also said "No proxy ARP entries". I didn't have the VMAC enabled but I tried enabling it after it stopped working and it didn't make a difference.

Maarten_Sjouw
Champion

Try adding this command to both members in clish:

Member 1

add arp proxy ipv4-address 111.111.251.26 macaddress <vmac (get it with cphaprob -a if) or real mac of node> real-ipv4-address 10.10.10.1

Member 2

add arp proxy ipv4-address 111.111.251.26 macaddress <vmac (get it with cphaprob -a if) or real mac of node> real-ipv4-address 10.10.10.2

This makes sure the ARP entry for the IP is set in any case.

Regards, Maarten
AlekseiShelepov
Advisor

Have you read Configuring Cluster Addresses on Different Subnets?

You need to create a scopelocal route for the external interface.

In CLI:

set static-route 111.111.251.26 nexthop gateway logical eth0 on
set static-route 111.111.251.26 scopelocal on

The same can be done in the web interface; there is a checkbox for scopelocal.

Alex_Birkovsky
Participant

Thanks Aleksei! I've seen that document mentioned in a few places, including the R80 manual. Unfortunately our license only covers software updates, so we don't have access to that solution. We are considering renewing the license with support at the end of the year, but until then I'm stuck. (I did manage an in-place upgrade from R65 to R77, though I had to jump through some hoops.)

I tried adding the external interface with scopelocal as you suggested, and it seemed to start working. The command requires a subnet mask on Gaia, so I set it to /31. It took a few minutes for some of the external IPs to respond and for traffic from the outside to be routed properly, but it did work! So I decided to try a failover. It never worked again, on either cluster member.

I'm at a bit of a loss as to how that's even possible, as I didn't change any settings. I tried rebooting both cluster members and pushing the policies, but I could never get it to work again. As before, I could only get as far as pinging my 111.111.251.25 gateway or internal hosts from either cluster member.

AlekseiShelepov
Advisor

Sorry, I didn't double-check the exact commands just now, but I have definitely used a scopelocal route for the same case before.

I suppose you then have to use the same network for scopelocal as you defined in the policy:

eth0       External       111.111.251.26/24       10.10.10.1/24       10.10.10.2/24

I don't think you can really use a /31 mask for that; /30 at least.
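The /31 concern can be checked arithmetically. In this Python sketch (standard ipaddress module; the helper name is made up), 111.111.251.26/31 spans only .26 and .27, so it does not contain the provider gateway 111.111.251.25, whereas the /30 and /29 networks do. That is one plausible reason the /31 route was unreliable, though the thread does not confirm it:

```python
import ipaddress

VIP = "111.111.251.26"      # cluster VIP from the thread
GATEWAY = "111.111.251.25"  # provider next hop from the thread


def covers(prefix: str, addr: str) -> bool:
    """True if addr falls inside the network prefix."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(prefix)


for prefix in ("111.111.251.26/31", "111.111.251.24/30", "111.111.251.24/29"):
    net = ipaddress.ip_network(prefix)
    print(f"{prefix}: {net.network_address}-{net.broadcast_address}, "
          f"VIP inside: {covers(prefix, VIP)}, gateway inside: {covers(prefix, GATEWAY)}")
```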

Here is a part of the SK:

  • Configure the relevant static route for the cluster network via member's interface:

    HostName:0> set static-route 172.16.6.0/24 nexthop gateway logical <Name_of_Relevant_Interface_on_Side_A> on

    HostName:0> save config

  • Set the scopelocal attribute on the new static route for the cluster network via member's interface:

    Note: Currently setting this attribute is supported only via Clish.

    HostName:0> set static-route 172.16.6.0/24 scopelocal on

    HostName:0> save config

  • Verify that the scopelocal attribute was set:

    [Expert@HostName]# cat /etc/routed.conf

  • Now, it will be possible to route traffic using the new static route for the cluster network via member's interface.

    User should define the desired static route in the following way:

    HostName:0> set static-route <DESIRED_NETWORK_ADDRESS/MASK> nexthop gateway address <IP_ADDRESS_OF_NEXT_HOP_ON_CLUSTER_VIP_NETWORK> on

    HostName:0> save config

  • Verify that the static route to cluster VIP was added to Gaia OS kernel:

    HostName:0> show route

    [Expert@HostName:0]# netstat -rn

As I understand, you should have the following routes for it to work (on both nodes of the cluster, of course):

111.111.251.26/24 via eth0 (scopelocal)

111.111.74.0/24 connected to eth1

default via 111.111.251.25 (not via eth0)

And don't forget to save config.
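Why the default route needs a more specific route toward 111.111.251.25 can be modelled with longest-prefix matching. The following Python sketch is a simplified, hypothetical model of one member's routing table (it does not read Gaia's configuration). It uses 111.111.251.24/29, the network that eventually worked later in this thread, rather than the /24 written above:

```python
import ipaddress

# Hypothetical model of one member's routes: (destination prefix, how it is reached).
ROUTES = [
    ("111.111.251.24/29", "dev eth0 (scopelocal)"),
    ("111.111.74.0/24", "dev eth1 (connected)"),
    ("0.0.0.0/0", "via 111.111.251.25"),
]


def lookup(dest: str, routes: list[tuple[str, str]]) -> str:
    """Return the matching route with the longest prefix, as IP forwarding does."""
    addr = ipaddress.ip_address(dest)
    matches = [(ipaddress.ip_network(p), how) for p, how in routes
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return "no route"
    net, how = max(matches, key=lambda m: m[0].prefixlen)
    return f"{net} {how}"


def gateway_on_link(gateway: str, routes: list[tuple[str, str]]) -> bool:
    """In this model: True if the next hop is covered by a non-default route."""
    addr = ipaddress.ip_address(gateway)
    return any(addr in net and net.prefixlen > 0
               for net in (ipaddress.ip_network(p) for p, _ in routes))


for dest in ("111.111.251.25", "111.111.74.50", "8.8.8.8"):
    print(dest, "->", lookup(dest, ROUTES))
print("gateway on link:", gateway_on_link("111.111.251.25", ROUTES))
print("without the scopelocal route:", gateway_on_link("111.111.251.25", ROUTES[1:]))
```

Dropping the first entry leaves 111.111.251.25 reachable only through the default route that points at itself, which is the situation the scopelocal route is meant to avoid.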

Alex_Birkovsky
Participant

Thanks for the suggestions, I'll give them a try! I'm also going to try a fresh install of the management server and gateways without importing the R77 export, just to see if it makes a difference, although the firewall and routing work fine with the imported rules on a single gateway without clustering. My biggest issue is that I can't test this during business hours, so I have very little time to experiment when I make the switch for testing.

I can't seem to set the 111.111.251.26/24 route. It only allows a /31 or /32; otherwise both the GUI and clish complain that the subnet doesn't match the IP:

"IPv4 unicast netmask check fails: Host bits are not set to zero in 111.111.251.26/24"

If I try without specifying a subnet in clish it asks for a "valid IPv4 address/netmask pair".
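The error is the usual "host bits must be zero" rule for a route destination. Python's ipaddress module applies the same rule, which makes it a convenient way to compute a valid destination; the helpers below are illustrative only. The output also suggests why /31 and /32 were accepted: .26 is an even address, so it is already the network address of a /31.

```python
import ipaddress


def host_bits_set(addr_with_mask: str) -> bool:
    """True if the address is not the network address for its mask."""
    iface = ipaddress.ip_interface(addr_with_mask)
    return iface.ip != iface.network.network_address


def route_destination(addr_with_mask: str) -> str:
    """Zero the host bits to get a destination that a strict netmask check accepts."""
    return str(ipaddress.ip_network(addr_with_mask, strict=False))


for candidate in ("111.111.251.26/24", "111.111.251.26/29",
                  "111.111.251.26/31", "111.111.251.26/32"):
    print(candidate, "host bits set:", host_bits_set(candidate),
          "->", route_destination(candidate))
```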

I tried 111.111.251.0/24 (local) and the routing didn't work, although I also wasn't able to get the earlier setup working again after the failover test. I also have to set 111.111.74.0/24 as scopelocal, or it stops routing traffic internally.

When it did work for a short time, I had the following routes on the gateway. I wrote them down thinking I could always go back to them if I made any changes, but they didn't work again.

Default 111.111.251.25 eth0

111.111.74.0/24   eth1 scopelocal

111.111.251.26/31 eth0 scopelocal

On the current, working R77 ClusterXL I see the following route:

Destination         Gateway         Genmask       

111.111.251.24      *                  255.255.255.248

But I'm unable to add that manually on the cluster gateways. This route shows up automatically on the non-clustered R80 gateway when I set the IPs during the installation process.
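The R77 entry uses a dotted genmask, while Gaia's set static-route takes CIDR notation. A quick conversion with Python's ipaddress module (the helper name is hypothetical) shows that destination 111.111.251.24 with genmask 255.255.255.248 is 111.111.251.24/29:

```python
import ipaddress


def to_cidr(destination: str, genmask: str) -> str:
    """Convert a route-table destination plus genmask pair to CIDR notation."""
    return str(ipaddress.ip_network(f"{destination}/{genmask}"))


print(to_cidr("111.111.251.24", "255.255.255.248"))  # 111.111.251.24/29
```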

Thanks again for taking the time to help!!!

Alex_Birkovsky
Participant

I just added the route from the old firewall as scopelocal, and the command was accepted. Now I just have to try it.

set static-route 111.111.251.24/29 nexthop gateway logical eth0 on

AlekseiShelepov
Advisor

In my previous comment I wrote:

As I understand, you should have the following routes for it to work (on both nodes of the cluster, of course):

111.111.251.26/24 via eth0 (scopelocal)

and in the SK it is said:

Set the scopelocal attribute on the new static route for the cluster network via member's interface:

Note: Currently setting this attribute is supported only via Clish.

HostName:0> set static-route 172.16.6.0/24 scopelocal on

and the error in your case says:

"IPv4 unicast netmask check fails: Host bits are not set to zero in 111.111.251.26/24"

So it seems there must be a proper network address in this scopelocal route. I strongly recommend using the same mask on the external interface, in the cluster object settings in the policy, and in the scopelocal route.

That was my mistake, either from copying or from not checking carefully. I can't test it right now, but you can try the same on virtual machines before switching the production devices.

At the same time, from your explanations, I don't see any reason for you to configure a scopelocal route for the internal interface.

Could you share the following information (censoring IP addresses and names):

  • Screenshot of cluster topology from SmartDashboard
  • cphaprob stat
  • cphaprob -a if
  • Routing and interfaces settings from show configuration
  • route -n
  • Part of cat /etc/routed.conf with your static routes
  • Can you ping external gateway and internal host from the active gateway of the cluster?

Again, from the SK:

(2) Procedure

There are two major steps required in order for ClusterXL to function correctly with cluster IPs on different subnets:

  1. The first step is to create static routes on each cluster member, which determine the interface connected to the cluster's network (the subnet, to which the cluster IP belongs). Unless these entries are created, the OS cannot route packets to the cluster's network. No additional configuration is required for the cluster members. It is, however, important to note that the unique IP addresses given to the members must share common subnets on each "side" of the cluster (meaning, each interface on each machine must have an interface on every other machine using the same subnet).

    Note: Configuring the static route is not needed in these cases:

    • On SecurePlatform OS Security Gateway with enabled Advanced Dynamic Routing (GateD daemon will add the route to cluster VIP network when the member's interface comes up).
    • On Gaia OS Security Gateway in VSX mode (this is done automatically when configuring routes in SmartDashboard).
  2. The second step relates to the configuration of the cluster topology (followed by the policy installation). Here, the cluster IP addresses are determined, and associated with the interfaces of the cluster members (each member must have an interface responding to each cluster IP address). Normally, cluster IP addresses are associated with an interface based on a common subnet. In this case, these subnets are not the same. It must be explicitly specified, which member subnet is associated with the cluster IP address.
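Step 1's requirement, that the members' own addresses share a subnet on each side, can be checked with a small Python sketch. The SIDES dictionary is illustrative and holds the member addresses from the topology posted earlier in the thread:

```python
import ipaddress

# Illustrative: member addresses per cluster side, from the topology posted earlier.
SIDES = {
    "eth0": ["10.10.10.1/24", "10.10.10.2/24"],
    "eth1": ["111.111.74.3/24", "111.111.74.4/24"],
    "eth2 (sync)": ["10.0.0.1/24", "10.0.0.2/24"],
}


def members_share_subnet(addrs: list[str]) -> bool:
    """True if all member interfaces on one side resolve to the same network."""
    return len({ipaddress.ip_interface(a).network for a in addrs}) == 1


for side, addrs in SIDES.items():
    print(side, "members share a subnet:", members_share_subnet(addrs))
```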

Maarten_Sjouw
Champion

What I still don't understand is: if you do have a /29, why do you need this trick? Are there other hosts in this network that you cannot move anywhere else?

A /29 has 6 usable addresses: 3 for the firewalls and 1 or 2 for your routers, so I really don't understand.
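The address count can be checked with Python's ipaddress module. The allocation below is hypothetical: only the provider gateway (.25) and the VIP (.26) come from the thread, while the member addresses .27 and .28 are made up to illustrate putting everything in one /29:

```python
import ipaddress

net = ipaddress.ip_network("111.111.251.24/29")
usable = [str(h) for h in net.hosts()]  # excludes the network and broadcast addresses
print(len(usable), usable)

# Hypothetical plan: the members get public addresses in the same /29 as the VIP.
plan = {
    "provider gateway": "111.111.251.25",  # from the thread
    "cluster VIP": "111.111.251.26",       # from the thread
    "member 1": "111.111.251.27",          # hypothetical
    "member 2": "111.111.251.28",          # hypothetical
}
assert all(addr in usable for addr in plan.values())
spare = [addr for addr in usable if addr not in plan.values()]
print("spare:", spare)
```

With the members on the VIP's own subnet, the cluster would no longer be in the "different subnets" case at all.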

Regards, Maarten
Alex_Birkovsky
Participant

Got it working! (:

Aleksei, I applied the settings you recommended: removed "eth0" from the default route via 111.111.251.25, removed "scopelocal" from 111.111.74.0/24, and added scopelocal on the external 111.111.251.24/29. I failed over between the gateways a few times and everything worked.

The one thing I noticed, and did differently from every other attempt, was rebooting after route changes. Changing the default route without a reboot sometimes broke the local network responses from the gateways. I'm not sure why that would happen, but I noticed it on multiple occasions while testing, and it was quite confusing.

Maarten, the reason I'm using a /29 is that it's the netmask I was given for the firewall gateway twelve years ago. I think they allocated more IPs in case I needed them. Perhaps I could have just requested another IP for the gateway and had the upstream team configure the rest instead of messing with this, but when it worked last time I just let it run.

Thank you for your help!

Chinmaya_Naik
Advisor

Hi Alex Birkovsky,

Can you please explain in more detail?

In your last update you said you removed the "eth0" interface, which is your external interface. So what is the default gateway now?

Removing scopelocal from the internal interface 111.111.74.0/24 is OK.

#Chinmaya Naik
