Justin_Hickey
Collaborator

Subinterface down, results in failover

Having firewall failovers about once a day. The logs show subinterfaces within the DMZ physical interfaces going down. It's not always the same interface that is reported as down. I'm unsure how a virtual interface can report as physically down or missing. So far I've seen no issues in the limited poking around I've done on the Cisco switch side. Any help is appreciated.

1 Solution

Accepted Solutions
Marco_Valenti
Advisor

Issue cphaprob stat from SSH, and it will show the cluster mode you need. BTW, I would point my finger at the layer 2 configuration of both switches and check whether IGMP snooping is on. :)

 cphaprob stat

Cluster Mode:   High Availability (Primary Up) with IGMP Membership

Number     Unique Address  Assigned Load   State

1 (local)                        100%            Active
2                                     0%              Standby


20 Replies
Marco_Valenti
Advisor

How many VLANs are configured on that interface? Is it a single interface or a bond? Is the gateway R77.30, and with any jumbo hotfixes installed?

Are you on an HA cluster with CCP in multicast or broadcast mode? Most of the time IGMP snooping is enabled on the switch side, with the result that multicast packets get dropped.
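
On the Cisco side, IGMP snooping status can be checked (and, if you decide to, disabled) per VLAN. A sketch, with VLAN 206 as a placeholder ID, assuming IOS-style syntax:

switch# show ip igmp snooping vlan 206
switch(config)# no ip igmp snooping vlan 206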

Justin_Hickey
Collaborator

There are 2 independent DMZ trunks, each with 5 subinterfaces. The gateway is R80.10 with Take 35.

Not sure where I check that HA cluster mode. Could you point me in the right direction?

Marco_Valenti
Advisor

Issue cphaprob stat from SSH, and it will show the cluster mode you need. BTW, I would point my finger at the layer 2 configuration of both switches and check whether IGMP snooping is on. :)

 cphaprob stat

Cluster Mode:   High Availability (Primary Up) with IGMP Membership

Number     Unique Address  Assigned Load   State

1 (local)                        100%            Active
2                                     0%              Standby

PhoneBoy
Admin

FYI, if it's the lowest VLAN that "goes down" then the whole interface will report as down.

That's by design.
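
To see which interfaces ClusterXL is actually monitoring (and the required number of good interfaces), you can run the following on a member; the exact output format varies by version:

[Expert@HostName]# cphaprob -a if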

Hugo_vd_Kooij
Advisor

Shouldn't the default settings check both lowest and highest VLAN?

<< We make miracles happen while you wait. The impossible jobs take just a wee bit longer. >>
Norbert_Bohusch
Advisor

Yes, it's both lowest and highest. It's been the case since, I don't know exactly, but I assume R76. :)

Justin_Hickey
Collaborator

R80.10 Take 35

Justin_Hickey
Collaborator

Thanks for all the responses. IGMP snooping is indeed on. I'm asking the network team to disable snooping on the DMZ switch as a whole. They have concerns it might impact something else in the switches. I honestly don't know what benefit, if any, IGMP snooping might have in standalone DMZ switches.

Marco_Valenti
Advisor

Well, if you use multicast in any way it could be, but I really don't think so; any application should register to its multicast group, so that won't be an issue. But you should know that you can switch CCP packets to use broadcast mode instead of multicast mode. This will increase traffic on your switch a LOT, I mean really a lot, if you have tons of interfaces configured in ClusterXL.

Norbert_Bohusch
Advisor

The issue could be the following:

If the sub-interface which is seen down in the logs is either the lowest or highest of the trunk, and there is no traffic on this subnet besides the two firewall nodes, then a policy install on this cluster could lead to such behavior.

Normally the firewall nodes reply to each other and so see the interface as up, but during the policy commit the nodes reply too slowly, and as there is really no traffic seen on the interface at that moment, the gateway declares the interface down!

To mitigate this, interfaces without hosts behind them should be changed to non-monitored ones!
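
As a sketch of that mitigation (the file name and exact procedure should be verified against Check Point's documentation for your version): interfaces can be excluded from ClusterXL monitoring by listing them, one per line, in $FWDIR/conf/discntd.if on each cluster member:

[Expert@HostName]# echo "eth2.100" >> $FWDIR/conf/discntd.if

Here eth2.100 is a placeholder for the subinterface to stop monitoring; clustering has to be restarted on the member for the change to take effect.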

Justin_Hickey
Collaborator

We do have DMZ interfaces (wired guest subnets) that could have no traffic on them for extended periods of time. No idea how I would mitigate this; they need to be up all of the time even if there is no traffic.

Timothy_Hall
Legend

The ClusterXL "Interface Active Check" can fail, and an interface be declared down, even if the firewalls can see each other's CCP traffic consistently across that interface; this situation can occur if there is not at least one other pingable host located on the interface. When ClusterXL notices that only the firewalls and their associated CCP traffic are present on an interface (because there are no ARP entries present for any other hosts on that interface), the cluster members will begin probing that interface's VLAN with ping scans, trying to locate at least one responding host. The first time I saw this ping scan behavior it was quite unsettling, as I thought there was some kind of compromise in the network. This ClusterXL probing behavior is mentioned in item #3 here:

sk43984: Interface flapping when cluster interfaces are connected through several switches

What ClusterXL is looking for in this instance is a VLAN misconfiguration issue, where both firewalls are on the same VLAN and can see each other, but they are on the wrong VLAN to provide service to the hosts on that subnet.  After all, why did you create this clustered interface if there are no hosts present there?

Justin mentioned that the problematic interface is some kind of guest subnet, so it is plausible that during certain periods there are no pingable hosts present and the interface will be declared down until a host shows up.  The best solution here is to make sure there is a pingable host located on that interface at all times, by adding a switch or wireless access point management IP address that will always be present and responding.

There are some other ways to deal with this by modifying ClusterXL kernel variables and such, but the above solution is the easiest to implement and will help ClusterXL to more accurately detect real network failures.

--
My book "Max Power: Check Point Firewall Performance Optimization"
now available via http://maxpowerfirewalls.com.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Justin_Hickey
Collaborator

Thanks Tim. Just lit up 2 VLAN interfaces on each and every subnet. Should know soon if that's the fix. Thanks.

Hugo_vd_Kooij
Advisor

Justin,

My first suggestion would be to switch to broadcast mode first. If the issue is gone, you know you have a multicast issue to discuss with the switch expert(s).

But I have seen cases where this could not be resolved and the firewalls stayed on broadcast mode as a permanent solution.

Stretched VLANs is one of those buzzwords that causes me to grab a bottle of painkillers.

Justin_Hickey
Collaborator

Thanks Hugo. Going to hold off to see if creating pingable interfaces on all VLANs is the solution. Then the plan is to try this. Thanks for the reply.

Timothy_Hall
Legend

Yep, stretched VLANs and ClusterXL don't tend to work well together unless network conditions are perfect, mainly because ClusterXL assumes that cluster networks meet the minimum requirements for latency (less than ~30 ms) and packet loss (less than ~2-3%). Numbers going higher than these, even briefly, will cause all kinds of undesired ClusterXL behavior.
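
A rough way to sanity-check that budget is a ping between the members across the stretched VLAN and a look at the loss/RTT summary (the peer address below is a placeholder):

[Expert@HostName]# ping -c 100 -i 0.2 192.0.2.2

Anything approaching 30 ms RTT or a few percent loss, even intermittently, is already in the danger zone per the numbers above.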

Justin_Hickey
Collaborator

The active IPs scenario didn't fix the problem, so I am back to the option of disabling multicast to see if that is the issue. I did this by issuing the command below:


[Expert@HostName]# cphaconf set_ccp broadcast
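
For reference, reverting should be the same command with the other keyword, run on both members (assuming the same syntax applies on this version):

[Expert@HostName]# cphaconf set_ccp multicast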

I haven't noticed any performance difference. I'd like to see this broadcast traffic to judge for myself how much of it there is. I did an fw monitor on one of the real addresses of the firewall and am not seeing any broadcast traffic.

fw monitor -e "accept host(xxx.xx.206.2);"

Curious if anyone can help me craft a statement that will show the broadcasts.

Norbert_Bohusch
Advisor

The CCP mode is a layer 2 change!

This means the destination MAC addresses are broadcast or multicast addresses. Layer 3 remains untouched, and the destination IP should in both cases be the network address.

This is from my lab in multicast mode:

# tcpdump -enni eth2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes

14:02:07.773569 00:00:00:00:01:01 > 01:00:5e:28:e5:fa, ethertype IPv4 (0x0800), length 76: 0.0.0.0.8116 > 192.168.229.0.8116: UDP, length 34

and this in broadcast mode:

# tcpdump -enni eth2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
14:02:21.317617 00:00:00:00:01:00 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 82: 0.0.0.0.8116 > 192.168.229.0.8116: UDP, length 40

Albin_Hakansson
Participant

Since CCP works on port 8116, you could try tcpdump -nei ethX port 8116 
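
To isolate just the broadcast CCP frames, the same capture can be narrowed with pcap's ether broadcast primitive (the interface name is a placeholder):

# tcpdump -enni ethX udp port 8116 and ether broadcast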

Justin_Hickey
Collaborator

After switching to broadcast mode the failovers have stopped. I ran this command, tcpdump -nei eth1-04 port 8116, and I see that I am now processing about 20 broadcasts per second. I'd still like the network team to disable IGMP snooping, because I don't think it has any real value in DMZ switches.

Many thanks to everyone who responded with suggestions. This is an amazing support group. Hope I can return the favor someday.
