Ed_Eades
Contributor

DMZ Default Gateway Intermittent Connection Issue

I am looking for community help with an issue we have been troubleshooting and trying to piece together for weeks. I apologize up front for the lengthy post, but I want to provide as much detail as I can about the issue and what we have done so far to troubleshoot it. We are hoping someone in the community has seen this problem before or has additional ideas about what could be going on.

We are facing a random, intermittent issue where devices lose connectivity to the default gateway on our DMZ interface. Sometimes it affects only a few DMZ devices and is almost like a hiccup; other times it affects many more DMZ devices and lasts about 30 seconds. This is happening on 2 different HA clusters in 2 separate data centers (primary and secondary). At least 95% of our Internet traffic is handled by the primary data center on a daily basis, yet the issue seems to happen more frequently at the secondary data center. We are a health care provider, so 24 x 7 availability is essential. The issue seldom occurs in the evening or overnight hours, although it does sometimes. It seems to happen more often during core working hours, anywhere from once to several times in a day.

Our topology is an HA cluster with 2 gateways at each data center. The inside and outside interfaces are 10Gb, and the DMZ interface is a bond of five 1Gb interfaces. The gateways plug into Cisco switches, but the Cisco models and platforms differ between the two data centers. The bond is set up for Layer3+4 hashing and slow LACP on the Check Point side, and the Cisco side uses src-dst-ip as the EtherChannel load-balancing method. Traffic appears to be shared equally across the member interfaces, and we have had both Check Point and Cisco TAC review the bond/port-channel setups.
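
As a rough illustration of what Layer3+4 hashing implies for a five-member bond, the sketch below maps a flow to a member index. It assumes the simplified formula given in older Linux kernel bonding documentation for the `layer3+4` policy; whether the Gaia kernel on these gateways uses exactly this formula is an assumption, and the function name, addresses, and ports are hypothetical. The point is only that each flow is pinned to a single member, and that traffic without ports (such as ICMP) is hashed on the IP addresses alone.

```python
import ipaddress


def l34_member(src_ip, dst_ip, src_port=0, dst_port=0, members=5):
    """Pick a bond member index for a flow.

    Simplified layer3+4 formula from older Linux bonding documentation
    (assumed, not confirmed for Gaia):
        ((sport XOR dport) XOR ((src_ip XOR dst_ip) AND 0xffff)) mod members
    For ICMP and other traffic without ports, leave both ports at 0.
    """
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % members


# Hypothetical DMZ host and gateway VIP, for illustration only.
icmp_member = l34_member("10.0.0.10", "10.0.0.1")              # ICMP: IPs only
tcp_member = l34_member("10.0.0.10", "10.0.0.1", 40000, 443)   # TCP: IPs and ports
print(icmp_member, tcp_member)
```

Because the switch side hashes on source and destination IP only, the two directions of one flow can land on different member links, which is worth keeping in mind when choosing where to capture.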

Troubleshooting has led us to conclude that only the default gateway of the DMZ interface loses connectivity. The default gateway is a virtual IP shared across the HA cluster. While the default gateway is unreachable, the physical DMZ interface IPs are still reachable. We have set up ICMP monitors to the DMZ default gateway, sourced from several devices in the DMZ, and when the issue occurs the request packets are not received on the gateways. The ICMP monitors to the physical DMZ interface IPs do not fail during the issue, and packet captures show all of those requests being received.
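
The comparison between the two sets of ICMP monitors can be sketched as a small script that groups failed probes to the virtual IP into outage windows and keeps only the windows in which no probe to a physical interface IP failed. This is a minimal sketch, not the monitoring tool actually in use: the `Probe` record, the `"vip"` and `"member"` labels, and the `max_gap` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    ts: float      # probe time in seconds
    target: str    # "vip" for the cluster virtual IP, anything else for a physical IP
    ok: bool       # True if the echo reply was received


def vip_only_outages(probes, max_gap=2.0):
    """Return (start, end) windows where VIP probes failed and no physical-IP probe failed.

    Consecutive VIP failures no more than max_gap seconds apart are merged
    into one window.
    """
    vip_fail = sorted(p.ts for p in probes if p.target == "vip" and not p.ok)
    member_fail = [p.ts for p in probes if p.target != "vip" and not p.ok]

    windows = []
    for ts in vip_fail:
        if windows and ts - windows[-1][1] <= max_gap:
            windows[-1][1] = ts          # extend the current window
        else:
            windows.append([ts, ts])     # start a new window

    # Drop windows in which a physical interface IP also failed.
    return [(s, e) for s, e in windows if not any(s <= m <= e for m in member_fail)]


# Hypothetical sample data: one VIP-only outage and one outage affecting both.
sample = [
    Probe(10, "vip", False), Probe(10, "member", True),
    Probe(11, "vip", False), Probe(11, "member", True),
    Probe(12, "vip", False), Probe(12, "member", True),
    Probe(13, "vip", True),  Probe(13, "member", True),
    Probe(50, "vip", False), Probe(50, "member", False),
]
print(vip_only_outages(sample))
```

Windows that survive the filter match the behavior described above, where only the virtual IP is affected; their start times can then be lined up against packet captures and switch logs.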

We have been taking packet captures at different connection points and reviewing other areas of the gateways and switching infrastructure for possible answers. We tried adding static ARP entries on the gateways for an example device that loses connection to the DMZ default gateway, but that did not change the behavior. We are running R80.40 and recently installed the latest Jumbo Take 180 to see if it would help. The bond interfaces show continuously incrementing drop counters, but no drops are showing on the physical interfaces. The bond drop counters increment consistently and do not correlate only with the times the issue occurs. No output drops are being registered on the Cisco side. At this point we are just not sure what could be causing the DMZ default gateway to briefly disappear at random, intermittent times, although it seems to happen mostly during busier core traffic hours. Some of the DMZ devices that lose this connection are on the same switch as the gateways, so that traffic is essentially contained within one switch.

It seems odd that only the DMZ interface is affected and not the other interfaces (inside and outside) on the gateways, which are also set up in the HA cluster. It is also troubling that this is occurring on 2 separate HA clusters located in 2 separate data centers.

I could provide a topology diagram if helpful.

Many Thanks.
