Re: Potential Check Point Maestro Bridge Issue?

数据包巫师 · 2022-12-16

Intro

Hey everyone,

I'm not very familiar with the nomenclature or internal workflows of the Check Point Maestro Hyperscale solution, but we're investigating an elusive issue with one particular workflow. I've provided a basic diagram to explain the connections.

Example Fabric

Example Path Through Fabric (the choice of leaf and spine switches is irrelevant)

Diagram Overview

There are two firewalls, each connected directly to a single Maestro switch. The Maestro switch is configured with two bridge groups. Traffic should come in from a firewall, enter the Maestro switch, pass through the Check Point IPS that is also attached to the switch, and exit through the south-side interfaces to the leaf switches.

Each leaf switch pair has its own distinct port channel connected to the Maestro switch. The leaf switches connect to a spine layer (I've simplified the connectivity so you don't have to look at all of the redundant connections in the leaf-spine Clos architecture).

Problem

For simplicity's sake, let's call everything on the left side "side A" and everything on the right side "side B".

If a host behind firewall A (or firewall A itself) on the left side tries to communicate with firewall B (or a host behind firewall B) on the right, there is significant delay and jitter.

If a host behind firewall A communicates with anything else in the network, even another host connected to switch pair B that isn't beyond the Maestro switch, there is no issue at all.

I've provided a second copy of the diagram with a red line to illustrate where things fall apart. It doesn't matter whether the traffic crosses switch 1 or 2 in pair A or B, or any of the three spine switches; the result is always the same.

Latency between switch pairs A and B is sub-second, and all other communication between leaf pairs in the fabric works as expected.

My limited understanding of the Maestro switch is that when slave interfaces are assigned to a bridge, layer 2 traffic passively traverses the bridge from North to South, and can't communicate with another bridge. I don't understand how we exit the bridge to get to the IPS, but it appears either bridge can fork traffic to the attached IPS.
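For readers less familiar with transparent bridging, the isolation described above can be sketched as a generic learning bridge with two independent bridge groups, each keeping its own MAC table. This is ordinary bridging behavior, not a description of Maestro internals, and the port names (`fw-A`, `po-leaf-A`, and so on) are hypothetical labels for the diagram. The point it illustrates is that a frame entering one group can only leave through that group's ports. To get from side A to side B, traffic must therefore go out to the leaf/spine fabric and come back in through the other bridge group.

```python
from dataclasses import dataclass, field


@dataclass
class BridgeGroup:
    """Generic transparent (learning) bridge group; not Maestro's implementation."""

    name: str
    ports: frozenset  # member port names (hypothetical labels)
    mac_table: dict = field(default_factory=dict)  # MAC -> port it was learned on

    def forward(self, in_port: str, src_mac: str, dst_mac: str) -> set:
        """Return the set of ports a frame is sent out of."""
        if in_port not in self.ports:
            raise ValueError(f"{in_port} is not a member of {self.name}")
        # Learn where the source MAC lives, but only in this group's own table.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is not None:
            # Known destination: send to that port, or filter if it is the ingress port.
            return set() if out_port == in_port else {out_port}
        # Unknown destination: flood, but only to the other ports of this group.
        return set(self.ports - {in_port})


# Hypothetical port names for the two sides of the diagram.
bridge_a = BridgeGroup("bridge-A", frozenset({"fw-A", "po-leaf-A"}))
bridge_b = BridgeGroup("bridge-B", frozenset({"fw-B", "po-leaf-B"}))

# A frame from firewall A toward firewall B's MAC can only leave toward leaf pair A.
print(bridge_a.forward("fw-A", "mac-fw-A", "mac-fw-B"))
# It reaches bridge B only after the fabric delivers it back in via leaf pair B.
print(bridge_b.forward("po-leaf-B", "mac-fw-A", "mac-fw-B"))
```

Under this model, any A-to-B conversation traverses the Maestro switch twice: once through each bridge group.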

When we take a packet capture from a SPAN on our leaf switches, we see tons of TCP retransmissions and out-of-order packets. For example, Host A starts a TCP three-way handshake and sends a SYN across the wire. Host B doesn't receive the SYN until much later than expected, so Host A retransmits it many times. When the SYN finally arrives, Host B replies with a SYN-ACK. Host A then doesn't receive the SYN-ACK, so Host B keeps retransmitting it until an ACK is finally seen. Even after the connection is established, the issue persists for its entire duration.
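The retransmission pattern in the capture can be quantified. Below is a rough sketch that assumes the SPAN capture has already been exported to simple per-segment records. The `Segment` fields and the sample values are hypothetical; they are not a pcap library's API or real data. A SYN, FIN, or data-bearing segment whose flow, sequence number, and length reappear is counted as a retransmission, along with how long after the first copy it was resent. Wireshark's `tcp.analysis.retransmission` filter does this far more thoroughly; the sketch only shows the idea.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical, simplified record of one captured TCP segment."""

    time: float  # capture timestamp, seconds
    src: str  # sender, e.g. "hostA:50000"
    dst: str  # receiver
    seq: int  # TCP sequence number
    payload_len: int  # bytes of TCP payload
    flags: str  # e.g. "S", "SA", "A", "PA", "F"


def find_retransmits(segments):
    """Return (segment, seconds since first copy) for each retransmitted segment.

    Only segments that consume sequence space (SYN, FIN, or payload) are
    considered, so duplicate pure ACKs are not miscounted as retransmits.
    """
    first_seen = {}
    retransmits = []
    for seg in sorted(segments, key=lambda s: s.time):
        if seg.payload_len == 0 and "S" not in seg.flags and "F" not in seg.flags:
            continue  # pure ACK: nothing to retransmit
        key = (seg.src, seg.dst, seg.seq, seg.payload_len)
        if key in first_seen:
            retransmits.append((seg, seg.time - first_seen[key]))
        else:
            first_seen[key] = seg.time
    return retransmits


# Toy capture shaped like the delayed handshake in the post (values are made up).
capture = [
    Segment(0.0, "hostA:50000", "hostB:443", 1000, 0, "S"),
    Segment(1.0, "hostA:50000", "hostB:443", 1000, 0, "S"),
    Segment(3.0, "hostA:50000", "hostB:443", 1000, 0, "S"),
    Segment(3.2, "hostB:443", "hostA:50000", 7000, 0, "SA"),
    Segment(4.2, "hostB:443", "hostA:50000", 7000, 0, "SA"),
    Segment(4.3, "hostA:50000", "hostB:443", 1001, 0, "A"),
]

for seg, delay in find_retransmits(capture):
    print(f"{seg.src} -> {seg.dst} flags={seg.flags} resent {delay:.1f}s after first copy")
```

Comparing these delays per direction on the side A and side B leaf SPANs would show where the extra latency is introduced.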

What We've Tried

TCP/UDP connection from host behind Firewall A or B to remote firewall in another data center. Result: Works great
TCP/UDP connection from host behind Firewall A or B across WAN. Result: Works great
TCP/UDP connection initiated from maestro facing interface on either Firewall A or B terminating directly on maestro facing interface on the opposing firewall. Result: Bad
TCP/UDP connection from host behind Firewall A or B to maestro facing interface on opposing firewall. Result: Bad
TCP/UDP connection from host behind Firewall A or B to another host behind the opposing firewall. Result: Bad
TCP/UDP connection from host behind Firewall A or B to maestro facing interface on locally connected firewall. Result: Works great
Disabling IPS policy enforcement temporarily for troubleshooting (although traffic may still pass through the IPS even with the policies turned off?) Result: Issue still occurs
Disabling firewall inspection policies related to TCP/IP based connections (including on both firewalls at the same time) Result: Issue still occurs
TCP/UDP connection originating from Switch Pair A or B to the opposing Switch Pair across the fabric. Result: Works great
TCP/UDP connection originating from Switch Pair A or B to the opposing firewall across the fabric. Result: Works great
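Tabulating the results against the number of times each path crosses the Maestro switch makes the pattern explicit. This is a small sketch. The crossing counts are an assumption inferred from the diagram description, not measured values. The remote-data-center and WAN tests are left out because the post doesn't say whether that traffic traverses the Maestro.

```python
# (test description, ASSUMED Maestro crossings inferred from the diagram, observed result)
TESTS = [
    ("FW maestro-facing iface -> opposing FW maestro-facing iface", 2, "Bad"),
    ("host behind FW -> opposing FW maestro-facing iface", 2, "Bad"),
    ("host behind FW -> host behind opposing FW", 2, "Bad"),
    ("host behind FW -> local FW maestro-facing iface", 0, "Good"),
    ("switch pair -> opposing switch pair", 0, "Good"),
    ("switch pair -> opposing FW", 1, "Good"),
]


def results_by_crossings(tests):
    """Group the observed results by how many times the path crosses the Maestro."""
    grouped = {}
    for _description, crossings, result in tests:
        grouped.setdefault(crossings, set()).add(result)
    return grouped


if __name__ == "__main__":
    for crossings, results in sorted(results_by_crossings(TESTS).items()):
        print(crossings, sorted(results))
```

Under these assumptions, the only failing paths are those that enter the Maestro twice, once through each bridge group. This is consistent with the question below about traffic passing through the same Maestro twice.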

Questions

I read somewhere in a Check Point forum post that traffic passing through the same Maestro twice could cause issues. Is anyone aware of any limitations or bugs in a setup like this? The Maestro switch connections are meant to be passive, and as such we only see the firewalls' MAC addresses advertised across them, but our LACP peering is with the Check Point MACs. Each switch pair sees a unique MAC for its LACP peer. Any ideas?

PhoneBoy · 2022-12-18

This looks like a classic Double Inspection issue, which is not unique to Maestro.
See: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

Timothy_Hall · 2022-12-18

What version of Maestro code? If still using the SP series (i.e. R80.30SP) you are going to be facing an uphill battle on many fronts. R81.10 Maestro or later with the latest GA Jumbo HFA is highly recommended.

When you say there are two bridge groups on the Orchestrator, do you mean two security groups with one firewall each? In other words, is the firewall on side A in one Maestro security group and the side B firewall in another, separate security group? Or are both firewalls in the same security group as defined on the Orchestrator?

Is all traffic in your test scenario being handled only between the uplinks and firewall downlinks, or is some of the test traffic (in one direction or both) crossing the management interface ports defined on the Orchestrator (usually the lowest-numbered ports)? If so, you are not supposed to mix traffic/directions between uplinks and management interfaces, and doing so can cause the exact conditions you describe; but see here: sk179005: "Connections from Data Interface to the Management Interfaces (and Vice Versa)" feature

Is the traffic subject to NAT or VPN handling by the firewalls? If so, this could be some kind of correction issue, but I need the answers to my questions above to make that assessment. Traffic that Maestro attempts to correct more than once will be dropped, which is probably what you heard about: sk173845: Out of state drops for connections that are not symmetrical in Scalable Platforms (Maestro...

The Orchestrators don't have a Monitor Mode/SPAN port capability, at least not that I am aware of. So I'm not sure how your IPS fits in, but it sounds like it is not relevant to the problem anyway.


Chris_Atkinson · 2022-12-18

Which appliances are used for the actual individual SGMs out of interest?

Note:
* R81.20 introduced Maestro Fastforward (refer: sk173903).
* QLS and Maestro will also be supported together in the near term.
