Timothy_Hall
MVP Gold

29000 NIC Slot Population Guidelines

This will probably end up being a @Bob_Zimmerman question, but here goes:

For the Quantum Force 29000 series (and only that appliance series), Check Point has published very explicit guidelines about how NICs are to be populated in the expansion slots:  sk181465: NIC Slot Population Guidelines for Quantum 29000 Appliances.

The gist is that you start with the lowest-speed expansion cards in the highest slot numbers, then the next-highest-speed cards in the medium slots, and finally the fastest ones in the lowest-numbered slots, though the last group is populated in ascending rather than descending order.  Why is this?  To keep the faster cards on the lowest NUMA node, where the SND cores are likely to be?  Something to do with how backplane bandwidth is assigned via lanes?  To minimize traffic between NUMA nodes inside the system?

Aside from the Maestro-related restrictions requiring extra unsupported cards to be removed, this is one of the few times Check Point has published this level of detail about expansion slot population.  Prior to this, there were only vague pronouncements; sk98348 has a good example of what I mean:

  • If you are using a motherboard with multiple PCI or PCI-X buses, make sure that each Network Interface Card is installed in a slot connected to a different bus.
  • If you are using more than two Network Interface Cards in a system with only two 64-bit/66Mhz PCI buses, make sure that the least-used cards are installed in slots connected to the same bus.

@HeikoAnkenbrand did some testing moving NICs between slots a long time ago and didn't really find much difference performance-wise: New! - R80.x Performance Tuning – Intel Hardware.  Just curious about what has changed here with the 29000 series specifically, and whether these guidelines may apply to other appliance models.  Thanks!

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization
3 Replies
Bob_Zimmerman
MVP Gold

If you check the CoreXL distribution in cpview on a two-socket box, you should generally see four blocks of SNDs. They typically occupy the low cores of each NUMA node and their hyperthread clusters. For maximum performance, I would expect it to be more important to assign the interfaces to SNDs on the same NUMA node, and ideally to keep traffic paths within the NUMA node (i.e., try to stick the egress VLAN for high-volume flows on an interface on the same NUMA node as the ingress interface).

I don't know whether CoreXL's internal traffic dispatch cares about NUMA, so flows handled by a worker probably can't be optimized the way flows handled by UPPAK can be.

Intel's processors only have so many PCIe4 lanes, and my recollection is that eight lanes from NUMA node 0 are taken up by the SSDs. It's possible some of the slots are plumbed with PCIe3 from the PCH. I wouldn't expect this to be the case, but I'd have to test the slots to be absolutely sure, and I don't have any 29k units to check. You should be able to confirm by plugging in a card, then dumping information via ethtool and lspci.

The only way I remember offhand to trace your way back to the PCIe root port is to look at the memory regions the peripheral is using, then look at the regions the root ports offer. From there, the root port's LnkCap tells you what version it talks and how many lanes wide it is. For example, here's data from my 3600 trimmed by hand:

[Expert@DallasticXL-s01-01:0]# ethtool -i eth5
...
bus-info: 0000:02:00.0

[Expert@DallasticXL-s01-01:0]# lspci -vv
...
00:09.0 PCI bridge: Intel Corporation Device 19a4 (rev 11) (prog-if 00 [Normal decode])
	Memory behind bridge: dfd00000-dfdfffff
	Capabilities: [40] Express (v2) Root Port (Slot-), MSI 00
		LnkCap:	Port #9, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L0s <1us, L1 <4us
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
...
02:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
	Region 0: Memory at dfd00000 (32-bit, non-prefetchable) [size=128K]
	Region 3: Memory at dfd20000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

Note the peripheral's Region 0 matches the root port's "Memory behind bridge". From this output, you can see the peripheral is only PCIe1 x1, but the "slot" is capable of PCIe3 x1. Poorly mechanized here:

[Expert@DallasticXL-s01-01:0]# lspci -vv | grep -A 30 "Memory behind bridge: $(lspci -vs "$(ethtool -i eth5 | grep -Po "(?<=bus-info: 0000:)[0-9a-f:\.]+")" | grep -Po "(?<=Memory at )[^ ]+" | head -n 1)" | egrep "Lnk(Cap|Sta):"
		LnkCap:	Port #9, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L0s <1us, L1 <4us
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
emmap
MVP Gold CHKP

My limited understanding is that it's about PCIe lane usage to optimise performance, but I don't have any deep details on it.

Bob_Zimmerman
MVP Gold

PCIe is mostly a point-to-point connection standard. Processors offer a certain number of lanes with a certain speed. The boot ROM configures how the lanes are split into slots when initializing the processor. For example, if the processor offers 24 lanes total, the boot ROM is what tells the processor they should be arranged as three x8 slots.

When moving to a new version of PCIe signaling, it's common for processors to offer only some of their lanes at the new version. The Southbridge or Platform Controller Hub (PCH) often takes some number of high-speed lanes and provides some number of low-speed lanes (and other low-speed interfaces like USB and SATA). These slower lanes are also arranged into slots by the boot ROM.

All of a given slot's lanes are plumbed to a single root port. That is, you can't have a single slot which connects to both the processor and the PCH, or which connects half of its lanes to each processor.

PCIe switches exist, and allow you to connect more slot lanes to fewer root lanes. I don't think these would be used in any of Check Point's branded boxes.

Intel integrated the RAM controller into the processor quite a while ago. The link between processor sockets is much slower than the link of a processor to its own RAM. Thus, Non-Uniform Memory Access, or NUMA. Each NUMA node has direct access to some of the RAM in the system and to the PCIe lanes it provides, but it has to go through the other processor to get access to some of the RAM and to that processor's PCIe lanes.

For optimal performance, you need to keep traffic off of the inter-socket link. It should come in an interface, get handled by an SND on the same processor which handles the PCIe lanes, deal with memory on the same processor, and go out an interface on the same processor's PCIe lanes.
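The kernel exposes each PCI device's NUMA node directly in sysfs, so you can check interface-to-node affinity without tracing any lanes. A sketch (interface names vary by platform; a value of -1 means the platform reports no affinity, e.g. a single-socket box):

```shell
# Sketch: print each network interface's NUMA node from sysfs.
# A node of -1 means the kernel sees no NUMA affinity for that device.
for dev in /sys/class/net/*; do
    [ -e "$dev/device/numa_node" ] || continue   # skip virtual interfaces
    printf '%s -> NUMA node %s\n' "$(basename "$dev")" "$(cat "$dev/device/numa_node")"
done
```

Pairing this with the SND core assignments from cpview would tell you whether a given ingress/egress pair stays on one socket.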

Looking at a 19200 I have, I see a total of ten PCIe root ports:

[Expert@Some19200]# lspci -vv | grep -A 30 "PCI bridge: Intel Corporation Device" | egrep -o "^([0-9a-f].+$|.+?NUMA node: [0-9]|.{2}(LnkCap:|LnkSta:).+Width x[0-9]{1,2})"
16:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 0
		LnkCap:	Port #1, Speed 16GT/s, Width x16
		LnkSta:	Speed 16GT/s, Width x16
30:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 0
		LnkCap:	Port #5, Speed 16GT/s, Width x16
		LnkSta:	Speed 2.5GT/s, Width x0
4a:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 0
		LnkCap:	Port #13, Speed 16GT/s, Width x16
		LnkSta:	Speed 2.5GT/s, Width x0
64:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 0
		LnkCap:	Port #17, Speed 16GT/s, Width x4
		LnkSta:	Speed 16GT/s, Width x4
64:03.0 PCI bridge: Intel Corporation Device 347b (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 0
		LnkCap:	Port #18, Speed 16GT/s, Width x4
		LnkSta:	Speed 16GT/s, Width x4
64:04.0 PCI bridge: Intel Corporation Device 347c (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 0
		LnkCap:	Port #19, Speed 16GT/s, Width x8
		LnkSta:	Speed 8GT/s, Width x8
97:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 1
		LnkCap:	Port #1, Speed 16GT/s, Width x16
		LnkSta:	Speed 2.5GT/s, Width x0
b0:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 1
		LnkCap:	Port #5, Speed 16GT/s, Width x16
		LnkSta:	Speed 16GT/s, Width x16
c9:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 1
		LnkCap:	Port #13, Speed 16GT/s, Width x16
		LnkSta:	Speed 16GT/s, Width x16
e2:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
	NUMA node: 1
		LnkCap:	Port #17, Speed 16GT/s, Width x16
		LnkSta:	Speed 2.5GT/s, Width x0

If you check lspci -tv, you can see 64:02.0 and 64:03.0 are connected to the SSDs, which explains why they are x4 slots. 64:04.0 is connected to the SFP28 slots for the interfaces named Sync and Sync2. The rest are all PCIe4 x16 slots, so my concern about some lanes connecting to the PCH isn't valid for this box. It looks to me like the 19200 and the 29200 probably share a configuration, and three of the "slots" on the 19200 just aren't connected to anything. If I'm right, all of the slots on the 29200 are probably PCIe4 x16 for 256 Gbps of throughput per direction per slot.
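For what it's worth, 256 Gbps is the raw signaling rate (16 GT/s per lane times 16 lanes); PCIe 4.0 uses 128b/130b line encoding, so the usable ceiling per direction is a touch lower. Quick arithmetic check:

```shell
# Raw vs. effective per-direction bandwidth of a PCIe 4.0 x16 link:
# 16 GT/s per lane, 16 lanes, 128b/130b line encoding.
awk 'BEGIN {
    raw = 16 * 16            # 256 Gbps raw signaling rate
    eff = raw * 128 / 130    # ~252 Gbps after encoding overhead
    printf "raw: %d Gbps, effective: %.1f Gbps\n", raw, eff
}'
```

Still far more than any current expansion card can saturate, so the distinction is academic here.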
