Wolfgang
Authority
Authority

Quantum Force CPU cores <=> virtual

For some of the new Quantum Force appliances, the datasheet shows odd virtual core counts. The 9300, for example, lists 10 physical cores but only 16 virtual. Shouldn't that be 20?

9200 => 1 CPU, 4 physical cores, 8 virtual 

9300 => 1 CPU, 10 physical cores, 16 virtual

9400 => 1 CPU, 14 physical cores, 20 virtual

9700 => 1 CPU, 16 physical cores, 32 virtual

0 Kudos
2 Solutions

Accepted Solutions
danel
Employee
Employee

Hi Wolfgang

The new 9300 and 9400 appliances contain Intel CPUs with a hybrid architecture,
where some of the cores are power-efficient (E-cores) and some are performance cores (P-cores).

Power-efficient cores are designed to run non-critical tasks, while performance cores are designed for intensive workloads.
Also, each power-efficient core supports a single virtual core, whereas each performance core supports two.

So the number of virtual cores is based on the number of power-efficient cores and performance cores:

In 9300 appliances:
No. Performance cores:  6 (12 virtual)
No. Efficient cores:    4 (4 virtual)
Total cores:           10 (16 virtual)

In 9400 appliances:
No. Performance cores:  6 (12 virtual)
No. Efficient cores:    8 (8 virtual)
Total cores:           14 (20 virtual)
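
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting rule described in this post (two virtual cores per P-core, one per E-core); the function name and the dictionary are made up for the example.

```python
def virtual_cores(p_cores: int, e_cores: int) -> int:
    """Virtual (logical) cores: two threads per P-core, one per E-core."""
    return 2 * p_cores + e_cores


# P-core / E-core splits as stated in this post for the 9300 and 9400.
appliances = {"9300": (6, 4), "9400": (6, 8)}

for model, (p, e) in appliances.items():
    print(f"{model}: {p + e} physical, {virtual_cores(p, e)} virtual")
```

A CPU where every core is hyperthreaded is just the special case with zero E-cores, which is why 10 physical cores would otherwise be expected to show 20 virtual cores.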

Hope it helps,
Dan.

danel
Employee
Employee

Hi @Timothy_Hall ,
The following SK explains the behavior of setting cores for SNDs and Firewall Workers
in hybrid architecture CPUs:
https://support.checkpoint.com/results/sk/sk182000

Hope it helps,
Dan.

cc: @genisis__ 

26 Replies
Markus_Genser
Contributor

I think they will use the Intel 12th or 13th gen CPUs with the P and E cores, where only the P cores are hyper threaded.

https://en.wikipedia.org/wiki/Alder_Lake

(2)
Timothy_Hall
Legend Legend
Legend

Hi @danel,

Is CoreXL aware of P-cores vs. E-cores and treats them differently?  Or does it not distinguish them and just rely on Multi-Queue and the Dynamic Dispatcher to not overload the E-cores, which overall look to be about 30% slower than P-cores? 

As an example there is no way I'd want an SND instance running on an E-Core, I'd also not like to see an elephant flow get assigned to a Firewall Worker Instance on an E-core (although Hyperflow can help out to some degree). 

This seems like a scheme to save power/battery which I suppose I'd expect on a laptop, but not a server.  I've had some very bad experiences with server-based power saving schemes in the past (i.e. P-States, C-States, & HPC Optimizations, Dynamic Power Capping) in regards to performance.  Thanks!

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
danel
Employee
Employee

Hi Timothy,
An SK which explains the logic of which type of cores (E or P) is affined to SND or Firewall Worker in MultiQueue auto mode and in Dynamic Dispatcher will be published soon.
I'll post it here when it's ready.

Thanks,
Dan.

genisis__
Leader Leader
Leader

I look forward to that.

Bob_Zimmerman
Authority
Authority

So instead of relying on the Linux scheduler, the worker threads are pinned to exact cores, and the CoreXL scheduler is asymmetric-core-aware? Interesting.

Which cores are removed from the worker pool in what order? If I have a 9300 (6p4e) and I decide to run two SNDs and 13 workers, am I left with an unloaded hyperthread, E-core, or P-core?

0 Kudos
AmitShmuel
Employee
Employee

Hi Bob,

The FW worker threads have been pinned to exact cores since day one of CoreXL.

What do you mean by asymmetric-core-aware? The CoreXL dispatcher is aware of queue utilization and CPU utilization, so naturally, if E-cores are working harder, it will dispatch fewer connections to them.

Regarding SNDs/workers split, as always, you configure the number of workers via cpconfig, and those will be pinned to the cores in decreasing order, from the top, and the remaining cores will be assigned to SNDs.
Nowadays it is less common to manually configure CoreXL, as Dynamic Balancing does that for you out of the box.
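
As a rough illustration of the split described above (workers pinned from the highest core downward, the remaining cores left for SNDs), here is a small Python sketch. It models only the ordering rule as stated in this post; the function name is hypothetical, and which core IDs are P-cores or E-cores on a given appliance is not covered here.

```python
def split_cores(total_cores: int, workers: int) -> tuple[list[int], list[int]]:
    """Return (snd_cores, worker_cores) for a manual CoreXL-style split.

    Workers take cores from the top in decreasing order; whatever is
    left at the bottom goes to SNDs. Illustrative model only.
    """
    if not 0 < workers < total_cores:
        raise ValueError("need at least one worker and one SND core")
    worker_cores = list(range(total_cores - 1, total_cores - 1 - workers, -1))
    snd_cores = list(range(total_cores - workers))
    return snd_cores, worker_cores


# Example: 16 virtual cores (as on a 9300) with 13 workers configured.
snd, fw = split_cores(16, 13)
print("SND cores:   ", snd)
print("Worker cores:", fw)
```

Under this rule no core is left unassigned: configuring 13 workers on 16 virtual cores leaves three cores for SNDs rather than two.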

0 Kudos
Bob_Zimmerman
Authority
Authority

I handled a few tickets in the early days of CoreXL with workers which moved from core to core as the Linux scheduler wanted. Here's some output from one of my R81.20 jumbo 26 firewalls taken just a few minutes ago:

[Expert@MyFirewall:3]# fw ctl multik stat
ID  | Active  | CPU    | Connections | Peak    
-----------------------------------------------
0   | Yes     | 2-7    |         501 |      859
1   | Yes     | 2-7    |          17 |       91
2   | Yes     | 2-7    |          25 |       97
3   | Yes     | 2-7    |          24 |      103

They're definitely not always pinned to specific cores.
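
For anyone who wants to check this across several gateways, the output above can be parsed with a short Python sketch. The sample text is copied from this post; the parsing rules (five `|`-separated columns, a single number in the CPU column meaning "pinned to one core") are assumptions based on this output rather than on any documented format.

```python
SAMPLE = """\
ID  | Active  | CPU    | Connections | Peak
-----------------------------------------------
0   | Yes     | 2-7    |         501 |      859
1   | Yes     | 2-7    |          17 |       91
2   | Yes     | 2-7    |          25 |       97
3   | Yes     | 2-7    |          24 |      103
"""


def parse_multik_stat(text: str) -> list[dict]:
    """Parse `fw ctl multik stat` style output into one dict per instance."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Skip the header, the separator line, and anything unexpected.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "active": fields[1] == "Yes",
            "cpu": fields[2],
            "connections": int(fields[3]),
            "peak": int(fields[4]),
        })
    return rows


def is_pinned(cpu_field: str) -> bool:
    """Assumption: a single core number means pinned; a range like 2-7 does not."""
    return cpu_field.isdigit()


for row in parse_multik_stat(SAMPLE):
    state = "pinned" if is_pinned(row["cpu"]) else "floating"
    print(f"instance {row['id']}: CPU {row['cpu']} ({state})")
```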

As for "asymmetric-core-aware", I mean just that. Until fairly recently, all cores available to a kernel scheduler had the same capabilities and the same performance characteristics (outside the mainframe world, anyway). This is the symmetric in "symmetric multiprocessing". Relatively recently, asymmetric core complexes have become available outside of mainframes and other exotic environments. The first really common one I saw was ARM's big.LITTLE arrangement. The Linux scheduler at the time didn't know that the little cores couldn't handle the same amount of work in the time slice. It would schedule a relatively light process on a little core, which would max it out, so the process would get moved to a big core, leaving the little core idle, so a process would get moved back to it. This would thrash the cores' caches, seriously hurting performance.

The Linux kernel scheduler didn't become sufficiently aware of Intel's E-core differences to allow effective use of P-cores until 5.15.35 in early 2022.

0 Kudos
AmitShmuel
Employee
Employee

This output doesn't seem right, it usually means that something in CoreXL is misconfigured.

I'd be happy to try and help offline, feel free to contact me at amitshm@checkpoint.com 

0 Kudos
emmap
Employee
Employee

This is output from VSX. In VSX, the VS worker threads are not pinned to CPU cores; they are free to float about the pool of cores available to them (in your case, 2-7).

Timothy_Hall
Legend Legend
Legend

Thanks the SK was very informative.  After thinking about the differences between P-cores and E-cores this is a pretty big deal.

All cores have always been assumed to have equal capabilities, with the lone exception that a SMT/Hyperthreaded "core" is really two threads heading for the same physical core.  In the past various server-based power-saving schemes have wreaked havoc, to the point that for open hardware servers on pages 83-84 of my last Max Power book I advised disabling all these schemes in the BIOS to help ensure that all cores are equal at all times, power consumption be damned.  This is a pretty fundamental tenet of how CoreXL and SecureXL work, with various technologies such as Multi-Queue, the Dynamic Dispatcher and Hyperflow to keep the load between all the equal cores relatively balanced.

With some cores much faster than others (and possibly supporting different processor extensions and CPU cache sizes between them - see below), I can envision a number of scenarios where bottlenecks may occur that weren't really possible before.  A few thoughts which are pure speculation:

1) Are E-Cores currently acting as Firewall Workers eligible to be reassigned as PPE-based cores for Hyperflow when an elephant is detected?  I would assume not.

2) When detecting spikes via the Spike Detective on an E-Core, will the same thresholds apply as do for a P-core?  Given the disparity the thresholds might need to be a little lower for an E-Core.

3) I'm a little concerned that a P-Core SND blasting a high rate of packets at a slower E-Core Firewall Worker may cause CoreXL queuing problems (and loss) for traffic trying to reach the E-Core Firewall Worker.

4) Same threshold for activation of Priority Queuing on an E-Core vs. P-Core?  Might need to adjust it a little lower on an E-Core to try to keep it out of queuing trouble sooner?

5) It appears E-Cores will support fewer processor extensions than P-cores.  My concern here is encryption-oriented operations such as VPNs and HTTPS Inspection.  Extension AES-NI is an obvious example, I'd think we'd want to keep encryption-based operations away from an E-Core if advanced processor extensions are being relied upon for performance?  AVX512?

6) E-Cores will have much less fast CPU cache than P-Cores.  Would we want to keep operations away from E-Cores that rely heavily on fully-populated "hot" CPU fast caches for performance?  Operations like rulebase lookups perhaps?

(1)
AmitShmuel
Employee
Employee

Hi Tim,

Thanks for the great questions and feedback 🙂

1) Your assumption is correct, both SND and PPE workers are excluded from E-Cores pool, I've added it to the SK.

2+4) The Dynamic Dispatcher should naturally send fewer connections to FW workers running on E-Cores, keeping the cores mostly balanced. So far, we haven't observed any significant difference in our testing or in customer environments with regard to those features. Super instance is a good example of our ability to work with cores with varying capabilities.

3+6) Unless we're talking about Elephant Flows, the Dynamic Dispatcher should take care of that by sending fewer connections to that FW instance. As for Elephant Flows, in the upcoming JHF, Dynamic Balancing will be able to swap an E-core FW instance handling an EF with a P-core FW instance, making sure heavily loaded workers get the benefit of stronger cores.

5) Intel has made sure both cores have the same instruction set.
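
To make the balancing idea in 2+4) concrete, here is a toy Python simulation. It is not the Dynamic Dispatcher's actual algorithm, which is not described in this thread; it only shows that a dispatcher which always picks the least-utilized worker ends up giving fewer connections to slower cores. The capacity values are assumptions, with 0.7 standing in for the "about 30% slower" E-core estimate mentioned earlier in the thread.

```python
def dispatch(capacities: list[float], n_connections: int) -> list[int]:
    """Assign each new connection to the currently least-utilized worker.

    Utilization is modeled as connections / capacity, so a lower-capacity
    (E-core) worker "fills up" faster. Toy model, not Check Point's logic.
    """
    counts = [0] * len(capacities)
    for _ in range(n_connections):
        target = min(range(len(capacities)),
                     key=lambda i: counts[i] / capacities[i])
        counts[target] += 1
    return counts


# Two P-core workers (capacity 1.0) and two E-core workers (assumed 0.7).
capacities = [1.0, 1.0, 0.7, 0.7]
for worker, count in enumerate(dispatch(capacities, 340)):
    print(f"worker {worker} (capacity {capacities[worker]}): {count} connections")
```

In this model the connection counts settle roughly in proportion to capacity, which matches the intuition that per-core utilization stays balanced even though the cores are not equal.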

Hope that clears things up, feel free to contact me offline anytime.

Thanks!
Amit

(1)
Timothy_Hall
Legend Legend
Legend

Looks like you guys have thought of everything, as usual.  Very interesting about the new "swapping" ability for a heavy flow off an E-Core to P-Core, that must have taken some serious work as normally something like that would not be possible.  Could this eventually lead to the ability on all firewalls for existing "mice" connections trapped on a Firewall Worker with a big meanie elephant flow to "scurry away" to another worker core for handling, and thus cease getting stomped by the elephant on the overloaded core?

I'm still not real clear on why Intel is doing this kind of power-saving scheme at the server level (other than an attempt to appear to be keeping up with AMD), but that is a discussion for a completely different forum than CheckMates.  😀 Thanks!

0 Kudos
AmitShmuel
Employee
Employee

The "swapping" refers to the worker's affinity, so we only move the thread from one core to another, it will still handle the same connections.

0 Kudos
Timothy_Hall
Legend Legend
Legend

So basically the firewall instance/worker on the E-Core (including all its elephants and mice) gets moved to a P-core, and the firewall instance/worker formerly on that selected P-Core moves to the E-Core.  Got it.  Thought for a moment there you meant that a connection or series of connections could simply be moved to a new Firewall Worker instance without restarting them, but it looks like they all have to go together which makes sense.  Thanks!

Daniel_Kavan
Advisor

Does processor speed make any difference?   I would assume the processors are faster in a 9300 than a 5800, but I don't see any documentation on the exact processor being used on the white paper.  IOW, if I need more firewall power, I'm tempted to go with a 9300 (16 cores) instead of the recommended replacement for a 5800 the 9200 (8 cores).   

https://www.checkpoint.com/downloads/products/quantum-force-9300-datasheet.pdf

Looking at it another way the recommended path from 6900 is 9300, both 16c.   But if the processors are more capable, why not go with the 9200 (8c)?   

Or is the overall recommendation just to look at the overall firewall throughput 70 Gbps (9300) vs 22 Gbps (5800) vs 37 Gbps (6900) vs 60 Gbps (9200).  In my case looking at NGFW is more telling.  5800 is at 2 and the 6900 is at 17 !

0 Kudos
Bob_Zimmerman
Authority
Authority

Processor speed makes some difference, but not much. Most of Intel's processors support what they call "Turbo Boost" which allows the processor to run at a higher speed, sometimes more than twice the nominal clock speed. For example, the Intel i5-8257U has a nominal clock speed of 1.4 GHz, but a max turbo speed of 3.9 GHz. This is limited by power supply and by heat sinking, which is rarely shared. Some high-end processors today have a nominal TDP of about 250W, but they can draw as much as 800W for brief stretches.

When comparing branded boxes, you should look mostly at the rated throughputs, but fewer cores is better for elephant flows. When comparing open servers, pick the processor option with the highest TDP per core.

Daniel_Kavan
Advisor

One other question/clarification: when you see the 9300's NGFW throughput listed at 28.2 Gbps, that's just internal to the firewall, correct?   We only have a 1 Gbps external ISP connection, for example.  Management thinks the figure applies externally, so the 28.2 Gbps would never be used.

0 Kudos
Bob_Zimmerman
Authority
Authority

All figures are total throughput for the box. Internal, external, everything. Keep in mind most connections are duplex, and uploading something at 1g plus downloading something at 1g represents 2g of throughput.
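
As a quick worked example of that arithmetic in Python: the 1 Gbps link and the 28.2 Gbps NGFW rating are the numbers mentioned in this thread, and the function name is just for illustration.

```python
def box_throughput_gbps(upload_gbps: float, download_gbps: float) -> float:
    """Total traffic the box handles: both directions count toward the rating."""
    return upload_gbps + download_gbps


# A fully used 1 Gbps duplex ISP link, compared with the 9300's NGFW rating.
isp_total = box_throughput_gbps(1.0, 1.0)
rated_ngfw_gbps = 28.2
print(f"ISP link alone: {isp_total} Gbps "
      f"({isp_total / rated_ngfw_gbps:.0%} of the rated NGFW throughput)")
```

Traffic between internal networks that crosses the firewall would be added on top of the ISP figure.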

0 Kudos
Bob_Zimmerman
Authority
Authority

Intel isn't doing asymmetric processors at the server level. All of the current and announced Xeons use only one kind of core. The E cores are based on the Atom line, and the P cores are based on the Pentium/Core line. An upcoming line of Xeons called Sierra Forest uses only E cores (specifically Crestmont), making it a de-facto successor to the Xeon Phi and possibly Atom C###. Then another upcoming line of Xeons called Granite Rapids has only P-cores, succeeding the current Xeon line.

Intel's only chips with P-cores and E-cores together are their consumer processors (Core i3 through Core i9). Based on the core and thread counts in the datasheets, the 9300 uses an Alder Lake or Raptor Lake i5 and the 9400 uses an Alder Lake i7 or Raptor Lake i5 or i7. Interestingly, Intel has finally started allowing their consumer chips to use ECC, so depending on the exact processor model and the chipset in use, the 9300 and 9400 might support ECC RAM.

0 Kudos
Bob_Zimmerman
Authority
Authority

Possibly worth noting that the data sheets have some obvious inaccuracies. I'm not sure how much I trust them at this point. For example, the 9100 through 9400 claim they have an SFP28 slot for sync, but the photos definitely don't have an SFP slot of any kind. The catalog also lists those models as having 8x1/10g copper interfaces built in.

genisis__
Leader Leader
Leader

That's not a good start, I also think the price point is way too high, totally get the appliances are more powerful, but cost is and always has been a massive sticking point.


0 Kudos
_Val_
Admin
Admin

@Bob_Zimmerman @genisis__ 

Why are we talking about price here? Let's stay on point. The appliance sheet has correct information, and the reason for the odd amount of cores is already established in this discussion.

0 Kudos
Bob_Zimmerman
Authority
Authority

It has some correct information, but also some obviously incorrect information. For example, this screenshot is from the fourth page of the 9300 data sheet as loaded under ten minutes before making this post:

The 9300 includes 1x 1/10 copper GbE management ports plus an additional 1x 1/10/25 SFP28 ports for synchronization when using the 9300 PLUS in a cluster. With one network I/O slot, modify the base or plus configuration to meet your networking requirements.

My company's sales team has confirmed for me the 9300 has no 1/10g copper interfaces, and no SFP28 slot.

0 Kudos
_Val_
Admin
Admin

Thanks for pointing this out. However, I ask again, please stay on topic. The current post is about the number of cores. If you want to report errors in the datasheets, please use different means to convey it. 

Appreciate your understanding.

0 Kudos
