Magic decoder ring for setting CoreXL Virtual system instances in VSX

Tommy_Forrest · 2021-07-23

Hi everyone!

Is anyone aware of a decoder ring to determine how to configure/tune the virtual system instances value for a given VS when migrating real hardware into a VS instance?

For example, I have a (active/passive clustered) 26k gateway that, on average, floats between 40 and 60 percent of CPU load at any given moment. And on rare occasion, we'll see peaks to nearly 80%.

In the coming months we'll be migrating the workload on that gateway to a security group in a Maestro-backed VSX instance that will live on three 28600s.

We're going to have three additional instances living on the security group. Two of those instances will be minor players; the third will be a moderate player.

So taking all of that into consideration, how do you figure out how to configure CoreXL for each VS instance when you have live data from real hardware to look at?

And oh yeah, keeping that value in check across all of the instances so there's still room to grow.
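One way to make load figures like these comparable across platforms is to express them as "busy core" equivalents. A minimal sketch, assuming the active member's utilization is spread evenly over all of its cores (in practice SND and FW cores usually run at different loads, so measure both sides) and using a hypothetical core count that you would replace with your appliance's real figure:

```python
def busy_core_equivalents(total_cores: int, utilization_pct: float) -> float:
    """Convert an aggregate CPU utilization percentage into 'fully busy core' equivalents.

    Assumes the load is spread evenly across all cores of the active member.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    if not 0 <= utilization_pct <= 100:
        raise ValueError("utilization_pct must be between 0 and 100")
    return total_cores * utilization_pct / 100.0


# HYPOTHETICAL core count for the active cluster member; not an official spec.
SOURCE_CORES = 32

# Load figures from the question: 40-60% on average, rare peaks near 80%.
for label, pct in [("low average", 40), ("high average", 60), ("peak", 80)]:
    print(f"{label:>12}: {busy_core_equivalents(SOURCE_CORES, pct):.1f} busy cores")
```

Sizing from the peak figure rather than the average leaves room for the rare spikes, which matters when changing the value later means downtime.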

Timothy_Hall · 2021-07-24

One variable is a possible transition from kernel mode to User Space Firewall (USFW), which incurs more overhead but allows up to 70 cores to be used. USFW is enabled by default on both the 26k and 28k models (and USFW is more or less VSX by any other name), so the amount of overhead should be about the same.

The next variable is per-CPU speed. If you have the non-turbo 26k appliance, you get a step-up in performance of about 25% moving to a 28k, while the per-CPU speeds of the turbo 26k and the 28k are roughly equivalent as far as I know. Obviously the 28k may have more total cores available for allocation than the 26k, depending on the model.

I'm not sure whether the cpsizeme tool takes into account converting from a regular firewall to VSX, but it might be worth a look. Beyond that it is a bit of a crapshoot; perhaps @Kaspars_Zibarts can shed some light here.

Gaia 4.18 (R82) Immersion Tips, Tricks, & Best Practices Video Course
Now Available at https://shadowpeak.com/gaia4-18-immersion-course
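The per-CPU speed difference Tim describes can be folded into a busy-core estimate by dividing by a per-core speedup factor. A rough sketch, assuming work scales inversely with per-core speed (ignoring memory, cache and USFW effects); the 1.25 factor is the approximate non-turbo 26k to 28k step-up mentioned above, not a measured constant, and the source figure is hypothetical:

```python
def cores_on_target(busy_cores_source: float, per_core_speedup: float) -> float:
    """Estimate the target-platform cores needed to do the same work.

    Assumes throughput per core scales linearly with per-core speed, which is
    only a first approximation.
    """
    if per_core_speedup <= 0:
        raise ValueError("per_core_speedup must be positive")
    return busy_cores_source / per_core_speedup


peak_busy_source = 25.6  # hypothetical peak busy-core figure from the source gateway

print(round(cores_on_target(peak_busy_source, 1.25), 2))  # non-turbo 26k -> 28k, ~25% faster per core
print(round(cores_on_target(peak_busy_source, 1.0), 2))   # turbo 26k -> 28k, roughly equal per core
```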

Tommy_Forrest · 2021-07-24

I just finished a cpsizeme exercise to replace some other gateways. It just gives you the details; it didn't delve much into what tuning parameters one would need.

Ideally, it would be nice if someone could say: "Generally speaking, if you set a value of 1 it will provide the performance of X and Y".

Just to get a baseline.

Especially since it is so punitive to adjust it after going into production.

People don't like the internet going down.

Kaspars_Zibarts · 2021-08-16

Hi @Tommy_Forrest!

Maybe my reply comes too late, sorry; I had my summer holidays 🙂 so bits and bytes were shelved for a while.

Seeing those CPU usage numbers (80% max) made me a little nervous, if I'm totally honest. Normally I would not recommend VSX for gateways running a high load. And seeing that it's a 26K, it must be running fairly hot at 80%.

Are you able to share more details, like:

- Which blades you are running
- How heavily NAT is used
- Total throughput
- Total connections
- Total new connections per second (you can easily see the last three in cpview)
- Current SND / FW core split
- Acceleration status

VSX is a great product, but I would avoid "traffic-heavy" multi-blade firewalling and stick with regular gateways instead. We are actually in the process of splitting one extremely loaded VSX platform into multiple platforms, and some of the VSes will become regular FWs.

The same goes for Maestro / Scalable Platforms. It's a great product in principle, but it takes a toll on administration: it requires additional knowledge that's not easy to find on the job market, and troubleshooting gets way more complex due to flow correction. We actually ended up migrating back from two chassis (41k) with 4 blades to a cluster of appliances (26K turbo) for exactly those two reasons, i.e. the people who had the knowledge had left the organisation and we could not find replacements.

That said, I do not want to scare you off the idea 🙂

_Val_ · 2021-08-16

I agree with @Kaspars_Zibarts; 80% raises a concern. Still, I would move to Maestro nowadays.

Tommy_Forrest · 2021-08-16

@Kaspars_Zibarts / @_Val_

The gateways in question will be backed by MHO-170s and four 28600s.

- NAT isn't a concern; we do NAT upstream from the gateways.
- Throughput is in the 3-6 Gbps range.
- PPS is in the 1.1 to 1.2 million range.
- CPS is in the 5,000-6,000 range.
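The throughput and PPS figures together give a rough average packet size, which indicates how packet-rate-heavy the load is (smaller packets generally mean more per-packet work for the same throughput). A quick sketch; note the range endpoints may not occur at the same time, so the result only brackets the average:

```python
def avg_packet_bytes(throughput_bps: float, packets_per_sec: float) -> float:
    """Average bytes per packet = (bits per second / 8) / packets per second."""
    if packets_per_sec <= 0:
        raise ValueError("packets_per_sec must be positive")
    return throughput_bps / 8 / packets_per_sec


# Bracket the average using the figures above: 3-6 Gbps, 1.1-1.2 million PPS.
low = avg_packet_bytes(3e9, 1.2e6)   # lowest throughput, highest packet rate
high = avg_packet_bytes(6e9, 1.1e6)  # highest throughput, lowest packet rate
print(f"average packet size roughly {low:.1f} to {high:.1f} bytes")
```

With these inputs the bracket works out to roughly 312.5 to 681.8 bytes per packet.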

As for blades, right now: IDS, AV, AB, IPS. We would ideally like to enable Application Control, especially for O365 stuff.

"That said, I do not want to scare you off the idea" - Too late, we're fully committed to this solution. Check Point was a full partner in fleshing out this design. Our local sales folks had the national design folks in on our sessions.

All that said, it would be nice to have a good base number to set up for the instance count. Maybe some sort of matrix, or a calculator that asks all those questions and spits out a number.

And so far, in my experience, Maestro isn't that hard. 🙂

Kaspars_Zibarts · 2021-08-16

Hi! Maestro / SP isn't hard in principle, but it adds challenges when troubleshooting complex cases. To give you an example, we found a bug affecting a connection that was flow-corrected between two SGMs: if TCP RST packets arrived simultaneously from both ends, the connection was left "hanging" in one of the SGMs and eventually exhausted memory. Getting to the bottom of it took me over two months and gigabytes of packet captures from multiple SGMs, trying to compare them. Hopefully you have better luck retaining staff who are trained on Maestro/SP 🙂

Considering your blade usage, I'm not surprised by the somewhat low throughput numbers combined with high CPU. Running "just" the FW blade, we are pushing 30 Gbps at approximately 50-60% CPU.

Hopefully it all works out for you guys! Make sure you discuss CPU design with CP techs, i.e. the SND / FW split and FW allocation for the different VSes: will you use a common CPU resource pool for all of them, or a stricter manual per-VS allocation to protect the important VSes? Also make sure your monitoring/reporting systems are configured to work with VSX/Maestro so that you have statistics available straight away, as the SNMP counters will change a lot 🙂 My two cents.
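The "stricter per-VS allocation" option can be reasoned about as a partition of the cores left over after the SND share. A minimal sketch of a proportional split, assuming hypothetical core counts, a hypothetical reserve kept back for growth, hypothetical VS names, and per-VS weights taken from each VS's estimated peak load; it only does the arithmetic and does not reflect how VSX itself assigns cores:

```python
import math
from typing import Dict


def allocate_fw_cores(total_cores: int, snd_cores: int, reserve_cores: int,
                      weights: Dict[str, float]) -> Dict[str, int]:
    """Split the FW cores (total - SND - reserve) across VSes in proportion to weights.

    Uses largest-remainder rounding so the integer allocations sum to the usable pool.
    """
    usable = total_cores - snd_cores - reserve_cores
    if usable <= 0:
        raise ValueError("no FW cores left after SND and reserve")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    raw = {vs: usable * w / total_weight for vs, w in weights.items()}
    alloc = {vs: math.floor(share) for vs, share in raw.items()}
    leftover = usable - sum(alloc.values())
    # Hand the remaining cores to the VSes with the largest fractional remainders.
    for vs in sorted(raw, key=lambda v: raw[v] - alloc[v], reverse=True)[:leftover]:
        alloc[vs] += 1
    return alloc


# All numbers and VS names below are HYPOTHETICAL placeholders, not recommendations.
print(allocate_fw_cores(total_cores=48, snd_cores=8, reserve_cores=8,
                        weights={"Internet": 6, "Guest": 3, "AoVPN": 1}))
```

A common shared pool would skip this partition entirely; the explicit split trades flexibility for protection of the important VSes.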

Tommy_Forrest · 2021-08-17

"Make sure you discuss CPU design with CP techs - i.e. SND / FW split and FW allocation for different VSes"

Well, that's kinda why I'm posting here about the subject! 🙂

_Val_ · 2021-08-17

What he probably means is consulting with your local office and/or PS to have your particular config "rubber-stamped" and officially approved 🙂 This takes some of the burden of responsibility off you, regardless of the outcome.

Tommy_Forrest · 2021-08-17

Oh, all that's been done. PS has been engaged the entire time.

But they can't provide the decoder ring this thread is all about. I still don't have any information on how to pre-tune that value. Diamond TAC suggested starting with no fewer than 10.

_Val_ · 2021-08-17

Got it, thanks for the clarification. With all due respect to all the great minds we have here, I'm not sure the outcome will be more effective than what PS/Diamond TAC tell you.

Tommy_Forrest · 2021-08-17

They're only providing a best guess.

I feel like R&D should have the ability to maybe take cpsizeme data from a string of gateways and a given use-case for how all of that would get squished into a given Maestro configuration and spit out a baseline value.

Something a little more scientific than a guess.

I wouldn't be so OCD about this if changing that value didn't require downtime. And if you don't get that value right, you're going to have unplanned downtime anyway. I know; I've been there. It wasn't pleasant.

Timothy_Hall · 2021-08-17

There are so many variables involved it really is always going to be a best educated guess. I've learned the hard way that performance estimation & optimization is truly an art as well as a science.

Tommy_Forrest · 2021-08-17

Hey Tim. I don't disagree that tuning is as much an art as a science. In another career I was tuning Java stacks left and right for WebSphere and ColdFusion.

However, with regard to tuning CoreXL in an environment like this, it is all guesswork at this point in time. Check Point should have performance metrics for their gear. Given a cpsizeme report of sufficient length for a given configuration (blades, hardware size/build) and known variables such as "my new security group is built like XYZ" and "we plan to do ABC with the rest of the environment", I don't see why they couldn't have a calculator that gives a good starting "guess".

So, as a discussion point:

"Hey, Check Point, I'm running application "Super Awesome" on a clustered pair of 15600s with blades AV, IPS, IDC, AB, with X amount of bandwidth and Y amount of traffic flow, and CPU is XYZ and memory is ABC (and all of the other values you can get out of cpsizeme)."

I want to move "Super Awesome" (and insert other applications/gateways here) to a Maestro solution with MHO-XYZ, backed by X number of ABC gateways with a given set of blades enabled.

Plug all of that into a calculator and it spits out a value for CoreXL to start from.

I don't see why that would be so hard since they have all the metrics.
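The core of such a calculator could be as small as a headroom rule. A hedged sketch, assuming each FW instance maps to roughly one core's worth of work and that load spreads evenly across instances; the utilization ceiling is an example target, and the `minimum` parameter can encode a floor such as the "no less than 10" starting point Diamond TAC suggested earlier in the thread. The peak figures are hypothetical:

```python
import math


def suggest_fw_instances(peak_busy_cores: float, max_utilization: float,
                         minimum: int = 1) -> int:
    """Suggest a starting CoreXL instance count for one VS.

    peak_busy_cores: projected peak load on the target platform, in busy-core
        equivalents (an estimate, not a measurement).
    max_utilization: fraction (0-1] of each instance you are willing to use at peak.
    minimum: floor for the result, e.g. a vendor-suggested starting value.
    """
    if peak_busy_cores < 0:
        raise ValueError("peak_busy_cores must be non-negative")
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    return max(minimum, math.ceil(peak_busy_cores / max_utilization))


# Hypothetical inputs: ~20.5 busy cores at peak, keep instances at or below 70%.
print(suggest_fw_instances(20.48, 0.70, minimum=10))  # heavy VS
print(suggest_fw_instances(3.0, 0.70, minimum=2))     # smaller VS, hypothetical floor
```

The hard part, as the thread points out, is the input: `peak_busy_cores` has to come from measured data on comparable hardware and blades.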

Kaspars_Zibarts · 2021-08-17

Couldn't agree more! 🙂 I have stumbled across this so many times! I think PS should be able to do a fairly good calculation anyway, even if a tool does not exist: they have experience with other customers, so they should be able to do it.

genisis__ · 2021-08-17

I'm going through this as well. I've got to the point where I don't trust Check Point's cpsizeme, and I'm now requesting statements and demos to back up the theory (especially when it comes to Maestro, as it's a significant investment for the business).

My requirement as an example:

- I require 10 Gbps throughput, and the maximum CPU utilisation at this throughput level must not exceed 70%.

- Assume the firewall policy is 500-600 rules and is not optimized.

- Assume FW/IPS/AV/ABOT/MONITOR blades are enabled.

I'm using the above as a general statement of requirements, even for traditional gateways.
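One way to turn a requirement like this into a pass/fail check is to project a measured data point linearly to the target throughput. A deliberately naive sketch: it assumes CPU scales linearly with throughput for the same blade mix, rule base and traffic profile, which real gateways rarely do exactly, so treat the result as a sanity check rather than proof. The measured values are hypothetical:

```python
def projected_cpu_pct(measured_cpu_pct: float, measured_gbps: float,
                      target_gbps: float) -> float:
    """Linearly scale a measured CPU utilization to a target throughput."""
    if measured_gbps <= 0:
        raise ValueError("measured_gbps must be positive")
    return measured_cpu_pct * target_gbps / measured_gbps


def meets_requirement(measured_cpu_pct: float, measured_gbps: float,
                      target_gbps: float = 10.0, cpu_ceiling_pct: float = 70.0) -> bool:
    """True if the linear projection stays at or below the CPU ceiling."""
    return projected_cpu_pct(measured_cpu_pct, measured_gbps, target_gbps) <= cpu_ceiling_pct


# Hypothetical demo measurements: (CPU %, Gbps) observed with the required blades enabled.
for cpu, gbps in [(35.0, 5.0), (40.0, 5.0)]:
    print(cpu, gbps, meets_requirement(cpu, gbps))
```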

Kaspars_Zibarts · 2021-08-17

That's why I needed to know your current SND/FW split and CPU utilisation on both sides (in case it's rather different) 🙂

From what I understood, the three new instances / VSes that will be added into the mix will be considerably smaller?

Tommy_Forrest · 2021-08-17

@Kaspars_Zibarts- I think there might be some confusion. We are in the middle of a huge project migrating to a brand new data center with brand new gear.

In the current DC, I have a ton of firewalls that we're going to collapse into these new shiny Maestro environments. Let me lay out 1 example:

In the old DC we have internet hosted on a 3 node cluster of 26600's, it's our heavy hitter with the metrics described above. We have guest internet on a pair of 15600's. And we have an Always On VPN solution that is hosted on a pair of 7000's.

In the new DC, I've got a pair of MHO-170s in HA, configured with 2 security groups. SG1 is going to host everything discussed in the paragraph above on four 28600s (we scored a 4th box from the boss; he was in a good mood last week).

On SG1 there will be three VSes: Internet, Guest Internet and AoVPN.

The current performance metrics are known and can easily be collected in cpsizeme.

How can one take the data from the existing gateways, translate it and collapse it down to easily work out what the CoreXL value should be for each VS?
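A first-order answer is to convert each source gateway's peak load into target busy-core equivalents, add them up per security group, and compare the sum with the group's capacity. A minimal sketch, assuming each VS's load is spread evenly over all members of the security group and ignoring SND cores, flow correction and other Maestro overhead; every number below is a hypothetical placeholder:

```python
from typing import Dict


def sg_headroom(vs_peak_busy_cores: Dict[str, float], members: int,
                cores_per_member: int) -> float:
    """Fraction of security-group core capacity left unused at the combined peak."""
    capacity = members * cores_per_member
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    demand = sum(vs_peak_busy_cores.values())
    return 1 - demand / capacity


# HYPOTHETICAL per-VS peak demand (target busy-core equivalents) and hardware figures.
demand = {"Internet": 20.5, "Guest Internet": 4.0, "AoVPN": 3.0}
print(f"headroom at combined peak: {sg_headroom(demand, members=4, cores_per_member=32):.1%}")
```

Summing the individual peaks is conservative, since the three VSes will not necessarily peak at the same time.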

Kaspars_Zibarts · 2021-08-17

Sent you pm @Tommy_Forrest

genisis__ · 2021-07-24

Taking Maestro out of play (mainly because I've not touched it yet):

Are you able to do load testing on the new VS to determine how many fwk instances should be used? I would start higher and then monitor to see where the sweet spot is; for example, give it 10 fwk instances to start with.

Of course you can do manual affinity, or if you're running R80.10, perhaps even see how the Dynamic Dispatcher works (which I believe is now supported in R80.10).

Tommy_Forrest · 2021-07-24

We don't have the horsepower to test the new environment. I am keenly aware of how the existing environment is behaving, though.

The current 26ks are on R80.40. The new Maestro/VSX environment is on R81.

I highly recommend playing with Maestro. It is AMAZINGLY cool! The limitations around NATing notwithstanding.

genisis__ · 2021-07-24

It is something I hope to explore soon.

genisis__ · 2021-08-17

I suggest going to R81.10 on Maestro if you can. BTW, watch out for "mixed" appliance support: for example, you cannot mix a 6700 and a 7000 in the same security group (see SK162373).

So there is not only the question of CPU design but also of appliance mixing, as you may wish to add resources later with smaller or larger tin.

Tommy_Forrest · 2021-08-17

As much as I really want to, we won't be able to get to R81.10 in time. This gear is about to go live. We are on R81 JHT36. I won't be able to get the MDSes upgraded to R81.10 until sometime next month, when they move to new hardware.

Mix-and-match isn't a concern for us, as this is all new gear and there's a LOT of it. It will be an issue when it comes time to upgrade the hardware a few years down the road. Maybe I'll be retired by then and won't have to worry about it! 🙂
