Kolafer
Contributor

Experience with vulnerability scanner in the internal network

hi community,

maybe someone can share their experience with vulnerability scans in the internal network.

We use the Tenable scanner to scan our networks for vulnerabilities. These scans affect firewall performance: concurrent connections and connections per second roughly double, but throughput does not really increase.

 

I do not think that an entry in the fast_accel table will help here, because we have a lot of new connections to various destination ports.

If you compare the throughput, connections per second and concurrent connections with the data sheet, we're only at half the performance, so more should be possible.

Memory usage is not increasing either, but the CPU load of the SNDs and fw_workers climbs to 80%, with spikes to 100%.
Dynamic balancing is active.

Do you have any tips on where I can look?

Many thanks


PhoneBoy
Admin

A vulnerability scanner mostly generates a lot of new connections and doesn't really pass a lot of data.
So this is expected behavior.

You should be able to use any for both the protocol and port parameters to the fast_accel command.
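For example, something along these lines should work (the exact fast_accel syntax can vary between releases, so check the command help on your gateway; 198.51.100.10 is a made-up scanner IP):

# Assumed R80.20+ syntax: fw ctl fast_accel add <src> <dst> <dst_port> <protocol>
fw ctl fast_accel enable
fw ctl fast_accel add 198.51.100.10 any any any   # scanner source, any destination/port/protocol
fw ctl fast_accel show_table                      # confirm the entry was added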

Kolafer
Contributor

We only use the FW blade, and I also put the IPs of the scanners into the fast_accel table.

Is this really all we can do?

PhoneBoy
Admin

Sorry, I don't quite follow what you mean about the fast_accel entries for the scanner; can you clarify?
Also, what version/JHF are you running on what kind of appliance?
Please also provide Super Seven output: https://community.checkpoint.com/t5/Scripts/S7PAC-Super-Seven-Performance-Assessment-Commands/td-p/4...
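From memory, the Super Seven boil down to roughly the commands below; the linked thread has the authoritative script, so treat this list as approximate:

fwaccel stat                         # SecureXL status and accept template state
fwaccel stats -s                     # accelerated vs. F2F packet percentages
grep -c ^processor /proc/cpuinfo     # total core count
fw ctl affinity -l -r                # interface and worker to core mapping
netstat -ni                          # interface errors/drops (RX-DRP etc.)
fw ctl multik stat                   # per-instance connection and peak counts
cpstat os -f multi_cpu -o 1 -c 5     # per-core CPU utilization samples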

the_rock
Legend

I second what PhoneBoy said; I have indeed seen that be expected behavior. It does not really matter, in my experience at least, how many blades you have enabled on the firewall.

Kolafer
Contributor

This is a 26000 appliance with only the FW blade active, running R80.40 JHF Take 156.

s7pac attached.

Timothy_Hall
Champion

OK sorry to come in late on this thread.  After looking at your s7pac output and other screenshots:

1) Looks like dynamic balancing/split has you in a 28/44 split on your 72-core firewall, which is to be expected with only the FW blade active.  However, one interesting side effect of this is that there are 60 firewall instances/workers but only 44 cores available for them.  So the firewall instances are doubling and tripling up on certain cores, presumably until their existing connections decay and the surplus firewall instances are eventually de-allocated.

2) The high CPU on your firewall workers/instances is almost certainly caused by the very high number of rulebase lookups occurring on your firewall instances in the F2F path (possibly exacerbated by the associated generation of logs), because the scanners are launching new accepted connections at a very high rate.  The first packet of every new connection must be handled by a worker, whether the connection is matched to a SecureXL Accept template (less likely) or has to perform a full rulebase lookup in F2F (more likely).  Assuming the scanners are hitting lots of diverse destination IP addresses, very few accept templates will be formed and you will be stuck with the full overhead of an F2F rulebase lookup for each new connection; the commands sketched after point 3 show how to confirm this.

3) fast_accel or disabling the IPS blade will not help this situation, as those features only control what happens after the first packet has matched a rule or template on a worker: offloading subsequent packets to the SecureXL accelerated path (fast_accel), or skipping IPS inspection of the subsequent data stream inside the connection (which happens anyway with fast_accel active).
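To see how point 2 is actually playing out on this gateway, the standard SecureXL status commands are enough (the exact output format varies a bit by release):

fwaccel stat        # shows whether Accept Templates are enabled and, if disabled, from which rule
fwaccel templates   # lists the current accept templates; expect very few given the diverse scan targets
fwaccel stats -s    # summary of accelerated vs. F2F packets and connections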

So trying to improve this situation will depend on which of two mutually exclusive goals you pick:

Goal: Allow scans to run as fast and as accurately as possible, all other traffic be damned

In your firewall/network policy, add a rule as close to the top as possible matching the source IP addresses of the scanning systems, with an action of Accept and a Track of None (a rough Management API sketch of such a rule follows below).  This would have made a huge difference in R77.30 and earlier, but in R80.10+ with the introduction of Column-based matching I'm not sure how much this will help (if at all); it is worth a try though.  If your scanning traffic is currently being accepted after rule 1898, where it is ineligible for templating, this change will definitely help, perhaps a lot.  The savings in logging overhead may improve the CPU issue on the workers as well; also make sure Accounting is not enabled for this rule.

Ideally if you could somehow manually add broad-ranging SecureXL Accept templates for these scanning systems that would be great, but that is not possible as far as I know.
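Purely as an illustration, such a rule could also be pushed from the CLI via the Management API. This is a hypothetical sketch only: "Internal_Scanners" and the "Network" layer are made-up object names, and the parameter names should be verified against the Management API reference for your version.

# Hypothetical sketch; verify parameter names against your Management API reference.
mgmt_cli login user "admin" password "<your-password>" > sid.txt
mgmt_cli -s sid.txt add access-rule layer "Network" position 1 name "Scanner bypass" source "Internal_Scanners" action "Accept" track.type "None"
mgmt_cli -s sid.txt publish
mgmt_cli -s sid.txt logout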

Goal: Limit the impact of scans on firewall CPU to prefer non-scanning traffic, with the side effect of some (or a lot of) scan traffic getting lost/throttled

Establish new connection quotas that will be enforced directly by SecureXL with no firewall worker involvement via the fwaccel dos rate command: sk112454: How to configure Rate Limiting rules for DoS Mitigation (R80.20 and higher)
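For what it's worth, the rules from sk112454 end up looking roughly like the line below; treat the flags and keywords here as an assumption from memory and copy the exact syntax from the SK (198.51.100.0/28 is a made-up scanner subnet and 10000 a made-up quota):

# Assumed syntax, verify against sk112454: drop new connections from the scanner subnet
# once they exceed the given rate, enforced in SecureXL before any worker is involved.
fwaccel dos rate add -a d source cidr:198.51.100.0/28 new-conn-rate 10000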

I'll give this situation some more thought, but this is the best I can come up with for now.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Kolafer
Contributor

Thanks

Honestly, we already have a firewall rule at the top of the policy rule base which we do not log, and we also changed the TCP session timeout for that specific service from the default 3600 seconds down to 20.

 

Then we will need to slow down the scan speed as far as possible.

It was important for me to understand why the values are so much lower in the production environment compared to the data sheet.

Chris_Atkinson
Employee

Have you thought about moving the scanner?

What are you actually trying to achieve, speed up the scan?

CCSM R77/R80/ELITE
Kolafer
Contributor

I'm trying to understand why the appliance reaches its limits, with CPU at 80-100%.

Datasheet performance specification:

Firewall throughput ---- 106.2 Gbps
Connections/sec ---- 550,000
Concurrent connections ---- 10/20/32 M

What we actually reach, with the firewall already at its limit:

Firewall throughput ---- 21.3 Gbps
Connections/sec ---- 90,000
Concurrent connections ---- 2.5 M

That is nearly nothing compared to the data sheet.

Daniel_Kavan
Advisor

It would be nice to have somewhere in the dashboard to define "internal scanners".

IOW, to whitelist them from IPS, DoS protections, etc.: Nessus, Lumeta and the like.

I found this as well.

https://community.checkpoint.com/t5/Security-Gateways/Allow-my-vulnerability-scanner-through-gateway...

PhoneBoy
Admin

When the scanner is running, where precisely are you seeing 80-100% CPU?
Is it just on a few CPUs?

I suspect the vulnerability scanner traffic is kinda like an elephant flow, given it all comes from a single source.
Even our Hyperflow feature in R81.20 requires features other than firewall and VPN to be enabled.
Which means there may not be much you can do about it.

I recommend engaging with the TAC here.

Daniel_Kavan
Advisor

When creating an IPS exception, if I leave Protection/Site/File/Blade as N/A, set the action to Inactive, and set the source to my Nessus scanner, will that set ALL IPS defenses as inactive?   The field list includes /Blade, but I didn't see an option to set the blade to IPS.  Spoke too soon: I was able to select the IPS blade.  Nice.

 

1. Disable the IPS blade for Nessus and the other scanners

2. Add them to table.def per sk104468

3. Push policy

PhoneBoy
Admin

Believe so, yes, but if you want to ensure only Firewall and VPN are used for a specific connection, use fast_accel.

Kolafer
Contributor

On all CPUs. CPView output attached. This isn't just one scanner; there are 8 scanners 🙂

TAC only mentioned checking the scanners and seeing what we can do on the scanner side itself.

PhoneBoy
Admin

TAC is probably right in this case as you’re at the maximum number of worker cores for that platform and they’re all at ~80%.
Almost all your packets are being accelerated to boot.

Every platform has a limit to the number of new connections per second that can be opened.
I believe this is represented on the data sheet in the lab performance section (meaning under ideal conditions): 550,000/sec.
The real world number is lower, obviously, and with 8 vulnerability scanners operating through your gateway, you could easily be bumping into this limit.
