Here are the answers to some of the chat questions. Note that some of these answers and elements of the presentation will be slightly different for R80.20 gateway which was released the same day as this webinar.
Can you tell me a recommended ratio of SND to Firewall Worker cores? E.g. 12 cores and 2 cores.
As mentioned in the presentation, this depends on which blades you have enabled, how much traffic is being fully-accelerated by SecureXL in the SXL path, and whether you are seeing RX-DRPs. In general though it is common to allocate more SND/IRQ cores than are allocated by default, especially on larger firewalls. The "Super Seven" commands covered in the presentation should help you quickly make that determination. Table 4 on page 228 of my book also makes some initial core allocation recommendations.
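For quick reference, here is the Super Seven list (run from expert mode), with a one-line reminder of what each command shows:

fwaccel stat                      # SecureXL status and accept template state
fwaccel stats -s                  # percentage of traffic handled in the SXL/PXL/F2F paths
grep -c ^processor /proc/cpuinfo  # total number of cores seen by Gaia
/sbin/cpuinfo                     # physical core layout and SMT/Hyper-Threading status
fw ctl affinity -l -r             # current core allocations (SND/IRQ vs. Firewall Workers)
netstat -ni                       # per-interface RX-DRP/RX-OVR/RX-ERR counters
fw ctl multik stat                # per-Firewall Worker instance CPU and connection load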
Do you recommend enabling CoreXL with 2 cores? R77.30 with just 2 cores. One is always hitting 100%, and Check Point support distributed the interfaces across the 2 cores with the sim affinity -s command, but I have not seen an improvement. Should I look at the Dynamic Dispatcher next?
By default with a 2-core box there will be 2 SND/IRQ instances and 2 Firewall Worker instances, so each of the two physical cores will be serving both functions. If performance is unacceptable try disabling CoreXL from cpconfig (reboot required) which will allocate one core as a SND/IRQ and one core as a Firewall Worker. This *may* help performance over the default setup depending on your configuration but there is no way to know in advance for sure without trying it. It is unlikely that the additional overhead incurred by using the Dynamic Dispatcher on a 2-core box with CoreXL enabled will help, and might actually make things worse.
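To see how the two cores are currently being shared, these two commands from expert mode show the current split and the load on each Firewall Worker instance:

fw ctl affinity -l -r    # which cores are handling interfaces and which are running Firewall Workers
fw ctl multik stat       # CPU and connection counts per Firewall Worker instance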
So when transitioning from R77.30 to R80.10, the behavior changes for firewall worker cores. Do we need to do tuning when upgrading to R80.10 to handle this? We have a 4 core box.
Generally speaking, no. However due to the R80.10 change in process affinity mentioned in the presentation, you may notice higher CPU utilization on the Firewall Worker cores and lower CPU utilization on the SND/IRQ cores. Whether this difference will be enough to dictate an adjustment of the CoreXL split is tough to say.
Can I make some packets pass through SXL and some go F2F?
There is no direct way to force certain traffic into a particular path like PXL, other than the "Selectively disabling SecureXL for certain IP addresses" trick covered in the presentation which makes certain traffic go F2F. The firewall will always attempt to handle traffic in the most efficient path possible.
In big boxes, we can see cores allocated to interfaces being highly utilized while the rest of the SND cores are not (even with 90% of the traffic accelerated). What is the relation between interface affinity and SND?
By default if SecureXL is enabled, Automatic Interface Affinity is active and every 60 seconds will automatically try to balance handling of SoftIRQ and accelerated traffic operations among the defined SND/IRQ cores. But no matter how the traffic is balanced, by default only one SND/IRQ core can empty a particular physical NIC's ring buffer. On a firewall with enough SND/IRQ cores defined, typically the busiest interfaces will end up with their own dedicated SND/IRQ core while the less-busy interfaces share one (current allocation can be viewed with sim affinity -l). However even with a dedicated SND/IRQ core, RX-DRPs can still occur if the busy interface is utilized enough, and this is a textbook case where Multi-Queue needs to be enabled on that interface to allow the other SND/IRQ cores to help out with that particular busy interface.
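As a rough sketch of how to check this (interface selection in cpmq is done interactively, and as I recall a reboot is needed for Multi-Queue changes to take effect):

sim affinity -l    # current interface-to-SND/IRQ core assignments
cpmq get -a        # Multi-Queue status for all supported interfaces
cpmq set           # interactively enable Multi-Queue on the busy interface, then reboot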
Though SecureXL is obviously assisting in better performance, according to Check Point's documentation better performance can sometimes be observed when SecureXL is actually off. Would it be possible for us to know in which cases that would hold true? "With CoreXL, there are cases when performance without SecureXL is better than with it, even when SecureXL does manage to accelerate part of the traffic."
Those cases are going to be pretty rare in today's world. If 100% of the traffic is F2F due to the conditions specified on slides 43/44 and you can't do anything about it, I suppose removing the SXL path completely by disabling SecureXL would be more efficient. However Automatic Interface Affinity would also go away resulting in the nasty effects detailed on slide 47, and the need to configure manual interface affinity.
Is it okay to disable CoreXL during business hours for troubleshooting purposes?
Disabling CoreXL requires a reboot, so I think you mean SecureXL. As mentioned at the bottom of slide 46, the bigger the firewall the more likely there will be a noticeable performance impact when SecureXL is disabled. If you just need to obtain a packet capture use tcpdump (which is generally immune to the state of SecureXL), or selectively disable SecureXL as mentioned on slide 48 for the address(es) you wish to capture with fw monitor.
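For example, to grab a quick capture for a single host without touching SecureXL at all (the interface name and IP address below are just placeholders):

tcpdump -nni eth2 -w /var/tmp/capture.pcap host 203.0.113.50   # capture to a pcap file for later analysis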
Worth mentioning that MQ supports only 5 interfaces
Correct, Multi-Queue cannot be enabled on more than five physical interfaces at a time. Note that this limit applies to individual physical interfaces regardless of whether they are part of a bond or have VLAN-tagged subinterfaces. Also, SecureXL must be enabled to use Multi-Queue.
Do 10G interfaces require Multi-Queue to fully use the whole bandwidth?
Maybe, see the answer to the "In big boxes..." question above. Generally more SND/IRQ cores should be allocated first before trying Multi-Queue.
What are the things to consider if enabling dynamic dispatcher on R77.30 cluster?
Make sure you have the latest R77.30 Jumbo HFA applied first, as there were issues with the Dynamic Dispatcher (DD) in the early Jumbo HFAs for R77.30. Treat enabling the DD as a version upgrade (i.e. do one member at a time), as I don't think the members will sync properly while only one of them has DD enabled, but I could be wrong about that.
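If I recall sk105261 correctly, these are the relevant R77.30 commands (a reboot is required after changing the mode):

fw ctl multik get_mode    # check whether the Dynamic Dispatcher is currently enabled
fw ctl multik set_mode 9  # enable the Dynamic Dispatcher, then reboot
fw ctl multik set_mode 1  # revert to the default static dispatcher, then reboot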
Can you explain the Firewall Worker process and CoreXL? How do those two elements interact?
CoreXL is basically the splitting of SND/IRQ/SXL processing and PXL/F2F/INSPECT processing onto different physical cores. Slide 8 lays this out in more detail.
How can we see the change in performance after enabling optimizations, and present it in a view?
Use the cpview, top (press 1 to show individual core utilization), and cpstat os -f multi_cpu -o 1 commands to baseline the performance of your firewall before and after tuning adjustments.
Do hyperthreading cores increase licensing costs?
No, but your Check Point appliance processor architecture must support it. See sk93000.
If I enable the Dynamic Dispatcher, can I then go on to enable Hyper-Threading? Is it a good idea?
These are two completely different things. They work fine together when both are enabled if that is what you are asking. Enabling the Dynamic Dispatcher on R77.30 is about as close to a no-brainer as it gets. SMT less so as covered on slide 12 of the presentation.
Is SMT still configurable on open servers even though, according to Check Point, it is not supported?
No, when you try to enable it via cpconfig I think it says your hardware is not supported. See sk93000.
What can be done if accept templates are disabled by a low rule number?
Upgrade to R80.10 gateway, as the templating restrictions were relaxed significantly in that version. The only service objects that halt accept templates on an R80.10 gateway are DCE/RPC services and certain complex services such as "mapped" services. See sk32578 under "Acceleration of connections [templates]" for the restrictions. In R77.30 and earlier, the most common service types that halt templating are DHCP, traceroute, Time/Dynamic/Domain, and DCE/RPC objects.
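To see whether accept templates are currently being cut short by your rulebase, and how much templating and acceleration you are actually getting, check:

fwaccel stat       # the "Accept Templates" line indicates if templating is disabled and from which rule
fwaccel stats -s   # accelerated connection and packet percentages per path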
Can the SNDs use cores without additional licensing, since they are essentially only routing traffic? I have seen a lot of issues where the firewall just can't accept packets fast enough, even for traffic that doesn't need the Medium Path; NIC interrupts are what I am talking about mainly. Licensing shouldn't restrict accepting packets if the traffic isn't in the Medium Path, it seems to me. I would think we would be able to use as many SNDs as we want, spread over as many cores as we want, without additional licensing, since that is mainly accepting packets on interrupts. I see systems with a lot of SoftIRQ load that seem to be limited by core licensing. Is there a way to have a 4-core license for Medium Path processing, but have interrupts spread over 8 cores that are on the box just to help with interrupt handling?
No. The number of licensed cores on an open hardware firewall dictates the total number of cores that can be allocated for both SND/IRQ instances and Firewall Worker instances. I guess you might be able to manually specify interface affinity to an unlicensed core, but that packet would then have to make its way to a different core for handling by the SND. Keeping the SoftIRQ processing and SND handling on the same physical core sounds a lot more efficient to me. Even if you find some unsupported way to subvert the core licensing limits, after a version upgrade you may find that suddenly your workaround no longer functions or even causes new problems.
Are there certain blades that will cause all traffic to miss SXL?
Any "Deep Inspection" blades such as APCL/URLF and Threat Prevention generally cannot be handled in the SXL path.
Is this acceleration for all traffic or only for TCP?
For throughput acceleration via SXL, only TCP/UDP packets can potentially be handled via SXL; everything else goes F2F including ICMP. SecureXL accept templates can only be formed for TCP and UDP connections as well.
What if drop and overrun are the same?
For some reason certain driver/NIC combos increment the RX-DRP and RX-OVR counters displayed by netstat -ni in lock-step with each other. Use ethtool -S (interface) to see more detailed interface counters that will allow you to distinguish NIC hardware overruns from RX buffering drops/misses.
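For example (the exact counter names vary by NIC driver, so treat the grep pattern as a rough filter and eth0 as a placeholder):

ethtool -S eth0 | grep -iE 'drop|miss|fifo|overrun'   # separate true hardware overruns from ring buffer misses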
Is there any command to check if dynamic dispatcher is enabled or not?
This was covered by Case Study 4 on slide 53 which we didn't have time to cover.
R77.30: fw ctl multik get_mode
R80.10: fw ctl multik dynamic_dispatching get_mode
Is really "free" command valid? shouldn't it be reather cpstat os -f memory?
The free command provides greater granularity, please see my response in this thread:
https://community.checkpoint.com/message/11906-re-cpviewmem-vs-free?commentID=11906#comment-11906
Are there any differences in using the Super Seven on 1400 or 5200 appliances?
The Super Seven commands are valid on Check Point firewall appliance models 2200-23XXX and open hardware. They will probably work on other models such as 600-1700 and 41000-64000 but YMMV.
Do you have any tips on gathering info during a performance issue when there is no one monitoring the firewalls, e.g. out of hours?
By default cpview keeps system performance history for the last 30 days, and it is accessible by passing the -t option to cpview.
When would you increase the size of the ring buffer? Mind you, if you upgrade from ancient versions you will have ring buffers that are too small (256 instead of 1024).
Only when you are out of other options should you consider changing the ring buffer size from its default value: more SND/IRQ cores cannot be allocated due to a low number of physical cores, or utilization across all physical cores is already over 75%, indicating the firewall is undersized or in desperate need of serious tuning. While increasing the ring buffers may reduce RX-DRPs, it can get you into further trouble by causing something nasty called Bufferbloat.
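If you do end up going down this road, the ring buffer size can be viewed and changed from clish; something like the following, with eth0 as a placeholder (exact syntax may vary slightly by Gaia version):

show interface eth0 rx-ringsize    # current RX ring buffer size
set interface eth0 rx-ringsize 1024
save config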
Would any core removed from the worker cores pool be automatically added to SND/IRQ?
Yes. There is no direct way to specify the number of SND/IRQ cores you want; basically any core not designated a Firewall Worker automatically becomes a SND/IRQ core on firewalls with 4+ physical cores.
How is the swap space sized during installation? e.g. on a 23500 HPP (64 GB RAM) you get 32 GB swap space - if you use around 4 GB swap space I assume the gw is almost dead...
Will probably have to defer this one to Check Point, but ideally the firewall is not using swap space at all as shown by free -m.
When should you balance which interfaces go to which core? What are the pros and cons of manually configuring the affinity of cores to processes and/or cores to NICs? In general, do you feel comfortable with automatic affinity? Is the recalculation and redistribution painful for the gateway? Is this better or worse than having a fixed manual distribution?
Manual interface affinity is covered in Appendix A of my book. In earlier releases the Automatic Interface Affinity algorithm had various problems, but as far as I know they have all been rectified, and it has been a very long time since I've needed to configure manual interface affinity. Doing so is probably more likely to get you into further trouble if heavy traffic patterns through your firewall suddenly shift to new interfaces in an unexpected way. The only exception to this is if SecureXL is disabled; in that case Automatic Interface Affinity is disabled as well and you will probably need to implement manual interface affinity. However please see slide 46 before going down this road.
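If you do have to go manual (again, normally only when SecureXL is disabled), the basic mechanism looks something like this; the interface names and core numbers are just examples:

fw ctl affinity -l -r           # view current interface and Firewall Worker core assignments
fw ctl affinity -s -i eth0 0    # bind eth0's interrupt processing to core 0
fw ctl affinity -s -i eth1 1    # bind eth1's interrupt processing to core 1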
On a 4 core appliance there is not much that can be done, true?
The default split with 4 physical cores is 1/3, if the single SND/IRQ is overloaded you can try a split of 2/2 which I've seen definitely help in certain situations. That's about it.
Route-based VPNs use F2F; do ESP packets also use F2F?
Route-based VPN traffic is forced to go F2F to ensure every packet visits the Gaia IP driver for routing which may have been changed by something like OSPF. Traffic handled in PXL and SXL does not actually go through the IP driver as shown on slide 9. ESP traffic traversing the firewall (i.e. the firewall itself is not an endpoint of the VPN tunnel) goes F2F because it is not TCP or UDP. ESP traffic that terminates for decryption at the firewall (or is encrypted by the firewall) can potentially be handled in the SXL path subject to some restrictions.
Should you still disable securexl before running TCPDUMP?
tcpdump will *generally* provide a complete capture even when SecureXL is enabled. Please see the exceptions to this in my reply here:
TCPDUMP and SecureXL
If you have a unified policy (FW, APCL, URLF...) on R80.10, is it normal to see low values for acceleration?
Specifically for templating (i.e. Connections/sec as shown by fwaccel stats -s) the answer is yes. If there is any other feature enabled in the Network Policy layer (like APCL/URLF) thus making it an inline layer, the firewall's templating rate will immediately drop to zero. Traffic subject to the Anti-bot blade can also cause a zero templating rate. This condition has no effect whatsoever on throughput acceleration via the SXL/PXL/F2F paths. SecureXL accept templates are no longer that important due to the new R80.10 Column-based matching feature which has substantially reduced the overhead incurred by a rulebase lookup in F2F:
Unified Policy Column-based Rule Matching
Will sk104468 also work for ports? The table.def seems to give a hint in that direction
Never found the need to ensure certain ports go F2F, but looks like it will work. Will have to defer this one to Check Point...
What is the effect of fw monitor on traffic in the SXL path?
You won't see the traffic in your fw monitor output at all, or you will only see the first packet of the connection heading to F2F for a rulebase lookup if there was no SecureXL accept template present for that connection. Other than maybe slowing it down a bit under heavy load, using fw monitor does not change whether traffic is allowed or denied by the firewall.
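As an example, instead of disabling SecureXL globally you can narrow fw monitor down to just the host of interest (placeholder IP shown):

fw monitor -e "accept src=203.0.113.50 or dst=203.0.113.50;"   # capture only traffic to/from this host at the four inspection points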
How good is the spike monitor script that the TAC offers?
Don't think I've worked with this specific tool (or maybe it was called something else at one time), will defer to Check Point on that one.
On a 2 seat/CPU CloudGuard license/installation, are you better off not enabling CoreXL?
Outside my area of expertise, will defer to Check Point.
On 2380's, what is the recommended number of SND/IRQ cores?
I assume you mean a 23800, given the large number of cores and high amount of traffic typically pushed through these boxes I usually start with a 6/18 split with SMT disabled and go from there with the Super Seven commands. Enabling Multi-Queue and SMT might happen depending on what I find with the Super Seven commands once the box is under load.
All the advice given today, is it just as relevant for VSX?
The content here will apply to VSX to some degree but due to the heavy use of user-space processes in the VSX implementation YMMV. By far the best tuning guide for VSX is from Michael Endrizzi : VSX & CoreXL Training- You’ll love the price | DreezSecurityBlog
For 12+ cores, [number of cores] - 4 was listed as the best setting in one of the Check Point videos, but sk62620 shows different numbers.
The table in sk62620 is old and incorrect. Here are the default CoreXL allocations taken from sk98737, with the maximum number of total cores corrected for a 64-bit Gaia R80.10 gateway:
Number of CPU cores | Default number of CoreXL IPv4 FW instances      | Default number of Secure Network Distributors (SNDs)
--------------------|-------------------------------------------------|------------------------------------------------------
1                   | 1 (CoreXL is disabled)                          | 0 (CoreXL is disabled)
2                   | 2                                               | 2
4                   | 3                                               | 1
6 - 20              | [Number of CPU cores] - 2                       | 2
More than 20        | [Number of CPU cores] - 4 (but no more than 40) | 4
Can you shed some light on dealing with HTTPS Inspection?
In R77.30/R80.10 gateway all traffic subject to HTTPS Inspection will be handled in F2F with calls into process space via wstlsd/pkxld for the initial HTTPS negotiations. Hate to pull this stunt, but pages 361-365 of my book cover in depth everything you can do to mitigate the HTTPS Inspection performance hit as much as possible. These SKs can also help quite a bit:
sk108202: Best Practices - HTTPS Inspection
sk109772: R77.30 NGTP, NGTX and HTTPS Inspection performance and memory consumption optimization
sk65123: HTTPS Inspection FAQ
What are the default values for the RX and TX ring buffers? Is that controlled by the NIC driver or by Check Point?
Default ring buffer size is set by the NIC manufacturer in the corresponding driver I think. In general the default ring buffer size is 256 frames for 1Gbps interfaces and 512 for 10Gbps interfaces, but that may vary depending upon the vendor involved. The ring buffer size can be changed from clish but that is definitely a last resort.
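You can verify what your particular NIC/driver is actually using, and the maximums it supports, with ethtool (eth0 is a placeholder):

ethtool -g eth0    # shows the "Pre-set maximums" and "Current hardware settings" for the RX/TX rings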
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com