Re: The security gateway is dropping packets due t...

Chinmaya_Naik · ‎2021-07-01

Hi Team,

We observed that the Security Gateway is dropping packets due to the CoreXL queue size.

Only some of the cores are heavily utilized.

OS: R80.30 with Jumbo Hotfix take_227

Initially we observed this issue only during non-production hours, such as weekends, when we typically reboot the firewall or the connected switch or perform other testing.

But now we also see this issue during production hours.

I need to understand what exactly the issue is.

Query:

1. Why are a few FW_Worker instances still fully utilized even though the Dynamic Dispatcher is enabled?

2. The issue is not at the SND level; the FW_Workers are fully utilized and that is why packets are being dropped. Should the first approach therefore be to fine-tune the core configuration instead of increasing the queue size?

3. If we increase the input queue size as mentioned in sk61143, will the issue be resolved?

4. Since this basically increases the buffer, will it resolve the issue or just increase latency?

Please suggest the best way to overcome this issue.

@Chinmaya_Naik

Benedikt_Weissl · ‎2021-07-02

Is SecureXL active? Please print the output of "fwaccel stats -s" and "fwaccel stat".
Did you identify the connection that's causing the load? If the load is caused by a high-bandwidth, trustworthy connection like storage replication or backup, you can use fast_accel to bypass fw_worker; see sk156672.

Timothy_Hall · ‎2021-07-02

An imbalance of utilization on Instances/Workers even with the Dynamic Dispatcher enabled is usually the result of elephant flows (what Check Point calls "heavy connections"). In your R80.30 release all packets of a single connection can only be processed on one Instance/Worker; later releases utilize the pipeline paths to help spread out this load. As mentioned earlier in the thread you can fastaccel the traffic if it doesn't need to be handled in the F2F path for some reason. This workaround and how to diagnose load imbalance problems like this are covered in my CPX 2020 speech.
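To illustrate why a single heavy connection can defeat load-aware dispatching, here is a toy model in Python. It is only a sketch: the `dispatch` function, the load units, and the traffic mix are all hypothetical and do not reflect how CoreXL actually implements the Dynamic Dispatcher. The one property it borrows from the explanation above is that a connection is never split across workers.

```python
def dispatch(connections, num_workers):
    """Assign each connection to the least-loaded worker at arrival time.

    connections: list of per-connection loads (arbitrary units).
    A connection is never split, so its whole load lands on one worker.
    """
    loads = [0.0] * num_workers
    for load in connections:
        target = loads.index(min(loads))  # pick the least-loaded worker
        loads[target] += load
    return loads


# Hypothetical traffic mix: 40 small connections plus one elephant flow.
mice = [1.0] * 40
elephant = [60.0]

balanced = dispatch(mice, num_workers=4)
skewed = dispatch(elephant + mice, num_workers=4)

print("mice only:    ", balanced)
print("with elephant:", skewed)
```

With only small connections the load spreads evenly. Once the elephant flow is pinned to a worker, that worker stays far above the others no matter how the remaining connections are placed, which matches the pattern of a few saturated cores next to mostly idle ones.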

Generally, increasing the buffer sizes (including CoreXL queues & ring buffers for NICs) is not recommended as it is only addressing a symptom of the problem and not the cause.
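The trade-off can be sketched with a simple bounded FIFO queue in Python. This is a hypothetical tick-based model, not the CoreXL queue itself: the `simulate` function, the service rate of 10 packets per tick, and both traffic patterns are assumptions made purely for illustration.

```python
from collections import deque


def simulate(arrivals, service_per_tick, queue_limit):
    """Run a tick-based bounded FIFO queue.

    arrivals: number of packets arriving in each tick.
    Returns (dropped, max_wait), where max_wait is the longest time in
    ticks that any served packet spent in the queue.
    """
    queue = deque()  # holds the arrival tick of each queued packet
    dropped = 0
    max_wait = 0
    for tick, count in enumerate(arrivals):
        for _ in range(count):
            if len(queue) < queue_limit:
                queue.append(tick)
            else:
                dropped += 1  # queue full: packet is lost
        for _ in range(min(service_per_tick, len(queue))):
            max_wait = max(max_wait, tick - queue.popleft())
    return dropped, max_wait


# Hypothetical loads for a worker that drains 10 packets per tick.
burst = [50] + [5] * 20   # one short burst, then traffic below capacity
overload = [12] * 1000    # sustained arrivals above capacity

results = {
    limit: (simulate(burst, 10, limit), simulate(overload, 10, limit))
    for limit in (20, 200)
}
for limit, (burst_result, overload_result) in results.items():
    print(f"limit={limit}: burst={burst_result} overload={overload_result}")
```

In this model the larger queue absorbs a short burst without drops, at the cost of a longer wait. Under sustained overload it keeps dropping almost as many packets as the small queue while every packet waits much longer, which is why a bigger buffer treats the symptom rather than the overloaded worker.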

Gaia 4.18 (R82) Immersion Tips, Tricks, & Best Practices Video Course
Now Available at https://shadowpeak.com/gaia4-18-immersion-course

Chinmaya_Naik · ‎2022-07-11

@Timothy_Hall Thanks for the details

Hi Tim and Checkmates Team,

We are facing this issue again.

Compared to other days, during the mock (weekend) we observed more packet drops due to the CoreXL queue size.

Since in our release all packets of a single connection can only be processed on one Instance/Worker, a single heavy connection might be causing this issue.

Below is the update from TAC:

Query:

1. Why are a few FW_Worker instances still fully utilized even though the Dynamic Dispatcher is enabled? - Because of the stateful inspection.

2. The issue is not at the SND level; the FW_Workers are fully utilized and that is why packets are being dropped. Should the first approach therefore be to fine-tune the core configuration instead of increasing the queue size? - No, the fw_worker is not fully utilized when the packets are dropped; it is the core queue that is full. I am checking and testing internally whether we can increase the queue, and also checking whether the report is legitimate.

3. If we increase the queue size as mentioned in the SK, will the issue be resolved? - In most cases 4GB was sufficient, but in your environment you still face the issue, as known via the pro report.

4. Should we fastaccel the traffic if it doesn't need to be handled in the F2F path for some reason? - This has nothing to do with our issue, but if the traffic is not accelerated then yes, it takes the F2F path and consumes massive CPU.

Since we are basically increasing the buffer, will this resolve the issue or increase latency? - It resolves the issue; the latency impact is not extensive.

@Chinmaya_Naik

Timothy_Hall · ‎2022-08-01

Will need to see outputs of enabled_blades and the Super Seven to assess further. It is rather unusual to be dropping traffic in the CoreXL queues without issues occurring elsewhere in the firewall at the same time. If we confirm large amounts of F2F I will share some debugging commands that can be used to determine why the traffic is going F2F as there can be many causes.

