Solved: Re: Concurrent Connections - R80.40

j_silva · ‎2021-12-06

Hi,

I have a 15600 appliance running Gaia R80.40 JHF 126. Every day, when the number of connections goes above 80000, all traffic to the internet stops unexpectedly. I don't understand why, because the fw ctl pstat output shows that there is no connection limit.

System Capacity Summary:
Memory used: 16% (7758 MB out of 47763 MB) - below watermark
Concurrent Connections: 28844 (Unlimited)
Aggressive Aging is enabled, not active

I don't have any issues related to CPU, memory, or disk, and there is no NAT fragmentation either. I saw in SmartConsole that it is possible to change the connection limit, but I don't believe it is necessary to change it.

Could you help with this issue?

_Val_ · ‎2021-12-06

Check whether you are already exhausting your NAT Hide ports.

j_silva · ‎2021-12-06

Hello

I don't have this problem. I've checked SmartLog and cpview, and everything related to it looks normal.

_Val_ · ‎2021-12-06

What did you search for, exactly? Also, when the issue happens, look at your NAT kernel tables to see whether they are maxed out.

j_silva · ‎2021-12-06

Hi,

During the problem, I checked for NAT exhaustion in cpview, as shown in the attached image.

Is it possible to check NAT usage with another command?

the_rock · ‎2021-12-06

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

Best,
Andy
"Have a great day and if its not, change it"

_Val_ · ‎2021-12-06

Limit 10000 but actual peak is 28000? Does not sound okay to me. Also, you are too close to that 28000, which might point in the right direction:

j_silva · ‎2021-12-06

Hi,

I already asked TAC about it. According to TAC, I have 28 CPU cores and each core has a limit of 10000, so the total is 28 x 10000 = 280000.

In my opinion, in this case I just lose the cache, but NAT will still translate addresses normally.

the_rock · ‎2021-12-06

This is the first time I have ever heard of that sort of calculation... not really sure that's correct, maybe someone else can confirm.

j_silva · ‎2021-12-06

Yes the_rock

TAC explained that to me during a chat. I'll ask again.

Thanks for all the feedback.

the_rock · ‎2021-12-06

I will let others confirm, but I have NEVER heard that before. I wish they had given you some document about it... not sure where that came from, honestly.

genisis__ · ‎2021-12-06

I'd like to see the SK or document where it explains that.

j_silva · ‎2021-12-06

Hi,

I just received a response in the message exchange on the case:

Hi Juarez,

"Appreciating your update. You can clear the table and check whether that resolves the issue as a first step. The limit of 10000 is per firewall worker, so I am thinking there are 28 workers and each worker can have up to 10K entries in that table, so 28 x 10000 = 280000, which is why you see it at 280K. When we increase the limit, it will be per firewall worker."

*** Your service request is not monitored outside of assigned engineers working hours, should you require immediate assistance, please feel free to call in. ***

Regards
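
The arithmetic in the TAC reply above can be sketched in a few lines of Python. This only illustrates the calculation as quoted; the function name is hypothetical, and the 10000-per-worker figure comes from the reply itself, not from an official document.

```python
def aggregate_cache_limit(workers: int, per_worker_limit: int = 10_000) -> int:
    """Total table entries across all firewall workers.

    Assumption (from the TAC reply): the limit applies per worker,
    so the aggregate is simply workers * per_worker_limit.
    """
    if workers < 0 or per_worker_limit < 0:
        raise ValueError("counts must be non-negative")
    return workers * per_worker_limit


if __name__ == "__main__":
    # 28 workers, as mentioned in the reply
    print(aggregate_cache_limit(28))
```

If the limit really is per worker, a peak far above 10000 in the aggregate view would be expected rather than a sign of overflow.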

Glen_Bayless · ‎2021-12-06

If you are able, opening a ticket with TAC will resolve the issue the quickest.

However, one step you can take while the issue is occurring is to run "fw ctl zdebug + drop" in expert mode for a few seconds and see what error messages you get. TAC would request this information as well.

If you post the output here, someone will review it, but it needs to be captured during the issue.

j_silva · ‎2021-12-06

Hi Glen,

I ran "fw ctl zdebug + drop" but didn't get any drop messages...

I have already opened a case with TAC, but so far they have not asked me anything about NAT.

Eitan_Gilad-Lug · ‎2021-12-06

Can you please share the SR number with TAC so we can close the loop.

thanks

the_rock · ‎2021-12-06

@_Val_ actually brought up a very good point. I recall there was an SK from a while back about this, probably for R77 and earlier, and it involved modifying a file on the management server. I will see if I can find it.

j_silva · ‎2021-12-06

Hi,

I found an SK that says to modify the connection limit in /opt/CPsuite-R80.40/fw1/conf/objects_5_0.C:

# :connections_limit (25000)

But when I run fw ctl pstat, it shows Unlimited, as below:

System Capacity Summary:
Memory used: 17% (8250 MB out of 47763 MB) - below watermark
Concurrent Connections: 50372 (Unlimited)
Aggressive Aging is enabled, not active

Hash kernel memory (hmem) statistics:
Total memory allocated: 8513806336 bytes in 2078566 (4096 bytes) blocks using 2 pools
Initial memory allocated: 5007998976 bytes (Hash memory extended by 3505807360 bytes)
Memory allocation limit: 40066088960 bytes using 512 pools
Total memory bytes used: 0 unused: 8513806336 (100.00%) peak: 6942913848
Total memory blocks used: 0 unused: 2078566 (100%) peak: 1812703
Allocations: 373539907 alloc, 0 failed alloc, 346496895 free

System kernel memory (smem) statistics:
Total memory bytes used: 11712721952 peak: 13962568616
Total memory bytes wasted: 123912475
Blocking memory bytes used: 119820396 peak: 815182716
Non-Blocking memory bytes used: 11592901556 peak: 13147385900
Allocations: 2250859302 alloc, 207 failed alloc, 2250776409 free, 0 failed free
vmalloc bytes used: 11447248008 expensive: no

Kernel memory (kmem) statistics:
Total memory bytes used: 6601477344 peak: 10154447436
Allocations: 2624246410 alloc, 207 failed alloc
2597134592 free, 0 failed free
External Allocations:
Packets:124969024, SXL:37052266, Reorder:0
Zeco:0, SHMEM:2176, Resctrl:0
ADPDRV:0, PPK_CI:13015296, PPK_CORR:0

Cookies:
1870268372 total, 809226293 alloc, 809155399 free,
3679877010 dup, 4020703328 get, 2165983218 put,
2327063251 len, 740443353 cached len, 26932 chain alloc,
26932 chain free

Connections:
540014320 total, 466588826 TCP, 71211150 UDP, 2203472 ICMP,
10872 other, 406247 anticipated, 1883050 recovered, 50372 concurrent,
124545 peak concurrent

Fragments:
92073 fragments, 44095 packets, 69 expired, 0 short,
0 large, 0 duplicates, 0 failures

NAT:
1779716794/0 forw, -431240825/0 bckw, 1135361697 tcpudp,
17091927 icmp, 1548710183-419225040 alloc
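
For anyone who wants to track these counters over time, here is a minimal Python sketch that pulls a few values out of pstat-style text. The line formats are assumed only from the output pasted above, the function and key names are hypothetical, and other versions may print the summary differently.

```python
import re

# Excerpt copied from the output above; the layout is assumed to be stable.
SAMPLE = """\
System Capacity Summary:
Memory used: 17% (8250 MB out of 47763 MB) - below watermark
Concurrent Connections: 50372 (Unlimited)
Aggressive Aging is enabled, not active
Connections:
540014320 total, 466588826 TCP, 71211150 UDP, 2203472 ICMP,
10872 other, 406247 anticipated, 1883050 recovered, 50372 concurrent,
124545 peak concurrent
"""

# Hypothetical key names mapped to regexes with one capture group each.
PATTERNS = {
    "concurrent": r"Concurrent Connections:\s*(\d+)",
    "limit": r"Concurrent Connections:\s*\d+\s*\(([^)]+)\)",
    "peak_concurrent": r"(\d+)\s+peak concurrent",
    "mem_used_mb": r"Memory used:.*?\((\d+) MB out of",
    "mem_total_mb": r"MB out of (\d+) MB",
}


def parse_pstat(text: str) -> dict:
    """Return the counters found in the text; missing fields are skipped."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            continue
        value = match.group(1)
        # Numeric fields become ints; "Unlimited" stays a string.
        result[key] = int(value) if value.isdigit() else value
    return result


if __name__ == "__main__":
    for name, value in parse_pstat(SAMPLE).items():
        print(f"{name}: {value}")
```

Logging these values every minute or so around the time of the outage would show whether traffic stops at a consistent connection count.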

the_rock · ‎2021-12-06

This is one of the SKs I was thinking of... check the bottom for the R80.x mgmt server (or R81, if that's what you use for mgmt).

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

j_silva · ‎2021-12-06

Hi the_rock

I've seen this SK, but my question is whether it's even necessary to adjust this parameter, since the gateway has no connection limit according to the fw ctl pstat output.
I have also seen lower-capacity gateways with no problems related to NAT or limits.

It's very strange to me.

the_rock · ‎2021-12-06

@_Val_ is right, I just noticed the same thing actually. How come the limit says 10000, but the peak is 280000??!! That makes no sense... if you edit the fw object in the dashboard and go to Optimization in the left-side menu, what do you have set there for capacity optimization?

Andy

Timothy_Hall · ‎2021-12-09

Sorry to come in late on this thread, but the fwx_cache table is used solely to cache NAT rulebase hits and avoid, as much as possible, a full NAT rulebase lookup, which can incur substantial overhead. If the fwx_cache table is full (or close to full), it should not cause NAT operations to break; it only causes higher CPU load on the workers. If the workers are already riding the edge on CPU, the additional load from suddenly having to do far more resource-intensive NAT rulebase lookups may be causing the behavior you describe.

In fact, it is possible to completely disable the fwx_cache mechanism and NAT still works just fine, although it consumes more CPU for NAT rule lookups. I uncovered most of this when researching the third edition of my book, an excerpt of which is included below, and it took me a while, as there was not a lot of documentation about this mechanism available at the time.

The 10K cache entries per worker is documented, but only in my book. 😀

Bottom line is I don't think the fwx_cache table getting full or close to full is directly causing your problem; it lies elsewhere.
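
To illustrate the point about the cache, here is a toy Python model of a bounded cache sitting in front of an expensive rule lookup. It is not Check Point code: the class, the rule format, and the choice to simply stop inserting when the cache is full are all assumptions made for the sketch. It only shows that a full or disabled cache changes the cost of translation, not its result.

```python
class NatRuleCache:
    """Toy bounded cache in front of a full rulebase walk (hypothetical model)."""

    def __init__(self, rules, capacity):
        self.rules = rules            # list of (source_prefix, nat_action) pairs
        self.capacity = capacity      # 0 models a disabled cache
        self.cache = {}
        self.full_lookups = 0         # counts the expensive path

    def _rulebase_lookup(self, src):
        # Expensive path: walk the rules in order, first match wins.
        self.full_lookups += 1
        for prefix, action in self.rules:
            if src.startswith(prefix):
                return action
        return "no-nat"

    def translate(self, src):
        if src in self.cache:
            return self.cache[src]    # cheap path
        action = self._rulebase_lookup(src)
        # Assumption: once the cache is full, new entries are just not stored.
        if len(self.cache) < self.capacity:
            self.cache[src] = action
        return action


if __name__ == "__main__":
    rules = [("10.1.", "hide-A"), ("10.", "hide-B")]
    sources = ["10.1.0.1", "10.1.0.2", "10.2.0.1", "10.2.0.2"] * 2
    for capacity in (4, 2, 0):
        nat = NatRuleCache(rules, capacity)
        results = [nat.translate(s) for s in sources]
        print(capacity, nat.full_lookups, results[:4])
```

In this model, shrinking the capacity only raises the number of full lookups; the translation results are identical in every case.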

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization

emmap · ‎2021-12-06

The old default used to be 25000 before it changed to automatic, so that parameter is a legacy setting that is not used when the automatic limit is enabled. You can safely ignore it if pstat is correctly showing automatic.

Henrik_Noerr1 · ‎2021-12-16

This is still very much a thing on VSX.

j_silva · ‎2021-12-16

Hi guys,

I want to share the solution to this case. The solution was to disable GNAT, as described in this SK: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

After disabling GNAT, the connections are stable even at up to 100k connections.

Thanks for all the feedback!

the_rock · ‎2021-12-16

Thanks very much for sharing that...that sk is super useful!
