j_silva
Contributor

Concurrent Connections - R80.40


Hi,

I have a 15600 appliance running Gaia R80.40 JHF 126. Every day, when the number of connections goes above 80000, all traffic to the Internet stops unexpectedly. I don't understand why, because the fw ctl pstat output shows that there is no connection limit.

System Capacity Summary:
Memory used: 16% (7758 MB out of 47763 MB) - below watermark
Concurrent Connections: 28844 (Unlimited)
Aggressive Aging is enabled, not active

 

I don't have any issues related to CPU, memory, or disk, and no NAT fragmentation either. I saw in SmartConsole that it is possible to change the connection limit, but I don't believe it is necessary to change it.

 

Could you help with this issue?

25 Replies
_Val_
Admin

Check whether you are exhausting your Hide NAT ports.

j_silva
Contributor

Hello

I don't have this problem. I've searched SmartLog and cpview, and everything related appears normal.

_Val_
Admin

What exactly did you search for? Also, when the issue happens, look at your NAT kernel tables to see whether they are maxed out.

j_silva
Contributor

Hi,

During the problem, I checked for NAT exhaustion in cpview, as shown in the attached image.

Is it possible to check NAT usage with another command?

the_rock
Champion
_Val_
Admin

The limit is 10000 but the actual peak is 28000? That does not sound right to me. Also, you are very close to that 28000, which might point in the right direction:

Screenshot 2021-12-06 at 16.10.13.png

j_silva
Contributor

Hi,

I already asked TAC about it. According to TAC, I have 28 CPU cores and each core has a limit of 10000, so the total is 28 x 10000 = 280000.

In my opinion, in this case I just lose the cache, but NAT will still translate addresses normally.

the_rock
Champion

This is the first time I have ever heard of that sort of calculation... not really sure that's correct, maybe someone else can confirm.

j_silva
Contributor

Yes the_rock

TAC explained that to me during a chat. I'll ask again.

Thanks for all feedback.

the_rock
Champion

I will let others confirm, but I NEVER heard that before. I wish they gave you some document about it...not sure where that came from, honestly.

genisis__
Advisor

I'd like to see the SK or document where it explains that.

 

j_silva
Contributor

Hi,

I just received a response in the message exchange on the case:

Hi Juarez,

"Appreciating your update. You can clear the table and check if that will resolve as a first step. The limit 10000 is per firewall worker, so I am thinking there are 28 workers, each worker can have up to 10K entries in that table, so 28 x 10000 = 280000, and that is why you see it at 280K. When we increase the limit it will be by firewall worker."

*** Your service request is not monitored outside of assigned engineers working hours, should you require immediate assistance, please feel free to call in. ***
 
Regards
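
To make the arithmetic in the TAC message concrete, here is a minimal Python sketch. It only multiplies the two figures quoted above (28 workers, 10000 entries per worker); the function and constant names are hypothetical and are not part of any Check Point tool.

```python
# Figures quoted in the TAC message above; names are illustrative only.
PER_WORKER_LIMIT = 10_000  # cache entries allowed per firewall worker
WORKERS = 28               # number of firewall workers assumed by TAC


def aggregate_cache_limit(workers: int, per_worker_limit: int) -> int:
    """Return the combined table capacity across all firewall workers."""
    return workers * per_worker_limit


total = aggregate_cache_limit(WORKERS, PER_WORKER_LIMIT)
print(total)  # 280000
```

If the real number of firewall workers on the gateway differs from 28, the aggregate figure changes accordingly.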

Glen_Bayless
Employee

If you are able, a ticket with TAC will resolve the issue the quickest.

However, one step you can take while the issue is occurring is to run "fw ctl zdebug + drop" in expert mode for a few seconds and see what error messages you get. TAC would request this information as well.

If you post it here someone will review but it needs to be during the issue.

j_silva
Contributor

Hi Glen,

 

I have run "fw ctl zdebug + drop" but I didn't get any drop messages...

I have already opened a case with TAC, but so far they have not asked me anything about NAT.

Eitan_Gilad-Lug
Employee

Can you please share the SR number with TAC so we can close the loop.

thanks

 

the_rock
Champion

@_Val_ actually brought up a very good point. I recall there was an sk from a while back about this, probably for R77 and earlier, and it involved modifying a file on the mgmt server. I will see if I can find it.

j_silva
Contributor

Hi,

I have found an SK that says to modify the connection limit in /opt/CPsuite-R80.40/fw1/conf/objects_5_0.C

# :connections_limit (25000)

But when I run fw ctl pstat, it shows the limit as unlimited, as below:

System Capacity Summary:
Memory used: 17% (8250 MB out of 47763 MB) - below watermark
Concurrent Connections: 50372 (Unlimited)
Aggressive Aging is enabled, not active

Hash kernel memory (hmem) statistics:
Total memory allocated: 8513806336 bytes in 2078566 (4096 bytes) blocks using 2 pools
Initial memory allocated: 5007998976 bytes (Hash memory extended by 3505807360 bytes)
Memory allocation limit: 40066088960 bytes using 512 pools
Total memory bytes used: 0 unused: 8513806336 (100.00%) peak: 6942913848
Total memory blocks used: 0 unused: 2078566 (100%) peak: 1812703
Allocations: 373539907 alloc, 0 failed alloc, 346496895 free

System kernel memory (smem) statistics:
Total memory bytes used: 11712721952 peak: 13962568616
Total memory bytes wasted: 123912475
Blocking memory bytes used: 119820396 peak: 815182716
Non-Blocking memory bytes used: 11592901556 peak: 13147385900
Allocations: 2250859302 alloc, 207 failed alloc, 2250776409 free, 0 failed free
vmalloc bytes used: 11447248008 expensive: no

Kernel memory (kmem) statistics:
Total memory bytes used: 6601477344 peak: 10154447436
Allocations: 2624246410 alloc, 207 failed alloc
2597134592 free, 0 failed free
External Allocations:
Packets:124969024, SXL:37052266, Reorder:0
Zeco:0, SHMEM:2176, Resctrl:0
ADPDRV:0, PPK_CI:13015296, PPK_CORR:0

Cookies:
1870268372 total, 809226293 alloc, 809155399 free,
3679877010 dup, 4020703328 get, 2165983218 put,
2327063251 len, 740443353 cached len, 26932 chain alloc,
26932 chain free

Connections:
540014320 total, 466588826 TCP, 71211150 UDP, 2203472 ICMP,
10872 other, 406247 anticipated, 1883050 recovered, 50372 concurrent,
124545 peak concurrent

Fragments:
92073 fragments, 44095 packets, 69 expired, 0 short,
0 large, 0 duplicates, 0 failures

NAT:
1779716794/0 forw, -431240825/0 bckw, 1135361697 tcpudp,
17091927 icmp, 1548710183-419225040 alloc
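
For anyone who wants to track these numbers over time, here is a minimal Python sketch that pulls the headline counters out of saved fw ctl pstat text. It assumes the output has the same layout as the paste above; the function name and the excerpt variable are hypothetical, and the regular expressions have only been matched against this one sample.

```python
import re

# Excerpt copied from the fw ctl pstat output above.
PSTAT_SAMPLE = """\
System Capacity Summary:
Memory used: 17% (8250 MB out of 47763 MB) - below watermark
Concurrent Connections: 50372 (Unlimited)
Aggressive Aging is enabled, not active

Connections:
540014320 total, 466588826 TCP, 71211150 UDP, 2203472 ICMP,
10872 other, 406247 anticipated, 1883050 recovered, 50372 concurrent,
124545 peak concurrent
"""


def summarize_pstat(text: str) -> dict:
    """Extract a few headline counters from fw ctl pstat text."""
    summary = {}
    # "Concurrent Connections: <count> (<limit>)"
    m = re.search(r"Concurrent Connections:\s*(\d+)\s*\(([^)]+)\)", text)
    if m:
        summary["concurrent"] = int(m.group(1))
        summary["limit"] = m.group(2)
    # "<count> peak concurrent" in the Connections section
    m = re.search(r"(\d+)\s+peak concurrent", text)
    if m:
        summary["peak_concurrent"] = int(m.group(1))
    # "Memory used: <pct>%"
    m = re.search(r"Memory used:\s*(\d+)%", text)
    if m:
        summary["memory_used_pct"] = int(m.group(1))
    return summary


print(summarize_pstat(PSTAT_SAMPLE))
```

On the sample above this reports 50372 concurrent connections, a limit of "Unlimited", and a peak of 124545 concurrent connections.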

the_rock
Champion

This is one of the sk's I was thinking of... check the bottom for the R80.x mgmt server (or R81 if that's what you use for mgmt)

 

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

j_silva
Contributor

Hi the_rock

I've seen this sk, but my question is whether it's even necessary to adjust this parameter, since the gateway has no connection limit according to the fw ctl pstat output.
I have also seen lower-capacity gateways with no problems related to NAT or limits.

It's very strange to me.

the_rock
Champion

@_Val_ is right, I just noticed the same thing actually. How come the limit says 10000, but the peak is 280000??!! That makes no sense... if you edit the fw object in dashboard and go to Optimization in the left-side menu, what do you have set there for capacity optimization?

Andy

Timothy_Hall
Champion

Sorry to come in late on this thread, but the fwx_cache table is used solely to cache NAT rulebase hits and to avoid, as much as possible, a full NAT rulebase lookup, which can carry substantial overhead. If the fwx_cache table is full (or close to full), it should not cause NAT operations to break, other than causing higher CPU load on the workers. If the workers are already riding the edge as far as CPU goes, the additional load caused by suddenly having to do far more resource-intensive NAT rulebase lookups may be causing the behavior you describe. In fact, it is possible to completely disable the fwx_cache mechanism, and NAT still works just fine, although it consumes more CPU for NAT rule lookups. I uncovered most of this when researching the third edition of my book, an excerpt of which is included below, and it took me a while, as there was not a lot of documentation about this mechanism available at the time.

The 10K cache entries per worker is documented, but only in my book.  😀

Bottom line is I don't think the fwx_cache table getting full or close to full is directly causing your problem; it lies elsewhere.

natcache1.png natcache2.png natcache3.png
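
The point that a full cache costs CPU but does not break translation can be illustrated with a toy model in Python. This is only a sketch of the general idea of a bounded cache in front of a slow rule lookup; it is not how fwx_cache is implemented. The class, the prefix-matching rules, and the "stop caching when full" policy are all assumptions made for illustration.

```python
class ToyNatCache:
    """Toy model: a bounded cache in front of a slow NAT rulebase lookup."""

    def __init__(self, rules, capacity):
        self.rules = rules        # list of (source prefix, translated address)
        self.capacity = capacity  # maximum number of cached entries
        self.cache = {}
        self.full_lookups = 0     # counts the expensive rulebase scans

    def _rulebase_lookup(self, src):
        # Slow path: scan the rules top to bottom.
        self.full_lookups += 1
        for prefix, translated in self.rules:
            if src.startswith(prefix):
                return translated
        return src  # no rule matched, so the address is left untranslated

    def translate(self, src):
        # Fast path: reuse a cached rulebase hit.
        if src in self.cache:
            return self.cache[src]
        result = self._rulebase_lookup(src)
        # Assumed policy: once the cache is full, new results are not cached.
        if len(self.cache) < self.capacity:
            self.cache[src] = result
        return result


rules = [("10.1.", "203.0.113.1"), ("10.2.", "203.0.113.2")]
nat = ToyNatCache(rules, capacity=2)
sources = ["10.1.0.1", "10.1.0.2", "10.2.0.1"]
first = [nat.translate(s) for s in sources]
second = [nat.translate(s) for s in sources]
print(first == second, nat.full_lookups)
```

In this toy run, the third source never fits in the two-entry cache, so it triggers a full rule scan on every pass, yet both passes return the same translations. The only cost of the full cache is the extra lookup work.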

 

 

emmap
Employee

The old default used to be 25000 before it changed to automatic, so that parameter is a legacy setting that is not used when the automatic limit is enabled. You can safely ignore it if pstat is correctly showing automatic.

Henrik_Noerr1
Collaborator

This is still very much a thing on VSX.

j_silva
Contributor

Hi guys,

I want to share the solution to this case. The solution was to disable GNAT, as described in this sk - https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

After disabling GNAT, the connections are stable even at up to 100k connections.

Thanks for all the feedback!

the_rock
Champion

Thanks very much for sharing that...that sk is super useful!