Teddy_Brewski
Contributor

Saturation of concurrent connections

Hello,

VSX cluster running Check Point R80.40 (Take 154) on open servers (HP) with two VSs.

On a random basis, but always outside office hours and on weekends, we experience 1-2 minutes of intermittent access caused by spikes in concurrent connections.
CPU load goes to 100%, the concurrent connections limit is reached and the firewall starts dropping packets. It only lasts 2-3 minutes and can happen once or twice per week or per month. It looks like a DDoS, but it only lasts a few minutes.

I keep raising the concurrent connections value, initially from 25000 to 35000, and then to 55000, but it doesn't seem to help. We have plenty of RAM and potentially can go higher, but I'm not sure it's the right way.
I can't spot anything unusual from the SmartTracker logs during those minutes -- just regular port scans from various networks.

Any ideas/tips/hints would be greatly appreciated.

PS: Just to add that we've experienced the same with R77.30, so I don't think it's linked to the version.

PPS: The specs are: ProLiant DL360 Gen10 (Intel Xeon Gold 6144 3.50GHz (8 cores), 64GB RAM).

Thank you in advance.

40 Replies
the_rock
Legend
Sorin_Gogean
Advisor

Hello @Teddy_Brewski  and @the_rock ,

 

Find attached a document containing some explanations and the script files.

There are 3 files that need to be created on your system.

All the details are in the document. If you have questions, let me know.

 

Thank you,
PS: some clean-up is still required. I might do it this weekend and come back to publish it 😁

PS2: if you need to have a remote session, I'm in East Europe, so GMT+2 works

the_rock
Legend

Thanks @Sorin_Gogean 

Sorin_Gogean
Advisor

Hello @Teddy_Brewski  and @the_rock ,

 

Were you able to run it?

Any questions ?

 

Thank you and have a nice weekend,

the_rock
Legend

Hey @Sorin_Gogean ... sorry mate, just saw this post. I studied like a maniac for 2 days to pass the CCTE exam; it was a bit harder than I thought, but who cares, it's done now : - )

Anyway, I did not have a chance to do anything with this. Do you have a ready script I can run?

Cheers,

Andy

Sorin_Gogean
Advisor

hello @the_rock , no worries, take your time 🙂,

As for a ready-to-run script, the scripts are in the document; I'm not sure what else you need.

 

have a nice weekend,

the_rock
Legend

I may give it a go Monday. Still "recovering" from CCTE exam 😂😂

Teddy_Brewski
Contributor

Thank you very much @Sorin_Gogean!  I'm going to try it tomorrow.

Two quick questions:

- highConn_v6e.sh is not meant for cron, right? To fire it up for the first time, should it be executed as 'highConn_v6e.sh &'? How do you stop it?

- For VSX setups, must it be executed in the context of the VS? Can it be used in multiple VSs?

Sorin_Gogean
Advisor

Hello @Teddy_Brewski ,

 

You start the script with "/usr/bin/nohup /opt/CPsuite-R81/fw1/scripts/highConn_v6e.sh &".
The trailing "&" starts it in the background, and nohup (short for "no hang up") is a Linux command that keeps a process running even after you exit the shell or terminal.

To kill the process, run "ps axf | grep high" to list the processes that have "high" in the name, as shown below, and pick the PID of the script. Then run kill -9 <pid>, for example kill -9 18890, to kill that specific process (basic Linux knowledge).
[The "kill -9 <pid>" command sends SIGKILL (signal 9), which terminates the process immediately. It is a forceful way to stop a process: this signal cannot be handled (caught), ignored or blocked. Note that kill takes a PID; to target processes by name, use pkill or killall instead.]

[Expert@AXXA-FW02:0]# ps axf | grep high
15271 pts/2 S+ 0:00 \_ grep --color=auto high
18890 ? S 1:02 /bin/sh /opt/CPsuite-R81/fw1/scripts/highConn_v6e.sh
[Expert@AXXA-FW02:0]#
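
As a complement to the manual ps/kill steps above, here is a minimal sketch of start/stop helpers that record the PID in a file, so no grep is needed later. The function names (start_monitor, stop_monitor) and the PID file location are assumptions made for illustration and are not part of the attached scripts; the script path is the one quoted above. It tries a plain kill (SIGTERM) first and only falls back to kill -9 if the process is still alive after a few seconds.

```shell
#!/bin/bash
# Hypothetical helpers; SCRIPT is the path quoted in the post, PIDFILE is an assumed location.
SCRIPT="${SCRIPT:-/opt/CPsuite-R81/fw1/scripts/highConn_v6e.sh}"
PIDFILE="${PIDFILE:-/var/run/highConn.pid}"

start_monitor() {
    # nohup keeps the job alive after logout; & puts it in the background.
    nohup "$SCRIPT" >/dev/null 2>&1 &
    # $! is the PID of the most recent background job.
    echo $! > "$PIDFILE"
}

stop_monitor() {
    local pid i
    pid=$(cat "$PIDFILE" 2>/dev/null) || return 1
    [ -n "$pid" ] || return 1
    # Ask politely first (SIGTERM).
    kill "$pid" 2>/dev/null
    # Wait up to 5 seconds for the process to exit.
    for i in 1 2 3 4 5; do
        kill -0 "$pid" 2>/dev/null || break
        sleep 1
    done
    # Still alive: force it with SIGKILL, which cannot be caught or ignored.
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid"
    fi
    rm -f "$PIDFILE"
}
```

Usage would be start_monitor once from an expert-mode shell, and stop_monitor when the collection is no longer needed.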

 

I gave the cron entry only as an example. I still have issues with Gaia starting the script, and I need to dig a bit to see why it isn't triggered automatically. Maybe it's folder permissions, we'll see.

 

For VSX, I need to understand it a bit better. As far as I know, each virtual system is separate, so if you run "fw ctl pstat" you will see the results from that particular virtual system. If my understanding is correct, then yes, you can install and run the script in each virtual system, and it will collect reports per virtual system. [Sorry, we don't run VSX.]

 

Let me know if there are any other questions.

 

Thank you,

PS: Regarding the Gaia "Job Scheduler", can someone give a hint why a job with "sleep 2m && /usr/bin/nohup /opt/CPsuite-R81/fw1/scripts/highConn_v6c.sh &" does not start, although I can start it manually without issues?

Kaspars_Zibarts
Employee

It seems like you already have all the good advice: have a script analyze the connections table when the issue happens. I might have missed it in the thread, but have you looked at the top sources/destinations/ports in the log view for the time of the incident? Back in the day we found that our own internal scanners were too aggressive and filled up the connection tables every Thursday, hehe. But yeah, go with the script! That's the easiest.

Sorin_Gogean
Advisor

Hello @Kaspars_Zibarts ,

 

This is roughly the logic I've applied to the data grabbing/analysis and it's like you say/recommend 😊.

Every “CHECK_INTERVAL” seconds, the script checks the number of concurrent connections, which is read with “$FW ctl pstat | $GREP Concurrent | $AWK '{print $3}'”, and compares it against the variable “HCONNS”.
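
To illustrate the check described above, here is a minimal sketch, not the attached script itself. The variable names CHECK_INTERVAL and HCONNS come from the description; the default values, the function names and the log message are assumptions for illustration. It relies on the same assumption as the quoted command, namely that the third field of the "Concurrent" line in the "fw ctl pstat" output is the current connection count.

```shell
#!/bin/bash
# Assumed defaults; tune them to your own baseline.
HCONNS="${HCONNS:-50000}"
CHECK_INTERVAL="${CHECK_INTERVAL:-30}"

# Reads "fw ctl pstat" output on stdin and prints the third field of the
# first line containing "Concurrent" (same logic as grep | awk in the post).
concurrent_from_pstat() {
    awk '/Concurrent/ { print $3; exit }'
}

# Succeeds when the given count is above the HCONNS threshold.
is_high() {
    [ "$1" -gt "$HCONNS" ]
}

# Polling loop (hypothetical name); only runs when called explicitly.
monitor_loop() {
    local cur
    while true; do
        cur=$(fw ctl pstat | concurrent_from_pstat)
        if [ -n "$cur" ] && is_high "$cur"; then
            echo "$(date '+%F %T') high connections: $cur (threshold $HCONNS)"
        fi
        sleep "$CHECK_INTERVAL"
    done
}
```

Keeping the parsing in its own function makes it easy to test against a saved copy of the pstat output before letting the loop run on a production gateway.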

Additionally, we have a function that calculates the average high connections for the last hour, and we compare against that as well.
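
One way such an hourly average could be kept is sketched below, assuming each sample is appended to a text file as "epoch_seconds value". The file location and the function names are hypothetical, and the attached script may well do this differently.

```shell
#!/bin/bash
# Hypothetical sample store: one "epoch_seconds value" pair per line.
SAMPLES="${SAMPLES:-/var/tmp/highConn_samples.txt}"

# Append one sample: $1 = epoch seconds, $2 = connection count.
record_sample() {
    echo "$1 $2" >> "$SAMPLES"
}

# Print the integer average of all samples taken within the last 3600
# seconds relative to $1 (epoch seconds). Prints 0 when there are none.
hourly_average() {
    [ -f "$SAMPLES" ] || { echo 0; return 0; }
    awk -v now="$1" '
        now - $1 <= 3600 { sum += $2; n++ }
        END { if (n) printf "%d\n", sum / n; else print 0 }
    ' "$SAMPLES"
}
```

In a polling loop this would be called as record_sample "$(date +%s)" "$cur" followed by hourly_average "$(date +%s)", and the current value compared against the result.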

When a high-connection event is triggered, we collect all firewall connections in an “EXPFILE” file.

The content of the file is parsed and we report the top 5 sources ("Begin listing TOP X SRC connections"), followed by the detailed top 10 destinations for each of the first 3 sources ("Detailed listing TOP X DST connections for ${my_array[$j]} SRC on $EXPFILE").

We do the same for the top 5 destinations ("Begin listing TOP X DST connections").
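
The top-talker reporting described above can be sketched with standard text tools. This is an illustration only: it assumes the export file has already been reduced to one connection per line in the form "SRC DST" (whitespace-separated), which is a simplification and not the real format of the exported connections table. The function names are hypothetical.

```shell
#!/bin/bash
# Assumed input format: one connection per line, "SRC DST".

# Top N values of one column with their counts.
# $1 = field number (1 = source, 2 = destination), $2 = N, $3 = file
top_by_field() {
    awk -v f="$1" '{ print $f }' "$3" \
        | sort | uniq -c \
        | sort -k1,1nr -k2,2 \
        | head -n "$2" \
        | awk '{ print $1, $2 }'
}

# Top N destinations for a single source.
# $1 = source address, $2 = N, $3 = file
top_dst_for_src() {
    awk -v s="$1" '$1 == s { print $2 }' "$3" \
        | sort | uniq -c \
        | sort -k1,1nr -k2,2 \
        | head -n "$2" \
        | awk '{ print $1, $2 }'
}
```

For example, top_by_field 1 5 "$EXPFILE" would list the five busiest sources as "count address" lines, and top_dst_for_src could then be called for each of the first three.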

 

Based on this, we were able to see that some peaks of 1 million connections were due to DNS DDoS attacks we had from time to time.

 

Thank you and have a nice weekend,
