Bernardo_1979
Explorer

AWS standby gateway memory usage increases until it can't be accessed via SSH or the AWS console.

Hello CheckMates.

I have a cluster deployed in AWS and noticed that after some period of time the standby member (Member B) shows an error in SmartConsole and can't be accessed via SSH or via the AWS console.
In AWS the status check shows 2/3 checks passed. Rebooting via AWS does not work; we must stop the instance and start it again.
It takes around 24 hours at most for the issue to reappear.
I used the following command to check the memory usage.

"ps -eo size,pid,user,command --sort -size | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }' > /home/admin/memory.txt"


In the file I noticed a lot of “python3 /opt/CPsuite-R81.20/fw1/scripts/cloudwatch.py” processes, each using 5.3 MB. The last time I ran the command, the count jumped from 403 occurrences to 465 in approximately 45 minutes.
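For reference, a quick way to track how fast these processes accumulate is a generic shell one-liner like the one below (my own sketch, not from TAC; the [c] in the grep pattern keeps grep from counting its own process):

# print a timestamp and the current cloudwatch.py process count every minute
while true; do echo "$(date '+%F %T') $(ps -ef | grep -c '[c]loudwatch.py')"; sleep 60; done >> /home/admin/cloudwatch_count.txt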

Have you ever seen this kind of behavior?

Notes:
R81.20 with JHF Take 98.
Funny fact: before applying the JHF, the member with the issue was Member A; after the JHF the issue seems to have "migrated" to Member B.
I should also mention that there is an ongoing SR.

Regards

P.S.: Attached is a screenshot of the AWS status check and of the memory usage after connectivity had already been lost.

PhoneBoy
Admin

I assume the TAC case is related to this issue?

Bernardo_1979
Explorer

Yes, the case is related to this issue.
I already uploaded a CPINFO taken after a reboot, and a second one taken a few hours later with memory usage already high. I also sent the HCP output and the files created after following sk35496.
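In between captures, a simple snapshot loop can show the growth trend (my own sketch, not necessarily the exact sk35496 procedure):

# append a timestamped free-memory snapshot every 10 minutes, in the background
while true; do date >> /home/admin/mem_trend.txt; free -m >> /home/admin/mem_trend.txt; sleep 600; done &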

Chris_Atkinson
MVP Gold CHKP

Take 99 addresses some memory issues, but it's best to consult TAC to properly diagnose and validate that this is your issue.

CCSM R77/R80/ELITE
Bernardo_1979
Explorer

Sorry for the delay in posting an update.

After a long TAC case the solution arrived: the issue was the cloudwatch.py script, which kept spawning runs indefinitely, causing memory usage to increase until the firewall got "stuck".
Check Point sent a new cloudwatch.py script to replace the original one.
The steps:

1. Stop the Cloudwatch cron calls using:

/sbin/cloudwatch stop 

2. Reboot the member.

3. Back up the old cloudwatch.py (see the note after these steps) using:

mv $FWDIR/scripts/cloudwatch.py{,.bck}

4. Copy the new cloudwatch.py file to $FWDIR/scripts.

5. Start Cloudwatch again using:

/sbin/cloudwatch start
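A note on step 3: the brace expansion there is just shell shorthand, equivalent to writing out both paths in full:

# identical effect to the one-liner in step 3
mv $FWDIR/scripts/cloudwatch.py $FWDIR/scripts/cloudwatch.py.bck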

Voilà, problem solved! 😀
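To confirm the fix holds, the process count from the one-liner earlier in the thread should now stay flat instead of climbing (again a generic check, not something TAC specified):

# should return a small, stable number after the new script is in place
ps -ef | grep -c '[c]loudwatch.py'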
