Re: Segmentation fault - Page 2

Vincent_Bacher · ‎2018-01-29

Hello mates,

After upgrading several gateways to R80.10 T56, a customer always gets a segmentation fault when performing the "show configuration" command in clish.

I know that there are several SKs regarding R77, but none for R80.10.

Anybody who knows anything?

Best regards

Vincent

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite

William_Tavares · ‎2018-05-08

Hi Hugo

Yes. I'm facing that same problem with 5200 and 5400 appliances.

It really seems to be a clue...

KennyManrique · ‎2018-05-08

5200 Appliance here with IPS and SNMP Active.

I also have other customers with 5XXX Appliances running IPS or SNMP (not both at the same time), and they do not have this issue.

Regards.

William_Tavares · ‎2018-05-08

Hi Kenny

I also downgraded from take 91 to take 70 (with fresh install) without success.

So last week I did a fresh install and upgraded the gateway to take 103, as suggested by the Brazilian SEs, and the problem seems to be solved.

We have been monitoring this node without any crash for 5 days.

We are planning to upgrade another cluster member this Thursday.

Regards

KennyManrique · ‎2018-05-08

Thanks for your answer William,

Today TAC also suggested installing take 103. The problem with the crash is that you can't find the root cause: the crash is very random and unexpected. Sometimes I had the problem even two or three weeks after installing a patch or after following a certain workaround.
Even with kernel parameters enabled to generate vmcore files, there are no traces pointing to the cause.

Let's see how it goes with HFA103.

Regards.

Maarten_Sjouw · ‎2018-05-09

This is a long shot, BUT when you have AV/AB and APCL/URLF turned on, both use scheduled updates on the gateway, and both are set to the same interval of 2 hours. The schedule is set in the TP policy settings (bottom left, Updates) and in the application policy (bottom left, Updates). It won't hurt to change them, and it might just relieve you from this mess...

We experienced SMB devices completely freeze on doing these updates at the same time and since we changed the settings to 1:50 for AV and 2:17 for APCL updates interval it looks like the issue is gone.
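To put numbers on the staggering idea, here is a minimal sketch of how often two periodic jobs land on the same minute. It assumes both schedules start at the same moment and fire exactly on their interval, which may not match how the gateway really schedules updates; the function name is made up for illustration.

```python
from math import gcd


def coincidence_period(interval_a_min: int, interval_b_min: int) -> int:
    """Minutes between moments when both periodic jobs fire together.

    This is the least common multiple of the two intervals, assuming both
    jobs start together and never drift (an idealisation).
    """
    return interval_a_min * interval_b_min // gcd(interval_a_min, interval_b_min)


same = coincidence_period(120, 120)       # both updates every 2:00
staggered = coincidence_period(110, 137)  # 1:50 and 2:17

print(f"2:00 / 2:00 -> both run together every {same} minutes")
print(f"1:50 / 2:17 -> both run together every {staggered} minutes "
      f"(about {staggered / 1440:.1f} days)")
```

With identical 2-hour intervals every single run overlaps; with 1:50 and 2:17 an exact overlap only comes around roughly every ten days under these assumptions.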

Regards, Maarten

KennyManrique · ‎2018-05-09

I will try the recommendation, Maarten. However, the update times are the same as in R77.30, where the device works perfectly.

Regarding SMB devices, they have inferior hardware compared to 5XXX Appliances, and the newest versions of Gaia Embedded seem to consume far more resources, especially at update time and after modifying policy.

Regards.

Maarten_Sjouw · ‎2018-05-09

I tend to agree on the hardware differences, but as I already posted earlier today, running 2 update schedules like these at exactly the same time interval is just asking for problems.

Besides that, the problems we have seen only started a week ago on the SMB devices, and there were 10 of them (1450 and 1490), located all over the world for 2 different customers, that were all of a sudden showing issues. And I don't like coincidences, nor do I believe in them.

Regards, Maarten

KennyManrique · ‎2018-05-22

William Tavares, do you have any news with HFA103 installed? Does it work OK for you?

Regards.

William_Tavares · ‎2018-05-22

Hi Kenny

Yes, I have.

All gateways have been working perfectly with take 103 since May 8th.

Go ahead!

But... you'll never find the root cause... unfortunately...

KennyManrique · ‎2018-05-22

Thanks for the fast answer!

I will update this weekend to HFA103.

Let us know if you have a Segmentation Fault again.

Regards.

VENKAT_S_P · ‎2018-05-23

Thanks for the update, heard CP included many fixes in take 103.

Could you tell us which models you applied it to?

William_Tavares · ‎2018-05-23

5400 and 5600

VENKAT_S_P · ‎2018-05-23

Thank you.

Hugo_Frauches · ‎2018-06-04

Hello,

Since the update to take 103 there has been no problem with segmentation faults on the gateways (5200 Appliance). I recommend the update to everyone!

Hugo_Frauches · ‎2018-06-07

Hello guys, I think everyone here has already updated to take 103. I have doubts about a decrease in performance with this new take; has anyone noticed anything related to this?

I have a 5200 cluster. The segmentation fault issue is gone, but I think the CPU and memory usage went higher!

Hugo_Frauches · ‎2018-07-23

Hello Guys,

Since I updated our 5200 cluster to Jumbo Hotfix Take 112, we have had the segmentation fault problem again. I would like to know if anyone is still having this issue.

KennyManrique · ‎2018-07-23

Hi Hugo,

After 45 days stable with HFA103 I had a segmentation fault again a week ago. I updated to HFA112 and apparently the issue has shown up again. I will validate this and let you know any news.

Were you able to get any kernel mode dump for CP's analysis?

Regards

Hugo_Frauches · ‎2018-07-24

Yes, we have sent a lot of usermode dumps to Check Point support. After analysis they provided a portfix (again) to try to solve this issue. I installed it on both gateways in the cluster, but after 2 days the segmentation fault error occurred again. Support hasn't updated the case yet. I will update this thread and give more information when they reply. It's sad to know that this issue is coming back, and for other CP users too.

KennyManrique · ‎2018-07-24

Indeed, we had a segmentation fault again with HFA112. We are waiting for TAC's updates.

The strangest part for me is that the kernel parameters requested by CP (bt_on_stk and panic_on_stk) were enabled after installing HFA112 and should have generated a kernel mode dump, but I didn't get one. I have had a lot of usermode dumps, but none of them has been relevant to the case so far.

Regards.

JozkoMrkvicka · ‎2018-07-24

Hi there,

Just faced exactly the same issue.

Background in my case:

Upgraded 12600 node from R77.30 to R80.10 (via installer upgrade)

Installed THE LATEST take 112.

After the reboot all was fine, until I tried to execute the command:

show users

After that, only a few users were displayed and I was disconnected from ssh session.

/var/log/messages says something like "segmentation fault".

I will also try to test "show configuration" and check whether any core or kernel dumps are created.
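For anyone who wants to check the log the same way, here is a rough sketch that pulls every line mentioning a segfault out of a log file. The search pattern is an assumption about how the message is worded, and it presumes a Python interpreter is available where the log is read (for example after copying the file off the gateway); the function name is hypothetical.

```python
import re
from pathlib import Path

# Assumed wording: kernel-style "segfault" or the shell-style "Segmentation fault".
SEGFAULT_RE = re.compile(r"segfault|segmentation fault", re.IGNORECASE)


def find_segfault_lines(log_path):
    """Return (line_number, text) for each log line that mentions a segfault."""
    hits = []
    try:
        with Path(log_path).open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if SEGFAULT_RE.search(line):
                    hits.append((lineno, line.rstrip("\n")))
    except OSError:
        # Missing or unreadable log: report nothing rather than crash.
        return []
    return hits


if __name__ == "__main__":
    for lineno, text in find_segfault_lines("/var/log/messages"):
        print(f"{lineno}: {text}")
```

The timestamps on the matching lines can then be compared with the time the clish command was run.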

Kind regards,
Jozko Mrkvicka

Hugo_Frauches · ‎2018-07-24

Hello Jozko,

In my case, and I think for other users too, when this issue happens on the gateway it is not only the "show configuration" command that gives the "Segmentation fault" error. I have a list of clish commands that I cannot run while the gateway has the segmentation fault problem:

show configuration

cphaprob stat

fw stat

cpview

clusterXL_admin (up|down)

cpstat fw

Because of that, the only way I can "solve" the issue is to force a reboot of the gateway...
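A quick way to see which of these commands actually die with a segmentation fault is to run them and check how each process ended. This is only a sketch under assumptions: it presumes Python is available in expert mode, it only covers the non-interactive commands from the list above, and it presumes the crash is reported by the command's own process rather than hidden by a wrapper script. The helper name is hypothetical.

```python
import signal
import subprocess


def crashed_with_segfault(argv, timeout=30):
    """Run argv and report whether the process was killed by SIGSEGV."""
    try:
        result = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command not found or hung: not counted as a segfault here.
        return False
    # On POSIX a negative return code means "terminated by signal -returncode".
    return result.returncode == -signal.SIGSEGV


# Non-interactive commands taken from the list in this post.
COMMANDS = [["cphaprob", "stat"], ["fw", "stat"], ["cpstat", "fw"]]

if __name__ == "__main__":
    for argv in COMMANDS:
        status = "SEGFAULT" if crashed_with_segfault(argv) else "ok or other failure"
        print(" ".join(argv), "->", status)
```

Running this on a schedule and logging the result could help pin down when the gateway enters the bad state, which might be useful information for TAC.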

JozkoMrkvicka · ‎2018-07-24

Hi Hugo,

Well, I tried to execute all of your commands, except "show configuration" and failover (because only 1 node is on R80.10). No issue at all.

After upgrade I was executing tons of commands to see if we have some new info added and at some point I tried "show users" and boom, I got disconnected from SSH session like I described above.

I can try everything, because my guy is not in production.

Maybe I will open the case...

Kind regards,
Jozko Mrkvicka

JozkoMrkvicka · ‎2018-07-25

update:

Today I checked it again using the same command, "show users", and I wasn't disconnected from SSH. I just got a segmentation fault.

No core dump. The kernel dump wasn't checked.

Tried "show configuration" and "clusterXL_admin up" without any issue.

Didn't have time to check it more...

Kind regards,
Jozko Mrkvicka

Bishal_Upadhyay · ‎2019-04-21

Hi Jozko,
Is there any solution to this "Segmentation fault" error?
We also got the same error on a 5100 appliance running R80.10 Gaia OS.
With Regards,
Bishal

Timothy_Hall · ‎2019-04-21

A segmentation fault is just a process crashing which most of the time is due to a bug, and sometimes a resource limitation. Take a look in /var/log/dump/usermode, by default there will be the two most recent core dumps sitting there for each process. Generally you just need to upload these to TAC for analysis, but if you want to see the crash traceback and what function the crashed process was in the middle of when it crashed see here:

sk77080: How to use GDB to get function stack from User Mode process

Sometimes the traceback will provide actionable hints about what function the crashed process was trying to accomplish at the moment of death and may point you to the root cause of the issue; in some cases it may even allow you to determine a workaround.
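Before uploading anything, it can help to see which dumps exist and when they were written. The following is a minimal sketch that lists the files in the dump directory mentioned above, newest first. It assumes Python is available where the directory can be read, makes no assumption about the dump file names, and the function name is made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def list_core_dumps(dump_dir="/var/log/dump/usermode"):
    """Return (modified_time, size_bytes, name) for files in dump_dir, newest first."""
    entries = []
    try:
        for path in Path(dump_dir).iterdir():
            if path.is_file():
                info = path.stat()
                entries.append(
                    (datetime.fromtimestamp(info.st_mtime), info.st_size, path.name)
                )
    except OSError:
        # Directory missing or unreadable: nothing to report.
        return []
    return sorted(entries, reverse=True)


if __name__ == "__main__":
    for modified, size, name in list_core_dumps():
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {size:>12}  {name}")
```

Matching the modification times against the moments the segmentation fault appeared shows which process dumps belong to the incident.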

Bishal_Upadhyay · ‎2019-04-21

Hi Timothy,
Thanks for the reply.
TAC told us to perform a fresh installation of the appliance and then restore the backup (the backup should be from before the problem started to occur).
We are still looking for an alternative solution, though, so we might try the crash traceback.

Bishal_Upadhyay · ‎2019-05-05

Finally, we did the fresh installation of the problematic appliance and restored the backup. It has been more than 2 weeks and the problem seems to be solved.
