Mobile Access Blade with CIFS segfault?

Dreyfuss · ‎2020-04-14

Hi there! We are facing a problem with the 80.30 version of our gw.

Messages from /var/log/messages:

CIFS VFS: Send error in SessSetup = -11

CIFS VFS: cifs_mount failed w/return code = -11

CIFS VFS: Server X.X.X.X has not responded in 120 seconds. Reconnecting...

cvpnd[3728]: segfault at 76c ip 00000000f6f1f080 sp 00000000ffbbed60 error 4 in libCvpnIsLogging.so[f6f11000+15000]

Note that:

1 - The server X.X.X.X responds to ICMP (ping).

2 - The command df -h hangs and takes a long time to respond. Sometimes it freezes the SSH connection to the GW.

3 - The command top freezes the terminal too.
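
A hanging df -h often points at a network mount that has stopped answering. One way to find out which mount point is stuck without freezing the shell is to probe each one in a child process with a deadline. This is only a minimal sketch: it assumes Python 3 is available wherever you run it (not confirmed for the gateway itself), and the function names and timeout are made up for illustration.

```python
import subprocess
import sys
import time


def cifs_mount_points(mounts_text):
    """Return mount points of type cifs from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "cifs":
            points.append(fields[1])
    return points


def responds_within(path, timeout_s=5.0):
    """True if os.statvfs(path) finishes in a child process before the deadline."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        code = child.poll()
        if code is not None:
            return code == 0
        time.sleep(0.1)
    # Do not wait() here: a child blocked inside the kernel may not exit promptly.
    child.kill()
    return False
```

To use it, read /proc/mounts, pass the text to cifs_mount_points(), and call responds_within() on each result. A mount point that returns False is a candidate for the one that makes df and top hang.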

When I make a manual switchover to the other security appliance, everything works fine.

Thanks in advance to everyone. Any help is welcome!

PhoneBoy · ‎2020-04-14

This is probably going to require a TAC case to troubleshoot.

Dreyfuss · ‎2020-04-14

To be honest, I already knew that would be the answer, but it never hurts to ask, right? Thank you so much!

Dreyfuss · ‎2020-06-25

Update on this case:
After running clusterXL_admin and cpstop on one of the two security gateways in the cluster, the CIFS VFS -11 error still appears in /var/log/messages, but access no longer stops (as it did when clustered) and the segfault error no longer appears either. The issue is probably in the cluster architecture. (80.30)
After 2 months, TAC remains silent on this case.

PhoneBoy · ‎2020-06-27

Do you have a precise set of steps that reproduce this issue?

Dreyfuss · ‎2020-06-29

Hi there!

1 - Enable ClusterXL (sync over a dedicated interface) - everything works.

2 - Define a File Sharing application in Mobile Access pointing to the file server (referenced by IP, not FQDN), one per group of users - 45 mount points in total.

3 - Grant each user group access only to its shared folder.

This causes the following CIFS VFS errors in /var/log/messages:

Jun 26 18:03:08 2020 XXXX kernel: CIFS VFS: cifs_mount failed w/return code = -112
Jun 26 18:08:41 2020 XXXX kernel: CIFS VFS: Send error in SessSetup = -11
Jun 26 18:08:41 2020 XXXX kernel: CIFS VFS: cifs_mount failed w/return code = -11
Jun 26 18:08:41 2020 XXXX kernel: CIFS VFS: Send error in SessSetup = -11
Jun 26 18:08:46 2020 XXXX kernel: CIFS VFS: Send error in SessSetup = -11
Jun 26 18:08:46 2020 XXXX kernel: CIFS VFS: cifs_mount failed w/return code = -112
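
The negative numbers in these kernel lines are Linux errno values. On Linux, 11 is EAGAIN ("Resource temporarily unavailable") and 112 is EHOSTDOWN ("Host is down"). The sketch below tallies such lines by message and code. The small lookup table is hard-coded with Linux numbering on purpose, and the function names are hypothetical.

```python
import re
from collections import Counter

# Linux errno numbering; other platforms assign different numbers.
LINUX_ERRNO = {
    11: ("EAGAIN", "Resource temporarily unavailable"),
    112: ("EHOSTDOWN", "Host is down"),
}

# Matches e.g. "CIFS VFS: Send error in SessSetup = -11"
CIFS_LINE = re.compile(r"CIFS VFS: (?P<what>.+?) = -(?P<code>\d+)\s*$")


def tally_cifs_errors(lines):
    """Count CIFS VFS error lines by (message, errno)."""
    counts = Counter()
    for line in lines:
        m = CIFS_LINE.search(line)
        if m:
            counts[(m.group("what"), int(m.group("code")))] += 1
    return counts


def describe(code):
    """Render an errno as '-11 EAGAIN (Resource temporarily unavailable)'."""
    name, text = LINUX_ERRNO.get(code, ("?", "not in this small table"))
    return f"-{code} {name} ({text})"
```

Feeding it the six lines above gives three SessSetup failures with -11, one cifs_mount failure with -11, and two cifs_mount failures with -112. Note that EAGAIN by itself only says the operation should be retried; it does not say the host is unreachable.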

After these errors, with the other Security Gateway (let's call it YYYY) running, the Mobile Access Portal stops working, but only in the File Sharing portion of the site. All other functions, such as SNX, Native Applications and Web Applications, run normally.

So I had to execute the following commands to recover:

clusterXL_admin down;cvpnstop;cvpnstart;clusterXL_admin up (cvpnrestart does not work at all, which is strange).

When I tried to solve this problem by taking YYYY out of the cluster with cpstop, to my surprise the errors remained, but File Sharing no longer gets stuck. I mean, the system freezes for 1 minute (or less), but recovers by itself.
In the cluster, recovery used to take 30 minutes or more. The segfault error also no longer appears.

When I Googled this error (CIFS -11), the results said it means the server is unreachable, but the server responds to pings normally while the issue is occurring.

This issue can occur at any time. It is completely random and happens even outside working hours.
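
A successful ping only shows that ICMP gets through. CIFS/SMB runs over TCP (normally port 445, or 139 for older NetBIOS sessions), so it may be worth testing that port directly while the issue is happening. A minimal sketch, assuming Python 3 on the machine doing the test; the function name and default timeout are illustrative only.

```python
import socket


def tcp_port_open(host, port=445, timeout_s=3.0):
    """True if a TCP connection to host:port is established within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and routing errors alike.
        return False
```

For example, tcp_port_open("X.X.X.X") with the file server's address in place of the placeholder. A True result still does not prove that SMB session setup works, only that the port accepts connections.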

Dreyfuss · ‎2020-06-30

Update
Messages from cvpnd.elg when the issue occurs:
[30 Jun 14:56:26] T_event_fdclr_epoll: failed to clear socket: 30 from epoll set: Bad file descriptor
[30 Jun 14:56:26] T_event_fdclr_epoll: failed to clear socket: 31 from epoll set: Bad file descriptor
[30 Jun 14:56:26] T_event_fdclr_epoll: failed to clear socket: 32 from epoll set: Bad file descriptor
[30 Jun 14:56:26] T_event_fdclr_epoll: failed to clear socket: 33 from epoll set: Bad file descriptor
[30 Jun 14:56:26] T_event_fdclr_epoll: failed to clear socket: 34 from epoll set: Bad file descriptor
[30 Jun 14:57:07] T_event_fdclr_epoll: failed to clear socket: 30 from epoll set: Bad file descriptor
[30 Jun 14:57:07] T_event_fdclr_epoll: failed to clear socket: 31 from epoll set: Bad file descriptor
[30 Jun 14:57:07] T_event_fdclr_epoll: failed to clear socket: 32 from epoll set: Bad file descriptor
[30 Jun 14:57:07] T_event_fdclr_epoll: failed to clear socket: 33 from epoll set: Bad file descriptor
[30 Jun 14:57:07] T_event_fdclr_epoll: failed to clear socket: 34 from epoll set: Bad file descriptor
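
In the excerpt above, the same five descriptors (30 to 34) fail in two bursts 41 seconds apart. To see whether that pattern holds across a longer cvpnd.elg, the lines can be grouped by timestamp. This is a rough sketch that assumes the line format is exactly as shown here; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches "[30 Jun 14:56:26] T_event_fdclr_epoll: failed to clear socket: 30 ..."
EPOLL_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] T_event_fdclr_epoll: "
    r"failed to clear socket: (?P<fd>\d+) "
)


def group_by_timestamp(lines):
    """Map each timestamp to the list of socket numbers that failed at that time."""
    bursts = defaultdict(list)
    for line in lines:
        m = EPOLL_LINE.match(line.strip())
        if m:
            bursts[m.group("ts")].append(int(m.group("fd")))
    return dict(bursts)
```

If every burst lists the same descriptors, that is a useful detail to include in the TAC case.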

PhoneBoy · ‎2020-06-30

PM me the TAC SR.

Dreyfuss · ‎2020-07-01

Hi! I already sent it to you via PM. Thanks again!

Dreyfuss · ‎2021-03-23

Hi!

I created another topic with the solution:

https://community.checkpoint.com/t5/Remote-Access-VPN/Mobile-access-blade-with-segfault/td-p/114354
