Create a Post
cancel
Showing results for 
Search instead for 
Did you mean: 
Christian_Riede
Collaborator

pdpd logging off all agent users

In pepd.eld, we see the messages below. Shortly after, all users using an identity agent are logged off and need to reauthenticate. Until now, no answer from Checkpoint Support about that.

Has anyone seen those messages?

Did you find a reason for that behavior? Or even a fix?

18001 4105729808]@***********[4 Feb  8:51:10] [TRACKER]: #625961 -> OUTGOING -> IDENTITY_UPDATE -> pep (v4): 127.0.0.1pep (v6): , identity: UpdateInformation dump:
[18001 4105729808]@***********[4 Feb  8:51:12] [TRACKER]: #625962 -> OUTGOING -> IDENTITY_REVOKE -> to pep: 127.0.0.1 (ipv4);  (ipv6), RevokeInformation dump:
[18001 4105729808]@***********[4 Feb  8:51:15] [TRACKER]: #625963 -> OUTGOING -> IDENTITY_REVOKE -> to pep: 127.0.0.1 (ipv4);  (ipv6), RevokeInformation dump:
[18001]@***********[4 Feb  8:51:15] Warning:cp_timed_blocker_handler: A handler [0xf7cb6f10] blocked for 70 seconds.
[18001]@***********[4 Feb  8:51:15] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPsuite-R80/fw1/lib/libpdplib.so], Function offset [0x4e9f10].
[18001]@***********[4 Feb  8:51:15] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN3pdp23TimersExpirationManager26timersExpirationCheckEventEPv], offset [0x4e9f10].
[18001]@***********[4 Feb  8:51:42] Warning:cp_timed_blocker_handler: A handler [0xf7d442c0] blocked for 26 seconds.
[18001]@***********[4 Feb  8:51:42] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPsuite-R80/fw1/lib/libpdplib.so], Function offset [0x5772c0].
[18001]@***********[4 Feb  8:51:42] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN3pdp13StatusManager17updateStatusThunkEPv], offset [0x5772c0].
[18001 4105729808]@***********[4 Feb  8:51:43] [TRACKER]: #625964 -> INTERNAL -> DISCONNECTED_FROM_PEP -> 127.0.0.1
[18001]@***********[4 Feb  8:55:32] Warning:cp_timed_blocker_handler: A handler [0xf7cb6f10] blocked for 228 seconds.
[18001]@***********[4 Feb  8:55:32] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPsuite-R80/fw1/lib/libpdplib.so], Function offset [0x4e9f10].
[18001]@***********[4 Feb  8:55:32] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN3pdp23TimersExpirationManager26timersExpirationCheckEventEPv], offset [0x4e9f10].
[18001]@***********[4 Feb  8:56:13] Warning:cp_timed_blocker_handler: A handler [0xf7d442c0] blocked for 41 seconds.
[18001]@***********[4 Feb  8:56:13] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPsuite-R80/fw1/lib/libpdplib.so], Function offset [0x5772c0].
[18001]@***********[4 Feb  8:56:13] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN3pdp13StatusManager17updateStatusThunkEPv], offset [0x5772c0].
[18001 4105729808]@***********[4 Feb  8:56:16] [TRACKER]: #625965 -> INCOMING -> AGENT_REQUEST -> ip: 192.168.117.209, type: machine_kerb_auth
[18001]@***********[4 Feb  8:56:42] Warning:cp_timed_blocker_handler: A handler [0xf537e830] blocked for 26 seconds.
[18001]@***********[4 Feb  8:56:42] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80/lib/libhttpsvc.so], Function offset [0x10830].
[18001]@***********[4 Feb  9:27:06] Warning:cp_timed_blocker_handler: A handler [0xf7cb6f10] blocked for 1824 seconds.
[18001]@***********[4 Feb  9:27:06] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPsuite-R80/fw1/lib/libpdplib.so], Function offset [0x4e9f10].
[18001]@***********[4 Feb  9:27:06] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN3pdp23TimersExpirationManager26timersExpirationCheckEventEPv], offset [0x4e9f10].
[18001 4105729808]@***********[4 Feb  9:27:14] [TRACKER]: #625966 -> INTERNAL -> CONNECTED_TO_PEP -> 127.0.0.1
[18001 4105729808]@***********[4 Feb  9:27:14] [TRACKER]: #625967 -> INTERNAL -> TASK_STARTED -> Starting task 2: PDP to PEP identities sync (127.0.0.1)
[18001]@***********[4 Feb  9:27:32] Warning:cp_timed_blocker_handler: A handler [0xf6c1b330] blocked for 17 seconds.
[18001]@***********[4 Feb  9:27:32] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPsuite-R80/fw1/lib/libnac_server_shared_utils.so], Function offset [0x1c330].
[18001]@***********[4 Feb  9:27:32] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN17NAC_SERVER_MODULE12TasksManager17T_startNextChunckEPv], offset [0x1c330].
[18001 4105729808]@***********[4 Feb  9:27:34] [TRACKER]: #625968 -> OUTGOING -> IDENTITY_REVOKE -> to pep: 127.0.0.1 (ipv4);  (ipv6), RevokeInformation dump:
[18001 4105729808]@***********[4 Feb  9:27:35] [TRACKER]: #625969 -> OUTGOING -> IDENTITY_UPDATE -> pep (v4): 127.0.0.1pep (v6): , identity: UpdateInformation dump:
[18001 4105729808]@***********[4 Feb  9:27:35] [TRACKER]: #625970 -> OUTGOING -> IDENTITY_UPDATE -> pep (v4): 127.0.0.1pep (v6): , identity: UpdateInformation dump:
[18001 4105729808]@***********[4 Feb  9:27:36] [TRACKER]: #625971 -> OUTGOING -> IDENTITY_UPDATE -> pep (v4): 127.0.0.1pep (v6): , identity: UpdateInformation dump:
[18001 4105729808]@***********[4 Feb  9:27:36] [TRACKER]: #625972 -> OUTGOING -> IDENTITY_UPDATE -> pep (v4): 127.0.0.1pep (v6): , identity: UpdateInformation dump:
[18001 4105729808]@***********[4 Feb  9:27:36] [TRACKER]: #625973 -> OUTGOING -> IDENTITY_UPDATE -> pep (v4): 127.0.0.1pep (v6): , identity: UpdateInformation dump:

14 Replies
Kaspars_Zibarts
Employee Employee
Employee

You would need to provide a bit more info about your setup - SW versions to start with, then how's your PDP and PEP and IDC/ADQA is configured? Are they on the same box or distributed? We can see from the log that PEP go disconnected.

0 Kudos
Christian_Riede
Collaborator

VSX Cluster, VLSLS mode. happening in one of the VSes with about 2000 users. R80.10 Jumbo 142 with additional private hotfixes. Users authenticate locally on the box, no identity sharing. We use Captive Portal with transparent Kerberos authentication and agent authentication. No AD query, no other authentication methods. Large scale guide (sk88520) has been followed. Setup was done by Checkpoint PS. Similar error on an old R77.30 Jumbo 338 ClusterXL. Here we do not see the mentioned error messages in the log, but we see a restart of the pdpd with the same result of all users being logged off when the problem starts.

0 Kudos
Kaspars_Zibarts
Employee Employee
Employee

Anything interesting in pdpd.elg at the same time?

We are on the same hotfix 142 / VSX with 30000 users and IDC. Seems fairly stable

I would create a case with CP, seems odd that PDP would restart on it's own, most likely a bug. Royi Priov‌ - what do you think Royi? (soory to drag you in again..) Smiley Happy

0 Kudos
Royi_Priov
Employee
Employee

Hi Christian Riede‌!

I'm not sure if the above output from pepd.elg is a result of a grep or it's the full flow, but here is my understanding:

  • Once pdp is disconnected from pep, after grace period all identities associated by this PDP (in your case, since there is no sharing) will be revoked.
  • It means that we need to focus on the reason for this line:

[18001 4105729808]@***********[4 Feb 8:51:43] [TRACKER]: #625964 -> INTERNAL -> DISCONNECTED_FROM_PEP -> 127.0.0.1

and we can see that only 30 min afterward the connection was established (no other lines before that in full debug?)

[18001 4105729808]@***********[4 Feb 9:27:14] [TRACKER]: #625966 -> INTERNAL -> CONNECTED_TO_PEP -> 127.0.0.1

can you check on pdpd.elg during this period of time (8:51 - 9:27) what happened?

was there a pdp crash? (/var/log/dump/usermode)

by the way - what is the SR number?

Thanks,

Royi Priov - Identity Awareness R&D.

Thanks,
Royi Priov
Group manager, Identity Awareness R&D
0 Kudos
Christian_Riede
Collaborator

S/R is 6-0000621394. Running since August last year.  We have had several Remote Sessions with Checkpoint. There is no pdpd crashdump in /var/log/dump/usermode - that was the first thing I checked when I saw a restart of the pdpd. pdpd.elg files have been uploaded numerous times with "cpinfo -f"

Here a graph of the number of users and hosts logged in via agent (pdp mon all | grep "Client Type: Identity Agent"|wc -l)

0 Kudos
Royi_Priov
Employee
Employee

I have contacted TAC escalation points to get more information.

Thanks for bringing this to my attention.

Thanks,

Royi Priov.

Thanks,
Royi Priov
Group manager, Identity Awareness R&D
Christian_Riede
Collaborator

Now Checkpoint is asking for an export of the smartcenter to replicate. Not that I did upload that with cpinfo last week in case anyone would need it... But files uploaded with cpinfo go to nirvana anyway most of the time.

0 Kudos
Royi_Priov
Employee
Employee

Hi Christian,

I had another email thread with TAC regarding this incident, and I gave them my direction for investigation. I will ask updates from them, but you can also welcome to update me with the progress.

Anyhow, I would like to suggest a different way of monitoring the agents, let me know if it works for you:

You are currently running a query on our database, which fetches all the sessions and going one by one and prints it (pdp monitor). On this output, you are filtering the agents' sessions.

I can suggest different approaches on R80.10 and north:

  1. SNMP query: 1.3.6.1.4.1.2620.1.38.9 will give you the result of all sessions (users + machines) for identity agent (see all SNMP OIDs of IDA in $FWDIR/conf/identity_server.cps)
  2. output of "cpstat identityServer -f logins", see example from my lab (no sessions were associated):

    [Expert@hugo1-take-421:0]# cpstat identityServer -f logins

    CaptivePortal: 0
    AD Query: 0
    Agent: 0
    Terminal Server: 0
    RADIUS Accounting: 0
    Remote Access: 0
    Identity Collector [Active Directory]: 0
    Identity Collector [Cisco ISE]: 0
    Identity Awareness API: 0
    Users With Anti-Spoofing Protection: 0
    Successful User Logins [Kerberos]: 0
    Successful Machine Logins [Kerberos]: 0
    Successful User Logins [User name and password]: 0
    Successful User Logins [AD Query]: 0
    Successful Machine Logins [AD Query]: 0
    Unsuccessful User Logins [Kerberos]: 0
    Unsuccessful Machine Logins [Kerberos]: 0
    Unsuccessful User Logins [User name and password]: 0

  3. Directly from the kernel table which stores the session counters:

    [Expert@hugo1-take-421:0]# fw tab -t pdp_monitor_counters -u | grep "<00000000, 00000015"
    <00000000, 00000015; 00000000>

    where the bolded 00000015 is for users connected with agent, 00000016 will be for machines, and 00000017 will be for both. 

Thanks,

Royi Priov, Identity Awareness R&D.

Thanks,
Royi Priov
Group manager, Identity Awareness R&D
0 Kudos
Christian_Riede
Collaborator

that's nice - thank you!

For CheckMK, you can use this local check:

#!/bin/bash
source /opt/CPshared/5.0/tmp/.CPprofile.sh
source /etc/profile.d/vsenv.sh

if fw vsx stat -v 2>&1 | grep -q "VSX is not supported on this platform" ; then
        VSLIST="0"
else
        VSLIST=`fw vsx stat -v | grep Trust | grep -v '<Not Applicable>' | grep -v 'SIC Status' | awk -F\| '{print $1}'`
        VSLIST="0 $VSLIST"
fi

for i in $VSLIST ; do
        vsenv $i >&/dev/null
        cpstat identityServer -f logins | grep : | awk -F: -v vs=$i '{gsub(/[ \[\]]/,"_",$1) ; gsub(/ /,"",$2); print "0 CPIA_" $1 "_vs" vs " " $1 "=" $2 " " $2 " " $1}'
done

Royi_Priov
Employee
Employee

Christian Riede wrote:

that's nice - thank you!

with pleasure.

btw - I forgot to mention, that in this command:

# fw tab -t pdp_monitor_counters -u | grep "<00000000, 00000015"
<00000000, 00000015; 00000000>

the bolded part is the counter value  - which means that 0 users are connected.

this value is in hex base, so in a case of 50 users connected, you will see "32".

Royi.

Thanks,
Royi Priov
Group manager, Identity Awareness R&D
0 Kudos
Christian_Riede
Collaborator

Why do I see Agent logins when Agent authentication is off in the firewall object of vs?

Why do i see different numbers on the two nodes? Shouldn't  that be synced?

(vs is currently active on node 2)

vs on node 1:

CaptivePortal:                                     580
AD Query:                                          0
Agent:                                             805
Terminal Server:                                   0
RADIUS Accounting:                                 0
Remote Access:                                     0
Identity Collector [Active Directory]:             0
Identity Collector [Cisco ISE]:                    0
Identity Awareness API:                            0
Users With Anti-Spoofing Protection:               0
Successful User Logins [Kerberos]:                 0
Successful Machine Logins [Kerberos]:              0
Successful User Logins [User name and password]:   0
Successful User Logins [AD Query]:                 0
Successful Machine Logins [AD Query]:              0
Unsuccessful User Logins [Kerberos]:               0
Unsuccessful Machine Logins [Kerberos]:            0
Unsuccessful User Logins [User name and password]: 0

vs on node 2:

CaptivePortal:                                     4801
AD Query:                                          0
Agent:                                             21
Terminal Server:                                   0
RADIUS Accounting:                                 0
Remote Access:                                     0
Identity Collector [Active Directory]:             0
Identity Collector [Cisco ISE]:                    0
Identity Awareness API:                            0
Users With Anti-Spoofing Protection:               0
Successful User Logins [Kerberos]:                 159
Successful Machine Logins [Kerberos]:              0
Successful User Logins [User name and password]:   1
Successful User Logins [AD Query]:                 0
Successful Machine Logins [AD Query]:              0
Unsuccessful User Logins [Kerberos]:               24
Unsuccessful Machine Logins [Kerberos]:            0
Unsuccessful User Logins [User name and password]: 0

0 Kudos
Royi_Priov
Employee
Employee

Hi,

If I remember the design correctly, each gateway reports it's own logins in the output you have presented.

The data is summarized in the kernel table, so if you will execute the "fw tab" command I added earlier, it should be the same number on both members. This is also the number presented in cpview.

Let me know if this helps.

Royi.

Thanks,
Royi Priov
Group manager, Identity Awareness R&D
0 Kudos
Christian_Riede
Collaborator

Your firewall just got stuck. Access to nac portal times out. Some people report websites not reachable. We see the following - any ideas?

[Expert@fw-reh-vsx1:3]# vsenv 3
Context is set to Virtual Device fw-reh-vsx1_fw-reh-vssurf1 (ID 3).
[Expert@fw-reh-vsx1:3]# cpstat identityServer -f logins

Failed to establish session with AMON server at 127.0.0.1:18192

[Expert@fw-reh-vsx1:3]# vsenv 8
Context is set to Virtual Device fw-reh-vsx1_fw-reh-vssurf2 (ID 8).
[Expert@fw-reh-vsx1:8]# cpstat identityServer -f logins

CaptivePortal:                                     2435
AD Query:                                          0
Agent:                                             4090
Terminal Server:                                   0
RADIUS Accounting:                                 0
Remote Access:                                     0
Identity Collector [Active Directory]:             0
Identity Collector [Cisco ISE]:                    0
Identity Awareness API:                            0
Users With Anti-Spoofing Protection:               0
Successful User Logins [Kerberos]:                 0
Successful Machine Logins [Kerberos]:              0
Successful User Logins [User name and password]:   0
Successful User Logins [AD Query]:                 0
Successful Machine Logins [AD Query]:              0
Unsuccessful User Logins [Kerberos]:               0
Unsuccessful Machine Logins [Kerberos]:            0
Unsuccessful User Logins [User name and password]: 0

[Expert@fw-reh-vsx1:8]#

0 Kudos
Ezequiel
Explorer

Hi Royi,

 

Checking your post I saw all contained OIDs inside $FWDIR/conf/identity_server.cps file as per your comment. I would need to monitor with SNMP the  following OID:

 

TABLE-NAME, "Identity Collector Sources"

TABLE-OID, 53.1

COLUMN, "Events Recieved", 5, UINT32, 0, TRUE, FALSE

So I try to do a snmwalk on 1.3.6.1.4.1.2620.1.38.53.1.5 but I get No Such Instance currently exists at this OID

Any ideas why? How can I get per SNMP the events received count? Thanks!

0 Kudos

Leaderboard

Epsum factorial non deposit quid pro quo hic escorol.

Upcoming Events

    CheckMates Events