Arturxr
Contributor

VIP ping delay and active addresses of the MGMT interface

Hello, we've discovered high ping latency to the VIP address and to the active node's address on the MGMT interface of our Check Point cluster. This seems to be causing communication issues between the Identity Collector and Check Point: timeout errors occur periodically, and some events may not reach Check Point. Furthermore, the pdp monitor user command returns "daemon did not respond or not running!", while the pdp monitor ip command does return a result.
I also can't run the cpinfo -y all command (the output freezes), and the gateway periodically appears red in SmartConsole.

Could you tell me what could be causing the ping latency to the VIP address and to the active MGMT interface address?


Accepted Solutions
Arturxr
Contributor

It seems the problem has been resolved; we will keep monitoring it. We enabled CoreXL Dynamic Balancing, following https://sc1.checkpoint.com/documents/R81.10/WebAdminGuides/EN/CP_R81.10_PerformanceTuning_AdminGuide...

In short:

set dynamic-balancing state enable

reboot
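For reference, after the reboot the current state can be checked from Expert mode; this is only a sketch based on the same guide, so verify the command is available on your build:

dynamic_balancing -p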


38 Replies
the_rock
MVP Diamond

Hey @Arturxr 

When did this issue start? Also, for what it's worth, can you make sure the time is set correctly and synchronized? Sometimes that can definitely cause these problems; I've seen it a few times.
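For example, a quick check from Gaia clish on both cluster members could look like this (just a sketch; the exact command set may vary slightly by version):

show clock
show ntp servers
show ntp active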

Best,
Andy
"Have a great day and if its not, change it"
Arturxr
Contributor

I checked the time and everything is set correctly. The problem was noticed recently, when some users stopped being identified by their access roles; the ping delays were discovered by accident. Judging by the graphs in the monitoring system, the delays have always been there, usually throughout working hours when there is a lot of user traffic. After restarting the cluster, the situation seemed to worsen and the delays increased even more.

the_rock
MVP Diamond

Fair enough. Are you able to figure out based on the policy revisions if there were any changes that could have impacted this behavior?

Best,
Andy
"Have a great day and if its not, change it"
Arturxr
Contributor

We did not make any changes to the policy; the problem arose on its own.

the_rock
MVP Diamond

Do you see any drops if you run zdebug and grep for the specific IP?

fw ctl zdebug + drop | grep x.x.x.x (just replace x.x.x.x with affected ip address)

Best,
Andy
"Have a great day and if its not, change it"
Arturxr
Contributor

I tried looking at the Identity Collector address, as well as the address of the active node, but I didn't notice any drops.

Vincent_Bacher
MVP Silver

The error message ‘daemon did not respond or not running!’ for pdp commands usually appears when the system has a load spike or is generally overloaded.
Have you used top to check whether the CPU utilisation is OK or too high?
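For example, from Expert mode you could look at the overall picture first and then at the pdpd threads specifically; a sketch:

top
top -H -p $(pidof pdpd)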

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
Arturxr
Contributor

Yes, the strangest thing is that the load is not high and does not exceed 48% during the day, but there are problems as if the load were high.

Vincent_Bacher
MVP Silver

Nevertheless, it would be interesting to see what problem the pdpd daemon has, because normally the message is only displayed when there is a load problem.
To be on the safe side, I would analyse the daemon using perf.
First, collect data with

perf record -p $(pidof pdpd)

Then wait a while and cancel with Ctrl+C.
Then display the result with:

perf report
It may be that everything is OK, but it's better to be safe than sorry. As I said, “daemon did not respond or not running!” does not appear without reason.
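A time-bounded variant that avoids the manual Ctrl+C would be the following sketch (the 30-second window is just an example):

perf record -p $(pidof pdpd) -- sleep 30
perf report --stdio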

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
Arturxr
Contributor

I'll try to collect it tomorrow, since user traffic has dropped for now and Check Point seems to have recovered. Still, it seems to me that the root of the problem lies in packet delays on the mgmt interface.

the_rock
MVP Diamond

Can you try the command below when the issue is occurring? Just replace eth0 with the actual mgmt interface name:


[Expert@CP-GW:0]# ethtool -S eth0
NIC statistics:
Tx Queue#: 0
TSO pkts tx: 0
TSO bytes tx: 0
ucast pkts tx: 28236
ucast bytes tx: 7204990
mcast pkts tx: 0
mcast bytes tx: 0
bcast pkts tx: 0
bcast bytes tx: 0
pkts tx err: 0
pkts tx discard: 0
drv dropped tx total: 0
too many frags: 0
giant hdr: 0
hdr err: 0
tso: 0
ring full: 0
pkts linearized: 0
hdr cloned: 0
giant hdr: 0
Tx Queue#: 1
TSO pkts tx: 0
TSO bytes tx: 0
ucast pkts tx: 3385966
ucast bytes tx: 330360905
mcast pkts tx: 0
mcast bytes tx: 0
bcast pkts tx: 0
bcast bytes tx: 0
pkts tx err: 0
pkts tx discard: 0
drv dropped tx total: 0
too many frags: 0
giant hdr: 0
hdr err: 0
tso: 0
ring full: 0
pkts linearized: 0
hdr cloned: 0
giant hdr: 0
Tx Queue#: 2
TSO pkts tx: 0
TSO bytes tx: 0
ucast pkts tx: 63375
ucast bytes tx: 6192225
mcast pkts tx: 0
mcast bytes tx: 0
bcast pkts tx: 0
bcast bytes tx: 0
pkts tx err: 0
pkts tx discard: 0
drv dropped tx total: 0
too many frags: 0
giant hdr: 0
hdr err: 0
tso: 0
ring full: 0
pkts linearized: 0
hdr cloned: 0
giant hdr: 0
Tx Queue#: 3
TSO pkts tx: 0
TSO bytes tx: 0
ucast pkts tx: 3371959
ucast bytes tx: 291134280
mcast pkts tx: 0
mcast bytes tx: 0
bcast pkts tx: 2
bcast bytes tx: 84
pkts tx err: 0
pkts tx discard: 0
drv dropped tx total: 0
too many frags: 0
giant hdr: 0
hdr err: 0
tso: 0
ring full: 0
pkts linearized: 0
hdr cloned: 0
giant hdr: 0
Tx Queue#: 4
TSO pkts tx: 0
TSO bytes tx: 0
ucast pkts tx: 57490
ucast bytes tx: 6673403
mcast pkts tx: 0
mcast bytes tx: 0
bcast pkts tx: 0
bcast bytes tx: 0
pkts tx err: 0
pkts tx discard: 0
drv dropped tx total: 0
too many frags: 0
giant hdr: 0
hdr err: 0
tso: 0
ring full: 0
pkts linearized: 0
hdr cloned: 0
giant hdr: 0
Tx Queue#: 5
TSO pkts tx: 0
TSO bytes tx: 0
ucast pkts tx: 3357203
ucast bytes tx: 289774189
mcast pkts tx: 0
mcast bytes tx: 0
bcast pkts tx: 0
bcast bytes tx: 0
pkts tx err: 0
pkts tx discard: 0
drv dropped tx total: 0
too many frags: 0
giant hdr: 0
hdr err: 0
tso: 0
ring full: 0
pkts linearized: 0
hdr cloned: 0
giant hdr: 0
Tx Queue#: 6
TSO pkts tx: 0
TSO bytes tx: 0
ucast pkts tx: 23619
ucast bytes tx: 5230029
mcast pkts tx: 0
mcast bytes tx: 0
bcast pkts tx: 0
bcast bytes tx: 0
pkts tx err: 0
pkts tx discard: 0
drv dropped tx total: 0
too many frags: 0
giant hdr: 0
hdr err: 0
tso: 0
ring full: 0
pkts linearized: 0
hdr cloned: 0
giant hdr: 0
Tx Queue#: 7
TSO pkts tx: 0
TSO bytes tx: 0
ucast pkts tx: 51158
ucast bytes tx: 7085314
mcast pkts tx: 0
mcast bytes tx: 0
bcast pkts tx: 0
bcast bytes tx: 0
pkts tx err: 0
pkts tx discard: 0
drv dropped tx total: 0
too many frags: 0
giant hdr: 0
hdr err: 0
tso: 0
ring full: 0
pkts linearized: 0
hdr cloned: 0
giant hdr: 0
Rx Queue#: 0
LRO pkts rx: 370577
LRO byte rx: 570807857
ucast pkts rx: 10543134
ucast bytes rx: 2463264301
mcast pkts rx: 0
mcast bytes rx: 0
bcast pkts rx: 212155
bcast bytes rx: 21748729
pkts rx OOB: 0
pkts rx err: 0
drv dropped rx total: 0
err: 0
fcs: 0
rx buf alloc fail: 0
Rx Queue#: 1
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
mcast pkts rx: 0
mcast bytes rx: 0
bcast pkts rx: 0
bcast bytes rx: 0
pkts rx OOB: 0
pkts rx err: 0
drv dropped rx total: 0
err: 0
fcs: 0
rx buf alloc fail: 0
Rx Queue#: 2
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
mcast pkts rx: 0
mcast bytes rx: 0
bcast pkts rx: 0
bcast bytes rx: 0
pkts rx OOB: 0
pkts rx err: 0
drv dropped rx total: 0
err: 0
fcs: 0
rx buf alloc fail: 0
Rx Queue#: 3
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
mcast pkts rx: 0
mcast bytes rx: 0
bcast pkts rx: 0
bcast bytes rx: 0
pkts rx OOB: 0
pkts rx err: 0
drv dropped rx total: 0
err: 0
fcs: 0
rx buf alloc fail: 0
Rx Queue#: 4
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
mcast pkts rx: 0
mcast bytes rx: 0
bcast pkts rx: 0
bcast bytes rx: 0
pkts rx OOB: 0
pkts rx err: 0
drv dropped rx total: 0
err: 0
fcs: 0
rx buf alloc fail: 0
Rx Queue#: 5
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
mcast pkts rx: 0
mcast bytes rx: 0
bcast pkts rx: 0
bcast bytes rx: 0
pkts rx OOB: 0
pkts rx err: 0
drv dropped rx total: 0
err: 0
fcs: 0
rx buf alloc fail: 0
Rx Queue#: 6
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
mcast pkts rx: 0
mcast bytes rx: 0
bcast pkts rx: 0
bcast bytes rx: 0
pkts rx OOB: 0
pkts rx err: 0
drv dropped rx total: 0
err: 0
fcs: 0
rx buf alloc fail: 0
Rx Queue#: 7
LRO pkts rx: 0
LRO byte rx: 0
ucast pkts rx: 0
ucast bytes rx: 0
mcast pkts rx: 0
mcast bytes rx: 0
bcast pkts rx: 0
bcast bytes rx: 0
pkts rx OOB: 0
pkts rx err: 0
drv dropped rx total: 0
err: 0
fcs: 0
rx buf alloc fail: 0
tx timeout count: 0
[Expert@CP-GW:0]#

Best,
Andy
"Have a great day and if its not, change it"
Vincent_Bacher
MVP Silver

A shorter and easier-to-read alternative would be netstat -ni, or watch netstat -ni to monitor whether errors are increasing and how fast.

netstat -ni
Kernel Interface table
Iface   MTU    Met  RX-OK      RX-ERR  RX-DRP  RX-OVR  TX-OK      TX-ERR  TX-DRP  TX-OVR  Flg
eth0    1500   0    56492512   0       1719    0       54322958   0       0       0       BMRU
lo      65536  0    81189170   0       0       0       81189170   0       0       0       LMdPORU




and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
the_rock
MVP Diamond

Yep...or even ethtool -S may help too.

Best,
Andy
"Have a great day and if its not, change it"
Arturxr
Contributor

No RX or TX errors detected.

Vincent_Bacher
MVP Silver

So, as I suspected, the issue with ping replies seems to be more of a symptom than the cause, which points more towards a load issue.

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
Arturxr
Contributor

no stats available

Vincent_Bacher
MVP Silver

For me it sounds that there are multiple symptoms and the question is, which is the root cause. The packet delay could as well be just a symptom. But hard to say at this point.

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
the_rock
MVP Diamond

Could be "red herring", as they say...hard to tell, for sure.

Best,
Andy
"Have a great day and if its not, change it"
Vincent_Bacher
MVP Silver

However, since even ‘cpinfo -y all’ freezes when the problem occurs, I wonder whether this can really be attributed to the interface. Of course, the output of the command may be delayed due to network problems, but this can be determined by calling ‘time cpinfo -y all’.

 time cpinfo -y all | tail -n 1

This is Check Point CPinfo Build 914000250 for GAIA


real    0m2.280s
user    0m1.170s
sys     0m0.958s

If the command takes forever to execute according to the time display, I believe that this is less of a problem with the mgmt interface. Or, to be more precise, it may not be the only problem.

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
Arturxr
Contributor

real 3m17.391s
user 0m5.854s
sys 0m7.796s

Arturxr
Contributor

Among the suspicious entries, the output showed the following:
25.95% pdpd [kernel.kallsyms] [k] native_queued_spin_lock_slowpath

Vincent_Bacher
MVP Silver

I guess, a high percentage of CPU time spent in native_queued_spin_lock_slowpath indicates significant contention and potential performance issues with the pdpd daemon.
I'd say this could point to suboptimal CoreXL/SND configuration, such as allocating too many SND cores, which can lead to excessive locking overhead.

However, I doubt that adjusting CoreXL settings specifically for PDP would be advisable.

Overall, this supports my assumption that offloading IDC → Firewall connections by introducing a dedicated, upstream PDP instance could improve the situation.
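To see the current core allocation before changing anything, something like this from Expert mode should give the picture (a sketch using the standard CoreXL commands):

fw ctl affinity -l -r
fw ctl multik stat

The first command shows which cores handle the interfaces (SND) versus the CoreXL firewall instances; the second lists the firewall instances and their connection counts.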

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
Arturxr
Contributor

Can you tell me if there is a guide on how to do this? That is, is it possible to allocate an entire core to one process?

Vincent_Bacher
MVP Silver

Regarding my assumption about the process, maybe @Timothy_Hall is the best contact for performance improvement. He is way better than I am when it comes to performance optimisation.

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
the_rock
MVP Diamond

That's actually a super valid point, Vince.

Best,
Andy
"Have a great day and if its not, change it"
Timothy_Hall
MVP Gold

Sounds to me like memory problems which are leading to network problems.  Please provide the output of the Super Seven run on the problematic gateway for further analysis:

https://community.checkpoint.com/t5/Scripts/S7PAC-Super-Seven-Performance-Assessment-Commands/m-p/40...

Also are you using the Identity Collector software?  If not, trying to perform all IA functions on the gateway itself can overload the pdpd daemon and cause IA issues.
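In the meantime, a quick look at memory pressure on the gateway itself can be taken from Expert mode (standard commands, shown here only as a sketch):

free -m
fw ctl pstat

In the fw ctl pstat output, failed memory allocations would be a strong hint in that direction.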

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization
Arturxr
Contributor

I ran the script; everything looks good in general, but something is still confusing:

fwaccel stats -s
Accelerated conns/Total conns : 143/89086 (0%)
LightSpeed conns/Total conns : 0/89086 (0%)
Accelerated pkts/Total pkts : 54565446822/61758703001 (88%)
LightSpeed pkts/Total pkts : 0/61758703001 (0%)
F2Fed pkts/Total pkts : 7193256179/61758703001 (11%)
F2V pkts/Total pkts : 265061601/61758703001 (0%)
CPASXL pkts/Total pkts : 47012465094/61758703001 (76%)
PSLXL pkts/Total pkts : 7121643784/61758703001 (11%)
CPAS pipeline pkts/Total pkts : 0/61758703001 (0%)
PSL pipeline pkts/Total pkts : 0/61758703001 (0%)
CPAS inline pkts/Total pkts : 0/61758703001 (0%)
PSL inline pkts/Total pkts : 0/61758703001 (0%)
QOS inbound pkts/Total pkts : 0/61758703001 (0%)
QOS outbound pkts/Total pkts : 0/61758703001 (0%)
Corrected pkts/Total pkts : 0/61758703001 (0%)

especially: Accelerated conns/Total conns : 143/89086 (0%)

Timothy_Hall
MVP Gold

Please post the full Super Seven results along with the output of enabled_blades.

I'm assuming that fwaccel stat reports that Accept templates are fully enabled?  If so a zero templating rate is caused by policy layer construction or the use of protocol signatures with services in the policy.  Please post the output of fwaccel templates -R for further diagnosis.
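For reference, the templating state can be confirmed with something like this (a sketch; the grep pattern is just for convenience):

fwaccel stat | grep -i templates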

 

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization
Arturxr
Contributor

fwaccel templates -R

fwaccel: illegal option -- R
Usage: fwaccel templates <options>

Options:
-m <max entries> - max number of entries to print
-s - print only the number of offloaded templates
-S - prints statistics
-d - prints drop templates
-h - this help message

Accept templates flags (one or more of the below flags):
U - unidirectional
N - NAT
A - accounted
S - pxl enabled
Q - qxl enabled
I - NAC enabled
O - created for rule with/below dynamic object
X - created for NAT rule with translated dynamic object
E - created for NAT rule with IDA object
M - created for rule with/below domain object
T - created for rule with/below time object
Z - created for rule with/below Sec Zone object
B - created for rule with/below IDA support object
R - created for rule with/below Traceroute object
P - created for a connection that may match on a service with src port
Drop templates flags (one or more of the below flags):
D - drop template
L - log drop action
