Re: SmartUpdate is extremely slow after MDS upgrade to R81.20

kamilazat · ‎2025-03-11

Hello everyone.

We are experiencing heavy lag in SmartUpdate after upgrading from R81.10 to R81.20 JHF Take 98. It does NOT hang or freeze, but everything works extremely slowly. This did not happen before the upgrade, so something seems wrong with the current state of the system.

At first we thought this was normal, since reindexing causes heavy load until it finishes, but it has been more than a week since the upgrade and the lag is still there. SmartConsole doesn't lag the way SmartUpdate does.

We tried following sk41793 and sk112334 to debug the global domain (since we connect to SmartUpdate from global) and SmartConsole, but none of the logs gave us anything. At this point we're kind of lost as to where (and how) to look further. cpwd doesn't show any process restarts, and we didn't see any complaints in cpd, messages, etc. The only strange thing is that fwm keeps writing a lot of information even after we turned off the debug parameters as described in sk41793:

[Expert@HostName]# fw debug fwm off TDERROR_ALL_SU=0
[Expert@HostName]# fw debug fwm off TDERROR_ALL_cprep=0
[Expert@HostName]# fw debug fwm off TDERROR_ALL_cpget=0
[Expert@HostName]# fw debug fwm off TDERROR_ALL_cpms=0
[Expert@HostName]# fw debug fwm off OPSEC_DEBUG_LEVEL=0
[Expert@HostName]# fw debug fwm off SU_DEBUG_LEVEL=0
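
For anyone who wants to put a number on "keeps writing so much", here is a minimal Python sketch that samples a file's size twice and reports the growth rate. It is only an illustration: the function name `growth_rate` is made up for this example, and which file to point it at is an assumption (the post above does not name the log; the fwm debug log of the affected domain would be the natural candidate).

```python
import os
import time


def growth_rate(path, interval=10.0, sleep=time.sleep):
    """Return the average bytes per second appended to `path`.

    The file size is read once, then again after `interval` seconds.
    `sleep` is injectable so the function can be exercised without waiting.
    A negative result means the file shrank, e.g. because it was rotated.
    """
    before = os.path.getsize(path)
    sleep(interval)
    after = os.path.getsize(path)
    return (after - before) / interval


# Example call (the path is a placeholder, not taken from the thread):
# print(f"{growth_rate('/path/to/fwm/debug/log'):.0f} bytes/s")
```

Comparing the rate with the debug flags on and off would show whether the commands above actually changed anything.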

Also, here's the output of the top command, in case it helps:

top - 10:35:52 up 8 days, 16:11, 1 user, load average: 4.76, 4.88, 6.09
Threads: 2743 total, 10 running, 2733 sleeping, 0 stopped, 0 zombie
%Cpu0 : 21.0 us, 2.9 sy, 0.0 ni, 71.4 id, 1.9 wa, 0.0 hi, 2.9 si, 0.0 st
%Cpu1 : 13.3 us, 7.6 sy, 1.9 ni, 76.2 id, 0.0 wa, 1.0 hi, 0.0 si, 0.0 st
%Cpu2 : 19.2 us, 3.8 sy, 1.9 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 45.7 us, 7.6 sy, 2.9 ni, 42.9 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu4 : 24.5 us, 3.8 sy, 2.8 ni, 67.0 id, 0.9 wa, 0.0 hi, 0.9 si, 0.0 st
%Cpu5 : 35.6 us, 8.7 sy, 3.8 ni, 51.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 49.1 us, 1.9 sy, 0.9 ni, 47.2 id, 0.0 wa, 0.9 hi, 0.0 si, 0.0 st
%Cpu7 : 24.8 us, 4.8 sy, 1.9 ni, 67.6 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu8 : 50.5 us, 2.9 sy, 0.0 ni, 45.7 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu9 : 18.1 us, 5.7 sy, 1.9 ni, 72.4 id, 1.9 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 23.1 us, 13.5 sy, 1.0 ni, 62.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 20.2 us, 10.6 sy, 1.0 ni, 68.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 18.1 us, 2.9 sy, 6.7 ni, 71.4 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 15.2 us, 7.6 sy, 1.9 ni, 74.3 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 29.0 us, 4.7 sy, 3.7 ni, 61.7 id, 0.9 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 26.4 us, 1.9 sy, 0.0 ni, 71.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 98416716 total, 12708860 free, 47303964 used, 38403892 buff/cache
KiB Swap: 33551748 total, 32049184 free, 1502564 used. 48368700 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
9144 admin 20 0 2670296 2.318g 43640 R 61.0 2.5 394:32.66 9 cpd
6412 admin 20 0 1009428 673704 50348 S 58.1 0.7 698:26.05 12 fwm
15303 admin 20 0 22292 4320 2080 R 55.2 0.0 0:00.58 6 esc_db_c+
27786 cp_post+ 20 0 1819428 1.671g 1.552g R 32.4 1.8 96:26.72 3 postgres
14022 admin 20 0 25.035g 0.014t 12356 R 28.6 15.7 0:06.16 10 qtp28026+
13681 admin 20 0 25.035g 0.014t 12356 R 25.7 15.7 0:08.37 8 qtp28026+
6360 admin 20 0 25.035g 0.014t 12356 R 19.0 15.7 0:08.79 0 qtp28026+
13682 admin 20 0 25.035g 0.014t 12356 S 18.1 15.7 0:08.54 5 qtp28026+
11493 admin 20 0 25.035g 0.014t 12356 S 16.2 15.7 0:07.78 7 qtp28026+
12813 admin 20 0 25.035g 0.014t 12356 S 16.2 15.7 0:10.29 1 qtp28026+
12802 admin 20 0 25.035g 0.014t 12356 R 15.2 15.7 0:09.57 3 qtp28026+
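
Because this listing is in thread mode, one process can appear on many rows. A minimal Python sketch for summing %CPU per command is shown here, assuming the 13-column layout of the header row above; the function name `cpu_by_command` is made up for this example, and only the rows visible in the post are used, so the totals are indicative at best.

```python
from collections import defaultdict


def cpu_by_command(top_rows):
    """Sum the %CPU column per COMMAND for thread-mode top rows.

    Assumes the column layout from the post:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
    Rows that do not start with a numeric PID are skipped.
    """
    totals = defaultdict(float)
    for row in top_rows:
        fields = row.split()
        if len(fields) < 13 or not fields[0].isdigit():
            continue  # header, summary, or blank line
        totals[fields[12]] += float(fields[8])
    # Highest total first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


rows = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
9144 admin 20 0 2670296 2.318g 43640 R 61.0 2.5 394:32.66 9 cpd
6412 admin 20 0 1009428 673704 50348 S 58.1 0.7 698:26.05 12 fwm
15303 admin 20 0 22292 4320 2080 R 55.2 0.0 0:00.58 6 esc_db_c+
27786 cp_post+ 20 0 1819428 1.671g 1.552g R 32.4 1.8 96:26.72 3 postgres
14022 admin 20 0 25.035g 0.014t 12356 R 28.6 15.7 0:06.16 10 qtp28026+
13681 admin 20 0 25.035g 0.014t 12356 R 25.7 15.7 0:08.37 8 qtp28026+
6360 admin 20 0 25.035g 0.014t 12356 R 19.0 15.7 0:08.79 0 qtp28026+
13682 admin 20 0 25.035g 0.014t 12356 S 18.1 15.7 0:08.54 5 qtp28026+
11493 admin 20 0 25.035g 0.014t 12356 S 16.2 15.7 0:07.78 7 qtp28026+
12813 admin 20 0 25.035g 0.014t 12356 S 16.2 15.7 0:10.29 1 qtp28026+
12802 admin 20 0 25.035g 0.014t 12356 R 15.2 15.7 0:09.57 3 qtp28026+
""".splitlines()

for command, cpu in cpu_by_command(rows):
    print(f"{command:12s} {cpu:6.1f}")
```

In the visible rows, the qtp28026+ threads (which all show the same VIRT and RES figures and so appear to belong to one process) add up to more CPU than cpd or fwm individually.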

Any ideas and recommendations would be very much appreciated.

Cheers!

the_rock · ‎2025-03-11

Just curious, is it the same whether you open it from SmartConsole or from the CP folder where GuiDBedit also "resides"?

Andy

kamilazat · ‎2025-03-11

Hi Andy!

Interesting approach. But the behavior is the same.

the_rock · ‎2025-03-11

Fair enough...what jumbo is installed?

Andy

kamilazat · ‎2025-03-11

JHF Take 89. We installed it right after upgrading.

the_rock · ‎2025-03-11

I know 98 (the latest one) is recommended at this point. Not sure if it's worth trying that take...

Andy

AkosBakos · ‎2025-03-11

Hi @kamilazat

take 96 has a relevant fix. Maybe this behaviour is the same:

Before you do any deeper investigation, install Take 98, as @the_rock mentioned.

My experience: this lowered my SmartLog load by at least 40%.

We have almost 4 MGMT with Take 98, and I can say it is safe to install! We are waiting for your feedback 🙂

Akos

----------------
\m/_(>_<)_\m/

the_rock · ‎2025-03-11

Hey @AkosBakos ...on a slightly different note, though still relevant to jumbo 98, do you have it installed on any firewalls? I ask because there was a post here by someone saying they installed it and it broke remote access. That's pretty significant, since a pretty large customer of mine, with people connecting from all over the world, asked me about installing that jumbo hotfix, and I told them to wait, specifically because of that post.

Will see if I can find it.

Andy

kamilazat · ‎2025-03-11

@the_rock @AkosBakos oh my... That was an unfortunate typo. We have JHF Take 98 🙂

EDIT: We collected cpinfo from global MDS and HCP found that there are cpd coredumps that happened on 8th and 9th March. Do you think it may be relevant? It bugs me how an upgrade can cause cpd to crash.🤔

the_rock · ‎2025-03-11

No worries. Did you try cprestart to see if that helps?

Andy

AkosBakos · ‎2025-03-11

I don't suggest it, because it starts everything at the "same" time. # evstart starts only the indexing.

Akos


the_rock · ‎2025-03-11

True true, I just figured since it's management, it's pretty safe to do.

Andy

kamilazat · ‎2025-03-11

So you think it may be related to indexing? Can you elaborate on it a bit? I can't imagine how log indexing can be related. I'll need a good reason to propose a MW for that 🙂

the_rock · ‎2025-03-11

That was also my thinking, hence why I suggested cprestart.

Andy

AkosBakos · ‎2025-03-11

Hi @kamilazat

This can be a good reason for the maintenance window:

Some indexed logs are not visible. Although some queries yield results, some show empty lines.

https://support.checkpoint.com/results/sk/sk168315


kamilazat · ‎2025-03-11

@AkosBakos I will try to arrange a MW and run evstop/evstart as soon as possible. I'll update you with the results.

the_rock · ‎2025-03-11

Screenshot attached.

Andy

AkosBakos · ‎2025-03-11

Hi @kamilazat

Without any particular reason, try # evstop and then # evstart on the MGMT.

This solved a lot of problems in the past.

Akos


the_rock · ‎2025-03-11

Not a bad idea 🙂

kamilazat · ‎2025-03-11

That will trigger reindexing, correct? If yes, it will take some time to report back due to maintenance window restrictions 🙂

AkosBakos · ‎2025-03-11

Yes, exactly!


the_rock · ‎2025-03-11

For what it's worth, here is the AI Copilot answer 🙂

Andy

There could be several reasons why SmartUpdate is slow after upgrading MDS to R81.20. Here are some potential causes and solutions:

Database Re-indexing:
- After an upgrade, the system may need to re-index the database, which can temporarily slow down operations. This is especially true if there are a large number of objects or logs to process.
Resource Utilization:
- The upgrade process might have left some temporary files or processes running, consuming system resources. Check for any unnecessary processes and clean up temporary files.
Configuration Issues:
- There might be configuration issues or mismatches that occurred during the upgrade. Ensure that all configurations are correctly set and that there are no conflicts.
Network Latency:
- Network issues can also cause delays. Ensure that the network connection between the MDS and the SmartUpdate server is stable and has low latency.
Jumbo Hotfixes:
- Ensure that you have applied the latest Jumbo Hotfix Accumulator for R81.20. These hotfixes often contain performance improvements and bug fixes that can resolve such issues.
Logs and Monitoring:
- Check the logs for any errors or warnings that might indicate the cause of the slowdown. Use monitoring tools to identify any bottlenecks in the system.

kamilazat · ‎2025-03-12

@the_rock @AkosBakos Instead of evstop/evstart, we rebooted the whole machine. Still the same.

HCP found two cpd core dumps on consecutive days. Maybe it's somewhat related?

At this point I'm more inclined to open a TAC ticket, but I'm also wide open to ideas.

the_rock · ‎2025-03-12

I would open a TAC case, good idea.

Andy
