Snapshot / Policy verification - high CPU

S_E_ · 2021-09-28

Hi,

We really like the 'add snapshot' feature on an MDS system and would like to run it frequently (2-4 times a week).
However, we noticed that taking a snapshot on an MDS slows down the whole box considerably.
According to cpview, the box has 24 CPUs, but only one is at 100% usage while the other 23 are idle.

During this time:
- Login via SmartConsole fails
- Policy verification takes ages
- Login to the MDS via SSH takes ages
The load average shown by top is around 50; under normal operation it is 2-5.
Healthcheck and other reports are fine.
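A quick way to check whether a load of 50 comes from disk rather than CPU: on Linux the load average counts not only runnable processes but also processes in uninterruptible sleep (state D), which are usually blocked on disk I/O and use no CPU at all. The following is a minimal bash sketch for expert mode that lists such processes by reading /proc/PID/stat. It assumes a standard Linux /proc layout; the function names are made up for illustration.

```shell
#!/bin/bash
# Sketch: list processes in uninterruptible sleep (state D). On Linux these
# are counted in the load average even though they use no CPU, and they are
# usually blocked on disk I/O.

# Print the one-letter state from a /proc/PID/stat line. The comm field is in
# parentheses and may itself contain spaces or ")", so cut after the LAST ") ".
stat_state() {
    local rest=${1##*") "}
    echo "${rest%% *}"
}

# Print "PID COMM" for every process currently in state D.
d_state_tasks() {
    local f line pid comm
    for f in /proc/[0-9]*/stat; do
        { read -r line < "$f"; } 2>/dev/null || continue   # process may have exited
        if [ "$(stat_state "$line")" = "D" ]; then
            pid=${line%% *}
            comm=${line#*"("}
            comm=${comm%")"*}
            echo "$pid $comm"
        fi
    done
}

d_state_tasks
```

If many xfsdump/xfsrestore or management processes show up here while the snapshot runs, the high load is I/O wait rather than CPU work.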

During snapshot creation, this message appears frequently:

MDS1> show snapshots
Restore points:
---------------
SNAP1
AutoSnapShot885
Restore point now under creation:
---------------------------------
SNAP2_20092021 (32%)
NMSNAP0042 System is too busy, please try again in a few seconds.
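To follow progress from expert mode without retyping the command, something like the following bash sketch could poll it. It assumes that clish -c "show snapshots" prints the same text as above, that progress always appears as NAME (NN%), and that the "Restore point now under creation" entry disappears once the snapshot is finished; none of this is confirmed by Check Point, and the function names are hypothetical. A busy reply (NMSNAP0042) without a percentage is simply retried.

```shell
#!/bin/bash
# Sketch: poll snapshot progress, tolerating "NMSNAP0042 System is too busy".
# Assumes "show snapshots" reports progress as "NAME (NN%)".

# Read "show snapshots" output on stdin and print the first percentage found.
# Prints nothing if no snapshot is under creation.
snapshot_progress() {
    sed -n 's/.*(\([0-9]\{1,3\}\)%).*/\1/p' | head -n 1
}

# Poll every $1 seconds (default 60) until no snapshot is under creation.
wait_for_snapshot() {
    local out pct
    while :; do
        out=$(clish -c "show snapshots" 2>&1)
        pct=$(printf '%s\n' "$out" | snapshot_progress)
        if [ -n "$pct" ]; then
            echo "$(date '+%H:%M:%S') ${pct}%"
        elif printf '%s\n' "$out" | grep -q 'NMSNAP0042'; then
            echo "$(date '+%H:%M:%S') system busy, retrying"
        else
            echo "no snapshot under creation (finished or not started)"
            return 0
        fi
        sleep "${1:-60}"
    done
}

wait_for_snapshot
```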

Q:
- Do you experience the same issue/behavior?
- Does R81.10 have the same behavior, or will it solve the issue?

We are currently running a Smart-1 MDS on R80.40 T91, but the same behavior was already seen with R80.30.
I'm not sure whether sk104788 also applies to R80.40.

Regards

Timothy_Hall · 2021-09-29

This effect is almost certainly due to hard drive contention, not to only one CPU being utilized. If you run top while the snapshot is executing, you are likely to see a high I/O wait (wa) percentage. A process's CPU priority can be lowered ("niced") with the nice command, and its I/O priority can be lowered the same way with ionice. Try this:

1) Start the snapshot

2) Confirm slow SmartConsole performance

3) There should be two processes running that are related to the snapshot: xfsdump and xfsrestore. Determine their two process IDs (PIDs) via top or ps.

4) Confirm their current I/O priority with ionice -p PID1 ; ionice -p PID2 (typically reported as prio 0 with no explicit class, which the kernel treats as best-effort)

5) Set their I/O priority to idle (lowest possible): ionice -p PID1 -c 3; ionice -p PID2 -c 3

6) Retest performance and see if it has improved while snapshot is running
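Steps 3 to 5 can be scripted roughly as follows. This is a sketch only, assuming it runs as root in expert mode while the snapshot is in progress and that the helper processes are named exactly xfsdump and xfsrestore as described above; the function names are hypothetical. It finds the PIDs by scanning /proc rather than parsing ps | grep output, so the grep process itself never shows up.

```shell
#!/bin/bash
# Sketch of steps 3-5: find the snapshot's xfsdump/xfsrestore processes and
# move them to the idle I/O class. Run as root while the snapshot is running.

# Print the PID of every process whose command name is exactly $1.
pids_of() {
    local d comm
    for d in /proc/[0-9]*; do
        { read -r comm < "$d/comm"; } 2>/dev/null || continue  # process may have exited
        if [ "$comm" = "$1" ]; then
            echo "${d#/proc/}"
        fi
    done
}

# Step 4 (show current class) and step 5 (set idle class) for each PID.
idle_snapshot_io() {
    local name pid found=0
    for name in xfsdump xfsrestore; do
        for pid in $(pids_of "$name"); do
            found=1
            echo "$name $pid before: $(ionice -p "$pid")"
            ionice -c 3 -p "$pid"
            echo "$name $pid after:  $(ionice -p "$pid")"
        done
    done
    if [ "$found" -eq 0 ]; then
        echo "no xfsdump/xfsrestore process found; is the snapshot running?" >&2
    fi
}

idle_snapshot_io
```

Like the manual steps, this only affects the processes running at that moment; a snapshot started later has to be treated again.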

There is probably a way to have the snapshot I/O priority set to idle every time a snapshot is invoked, but that will almost certainly need to be done by Check Point. Note that this will make the snapshot take longer to complete (potentially MUCH longer).
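For step 6, and to confirm the disk-contention theory in the first place, it helps to have a number instead of an impression. Below is a minimal bash sketch that samples the aggregate cpu line of /proc/stat twice and prints the share of CPU time spent in iowait over the interval, roughly what top reports as wa. The function name iowait_pct is hypothetical, and the field order assumed is the standard Linux one (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/bash
# Sketch: percentage of CPU time spent waiting for I/O over an interval,
# computed from two samples of the aggregate "cpu" line in /proc/stat.

# $1 = sampling interval in seconds (default 5). Prints an integer 0-100.
iowait_pct() {
    local interval="${1:-5}"
    { grep '^cpu ' /proc/stat; sleep "$interval"; grep '^cpu ' /proc/stat; } |
    awk '{
        total = $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9   # user..steal
        if (NR == 1) { t0 = total; w0 = $6; next }
        dt = total - t0; dw = $6 - w0
        if (dw < 0) dw = 0   # iowait counter can go backwards on some kernels
        printf "%d\n", (dt > 0 ? 100 * dw / dt : 0)
    }'
}
```

For example, run iowait_pct 5 once during normal operation, once while the snapshot runs with default priority, and once after the ionice change, and compare the three values.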


S_E_ · 2021-09-29

Hi,

Great.

I've started the first test in the lab and will run it on the production devices later.

Thanks a lot.

Regards

MDS-R8040> add snapshot perftest
Taking snapshot. You can continue working normally.
You can use the command 'show snapshots' to monitor creation progress.
MDS-R8040> exit

[Expert@MDS-R8040:0]# ps -aef | grep xfsd
admin 659 30308 0 16:18 ttyS0 00:00:00 grep --color=auto xfsd
admin 32433 31708 10 16:17 ? 00:00:05 /sbin/xfsdump -l 0 -F - /dev/vg_splat/lv_current_snap

[Expert@MDS-R8040:0]# ionice -p 32433
unknown: prio 0

[Expert@MDS-R8040:0]# ionice -p 32433 -c 3

[Expert@MDS-R8040:0]# ionice -p 32433
idle
