Solved: Snapshots are not created after installing HFA Take66 for R81.10

Stephan_Lache · ‎2022-07-31

Dear Check Mates,

I have an issue with creating snapshots on my security gateways (6600 appliances).

After HFA Take66 was installed successfully, the snapshot creation process runs normally, but after reaching 100% progress no snapshot is visible in Gaia Portal > Snapshot Management.

Any ideas on this?

Thanks in advance.

Stephan

I added the log file.

ptuttle_2 · ‎2022-08-03

Hello;

We ran into this as well, and the fix was to apply a new file provided by TAC: "au.cpbak".

These are the steps we had to perform:

Replace the au.cpbak in /var/CPsnapshot/schemes/ with the one provided by TAC via SFTP.
rm /tmp/.lock.backup_restore
umount /mnt/backup
rm -rf /mnt/backup
If /var/log/BKPAutoUpdater exists, run:
unlink /var/log/BKPAutoUpdater
rm -rf /var/log/BKPAutoUpdater
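
For anyone who has to repeat this on several gateways, the steps above can be wrapped in a small script. This is only a sketch under assumptions: the `run` helper, the `APPLY` switch, and the extra backup copy of the original file are my own additions and not part of the TAC instructions, and the path to the new au.cpbak is whatever location you copied it to. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/bash
# Sketch of the cleanup steps listed above.
# Commands are only printed unless APPLY=1 is set (hypothetical switch).

run() {
    # Echo the command; execute it only when APPLY=1.
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

snapshot_fix() {
    # $1: path to the au.cpbak file provided by TAC (location is an assumption)
    local new_file="$1"
    local scheme_dir="/var/CPsnapshot/schemes"

    if [ -z "$new_file" ]; then
        echo "usage: snapshot_fix /path/to/new/au.cpbak" >&2
        return 1
    fi

    # Extra precaution, not in the original steps: keep the old file
    # next to the new one so the change can be reverted.
    run cp -p "$scheme_dir/au.cpbak" "${new_file}.orig"

    # Replace the scheme file, then clear the stale lock and mount point.
    run cp "$new_file" "$scheme_dir/au.cpbak"
    run rm /tmp/.lock.backup_restore
    run umount /mnt/backup
    run rm -rf /mnt/backup

    # Only needed if the BKPAutoUpdater entry exists (file, dir, or symlink).
    if [ -e /var/log/BKPAutoUpdater ] || [ -L /var/log/BKPAutoUpdater ]; then
        run unlink /var/log/BKPAutoUpdater
        run rm -rf /var/log/BKPAutoUpdater
    fi
}
```

Running `snapshot_fix /home/admin/au.cpbak` prints the plan; `APPLY=1 snapshot_fix /home/admin/au.cpbak` in expert mode would actually execute it. Some steps may report errors (for example umount when nothing is mounted), which is harmless here.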

Stephan_Lache · ‎2022-07-31

After two reboots of the gateways, snapshot creation works as expected.

Paul_Gademsky · ‎2022-08-01

@Stephan_Lache

I encountered the same issue after putting HFA Take 66 on a SmartEvent server and one member of a cluster (the SMS and the other cluster member created a snapshot without a problem).

In talking to TAC, it appears that the devices may have been updating their IPS files and this caused the issue. When I tried it again after ~24 hours, it was successful with no other changes. Not saying it's the real answer, but at least it's a reason they have seen previously.

the_rock · ‎2022-08-01

I have never had that issue on either an SMS or a firewall.

MarNeu · ‎2022-08-02

I have been seeing this issue since Take 61.

After a long, not very pleasant ticket with TAC (SR#6-0003273294), they found the solution: replacing the file /var/CPsnapshot/schemes/au.cpbak with a new one they provided.

This solved the issue.

But since Check Point's software management process is nothing but sad, this is still not solved in Take 66 GA and is not even on the list of upcoming resolved issues for the next take.

FYI: taking a snapshot from the CLI (clish) always worked for me, which was a good enough workaround.

clish > add snapshot-onetime name "NAME" description "TEST"
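
If you want to script that workaround from expert mode, something like the following might do. Treat it as a sketch: the `snapshot_cmd` helper and the dated naming scheme are my own invention, and I have not verified which characters Gaia accepts in snapshot names. The function only builds the command text shown above, so it can be reviewed before anything is run.

```shell
#!/bin/bash
# Build the clish command for a one-time snapshot with a dated name.
# snapshot_cmd and the name format are illustrative, not a Check Point tool.
snapshot_cmd() {
    local prefix="$1" description="$2"
    local name
    # e.g. pre_change_20220802_1430
    name="${prefix}_$(date +%Y%m%d_%H%M)"
    printf 'add snapshot-onetime name "%s" description "%s"\n' "$name" "$description"
}

# Print the command for review.
snapshot_cmd pre_change "TEST"
```

On a gateway, the printed line could then be handed to clish non-interactively, for example `clish -c "$(snapshot_cmd pre_change TEST)"`, assuming `clish -c` is available in your shell.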

If someone from CheckPoint reads this, please address it internally.

Thanks a lot

_Val_ · ‎2022-08-03

@MarNeu let me see what I can do here... Please allow some delay with the internal inquiry.

Paul_Gademsky · ‎2022-08-03

You can add SR 6-0003349393 to the list (it's in pending status)

Gregory_Azratz · ‎2022-08-03

Hi @MarNeu

We are checking this issue.

Thanks,

Gregory

Gregory_Azratz · ‎2022-08-04

We are working on integrating the fix into a future R81.10 Jumbo.
The suggested workaround of replacing au.cpbak is correct, as the file provided by TAC contains the fix for the snapshot issue.

Thanks,
Gregory

Dov_Fraivert · ‎2022-08-10

The fix will be included in the next Jumbo of R81.10.

ptuttle_2 · ‎2022-08-10

Thanks for the update.

Stephan_Lache · ‎2022-08-03

Hi MarNeu,

Thanks for the information that it works with clish!

best

Stephan

ptuttle_2 · ‎2022-08-03

Well, that was short-lived. I just finished patching 8 gateways to Jumbo Take 66 and we are back to snapshots disappearing after 99% complete.

Looks like we have to apply the fix again!

the_rock · ‎2022-08-03

I'm just testing this on two lab firewalls with Take 66 and will post the results.

the_rock · ‎2022-08-03

Yea...sadly, same issue, so hopefully it gets fixed.

ptuttle_2 · ‎2022-08-03

Applied the fix and can take a snapshot now (or at least until the next Jumbo). This problem also affects taking backups.

It does not seem to occur on the managers, though, only on the gateways.

-pat

Daniel_ · ‎2022-08-08

So everyone has been using a GA take without a working snapshot function for more than a week? And the only "solution" is to open a TAC case?

I don't have a problem with a bug in new software, but I do have a problem when a solution has existed for more than a week and every admin still runs into the issue. Why don't you release a new GA version with this bug fixed? Running a gateway without a snapshot function is not an option.

MarNeu · ‎2022-08-08

The solution has been available since 30 June, and the TAC ticket was closed with the following comment:

The change of file (au.cpbak) will be integrated in next jumbo HF, please let me know if you have any queries or are we good to close the SR?

Really unhappy with the situation...

_Val_ · ‎2022-08-08

What is the issue here, exactly? TAC does not manage Jumbo production; R&D is responsible for that. You do have a fix/workaround from TAC to buy you time until the fix is added to the Jumbo, or am I missing something?

_Val_ · ‎2022-08-08

@Daniel_ A Jumbo fix is called a Jumbo because it consolidates quite a few fixes. There is an extensive QA cycle to make sure that the package does not cause any degradation. It does take more than a week to run it and to prepare the release.

If you do need an immediate fix, go to TAC in the meantime.

This approach is nothing new; we have worked in this manner for about two decades...

MarNeu · ‎2022-08-08

I'm sorry, Val, but if the QA cycle is so intense that it takes more than 40 days for a small but important fix, I would expect the snapshot function to have been tested before releasing Take 61 or Take 66 GA...

For me (as well as others) that is definitely a service degradation.

_Val_ · ‎2022-08-08

I certainly understand your point of view. What I am trying to say is that things like that are the exact reason why we should not rush into releasing new JHAs too fast.

abihsot__ · ‎2022-08-08

I agree, QA is just not sufficient. Whenever you have any problem, TAC pushes you to install the latest JHF, but I think many customers are becoming reluctant to go for the latest one because of such basic bugs. And then there is the chicken-and-egg problem: a relatively small number of deployments use the latest JHF, which in turn means such bugs do not get highlighted.

the_rock · ‎2022-08-09

I could not agree more, you said it exactly right. It would be so much better to take more time releasing these jumbo hotfixes if it meant these sorts of bugs/problems did not exist.

dstradins · ‎2022-10-04

I think what would be more useful here, given it's a crucial part of the backup/recovery process, is if the au.cpbak file could be provided as a download somewhere within an SK article. We shouldn't have to contact TAC to obtain that file in order to perform something as important as a snapshot backup...

Just following up on _Val_'s comments: I know you have said there is a testing process, but surely this testing process should cover snapshots, backups and config backups...

I too, like a lot of other admins, am resorting to deleting the lock file(s), rebooting, and then having a 50/50 chance of a snapshot working. If not, repeat; after 2-3 attempts you get a snapshot!

The above is relatively 'OK' for those with HA gateway deployments but when on a single gateway those reboots are not appreciated...

I am finding it harder to justify 'choosing Checkpoint' - we do have other firewalls - and whilst they all do have issues in one form or another I've ALWAYS been able to back them up and restore them....!

Come on Checkpoint... you need to really do better than this.

jberg712 · ‎2022-08-04

We experienced this as well on our SmartEvent appliance.

I did some testing and it's strange: I can attempt the snapshot 2-3 times (rebooting each time, because the appliance thinks a backup/snapshot is still in progress and won't start another one), and then on the 3rd or 4th attempt it actually completes.

The snapshot logs end with these lines on the failures:

pigz: write error code 28
pigz: abort: write error on <stdout>
tar: snap_log_backup.tgz: Wrote only 8192 of 10240 bytes
tar: Error is not recoverable: exiting now

[Fri Jul 22 09:56:14 2022]: Info:'cleanup': /mnt/backup/tmp/snap_log_backup.tgz has been removed.

OR

pigz: write error code 28
pigz: abort: write error on <stdout>
tar: snap_log_backup.tgz: Cannot write: Broken pipe
tar: Error is not recoverable: exiting now

[Fri Jul 22 10:51:39 2022]: Info:'cleanup': /mnt/backup/tmp/snap_log_backup.tgz has been removed.

There's no other error mentioned
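
One note on those lines: on Linux, error code 28 is ENOSPC ("No space left on device"), so the failures look like pigz running out of room on its output target. That is an inference from the errno value, not something TAC confirmed in this thread. A small check like the one below could flag the same signature in other snapshot logs; `check_snapshot_log` is a hypothetical helper and the log path has to be supplied, since I am not assuming where your logs live.

```shell
#!/bin/bash
# Report whether a snapshot log contains the out-of-space signature
# quoted above. Returns 1 when the signature is found, 0 otherwise.
check_snapshot_log() {
    local log_file="$1"
    if grep -q 'pigz: write error code 28' "$log_file"; then
        echo "ENOSPC signature found in $log_file"
        return 1
    fi
    echo "no ENOSPC signature in $log_file"
}
```

Call it as `check_snapshot_log /path/to/snapshot.log`. If it reports the signature, checking free space with `df -h /mnt/backup` while a snapshot is running may show whether the temporary area is filling up.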
