Stephan_Lache
Participant

Snapshots are not created after installing HFA Take66 for R81.10

Dear Check Mates,

I have an issue with creating snapshots on my security gateways (6600 appliances).

After HFA Take 66 was installed successfully, the snapshot creation process runs normally, but after reaching 100% progress no snapshot is visible in Gaia Portal > Snapshot Management.

Any ideas on this?

 

Thanks in advance.

Stephan

I added the log file.


Accepted Solutions
ptuttle_2
Contributor

Hello,

We ran into this as well, and the fix was to apply a new file provided by TAC: "au.cpbak".

These are the steps we had to follow:

Replace au.cpbak in the folder /var/CPsnapshot/schemes/ with the au.cpbak provided by TAC over SFTP.
rm /tmp/.lock.backup_restore
umount /mnt/backup
rm -rf /mnt/backup
If /var/log/BKPAutoUpdater exists, run:
unlink /var/log/BKPAutoUpdater
rm -rf /var/log/BKPAutoUpdater
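
For anyone scripting this across several gateways, the cleanup part of the steps above could be wrapped roughly as follows. This is only a sketch: the function name `cleanup_snapshot_state` and the optional root prefix (there so it can be tried against a scratch directory first) are my own additions, not something TAC provided. It does not replace au.cpbak; that file still has to come from TAC.

```shell
#!/bin/bash
# Sketch of the cleanup steps listed above. Not an official TAC script.
# $1 is a hypothetical root prefix for trying this on a scratch tree;
# call it with no argument on a real gateway.
cleanup_snapshot_state() {
    local root="${1:-}"
    local lock="$root/tmp/.lock.backup_restore"
    local mnt="$root/mnt/backup"
    local upd="$root/var/log/BKPAutoUpdater"

    # Remove the stale lock left behind by the failed snapshot, if any.
    rm -f "$lock"

    # Unmount the backup mount point only if it is currently mounted.
    if grep -qsF " $mnt " /proc/mounts; then
        umount "$mnt" || return 1
    fi
    rm -rf "$mnt"

    # BKPAutoUpdater may be a symlink or a real directory; handle both.
    if [ -L "$upd" ]; then
        unlink "$upd"
    elif [ -e "$upd" ]; then
        rm -rf "$upd"
    fi
}
```

On a gateway this would be called without an argument, after the TAC-provided au.cpbak has been copied into /var/CPsnapshot/schemes/.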

26 Replies
Stephan_Lache
Participant

After 2 reboots of the gateways the snapshot creation works like expected.

Paul_Gademsky
Employee

@Stephan_Lache 

I encountered the same issue after installing HFA Take 66 on a SmartEvent server and one member of a cluster (the SMS and the other cluster member created a snapshot without a problem).

In talking to TAC, it appears that the devices may have been updating their IPS files and this caused the issue. When I tried it again after ~24 hours, it was successful with no other changes. Not saying it's the real answer, but at least it's a reason they have seen previously.

the_rock
Legend

Never had that issue on either an SMS or a firewall.

MarNeu
Participant

I have been seeing this issue since Take 61.

 

After a long, not very pleasant ticket with TAC (SR#6-0003273294), they found the solution: replacing the file /var/CPsnapshot/schemes/au.cpbak with a new one they provided.

This solved the issue.

But since Check Point's software management process is nothing but sad, this is still not solved in Take 66 GA and is not even on the list of resolved issues for the upcoming take.

FYI: taking a snapshot from the CLI (clish) always worked for me, which was a good enough workaround.

clish > add snapshot-onetime name "NAME" description "TEST"
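
From expert mode the same command can be run non-interactively with `clish -c`. A small sketch, assuming a hypothetical helper name `snapshot_cmd` and a timestamped naming scheme of my own; it only prints the command so it can be reviewed before running.

```shell
#!/bin/bash
# Hypothetical helper: build the clish snapshot command quoted above.
# The naming scheme (prefix + timestamp) is an assumption, not a Check Point convention.
snapshot_cmd() {
    local prefix="$1" desc="$2"
    local name="${prefix}_$(date +%Y%m%d_%H%M)"
    printf 'add snapshot-onetime name "%s" description "%s"\n' "$name" "$desc"
}

# Print the command for review. On a gateway it could then be run with:
#   clish -c "$(snapshot_cmd pre_take66 before_take66)"
snapshot_cmd pre_take66 before_take66
```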

 

If someone from CheckPoint reads this, please address it internally.

 

Thanks a lot

_Val_
Admin

@MarNeu let me see what I can do here... Please allow some delay with the internal inquiry. 

Paul_Gademsky
Employee

You can add SR 6-0003349393 to the list (it's in pending status)

Gregory_Azratz
Employee

Hi @MarNeu 

We are checking this issue.

Thanks,

Gregory

 

Gregory_Azratz
Employee

We are working on integrating the fix into a future R81.10 Jumbo.
The suggested workaround of replacing au.cpbak is correct, as the file provided by TAC contains the fix for the snapshot issue.

Thanks,
Gregory

Dov_Fraivert
Employee

The fix will be included in the next Jumbo of R81.10.

ptuttle_2
Contributor

Thanks for the update.

Stephan_Lache
Participant

Hi MarNeu,

Thanks for the information that it works with clish!

best

Stephan

ptuttle_2
Contributor

Well, that was short-lived. Just finished patching 8 gateways to Jumbo Take 66 and we are back to snapshots disappearing after 99% complete.

Looks like we have to apply the fix again!

the_rock
Legend

I'm just testing this on 2 lab firewalls with Take 66 and will update with the results.

the_rock
Legend

Yeah... sadly, same issue, so hopefully it gets fixed.

ptuttle_2
Contributor

Applied the fix and can take a snapshot now (or at least until the next Jumbo). This problem also affects taking backups.

It does not seem to affect the managers, though. Only the gateways.

-pat

Daniel_
Advisor

So everyone has been using a GA take without a working snapshot function for (more than) one week? The only "solution" is to open a TAC case?

I don't have a problem with a bug in new software, but I do have a problem if a solution has existed for more than one week and every admin still runs into this issue. Why don't you release a new GA version with this bug fixed? Running a gateway without a snapshot function is not an option.

MarNeu
Participant

The solution has been available since the 30th of June, and the TAC ticket was closed with the following comment:

The change of file (au.cpbak) will be integrated in next jumbo HF, please let me know if you have any queries or are we good to close the SR?

Really unhappy with the situation...

_Val_
Admin

What is the issue here, exactly? TAC is not managing Jumbo production; R&D is responsible for that. You do have a fix/workaround from TAC to buy you time before the fix is added to the Jumbo, or am I missing something?

_Val_
Admin

@Daniel_ A Jumbo fix is called a Jumbo because it consolidates quite a few fixes. There is an extensive QA cycle to make sure that the package does not cause any degradation. It takes more than a week to run that cycle and prepare the release.

If you need an immediate fix, go to TAC in the meantime.

This approach is nothing new; we have worked this way for about two decades...

MarNeu
Participant

I'm sorry, Val, but if the QA cycle were so intense that it takes more than 40 days for a small but important fix, I would expect the snapshot function to have been tested before Take 61 or Take 66 was released as GA...

For me (as well as others) that is definitely a service degradation.

_Val_
Admin

I certainly understand your point of view. What I am trying to say is that things like that are the exact reason why we should not rush into releasing new JHAs too fast.

abihsot__
Advisor

I agree, QA is just not sufficient. Whenever you have a problem, TAC pushes you to install the latest JHF; however, I think many customers are becoming reluctant to go for the latest one because of such basic bugs. And then there is the chicken-and-egg problem: a relatively small number of deployments use the latest JHF, which in turn means such bugs do not surface.

the_rock
Legend

I could not agree more, you said it exactly right. It would be so much better to take more time releasing these jumbo hotfixes if it meant these sorts of bugs/problems would not exist.

dstradins
Explorer

I think what would be more useful here, given it's a crucial part of the backup/recovery process, is if the au.cpbak file could be provided as a download within an SK article. We shouldn't have to contact TAC to obtain that file in order to perform something as important as a snapshot backup...

Just following up on _Val_'s comments... I know you have said there is a testing process, but surely that process should test snapshots, backups and config backups...

I too, like a lot of other admins, am resorting to deleting the lock file(s), rebooting, and then having a 50/50 chance of a snapshot working... If not, repeat... After 2-3 attempts you get a snapshot!

The above is relatively 'OK' for those with HA gateway deployments, but on a single gateway those reboots are not appreciated...

I am finding it harder to justify 'choosing Check Point'. We do have other firewalls, and whilst they all have issues in one form or another, I've ALWAYS been able to back them up and restore them!

Come on, Check Point... you really need to do better than this.

jberg712
Contributor

We experienced this as well on our SmartEvent appliance.

I did some testing and it's strange: the snapshot attempt fails 2-3 times (and each time you have to reboot, because the appliance thinks a backup/snapshot is still in progress and won't start another one), but then on the 3rd or 4th attempt it actually completes.

The snapshot logs end with these lines on the failures:

pigz: write error code 28
pigz: abort: write error on <stdout>
tar: snap_log_backup.tgz: Wrote only 8192 of 10240 bytes
tar: Error is not recoverable: exiting now

[Fri Jul 22 09:56:14 2022]: Info:'cleanup': /mnt/backup/tmp/snap_log_backup.tgz has been removed.

OR

pigz: write error code 28
pigz: abort: write error on <stdout>
tar: snap_log_backup.tgz: Cannot write: Broken pipe
tar: Error is not recoverable: exiting now

[Fri Jul 22 10:51:39 2022]: Info:'cleanup': /mnt/backup/tmp/snap_log_backup.tgz has been removed.

There's no other error mentioned
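
If the 28 that pigz reports is the raw errno (I believe it is, but treat that as an assumption), it corresponds to ENOSPC, "No space left on device" on Linux, which would fit tar writing only 8192 of 10240 bytes. Below is a quick sketch to check a snapshot log for that signature and to show the free space on the backup mount. Both function names are placeholders of mine, and the log path has to be supplied by whoever runs it.

```shell
#!/bin/bash
# Sketch: flag the ENOSPC signature seen in the failed snapshot logs above.
# Function names are hypothetical; pass the path of the snapshot log to inspect.
has_enospc_signature() {
    local log="$1"
    # errno 28 is ENOSPC on Linux; assumes pigz prints the raw errno value.
    grep -q 'pigz: write error code 28' "$log"
}

# Print the available space (1K blocks) for the given mount, default /mnt/backup.
report_backup_space() {
    local mnt="${1:-/mnt/backup}"
    # -P keeps each filesystem on one line; field 4 is available space, field 6 the mount point.
    df -Pk "$mnt" 2>/dev/null | awk 'NR==2 {print $4 " KB free on " $6}'
}
```

Running `report_backup_space` while a snapshot is in progress would show whether /mnt/backup is the volume that fills up.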
