Hello,
Any help and feedback is much appreciated. I will start by saying that I do have a Check Point TAC case open for this, but I'm looking into every avenue to get this resolved as quickly as possible.
Environment:
Management server is a VMware virtual machine.
It manages about 11 Security Gateways, with policies for Firewall, URL Filtering, Application Control, HTTPS Inspection, and Threat Prevention.
Upgrade performed from R77.30 to R80.10 in July 2017.
- The pre-upgrade verification tools were run and a migrate export was taken for import into R80.10.
- The export was imported into a new VM running R80.10 Build 421 with Jumbo Hotfix Take 15.
- SmartConsole reported validation errors for some objects, but they did not hinder operations: policy installations and object creation worked without issue.
Issues begin:
Fast forward to the first week of October 2017. The ESX host that the VM runs on had maintenance performed during a change window. Normally the services on this VM are shut down gracefully beforehand, but not in this instance.
The Day after:
SmartConsole Login: Unable to connect to server
Debugs/Troubleshooting:
[Expert@MGTSERVER:0]# fw debug fwm on
Cannot signal process fwm (9388), make sure the process is running.: No such process
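The message above indicates that `fw debug` tried to signal a PID that no longer exists, meaning fwm is simply not running. As a minimal sketch of the same kind of check (plain Python on a POSIX system, not a Check Point tool; the helper name `pid_is_running` is made up for illustration):

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; the kernel only checks existence and permissions.
        os.kill(pid, 0)
    except ProcessLookupError:
        # Same condition that produces "No such process" in the output above.
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


if __name__ == "__main__":
    # 9388 is the PID quoted in the error message above.
    print(pid_is_running(9388))
```

If this returns False for the PID that `fw debug` reports, the PID is most likely stale and the process has to be started before debugging can be enabled.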
Check status of CPM process:
[Expert@MGTSERVER:0]# $MDS_FWDIR/scripts/cpm_status.sh
Check Point Security Management Server is during initialization
Check to see if server is up and ready to receive connections:
[Expert@MGTSERVER:0]# $MDS_FWDIR/scripts/server_status.sh
Checking server status. Please wait...
13:05:52,326 INFO com.checkpoint.management.cpm.Cpm.enableLocalSic:15 [main] - Enabling local sic. Setting cp.ssl_local.certificate.check=local
Failed to check status, cpm server is probably down
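The two status messages seen so far can be told apart mechanically, which helps when collecting output repeatedly while waiting on the server. The sketch below matches only the two strings quoted in this post; the wording of a healthy-state message is not shown here, so anything else is reported as "unknown". The function name `classify_cpm_status` is hypothetical.

```python
def classify_cpm_status(output: str) -> str:
    """Classify captured output of cpm_status.sh / server_status.sh.

    Only the two messages quoted in this post are recognized.
    """
    text = output.lower()
    if "during initialization" in text:
        return "initializing"
    if "probably down" in text:
        return "down"
    # Any other wording (including a healthy server) is not covered here.
    return "unknown"


if __name__ == "__main__":
    captured = "Failed to check status, cpm server is probably down"
    print(classify_cpm_status(captured))
```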
[Expert@MGTSERVER:0]# cd $MDS_FWDIR/scripts
[Expert@MGTSERVER:0]# ./cpm_debug.sh -t Login webservices crud Solr -s DEBUG
16:07:52,259 INFO com.checkpoint.management.cpm.Cpm.enableLocalSic:15 [main] - Enabling local sic. Setting cp.ssl_local.certificate.check=local
[Expert@MGTSERVER:0]# tail $MDS_FWDIR/conf/tdlog.cpm
#TOPIC-DEBUG:login:SEVERITY:DEBUG
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.LoginSvcImpl=DEBUG
#TOPIC-DEBUG:webservices:SEVERITY:DEBUG
log4j.logger.com.checkpoint.management.web_services.internal=DEBUG
#TOPIC-DEBUG:crud:SEVERITY:DEBUG
log4j.logger.com.checkpoint.management.web_services.dleserver.internal.ObjectCrudRemoteSvcImpl=DEBUG
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.ObjectCrudSvcImpl=DEBUG
#TOPIC-DEBUG:solr:SEVERITY:DEBUG
log4j.logger.com.checkpoint.management.object_store.fts=DEBUG
[Expert@MGTSERVER:0]# ./cpm_debug.sh -t Login webservices crud Solr -s INFO
16:09:39,426 INFO com.checkpoint.management.cpm.Cpm.enableLocalSic:15 [main] - Enabling local sic. Setting cp.ssl_local.certificate.check=local
[Expert@MGTSERVER:0]# tail $MDS_FWDIR/conf/tdlog.cpm
#TOPIC-DEBUG:login:SEVERITY:INFO
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.LoginSvcImpl=INFO
#TOPIC-DEBUG:webservices:SEVERITY:INFO
log4j.logger.com.checkpoint.management.web_services.internal=INFO
#TOPIC-DEBUG:crud:SEVERITY:INFO
log4j.logger.com.checkpoint.management.web_services.dleserver.internal.ObjectCrudRemoteSvcImpl=INFO
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.ObjectCrudSvcImpl=INFO
#TOPIC-DEBUG:solr:SEVERITY:INFO
log4j.logger.com.checkpoint.management.object_store.fts=INFO
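To confirm that each `cpm_debug.sh` run changed the severity for every topic, the `tdlog.cpm` tail can be summarized per topic. This is a rough sketch that assumes the file always follows the layout visible above (a `#TOPIC-DEBUG:<topic>:SEVERITY:<level>` marker followed by `log4j.logger.<name>=<level>` lines); `parse_tdlog` and `SAMPLE` are illustrative names, not part of any Check Point tooling.

```python
SAMPLE = """\
#TOPIC-DEBUG:login:SEVERITY:INFO
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.LoginSvcImpl=INFO
#TOPIC-DEBUG:crud:SEVERITY:INFO
log4j.logger.com.checkpoint.management.web_services.dleserver.internal.ObjectCrudRemoteSvcImpl=INFO
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.ObjectCrudSvcImpl=INFO
#TOPIC-DEBUG:solr:SEVERITY:INFO
log4j.logger.com.checkpoint.management.object_store.fts=INFO
"""

LOGGER_PREFIX = "log4j.logger."


def parse_tdlog(text: str) -> dict:
    """Map each topic to its severity and the (logger, level) pairs listed under it."""
    topics = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#TOPIC-DEBUG:"):
            parts = line.split(":")
            if len(parts) == 4 and parts[2] == "SEVERITY":
                current = parts[1]
                topics[current] = {"severity": parts[3], "loggers": []}
            else:
                current = None  # marker line in an unexpected shape; skip its loggers
        elif line.startswith(LOGGER_PREFIX) and current is not None and "=" in line:
            name, _, level = line.partition("=")
            topics[current]["loggers"].append((name[len(LOGGER_PREFIX):], level))
    return topics


if __name__ == "__main__":
    for topic, info in parse_tdlog(SAMPLE).items():
        print(topic, info["severity"], len(info["loggers"]))
```

A logger whose level differs from its topic's severity would point to a partially applied change.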
[Expert@MGTSERVER:0]# fw debug fwm on TDERROR_ALL_ALL=5
Cannot signal process fwm (3980), make sure the process is running.: No such process
[Expert@MGTSERVER:0]# fw debug fwm on TDERROR_DBG_OPT=time,host,prog,topic,pid,tid
Cannot signal process fwm (3980), make sure the process is running.: No such process
[Expert@MGTSERVER:0]# tail cpinfo
I'm rebuilding, disaster recovery mindset:
- Shut down the corrupted system
- Deployed a new VM with R80.10 Build 421 Take 15
- Ran the initial configuration wizard (Gaia config) with the original IP and hostname
- Copied several backups from before the host maintenance
- Checked the processes that failed on the original VM (clean build without imports): processes are OK
- Logged in with SmartConsole
- Loads with no issue (albeit it's a blank console with nothing but defaults)
- Logged in to the Gaia Web UI and imported the backups to the repository
- Performed a restore of the most current backup
- Restore succeeds, system reboots
- BACK TO SQUARE 1 (the original issue is now replicated on the new VM; in case it's not obvious, this was not the goal!)
Note at this point: an unresolved case from the past seems to be related to what we are experiencing now. We had a case open with Check Point regarding the validation errors from when we first migrated to R80.10. They asked for a migrate export so they could determine the cause. They (I'm not sure which tier of engineering support was working on it) informed us that they were unable to recreate the errors in their environment with the exports provided. It rings a bell now, but at the time operations were not affected by it. It was three small warnings saying that a certain object (which was no longer used in the policy) was causing a validation error.
I have some questions/facts, Panic Mode:
- Right now the gateways are logging locally. What happens if they run out of log space? (Do the gateways crash, or do they just overwrite existing logs?)
- I have backups created from the Web UI -> System Backups (there are NO snapshots in either the MGTSERVER or VMware)
- I have the pre-R80 upgrade migrate export
- I have gateways which have the latest policy installed.....
Worst Case if I'm SOL:
Is there any way to extract either the objects / rulebase from any of the above resources and import into a blank or clean install of the Management Server?
Thanks,
Syed