Syed_Ahmed
Participant

R80.10 SmartConsole: Unable to Connect to Server

Hello,

Any help and feedback is much appreciated. I will start by saying that I do have a Check Point TAC case open for this, but I'm exploring every avenue to get this resolved as quickly as possible.

Environment:

The management server is a VMware virtual machine.

It manages about 11 Security Gateways, along with policies for FW, URL, APP, HTTPS INSPECTION, and THREAT PREVENTION.

The upgrade from R77.30 to R80.10 was performed in July 2017.

   - An export was taken and imported into R80.10, and all of the pre-upgrade verification tools were run.

   - The export was imported into a new VM with R80.10 Build 421 Jumbo Hotfix Take 15.

   - SmartConsole indicated validation errors for some objects, but these did not hinder operations: policy installations and object creation worked without issue.

Issues begin:

Fast forward to the first week of October 2017. The ESX host that the VM runs on had maintenance performed during a change window. Normally services on this VM are shut down gracefully beforehand, but not in this instance.

The Day after:

SmartConsole Login: Unable to connect to server

Debugs/Troubleshooting:

[Expert@MGTSERVER:0]# fw debug fwm on
 Cannot signal process fwm (9388), make sure the process is running.: No such process

Check status of CPM process:
[Expert@MGTSERVER:0]# $MDS_FWDIR/scripts/cpm_status.sh
Check Point Security Management Server is during initialization

Check to see if server is up and ready to receive connections:
[Expert@MGTSERVER:0]# $MDS_FWDIR/scripts/server_status.sh
Checking server status. Please wait...
13:05:52,326 INFO com.checkpoint.management.cpm.Cpm.enableLocalSic:15 [main] - Enabling local sic. Setting cp.ssl_local.certificate.check=local
Failed to check status, cpm server is probably down 

[Expert@MGTSERVER:0]# cd $MDS_FWDIR/scripts
[Expert@MGTSERVER:0]# ./cpm_debug.sh -t Login webservices crud Solr -s DEBUG
16:07:52,259 INFO com.checkpoint.management.cpm.Cpm.enableLocalSic:15 [main] - Enabling local sic. Setting cp.ssl_local.certificate.check=local
[Expert@MGTSERVER:0]# tail $MDS_FWDIR/conf/tdlog.cpm

#TOPIC-DEBUG:login:SEVERITY:DEBUG
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.LoginSvcImpl=DEBUG
#TOPIC-DEBUG:webservices:SEVERITY:DEBUG
log4j.logger.com.checkpoint.management.web_services.internal=DEBUG
#TOPIC-DEBUG:crud:SEVERITY:DEBUG
log4j.logger.com.checkpoint.management.web_services.dleserver.internal.ObjectCrudRemoteSvcImpl=DEBUG
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.ObjectCrudSvcImpl=DEBUG
#TOPIC-DEBUG:solr:SEVERITY:DEBUG
log4j.logger.com.checkpoint.management.object_store.fts=DEBUG
[Expert@MGTSERVER:0]# ./cpm_debug.sh -t Login webservices crud Solr -s INFO
16:09:39,426 INFO com.checkpoint.management.cpm.Cpm.enableLocalSic:15 [main] - Enabling local sic. Setting cp.ssl_local.certificate.check=local
[Expert@MGTSERVER:0]# tail $MDS_FWDIR/conf/tdlog.cpm

#TOPIC-DEBUG:login:SEVERITY:INFO
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.LoginSvcImpl=INFO
#TOPIC-DEBUG:webservices:SEVERITY:INFO
log4j.logger.com.checkpoint.management.web_services.internal=INFO
#TOPIC-DEBUG:crud:SEVERITY:INFO
log4j.logger.com.checkpoint.management.web_services.dleserver.internal.ObjectCrudRemoteSvcImpl=INFO
log4j.logger.com.checkpoint.management.dleserver.coresvc.internal.ObjectCrudSvcImpl=INFO
#TOPIC-DEBUG:solr:SEVERITY:INFO
log4j.logger.com.checkpoint.management.object_store.fts=INFO
[Expert@MGTSERVER:0]# fw debug fwm on TDERROR_ALL_ALL=5
Cannot signal process fwm (3980), make sure the process is running.: No such process
[Expert@MGTSERVER:0]# fw debug fwm on TDERROR_DBG_OPT=time,host,prog,topic,pid,tid
Cannot signal process fwm (3980), make sure the process is running.: No such process
[Expert@MGTSERVER:0]# cpinfo

I'm Rebuilding, disaster recovery mindset:

- Shut down the corrupted system.

- Deployed a new VM with R80.10 Build 421 Take 15.

- Completed the Gaia initial configuration wizard with the original IP and hostname.

- Copied several backups from before the host maintenance.

- Checked the processes that failed on the original VM; on the clean build without imports, the processes are OK.

- Logged in with SmartConsole.

   - It loads with no issue (albeit a blank console with nothing but defaults).

- Logged in to the Gaia Web UI and imported the backups to the repository.

- Performed a restore of the most current backup.

- Restore succeeded, system rebooted.

- BACK TO SQUARE 1 (the original issue is now replicated on the new VM; in case it's not obvious, this was not the goal!)

Note at this point: an unresolved case from the past seems to be related to what we are experiencing now. We had a case open with CP regarding the validation errors from when we first migrated to R80.10. They asked for a migrate export so they could determine the cause. They (I'm not sure which tier of engineering support was working on this) informed us that they were unable to recreate the errors in their environment with the exports provided. It lights the bulb and rings the bell now, but at the time operations were not affected. It was three little warnings saying that a certain object (which was no longer used in the policy) was causing a validation error.

I have some questions/facts, Panic Mode:

- Right now the gateways are logging locally. What happens if they run out of log space? (Do the gateways crash, or are existing logs simply overwritten?)

- I have backups created from the WebUI -> System Backups (there are NO snapshots on either the MGTSERVER or in VMware).

- I have the pre-R80 upgrade migrate export.

- I have gateways which have the latest policy installed.

Worst Case if I'm SOL: 

Is there any way to extract either the objects / rulebase from any of the above resources and import into a blank or clean install of the Management Server?

 

Thanks,

Syed

21 Replies
Syed_Ahmed
Participant

FYI, something noteworthy: /var/log is mapped to a LUN on the SAN. It's never been a problem, but /var/log is a 3TB volume that all of our gateways centrally log to. It was attached to the new VM after the restore process was completed. I just wanted to point out that it is one step in the sequence of reconfiguration when the new VM was brought online.

Timothy_Hall
Champion

Please read this entire post and its warnings before taking any action.

Hmm I've seen something like this before, although not specifically in R80.10.  The fwm and cpm processes not getting off the ground is a major clue.  In R77.30 and earlier, on several occasions over the years I observed fwm bombing out on initialization because it could not find a valid certificate for the SMS and therefore itself.  Because it did not have a valid certificate to present, it would never allow GUI connections and would never start.  The debugs you provided seem to point towards a SMS SIC issue which tracks with what I've seen before.

The solution was to either generate a new SMS certificate or to simply reinitialize the entire ICA completely from cpconfig which would generate a new SMS certificate.  HOWEVER while it would allow access back into the SmartDashboard, it would immediately break SIC with all managed gateways, and could also cause a delayed or immediate failure of any Intranet site to site VPNs using certificates (due to CRL retrieval problems) if configured between the managed gateways.   SIC can be reset without causing an outage on the gateways with the steps in sk86521, and you can check the current state of the SMS certificate with the steps in sk62873.  Did the clock get screwed up on the SMS?  Any chance it thinks the SMS certificate is expired?

I'd strongly suggest proceeding under the guidance of Check Point support with these topics in mind before doing anything. Messing around with the ICA and certificates (which is where I think your problem lies) can rapidly put you in a production-impacting catch-22 that will be very difficult to escape from.  Don't ask me how I know all this...

--
My book "Max Power: Check Point Firewall Performance Optimization"
now available via http://maxpowerfirewalls.com.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Syed_Ahmed
Participant

Thank you for the insight Tim. I have never encountered anything of this nature. I will get with support regarding this information.

After contacting support I could:

- Bring up the VM in an isolated environment (which cannot communicate with any gateways in the production environment) along with a Win7 VM with SmartConsole installed.

- Generate a new SMS certificate, or simply reinitialize the entire ICA from cpconfig, as suggested, within this isolated environment.

This would allow me to at least verify that the management server and its services are online after the SMS certificate is generated, and validate that the objects/policy are intact.

Your thoughts regarding recreating in an isolated environment ?

Syed

Syed_Ahmed
Participant

TAC has been involved; it's been a long day!

Notes from the case thus far:

The test VM loaded with the system backup failed due to two cp_mgmt certificates, which is an ongoing issue and has been addressed by R&D on TASK 73819.
We took a VM snapshot and tried to reset SIC on the test VM based on sk114236.

However, this step did not improve the situation. It no longer complains about the SIC name, but fails with another database exception: "No ObjectStoreSession found" @Transactional

At this moment, the CPM is still down. 

Syed 8:45 P.M. 10/8/2017

Syed_Ahmed
Participant


Still waiting on R&D to get back, but internal efforts have led to some progress... (2 steps forward 1 step back)

Quoting our update to Checkpoint support from a Valued Partner that has been working with us while we wait for Checkpoint Support to get back:

"This evening we performed the following restoration activities and report several observations here.


An isolated vmware test environment was established. A new R80.10 management server was installed and the R77.30 upgrade_export files from July 2017 were imported. This restored the environment to a known good state. Access to the management server was confirmed via ssh, webui and Smart Dashboard. We then removed two NAT rules associated with Dummy Management Server (a secondary management object with an external IP) and deleted that object.


The R80.10 management environment appeared to function fine in the isolated environment. We then moved the management server from test to prod. Once we did that access from the Smart Dashboard failed with a certificate revoked error. The error was observed on the Smart Dashboard login screen. cpca_client lscert -stat shows no valid management certificates.


Moving the R80.10 management server back to the test restored access from Smart Dashboard. The management server is now in test and functional for further troubleshooting (isolated from all gateways)."
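For readers who want to check for the duplicate or missing management certificate discussed in this thread, here is a minimal sketch. It counts certificates with a given common name that are marked Valid in `cpca_client lscert` style output. The function name `count_valid_certs` is made up for illustration, and the layout it expects (a `Subject = CN=...` line followed by a `Status = ...` line) is an assumption about the command's output, so compare it with what your own server prints before relying on it.

```shell
# count_valid_certs CN
# Reads certificate listing text on stdin and prints how many entries
# for the given common name have Status = Valid.
# ASSUMPTION: each certificate is printed as a "Subject = CN=<name>,..."
# line followed by a "Status = <state> ..." line.
count_valid_certs() {
    awk -v cn="$1" '
        /^Subject = / { want = (index($0, "CN=" cn ",") > 0) }
        /^Status = /  { if (want && $3 == "Valid") n++; want = 0 }
        END           { print n + 0 }
    '
}

# Possible use on the management server (expert mode):
#   cpca_client lscert | count_valid_certs cp_mgmt
```

In the situation described here, a healthy server would be expected to report exactly one valid cp_mgmt certificate; zero or more than one would be worth raising with support.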

Syed 1:15 A.M. 10/9/2017

Timothy_Hall
Champion

Thanks for the update.  Sounds like there are issues with the ICA and/or SMS certificate as I suspected.  It tends to be rather difficult to simply make a few edits to fix certificate issues like this, as certificates by their very nature are highly resistant to any kind of "tampering".  There is probably no way to avoid either regenerating a new certificate for the SMS itself, or even brutally resetting the ICA if you want to keep your current config and not restore a backup.  Either action will break SIC with all managed gateways.

Hopefully I'm wrong, please let us know how it goes.

Syed_Ahmed
Participant

I will reply with an update, we did reach a resolution.

Sent from VMware Boxer

Syed_Ahmed
Participant

Thanks for the direction, Tim. I pointed this out early in the call with support and referred them to this posting. I ended up spinning up a new VM in an ISOLATED environment. I also had another VM created with SmartConsole, PuTTY, WinSCP and the tools that I deemed necessary, such as ISOs for the clean install, the Jumbo Hotfix that we were on, and our latest backup files. This made it easy for me to transfer files back and forth within the isolated environment.

Long story short, it was SIC related. I will provide a more detailed briefing to follow. The bottom line is, we carried over a dummy object that was used for one of our externally managed gateways. This dummy object was created and an entry was created via GUIdbEdit to match this dummy object SIC name to the actual MGT Server SIC Name. The R&D team took the backups and exports we had and were able to replicate the issue in their environment. They ended up performing a series of cleanups by modifying the postgres DB to remove these entries for a duplicate SIC name for "cp_mgmt". Once this was cleaned up and services were verified to be running:

netstat -anp | grep fwm (should show established/listening, and not closed)

$FWDIR/scripts/cpm_status.sh (should show up and ready, previously stuck at initializing)

$FWDIR/scripts/server_status.sh (should also show server is ready for connections)

$FWDIR/log/cpm.elg was tailed during initialization to review for errors which the team then used to determine the underlying issue. I will post more detailed information soon.
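The status checks above can be turned into something scriptable. The sketch below maps the status sentences quoted in this thread to a short state name. The function name `classify_cpm_status` and the state labels are illustrative only, and it assumes cpm_status.sh prints one of the sentences seen in this thread.

```shell
# classify_cpm_status "STATUS LINE"
# Maps a status sentence to ready / initializing / down / unknown.
# ASSUMPTION: the sentences matched here are the ones quoted in this
# thread; other versions may word them differently.
classify_cpm_status() {
    case "$1" in
        *"running and ready"*)     echo "ready" ;;
        *"during initialization"*) echo "initializing" ;;
        *"not running"*)           echo "down" ;;
        *)                         echo "unknown" ;;
    esac
}

# Possible use on the management server (expert mode):
#   classify_cpm_status "$("$FWDIR"/scripts/cpm_status.sh 2>&1 | tail -n 1)"
```

A server that stays in "initializing" for a long time, as in the original post, is the cue to start tailing $FWDIR/log/cpm.elg.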

Syed 10/10/17 2:30 P.M.

Timothy_Hall
Champion

Great, thanks for the update.  90% of effective troubleshooting is knowing the right place to look, so I'm glad I was able to assist with that part at least.  🙂

Vladimir
Champion

Syed, this post has just saved my bacon or, at the very least, a bunch of time :)

I ran the Take_56 update on my lab SMS and lost communication with it, in spite of the process reportedly completing successfully according to CPUSE.

Interestingly enough, $FWDIR/scripts/server_status.sh seemed to prod the server into responding with no further troubleshooting required. It took a while to return:

[Expert@SMS8010:0]# $FWDIR/scripts/server_status.sh
Checking server status. Please wait...
14:58:10,313 INFO com.checkpoint.management.cpm.Cpm.enableLocalSic:15 [main] - Enabling local sic. Setting cp.ssl_local.certificate.check=local
Server is up and ready to receive connections
[Expert@SMS8010:0]#

But after that, no issues connecting from SmartConsole.

Thank you!
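Since the server can take a while to come up after an update, one option is to poll rather than check once. The following is only a sketch: `wait_until_ready` is a hypothetical helper that reruns whatever command it is given until the output contains the "up and ready to receive connections" message shown above, or until it runs out of attempts.

```shell
# wait_until_ready MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Reruns COMMAND until its output contains the readiness message.
# Returns 0 once ready, 1 if MAX_TRIES attempts all fail.
wait_until_ready() {
    local max_tries="$1" delay="$2"
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@" 2>&1 | grep -q "up and ready to receive connections"; then
            echo "ready after $attempt attempt(s)"
            return 0
        fi
        # Only pause if another attempt is coming.
        [ "$attempt" -lt "$max_tries" ] && sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "not ready after $max_tries attempt(s)"
    return 1
}

# Possible use on the management server (expert mode):
#   wait_until_ready 20 30 "$FWDIR"/scripts/server_status.sh
```

The attempt count and delay are arbitrary; pick values that match how long CPM normally takes to start on your hardware.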

Syed_Ahmed
Participant

I'm glad the efforts that we took were able to assist you in your troubleshooting. Thanks for the feedback!

Marvin_Castillo
Participant

I am also having the same issue in my test environment, and I have run out of SKs to try to make this work.

Masood_ahmad
Participant

Hello

Can someone please share any further feedback regarding this issue? I have a lab setup with an SMS and a two-gateway cluster. I was prompted to install updates (R80.10 Jumbo Hotfix Accumulator General Availability Take 56). I was tempted to update because my SSH connection to the gateway was allowed and showed as accepted in the logs, yet I was not able to establish an SSH or Web UI session.

After reading some of the Check Point articles, I downloaded and installed the update on the SMS and on both cluster members. After the update, I am getting the message "Unable to connect to server" in SmartConsole.

Output of $MDS_FWDIR/scripts/server_status.sh shows "failed to check status, cpm server is probably down"

Output of $MDS_FWDIR/scripts/cpm_status.sh shows "Check Point Security Management Server is not running".

From the previous comments, I understand this is a SIC-related issue. So I tried to run cpconfig on the SMS with option 3 for GUI Clients, but I am getting the message "failed to connect to database".

I tried cpconfig on a gateway cluster member with option 5 (re-initialize communication), along with cpstop and cpstart, but I still cannot connect with SmartConsole.

I also checked the article "Unable to Connect to Server" and noticed the comment "Allow traffic on this port 19009 - for all clients and management servers". However, I am not sure how to check whether 19009 is already allowed, as I have lost my SmartConsole connection.

I cannot check GUI Clients from the Web UI, as I get the message "Management server cannot be reached. Check you management configuration and try again."
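On the question of checking port 19009 without SmartConsole: one thing that can be checked from the CLI is whether anything on the SMS is listening on that port at all. The sketch below reads Linux `netstat -an` style output and reports whether a TCP socket is in the LISTEN state on a given port. The helper name `port_is_listening` is hypothetical, and this only covers the local listener, not whether a firewall in the path allows the traffic.

```shell
# port_is_listening PORT
# Reads `netstat -an` style text on stdin; exit status 0 if a TCP
# socket is in LISTEN state on PORT, 1 otherwise.
# In Linux netstat output, column 4 is the local address and
# column 6 is the TCP state.
port_is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Possible use on the management server (expert mode):
#   netstat -an | port_is_listening 19009 && echo "19009 is listening"
```

If nothing is listening, the problem is more likely that CPM is not running than that the port is blocked.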

Masood_ahmad
Participant

Here is some more information in case someone can assist. I have emailed the Support team to see if they can help.

Tried resetting SIC from the SMS, but no luck:

[Expert@A-SMS:0]# fwm sic_reset
***************** Warning: ****************
This operation will reset the Secure Internal Communication (SIC).
The internal Certificate Authority will be destroyed and ALL remote Check Point Components,
including VPN and Endpoint clients, will not be able to communicate.

In case of Endpoint & VPN clients, this action is not REVERSIBLE which means that clients
will lose connection with the Server and the only way to re-establish it can be done by
re-issuing all certificates (for VPN) or by the re-connect tool for Endpoint clients.

Server communication can be re-established if the following operations are implemented:
1. Re-initialize the Internal Certificate Authority (use cpconfig).
2. Restart Check Point Services (cpstart, cpridstart).
3. Reset SIC on each Station that is managed by this Security Management Server.
4. Re-establish Trust with each Station that is managed by
   this Security Management Server.
*******************************************
This operation will stop all Check Point Services (cpstop)
Are you sure you want to reset? (y/n) [n] ? y

*** Checking IKE Certificates ***
 Failed to connect to NGM server

=================================================================

A-SMS> cpconfig
This program will let you re-configure
your Check Point Security Management Server configuration.


Configuration Options:
----------------------
(1)  Licenses and contracts
(2)  Administrator
(3)  GUI Clients
(4)  SNMP Extension
(5)  Random Pool
(6)  Certificate Authority
(7)  Certificate's Fingerprint
(8)  Automatic start of Check Point Products

(9) Exit

Enter your choice (1-9) :3

Configuring GUI Clients...
==========================
GUI Clients are trusted hosts from which
Administrators are allowed to log on to this Security Management Server.
Failed to connect to database

===================================================

A-SMS> show installer packages
**  ************************************************************************* **
**              Connection error. Packages list might be incomplete           **
**  ************************************************************************* **
**  ************************************************************************* **
**                                 Hotfixes                                   **
**  ************************************************************************* **
Display name                                      Status
R80.10 Jumbo Hotfix Accumulator General
    Availability (Take 56)                        Installed
HOTFIX_R80_10                                     Installed (Legacy)
**  ************************************************************************* **
**                                   HFAs                                     **
**  ************************************************************************* **
Display name                                      Status
Check Point CPinfo build 182 for R80 and
    R80.10                                        Installed
R80.10 SmartConsole Build 024                     Installed

----------------------------------

Furthermore, from the SMS, if I try to ping an A gateway cluster member, I get the reply "destination host unreachable". If I try to ping A gateway 02, I get 100% packet loss. However, from A Gateway 01 and A Gateway 02, I can ping the SMS.

From the WebUI, I can bring up the SMS at 10.1.1.101. When I try to bring up A gateway 01 at 10.1.1.1, I get the message "site cannot be reached". If I try A gateway 02 at 10.1.1.2, I see A-GW-01 (though I would normally expect A-GW-02).

I have issued fw unloadlocal to uninstall the policy to see if SIC gets established.

Masood_ahmad
Participant

Can someone please help?

Vladimir
Champion

Masood,

Forget about gateways, your goal is to be able to get the SMS going.

If you are getting this:

$MDS_FWDIR/scripts/cpm_status.sh shows Check Point Security Management Server is not running.

It is indicative of exactly what it says: i.e. management server is not running, so there is nothing the Secure Internal Communication can help you with until it does.

If this is a lab environment and you have not yet invested too much time in configuring it, I'd suggest re-installing management server from scratch using ISO on sufficiently large virtual hard drive, so that there will be ample space to download and install HFAs, as well as allow for logging, backups and, possibly, snapshots.

During installation, you'll be prompted to specify SIC, write it down.

Once it is reinstalled, reset SIC on Gateways only using CLI.

 

Verify that your SmartConsole is the same version as, or compatible with, the HFA level of the SMS.

Connect to it. You should have no issues doing that at this point. You'll be prompted with a new fingerprint; accept it.

Define new gateway objects, specifying Open Server, Gaia and R80.10.

Establish SIC with them in the "Trusted Communication" window.

Click "OK" to close the "Trusted Communication" window, then proceed to configure networking by executing "Get Interfaces with Topology".

Continue with cluster configuration.

Masood_ahmad
Participant

Hi Vladimir ,

Thanks for your reply. Yes, this is a lab environment. I had a bit of a discussion with someone about this issue and was asked to rebuild the SMS. I will see if I can get the copy of my SMS (which I think I made before installing policies etc.). My understanding is that if I use the original copy of the SMS, I should not change the name of my SMS, otherwise I might have an issue with SIC communication. My understanding is that once SIC is established, it communicates with the gateway on the basis of name, not IP address.

Vladimir
Champion

No, the IP is important. Look up the SKs describing the implications of changing it or the hostname on the SMS:

How to renew SIC after changing IP Address of Security Management Server

How to change the IP Address of a Security Management

Also, what are you referring to when you say "copy of my SMS"?

Are you copying an existing VM and trying to make it work in the lab environment? 

Masood_ahmad
Participant

Hi,

I am hoping that the SMS VM I cloned before installing the update (R80.10 Jumbo Hotfix Accumulator General Availability Take 56) will work.

Vladimir
Champion

It should, but consider this:

If you are changing its IP, your license will stop working.

If you were using Centralized Licensing, and all of your gateway licenses were issued to the old IP of your SMS, those too will become unusable.

You cannot re-issue the licenses in the User Center to new IPs and maintain upgraded production environment running if you rely on any of the subscription blades.

So, if your goal is to fully emulate your production environment:

1. You'll have to obtain trial licenses for the lab implementation. You can request those from CP. I believe they will be able to provide you with at least one month or even more.

2. Change IPs as per SKs above.

Or: place your lab environment behind static NAT using any number of solutions. I personally prefer VyOS, but you can use small Linux distros for the same purpose.

Alternatively, try adding second vNIC to your SMS and assigning it IP from different network. Leave the primary vNIC associated with the IP address to which license was issued on an isolated (Dead-End) port group (if using ESXi), or its analogues in Hyper-V. 

femi
Explorer

If the prerequisite API is not running, the CPM service will not start. It will stay in the "initialization" state.

Try the following steps:  (Ver: R80.30)

Step 1

[Expert@cpfw-mds:0]# api status

API Settings:
---------------------
Accessibility: Require ip 127.0.0.1
Automatic Start: Enabled

Processes:

Name      State     PID       More Information
-------------------------------------------------
API       Stopped
CPM       Starting  7434      Check Point Security Management Server is during initialization
FWM       Started   10759
APACHE    Started   4571

Port Details:
-------------------
JETTY Internal Port: 50276
APACHE Gaia Port: 443
Apache port retrieved from: httpd-ssl.conf


--------------------------------------------
Overall API Status: The API Server Is Not Running!
--------------------------------------------

Notes:
------------
To collect troubleshooting data, please run 'api status -s <comment>'

Step 2

[Expert@cpfw-mds:0]# mdsstop

Step 3

[Expert@cpfw-mds:0]# api start
2019-Nov-05 10:09:10 - Starting API...
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
2019-Nov-05 10:11:38 - API started successfully.

Step 4

[Expert@cpfw-mds:0]# mdsstart

Optional

[Expert@cpfw-mds:0]# api status

API Settings:
---------------------
Accessibility: Require ip 127.0.0.1
Automatic Start: Enabled

Processes:

Name      State     PID       More Information
-------------------------------------------------
API       Started   73002
CPM       Started   74142     Check Point Security Management Server is running and ready
APACHE    Started   4571

Port Details:
-------------------
JETTY Internal Port: 50277
APACHE Gaia Port: 443


--------------------------------------------
Overall API Status: Started
--------------------------------------------

API readiness test SUCCESSFUL. The server is up and ready to receive connections

Notes:
------------
To collect troubleshooting data, please run 'api status -s <comment>'
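
For anyone scripting the check in these steps, here is a rough sketch that pulls the per-process state and the overall status out of `api status` output like the listings above. The helper names `api_process_state` and `api_overall_status` are made up for illustration, and the parsing assumes the layout shown in this post (a "Name State PID" table followed by a "Port Details:" section), which may differ between versions.

```shell
# api_process_state NAME
# Reads `api status` text on stdin and prints the State column for
# NAME (for example API, CPM, FWM). Prints nothing if NAME is absent.
# Only rows between the "Name State ..." header and "Port Details:"
# are considered, so lines such as "API Settings:" are ignored.
api_process_state() {
    awk -v name="$1" '
        /^Name[ \t]+State/        { in_table = 1; next }
        /^Port Details/           { in_table = 0 }
        in_table && $1 == name    { print $2; exit }
    '
}

# api_overall_status
# Prints the text that follows "Overall API Status:".
api_overall_status() {
    sed -n 's/^Overall API Status: *//p'
}

# Possible use on the management server (expert mode):
#   api status | api_process_state CPM
#   api status | api_overall_status
```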

