Solved: Re: R81 SmartConsole Access issue after install of...

Jerry · ‎2023-02-10

Build 564 (the new SmartConsole) was tried, and the old Build 563 as well: same behavior, same error.

My customer has tried everything, from restarting CPD to restarting FWM and finally the whole device.

No luck. The firewall is fully running, fwd is writing logs, cpd works and security is in place; however, there is no way to open SmartConsole with the above error, and no way to bring FWM back to life. Any clues from R&D or from you folks?

Details of the environment are below (as usual):

[Expert@cp:0]# cpstat os

Product Name: SVN Foundation
SVN Foundation Version String: R81
SVN Foundation Build Number: 995000045
SVN Foundation Status: OK
OS Name: Gaia
OS Major Version: 3
OS Minor Version: 10
OS Build Number: -
OS SP Major: -
OS SP Minor: -
OS Version Level:

[Expert@cp:0]# cpinfo -y all

This is Check Point CPinfo Build 914000231 for GAIA
[MGMT]
HOTFIX_R81_JUMBO_HF_MAIN Take: 81
[IDA]
No hotfixes..
[CPFC]
HOTFIX_TEX_ENGINE_R81_AUTOUPDATE
[FW1]
HOTFIX_PUBLIC_CLOUD_CA_BUNDLE_AUTOUPDATE
HOTFIX_NGM_DOCTOR_AUTOUPDATE
HOTFIX_WEBCONSOLE_AUTOUPDATE
HOTFIX_R81_JUMBO_HF_MAIN Take: 81
HOTFIX_R81_MTA Take: 4
HOTFIX_GOT_TPCONF_AUTOUPDATE
HOTFIX_GOT_TPCONF_MGMT_AUTOUPDATE
HOTFIX_TEX_ENGINE_R81_AUTOUPDATE

FW1 build number:
This is Check Point Security Management Server R81 - Build 017
This is Check Point's software version R81 - Build 037
kernel: R81 - Build 037
[SecurePlatform]
HOTFIX_R81_JUMBO_HF_MAIN Take: 81
HOTFIX_ENDER_V17_AUTOUPDATE

***

Any ideas on how to bring SmartConsole back to life?

Jerry

Tal_Paz-Fridman · ‎2023-02-12

According to the Appliance Support Life Cycle Timeline, the 13500 does not support R81 and has already reached end of support:

https://www.checkpoint.com/support-services/support-life-cycle-policy/#appliances-support

Chris_Atkinson · ‎2023-02-12

Please refer to:

https://community.checkpoint.com/t5/Partner-News-and-Updates/Remove-R81-support-for-2012-appliances/...

CCSM R77/R80/ELITE

_Val_ · ‎2023-02-10

What do you see with "cpwd_admin list" and "$FWDIR/scripts/cpm_status.sh"?

Jerry · ‎2023-02-10

[Expert@cp:0]# $FWDIR/scripts/cpm_status.sh
Check Point Security Management Server is during initialization

Jerry

Jerry · ‎2023-02-10

[Expert@cp:0]# cpwd_admin list
APP PID STAT #START START_TIME MON COMMAND
CPVIEWD 27052 E 1 [11:58:29] 10/2/2023 N cpviewd
CPVIEWS 27069 E 1 [11:58:30] 10/2/2023 N cpview_services
CVIEWAPIS 27074 E 1 [11:58:30] 10/2/2023 N cpview_api_service
SXL_STATD 27077 E 1 [11:58:31] 10/2/2023 N sxl_statd
CPD 27092 E 1 [11:58:32] 10/2/2023 Y cpd
MPDAEMON 27122 E 1 [11:58:33] 10/2/2023 N mpdaemon /opt/CPshrd-R81/log/mpdaemon.elg /opt/CPshrd-R81/conf/mpdaemon.conf
TP_CONF_SERVICE 27159 E 1 [11:58:34] 10/2/2023 N tp_conf_service --conf=tp_conf.json --log=error
CI_CLEANUP 27233 E 1 [11:58:40] 10/2/2023 N avi_del_tmp_files
CIHS 27235 E 1 [11:58:40] 10/2/2023 N ci_http_server -j -f /opt/CPsuite-R81/fw1/conf/cihs.conf
FWD 27245 E 1 [11:58:40] 10/2/2023 N fwd
FWM 27254 E 1 [11:58:40] 10/2/2023 N fwm
STPR 27285 E 1 [11:58:40] 10/2/2023 N status_proxy
SPIKE_DETECTIVE 27295 E 1 [11:58:41] 10/2/2023 N spike_detective
CLOUDGUARD 27324 E 1 [11:58:41] 10/2/2023 N vsec_controller_start
RAD 27938 E 1 [11:59:02] 10/2/2023 N rad
CPM 28002 E 1 [11:59:05] 10/2/2023 N /opt/CPsuite-R81/fw1/scripts/cpm.sh -s
LPD 28443 E 1 [11:59:11] 10/2/2023 N lpd
WSDNSD 30680 E 1 [12:00:52] 10/2/2023 Y wsdnsd
RTMD 8137 E 1 [12:01:35] 10/2/2023 N rtmd
SOLR 0 T 0 [12:06:08] 10/2/2023 N java_solr
RFL 8298 E 1 [12:01:37] 10/2/2023 N LogCore
SMARTVIEW 8312 E 1 [12:01:37] 10/2/2023 N SmartView
INDEXER 8506 E 1 [12:01:38] 10/2/2023 N /opt/CPrt-R81/log_indexer/log_indexer
SMARTLOG_SERVER 8610 E 1 [12:01:41] 10/2/2023 N /opt/CPSmartLog-R81/smartlog_server
REPMAN 9004 E 1 [12:01:52] 10/2/2023 N java_repository_manager
DASERVICE 9029 E 1 [12:01:53] 10/2/2023 N DAService_script
AUTOUPDATER 9118 E 1 [12:01:55] 10/2/2023 N AutoUpdaterService.sh

Jerry

Jerry · ‎2023-02-10

[Expert@cp:0]# $FWDIR/scripts/ngm_start.sh
/bin/sh: /opt/CPsuite-R81/fw1/log/postgres.elg: Permission denied
pg_ctl: could not start server
Examine the log output.
pg_ctl: PID file "/opt/CPshrd-R81/database/postgresql/data/postmaster.pid" does not exist
Is server running?

Jerry

_Val_ · ‎2023-02-10

Nope, it is not running; you have an issue with the database. Any chance of rolling back to the previous backup version?
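The "Permission denied" line in the ngm_start.sh output is worth a quick look before rolling anything back. Below is a minimal, read-only sketch, assuming a bash Expert shell with GNU `stat` available; the helper name `describe_path` is made up for illustration, and the paths are the ones from the error output above.

```shell
#!/bin/bash
# describe_path (hypothetical helper): print owner, group and octal mode of a
# path, or "missing: <path>" if it does not exist. Read-only; changes nothing.
describe_path() {
    local p="$1"
    if [ -e "$p" ]; then
        # GNU stat format: %U owner, %G group, %a octal mode, %n file name
        stat -c '%U:%G %a %n' "$p"
    else
        echo "missing: $p"
    fi
}

# Paths copied from the error output in this thread. Guarded so the sketch is
# a no-op on machines that do not have this directory.
if [ -d /opt/CPsuite-R81/fw1/log ]; then
    describe_path /opt/CPsuite-R81/fw1/log
    describe_path /opt/CPsuite-R81/fw1/log/postgres.elg
    describe_path /opt/CPshrd-R81/database/postgresql/data
fi
```

Which owner and mode are correct depends on the installation, so it is safer to compare the result against a healthy R81 management server than to guess and change permissions by hand.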

Jerry · ‎2023-02-10

if that is the only option ... 😞

clish does not seem to have started either:

CLINFR0559 Server is not ready. Please try after couple of minutes.

Jerry

_Val_ · ‎2023-02-10

Sounds like you have HDD issues all over the place. I really hope you have an external backup to restore from.

Jerry · ‎2023-02-10

@_Val_ why would you say HDD, meaning a RAID issue? No, all disks are fine! Disk space is fine too. I do not have an external backup for that customer, but there are backups on the CP itself. I can copy them to USB if needed, sure.

Jerry

_Val_ · ‎2023-02-10

Let's see.

Your DB is not okay, and you cannot access CLISH. A file system failure is the most probable cause of both. Try exporting the backup to external storage, and then try restoring it in the lab first. Chances are, if the file system is in ruins, your backup is too.
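When exporting the backup, it can help to confirm that the copy on external storage is byte-for-byte identical to the original. A minimal sketch, assuming `sha256sum` and `awk` are available in the Expert shell; the helper name `copy_and_verify` and the example paths are placeholders, not real Gaia locations.

```shell
#!/bin/bash
# copy_and_verify (hypothetical helper): copy SRC to the file path DST, then
# compare SHA-256 checksums of both. Returns 0 only if the checksums match.
copy_and_verify() {
    local src="$1" dst="$2" sum_src sum_dst
    cp -- "$src" "$dst" || return 1
    sum_src="$(sha256sum < "$src" | awk '{print $1}')"
    sum_dst="$(sha256sum < "$dst" | awk '{print $1}')"
    if [ -n "$sum_src" ] && [ "$sum_src" = "$sum_dst" ]; then
        echo "OK $sum_dst"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Example call with placeholder paths (adjust to the real backup file and mount):
# copy_and_verify /path/to/backup_file.tgz /mnt/usb/backup_file.tgz
```

A matching checksum only shows that the copy is faithful. It says nothing about whether the backup itself is sound, which is why the test restore in a lab still matters.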

Jerry · ‎2023-02-10

[Executing:]# hcp -r all
Test name Status
============================================================
ARP Cache Limit...................................[PASSED]
Bond Health.......................................[PASSED]
Connectivity to UC................................[PASSED]
Core Dumps........................................[PASSED]
Disk Space........................................[WARNING]
File Descriptors..................................[PASSED]
Gaia DB...........................................[PASSED]
ICA expiration....................................[PASSED]
Interface Errors..................................[PASSED]
Kernel crash......................................[PASSED]
MTU...............................................[PASSED]
Memory Usage......................................[PASSED]
SSD Health........................................[PASSED]
Soft lockup.......................................[PASSED]
Software Version..................................[PASSED]
Transceivers Support..............................[PASSED]
Zombie processes..................................[PASSED]

Generating Topology...............................[Done]
Generating Story..................................[Done]
Generating Charts.................................[Done]

To view full report on this machine, run "hcp --show-last-full"

Jerry

_Val_ · ‎2023-02-10

None of these tests actually checks the MGMT DB. I am surprised to see they show Gaia DB as okay. Why can't you run CLISH then?

Jerry · ‎2023-02-10

Spot on. That's what I have been wondering for many hours, Val. I don't know; CLISH just does not seem to load, that's all. And this all started after installing Take 81 on top of Take 77, which had been working for months on this platform.

Jerry

Jerry · ‎2023-02-10

[Executing:]# pvs
PV VG Fmt Attr PSize PFree
/dev/md2 vg_splat lvm2 a-- 433.44g 175.59g

Jerry

Jerry · ‎2023-02-10

@_Val_ is there any way to restore a backup from a non-CLISH environment? I can see they've got a backup, but no snapshots that I can get to without Gaia Portal (which has stopped working as well). I'd like to restore the backup, but I have neither CLISH nor Gaia Portal. Any hints?

Jerry

the_rock · ‎2023-02-10

Hey @Jerry .

Man, I hate to say this, but I have to be factual. Your options here are very limited, but let's be positive and see if we can solve this together. So, here is the thing. @_Val_ is 100% correct. IF you navigate to $FWDIR/scripts and run ./cpm_status.sh and it shows anything other than the management server being running and ready, then the api status command will most likely show FAILED at the bottom, and SmartConsole will NEVER work. Sorry, mate.

Having said this, I don't know why, but I can't see the whole output of the cpwd_admin list command. Does every process show as E and 1?
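A minimal sketch of that "E and 1" check, assuming the column layout seen in this thread (APP, PID, STAT, #START as the first four fields). The helper name `flag_cpwd_rows` is made up for illustration; it only reads text, so it can be fed pasted output as well as a live command.

```shell
#!/bin/bash
# flag_cpwd_rows (hypothetical helper): read "cpwd_admin list" output on stdin
# and print every process whose STAT is not E or whose #START is not 1.
flag_cpwd_rows() {
    awk '$1 != "APP" && NF >= 4 && ($3 != "E" || $4 != 1) {
        printf "%s stat=%s starts=%s\n", $1, $3, $4
    }'
}

# Sample rows copied from the output posted earlier in this thread.
flag_cpwd_rows <<'EOF'
APP PID STAT #START START_TIME MON COMMAND
CPD 27092 E 1 [11:58:32] 10/2/2023 Y cpd
FWM 27254 E 1 [11:58:40] 10/2/2023 N fwm
SOLR 0 T 0 [12:06:08] 10/2/2023 N java_solr
EOF
```

On a live server the same function could be used as `cpwd_admin list | flag_cpwd_rows`. An empty result only means the watchdog sees every process as started once; as this thread shows, CPM can be listed as E while still stuck in initialization.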

I really hope you have a working backup that can be restored, and here is why. My colleague and I worked with a large client of ours; although they have an S1C instance that we manage, they also have a separate Azure CP mgmt server. One day, out of the blue, almost 2 years ago, we noticed we could not log in to SmartConsole, so we did a bunch of testing, verified processes, and had the exact same symptoms as you. We opened a TAC case; they verified the same things we had, and it took a few days, but they could not come up with a reason or a solution. Thank goodness this was an environment where we did not have to make many changes. Since we did not feel like waiting for TAC any more, we took a working backup, restored it, installed the same jumbo and bam, all was up and running.

Sadly, we were never provided with a reason as to why this happened. Thank God it only happened that one time. Anyway, happy to do a remote session with you and see if I can assist any further.

Here is what those outputs would look like on a working mgmt server (my R81.20 lab):

API Settings:
---------------------
Accessibility: Require local
Automatic Start: Enabled

Processes:

Name State PID More Information
-------------------------------------------------
API Started 648
CPM Started 648 Check Point Security Management Server is running and ready
FWM Started 653
APACHE Started 8480

Port Details:
-------------------
JETTY Internal Port: 59192
JETTY Documentation Internal Port: 54921
APACHE Gaia Port: 443

Profile:
-------------------
Machine profile: Large SMC env resources profile without SME
CPM heap size: 1280m

--------------------------------------------
Overall API Status: Started
--------------------------------------------

API readiness test SUCCESSFUL. The server is up and ready to receive connections

Notes:
------------
To collect troubleshooting data, please run 'api status -s <comment>'

[Expert@QUANTUM-MANAGEMENT:0]# cpwd_admin list
APP PID STAT #START START_TIME MON COMMAND
CPVIEWD 9318 E 1 [11:50:11] 3/2/2023 N cpviewd
CPVIEWS 9323 E 1 [11:50:12] 3/2/2023 N cpview_services
CPD 9345 E 1 [11:50:14] 3/2/2023 Y cpd
FWD 9702 E 1 [11:51:01] 3/2/2023 N fwd -n
FWM 653 E 1 [10:05:48] 5/2/2023 N fwm
FWMHA 681 E 1 [10:05:48] 5/2/2023 N fwmha -H
STPR 10215 E 1 [11:52:08] 3/2/2023 N status_proxy
CLOUDGUARD 10967 E 1 [11:52:34] 3/2/2023 N vsec_controller_start
CPM 648 E 1 [10:05:48] 5/2/2023 N /opt/CPsuite-R81.20/fw1/scripts/cpm.sh -s
SOLR 11504 E 1 [11:53:24] 3/2/2023 N java_solr
RFL 11881 E 1 [11:53:44] 3/2/2023 N LogCore
SMARTVIEW 12109 E 1 [11:54:03] 3/2/2023 N SmartView
INDEXER 12584 E 1 [11:54:36] 3/2/2023 N /opt/CPrt-R81.20/log_indexer/log_indexer -workingDir /opt/CPrt-R81.20/log_indexer/
SMARTLOG_SERVER 12754 E 1 [11:54:45] 3/2/2023 N /opt/CPSmartLog-R81.20/smartlog_server
REPMAN 12889 E 1 [11:54:50] 3/2/2023 N java_repository_manager
DASERVICE 13495 E 1 [11:55:24] 3/2/2023 N DAService_script
AUTOUPDATER 13616 E 1 [11:55:32] 3/2/2023 N AutoUpdaterService.sh
LPD 14904 E 1 [11:55:58] 3/2/2023 N lpd
CPSM 2704 E 1 [10:08:39] 5/2/2023 N cpstat_monitor
[Expert@QUANTUM-MANAGEMENT:0]# $FWDIR/scripts/./cpm_status.sh
Check Point Security Management Server is running and ready
[Expert@QUANTUM-MANAGEMENT:0]#

Best,
Andy
"Have a great day and if its not, change it"

Jerry · ‎2023-02-10

Hi mate, here is the latest I could get from that R81:

[Expert@cp:0]# ./cpm_status.sh
Check Point Security Management Server is during initialization

[Expert@cp:0]# ./cpm_stop.sh
cpm wasnt stopped gracefully after 30 seconds. Force-stopping it (pid=14194)

[Expert@cp:0]# ./cpm_status.sh
Check Point Security Management Server is not running

[Expert@cp:0]# $FWDIR/scripts/ngm_start.sh
/bin/sh: /opt/CPsuite-R81/fw1/log/postgres.elg: Permission denied
pg_ctl: could not start server
Examine the log output.
pg_ctl: PID file "/opt/CPshrd-R81/database/postgresql/data/postmaster.pid" does not exist
Is server running?

--- This is where it gets complicated, mate, and it's the reason CPM does not work. Neither Gaia Portal nor CLISH works, and I have backups in hand but am unable to restore them without CLISH working. AFAIK backups are restored from CLISH, not from Bash, hence I'm struggling, and I'm close to deciding that a rebuild from scratch needs to happen at some point. And this is all after installing Take 81 on that R81 Take 77 build. What a hell it is now... If you have any idea how to fix this, please let me know.

The complete cpwd_admin list output is here if you wish. It is the full output, not truncated at all.

[Expert@cp:0]# cpwd_admin list
APP PID STAT #START START_TIME MON COMMAND
CPVIEWD 27015 E 1 [12:38:32] 10/2/2023 N cpviewd
CPVIEWS 27026 E 1 [12:38:32] 10/2/2023 N cpview_services
CVIEWAPIS 27031 E 1 [12:38:33] 10/2/2023 N cpview_api_service
SXL_STATD 27034 E 1 [12:38:33] 10/2/2023 N sxl_statd
CPD 27047 E 1 [12:38:34] 10/2/2023 Y cpd
MPDAEMON 27089 E 1 [12:38:35] 10/2/2023 N mpdaemon /opt/CPshrd-R81/log/mpdaemon.elg /opt/CPshrd-R81/conf/mpdaemon.conf
TP_CONF_SERVICE 27113 E 1 [12:38:37] 10/2/2023 N tp_conf_service --conf=tp_conf.json --log=error
CI_CLEANUP 27189 E 1 [12:38:43] 10/2/2023 N avi_del_tmp_files
CIHS 27191 E 1 [12:38:43] 10/2/2023 N ci_http_server -j -f /opt/CPsuite-R81/fw1/conf/cihs.conf
FWD 27201 E 1 [12:38:43] 10/2/2023 N fwd
FWM 16804 E 1 [12:52:47] 10/2/2023 N fwm
STPR 27266 E 1 [12:38:44] 10/2/2023 N status_proxy
SPIKE_DETECTIVE 27292 E 1 [12:38:46] 10/2/2023 N spike_detective
CLOUDGUARD 27326 E 1 [12:38:47] 10/2/2023 N vsec_controller_start
CPM 31581 E 2 [14:39:55] 10/2/2023 N /opt/CPsuite-R81/fw1/scripts/cpm.sh -s
RAD 27875 E 1 [12:38:59] 10/2/2023 N rad
LPD 28224 E 1 [12:39:05] 10/2/2023 N lpd
WSDNSD 17676 E 3 [14:00:02] 10/2/2023 Y wsdnsd
RTMD 8025 E 1 [12:41:29] 10/2/2023 N rtmd
SOLR 0 T 0 [12:46:08] 10/2/2023 N java_solr
RFL 8170 E 1 [12:41:30] 10/2/2023 N LogCore
SMARTVIEW 8184 E 1 [12:41:30] 10/2/2023 N SmartView
INDEXER 8382 E 1 [12:41:31] 10/2/2023 N /opt/CPrt-R81/log_indexer/log_indexer
SMARTLOG_SERVER 8478 E 1 [12:41:34] 10/2/2023 N /opt/CPSmartLog-R81/smartlog_server
REPMAN 8767 E 1 [12:41:49] 10/2/2023 N java_repository_manager
AUTOUPDATER 8843 E 1 [12:41:51] 10/2/2023 N AutoUpdaterService.sh

Jerry

the_rock · ‎2023-02-10

Let's do a remote session when you are free. I'm in EST; just message me privately, mate.

Best,
Andy
"Have a great day and if its not, change it"

Jerry · ‎2023-02-10

OK, will do, mate. I can do it later today; with you on EST and me on GMT, we can make that work!

Thanks.

Jerry

Jerry · ‎2023-02-10

[Expert@cp:0]# api status

API Settings:
---------------------
Accessibility: Require local
Automatic Start: Unknown

Processes:

Name State PID More Information
-------------------------------------------------
API Stopped 31581
CPM Starting 31581 Check Point Security Management Server is during initialization
FWM Started 16804
APACHE Stopped 0

Port Details:
-------------------
JETTY Internal Port: 0
JETTY Documentation Internal Port: 0
APACHE Gaia Port: 4434 (a non-default port)
When running mgmt_cli commands add '--port 4434'
When using web-services, add port 4434 to the URL

Profile:
-------------------
Machine profile: 24800-35800 without SME
CPM heap size: 3072m

Apache port retrieved from: httpd-ssl.conf

--------------------------------------------
Overall API Status: The API Server Is Not Running!
--------------------------------------------

API readiness test FAILED. The server is down and unable to receive connections!

Notes:
------------
To collect troubleshooting data, please run 'api status -s <comment>'

Jerry

the_rock · ‎2023-02-10

Just a quick update: Jerry and I had a remote session, so let's see if we can put our brains "together" to find a solution :-). Here is the main issue. No matter what we do, we can NOT get into clish, or even run clish -c commands, as it always gives the error that the server is not ready. WinSCP access works and the backups are there, but there is no web UI access, since Apache is down and CPM is not running and can't even be restarted. IF we can get web UI access or clish working somehow, this can be solved easily with one of the two backups present.

@Jerry will get into maintenance mode to see if there are any viable options there without having to wipe out the whole thing. We will reconvene a bit later to see what happens.

Best,
Andy
"Have a great day and if its not, change it"

the_rock · ‎2023-02-10

Just to give another update: @Jerry and I had a quick follow-up afterwards, and he mentioned he would probably do the rebuild this weekend or next week. It's definitely disappointing that installing jumbo Take 81 on top of R81 would cause this, because even the install log file shows database errors, and they line up exactly with the installation of the jumbo hotfix.

Andy

Best,
Andy
"Have a great day and if its not, change it"

genisis__ · ‎2023-02-11

Could this have been due to a corrupted Jumbo? Really think TAC need to be all over this to determine root cause.

the_rock · ‎2023-02-11

Based on the messages we saw, I'm fairly certain that is true.

Best,
Andy
"Have a great day and if its not, change it"

Sorin_Gogean · ‎2023-02-11

Hello guys,

I have a maintenance window this weekend, and I'm applying JHF 81 to the Log and Management servers first.
So after I've done this, I'll come back and let you know how things are.

I did a config backup and a VM snapshot, as my Log and Management servers are VMs, so we will see if I encounter Jerry's issue.

Thank you,

PS: So as a quick check, if $FWDIR/scripts/cpm_status.sh and "cpwd_admin list" show OK, I should consider myself fine?!

Sorin_Gogean · ‎2023-02-11

Log server just rebooted and seems OK for me:

the_rock · ‎2023-02-11

If the status shows OK, then it's safe to say it's good.
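For a post-install check, the status text can be tested rather than eyeballed. A minimal sketch, assuming the three cpm_status.sh messages quoted in this thread are the ones to expect; the helper name `cpm_ready` is made up for illustration.

```shell
#!/bin/bash
# cpm_ready (hypothetical helper): succeed only if the given status text
# contains the "running and ready" message shown in this thread.
cpm_ready() {
    case "$1" in
        *"is running and ready"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Status strings quoted from the posts in this thread.
for msg in \
    "Check Point Security Management Server is running and ready" \
    "Check Point Security Management Server is during initialization" \
    "Check Point Security Management Server is not running"
do
    if cpm_ready "$msg"; then
        echo "READY: $msg"
    else
        echo "NOT READY: $msg"
    fi
done
```

On a live server this could be called as `cpm_ready "$("$FWDIR/scripts/cpm_status.sh")"`. Initialization can be a normal, short-lived state right after a reboot, so a single "NOT READY" shortly after startup is not conclusive on its own; the problem in this thread was that it never left that state.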

Best,
Andy
"Have a great day and if its not, change it"

genisis__ · ‎2023-02-11

Is this on a VM or a Check Point appliance? I'm wondering if a snapshot can be taken and provided to TAC.
Clearly the priority would be to get back to a working system, in which case rebuild and restore sounds like the only option, unless it was on a VM; perhaps a VM snapshot was taken?

Sorin_Gogean · ‎2023-02-11

So, I'm on R81, therefore my JHF is 81. I have no 87; don't confuse packages and versions.

Please read the above posts properly :). Jerry's is an appliance, as he states the RAID shows OK. Can't say for sure...

So far the Log server is OK for me; Management is pending, in 10 minutes or so 🙂

the_rock · ‎2023-02-11

Correct. It's an appliance, but configured as standalone, which really should not matter.

Best,
Andy
"Have a great day and if its not, change it"

R81 SmartConsole Access issue after install of recent TAKE 81