Jerry
Mentor

R81 SmartConsole Access issue after install of recent TAKE 81

 

[Screenshot: Screenshot 2023-02-10 at 10.59.38.png]

SmartConsole build 564 (the new one) was tried, and the old build 563 was attempted as well: same behaviour, same error.

My customer has tried everything, from restarting CPD to restarting FWM and finally rebooting the whole device.

No luck. The firewall is fully running, fwd writes logs, cpd works and security is in place; however, there is no way to open SmartConsole because of the error above, and no apparent way to bring FWM back to life. Any clues from R&D or from you folks?

 

Details of the environment are below (as usual):

 

[Expert@cp:0]# cpstat os

Product Name: SVN Foundation
SVN Foundation Version String: R81
SVN Foundation Build Number: 995000045
SVN Foundation Status: OK
OS Name: Gaia
OS Major Version: 3
OS Minor Version: 10
OS Build Number: -
OS SP Major: -
OS SP Minor: -
OS Version Level:

[Expert@cp:0]# cpinfo -y all

This is Check Point CPinfo Build 914000231 for GAIA
[MGMT]
HOTFIX_R81_JUMBO_HF_MAIN Take: 81
[IDA]
No hotfixes..
[CPFC]
HOTFIX_TEX_ENGINE_R81_AUTOUPDATE
[FW1]
HOTFIX_PUBLIC_CLOUD_CA_BUNDLE_AUTOUPDATE
HOTFIX_NGM_DOCTOR_AUTOUPDATE
HOTFIX_WEBCONSOLE_AUTOUPDATE
HOTFIX_R81_JUMBO_HF_MAIN Take: 81
HOTFIX_R81_MTA Take: 4
HOTFIX_GOT_TPCONF_AUTOUPDATE
HOTFIX_GOT_TPCONF_MGMT_AUTOUPDATE
HOTFIX_TEX_ENGINE_R81_AUTOUPDATE

FW1 build number:
This is Check Point Security Management Server R81 - Build 017
This is Check Point's software version R81 - Build 037
kernel: R81 - Build 037
[SecurePlatform]
HOTFIX_R81_JUMBO_HF_MAIN Take: 81
HOTFIX_ENDER_V17_AUTOUPDATE

 

***

Any ideas on how to bring SmartConsole back to life?

Jerry
2 Solutions

Accepted Solutions
Tal_Paz-Fridman
Employee

According to the Appliance Support Life Cycle Timeline, the 13500 appliance does not support R81 and has already reached end of support:

https://www.checkpoint.com/support-services/support-life-cycle-policy/#appliances-support

 

 

60 Replies
_Val_
Admin

What do you see with "cpwd_admin list" and "$FWDIR/scripts/cpm_status.sh"?

 

Jerry
Mentor

[Expert@cp:0]# $FWDIR/scripts/cpm_status.sh
Check Point Security Management Server is during initialization

Jerry
Jerry
Mentor

[Expert@cp:0]# cpwd_admin list
APP PID STAT #START START_TIME MON COMMAND
CPVIEWD 27052 E 1 [11:58:29] 10/2/2023 N cpviewd
CPVIEWS 27069 E 1 [11:58:30] 10/2/2023 N cpview_services
CVIEWAPIS 27074 E 1 [11:58:30] 10/2/2023 N cpview_api_service
SXL_STATD 27077 E 1 [11:58:31] 10/2/2023 N sxl_statd
CPD 27092 E 1 [11:58:32] 10/2/2023 Y cpd
MPDAEMON 27122 E 1 [11:58:33] 10/2/2023 N mpdaemon /opt/CPshrd-R81/log/mpdaemon.elg /opt/CPshrd-R81/conf/mpdaemon.conf
TP_CONF_SERVICE 27159 E 1 [11:58:34] 10/2/2023 N tp_conf_service --conf=tp_conf.json --log=error
CI_CLEANUP 27233 E 1 [11:58:40] 10/2/2023 N avi_del_tmp_files
CIHS 27235 E 1 [11:58:40] 10/2/2023 N ci_http_server -j -f /opt/CPsuite-R81/fw1/conf/cihs.conf
FWD 27245 E 1 [11:58:40] 10/2/2023 N fwd
FWM 27254 E 1 [11:58:40] 10/2/2023 N fwm
STPR 27285 E 1 [11:58:40] 10/2/2023 N status_proxy
SPIKE_DETECTIVE 27295 E 1 [11:58:41] 10/2/2023 N spike_detective
CLOUDGUARD 27324 E 1 [11:58:41] 10/2/2023 N vsec_controller_start
RAD 27938 E 1 [11:59:02] 10/2/2023 N rad
CPM 28002 E 1 [11:59:05] 10/2/2023 N /opt/CPsuite-R81/fw1/scripts/cpm.sh -s
LPD 28443 E 1 [11:59:11] 10/2/2023 N lpd
WSDNSD 30680 E 1 [12:00:52] 10/2/2023 Y wsdnsd
RTMD 8137 E 1 [12:01:35] 10/2/2023 N rtmd
SOLR 0 T 0 [12:06:08] 10/2/2023 N java_solr
RFL 8298 E 1 [12:01:37] 10/2/2023 N LogCore
SMARTVIEW 8312 E 1 [12:01:37] 10/2/2023 N SmartView
INDEXER 8506 E 1 [12:01:38] 10/2/2023 N /opt/CPrt-R81/log_indexer/log_indexer
SMARTLOG_SERVER 8610 E 1 [12:01:41] 10/2/2023 N /opt/CPSmartLog-R81/smartlog_server
REPMAN 9004 E 1 [12:01:52] 10/2/2023 N java_repository_manager
DASERVICE 9029 E 1 [12:01:53] 10/2/2023 N DAService_script
AUTOUPDATER 9118 E 1 [12:01:55] 10/2/2023 N AutoUpdaterService.sh

Jerry
Jerry
Mentor

[Expert@cp:0]# $FWDIR/scripts/ngm_start.sh
/bin/sh: /opt/CPsuite-R81/fw1/log/postgres.elg: Permission denied
pg_ctl: could not start server
Examine the log output.
pg_ctl: PID file "/opt/CPshrd-R81/database/postgresql/data/postmaster.pid" does not exist
Is server running?
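
The "Permission denied" on postgres.elg above is a concrete lead worth looking at. A minimal sketch for inspecting it, assuming only the standard `ls` and `dirname` tools: the hypothetical helper `show_path_perms` (not a Check Point tool) prints the mode and owner of the file and of every parent directory, so it becomes visible at which level the account that launches PostgreSQL is blocked. It only reads metadata and changes nothing.

```shell
#!/bin/bash
# Hypothetical helper, not part of Gaia: walk from a file up to the root
# and print mode/owner for each level, or "missing" if a level does not exist.
show_path_perms() {
    local p="$1"
    while [ -n "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        if [ -e "$p" ]; then
            ls -ld "$p"
        else
            echo "missing $p"
        fi
        p=$(dirname "$p")
    done
}

# Usage in Expert mode (path copied from the error message above):
#   show_path_perms /opt/CPsuite-R81/fw1/log/postgres.elg
```

Comparing the result with the same path on a healthy server of the same version would show whether the jumbo install changed ownership or mode; what the correct values are is not stated in this thread.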

Jerry
_Val_
Admin

Nope, it is not running, you have an issue with the database. Any chance to roll back to the previous backup version?

Jerry
Mentor

If that is the only option ... 😞

clish does not seem to have started either:

CLINFR0559 Server is not ready. Please try after couple of minutes.

Jerry
_Val_
Admin

Sounds like you have HDD issues all over the place. I really hope you have an external backup to restore from.

Jerry
Mentor

@_Val_ why would you say HDD, meaning a RAID issue? No ... all disks are fine! Disk space is fine as well. I do not have an external backup for that customer, but there are backups on the CP itself. I can copy them to a USB drive if that's needed.

Jerry
_Val_
Admin

Let's see.

Your DB is not okay, and you cannot access CLISH. A file system failure is the most probable cause of both. Try exporting the backup to external storage, and then try restoring it in the lab first. Chances are, if the file system is in ruins, your backup is too.

Jerry
Mentor

[Executing:]# hcp -r all
Test name Status
============================================================
ARP Cache Limit...................................[PASSED]
Bond Health.......................................[PASSED]
Connectivity to UC................................[PASSED]
Core Dumps........................................[PASSED]
Disk Space........................................[WARNING]
File Descriptors..................................[PASSED]
Gaia DB...........................................[PASSED]
ICA expiration....................................[PASSED]
Interface Errors..................................[PASSED]
Kernel crash......................................[PASSED]
MTU...............................................[PASSED]
Memory Usage......................................[PASSED]
SSD Health........................................[PASSED]
Soft lockup.......................................[PASSED]
Software Version..................................[PASSED]
Transceivers Support..............................[PASSED]
Zombie processes..................................[PASSED]

Generating Topology...............................[Done]
Generating Story..................................[Done]
Generating Charts.................................[Done]

To view full report on this machine, run "hcp --show-last-full"

Jerry
_Val_
Admin

None of these tests actually checks the MGMT DB. I am surprised to see they show Gaia DB as okay. Why can't you run CLISH, then?

Jerry
Mentor

Spot on. That's what I have been wondering for many hours, Val. I don't know; CLISH just does not seem to load. That's all. And this all started after installing Take 81 on top of Take 77, which had been working for months on this platform.

Jerry
Jerry
Mentor

[Executing:]# pvs
PV VG Fmt Attr PSize PFree
/dev/md2 vg_splat lvm2 a-- 433.44g 175.59g

Jerry
Jerry
Mentor

@_Val_ is there any way to restore a backup in a non-CLISH environment? I can see they have a backup but, without Gaia Portal (which stopped working as well), no snapshots. I'd like to restore the backup but have neither CLISH nor Gaia Portal. Any hints?

Jerry
the_rock
Legend

Hey @Jerry .

Man, I hate to say this, but I have to be factual. Your options here are very limited, but let's be positive and see if we can solve this together. So, here is the thing: @_Val_ is 100% correct. If you navigate to $FWDIR/scripts and run ./cpm_status.sh and it shows anything other than the management server being up and ready, and you then run the api status command and it (most likely) shows failed at the bottom, SmartConsole will NEVER work. Sorry, mate.

Having said this, I don't know why, but I can't see the whole output of the cpwd_admin list command. Does every process show as E and 1?
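
The "E and 1" question can be answered mechanically. A rough sketch, assuming the column order shown in the cpwd_admin list outputs in this thread (APP, PID, STAT, #START): the hypothetical helper `cpwd_not_running` is not a Check Point command, just an awk filter that prints every entry whose STAT is not E, whose PID is 0, or whose start count is not 1.

```shell
#!/bin/bash
# Hypothetical helper: reads `cpwd_admin list` output on stdin.
# Assumed columns (from the listings in this thread): APP PID STAT #START ...
# Skips the header row and blank lines, then prints any entry that is
# not in state E, has PID 0, or has a start count other than 1.
cpwd_not_running() {
    awk '$1 != "APP" && NF >= 4 && ($3 != "E" || $2 == 0 || $4 != 1) {
        print $1, "stat=" $3, "pid=" $2, "starts=" $4
    }'
}

# Usage in Expert mode:
#   cpwd_admin list | cpwd_not_running
```

Fed the first listing Jerry posted, this would print only the SOLR entry (STAT T, PID 0). An empty result says only that the watchdog considers every process started; it does not prove that CPM has finished initializing.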

I really hope you have a working backup that can be restored, and here is why. My colleague and I worked with a large client of ours. Although they have an S1C instance that we manage, they also have a separate Azure CP mgmt server, and one day almost 2 years ago, out of the blue, we noticed we could not log in to SmartConsole. We did a bunch of testing, verified the processes, and had exactly the same symptoms as you. We opened a TAC case; they verified the same things we had and took a few days, but could not come up with a reason or a solution. Thank goodness this was an environment where we did not have to make many changes. Since we did not feel like waiting for TAC any more, we took a working backup, restored it, installed the same jumbo, and bam, all was up and running.

Sadly, we were never provided with a reason as to why this happened. Thank God it only happened that one time...anyway, happy to do remote with you and see if I can assist any further.

Here is what those outputs look like on a working mgmt server (my R81.20 lab):

Using username "admin".
Pre-authentication banner message from server:
| This system is for authorized use only.
End of banner message from server
admin@172.16.10.203's password:
Send automatic password
Access denied
admin@172.16.10.203's password:
Last login: Fri Feb 3 12:20:06 2023 from 172.16.10.103
[Expert@QUANTUM-MANAGEMENT:0]#
[Expert@QUANTUM-MANAGEMENT:0]# api sratus
Usage: api (start|stop|restart|status [-s <comment>]|reconf|logging (on|off|warn|info|debug)|debug (on|off)|throttling (on|off)|fingerprint) [-f (json|text)]
[Expert@QUANTUM-MANAGEMENT:0]#
[Expert@QUANTUM-MANAGEMENT:0]# api status

API Settings:
---------------------
Accessibility: Require local
Automatic Start: Enabled

Processes:

Name State PID More Information
-------------------------------------------------
API Started 648
CPM Started 648 Check Point Security Management Server is running and ready
FWM Started 653
APACHE Started 8480

Port Details:
-------------------
JETTY Internal Port: 59192
JETTY Documentation Internal Port: 54921
APACHE Gaia Port: 443

Profile:
-------------------
Machine profile: Large SMC env resources profile without SME
CPM heap size: 1280m

 

--------------------------------------------
Overall API Status: Started
--------------------------------------------

API readiness test SUCCESSFUL. The server is up and ready to receive connections

Notes:
------------
To collect troubleshooting data, please run 'api status -s <comment>'

[Expert@QUANTUM-MANAGEMENT:0]# cpwd_admin list
APP PID STAT #START START_TIME MON COMMAND
CPVIEWD 9318 E 1 [11:50:11] 3/2/2023 N cpviewd
CPVIEWS 9323 E 1 [11:50:12] 3/2/2023 N cpview_services
CPD 9345 E 1 [11:50:14] 3/2/2023 Y cpd
FWD 9702 E 1 [11:51:01] 3/2/2023 N fwd -n
FWM 653 E 1 [10:05:48] 5/2/2023 N fwm
FWMHA 681 E 1 [10:05:48] 5/2/2023 N fwmha -H
STPR 10215 E 1 [11:52:08] 3/2/2023 N status_proxy
CLOUDGUARD 10967 E 1 [11:52:34] 3/2/2023 N vsec_controller_start
CPM 648 E 1 [10:05:48] 5/2/2023 N /opt/CPsuite-R81.20/fw1/scripts/cpm.sh -s
SOLR 11504 E 1 [11:53:24] 3/2/2023 N java_solr
RFL 11881 E 1 [11:53:44] 3/2/2023 N LogCore
SMARTVIEW 12109 E 1 [11:54:03] 3/2/2023 N SmartView
INDEXER 12584 E 1 [11:54:36] 3/2/2023 N /opt/CPrt-R81.20/log_indexer/log_indexer -workingDir /opt/CPrt-R81.20/log_indexer/
SMARTLOG_SERVER 12754 E 1 [11:54:45] 3/2/2023 N /opt/CPSmartLog-R81.20/smartlog_server
REPMAN 12889 E 1 [11:54:50] 3/2/2023 N java_repository_manager
DASERVICE 13495 E 1 [11:55:24] 3/2/2023 N DAService_script
AUTOUPDATER 13616 E 1 [11:55:32] 3/2/2023 N AutoUpdaterService.sh
LPD 14904 E 1 [11:55:58] 3/2/2023 N lpd
CPSM 2704 E 1 [10:08:39] 5/2/2023 N cpstat_monitor
[Expert@QUANTUM-MANAGEMENT:0]# $FWDIR/scripts/./cpm_status.sh
Check Point Security Management Server is running and ready
[Expert@QUANTUM-MANAGEMENT:0]#

Jerry
Mentor

Hi mate, here is the latest I could get from that R81:

[Expert@cp:0]# ./cpm_status.sh
Check Point Security Management Server is during initialization

[Expert@cp:0]# ./cpm_stop.sh
cpm wasnt stopped gracefully after 30 seconds. Force-stopping it (pid=14194)

[Expert@cp:0]# ./cpm_status.sh
Check Point Security Management Server is not running

[Expert@cp:0]# $FWDIR/scripts/ngm_start.sh
/bin/sh: /opt/CPsuite-R81/fw1/log/postgres.elg: Permission denied
pg_ctl: could not start server
Examine the log output.
pg_ctl: PID file "/opt/CPshrd-R81/database/postgresql/data/postmaster.pid" does not exist
Is server running?

 

This is where it gets complicated, mate, and it is the reason CPM does not really work. Neither Gaia Portal nor CLISH works, and I have backups in hand but cannot restore them without CLISH working. AFAIK backups are restored from CLISH, not from BASH, hence I'm struggling, and I'm close to deciding that a rebuild from scratch needs to happen at some point. And this is all after installing Take 81 on that R81 Take 77 build. What hell it is now ... If anyone has an idea of how to fix this, please let me know.

 

The complete cpwd_admin list output is here if you wish. It is complete and not truncated at all.

[Expert@cp:0]# cpwd_admin list
APP PID STAT #START START_TIME MON COMMAND
CPVIEWD 27015 E 1 [12:38:32] 10/2/2023 N cpviewd
CPVIEWS 27026 E 1 [12:38:32] 10/2/2023 N cpview_services
CVIEWAPIS 27031 E 1 [12:38:33] 10/2/2023 N cpview_api_service
SXL_STATD 27034 E 1 [12:38:33] 10/2/2023 N sxl_statd
CPD 27047 E 1 [12:38:34] 10/2/2023 Y cpd
MPDAEMON 27089 E 1 [12:38:35] 10/2/2023 N mpdaemon /opt/CPshrd-R81/log/mpdaemon.elg /opt/CPshrd-R81/conf/mpdaemon.conf
TP_CONF_SERVICE 27113 E 1 [12:38:37] 10/2/2023 N tp_conf_service --conf=tp_conf.json --log=error
CI_CLEANUP 27189 E 1 [12:38:43] 10/2/2023 N avi_del_tmp_files
CIHS 27191 E 1 [12:38:43] 10/2/2023 N ci_http_server -j -f /opt/CPsuite-R81/fw1/conf/cihs.conf
FWD 27201 E 1 [12:38:43] 10/2/2023 N fwd
FWM 16804 E 1 [12:52:47] 10/2/2023 N fwm
STPR 27266 E 1 [12:38:44] 10/2/2023 N status_proxy
SPIKE_DETECTIVE 27292 E 1 [12:38:46] 10/2/2023 N spike_detective
CLOUDGUARD 27326 E 1 [12:38:47] 10/2/2023 N vsec_controller_start
CPM 31581 E 2 [14:39:55] 10/2/2023 N /opt/CPsuite-R81/fw1/scripts/cpm.sh -s
RAD 27875 E 1 [12:38:59] 10/2/2023 N rad
LPD 28224 E 1 [12:39:05] 10/2/2023 N lpd
WSDNSD 17676 E 3 [14:00:02] 10/2/2023 Y wsdnsd
RTMD 8025 E 1 [12:41:29] 10/2/2023 N rtmd
SOLR 0 T 0 [12:46:08] 10/2/2023 N java_solr
RFL 8170 E 1 [12:41:30] 10/2/2023 N LogCore
SMARTVIEW 8184 E 1 [12:41:30] 10/2/2023 N SmartView
INDEXER 8382 E 1 [12:41:31] 10/2/2023 N /opt/CPrt-R81/log_indexer/log_indexer
SMARTLOG_SERVER 8478 E 1 [12:41:34] 10/2/2023 N /opt/CPSmartLog-R81/smartlog_server
REPMAN 8767 E 1 [12:41:49] 10/2/2023 N java_repository_manager
AUTOUPDATER 8843 E 1 [12:41:51] 10/2/2023 N AutoUpdaterService.sh

 

Jerry
the_rock
Legend

Let's do a remote session when you are free. I'm in EST; just message me privately, mate.

Jerry
Mentor

OK, will do, mate. I can do it later today; with you on EST and me on GMT, we can make that work!

Thanks.

 

Jerry

Jerry
Jerry
Mentor

[Expert@cp:0]# api status

API Settings:
---------------------
Accessibility: Require local
Automatic Start: Unknown

Processes:

Name State PID More Information
-------------------------------------------------
API Stopped 31581
CPM Starting 31581 Check Point Security Management Server is during initialization
FWM Started 16804
APACHE Stopped 0

Port Details:
-------------------
JETTY Internal Port: 0
JETTY Documentation Internal Port: 0
APACHE Gaia Port: 4434 (a non-default port)
When running mgmt_cli commands add '--port 4434'
When using web-services, add port 4434 to the URL

Profile:
-------------------
Machine profile: 24800-35800 without SME
CPM heap size: 3072m

Apache port retrieved from: httpd-ssl.conf


--------------------------------------------
Overall API Status: The API Server Is Not Running!
--------------------------------------------

API readiness test FAILED. The server is down and unable to receive connections!
()
Notes:
------------
To collect troubleshooting data, please run 'api status -s <comment>'
()

Jerry
the_rock
Legend

Just a quick update: Jerry and I had a remote session, so let's see if we can put our brains "together" to find a solution : - ). Here is the main issue: no matter what we do, we can NOT get into clish or even run clish -c commands, as it always gives the error that the server is not ready. WinSCP access works and the backups are there, but there is no web UI access, since Apache is down and CPM is not running and can't even be restarted. IF we can get web UI access or clish working somehow, this can be solved easily with 1 of the 2 backups present.

@Jerry will get into maintenance mode to see if there are any viable options there without having to wipe out the whole thing. We will reconvene a bit later to see what happens.

the_rock
Legend

Just to give another update: @Jerry and I had a quick follow-up afterwards, and he mentioned he would probably do a rebuild this weekend or next week. It's definitely disappointing that installing jumbo 81 on top of R81 would cause this, because even the install log file shows database errors, and they match exactly with the installation of the jumbo hotfix.

Andy

genisis__
Leader

Could this have been due to a corrupted Jumbo? I really think TAC needs to be all over this to determine the root cause.

the_rock
Legend

Based on the messages we saw, I'm fairly certain that is true.

Sorin_Gogean
Advisor

Hello guys,

 

I have a maintenance window this weekend, and I'm first applying JHF 81 to the log and management servers.
After I've done this, I'll come back and let you know how things are.

 

I took a config backup and a VM snapshot, as my log and management servers are VMs, so we will see if I encounter Jerry's issue.

 

Thank you,

PS: As a quick check, if $FWDIR/scripts/cpm_status.sh and "cpwd_admin list" show OK, should I consider myself fine?

Sorin_Gogean
Advisor

Log server just rebooted and seems OK for me:
Capture.JPG

the_rock
Legend

If the status shows OK, then it's safe to say it's good.
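
That check can be scripted for a maintenance window. A minimal sketch, assuming the healthy status text is exactly the "running and ready" line shown in the lab output earlier in this thread; `cpm_is_ready` is a hypothetical helper name, not a Check Point command.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) only if the text on stdin contains
# the "running and ready" line that cpm_status.sh printed on the healthy
# server in this thread. Any other status text counts as "not ready".
cpm_is_ready() {
    grep -q 'Security Management Server is running and ready'
}

# Usage in Expert mode after the jumbo install and reboot:
#   if "$FWDIR/scripts/cpm_status.sh" | cpm_is_ready; then
#       echo "CPM ready"
#   else
#       echo "CPM NOT ready"
#   fi
```

On Jerry's server this would report not ready, since cpm_status.sh there printed "is during initialization" and later "is not running".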

genisis__
Leader

Is this on a VM or a Check Point appliance? I'm wondering if a snapshot can be taken and provided to TAC.
Clearly the priority would be to get back a working system, in which case rebuild and restore sounds like the only option unless it was on a VM; perhaps a VM snapshot was taken?

Sorin_Gogean
Advisor

So, I'm on R81, therefore my JHF is Take 81. I have no 87; don't confuse packages and versions.

Please read the above posts properly :). Jerry's is an appliance, as he states the RAID shows OK. Can't say for sure.....

For me till now the LOG server is OK, pending Management in 10 min or so 🙂 

the_rock
Legend

Correct. It's an appliance, but configured as standalone, which really should not matter.
