Adam276
Contributor

'Firewall' is not responding SmartConsole alert after changing back to previous cluster version

I am having a problem moving my old secondary firewall back into place after running into trouble with a hardware/software upgrade of the secondary. I tried moving the secondary member of the cluster to R81.20 Take 90 first and hit a reboot issue when pushing policy (I will look into that later). I am still running fine on the old R80.10 primary, since I was only trying to get the secondary ready to switch over to when the reboot problem appeared. FYI, the old and new hardware are Dell servers/NICs that are on the Check Point open server HCL.

My main problem right now is getting the old secondary (R80.10) functional again. The cpri_d process is failing on boot. Because I had to change SIC in management for the new secondary during the hardware/software upgrade, I reset SIC on the old secondary gateway and changed the version back to R80.10 in management. The SIC test from SmartConsole is successful. I can push policy, and the policy push itself reports no errors. The secondary is still showing red, though.

This is on bootup of the system...

Starting the system...

                            Starting cpri_d:   FAILED

The GUI shows green status for the primary (that system wasn't touched). The secondary shows a red alert in the SmartConsole gateways view. I have tried rebooting, pushing policy again, etc., but nothing changed.

SmartConsole GUI shows this as an alert for this gateway.
'Firewall' is not responding. Verify that 'Firewall' is installed on the gateway.  "If 'Firewall' should not be installed verify that it is not selected in the Products list of the gateway (SmartConsole > Security Gateway > General Properties . Software Blades List).

Trying to start cprid manually gives this error...

[Expert@Ode-Fw21:0]# $CPDIR/bin/cpridstart
DIAGDIR: Undefined variable.
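
For context on the message above: `NAME: Undefined variable.` is the wording csh/tcsh prints when a script expands a variable that is not set, so one thing worth checking is whether DIAGDIR exists in the environment of the shell that launches the start script. A minimal sketch, assuming a bash or other POSIX-like shell; `report_env_var` is a hypothetical helper name, not a Check Point tool, and the sketch says nothing about what value DIAGDIR should hold:

```shell
# Hypothetical helper: report whether a named environment variable is
# exported in the current shell. It only reads the environment.
report_env_var() {
    local name="$1"
    if printenv "$name" >/dev/null; then
        echo "$name is set to: $(printenv "$name")"
    else
        echo "$name is not set"
    fi
}

# Check the variable named in the error, plus CPDIR for comparison.
report_env_var DIAGDIR
report_env_var CPDIR
```

If DIAGDIR is reported as not set on both cluster members, that would suggest the message is long-standing rather than something introduced by the rollback.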

Looking at the /opt/CPsuite-R80/fw1/log/lpd.elg log file I see this...

[lpd 2962 4158621392]@Ode-Fw2[3 Mar 9:24:56] [init][INFO][logger.cpp:62 : initLogger] ####################################################
[lpd 2962 4158621392]@Ode-Fw2[3 Mar 9:24:56] [init][INFO][logger.cpp:63 : initLogger] welcome to lpd log!
[lpd 2962 4158621392]@Ode-Fw2[3 Mar 9:24:56] [main][ERROR][daemon_main.cpp:42 : main] LPDException: Failed to fetch DIAGDIR environment variable
[lpd 2962 4158621392]@Ode-Fw2[3 Mar 9:24:56] [main][INFO][daemon_main.cpp:59 : main] Exit code: 3
[lpd 6621 4157658832]@Ode-Fw2[3 Mar 9:25:55] [init][INFO][logger.cpp:62 : initLogger] ####################################################
[lpd 6621 4157658832]@Ode-Fw2[3 Mar 9:25:55] [init][INFO][logger.cpp:63 : initLogger] welcome to lpd log!
[lpd 6621 4157658832]@Ode-Fw2[3 Mar 9:25:55] [main][ERROR][daemon_main.cpp:42 : main] LPDException: Failed to fetch DIAGDIR environment variable
[lpd 6621 4157658832]@Ode-Fw2[3 Mar 9:25:55] [main][INFO][daemon_main.cpp:59 : main] Exit code: 3
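
Since the log repeats the same entries on every start attempt, it can help to collapse it to the distinct ERROR messages and how often each occurs. A rough sketch, assuming the line layout is exactly what is quoted above (prefix, then `[ERROR]`, then a `[file:line : function]` block, then the message); `summarize_errors` is a hypothetical helper name and the sample file is built only from lines in this post:

```shell
# Hypothetical helper: print "<count>x <message>" for every distinct
# ERROR message in an lpd.elg-style log. Layout inferred from this thread.
summarize_errors() {
    awk '
        index($0, "[ERROR]") {
            msg = $0
            # Drop everything up to and including the [file:line : func] block.
            sub(/^.*\[ERROR\]\[[^]]*\] */, "", msg)
            count[msg]++
        }
        END { for (m in count) print count[m] "x " m }
    ' "$1" | sort
}

# Sample built from the lines quoted above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[lpd 2962 4158621392]@Ode-Fw2[3 Mar 9:24:56] [init][INFO][logger.cpp:63 : initLogger] welcome to lpd log!
[lpd 2962 4158621392]@Ode-Fw2[3 Mar 9:24:56] [main][ERROR][daemon_main.cpp:42 : main] LPDException: Failed to fetch DIAGDIR environment variable
[lpd 6621 4157658832]@Ode-Fw2[3 Mar 9:25:55] [main][ERROR][daemon_main.cpp:42 : main] LPDException: Failed to fetch DIAGDIR environment variable
EOF
summarize_errors "$sample"
# prints: 2x LPDException: Failed to fetch DIAGDIR environment variable
rm -f "$sample"
```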

cphaprob shows the secondary correctly in backup state. The primary is active, of course.

Anyone experience this?

No rules changes or object changes besides resetting SIC for the secondary member and changing version on the cluster object back to the old R80.10 version.


Accepted Solutions
Adam276
Contributor

I set up a bare-bones new management server and gateway cluster in a VM lab (I did not import my management data) with a simple firewall policy (no VPN) and reproduced the exact problem: one of the gateways shows the 'Firewall is not responding' error in the GUI. If you change the R80.10 gateway cluster object version to R81.20, attempt the upgrade of the secondary, then change the version back to R80.10 and put the old secondary gateway back in place, one of the two gateways is then reported with 'Firewall is not responding' in the SmartConsole gateway status (when you click on the gateway in the gateways view). In my lab it was the firewall I didn't touch that showed it, so it would seem the change to R81.20 alters something in the parent cluster object that is not changed back when you set the version back to R80.10, and that causes this.

I did not attempt a snapshot revert of the firewall. The 'Firewall is not responding' alert pushed me to look at management, because all I did on the old gateway was reset SIC and change the version in management. A session history revert of the management changes, back to before the cluster version was changed from R80.10 to R81.20, fixed the issue. No policy push was needed. The management alert for the gateway went green about a minute later. The result was the same in the lab and on the cluster that experienced the issue.

Now I get to move on to the other issue, where the first policy push to the secondary during the hardware/software upgrade caused loss of connectivity to that gateway and multiple reboots of the new R81.20 secondary that I put in place for the R80.10 to R81.20 migration. I didn't get to the primary because of that. More work to do there... I won't post about that in this thread, though. I will create another thread for it if needed. That was R81.20 Take 90, so I might install the latest HFA and try again, or wait for the next official HFA because of reported IKE issues, which would be critical for me.

EDIT: In addition, given the above, the errors that come up on boot and in that log were not related to the SmartConsole 'Firewall is not responding' alert. The firewalls appear to be working just fine back on R80.10 after the session history revert. They have likely had those errors for a while now, and the errors are likely present on the primary firewall as well.

the_rock
Legend

Nothing comes up on support site about it, at least that I can find. Have you asked TAC about it? One suggestion...maybe console into it, run halt command, then unplug power cord, wait 30 seconds, plug it back in and see if that helps.

Andy

Adam276
Contributor

I see the same log stuff in lpd.elg on the other old firewall so I assume that is not related.
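
One way to back up the "same log stuff on both firewalls" observation is to compare the distinct ERROR messages from each gateway's log rather than eyeballing them. A small sketch, assuming both logs have been copied somewhere that has awk, sort and comm, and assuming the line layout quoted earlier in the thread; the function names and file names are hypothetical:

```shell
# Hypothetical helper: print the distinct ERROR messages of one log, sorted,
# with the per-run prefix (PID, host, timestamp, source location) removed.
error_messages() {
    awk 'index($0, "[ERROR]") { sub(/^.*\[ERROR\]\[[^]]*\] */, ""); print }' "$1" | sort -u
}

# Print ERROR messages that occur in the first log but not in the second.
errors_only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    error_messages "$1" > "$a"
    error_messages "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example call (file names are placeholders for copies of each lpd.elg):
#   errors_only_in_first secondary_lpd.elg primary_lpd.elg
```

Empty output would mean every error message on the secondary also appears on the primary, which is consistent with those errors not being specific to the rolled-back member.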

I also don't see a cprid process running on the old primary, so maybe that is not related either. It is possible the primary gives the same error on reboot, in which case the error is unrelated as well. Not sure. I don't want to touch the primary until I have a functional secondary, though.

The main issue seems to be that SmartConsole shows the alert below for the second member.

'Firewall' is not responding. Verify that 'Firewall' is installed on the gateway.  "If 'Firewall' should not be installed verify that it is not selected in the Products list of the gateway (SmartConsole > Security Gateway > General Properties . Software Blades List).

I can't find anything about that yet either. I might revert the old secondary to a snapshot from yesterday that I took as a precaution. The only things done on it, though, were the SIC reset and the management version change back and forth, so a revert seems unlikely to change anything. I would also have to redo SIC to get it going, since SIC was recently reset in management for that cluster member.

The strange thing is that it complains about the firewall blade, yet cphaprob stat shows the member in backup state, which means the process checks that determine HA state are at least passing.

the_rock
Legend

I actually remember that EXACT error, but not for the fw blade, rather for Identity Awareness. In the lab, it disappeared when I rebooted the fw, but I never understood why it came up in the first place and could not really open a TAC case, since it's a lab.

Andy

Adam276
Contributor

I changed the title to reflect what was fixed in this thread.

PhoneBoy
Admin

How did you perform the migration to R81.20 from R80.10?
The supported method requires going through an intermediate version (R80.40).

Adam276
Contributor

It was not a 'migration' per se. Management was already on R81.20; it was upgraded from R80.40 a while ago. For the gateway cluster on R80.10, this was a hardware and version upgrade attempted at the same time: a fresh install of R81.20 on new hardware, set up to match the original OS/Gaia config. I lab-tested the gateway cluster upgrade, going from VMs on R80.10 to new VMs on R81.20 (basic access rules only), and that worked perfectly. In production, though, the secondary didn't work, as mentioned, so I had to revert, and that is what triggered this thread's issue, which I solved by reverting the management session history. I am pretty sure the 'Firewall is not responding' issue is not related to the secondary failure and reboot issues. It seemed to appear only when going back to the old version on the gateways, after changing the version to R81.20 and then back to R80.10.

You can see my other post for the steps I followed in the VM lab for the gateway cluster upgrade. In production I only got halfway through them before needing to revert to the previous version.
https://community.checkpoint.com/t5/Security-Gateways/Firewall-is-not-responding-SmartConsole-alert-...
