Timothy Hall

fw_runfilter_ex(ctx id 0): function does not exist -1

Discussion created by Timothy Hall (Champion) on Feb 18, 2018
Latest reply on Apr 4, 2018 by Timothy Hall

This weekend I was on call for a customer performing an in-place R77.30 to R80.10 upgrade of a ClusterXL cluster, and I wanted to share what we found, since it was very difficult to diagnose. The customer had already upgraded a gateway at their DR site to R80.10 successfully beforehand.

The customer called and said that after upgrading one ClusterXL member to R80.10 and failing traffic over to it, nothing would pass through the upgraded gateway, even though policy had been installed. I whipped out the trusty fw ctl zdebug drop command and was greeted with screenfuls of this (which was also being written to /var/log/messages at a rapid rate):

Feb 17 08:47:00 2018 XXX kernel: [fw4_0];FW-1: fw_runfilter_ex(ctx id 0): function does not exist -1

Feb 17 08:47:00 2018 XXX kernel: [fw4_2];FW-1: fw_runfilter_ex(ctx id 0): function does not exist -1
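When the messages scroll by this fast, a quick tally per kernel instance can show whether every instance is affected (as it was here) rather than just one. Below is a minimal Python sketch, assuming the /var/log/messages line format shown above, where the `[fw4_N]` tag appears to identify the firewall kernel instance. The function name `count_runfilter_errors` is hypothetical; this is not a Check Point tool.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines like:
# "... kernel: [fw4_0];FW-1: fw_runfilter_ex(ctx id 0): function does not exist -1"
# Assumption: the [fwN_M] tag names the firewall kernel instance that logged it.
ERROR_RE = re.compile(
    r"\[(?P<instance>fw\d+_\d+)\];FW-1: fw_runfilter_ex\(ctx id \d+\): function does not exist"
)


def count_runfilter_errors(lines: Iterable[str]) -> Counter:
    """Count fw_runfilter_ex 'function does not exist' errors per instance."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group("instance")] += 1
    return counts


if __name__ == "__main__":
    # On a gateway you could pass an open file instead, e.g.
    # open("/var/log/messages", errors="replace"); sample lines keep this portable.
    sample = [
        "Feb 17 08:47:00 2018 XXX kernel: [fw4_0];FW-1: fw_runfilter_ex(ctx id 0): function does not exist -1",
        "Feb 17 08:47:00 2018 XXX kernel: [fw4_2];FW-1: fw_runfilter_ex(ctx id 0): function does not exist -1",
        "Feb 17 08:47:01 2018 XXX kernel: [fw4_0];FW-1: fw_runfilter_ex(ctx id 0): function does not exist -1",
    ]
    for instance, n in sorted(count_runfilter_errors(sample).items()):
        print(f"{instance}: {n}")
```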

 

Quick search of SecureKnowledge/CheckMates/CPUG yields a big goose egg on this error message.  Term "filter" appearing in it did seem to imply an issue with the Firewall blade, and disabling other blades such as IPS/TP didn't have any effect on the issue.  I assumed that something went wrong in the upgrade process (even though all the upgrade logs in /opt/CPInstLog looked good) and proceeded to do a fresh load of R80.10 plus Jumbo Take 70 and load configuration of Gaia config.  Pushed policy, everything looked good, customer completed their test plan.  Fresh-loaded other gateway with R80.10 Jumbo Take 70, everything looked good, failed over and customer completed their test plan.

 

While trying to clear up some policy installation warnings and give the customer a "clean & green" outcome, suddenly the issue came back (accompanied I might add by a raft of expletives uttered by myself and the customer).  No amount of rebooting (including a simultaneous reboot of both cluster members) could seem to make it go away, then suddenly it stopped right after a policy install and everything started working.  We changed the settings back that made it start working again to "re-break" it and installed policy again.  Still worked.  Hmm.

 

Customer makes some more changes and the issue comes back again, backing out the recent changes and reinstalling policy doesn't fix it.  I cry uncle at this point and after involving Check Point support (who was excellent by the way - kudos to Efim Bliacher) we find out that this is a known issue fixed in an ongoing take but not yet documented.  The conditions that lead to this situation are:

 

1) More than one set of R80.10 gateways/clusters being managed by the same SMS

and

2) The gateways/clusters are using different IPS/TP profiles

and

3) Policy is pushed to 2 or more gateways/clusters in a single operation (in our case to both the cluster and DR firewall)

 

So the workaround was to only push to one set of firewalls at a time, and anytime during our testing when we happened to push to both sets of gateways simultaneously the issue would come back, regardless of any other changes we were making.  Apparently the root cause was the Inspection Settings (which were split out from the IPS blade in R80.10); during a simultaneous policy push elements of each gateway/cluster's Inspection Settings were used to inappropriately populate Implied Rules on the other gateway, thus causing the other gateway to have a reference to an object/function in its Implied Rules that didn't exist.  As many of the Implied Rules are always checked first, the error would essentially drop all new connections trying to start (but existing connections would continue).  This effect did have some cosmetic similarities to this SK: sk97704: Security Gateway may stop accepting new IPv4 connections when working with Dynamic Objects or with IPS protection 'Malicious IPs'
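The three trigger conditions and the push-one-set-at-a-time workaround can be expressed as a simple pre-flight check before an install. This is a hypothetical Python sketch, not a Check Point tool or API: the `Target` record, its fields, and the function names are illustrative assumptions, and it assumes that all targets in a single install are managed by the same SMS (condition 1), since one install operation is launched from one SMS. The actual installs would still be done from SmartConsole or your usual tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Target:
    """Hypothetical record for one gateway or cluster in a policy install."""
    name: str
    version: str      # e.g. "R80.10"
    ips_profile: str  # IPS/TP profile assigned to this gateway/cluster


def hits_known_issue(push: List[Target]) -> bool:
    """Return True if one install operation matches all three conditions.

    Assumes every target in `push` is managed by the same SMS (condition 1).
    """
    r8010 = [t for t in push if t.version == "R80.10"]
    # Conditions 1 and 3: two or more R80.10 gateways/clusters in one push.
    if len(r8010) < 2:
        return False
    # Condition 2: they do not all share the same IPS/TP profile.
    return len({t.ips_profile for t in r8010}) > 1


def plan_installs(push: List[Target]) -> List[List[Target]]:
    """Apply the workaround: split a risky push into one target per install."""
    if hits_known_issue(push):
        return [[t] for t in push]
    return [push]


if __name__ == "__main__":
    # Hypothetical names mirroring this case: a cluster plus a DR firewall.
    cluster = Target("Cluster", "R80.10", "Profile-A")
    dr_fw = Target("DR-Firewall", "R80.10", "Profile-B")
    for batch in plan_installs([cluster, dr_fw]):
        print("install policy to:", ", ".join(t.name for t in batch))
```

Splitting every risky push into single-target installs is deliberately conservative; it mirrors the workaround exactly rather than trying to guess which target combinations are safe.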

 

I had seen situations before where trying to push to more than one gateway/cluster at a time would cause policy installation failures due to IPS Profile conflicts, but never ones that would allow the policy to be installed and immediately cause a critical traffic-handling failure on the gateway/cluster.  Hopefully this writeup wasn't too long, and will help someone else.

 

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
