
Troubleshooting policy installation

Hi guys,

I've been studying sk101226 to understand the policy installation process and how to troubleshoot it. However, I was wondering whether there is a way to troubleshoot the following components, which the SK does not cover in detail:

1) SmartConsole's Web Service. What is this process exactly, and how would it appear in the output of ps auxw? Is it likely to fail and, if so, how could we troubleshoot it?

2) The CPTA command that is invoked by fwm. Again, could anybody provide more detail on the Check Point Transfer Agent, which is responsible for actually transferring the policy files to the gateway?

Many thanks in advance.

7 Replies
Admin

Re: Troubleshooting policy installation

The webservice process is called cpm.

You won't be able to connect via SmartConsole or the API if this process dies. :)

See also: R80.x Security Management server main processes debugging

Most of the logging goes to $FWDIR/log/cpm.elg, and the ps -auxw output looks like this:

admin     5383  0.1  0.8 3152404 136248 ?      Sl   Sep10  11:59 /opt/CPshrd-R80.20/jre_64/bin/java -D_vSEC=TRUE -Xdump:directory=/var/log/dump/usermode -Xdump:heap:events=gpf+user -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh vSEC %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh vSEC %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,exec=kill -9 %pid -Xaggressive -Xshareclasses:none -Xgc:scvTenureAge=1,noAdaptiveTenure -Dfwdir=/opt/CPsuite-R80.20/fw1 -Dlog4j.configuration=file:///opt/CPvsec-R80.20/lib/log4j.properties -cp /opt/CPvsec-R80.20/lib/*:/opt/CPsuite-R80.20/fw1/cpm-server/*:/opt/CPsuite-R80.20/fw1/VE/bin/* com.cp.cms_proxy.mainClass 127.0.0.1

admin    10084  0.6  1.3 3894880 227220 ?      Sl   Sep10  81:18 /opt/CPshrd-R80.20/jre_64/bin/java -D_CPM_SOLR=TRUE -Xmx1024m -Xms64m -Xgcpolicy:optavgpause -Djava.io.tmpdir=/opt/CPsuite-R80.20/fw1/tmp -Xaggressive -Xshareclasses:none -Xdump:heap:events=gpf+user -Xdump:directory=/var/log/dump/usermode -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,priority=1,exec=kill -9 %pid -Dsolr.solr.home=/opt/CPsuite-R80.20/fw1/Solr/solr/ -DNGM.SOLR.LOG.DIR=/opt/CPsuite-R80.20/fw1/log -Djava.util.logging.config.file=/opt/CPsuite-R80.20/fw1/Solr/etc/logging.properties -DSTART=/opt/CPsuite-R80.20/fw1/Solr/start.config -Djetty.home=/opt/CPsuite-R80.20/fw1/Solr/ -DSTOP.KEY=checkpointkey -DSTOP.PORT=8982 -Dpath=/opt/CPsuite-R80.20/fw1/cpm-server/java_is.jar:/opt/CPsuite-R80.20/fw1/cpm-server/java_sic.jar:/opt/CPshrd-R80.20/jars/jetty_assist.jar -jar /opt/CPsuite-R80.20/fw1/Solr/start.jar

admin    25158  1.2 31.1 7952412 5076076 ?     SNsl Sep10 151:09 /opt/CPshrd-R80.20/jre_64/bin/java -D_solr=TRUE -Xdump:directory=/var/log/dump/usermode -Xdump:heap:events=gpf+user -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh solr %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh solr %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,exec=kill -9 %pid -Xaggressive -Xshareclasses:none -Xgc:scvTenureAge=1,noAdaptiveTenure -Xmx4866m -Xms4866m -Dlog4j.configuration=file:/opt/CPrt-R80.20/conf/solr.log4j.properties -Dpath=/opt/CPrt-R80.20/jars/aspectjrt-1.7.0.jar:/opt/CPrt-R80.20/jars/commons-io-2.3.jar:/opt/CPrt-R80.20/jars/commons-lang-2.6.jar:/opt/CPrt-R80.20/jars/cxf-core-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-java2ws-plugin-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-bindings-soap-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-bindings-xml-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-databinding-aegis-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-databinding-jaxb-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-frontend-jaxws-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-frontend-simple-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-javascript-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-transports-http-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-transports-http-jetty-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-ws-addr-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-ws-policy-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-rt-wsdl-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-tools-common-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-tools-java2ws-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-tools-validator-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-tools-wsdlto-core-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-tools-wsdlto-databinding-jaxb-3.1.0.jar:/opt/CPrt-R80.20/jars/cxf-tools-wsdlto-frontend-jaxws-3.1.0.jar:/opt/CPrt-R80.20/jars/java_is.jar:/opt/CPrt-R80.20/jars/java_sic.jar:/opt/CPrt-R80.20/jars/jaxb-xjc-2.2.11.jar:/opt/CPrt-R80.20/jars/jetty_assist.jar
:/opt/CPrt-R80.20/jars/stax2-api-3.1.4.jar:/opt/CPrt-R80.20/jars/woodstox-core-asl-4.4.1.jar:/opt/CPrt-R80.20/jars/wsdl4j-1.6.3.jar:/opt/CPrt-R80.20/jars/xmlschema-core-2.2.1.jar:/opt/CPsuite-R80.20/fw1/cpm-server/jackson-annotations-2.5.0.jar:/opt/CPsuite-R80.20/fw1/cpm-server/jackson-core-2.5.0.jar:/opt/CPsuite-R80.20/fw1/cpm-server/jackson-databind-2.5.0.jar: -Dsolr.log=/opt/CPrt-R80.20/log/solr.log -jar start.jar /opt/CPrt-R80.20/conf/jetty.xml
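Those lines are hard to tell apart by eye. As a rough sketch (not a Check Point tool), the Python below pulls the PID, the -D_<NAME>=TRUE marker and the %MEM column out of one line of that output. It assumes the column layout shown above (START printed as a single token such as Sep10) and that each Java process carries such a marker; the function name is made up for illustration.

```python
import re

# Role marker as it appears in the listing above, e.g. -D_CPM_SOLR=TRUE
ROLE_RE = re.compile(r"-D_(\w+)=TRUE")

def summarize_ps_line(line):
    """Return (pid, role, pct_mem) for one ps auxw line, or None if it has no marker."""
    # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = line.split(None, 10)
    if len(fields) < 11:
        return None
    match = ROLE_RE.search(fields[10])
    if match is None:
        return None
    return int(fields[1]), match.group(1), float(fields[3])
```

Feeding it every line of ps auxw output and printing the non-None results gives a short table of which Java process is which and how much memory each one holds.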

For debugging policy installation in general, this is probably a good SK to start with: 'Installation failed. Reason: Load on Module failed - failed to load security policy' erro... 

In general, the problem usually isn't CPTA.


Re: Troubleshooting policy installation

A million thanks for the detailed response Dameon!

Could you also please tell me what the atomic load actually is? I keep coming across it throughout this documentation without finding a definition.


Re: Troubleshooting policy installation

I'll take a shot at this. The "atomic load" of the policy on the gateway is also called the "commit" in some of Check Point's other documentation. In computer science, "atomic" means a non-interruptible operation that cannot be preempted by anything else; other elements of the system (drivers, programs, packets being routed, etc.) "see" the atomic operation as completing instantaneously, since they cannot interrupt it. I assume that in the case of a policy load this means no packets can move through the firewall, and they must be queued while the atomic policy load is in progress. The sequence, according to my understanding:

1) After the gateway successfully receives the compiled policy via CPTA, it performs extra sanity checks on the compiled policy to ensure it is not about to push an invalid policy into the kernel which could be disastrous.

2) Once the sanity checks are complete, the atomic load begins and all traffic trying to pass through the gateway begins to queue and cannot pass while the atomic load is in progress.

3) The new policy is loaded into the INSPECT kernel instance(s) while traffic is still being queued; this process happens very quickly.  Chain sequences which are viewable with fw ctl chain are rebuilt, and may end up adding or removing chain modules from the sequences if blades were enabled/disabled since the last policy push. 

4) If enabled, SecureXL is restarted as well and recalculates its various state tables based on the new policy.  All security server daemons running on the gateway are notified of the new policy by fwd and adjust their behavior accordingly, which could include restarting, stopping (if a feature was disabled) or starting (if a new feature was enabled).

5) At this point, if "Connection Persistence" is set to "Rematch connections" on the gateway object (the default setting), a CPU-intensive rematch of all open connections against the new policy is performed to ensure that all current connections are still allowed by the new policy.  I'm not sure if this is performed in one big operation against all connections present in the connections table, or if it is performed packet-by-packet as they are released from the queues and forwarded, but this is where the atomic load ends one way or the other.

6) The rematch operation tends to be where the bulk of the latency is encountered during a policy load, easily observable as a brief spike in latency if running a continuous ping.  On an overloaded firewall, if this rematch operation takes too long, the queues can overflow and packet loss occurs.  RX-DRPs can occur as well during this period, because the high CPU load incurred by the rematch process interferes with the timely emptying of interface ring buffers.

7) Once the rematch is complete, traffic flows normally.
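To make the queue / rematch / release idea concrete, here is a toy model in Python. It is only a sketch of the sequence as described in this post, not how the gateway kernel is implemented: connections and packets are plain tuples, the new policy is a stand-in function, and queue_capacity is a made-up parameter to show where overflow loss would come from.

```python
def atomic_load(open_connections, arriving_packets, allows, queue_capacity):
    """Toy model of a policy commit.

    open_connections: (src_ip, dst_port) tuples open before the load
    arriving_packets: (src_ip, dst_port) tuples arriving during the load
    allows: function(conn) -> bool, standing in for the new policy
    queue_capacity: how many packets can be held while the load runs
    """
    # Packets arriving during the load are held; anything past capacity is lost.
    held = arriving_packets[:queue_capacity]
    lost = len(arriving_packets) - len(held)
    # Rematch: only connections the new policy still allows survive.
    surviving = [c for c in open_connections if allows(c)]
    # Release: held packets are now judged by the new policy.
    forwarded = [p for p in held if allows(p)]
    return surviving, forwarded, lost
```

For example, with a new policy that no longer allows port 23, an open telnet connection disappears from the surviving list, and with queue_capacity smaller than the number of packets that arrived during the load, the remainder shows up in the lost count.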

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

"IPS Immersion Training" Self-paced Video Class
Now Available at http://www.maxpowerfirewalls.com

Re: Troubleshooting policy installation

Well, that is a definition which is definitely going to be copied and pasted somewhere! Thanks Tim!


Re: Troubleshooting policy installation

Based on the valuable contributions of Tim and Dameon, along with the information provided by sk101226, sk115557 and the 3 Policy Installation videos on the Support Channel on YouTube, I have put together an admin guide, which can be found below. I did this because each of the sources above contains snippets of information that, in my opinion, need to be combined for the entire process to make sense. I know I'm beating this topic to death a bit, but I'd rather consolidate my understanding of this procedure now than move on without actually grasping the steps involved. 

As such, I would really appreciate any feedback I could get about the guide below and please feel free to either add or modify it:

Policy Installation Process Flow

 

When installing policy either from SmartConsole or the management API, the installation command is sent to the CPM daemon on the management server via Web Service. The CPM daemon listens on port 19009 (while legacy SmartDashboard is still running in the background and connects to FWM using port 18190). The CPM daemon dumps all relevant information from the PostgreSQL and SOLR databases into file format, a process which is known as “database dump”.

The information is then forwarded to the FWM daemon via the CPMI protocol on port 9009. The FWM daemon in turn carries out several tasks:

  • Verification - The information in the database is verified to comply with a number of rules specific to the application and package for which policy installation is requested. If this verification fails, the process ends here and an error message is passed to the initiator.
  • Conversion - The database dump is converted into a new file format, which is then placed inside the $FWDIR/conf/gw_policies/ACCESS_BLADE/ directory. After conversion, FWM invokes fw_loader to perform code generation and compilation.
  • Code generation and compilation - Policy is translated to the INSPECT language and compiled with the INSPECT compiler. The result of the code generation is a long string containing the resulting INSPECT source code. Once the compilation process is complete, a copy of the policy is created inside the $FWDIR/state/<Gateway_Name> state directory.
  • Transfer - The CPTA command is then invoked to send the policy to all applicable gateways. The FWM daemon sends its SIC certificate to the gateway, where it is received by the gateway's CPD daemon ($CPDIR/conf/sic_cert.p12). SIC is negotiated, and the policy files are then transferred to the Security Gateway, where they are again received by the CPD daemon, which listens on port 18191 for install policy connections. The CPD daemon receives the policy, verifies its integrity and then stores it in the Gateway's local.upDBsqlite database. After the transfer from the Security Management Server, the policy files are kept inside the Gateway's $FWDIR/state/_tmp/FW1/ directory.
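Since the transfer to the gateway depends on CPD being reachable on port 18191, one quick thing to rule out is plain TCP connectivity from the management server. A minimal Python sketch, assuming Python is available on the box you test from; the function name and the 3-second timeout are arbitrary choices, and a successful connect only shows the port is open, not that SIC or the transfer itself will work.

```python
import socket

CPD_POLICY_PORT = 18191  # port named in the guide above for install policy connections

def tcp_port_open(host, port=CPD_POLICY_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Usage would be something like tcp_port_open("192.0.2.10"), where the address is a placeholder for the gateway's management IP.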

The Commit/Atomic Load process is then initiated:

  1. After the gateway successfully receives the compiled policy via CPTA, it performs extra sanity checks on the compiled policy to ensure it is not about to push an invalid policy into the kernel which could be disastrous.
  2. Once the sanity checks are complete, the atomic load begins and all traffic trying to pass through the gateway begins to queue and cannot pass while the atomic load is in progress.
  3. The "fw fetchlocal -d $FWDIR/state/_tmp/FW1" command is invoked to load the new policy into the INSPECT kernel instance(s) while traffic is still being queued; this process happens very quickly.  Chain sequences which are viewable with fw ctl chain are rebuilt, and may end up adding or removing chain modules from the sequences if blades were enabled/disabled since the last policy push. 
  4. If enabled, SecureXL is restarted as well and recalculates its various state tables based on the new policy.  All security server daemons running on the gateway are notified of the new policy by fwd and adjust their behavior accordingly, which could include restarting, stopping (if a feature was disabled) or starting (if a new feature was enabled).
  5. At this point, if "Connection Persistence" is set to "Rematch connections" on the gateway object (the default setting), a CPU-intensive rematch of all open connections against the new policy is performed to ensure that all current connections are still allowed by the new policy.
  6. CPD waits for fw fetchlocal to complete. If it succeeds, CPD saves the policy in the gateway's permanent $FWDIR/state/local/FW1/ directory and informs the Security Management Server of the installation status; the server then performs a DB Load with the new information. The traffic queue is released and all packets are handled by the new security policy. If the policy installation fails, CPD sends an error to the server (Policy Load on module failed, Reason:…) and the event is recorded in the $FWDIR/log/install_policy.elg file.
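For the failure case in the last step, a small Python sketch that scans log lines for the "load on module failed" text could look like the following. The exact line format of install_policy.elg is an assumption here; the pattern is taken only from the error wording quoted above, and the function name is made up.

```python
import re

# Case-insensitive, because the wording appears with different capitalisation.
FAILURE_RE = re.compile(r"load on module failed", re.IGNORECASE)

def find_load_failures(lines):
    """Return (line_number, text) pairs for lines that mention a failed module load."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if FAILURE_RE.search(line)
    ]
```

It takes any iterable of lines, so it can be given an open file handle for the install_policy.elg file once $FWDIR has been expanded (for example with os.path.expandvars).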

 

Re: Troubleshooting policy installation

Nice summary!  A few extra notes on the commit/atomic load:

Step 1: If the extra sanity checks fail, it can cause the frustratingly generic "failed to load security policy on gateway/load on module failed" error. This can also happen in step 3 if there is insufficient free RAM for the commit operation.  The gateway kernel cannot use virtual/paged memory at all and must have enough free RAM to complete the commit operation.  There are quite a few SKs for troubleshooting this condition, but I feel the best ones are sk84700 and sk33893.
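As a rough way to check free RAM on the gateway before a push, the Python sketch below reads the MemFree figure from the text of /proc/meminfo, which uses the standard Linux "MemFree:  <n> kB" format. How much free RAM a commit actually needs is not stated anywhere in this thread, so required_kb is a hypothetical threshold you would have to choose yourself; both function names are made up.

```python
def mem_free_kb(meminfo_text):
    """Return the MemFree value (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            return int(line.split()[1])
    raise ValueError("MemFree line not found")

def enough_free_ram(meminfo_text, required_kb):
    """True if MemFree is at least required_kb (a threshold you pick, not a vendor figure)."""
    return mem_free_kb(meminfo_text) >= required_kb
```

On the gateway it would be called with the result of open("/proc/meminfo").read().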

Step 4: The full restart of SecureXL to recalculate its tables no longer happens during a R80.20 gateway commit, which should help substantially reduce the brief spike in latency incurred by a policy push.

If the very end of the policy load operation seems to hang for a while in SmartConsole at Finalizing Installation (99%), the gateway is already done at that point and the SMS is just updating the log server with the resolved objects, so that logs will show Check Point objects rather than IPs, ports, etc.  Please see Tomer Sole's response in this thread for more info about this:

https://community.checkpoint.com/message/12847-re-policy-installation-stages?commentID=12847#comment... 


Re: Troubleshooting policy installation

Thanks Tim!
