Last Sunday (22.09) we upgraded the firmware on our Check Point 5400 to R80.30, and last night (26.09) the device stopped responding. According to our monitoring software, the device stopped responding to ping at 23:23 (local time); at the same moment it had less than 1% of physical memory free. At 01:00 the device came back online by itself with 7% of memory free, and we then manually rebooted it at 02:40, after which 75% of memory was free.
So everything points to a memory leak on this device after the upgrade, since no other resource (such as CPU) shows any problem.
Product version: Check Point Gaia R80.30
OS build: 200, OS kernel version: 2.6.18-92cpx86_64, OS edition: 64-bit
Our device configuration:
1) Two Check Point 5400 appliances in HA mode
2) One node runs R80.10, the other R80.30
3) The R80.30 node is the active node
4) Services on the R80.10 node are stopped
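For reference, a quick way to confirm which member is active and how much memory is free at a given moment (standard ClusterXL and Gaia/Linux commands; exact output varies by version):
cphaprob stat    # ClusterXL state of this member and its peer
free -m          # overall memory usage in MB
fw ctl pstat     # Check Point kernel memory (hmem/kmem/smem) counters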
Have you encountered a similar problem? How did you solve it?
Attached:
1. Information from the monitoring system
2. Logs from /var/log/messages:
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: network_classifier_get_zone_by_ifnum: Failed to get ifindex for ifnum=-1
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: network_classifier_notify_clob_by_ifnum: network_classifier_get_zone_by_ifnum failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: network_classifier_notify_clob_by_dst_route: network_classifier_notify_clob_by_ifnum failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: network_classifier_notify_clob_for_not_incoming_conn: network_classifier_notify_clob_by_dst_route failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: network_classifiers_destination_zone_handle_post_syn_context: network_classifier_notify_clob_for_not_incoming_conn failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: network_classifier_cmi_handler_match_cb: network_classifiers_destination_zone_handle_post_syn_context failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: cmik_loader_fw_context_match_cb: match_cb for CMI APP 20 failed on context 359, executing context 366 and adding the app to apps in exception
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: up_manager_cmi_handler_match_cb: connection not found
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: up_manager_cmi_handler_match_cb: rc FALSE - rejecting conn [192.168.0.122:43493 -> 178.140.2.238:443, IPP 6]
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];[192.168.0.122:43493 -> 178.140.2.238:443] [ERROR]: up_rulebase_should_drop_possible_on_SYN: conn dir 0, 192.168.0.122:43493 -> 178.140.2.238:443, IPP 6 required_4_match = 0x802, not expected required_4_match = 0x800
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_buf_create: ERROR: Failed allocate Mux buf.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_write_raw_data: ERROR: Failed to create Mux buf.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];tls_mux_write: mux_write_raw_data failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_task_handler: ERROR: Failed to handle task. task=ffffc2003cf70e40, app_id=1, mux_state=ffffc20043256a50.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_read_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc20043256a50.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_active_read_handler_cb: ERROR: Failed to forward data to Mux.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];[192.168.218.39:65323 -> 192.168.0.6:53] [ERROR]: cmik_loader_fw_context_match_cb: failed to allocate s_cmik_loader_match_params
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];cmi_context_exec_from_non_stream: cmik_loader_fw_context_match_cb(context=352, app_id = -1, context_apps=15c0004) failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];[192.168.218.39:65323 -> 192.168.0.6:53] [ERROR]: up_manager_fw_handle_first_packet: cmi_exec_from_first_packet() failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];[192.168.218.39:65323 -> 192.168.0.6:53] [ERROR]: up_manager_fw_handle_first_packet: failed to execute first packet context
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_buf_create: ERROR: Failed allocate Mux buf.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_write_raw_data: ERROR: Failed to create Mux buf.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];tls_mux_write: mux_write_raw_data failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_task_handler: ERROR: Failed to handle task. task=ffffc2003cf70e40, app_id=1, mux_state=ffffc2019cbca6f0.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_read_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc2019cbca6f0.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];mux_active_read_handler_cb: ERROR: Failed to forward data to Mux.
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_0];FW-1: h_getvals: fw_kmalloc (496) failed
Sep 27 00:59:00 2019 CPGW-1 kernel: [fw4_1];tcp_input: failed to alloc pkt buf at line :1259
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];FW-1: h_getvals: fw_kmalloc (496) failed
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_1];pslip_get_buf: failed to alloc packet_buf
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_1];psl_handle_packet: psl_allocate_packet_buf failed, len=264
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];cpaq_cbuf_alloc_rcv_buf_info: buf_id=88362620 unable to allocate buffer sz=1712
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];cphwd_handle_send_cphwd_stats: NULL cphwd_stats_buf buffer
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];mux_write_raw_data: ERROR: Failed to allocate buf data.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];tls_mux_write: mux_write_raw_data failed
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];mux_task_handler: ERROR: Failed to handle task. task=ffffc2003b40a370, app_id=1, mux_state=ffffc200417ca8a0.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];mux_read_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc200417ca8a0.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];mux_active_read_handler_cb: ERROR: Failed to forward data to Mux.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];mux_write_raw_data: ERROR: Failed to allocate buf data.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];tls_mux_write: mux_write_raw_data failed
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];mux_task_handler: ERROR: Failed to handle task. task=ffffc2003b40a4b0, app_id=1, mux_state=ffffc2003822b1e0.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];mux_read_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc2003822b1e0.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_0];mux_active_read_handler_cb: ERROR: Failed to forward data to Mux.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_1];mux_write_raw_data: ERROR: Failed to allocate buf data.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_1];tls_mux_write: mux_write_raw_data failed
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_1];mux_task_handler: ERROR: Failed to handle task. task=ffffc20052afe1b0, app_id=1, mux_state=ffffc2001e526c00.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_1];mux_read_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc2001e526c00.
Sep 27 00:59:01 2019 CPGW-1 kernel: [fw4_1];mux_active_read_handler_cb: ERROR: Failed to forward data to Mux.
Going through something similar with new open servers on R80.30. Memory gradually increases to the point where the cluster fails over. TAC is having us run a memory leak test, which I will start over the weekend and leave running until the memory runs out again.
Ryan
Are you using R80.30 vanilla, or have you loaded any Jumbo HFAs? Do you have Priority Queues turned on? See these:
sk149413: R80.20 / R80.30 Security Gateway freezes when Priority Queue is enabled
Based on the logs you provided, it looks like possibly a kernel memory leak; see this SK:
sk35496: How to detect a kernel memory leak on Security Gateway with SecurePlatform OS / Gaia OS
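As a rough illustration of the kind of periodic sampling that SK describes, something like the loop below can be left running on the gateway (a sketch only; the log path and 5-minute interval are arbitrary choices, not anything the SK mandates):
while true; do
    date >> /var/log/kmem_watch.log
    fw ctl pstat >> /var/log/kmem_watch.log                           # Check Point hmem/kmem/smem statistics
    grep -E 'MemFree|Slab' /proc/meminfo >> /var/log/kmem_watch.log   # Linux free memory and slab usage
    sleep 300
done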
As stated before, it looks like a memory leak.
There have been several posts in recent weeks about similar behavior on R80.30 and R80.20.
I'm really eager to see what the RCA of these memory leaks is. Did you open an SR with TAC?
Also, as advised, install the latest JHF (Take 50).
Hello!
We have a similar issue.
Node 1: open server hardware - HP ProLiant DL380p Gen8 - fresh installation of R80.30 with Take 50 and priority queue deactivated -> off
Node 2: open server hardware - HP ProLiant DL380p Gen8 - R80.10 (cphastop)
Installation of R80.30: Sep 26th, 2019
Today, Oct 2nd, 2019, in the early morning -> FW Kernel - Total 24,012 MB -> Used 24,012 MB -> Free 0
PRTG monitoring -> see attached file
For now we have started leak_detection on this R80.30 node, and we are additionally watching how much memory is consumed, and comparing it over time, with the CLI command
while true; do ps -eo pid,tid,class,rtprio,stat,vsz,rss,comm --sort vsz; sleep 1; done
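If it helps, the same loop can append timestamped snapshots to a file so successive samples can be compared later (just a sketch; the output path and 60-second interval are arbitrary):
while true; do
    date >> /var/log/ps_vsz_watch.log
    ps -eo pid,tid,class,rtprio,stat,vsz,rss,comm --sort vsz >> /var/log/ps_vsz_watch.log
    sleep 60
done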
Hello.
TAC gave us a hotfix and we installed it over the weekend. So far so good, but only a little time has passed.
It seems to be a problem in IPS, but that is not confirmed.
I'll keep you informed
Hello! We still have the problem that the memory fills up completely. In the meantime we have sent two cpinfos from the leak detection to Check Point, but still no answer.
In the meantime I manually switch the Check Point services daily from cluster node 1 to 2 (or 2 to 1) and restart the machine after the switch.
It is good to hear that Check Point has a fix for you. Maybe we will also get a fix in the next few days. Please can you tell me if the fix helps to solve the problem?
Thanks
The patch did not help solve the problem; we continue to investigate.
Hello,
We received this patch: "fw1_wrapper_HOTFIX_R80_30_JHF_T111_010_MAIN_GA_FULL.tgz".
It helped to reduce the memory leaking by some percentage, but we encountered firewall crashes when we ran NMAP scans...
The active member becomes totally unresponsive when we scan the firewall with a "low" nmap port scan.
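For illustration, a low-intensity scan of this kind could look like the following (a sketch; the timing template, port range, and <gateway-ip> placeholder are examples, not the exact scan we used):
nmap -T2 -p 1-1024 <gateway-ip>    # "polite" timing, first 1024 TCP ports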
Is anybody else facing the same issues?
We are still investigating with TAC to find a solution.
Best regards
Thomas.
We've had the same issue for a few months with a 5400 after upgrading to R80.30. Check Point support has been utterly useless in helping us and almost impossible to get hold of at times. We upgraded from R80.20 to resolve an SSL issue, which also took several months to find out was a bug with no timeframe for a fix and was eventually resolved in R80.30. Our 3200s don't have this issue, which is strange because they aren't as powerful.
Has anyone had any luck resolving this? We’re a couple of days from tossing CP in the garbage and going with another solution. I can’t believe how bad support has gotten.
@Greg_Mandiola hi,
Your feedback is extremely valuable. I did not find any open SRs with the symptoms you have described. Please send me your SR via a personal message, thanks
Val
@Greg_Mandiola received and answered.
Hello
Waiting for a hotfix from R&D. I hope it helps.
Hello folks,
Have you got any update on this subject from CP?
Namely, we are having the same problem on a 15600 cluster with R80.30 Take 111 (downgraded from Take 140 on Check Point's recommendation).
We have a case open with Check Point, but so far there is no resolution.
Anything new on your side?
Thanks!
Hello
Similar problems at a customer, after upgrading to R80.20 + JHF Take 140.
A memory peak related to FW Kernel and kmem blocking.
Reviewing traffic on the interfaces we see a strange pattern: a big increase in TX traffic on the external interface that is not reflected on any other interface, so it seems not to be crossing the firewall but instead to be generated by the firewall itself.
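One quick way to confirm that the TX growth is real, and on which interface, is to sample the raw byte counters over time (a sketch; the 60-second interval is arbitrary):
while true; do
    date
    cat /proc/net/dev    # per-interface RX/TX byte and packet counters
    sleep 60
done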
Have you had any response from TAC? I have a ticket open.
I wish it were.
But we already have 32 GB of RAM and get 0 free FW Kernel memory when the issue arises.
Normally yes, around 80-85% free memory.
We are still waiting for R&D. They narrowed it down to a kernel table called static_hh_dst_cache. It appears to grow until all of the RAM is exhausted.
Ryan
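For anyone who wants to watch that table while waiting on R&D, the standard kernel table tool can report its size (a sketch; the table name is taken from the post above and may not exist on every version):
fw tab -t static_hh_dst_cache -s    # summary: table ID, number of values, peak
while true; do date; fw tab -t static_hh_dst_cache -s; sleep 600; done    # sample every 10 minutes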