Patricio_Gavila
Participant

Mux error messages on a cluster (active-standby) in R80.20

Hi all,

I have a Lenovo System x3650 M5 (on the compatibility matrix) running GAIA R80.20 (Jumbo Hotfix Take 80) in a distributed deployment. The server firmware is updated to the latest level, and the same hardware worked great with R77.30. I have many problems with Internet traffic: for example, images and Office 365 emails take too long to load, even when the user matches an unrestricted rule. This did not happen with R77.30. The active gateway shows these error messages in /var/log/messages:

 

Jun 12 14:19:57 2019 FW-NODO1 kernel: [fw4_4];mux_task_handler: ERROR: Failed to handle task. task=ffffc20085221670, app_id=1, mux_state=ffffc20092970c00.
Jun 12 14:19:57 2019 FW-NODO1 kernel: [fw4_4];mux_soc_result_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc20092970c00.
Jun 12 14:19:57 2019 FW-NODO1 kernel: [fw4_4];tls_main_send_record_layer_message: mux_soc_result_handler failed
Jun 12 14:19:58 2019 FW-NODO1 kernel: [fw4_4];mux_task_handler: ERROR: Failed to handle task. task=ffffc2008275e530, app_id=1, mux_state=ffffc2005f6a5c00.
Jun 12 14:19:58 2019 FW-NODO1 kernel: [fw4_4];mux_soc_result_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc2005f6a5c00.
Jun 12 14:19:58 2019 FW-NODO1 kernel: [fw4_4];tls_main_send_record_layer_message: mux_soc_result_handler failed
Jun 12 14:19:58 2019 FW-NODO1 kernel: [fw4_4];mux_task_handler: ERROR: Failed to handle task. task=ffffc2011e77b7b0, app_id=1, mux_state=ffffc200d97bfc00.
Jun 12 14:19:58 2019 FW-NODO1 kernel: [fw4_4];mux_soc_result_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc200d97bfc00.
Jun 12 14:19:58 2019 FW-NODO1 kernel: [fw4_4];tls_main_send_record_layer_message: mux_soc_result_handler failed
Jun 12 14:19:59 2019 FW-NODO1 kernel: [fw4_3];mux_task_handler: ERROR: Failed to handle task. task=ffffc200a775bfb0, app_id=1, mux_state=ffffc2027cc1a420.
Jun 12 14:19:59 2019 FW-NODO1 kernel: [fw4_3];mux_soc_result_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc2027cc1a420.
Jun 12 14:19:59 2019 FW-NODO1 kernel: [fw4_3];tls_main_send_record_layer_message: mux_soc_result_handler failed
Jun 12 14:19:59 2019 FW-NODO1 kernel: [fw4_3];mux_task_handler: ERROR: Failed to handle task. task=ffffc200aa947b30, app_id=1, mux_state=ffffc200dffa5810.
Jun 12 14:19:59 2019 FW-NODO1 kernel: [fw4_3];mux_soc_result_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc200dffa5810.
Jun 12 14:19:59 2019 FW-NODO1 kernel: [fw4_3];tls_main_send_record_layer_message: mux_soc_result_handler failed
Jun 12 14:20:00 2019 FW-NODO1 kernel: [fw4_2];mux_task_handler: ERROR: Failed to handle task. task=ffffc2007f670b30, app_id=1, mux_state=ffffc200c6950420.
Jun 12 14:20:00 2019 FW-NODO1 kernel: [fw4_2];mux_soc_result_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc200c6950420.
Jun 12 14:20:00 2019 FW-NODO1 kernel: [fw4_2];tls_main_send_record_layer_message: mux_soc_result_handler failed
Jun 12 14:20:01 2019 FW-NODO1 kernel: [fw4_5];mux_task_handler: ERROR: Failed to handle task. task=ffffc20122ccdb70, app_id=1, mux_state=ffffc20068218810.
Jun 12 14:20:01 2019 FW-NODO1 kernel: [fw4_5];mux_soc_result_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc20068218810.
Jun 12 14:20:01 2019 FW-NODO1 kernel: [fw4_5];tls_main_send_record_layer_message: mux_soc_result_handler failed
Jun 12 14:20:02 2019 FW-NODO1 kernel: [fw4_5];cpas_newconn_ex : called upon something other than tcp SYN. Aborting

 

My question is: does anyone know if it is possible to deactivate the Mux? Otherwise I will roll back to R77.30.
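
In case it helps others hitting the same thing, here is a minimal sketch of how one might measure how often these Mux errors occur and inspect a related kernel parameter before touching anything. The parameter name below is purely hypothetical, just to show the mechanism; I have not found a supported flag to disable the Mux, so confirm any parameter with TAC first:

# Count the Mux errors per hour on the active member (expert mode)
grep "mux_task_handler: ERROR" /var/log/messages | awk '{print $1, $2, substr($3,1,2)":00"}' | sort | uniq -c

# Read a firewall kernel parameter before changing it
# NOTE: "fwmux_enabled" is a hypothetical name, used only for illustration
fw ctl get int fwmux_enabled

# A runtime (non-persistent) change would be: fw ctl set int <parameter> <value>
# Persistent values normally go into $FWDIR/boot/modules/fwkern.conf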

My concern is why Check Point sells a poorly tested product and, even worse, pushes customers to migrate from R77.30 to R80, knowing that R77.30 is the best version they have had in many years. R80 has too many problems, and even in a cluster the number of product failures is frankly impressive.

 

Thanks,

Patricio G.

16 Replies
_Val_
Admin

Did you open an SR with TAC?

Patricio_Gavila
Participant

Dear Valeri,

 

TAC's solution is:

"Please try the ongoing JHF Take_80 (sk137592).
There were improvements in that JHF, and that error is covered by that patch.
Try this option first, before going down to the R80.10 version, which I was told you intend to try."

Unacceptable for a production environment. 
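
For anyone following along, a quick sketch of how to double-check which Jumbo Hotfix take is actually installed before and after applying what TAC recommends (standard commands in expert mode; output will differ per system):

# List installed hotfixes, including the Jumbo Hotfix take
cpinfo -y all

# Show the kernel build of the firewall module as a quick cross-check
fw ver -k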

 

Regards,

Patricio

PhoneBoy
Admin

Perhaps there is something in this hotfix, but it's not obvious from the release notes.
I'll contact you privately to get the SR number here.
PhoneBoy
Admin

I double-checked with TAC and we indeed integrated a fix for a different customer into R80.20 JHF 80 for a similar issue.
G_W_Albrecht
Legend

My question is: how many production systems have you upgraded from R77.30 to R80.20 so far? Or is it only this one installation that makes you so sure this is a poorly tested product?

Most customers have had some trouble during this transition, as a lot has changed at the core and much now works better - but mostly these issues come down to configuration, since there is a lot you have to revise for R80.20!

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
Patricio_Gavila
Participant

Dear,

 

My IT team and I did a prior migration to test R80.20 in an isolated environment, and everything went well in the tests, but once it went to production the recurring errors appeared in /var/log/messages. The testing environment ran for a month without problems. The physical servers are exactly the same brand, model, and PCI cards, so there was no reason to expect so many problems in production. I work at a finance company, so we cannot take risks experimenting with the production environment.

 

Regards,

Patricio G.

Jelle_Hazenberg
Collaborator

@Patricio_Gavila are you still facing these errors, or did TAC provide you with a solution? We face the same issues. We are also struggling with memory leaks in R80.20 and R80.30...
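
Not a fix, but here is a minimal sketch (standard GAIA/Linux tools, run in expert mode) of how one might track the suspected memory leak over time to give TAC something concrete; the log path and interval are arbitrary choices:

# Append memory snapshots to a log once per hour
while true; do
    date >> /var/log/mem_watch.log
    free -m >> /var/log/mem_watch.log
    fw ctl pstat >> /var/log/mem_watch.log
    ps aux --sort=-rss | head -15 >> /var/log/mem_watch.log
    sleep 3600
done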

Thomas_Eichelbu
Advisor

Hello, 

 

Is there any news about this? We see it on a freshly installed R80.30 Take 50 on an open server...

 

best regards
Thomas.

phlrnnr
Advisor

I currently have a TAC case open for a similar issue on R80.30 / Jumbo 111.

I see this in the messages log, repeated continually when HTTPS Inspection is enabled (if HTTPS Inspection is disabled, these messages go away):

Feb 14 11:49:16 2020 <removed> kernel: [fw4_10];tls_main_handle_ingress: malformed alert:
Feb 14 11:49:16 2020 <removed> kernel: [fw4_10]; 0: <00 00 00 00 00 00 00 01 d2 5f 8a fd b3 ac ed f4 ........._......
Feb 14 11:49:16 2020 <removed> kernel: [fw4_10]; 16: 0f 50 49 39 a7 d3 8b eb 0c 06> .PI9......
Feb 14 11:49:16 2020 <removed> kernel: [fw4_10];
Feb 14 11:49:16 2020 <removed> kernel: [fw4_10];mux_task_handler: ERROR: Failed to handle task. task=ffffc202095362b0, app_id=1, mux_state=ffffc2012eabc6f0.
Feb 14 11:49:16 2020 <removed> kernel: [fw4_10];mux_read_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc2012eabc6f0.
Feb 14 11:49:16 2020 <removed> kernel: [fw4_10];mux_active_read_handler_cb: ERROR: Failed to forward data to Mux.

TAC says these messages are cosmetic and there is a hotfix that can be applied to get rid of the error messages.
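
For anyone who wants to reproduce the correlation, a rough sketch: count the messages, toggle HTTPS Inspection, and count again after a while (plain grep in expert mode; the strings match the log lines above):

# Count occurrences of the two error signatures in the current messages file
grep -c "tls_main_handle_ingress: malformed alert" /var/log/messages
grep -c "mux_task_handler: ERROR" /var/log/messages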

phlrnnr
Advisor

I've also been told by TAC that the fix is included in R80.30 JHFA 140, even though it is not listed in the 'resolved issues' section.

Kathleen_Murphy
Participant

We are on R80.30 JHF215 and still seeing this same syslog message.

abihsot__
Advisor

Did this ever go away in any incarnation of the gateway? These messages are still present on R81.10, which is a long way from the R80.20 and R80.30 mentioned in this thread.

Tal_Paz-Fridman
Employee

Can you paste the exact messages you are receiving?

Thanks

abihsot__
Advisor

Thanks for the reply. To an untrained eye they look the same. This is R81.10 with the latest JHF.

 

Oct 4 15:51:05 2023 HOSTNAME kernel: [fw4_2];ws_mux_host_only_active_finalize_read_handler: ERROR: stream[1] is empty. mux_stat=1.
Oct 4 15:51:05 2023 HOSTNAME kernel: [fw4_2];ws_mux_read_handler_from_main_ex: ERROR: Finalize callback failed. cdir=2, mux_stat=1.
Oct 4 15:51:05 2023 HOSTNAME kernel: [fw4_2];ws_mux_read_handler_from_main: ERROR: Failed to call read handler. ws_connection=ffffc90082a25370.
Oct 4 15:51:05 2023 HOSTNAME kernel: [fw4_2];mux_task_handler: ERROR: Failed to handle task. task=ffffc90096862558, app_id=4 (WS), mux_state=ffffc9008e213030, curr_side 0, prev_side 0.
Oct 4 15:51:05 2023 HOSTNAME kernel: [fw4_2];mux_read_handler: ERROR: Failed to handle task queue. mux_opaque=ffffc9008e213030.
Oct 4 15:51:05 2023 HOSTNAME kernel: [fw4_2];mux_active_read_handler_cb: ERROR: Failed to forward data to Mux.

 

enabled_blades
fw urlf appi identityServer SSL_INSPECT content_awareness mon
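
In case it helps whoever picks this up, a one-liner sketch (plain grep/sed, nothing version specific) to see which firewall worker instances log these errors most often:

# Count mux errors per fw worker instance
grep "mux_task_handler: ERROR" /var/log/messages | sed 's/.*\[\(fw4_[0-9]*\)\].*/\1/' | sort | uniq -c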

Tal_Paz-Fridman
Employee

Hi again,

I've talked to the relevant owner in R&D, and we are aware of this issue. It will be handled in upcoming JHFs.

Thanks

abihsot__
Advisor

Thanks for the update! You saved me one support ticket 🙂
