support_suppor1
Participant

Message seen on /var/log/messages - "simi_reorder_enqueue_packet"

Hi there guys, I'm seeing the message "simi_reorder_enqueue_packet" in /var/log/messages. Is this an indication of traffic congestion? My network is intermittently losing application connectivity, especially for VoIP. As usual, no drops are seen in tracker or zdebug.

Hope someone has encountered this.

25 Replies
PhoneBoy
Admin

It seems to suggest network congestion/fault of some sort from the few TAC cases opened on this error message.

It's probably worth opening one so we can investigate more closely.

Daniel_Westlund
Collaborator

I'll pass on my experience with this error in case it helps someone in the future.  Pushed out inline policy rules on Wednesday.  On Friday, DHCP broke.  It was going through the firewall, and we found the below in /var/log/messages and zdebug:

[Expert@CPFW2:0]# less /var/log/messages

Jan 18 12:41:38 2019 CPFW2 kernel: [SIM4];simi_reorder_enqueue_packet: reached the limit of maximum enqueued packets for conn:<*.*.*.*,67,*.*.*.*,67,17>, fw_key:<*.*.*.*,67,*.*.*.*,67,17> !

Before the inline policy, DHCP requests were going through the firewall on a rule with DHCP specified in the service, and replies were coming back on UDP 68 on a rule with Any as Dest and Service.  When that rule was changed to inline, I can look back in the logs and no longer see the UDP 68 replies.  Not sure why it didn't log on that Any rule, because there was an Any allow inline app control rule at the bottom of it.  Anyway, after 1 and 1/2 days of UDP 68 traffic not getting through, this simi_reorder queue got full?

We made a return rule specifying udp 68, with Accept and no inline layers, and pushed it out.  That didn't fix it.  We had to uncheck UDP 67 and 68 from Sync, push policy, and force a failover.  That forced DHCP into a new connection, and then we could see logs from the UDP 68 DHCP reply traffic (from DHCP servers back to wireless controllers).  Don't know if that will permanently solve the issue or not.

I'm asking Check Point TAC for info about this queue: what it does, why this happened, and whether there's a way to check it and clear it out.  I've got nothing back so far.
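For anyone who wants to see which connections hit the limit most often, here is a minimal Python sketch that tallies the message per connection. It assumes the log line layout quoted above (the layout may differ between versions), and the function name `count_reorder_overflows` and the sample addresses are hypothetical, not anything provided by Check Point.

```python
import re
from collections import Counter

# Pattern derived from the log line quoted in this thread; treat it as an assumption.
LINE_RE = re.compile(
    r"simi_reorder_enqueue_packet: reached the limit of maximum enqueued packets "
    r"for conn:<(?P<src>[^,]+),(?P<sport>[^,]+),(?P<dst>[^,]+),"
    r"(?P<dport>[^,]+),(?P<proto>[^>]+)>"
)


def count_reorder_overflows(lines):
    """Count overflow messages per (src, sport, dst, dport, proto) tuple."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            key = (m["src"], m["sport"], m["dst"], m["dport"], m["proto"])
            hits[key] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical sample using documentation addresses, not real log data.
    sample = [
        "Jan 18 12:41:38 2019 CPFW2 kernel: [SIM4];simi_reorder_enqueue_packet: "
        "reached the limit of maximum enqueued packets for "
        "conn:<192.0.2.10,67,192.0.2.20,67,17>, "
        "fw_key:<192.0.2.10,67,192.0.2.20,67,17> !",
    ]
    for conn, n in count_reorder_overflows(sample).most_common():
        print(n, conn)
```

On a gateway you would feed it the lines of /var/log/messages instead of the sample list. A connection that dominates the count is a reasonable candidate to mention in a TAC case.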

Thomas_Pataki
Participant

Hi there,

Currently we have this issue too. It came up after a migration from R77.30 to R80.20 with the latest ongoing jumbo. In our case it's affecting CIFS transfers (but also other protocols). Instead of getting near line-speed transfers (1GB/s) we get about 200Mbits/s. During high traffic, CPU goes to 100% on 2-3 cores, so IPS starts to bypass, etc. A ticket is currently open but not solved yet.

Thomas_Pataki
Participant

Interesting: when the CPU load is high, there is a process called "cphwd_q_init_ke" which consumes 100% of one core. Is anyone aware of this one? The other "CPU core eaters" are fw_worker_x processes.

enanosca
Explorer

I'm also seeing this behavior but can't find any info in the docs. Were you able to find anything related?

PID   PPID  CMD                %MEM  %CPU
5153  1     [fw_worker_1]      0.0   59.8
5154  1     [fw_worker_2]      0.0   59.2
5152  1     [fw_worker_0]      0.0   59.2
5156  1     [fw_worker_4]      0.0   59.0
5157  1     [fw_worker_5]      0.0   58.8
5155  1     [fw_worker_3]      0.0   55.7
7059  1     [cphwd_q_init_ke]  0.0   41.9
7058  1     [cphwd_q_init_ke]  0.0   9.1

ChammiK
Participant

What does the cphwd_q_init_ke process do? I'm seeing this process going a little high at the moment.

Ole_Jakobsen
Contributor

I have created an SR for this problem and got an answer:

Put "simi_reorder_hold_udp_on_f2v=0" in simkern.conf and reboot.

 

My concern is that no one can tell me what "simi_reorder_hold_udp_on_f2v" does and what it affects. I want to know the consequences of applying this parameter. I can't find anything in the Support Center or on Google.

 

Do any of you know?

PhoneBoy
Admin

I'll see if I can get someone from R&D to comment.

Offhand, the name suggests it is something related to packet reordering of UDP packets prior to an acceleration decision being made.

Daniel_Westlund
Collaborator

I have been given this solution as well.  Have not implemented it.  I'll let you know if I have any issues.  Please share if you find out what it's doing.

PhoneBoy
Admin

Looks like you got an answer in your SR. :)

For everyone else reading along, I will explain.

  • Firewall and PPK (SecureXL) communication is asynchronous in R80.20 and above.
  • As a result, packets sometimes need to be held in PPK.
  • For UDP packets, it's generally ok to have some packets not arrive in the correct order.
  • This parameter tells PPK not to hold UDP packets.
Douglas_Araujo
Participant

Hi,

Any news on this? I just got the same message at a customer that recently upgraded to R80.20. Failing over to the other cluster member resolved our problem, at least temporarily.

PhoneBoy
Admin

If anyone has a TAC SR on this, please send to me in a private message.

support_suppor1
Participant

Hi Sir,

You may refer to this case. Thanks. 

support_suppor1
Participant

PhoneBoy
Admin

I said a private message :)

In any case, this particular SR is waiting on you to provide debugs.

Did you happen to try the fix suggested in this thread?

genisis__
Leader

I just had this on an R80.40 system with JHFA91, causing a total internet outage!  I've raised a TAC case.

Do we have anyone from R&D able to respond on this? It's a pretty serious issue.

In my case we restored service by restarting the virtual switch that leads to the internet. Clearly TAC will need to investigate and provide a fix ASAP.

phlrnnr
Advisor

It appears that Check Point has issued sk148432 for this issue.  The recommendation is to apply a hotfix.

Symptoms

  • The following drop is seen in /var/log/messages file:
    [kern];[tid_0];[SIM-];simi_reorder_enqueue_packet: reached the limit of maximum enqueued packets for conn

Cause

As part of the changes made in R80.20, SecureXL behaves asynchronously with the Firewall. This means that the Firewall and SecureXL might handle connections separately. To avoid a situation where SecureXL forwards packets to the network before the Firewall has finished inspection, SecureXL holds the packets in a queue so that packet order is preserved.

This new mechanism is called: "simi_reorder".

If the queue is full, all the packets are released and dropped. The main impact is on UDP packets, as TCP will never cause the queue to fill.

Solution

Contact Check Point Support to get a Hotfix for this issue. 
A Support Engineer will make sure the Hotfix is compatible with your environment before providing the Hotfix. 
For faster resolution and verification please collect CPinfo files from the Security Management and Security Gateways involved in the case.

There are 2 SIM module kernel parameters which control this behavior (please refer to sk147392 on how to set the SIM parameters):

  1. simi_reorder_max_packets
  2. simi_reorder_hold_udp_on_f2v

simi_reorder_max_packets controls the maximum number of packets the queue can hold. Changing it is less recommended, as it will consume additional memory, depending on how much the queue is increased.

simi_reorder_hold_udp_on_f2v controls the mechanism for UDP packets; its value can be 0 or 1. 1 enables the simi_reorder mechanism for UDP packets. 0 disables it, so all packets are sent to the network immediately, which may leave the Firewall and SecureXL somewhat out of sync.

The hotfix changes the mechanism's behavior so that when the queue is full, the packets are released and not dropped.
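To make the difference between the original behavior and the hotfix concrete, here is a toy Python model of a bounded hold queue. It is only a sketch of the description above and not Check Point's implementation: the class `ReorderQueue`, its method names, and the choice that the packet arriving at a full queue shares the fate of the flushed packets are all assumptions made for illustration.

```python
from collections import deque


class ReorderQueue:
    """Toy per-connection hold queue (illustrative only)."""

    def __init__(self, max_packets, drop_on_overflow=True):
        self.max_packets = max_packets            # analogous to simi_reorder_max_packets
        self.drop_on_overflow = drop_on_overflow  # True: pre-hotfix, False: post-hotfix
        self.held = deque()
        self.forwarded = []
        self.dropped = []

    def enqueue(self, packet):
        """Hold a packet while the firewall verdict is still pending."""
        if len(self.held) >= self.max_packets:
            # Queue is full: flush everything that was held.
            target = self.dropped if self.drop_on_overflow else self.forwarded
            target.extend(self.held)
            target.append(packet)  # assumption: the new packet shares the same fate
            self.held.clear()
            return
        self.held.append(packet)

    def verdict_ready(self):
        """Firewall finished inspection: release held packets in order."""
        self.forwarded.extend(self.held)
        self.held.clear()


if __name__ == "__main__":
    for drop in (True, False):
        q = ReorderQueue(max_packets=3, drop_on_overflow=drop)
        for seq in range(4):
            q.enqueue(seq)
        print("drop_on_overflow =", drop, "forwarded:", q.forwarded, "dropped:", q.dropped)
```

In this model a UDP flow that keeps sending while no verdict arrives loses a whole burst at once before the hotfix, whereas after the hotfix the same burst is forwarded.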

 

Thomas_Eichelbu
Advisor

Hello, 

 

I have seen the same message on R80.20SP, a Maestro setup with 4 6500.
"kernel: [SIM4];simi_reorder_enqueue_packet: reached the limit of maximum enqueued packets for conn:"

CP TAC advised me to run "fw ctl get int simi_reorder_hold_udp_on_f2v" to check the values, but this command doesn't work.

I get
[Expert@NWATSBGFWCL01-ch01-03:0]# fw ctl get int simi_reorder_hold_udp_on_f2v
Get operation failed: failed to get parameter simi_reorder_hold_udp_on_f2v

[Expert@NWATSBGFWCL01-ch01-03:0]# fw ctl set int simi_reorder_hold_udp_on_f2v 0 -a
Set operation failed: failed to get parameter simi_reorder_hold_udp_on_f2v
set: Operation failed
Killed

[Expert@NWATSBGFWCL01-ch01-03:0]# fw ctl get int simi_reorder_hold_udp_on_f2v
Get operation failed: failed to get parameter simi_reorder_hold_udp_on_f2v

I checked this command on R80.10, R80.20 and R80.20SP; it doesn't work on any of them.
At the moment I don't see any business impact related to these /var/log/messages entries.

Did CP TAC ask you to run any "fw ctl get" commands? What did they tell you?

best regards
Thomas.

Thomas_Eichelbu
Advisor

Ahh sorry, I see TAC forgot to add "-a" after "# fw ctl get int simi_reorder_hold_udp_on_f2v".

Sorry, my fault. Now I get:

[Expert@NWATSBGFWCL01-ch01-03:0]# fw ctl get int simi_reorder_hold_udp_on_f2v -a
FW:
Get operation failed: failed to get parameter simi_reorder_hold_udp_on_f2v
PPAK 0: simi_reorder_hold_udp_on_f2v = 0
[Expert@NWATSBGFWCL01-ch01-03:0]#

So "0" is disabled mode.

Would it be better to allow the reordering?

TAC told me to set "0"; I will try setting it to "1" and check the outcome.
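If you need to check this value across many gateways or Maestro members, a small Python sketch like the following can pull the per-PPAK value out of the "-a" output. It only assumes the output layout pasted above; the function name `parse_ppak_values` is hypothetical, and the FW line is deliberately ignored because the parameter does not exist on the firewall side.

```python
import re

# Matches lines such as "PPAK 0: simi_reorder_hold_udp_on_f2v = 0".
PPAK_RE = re.compile(r"^PPAK\s+(\d+):\s*(\w+)\s*=\s*(-?\d+)$")


def parse_ppak_values(output, param):
    """Return {ppak_instance: value} for one parameter from 'fw ctl get int <param> -a' output."""
    values = {}
    for line in output.splitlines():
        m = PPAK_RE.match(line.strip())
        if m and m.group(2) == param:
            values[int(m.group(1))] = int(m.group(3))
    return values


if __name__ == "__main__":
    sample = (
        "FW:\n"
        "Get operation failed: failed to get parameter simi_reorder_hold_udp_on_f2v\n"
        "PPAK 0: simi_reorder_hold_udp_on_f2v = 0\n"
    )
    print(parse_ppak_values(sample, "simi_reorder_hold_udp_on_f2v"))
```

An empty result would mean no PPAK line was found for that parameter, which is worth treating as "unknown" and not as "disabled".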

best regards
Thomas

Duane_Monroe1
Participant

I am also experiencing issues related to "simi_reorder_enqueue_packet".  The issue affects my Aruba Wireless APs, which use UDP GRE to tunnel back to an Aruba Controller.  When policy is pushed, several APs reboot because they cannot contact the Aruba Controller.

I have applied the hotfix per sk148432.  The hotfix has improved the issue, but I still see a few APs drop when I push policy.  If I run "fw ctl get int simi_reorder_hold_udp_on_f2v -a" on a gateway with the hotfix applied, I get "simi_reorder_hold_udp_on_f2v = 0".

Like I said, I am still having Aruba APs drop offline when I push policy, which has a dramatic impact on business.  I am in communication with TAC on this issue.  The "simi_reorder_enqueue_packet" messages no longer appear in /var/log/messages.  TAC wanted to start looking at IPS, but that feels like an out-in-left-field approach.

Anyone have thoughts on what I might be able to look at to find a smoking gun to provide to TAC?  

I assume that since simi_reorder_hold_udp_on_f2v = 0, increasing simi_reorder_max_packets would not help?

Is there a way to monitor the SecureXL packets in queue?

Duane

 

ChammiK
Participant

I would probably look at MTU and SecureXL. Perhaps all the APs that disconnect have a larger MTU size than the rest? Or you could try disabling SecureXL for a few of those APs to test, as per sk104468.

Chris_Wilson
Contributor

We have a problem similar to yours that involves our WLCs and the guest wireless network.  The problem started with the simi_reorder_enqueue_packet error messages, and when we failed over the firewall, wireless would break for us.  If we rebooted the anchor controllers behind the firewall or failed back to the original firewall, wireless connectivity came back.  We added hotfix HOTFIX_R80_20_JHF_T103_BOA_717_MAIN, but this didn't help other than getting rid of the error messages in the logs.

 

One thing I noticed this morning: if I start with FW-A active and then switch to FW-B, with tcpdump running on each firewall, FW-B shows the incoming packets going to it, but the return packets are coming back through FW-A. There are no blocked packets in the logs and there is nothing in "fw ctl zdebug + drop" on FW-A when these return packets hit it.

For another test, I pinged the WLC behind the firewall continuously and then did the failover; it worked as it should.  I checked ARP before, during and after the failover, and saw no problems there.

Jon_Fallon
Employee Alumnus

Hi Duane, I'm running into a very similar issue you outlined above. Did you get any resolution to this?
DT-ISP-DD
Explorer

I don't agree with "Cause", I have a similar issue with a sporadic disconnected GRE-Connection with TCP (HTTPS) packets in the tunnel. The tunnel lives sometimes longer than a week, sometimes a few hours. Failover doesn't interrupt the connection. The message is like the above:

simi_reorder_enqueue_packet: reached the limit of maximum enqueued packets for conn:<192.168.32.16,0,10.10.251.22,0,47>, fw_key:<192.168.32.16,0,10.10.251.22,0,47> !

Another strange issue is, this connection over IP proto 47 is mostly functioning, but never appears in log. Only fw monitor -F "0,0,0,0,47" or fw ctl debug -m kiss + htab_bl_infra shows the traffic.

System: R80.30 JHF215 on 5800 appliance

In the next possible maintenance window I will try to increase "simi_reorder_max_packets" to 2048 (now: 1024), maybe it helps.

 

regards

FedericoMeiners
Advisor

Hello,

We had the same issue with a customer after upgrading to R80.20. TAC provided the solutions described in this post: modify the following parameters in fwkern.conf

simi_reorder_max_packets=2048
simi_reorder_hold_udp_on_f2v=0
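Since these are plain name=value lines, a small helper can apply them without creating duplicates when rerun. This is a sketch under stated assumptions: the function `set_kernel_params` is hypothetical, it works on the file contents as text only, and it does not know which file or path applies on your version (this thread mentions both simkern.conf and fwkern.conf, and sk147392 is the reference for SIM parameters). A reboot is still needed, as noted earlier in the thread.

```python
def set_kernel_params(conf_text, params):
    """Return conf_text with each name=value from params present exactly once."""
    remaining = dict(params)
    out = []
    for line in conf_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name in remaining:
            # First occurrence: rewrite it with the requested value.
            out.append(f"{name}={remaining.pop(name)}")
        elif name in params:
            # Later duplicate of a parameter that was already rewritten: drop it.
            continue
        else:
            out.append(line)
    # Append parameters that were not present in the file at all.
    for name, value in remaining.items():
        out.append(f"{name}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    current = "simi_reorder_hold_udp_on_f2v=1\n"  # hypothetical existing contents
    wanted = {
        "simi_reorder_max_packets": 2048,
        "simi_reorder_hold_udp_on_f2v": 0,
    }
    print(set_kernel_params(current, wanted), end="")
```

Keeping the edit idempotent matters here because a parameter listed twice with different values makes it unclear which one takes effect after the reboot.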

One very important thing that TAC mentioned:

"The hotfix mentioned in the sk147392 for the errors above was supposed to be integrated to take_91. Unfortunately, I see that it is not integrated yet."

I was wondering: if ordering UDP packets has no utility, why is a flag that may cause issues enabled by default on SecureXL from R80.20 onwards?

____________
https://www.linkedin.com/in/federicomeiners/