Dilian_Chernev
Collaborator

Gateway is dropping packets every minute

Hello mates,

We have been dealing with a very weird issue these days:

The gateway is dropping traffic every minute, e.g. at 11:15:02, 11:16:02, 11:17:02...

On each drop, the following lines appear in /var/log/messages:

Jun 19 10:08:03 2023 mdc-fw kernel: [fw4_3];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:09:03 2023 mdc-fw kernel: [fw4_0];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:10:03 2023 mdc-fw kernel: [fw4_11];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:10:04 2023 mdc-fw kernel: [fw4_6];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
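To confirm the one-minute cadence and see whether the failures are tied to particular CoreXL instances, the log lines can be tallied. This is a minimal sketch that assumes the syslog format exactly as shown above (month, day, time, year, hostname, then `kernel: [fw4_N];fwmultik_f2p_cookie_outbound`). The function name `summarize` is hypothetical and not part of any Check Point tool.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the syslog prefix and the CoreXL instance tag seen in the post, e.g.
# "Jun 19 10:08:03 2023 mdc-fw kernel: [fw4_3];fwmultik_f2p_cookie_outbound: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) \S+ kernel: "
    r"\[(?P<inst>fw4_\d+)\];fwmultik_f2p_cookie_outbound"
)

SAMPLE = """\
Jun 19 10:08:03 2023 mdc-fw kernel: [fw4_3];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:09:03 2023 mdc-fw kernel: [fw4_0];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:10:03 2023 mdc-fw kernel: [fw4_11];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:10:04 2023 mdc-fw kernel: [fw4_6];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed."""


def summarize(lines):
    """Return (second-of-minute histogram, per-instance counts, gaps in seconds)."""
    stamps, instances = [], Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # ignore unrelated log lines
        stamps.append(datetime.strptime(m.group("ts"), "%b %d %H:%M:%S %Y"))
        instances[m.group("inst")] += 1
    # Which second of the minute each failure lands on.
    seconds = Counter(ts.second for ts in stamps)
    # Time between consecutive failures; ~60 s gaps point to a periodic trigger.
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    return seconds, instances, gaps


if __name__ == "__main__":
    seconds, instances, gaps = summarize(SAMPLE.splitlines())
    print("second of minute:", dict(seconds))
    print("per instance:", dict(instances))
    print("gaps (s):", gaps)
```

On the four lines above, the failures land on second 3 or 4 of the minute, are spread across four different instances, and are spaced 60 s apart (plus one 1 s follow-up), which suggests a periodic trigger outside any single instance. Against the full log, something like `summarize(open("/var/log/messages"))` would work, assuming a Python interpreter is available; copying the file off the gateway works just as well.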

The gateway was running R80.40 when we first experienced the problem. It has since been upgraded to R81.10 JHF95, but the behavior is the same.

The device is a CP 6800 and is fairly heavily loaded: 240k connections, 3GB of traffic, and CPU at about 65%.

A service request has been filed, but things are moving slowly, so I just wanted to ask whether anyone has seen a similar issue.

Thanks,
Dilian

Timothy_Hall
Legend

That message indicates that the worker (instance) core has completed its inspection and is trying to hand a packet back to a dispatcher core for outbound transmission, but that operation failed. Please provide the output of the enabled_blades command and of the Super Seven commands: S7PAC - Super Seven Performance Assessment Commands

It seems unlikely to be a dispatcher code problem, as the issue followed you through a major upgrade.

It is possible that your NICs and/or bus are hanging up every minute (if so, you would see a big spike in "hi" reported by top), and queued packets are backing up in the dispatchers trying to reach the NICs/bus until their queues are full and can't accept any more. Please also provide the output of fwaccel stats -l and cpstat -f sensors os. Are there any messages about NIC lockups in /var/log/messages? Are all expansion NIC cards firmly seated?
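The queue-exhaustion theory above can be illustrated with a toy model. This is a simplified sketch, not how the dispatcher actually queues packets: it assumes a single bounded outbound queue, fixed per-tick arrival and drain rates, and a NIC/bus that stops draining for a few ticks at the start of every period. The function `simulate`, its parameters, and all numbers are hypothetical.

```python
def simulate(ticks, period, stall_len, capacity, arrivals_per_tick, drain_per_tick):
    """Toy model: return the ticks at which at least one worker->dispatcher
    handoff was rejected because the bounded outbound queue was full."""
    queue_len = 0
    failed_ticks = []
    for t in range(ticks):
        # Workers try to hand finished packets to the dispatcher's queue;
        # anything that does not fit is rejected (the "Failed" handoff).
        accepted = min(arrivals_per_tick, capacity - queue_len)
        rejected = arrivals_per_tick - accepted
        queue_len += accepted
        if rejected:
            failed_ticks.append(t)
        # The NIC/bus drains the queue, except during its periodic stall.
        stalled = (t % period) < stall_len
        if not stalled:
            queue_len -= min(drain_per_tick, queue_len)
    return failed_ticks


if __name__ == "__main__":
    # Drain rate (10) exceeds arrival rate (8), so capacity is fine on average.
    print(simulate(ticks=180, period=60, stall_len=5, capacity=20,
                   arrivals_per_tick=8, drain_per_tick=10))
    # Same load with no stall: no rejected handoffs at all.
    print(simulate(ticks=180, period=60, stall_len=0, capacity=20,
                   arrivals_per_tick=8, drain_per_tick=10))
```

With these values, a link that never stalls never rejects a handoff. A 5-tick stall every 60 ticks fills the queue and produces one burst of rejections per period (ticks 2-5, 62-65, 122-125). That is the same shape as the once-a-minute failures in the log, even though average throughput is sufficient.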

the_rock
Legend

I have seen that issue a few times, and every single time I fixed it by running cpconfig, disabling CoreXL, rebooting, re-enabling CoreXL, and rebooting again. I know it probably can't be done during work hours, but that's what seemed to fix it.

Cheers,

Andy

Dilian_Chernev
Collaborator

Thanks for the reply. 

Did you try this on a cluster?
When one member has CoreXL disabled, which member will be active after the first reboot?

Just to be prepared 🙂

the_rock
Legend

Every time I had to do this with customers, it was a cluster. CoreXL state has nothing to do with cluster state: if you disable CoreXL and reboot the current master, the other member will become active, UNLESS you have preempt mode enabled (as per below), which I would not recommend due to traffic issues when failover happens.

Andy

 

Screenshot_1.png

Dilian_Chernev
Collaborator

Unfortunately, this didn't solve our problem. As we found out later, it was related to SecureXL traffic.

Disabling CoreXL had a catastrophic effect: going from 20 cores to 1, the machine was barely breathing.
Also, ClusterXL takes the number of available cores into account, and the member with fewer cores was becoming active.

Thanks,
Dilian

the_rock
Legend

I find that really odd, because I tested this exact scenario in my R81.20 lab (with a cluster) and never had that issue.

Andy
