Gateway is dropping packets every minute
Hello mates,
We are dealing with a very weird issue these days: the gateway is dropping traffic every minute, e.g. at 11:15:02, 11:16:02, 11:17:02...
On each drop, the following lines appear in /var/log/messages:
Jun 19 10:08:03 2023 mdc-fw kernel: [fw4_3];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:09:03 2023 mdc-fw kernel: [fw4_0];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:10:03 2023 mdc-fw kernel: [fw4_11];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
Jun 19 10:10:04 2023 mdc-fw kernel: [fw4_6];fwmultik_f2p_cookie_outbound: fwmultik_f2p_packet_outbound Failed.
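For reference, a quick way to confirm the once-a-minute pattern is to count these messages per minute straight from /var/log/messages (a rough sketch using standard bash tools; adjust the awk field positions if your log line format differs from the samples above):

# Count fwmultik_f2p failures per minute ($3 is the HH:MM:SS timestamp in the lines above)
grep 'fwmultik_f2p_packet_outbound Failed' /var/log/messages \
  | awk '{print $1, $2, substr($3, 1, 5)}' \
  | sort | uniq -c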
The gateway was on R80.40 when we first experienced the problem; it has since been upgraded to R81.10 JHF95, but the issue is still the same.
The device is a CP 6800 and is pretty loaded: around 240k connections, 3 GB of traffic, and CPU at about 65%.
A service request has been filed, but things are moving slowly, so I just wanted to ask if someone has had a similar issue.
Thanks,
Dilian
That message indicates that a worker/instance core has completed its inspection and is trying to hand the packet back to a dispatcher core for outbound transmission, but that operation failed. Please provide the output of the enabled_blades command and the Super Seven commands: S7PAC - Super Seven Performance Assessment Commands.
It seems unlikely to be a dispatcher code problem, as the issue followed you through a major upgrade.
It is possible that your NICs and/or bus are hanging up every minute (you would see a big spike in "hi" reported by top if so), and queued packets are backing up in the dispatchers trying to reach the NICs/bus until their queues are full and they can't accept any more. Please also provide the output of fwaccel stats -l and cpstat -f sensors os. Are there any messages about NIC lockups in /var/log/messages? Are all expansion NIC cards firmly seated?
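For what it's worth, those checks can be gathered in one pass, e.g. (a rough sketch; the grep pattern is only an illustration, the rest are the commands named above):

# Look for a spike in the "hi" (hardware interrupt) column around a drop
top -b -n 1 | head -5
# SecureXL statistics, including drop counters
fwaccel stats -l
# Hardware sensor readings
cpstat -f sensors os
# Any NIC lockup or link messages around the drop times
grep -iE 'nic|link|reset|hang' /var/log/messages | tail -20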
now available at maxpowerfirewalls.com
I have seen that issue a few times, and every single time I fixed it by running cpconfig, disabling CoreXL, rebooting, re-enabling CoreXL, and rebooting again. I know it probably can't be done during work hours, but that's what seemed to fix it.
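For anyone following along, the procedure looks roughly like this (cpconfig is menu-driven, so this is an outline rather than exact commands; the menu wording varies by version):

cpconfig    # select the Check Point CoreXL option, disable it, then exit
reboot
# ...once the member is back up...
cpconfig    # select the Check Point CoreXL option, re-enable it, then exit
reboot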
Cheers,
Andy
Thanks for the reply.
Did you try this on a cluster system?
When one member has CoreXL disabled, which device will be the active one after the first reboot?
Just to be prepared 🙂
Every time I had to do this with customers, it was on a cluster. CoreXL state has nothing to do with cluster state: if you disable CoreXL and reboot the current master, the other member will become active, UNLESS you have preempt mode enabled, which I would not recommend due to the traffic issues when a failover happens.
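If it helps, you can watch which member is active before and after each reboot with the standard ClusterXL commands, e.g.:

# Run on both members; shows which one is Active / Standby
cphaprob state
# Cluster interface status after the reboot
cphaprob -a if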
Andy
Unfortunately, this didn't solve our problem. As we found out later, it was related to SecureXL traffic.
Disabling CoreXL had a catastrophic effect: going from 20 cores to 1, the machine was barely breathing.
Also, ClusterXL counts the available cores, and the machine with fewer cores becomes the active one.
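As a side note, the instance count on each member can be compared with the standard CoreXL status command, e.g.:

# Lists the CoreXL firewall instances and their CPU assignments; run on both
# members to see the mismatch while one of them has CoreXL disabled
fw ctl multik stat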
Thanks,
Dilian
I find that really odd, because I tested this exact scenario in my R81.20 lab (with a cluster) and never had that issue.
Andy