Jan_Kleinhans
Advisor

Excessive RX-DRP Errors with new 19200 Appliance R81.20 T76 - rx_mbuf_allocation_errors

Hello.

Does anyone else experience high RX-DRP with the 19200 appliance in UPPAK mode?

We are migrating from 23800 appliances to 19200 appliances. We have a bond of two 10 Gb interfaces, which we are trying to run in the default UPPAK mode.

We experience heavy packet loss and see the outputs below. The value of rx_mbuf_allocation_errors is very high.

We also see these messages in usim_x86.elg:

Sep 16 09:54:00.743035 [uspace];[tid_20];[UPPAK];m_get: allocation failure
Sep 16 09:54:00.743042 [uspace];[tid_11];[UPPAK];cpfifo_port_allocate_mbufs: DPDK CPFIFO Out of mbuf
Sep 16 09:54:00.743053 [uspace];[tid_20];[UPPAK];cpfifo_port_allocate_mbufs: DPDK CPFIFO Out of mbuf
Sep 16 09:54:00.743062 [uspace];[tid_25];[UPPAK];m_get: allocation failure

Has anyone experienced the same problems?

fw-lan-02:0> show interface eth1-06
state on
mac-addr 00:1c:7f:47:d5:79
type ethernet
link-state link up
instance 0
mtu 1600
auto-negotiation on
speed 10G
ipv6-autoconfig Not configured
monitor-mode Not configured
duplex full
link-speed Not configured
comments
ipv4-address Not Configured
ipv6-address Not Configured
ipv6-local-link-address Not Configured

Statistics:
TX bytes:1032081410052 packets:1320654876 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:2080913191732 packets:1898913014 errors:0 dropped:469570566880 overruns:0 frame:0

SD-WAN: Not Configured
fw-lan-02:0> show interface eth1-05
state on
mac-addr 00:1c:7f:47:d5:79
type ethernet
link-state link up
instance 0
mtu 1600
auto-negotiation on
speed 10G
ipv6-autoconfig Not configured
monitor-mode Not configured
duplex full
link-speed Not configured
comments
ipv4-address Not Configured
ipv6-address Not Configured
ipv6-local-link-address Not Configured

Statistics:
TX bytes:867994980572 packets:1246089903 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:3210507271634 packets:2929561186 errors:0 dropped:325614418688 overruns:0 frame:0

fw-lan-02:0> show interface eth1-05 stats
NIC statistics:
ifs_ibytes_hi: 747
ifs_ibytes_lo: 2154174803
ifs_obytes_hi: 202
ifs_obytes_lo: 411585168
ifs_ipackets: 2929551259
ifs_opackets: 1246089890
ifs_imcasts: 0
ifs_omcasts: 0
ifs_noproto: 0
ifs_ibcasts: 0
ifs_obcasts: 0
ifs_linkchanges: 0
ife_ierrors: 0
ife_oerrors: 0
ife_iqdrops: 1351030752
ife_oqdrops: 0
iee_rx_missed: 0
rx_q0_packets: 251382073
rx_q1_packets: 135112608
rx_q2_packets: 108793630
rx_q3_packets: 99839085
rx_q4_packets: 401336685
rx_q5_packets: 170071613
rx_q6_packets: 99197122
rx_q7_packets: 85330872
rx_q8_packets: 247012275
rx_q9_packets: 184947473
rx_q10_packets: 58575962
rx_q11_packets: 128533076
rx_q12_packets: 188794790
rx_q13_packets: 88227332
rx_q14_packets: 15420806
rx_q15_packets: 11744531
rx_q16_packets: 86018139
rx_q17_packets: 181415839
rx_q18_packets: 11060951
rx_q19_packets: 13179181
rx_q20_packets: 103675380
rx_q21_packets: 152395019
rx_q22_packets: 6270141
rx_q23_packets: 9941505
rx_q24_packets: 31248572
rx_q25_packets: 28490326
rx_q26_packets: 6173843
rx_q27_packets: 4686960
rx_q28_packets: 9899463
rx_q29_packets: 9211374
rx_q30_packets: 1046387
rx_q31_packets: 518246
tx_q0_packets: 58670930
tx_q1_packets: 41331343
tx_q2_packets: 81738879
tx_q3_packets: 71306968
tx_q4_packets: 59565053
tx_q5_packets: 54689758
tx_q6_packets: 19504612
tx_q7_packets: 9201719
tx_q8_packets: 71205088
tx_q9_packets: 49443406
tx_q10_packets: 52837430
tx_q11_packets: 17829913
tx_q12_packets: 14668334
tx_q13_packets: 7148854
tx_q14_packets: 5049054
tx_q15_packets: 718883
tx_q16_packets: 47929073
tx_q17_packets: 38964650
tx_q18_packets: 102327176
tx_q19_packets: 64025125
tx_q20_packets: 63781423
tx_q21_packets: 113685215
tx_q22_packets: 23009375
tx_q23_packets: 6892868
tx_q24_packets: 39008854
tx_q25_packets: 51092222
tx_q26_packets: 32561960
tx_q27_packets: 17300861
tx_q28_packets: 14007626
tx_q29_packets: 11318509
tx_q30_packets: 4923407
tx_q31_packets: 351322
tx_q32_packets: 0
rx_good_packets: 2929551259
tx_good_packets: 1246089890
rx_good_bytes: 2154174803
tx_good_bytes: 411585168
rx_missed_errors: 349439
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 1351030752
rx_unicast_packets: 2928407049
rx_multicast_packets: 227793
rx_broadcast_packets: 1265876
rx_dropped_packets: 0
rx_unknown_protocol_packets: 20
tx_unicast_packets: 1246085440
tx_multicast_packets: 2114
tx_broadcast_packets: 2336
tx_dropped_packets: 0
tx_link_down_dropped: 0
rx_crc_errors: 0
rx_illegal_byte_errors: 0
rx_error_bytes: 0
mac_local_errors: 1
mac_remote_errors: 4
rx_len_errors: 0
tx_xon_packets: 0
rx_xon_packets: 0
tx_xoff_packets: 0
rx_xoff_packets: 0
rx_size_64_packets: 16848
rx_size_65_to_127_packets: 472689146
rx_size_128_to_255_packets: 114756770
rx_size_256_to_511_packets: 281422015
rx_size_512_to_1023_packets: 56548075
rx_size_1024_to_1522_packets: 2004467864
rx_size_1523_to_max_packets: 0
rx_undersized_errors: 0
rx_oversize_errors: 0
rx_mac_short_pkt_dropped: 0
rx_fragmented_errors: 0
rx_jabber_errors: 0
tx_size_64_packets: 18695856
tx_size_65_to_127_packets: 513383307
tx_size_128_to_255_packets: 81585332
tx_size_256_to_511_packets: 67269203
tx_size_512_to_1023_packets: 56242348
tx_size_1024_to_1522_packets: 508913844
tx_size_1523_to_max_packets: 0


fw-lan-02:0> show interface eth1-06 stats
NIC statistics:
ifs_ibytes_hi: 484
ifs_ibytes_lo: 2149301588
ifs_obytes_hi: 240
ifs_obytes_lo: 1289292410
ifs_ipackets: 1898913842
ifs_opackets: 1320655157
ifs_imcasts: 0
ifs_omcasts: 0
ifs_noproto: 0
ifs_ibcasts: 0
ifs_obcasts: 0
ifs_linkchanges: 0
ife_ierrors: 0
ife_oerrors: 0
ife_iqdrops: 1504863936
ife_oqdrops: 0
iee_rx_missed: 0
rx_q0_packets: 172579582
rx_q1_packets: 76303749
rx_q2_packets: 329678703
rx_q3_packets: 44063171
rx_q4_packets: 62528432
rx_q5_packets: 32901907
rx_q6_packets: 32105782
rx_q7_packets: 44094617
rx_q8_packets: 138524010
rx_q9_packets: 246588835
rx_q10_packets: 64388588
rx_q11_packets: 17778245
rx_q12_packets: 52839104
rx_q13_packets: 70066735
rx_q14_packets: 12962051
rx_q15_packets: 12331671
rx_q16_packets: 75711877
rx_q17_packets: 87279153
rx_q18_packets: 11239686
rx_q19_packets: 9997181
rx_q20_packets: 74352148
rx_q21_packets: 154468121
rx_q22_packets: 5606187
rx_q23_packets: 6953227
rx_q24_packets: 16999890
rx_q25_packets: 26051349
rx_q26_packets: 2846872
rx_q27_packets: 4501816
rx_q28_packets: 5036195
rx_q29_packets: 7462420
rx_q30_packets: 256202
rx_q31_packets: 416336
tx_q0_packets: 68053990
tx_q1_packets: 88242912
tx_q2_packets: 87581773
tx_q3_packets: 67629520
tx_q4_packets: 50364509
tx_q5_packets: 63239317
tx_q6_packets: 23207971
tx_q7_packets: 7366685
tx_q8_packets: 55568456
tx_q9_packets: 75073851
tx_q10_packets: 30182053
tx_q11_packets: 15754733
tx_q12_packets: 12537409
tx_q13_packets: 7374943
tx_q14_packets: 4010786
tx_q15_packets: 255981
tx_q16_packets: 71826289
tx_q17_packets: 53816038
tx_q18_packets: 92783569
tx_q19_packets: 51707007
tx_q20_packets: 54660427
tx_q21_packets: 64710197
tx_q22_packets: 25497062
tx_q23_packets: 10116747
tx_q24_packets: 102414788
tx_q25_packets: 62240084
tx_q26_packets: 30935190
tx_q27_packets: 14182433
tx_q28_packets: 14365301
tx_q29_packets: 9477528
tx_q30_packets: 4865081
tx_q31_packets: 612424
tx_q32_packets: 103
rx_good_packets: 1898913842
tx_good_packets: 1320655157
rx_good_bytes: 2149301588
tx_good_bytes: 1289292410
rx_missed_errors: 525223
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 1504864864
rx_unicast_packets: 1898601553
rx_multicast_packets: 141607
rx_broadcast_packets: 695905
rx_dropped_packets: 0
rx_unknown_protocol_packets: 0
tx_unicast_packets: 1320345103
tx_multicast_packets: 2129
tx_broadcast_packets: 307925
tx_dropped_packets: 0
tx_link_down_dropped: 0
rx_crc_errors: 0
rx_illegal_byte_errors: 0
rx_error_bytes: 0
mac_local_errors: 1
mac_remote_errors: 2
rx_len_errors: 0
tx_xon_packets: 0
rx_xon_packets: 0
tx_xoff_packets: 0
rx_xoff_packets: 0
rx_size_64_packets: 14981
rx_size_65_to_127_packets: 298458118
rx_size_128_to_255_packets: 112141659
rx_size_256_to_511_packets: 148274253
rx_size_512_to_1023_packets: 56594827
rx_size_1024_to_1522_packets: 1283955227
rx_size_1523_to_max_packets: 0
rx_undersized_errors: 0
rx_oversize_errors: 0
rx_mac_short_pkt_dropped: 0
rx_fragmented_errors: 0
rx_jabber_errors: 0
tx_size_64_packets: 32307730
tx_size_65_to_127_packets: 453244290
tx_size_128_to_255_packets: 87493234
tx_size_256_to_511_packets: 72191408
tx_size_512_to_1023_packets: 56382076
tx_size_1024_to_1522_packets: 619036419
tx_size_1523_to_max_packets: 0

11 Replies
emmap
Employee

This will likely require a TAC case for investigation and debugging. 

Chris_Atkinson
Employee

Out of interest has the machine been rebooted since the MTU value was configured higher than 1500 bytes?

For context see: PRJ-53892, PMTR-101528

CCSM R77/R80/ELITE
Jan_Kleinhans
Advisor

Yes, the system has been rebooted nearly a hundred times 🙂

Timothy_Hall
Legend

Starting in Gaia 3.10, the RX-DRP counter does not always mean what you think it means, as not all RX-DRPs are "legit".  A "legit" drop is the loss of an incoming frame, which the firewall wanted and needed to process, due to insufficient free space in the ring buffer.  Traffic bearing a non-IPv4 EtherType (or an unconfigured VLAN tag) will be discarded by the NIC and RX-DRP incremented, but that is traffic the firewall could not have processed anyway, so it is "not legit".  sk166424: Number of RX packet drops on interfaces increases on a Security Gateway R80.30 and higher ...
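
As an aside, the RX-DRP counter Timothy refers to is the RX-DRP column of `netstat -ni` on Gaia. A small sketch for listing interfaces with non-zero drops follows; the column position assumes the classic net-tools layout without the Met column, so verify against the header line on your own build:

```shell
# rx_drp_report: print interfaces with a non-zero RX-DRP counter.
# Column 5 assumes the classic net-tools `netstat -ni` layout
# (Iface MTU RX-OK RX-ERR RX-DRP ...); the header occupies two lines.
rx_drp_report() { awk 'NR > 2 && $5 > 0 { print $1, "RX-DRP:", $5 }'; }

# On the gateway you would run:  netstat -ni | rx_drp_report
# Demo against a mock line built from the eth1-05 figures in this thread:
printf '%s\n' \
  'Kernel Interface table' \
  'Iface  MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg' \
  'eth1-05 1600 2929561186 0 325614418688 0 1246089903 0 0 0 BMRU' \
  | rx_drp_report
# -> eth1-05 RX-DRP: 325614418688
```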

Looking at your output, there is no way all the RX-DRPs reported for eth1-05 are legit, as that would put the frame loss rate beyond 100%.  In the stats for eth1-05, only these two counters are actually "legit" buffering drops/misses:

ife_iqdrops: 1351030752
rx_missed_errors: 349439

But even considering just these, the frame loss rate is still impossibly high.  This is either some kind of bug (possibly involving UPPAK, which has its tendrils deep in the NIC driver), or someone has tampered with the default Multi-Queue settings; they should be set to Dynamic/Auto for all interfaces.  You can check this with mq_mng --show
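
Plugging in the figures from the eth1-05 output above makes the point concrete (a quick awk sketch; all numbers are copied from the thread):

```shell
# Sanity check of the loss rates using the eth1-05 counters posted above.
awk 'BEGIN {
  dropped = 325614418688   # RX dropped (RX-DRP) for eth1-05
  rx      = 2929561186     # RX packets for eth1-05
  iqdrops = 1351030752     # ife_iqdrops
  missed  = 349439         # rx_missed_errors
  good    = 2929551259     # rx_good_packets
  printf "RX-DRP vs packets received: %.0fx\n", dropped / rx
  printf "legit loss rate: %.1f%%\n", 100 * (iqdrops + missed) / (iqdrops + missed + good)
}'
# -> RX-DRP vs packets received: 111x
# -> legit loss rate: 31.6%
```

Roughly 111 drops per packet actually received cannot all be legit; even the "legit" counters alone imply a loss rate around 31%, which is still implausibly high for a standby gateway.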

If you suspect RX-DRP is constantly incrementing because the interface is being splattered with some kind of "not legit" traffic, this is very easy to verify.  Simply run a tcpdump on the interface in question (the filter doesn't matter): do the RX-DRPs stop incrementing?  Do they resume when the tcpdump is stopped?  If so, the interface is indeed being hit with "not legit" traffic that drives RX-DRP up.
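
A rough sketch of that check from expert mode (the interface name and timing are examples, and the rx_drops helper is just an illustrative parser, demonstrated here against the eth1-05 line quoted earlier):

```shell
# Helper to pull the RX "dropped" counter out of interface output:
rx_drops() { grep -oE 'dropped:[0-9]+' | head -1 | cut -d: -f2; }

# Parsing check against the eth1-05 RX line quoted earlier in the thread:
echo "RX bytes:3210507271634 packets:2929561186 errors:0 dropped:325614418688 overruns:0 frame:0" | rx_drops
# -> 325614418688

# On the gateway the test would look roughly like this:
#   before=$(ifconfig eth1-05 | grep 'RX.*dropped' | rx_drops)
#   tcpdump -ni eth1-05 -w /dev/null &   # promiscuous mode: NIC accepts all EtherTypes
#   sleep 60; kill %1
#   after=$(ifconfig eth1-05 | grep 'RX.*dropped' | rx_drops)
#   echo $((after - before))             # near zero => the drops are "not legit" traffic
```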

Beyond that the TAC will need to get involved.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Jan_Kleinhans
Advisor

Hello,

thanks for sharing your knowledge. Multi-Queue is configured Dynamic/Auto. At the moment the appliance is standby only, so there is no traffic on these interfaces. Since we experience the problems when the machine becomes active, I can only try your tcpdump test in a maintenance window.

A TAC case has been open since yesterday morning, but there is no progress. The only "suggestion" so far is to increase the ring buffer size, which does not seem to be possible with UPPAK; if I try, nothing happens.

Have you seen rx_mbuf_allocation_errors before?

Regards,

Jan

Timothy_Hall
Legend

My understanding is that mbufs are used to store frame data, which is then referred to by a descriptor stored in the ring buffer.

Depending on the size of the frame, multiple mbufs may be needed to store it.  I'm wondering if your non-standard MTU of 1600 causes an extra mbuf to be allocated for max-size frames, beyond what would normally be needed for a standard 1500 MTU, and UPPAK has a problem with this, perhaps not properly or promptly freeing that extra mbuf until they run out.  However, UPPAK supposedly supports an MTU of up to 2002: sk181250: HCP report shows "Jumbo Frame Error" when SecureXL works in the User Space (UPPAK) mode  It might be interesting to see what the HCP Jumbo Frame test says on your gateway, or whether any of the other HCP tests complain about anything.
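
Purely to illustrate that hypothesis: the number of mbufs a frame needs is roughly ceil(frame size / per-mbuf data room).  The 1536-byte data room below is an assumed figure, not a known UPPAK or DPDK value; it just shows how a 1600-byte MTU could tip max-size frames into needing a second mbuf while 1500-byte frames need only one:

```shell
# Hypothetical illustration: mbufs per frame = ceil(frame / data room).
awk 'BEGIN {
  room = 1536                               # HYPOTHETICAL per-mbuf data room
  for (mtu = 1500; mtu <= 1600; mtu += 100) {
    frame = mtu + 18                        # + Ethernet header and FCS
    segs  = int((frame + room - 1) / room)  # ceil(frame / room)
    printf "MTU %d -> %d-byte frame -> %d mbuf(s)\n", mtu, frame, segs
  }
}'
# -> MTU 1500 -> 1518-byte frame -> 1 mbuf(s)
# -> MTU 1600 -> 1618-byte frame -> 2 mbuf(s)
```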

I would be curious to see what happens if you set the MTU back to the standard 1500, as I suspect that will fix your problem.
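
For reference, a sketch of reverting the MTU in Gaia clish (interface names taken from this thread; if the interfaces are bond members, the MTU may need to be changed on the bond interface instead, so confirm the exact procedure for your setup):

```
set interface eth1-05 mtu 1500
set interface eth1-06 mtu 1500
save config
```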

Jan_Kleinhans
Advisor

Changing the MTU back to the default 1500 did help. We are temporarily using another gateway, so it doesn't hurt us at the moment, but I hope TAC will have a solution for the problem.

Timothy_Hall
Legend

Thanks for the follow-up; that seems to have confirmed my suspicions.  User Space SecureXL (UPPAK) is a pretty radical change in the implementation of SecureXL, on par with the major changes made to SecureXL in R80.20.  There were definitely some growing pains when that happened in R80.20, and it seems the same is happening with UPPAK, which I suspect is related to your problem.

One other option is to ask the TAC whether SecureXL can be set back to the traditional KPPAK mode on your system and see if that helps, but this may not be possible or supported for 9000/19000/29000/Lightspeed appliances.

emmap
Employee

It's supported, you can change the mode in the cpconfig menu.

https://support.checkpoint.com/results/sk/sk153832#TOC05

Jan_Kleinhans
Advisor

Yes, that would be possible, but then the acceleration feature of our 100GbE ports wouldn't work. They are not in use at the moment, but they will be in the near future.

emmap
Employee

Traffic will still be accelerated via the 'traditional' SecureXL method, but yes, it won't use the new hardware capabilities. Ideally we need to resolve this, which will require investigation and remediation via TAC. Please let us know how you go with your TAC case.

