T_Sonnberger
Contributor

Help wanted - troubleshooting potential interface / performance issues

Dear CPUG,

A few days ago, we saw a lot of buffering during an online meeting in our office.

After some investigation, our network team found output drops and pause inputs (see below) on the internet switch port that connects to our external firewall interface. These might be caused by the firewall being unable to process the traffic.

 

We run R80.30 on a 5000 Appliance with 32 GB Memory, running at ~40% CPU

The peak bandwidth we saw that day while the buffering happened was 1.5 Gbit/s, on a 10 Gbit/s line with 5 Gbit/s internet bandwidth.

So I cannot imagine that the firewall is unable to handle this. However, I am looking for ways to prove that the firewall is not the cause.

 

I do not see any errors on said firewall interface in clish "show interface"


Statistics:
TX bytes:14422795901250 packets:16529204292 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:10209799558665 packets:15177341589 errors:0 dropped:0 overruns:0 frame:0

 

Are there any commands which might show another issue?

 

Thanks in advance and BR,

Thomas

 

--- SWITCH INTERFACE ---

#sh int Te1/1/4
TenGigabitEthernet1/1/4 is up, line protocol is up (connected)
  Hardware is Ten Gigabit Ethernet, address is 
  Description: 
  MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 19/255, rxload 33/255
  Encapsulation ARPA, loopback not set
  Keepalive not set
  Full-duplex, 10Gb/s, link type is auto, media type is SFP-10GBase-SR
  input flow-control is on, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:01, output hang never
  Last clearing of "show interface" counters 23:56:12
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 86 ---------
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  30 second input rate 1302386000 bits/sec, 169485 packets/sec
  30 second output rate 750619000 bits/sec, 139352 packets/sec
     5718861587 packets input, 5274589864997 bytes, 0 no buffer
     Received 60170 broadcasts (1433 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 1433 multicast, 46 pause input -----------------------------------------------
     0 input packets with dribble condition detected
     4876618459 packets output, 3279140072750 bytes, 0 underruns
     Output 179073 broadcasts (0 multicasts)
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

the_rock
Legend

You can try the ethtool command, and cpview will also give you lots of good info. Is there a particular interface you are having issues with?

 

Andy

Chris_Atkinson
Employee

Which model of 5000 series appliance is it, and is Multi-Queue enabled?

Disabling flow-control is also something that can be investigated pending other findings.

CCSM R77/R80/ELITE
T_Sonnberger
Contributor

Hi Chris,

 

It's

Platform: PL-20-00
Model: Check Point 5600

Multi-Queue seems to be disabled

Timothy_Hall
Legend

Please provide output of enabled_blades and the "Super Seven" commands.  Hopefully the firewall has not been rebooted since you experienced the issues so we can see what happened.

https://community.checkpoint.com/t5/Scripts/S7PAC-Super-Seven-Performance-Assessment-Commands/m-p/40...

The router counters you highlighted might be significant, but we need to look at the firewall first as those router counter values might as well be a single drop of water in a big lake.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
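
To put the switch counters into proportion, the drops and pause frames can be compared against the total packet counts from the `sh int Te1/1/4` output in the original post. The following is a minimal Python sketch that uses only figures posted in this thread; the constant and function names are illustrative and not part of any tool.

```python
# Counters copied from the "sh int Te1/1/4" output in the original post.
LINK_BPS = 10_000_000 * 1_000     # "BW 10000000 Kbit/sec"
INPUT_RATE_BPS = 1_302_386_000    # "30 second input rate"
OUTPUT_RATE_BPS = 750_619_000     # "30 second output rate"
PACKETS_INPUT = 5_718_861_587     # "packets input"
PACKETS_OUTPUT = 4_876_618_459    # "packets output"
OUTPUT_DROPS = 86                 # "Total output drops"
PAUSE_INPUT = 46                  # "pause input"
RXLOAD, TXLOAD = 33, 19           # "rxload 33/255", "txload 19/255"


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


input_util = percent(INPUT_RATE_BPS, LINK_BPS)
output_util = percent(OUTPUT_RATE_BPS, LINK_BPS)
rxload_pct = percent(RXLOAD, 255)
txload_pct = percent(TXLOAD, 255)
drop_pct = percent(OUTPUT_DROPS, PACKETS_OUTPUT)
pause_pct = percent(PAUSE_INPUT, PACKETS_INPUT)

print(f"input utilisation : {input_util:.2f}% (rxload: {rxload_pct:.2f}%)")
print(f"output utilisation: {output_util:.2f}% (txload: {txload_pct:.2f}%)")
print(f"output drops      : {drop_pct:.7f}% of packets output")
print(f"pause input frames: {pause_pct:.7f}% of packets input")
```

These ratios cover the whole period since the counters were last cleared (23:56:12 in the post), so they say nothing about whether the drops were concentrated in a short burst during the meeting.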
T_Sonnberger
Contributor

Thank you all for the replies! I really appreciate the support.

 

The exact Model is:

Platform: PL-20-00
Model: Check Point 5600

The interface in question is eth1-03. However, it should carry much less traffic than eth1-02 and eth1-01, which face the internal network or DMZs.

Here is the output from the firewall that was active when the issue happened (gw5). Unfortunately, the firewall failed over soon after the event, so I have also included the output from the currently active firewall (gw6).

 

[Expert@gw5:0]# enabled_blades
fw vpn urlf av appi ips identityServer SSL_INSPECT anti_bot vpn

[Expert@gw5:0]# fwaccel stat
+-----------------------------------------------------------------------------+
|Id|Name |Status |Interfaces |Features |
+-----------------------------------------------------------------------------+
|0 |SND |enabled |eth2,eth3,eth4,Sync,Mgmt,|
| | | |eth1-01,eth1-02,eth1-03 |Acceleration,Cryptography |
| | | | |Crypto: Tunnel,UDPEncap,MD5, |
| | | | |SHA1,NULL,3DES,DES,CAST, |
| | | | |CAST-40,AES-128,AES-256,ESP, |
| | | | |LinkSelection,DynamicVPN, |
| | | | |NatTraversal,AES-XCBC,SHA256 |
+-----------------------------------------------------------------------------+

Accept Templates : disabled by Firewall
Layer Network disables template offloads from rule #13
Throughput acceleration still enabled.
Drop Templates : disabled
NAT Templates : disabled by Firewall
Layer Network disables template offloads from rule #13
Throughput acceleration still enabled.

[Expert@gw5:0]# fwaccel stats -s
Accelerated conns/Total conns : 0/0 (0%)
Accelerated pkts/Total pkts : 79169306167/98610388965 (80%)
F2Fed pkts/Total pkts : 8275669186/98610388965 (8%)
F2V pkts/Total pkts : 278906653/98610388965 (0%)
CPASXL pkts/Total pkts : 37397/98610388965 (0%)
PSLXL pkts/Total pkts : 11165376215/98610388965 (11%)
QOS inbound pkts/Total pkts : 0/98610388965 (0%)
QOS outbound pkts/Total pkts : 0/98610388965 (0%)
Corrected pkts/Total pkts : 0/98610388965 (0%)

[Expert@gw5:0]# grep -c ^processor /proc/cpuinfo
4

[Expert@gw5:0]# fw ctl affinity -l -r
CPU 0: eth2 eth3 eth4 Sync Mgmt eth1-01 eth1-02 eth1-03
CPU 1: fw_2
usrchkd in.acapd cprid rad lpd topod wsdnsd mpdaemon vpnd pepd in.asessiond fwd pdpd in.pingd cpd cprid
CPU 2: fw_1
usrchkd in.acapd cprid rad lpd topod wsdnsd mpdaemon vpnd pepd in.asessiond fwd pdpd in.pingd cpd cprid
CPU 3: fw_0
usrchkd in.acapd cprid rad lpd topod wsdnsd mpdaemon vpnd pepd in.asessiond fwd pdpd in.pingd cpd cprid
All:

[Expert@gw5:0]# netstat -ni
Kernel Interface table
Iface      MTU  Met        RX-OK  RX-ERR   RX-DRP  RX-OVR        TX-OK  TX-ERR  TX-DRP  TX-OVR  Flg
Mgmt      1500    0     80129483       0        0       0    107954387       0       0       0  BMRU
Sync      1500    0    137406541       0        0       0    217289236       0       0       0  BMRU
eth1-01   1500    0  18533478320       0  1030150       0  12882294670       0       0       0  BMRU
eth1-02   1500    0  37678299521       0   241708       0  37142992315       0       0       0  BMRU
eth1-03   1500    0  25318615396       0        0       0  27537390124       0       0       0  BMRU
eth2      1500    0   4768790562       2     2329    2329   8472705408       0       0       0  BMRU
eth3      1500    0    862009675       0        0       0   1102186634       0       0       0  BMRU

[Expert@gw5:0]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 3 | 7636 | 58884
1 | Yes | 2 | 7538 | 56644
2 | Yes | 1 | 7756 | 58696


[Expert@gw5:0]# cpstat os -f multi_cpu -o 1

 

Processors load
---------------------------------------------------------------------------------
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
---------------------------------------------------------------------------------
| 1| 0| 0| 100| 0| ?| 1325|
| 2| 0| 1| 99| 1| ?| 1325|
| 3| 1| 1| 98| 2| ?| 1325|
| 4| 1| 2| 97| 3| ?| 1325|
---------------------------------------------------------------------------------

======================================================================================================

 

[Expert@gw6:0]# fwaccel stat
+-----------------------------------------------------------------------------+
|Id|Name |Status |Interfaces |Features |
+-----------------------------------------------------------------------------+
|0 |SND |enabled |eth2,eth3,eth4,Sync,Mgmt,|
| | | |eth1-01,eth1-02,eth1-03 |Acceleration,Cryptography |
| | | | |Crypto: Tunnel,UDPEncap,MD5, |
| | | | |SHA1,NULL,3DES,DES,CAST, |
| | | | |CAST-40,AES-128,AES-256,ESP, |
| | | | |LinkSelection,DynamicVPN, |
| | | | |NatTraversal,AES-XCBC,SHA256 |
+-----------------------------------------------------------------------------+

Accept Templates : disabled by Firewall
Layer Network disables template offloads from rule #13
Throughput acceleration still enabled.
Drop Templates : disabled
NAT Templates : disabled by Firewall
Layer Network disables template offloads from rule #13
Throughput acceleration still enabled.
[Expert@gw6:0]# fwaccel stats -s
Accelerated conns/Total conns : 16862/26062 (64%)
Accelerated pkts/Total pkts : 52099672116/65805199783 (79%)
F2Fed pkts/Total pkts : 7033490038/65805199783 (10%)
F2V pkts/Total pkts : 202645903/65805199783 (0%)
CPASXL pkts/Total pkts : 29425/65805199783 (0%)
PSLXL pkts/Total pkts : 6672008204/65805199783 (10%)
QOS inbound pkts/Total pkts : 0/65805199783 (0%)
QOS outbound pkts/Total pkts : 0/65805199783 (0%)
Corrected pkts/Total pkts : 0/65805199783 (0%)
[Expert@gw6:0]# grep -c ^processor /proc/cpuinfo
4
[Expert@gw6:0]# fw ctl affinity -l -r
CPU 0: eth2 eth3 eth4 Sync Mgmt eth1-01 eth1-02 eth1-03
CPU 1: fw_2
mpdaemon pepd pdpd rad topod in.acapd in.asessiond lpd in.pingd cprid fwd usrchkd vpnd wsdnsd cprid cpd
CPU 2: fw_1
mpdaemon pepd pdpd rad topod in.acapd in.asessiond lpd in.pingd cprid fwd usrchkd vpnd wsdnsd cprid cpd
CPU 3: fw_0
mpdaemon pepd pdpd rad topod in.acapd in.asessiond lpd in.pingd cprid fwd usrchkd vpnd wsdnsd cprid cpd
All:
[Expert@gw6:0]# netstat -ni
Kernel Interface table
Iface      MTU  Met        RX-OK  RX-ERR   RX-DRP  RX-OVR        TX-OK  TX-ERR  TX-DRP  TX-OVR  Flg
Mgmt      1500    0     59443411       0        0       0     80442447       0       0       0  BMRU
Sync      1500    0    221146723       0      213     213    173510428       0       0       0  BMRU
eth1-01   1500    0  12325817578       0   547458       0   8615993166       0       0       0  BMRU
eth1-02   1500    0  25981253604       0    89046       0  25408674761       0       0       0  BMRU
eth1-03   1500    0  16681713676       0        0       0  18166382645       0       0       0  BMRU
eth2      1500    0   3596415052       0      121     121   6063612802       0       0       0  BMRU
eth3      1500    0    656401340       0        0       0    789934560       0       0       0  BMRU


[Expert@gw6:0]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 3 | 8889 | 54242
1 | Yes | 2 | 8902 | 52141
2 | Yes | 1 | 9096 | 54035
[Expert@gw6:0]# cpstat os -f multi_cpu -o 1

 

Processors load
---------------------------------------------------------------------------------
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
---------------------------------------------------------------------------------
| 1| 0| 15| 85| 15| ?| 65868|
| 2| 2| 12| 86| 14| ?| 65868|
| 3| 2| 11| 86| 14| ?| 65868|
| 4| 4| 17| 79| 21| ?| 65868|
---------------------------------------------------------------------------------

 

Thank you so much in advance!

 

BR,

Thomas
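
The `netstat -ni` counters above are easier to compare as a drop percentage per interface. The following is a rough Python sketch that parses the gw5 table as posted and divides RX-DRP by RX-OK. The function name `rx_drop_percent` is illustrative, and whether a given percentage is acceptable is not something this thread settles.

```python
# "netstat -ni" output from gw5, copied from the post above.
NETSTAT_GW5 = """\
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
Mgmt 1500 0 80129483 0 0 0 107954387 0 0 0 BMRU
Sync 1500 0 137406541 0 0 0 217289236 0 0 0 BMRU
eth1-01 1500 0 18533478320 0 1030150 0 12882294670 0 0 0 BMRU
eth1-02 1500 0 37678299521 0 241708 0 37142992315 0 0 0 BMRU
eth1-03 1500 0 25318615396 0 0 0 27537390124 0 0 0 BMRU
eth2 1500 0 4768790562 2 2329 2329 8472705408 0 0 0 BMRU
eth3 1500 0 862009675 0 0 0 1102186634 0 0 0 BMRU
"""


def rx_drop_percent(netstat_text):
    """Map each interface name to RX-DRP as a percentage of RX-OK."""
    rows = [line.split() for line in netstat_text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    ok_col = header.index("RX-OK")
    drp_col = header.index("RX-DRP")
    result = {}
    for row in data:
        rx_ok = int(row[ok_col])
        rx_drp = int(row[drp_col])
        # Avoid dividing by zero on an interface that has seen no traffic.
        result[row[0]] = 100.0 * rx_drp / rx_ok if rx_ok else 0.0
    return result


rates = rx_drop_percent(NETSTAT_GW5)
for iface, pct in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{iface:<8} {pct:.6f}%")
```

In the posted gw5 counters, eth1-03 shows no RX drops at all, while eth1-01 has the highest ratio.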

Timothy_Hall
Legend

About 80% of your traffic is fully accelerated but with your 1/3 default split only one CPU can be used to process that 80% of your traffic.  The 40% total CPU number is averaging all 4 together, and probably does not reflect CPU 0 getting killed as the only SND.  Multi-Queue is not possible with only 1 SND.

Would suggest moving to a 2/2 split by reducing instances from 3 to 2 which would allow Multi-Queue to be used; you'll need to manually enable Multi-Queue on your busy interfaces after the core split change unless you are using the Gaia 3.10 version of R80.30.
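
The averaging point can be illustrated with a short Python sketch. The accelerated share is computed from the gw5 `fwaccel stats -s` counters posted above. The per-core loads are HYPOTHETICAL values chosen only to show how a roughly 40% average can coexist with one nearly saturated SND core; they were not measured on this gateway.

```python
# Packet counters copied from "fwaccel stats -s" on gw5 in this thread.
ACCELERATED_PKTS = 79_169_306_167
TOTAL_PKTS = 98_610_388_965

accelerated_pct = 100.0 * ACCELERATED_PKTS / TOTAL_PKTS
print(f"accelerated share: {accelerated_pct:.1f}%")

# HYPOTHETICAL per-core loads (percent) for a 1/3 split: one SND core and
# three firewall instances. These numbers are made up for illustration.
hypothetical_load = {
    "CPU 0 (SND)": 94,
    "CPU 1 (fw_2)": 22,
    "CPU 2 (fw_1)": 22,
    "CPU 3 (fw_0)": 22,
}

average_load = sum(hypothetical_load.values()) / len(hypothetical_load)
busiest_core = max(hypothetical_load, key=hypothetical_load.get)

print(f"average over all cores: {average_load:.0f}%")
print(f"busiest core: {busiest_core} at {hypothetical_load[busiest_core]}%")
```

Per-core figures collected during the busy period, rather than an overall average, would be needed to confirm or rule this out.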

T_Sonnberger
Contributor

Dear Timothy,

thank you very much for the explanation and support!

 

We have an upgrade to R81.10 scheduled for tomorrow and will upgrade the firewalls (fresh install with the 3.10 kernel) afterwards as well.

I will apply those changes then!

 

BR,

Thomas

Chris_Atkinson
Employee

Normally multi-queue would be enabled by default with the new kernel post upgrade.

But please check it as Tim has described above with respect to the SND assignment/allocation. 

Timothy_Hall
Legend

R81.10 will enable Multi-Queue by default on all interfaces that support it, and will also utilize Dynamic Split so it should adjust to a 2/2 split all on its own.

