Jan_Kleinhans
Collaborator

R80.40 High CPU usage on snd cores

Hello,

We are on R80.40 Take 67 using VSX VSLS.

We noticed that our SND cores are constantly between 80 and 100%. This didn't change even when we added more cores to Multi-Queue; we changed Multi-Queue to 8 queues.

At the moment there are 2 cores at >80%. Is this normal behaviour? We didn't seem to have this issue with R80.30.

Best regards,

Jan

%Cpu0 : 0.0 us, 2.3 sy, 0.0 ni, 88.7 id, 0.0 wa, 0.7 hi, 8.3 si, 0.0 st
%Cpu1 : 0.0 us, 2.5 sy, 0.0 ni, 17.2 id, 0.0 wa, 0.4 hi, 80.0 si, 0.0 st
%Cpu2 : 0.0 us, 5.4 sy, 0.0 ni, 12.8 id, 0.0 wa, 0.4 hi, 81.4 si, 0.0 st
%Cpu3 : 0.0 us, 0.3 sy, 0.0 ni, 93.3 id, 0.0 wa, 0.3 hi, 6.0 si, 0.0 st
%Cpu4 : 10.1 us, 4.0 sy, 0.0 ni, 85.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 18.5 us, 4.0 sy, 0.0 ni, 76.8 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu6 : 11.0 us, 5.0 sy, 0.0 ni, 83.7 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu7 : 14.8 us, 4.9 sy, 0.0 ni, 79.9 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu8 : 20.7 us, 5.0 sy, 0.0 ni, 73.6 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu9 : 18.2 us, 4.7 sy, 0.0 ni, 76.4 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu10 : 11.0 us, 6.3 sy, 0.0 ni, 82.0 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu11 : 16.4 us, 5.4 sy, 0.0 ni, 77.5 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu12 : 14.9 us, 5.3 sy, 0.0 ni, 78.9 id, 0.0 wa, 0.3 hi, 0.7 si, 0.0 st
%Cpu13 : 13.7 us, 5.3 sy, 0.0 ni, 80.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu14 : 19.3 us, 6.6 sy, 0.0 ni, 73.1 id, 0.0 wa, 0.3 hi, 0.7 si, 0.0 st
%Cpu15 : 8.7 us, 5.0 sy, 0.0 ni, 86.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.3 sy, 0.0 ni, 92.7 id, 0.0 wa, 0.3 hi, 6.6 si, 0.0 st
%Cpu17 : 0.0 us, 0.3 sy, 0.0 ni, 91.0 id, 0.0 wa, 0.3 hi, 8.4 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni, 81.9 id, 0.0 wa, 0.7 hi, 17.4 si, 0.0 st
%Cpu19 : 0.0 us, 0.3 sy, 0.0 ni, 93.4 id, 0.0 wa, 0.3 hi, 6.0 si, 0.0 st
%Cpu20 : 7.9 us, 5.0 sy, 0.0 ni, 86.8 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu21 : 19.3 us, 3.7 sy, 0.0 ni, 76.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu22 : 7.6 us, 4.3 sy, 0.0 ni, 87.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu23 : 5.0 us, 3.0 sy, 0.0 ni, 92.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 12.6 us, 1.3 sy, 0.0 ni, 85.4 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu25 : 4.3 us, 2.7 sy, 0.0 ni, 93.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 11.0 us, 2.0 sy, 0.0 ni, 86.6 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu27 : 9.0 us, 1.7 sy, 0.0 ni, 89.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu28 : 26.2 us, 4.6 sy, 0.0 ni, 66.9 id, 0.0 wa, 0.3 hi, 2.0 si, 0.0 st
%Cpu29 : 16.7 us, 2.0 sy, 0.0 ni, 80.6 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu30 : 12.1 us, 3.6 sy, 0.0 ni, 83.0 id, 0.0 wa, 0.3 hi, 1.0 si, 0.0 st
%Cpu31 : 26.9 us, 3.7 sy, 0.0 ni, 67.8 id, 0.0 wa, 0.7 hi, 1.0 si, 0.0 st
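[Editor's note] A quick way to pick the busy SND cores out of `top` output like the above is to filter on the softirq (`si`) column, since SND/Multi-Queue work runs in softirq context. A minimal sketch (the sample is abbreviated to the first four CPUs; the 50% threshold is an arbitrary illustration):

```python
import re

# Sample lines from the `top` output above (abbreviated to four CPUs).
TOP_SAMPLE = """\
%Cpu0 : 0.0 us, 2.3 sy, 0.0 ni, 88.7 id, 0.0 wa, 0.7 hi, 8.3 si, 0.0 st
%Cpu1 : 0.0 us, 2.5 sy, 0.0 ni, 17.2 id, 0.0 wa, 0.4 hi, 80.0 si, 0.0 st
%Cpu2 : 0.0 us, 5.4 sy, 0.0 ni, 12.8 id, 0.0 wa, 0.4 hi, 81.4 si, 0.0 st
%Cpu3 : 0.0 us, 0.3 sy, 0.0 ni, 93.3 id, 0.0 wa, 0.3 hi, 6.0 si, 0.0 st
"""

def busy_snd_cores(top_output, si_threshold=50.0):
    """Return [(cpu, si_pct)] for CPUs whose softirq share exceeds the threshold."""
    busy = []
    for m in re.finditer(r"%Cpu(\d+)\s*:.*?([\d.]+) si", top_output):
        cpu, si = int(m.group(1)), float(m.group(2))
        if si > si_threshold:
            busy.append((cpu, si))
    return busy

print(busy_snd_cores(TOP_SAMPLE))  # [(1, 80.0), (2, 81.4)]
```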

 

[Expert@fw-inet-01:0]# mq_mng -o -vv
Total 32 cores. Multiqueue 8 cores: 0,16,1,17,2,18,3,19
i/f type state mode cores
------------------------------------------------------------------------------------------------
Mgmt igb Up Auto (2/2)* 0(58),0(182)
eth1-01 ixgbe Up Manual (8/8) 0(70),1(71),2(74),3(75),16(76)
,17(77),18(78),19(79)
eth1-02 ixgbe Up Manual (8/8) 0(81),1(82),2(83),3(89),16(90)
,17(91),18(92),19(93)
eth3-01 ixgbe Up Manual (8/8) 0(65),1(66),2(95),3(96),16(97)
,17(98),18(99),19(100)
eth3-02 ixgbe Up Manual (8/8) 0(67),1(68),2(102),3(103),16(1
04),17(105),18(112),19(113)
* Management interface
------------------------------------------------------------------------------------------------
Mgmt <igb> max 2 cur 2
07:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
------------------------------------------------------------------------------------------------
eth1-01 <ixgbe> max 16 cur 8
87:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
------------------------------------------------------------------------------------------------
eth1-02 <ixgbe> max 16 cur 8
87:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
------------------------------------------------------------------------------------------------
eth3-01 <ixgbe> max 16 cur 8
04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
------------------------------------------------------------------------------------------------
eth3-02 <ixgbe> max 16 cur 8
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

core interfaces queue irq rx packets tx packets
------------------------------------------------------------------------------------------------
0 eth3-02 eth3-02-TxRx-0 67 5066 1743166
eth3-01 eth3-01-TxRx-0 65 3103828 5212580
eth1-01 eth1-01-TxRx-0 70 4297313 4704791
eth1-02 eth1-02-TxRx-0 81 5924371 3096900
Mgmt Mgmt-TxRx-0 58 120897 260792
Mgmt-TxRx-1 182 24383 29745
1 eth3-02 eth3-02-TxRx-1 68 33 1595748
eth3-01 eth3-01-TxRx-1 66 370712775 370857057
eth1-01 eth1-01-TxRx-1 71 3439626 7456325
eth1-02 eth1-02-TxRx-1 82 6387267 1973380
2 eth3-02 eth3-02-TxRx-2 102 0 1022588
eth3-01 eth3-01-TxRx-2 95 363411661 363079854
eth1-01 eth1-01-TxRx-2 74 3832140 4880679
eth1-02 eth1-02-TxRx-2 83 4962696 2656322
3 eth3-02 eth3-02-TxRx-3 103 2038 1322402
eth3-01 eth3-01-TxRx-3 96 2782309 4946975
eth1-01 eth1-01-TxRx-3 75 3354513 4134583
eth1-02 eth1-02-TxRx-3 89 4446265 1362053
16 eth3-02 eth3-02-TxRx-4 104 2024 1312522
eth3-01 eth3-01-TxRx-4 97 4133164 4228302
eth1-01 eth1-01-TxRx-4 76 3947081 6611134
eth1-02 eth1-02-TxRx-4 90 5499293 1598963
17 eth3-02 eth3-02-TxRx-5 105 0 1653859
eth3-01 eth3-01-TxRx-5 98 3609174 5663060
eth1-01 eth1-01-TxRx-5 77 4522628 7028742
eth1-02 eth1-02-TxRx-5 91 5599895 1303251
18 eth3-02 eth3-02-TxRx-6 112 19 1588342
eth3-01 eth3-01-TxRx-6 99 22066519 23223659
eth1-01 eth1-01-TxRx-6 78 4633862 4358520
eth1-02 eth1-02-TxRx-6 92 6150507 1791513
19 eth3-02 eth3-02-TxRx-7 113 4519 2089422
eth3-01 eth3-01-TxRx-7 100 4126068 4193200
eth1-01 eth1-01-TxRx-7 79 4193682 4987312
eth1-02 eth1-02-TxRx-7 93 5728181 1414670

 

[Expert@fw-inet-01:0]# fw ctl affinity -l
VS_0 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_1 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_2 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_3 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_4 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_5 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_6 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_7 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_8 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_9 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_22 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_24 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
Interface Mgmt: has multi queue enabled
Interface Sync: has multi queue enabled
Interface eth3-01: has multi queue enabled
Interface eth3-02: has multi queue enabled
Interface eth1-01: has multi queue enabled
Interface eth1-02: has multi queue enabled

12 Replies
Ilya_Yusupov
Employee

Hi @Jan_Kleinhans ,

 

According to your outputs, we can see that eth3-01 carries the majority of the traffic:

 

eth3-01 eth3-01-TxRx-1 66 370712775 370857057

eth3-01 eth3-01-TxRx-2 95 363411661 363079854

 

eth3-01 ixgbe Up Manual (8/8) 0(65),1(66),2(95),3(96),16(97)
,17(98),18(99),19(100)

So from the above it looks like CPU 1 and CPU 2 are working hard.
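[Editor's note] For context on why exactly two cores end up hot: the NIC's RSS hashes each connection's address/port tuple to one fixed queue, so a high-volume connection stays pinned to the same queue, and the same SND core, for its whole lifetime. An illustrative sketch (a stand-in hash, not the actual Toeplitz function the ixgbe RSS uses):

```python
import hashlib

def rss_queue(src_ip, dst_ip, src_port, dst_port, n_queues=8):
    """Map a flow's tuple to a queue index (stand-in for the NIC's Toeplitz RSS hash)."""
    key = f"{src_ip},{dst_ip},{src_port},{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_queues

# Every packet of a given connection hashes to the same queue, so a heavy
# connection keeps exactly one queue (and its SND core) busy, however large it is.
q = rss_queue("10.1.1.1", "10.2.2.2", 50000, 443)
assert all(rss_queue("10.1.1.1", "10.2.2.2", 50000, 443) == q for _ in range(1000))

# Many distinct connections, by contrast, spread across the queues.
queues = {rss_queue("10.1.1.1", "10.2.2.2", p, 443) for p in range(50000, 50100)}
print(sorted(queues))
```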

Can you tell us which traffic is running on eth3-01? Is it different traffic than it was in R80.30?

 

Thanks,

Ilya 

Jan_Kleinhans
Collaborator

Hello,

Traffic is the same as it was before. eth3-01 is part of an LACP bond with eth1-01.

 

Thanks,

Jan

Ilya_Yusupov
Employee

Can you share "cat /proc/net/bonding/<bond-name>" and, from clish, "show bonding groups"?

Jan_Kleinhans
Collaborator

@Ilya_Yusupov 

 

[Expert@fw-inet-02:0]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:1c:7f:66:69:d8
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 32773
Partner Mac Address: 00:23:04:ee:be:02

Slave Interface: eth1-01
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:1c:7f:66:69:d8
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:1c:7f:66:69:d8
port key: 15
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:02
oper key: 32773
port priority: 32768
port number: 261
port state: 61

Slave Interface: eth3-01
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 4
Permanent HW addr: 00:1c:7f:66:42:1f
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:1c:7f:66:69:d8
port key: 15
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:02
oper key: 32773
port priority: 32768
port number: 16645
port state: 61
[Expert@fw-inet-02:0]#

 

[Expert@fw-inet-02:0]# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:1c:7f:66:69:d9
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 32782
Partner Mac Address: 00:23:04:ee:be:02

Slave Interface: eth1-02
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:1c:7f:66:69:d9
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:1c:7f:66:69:d9
port key: 15
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:02
oper key: 32782
port priority: 32768
port number: 270
port state: 61

Slave Interface: eth3-02
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 3
Permanent HW addr: 00:1c:7f:66:42:20
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:1c:7f:66:69:d9
port key: 15
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:02
oper key: 32782
port priority: 32768
port number: 16654
port state: 61

Timothy_Hall
Champion

Is your LACP bond supposed to be load sharing between eth3-01 and eth1-01? It isn't; it looks like pretty much everything is coming through eth3-01. Any Remote Access VPNs in use? If so, see here: sk165853: High CPU usage on one CPU core when the number of Remote Access users is high

Beyond that, you may have some heavy/elephant flows stuck to queues eth3-01-TxRx-1 and eth3-01-TxRx-2.
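[Editor's note] A toy simulation of the "stuck elephant flow" effect (flow names and packet counts are illustrative only): because a per-flow hash always sends the same flow to the same queue, one dominant flow reproduces the kind of per-queue imbalance visible in the mq_mng counters above.

```python
import zlib
from collections import Counter

N_QUEUES = 8

def flow_queue(flow_id):
    # Deterministic stand-in for the per-flow RSS hash: same flow -> same queue.
    return zlib.crc32(flow_id.encode()) % N_QUEUES

# Toy traffic mix: 100 small flows of 1,000 packets each, plus one
# elephant flow of 700,000 packets (numbers are illustrative only).
packets = Counter()
for i in range(100):
    packets[flow_queue(f"small-{i}")] += 1_000
packets[flow_queue("elephant")] += 700_000

# The elephant's queue ends up with the lion's share, just like TxRx-1/TxRx-2 above.
for q in range(N_QUEUES):
    print(f"queue {q}: {packets[q]:>7} packets")
```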

"Max Capture: Know Your Packets" Video Series
now available at http://www.maxpowerfirewalls.com
Jan_Kleinhans
Collaborator

Hello @Timothy_Hall 

here is the configuration of the bonds on Checkpoint and Cisco Nexus side:

 

add bonding group 1
add bonding group 2
add bonding group 1 interface eth1-01
add bonding group 1 interface eth3-01
add bonding group 2 interface eth1-02
add bonding group 2 interface eth3-02
set bonding group 1 mode 8023AD
set bonding group 1 lacp-rate slow
set bonding group 1 xmit-hash-policy layer3+4
set bonding group 2 mode 8023AD
set bonding group 2 lacp-rate slow
set bonding group 2 xmit-hash-policy layer3+4

interface Ethernet1/5
description fw-inet-01 eth1-01
switchport mode trunk
switchport trunk allowed vlan 1,640-641,850,904-905,907,909,913-915,918-919,95
0-952,954-955,959-960,963-964,983,991-992
spanning-tree port type edge trunk
channel-group 5 mode active
no shutdown

interface port-channel5
speed 10000
description fw-inet-01 eth1-01
switchport mode trunk
switchport trunk allowed vlan 1,640-641,850,904-905,907,909,913-915,918-919,95
0-952,954-955,959-960,963-964,983,991-992
spanning-tree port type edge trunk
vpc 5
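[Editor's note] As an aside on what xmit-hash-policy layer3+4 does with this config: the transmit hash folds IPs and ports into a slave index, so distinct sessions spread across the bond members, but any single flow, and in particular portless traffic such as an ESP tunnel, is pinned to one member. A simplified sketch of the idea (not the kernel's exact fold; see the Linux bonding documentation):

```python
import ipaddress

def l34_slave(src_ip, dst_ip, src_port, dst_port, n_slaves=2):
    """Simplified layer3+4 transmit hash: one flow always maps to one slave."""
    s = int(ipaddress.ip_address(src_ip))
    d = int(ipaddress.ip_address(dst_ip))
    h = (src_port ^ dst_port) ^ (s ^ d)
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves

# Distinct source ports spread sessions across both bond members...
print({l34_slave("192.0.2.1", "198.51.100.9", p, 443) for p in range(1024, 1040)})  # {0, 1}

# ...but portless traffic (e.g. an ESP tunnel, ports treated as 0) is
# pinned to a single member for its entire lifetime.
print(l34_slave("192.0.2.1", "198.51.100.9", 0, 0))
```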

 

We added the kernel parameter cphwd_medium_path_qid_by_mspi 0, but this didn't change anything.

As we have upgraded from R80.30, should we enable prioq, which is disabled at the moment?

Regards,

 

Jan

HeikoAnkenbrand
Champion

Hi @Jan_Kleinhans 

As described by @Ilya_Yusupov, you can see that the MQ CPUs are very busy. The ixgbe network card driver supports a maximum of 16 queues, and you are currently using 8. Since your CoreXL instances are not fully utilized, you can dedicate more cores to MQ. I would use a 12/20 (first step) or 16/16 (second step) core distribution here.

Ilya_Yusupov
Employee

I agree with @HeikoAnkenbrand that you can add more cores to MQ, but I also suspect, as @Timothy_Hall mentioned, that bond load sharing is not working well.

Jan_Kleinhans
Collaborator

We changed Multi-Queue to 12 cores. Now the 2 high CPUs are at 70%; the others stay at <20%.

LACP hashing is configured as src-ip-dst on the Cisco Nexus switches and layer3+4 on the Check Point side.
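[Editor's note] Worth keeping in mind here: each side's hash policy only governs the traffic it transmits, so the Nexus src-ip-dst hash decides which member your inbound traffic arrives on, and an IP-only hash sends every session between the same pair of addresses down the same link regardless of ports. A small sketch of that effect (illustrative hash, 2 links):

```python
import ipaddress

def src_dst_ip_link(src_ip, dst_ip, n_links=2):
    """IP-only hash, as with src-ip-dst on the Nexus side: ports are ignored."""
    h = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % n_links

# Any number of sessions between the same two hosts arrive on the same
# member, whatever their ports, so one busy host pair fills one link.
links = {src_dst_ip_link("203.0.113.5", "192.0.2.10") for _ in range(1000)}
print(len(links))  # 1
```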

 

Regards,

Jan

Jan_Kleinhans
Collaborator

Hello @HeikoAnkenbrand,

Thanks for your response. But why are only 2 MQ cores at high CPU? The other ones are at <20%.

 

Ilya_Yusupov
Employee

@Jan_Kleinhans ,

 

I will contact you offline for further investigation.

Alex_Gilis
Advisor

Hello, was there ever a fix for this issue? I'm now facing a situation where an R80.40 Take 89 VSX gateway has one SND core constantly around 100% while the others sit around 10%.
