kamilazat
Contributor

iperf test speeds are different on internal and external for QoS testing

Hello all!

 

I've been playing with QoS to understand what goes on and how. Then I noticed some weird behavior.

Here's my simple setup (all R81.20):

PC1 (192.168.1.0/24) -- GW1 ---(10.0.0.0/24)--- GW2 -- PC2 (192.168.4.0/24)

Everything is in an isolated VMware environment, so I don't care about security or systems breaking.

 

When I do iperf3 from PC1 to GW1 I'm getting speeds up to 1 Gbps. But when I do the same thing between GW1 and GW2 I get a maximum of 315 Mbps.

At first I thought I wasn't setting QoS up properly. But when I set a rule with a 100 Mbps limit, everything works with a maximum of 100 Mbps. I tried parallel connections with iperf3 -c x.x.x.x -P 4, and with different -P values, but the result is always the same.
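For reference, the tests look roughly like this (the address is just a placeholder for my lab IPs):

# server side (GW1 in this test)
iperf3 -s

# client side: -P sets the number of parallel streams, -t the duration in seconds
iperf3 -c 10.0.0.1 -P 4 -t 30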

I looked at fw ctl chain to see if anything stood out and noticed that the inbound and outbound chain module names for QoS are different.

[Expert@GW1:0]# fw ctl chain
in chain (22):
0: -7fffffff (0000000000000000) (00000000) SecureXL stateless check (sxl_state_check)
1: -7ffffffe (0000000000000000) (00000000) SecureXL VPN before decryption (vpn_in_before_decrypt)
2: -7ffffffd (0000000000000000) (00000000) SecureXL VPN after decryption (vpn_in_after_decrypt)
3: 6 (0000000000000000) (00000000) SecureXL lookup (sxl_lookup)
4: 7 (0000000000000000) (00000000) SecureXL QOS inbound (sxl_qos_inbound)
5: 8 (0000000000000000) (00000000) SecureXL inbound (sxl_inbound)
6: 9 (0000000000000000) (00000000) SecureXL medium path streaming (sxl_medium_path_streaming)
7: 10 (0000000000000000) (00000000) SecureXL inline path streaming (sxl_inline_path_streaming)
8: 11 (0000000000000000) (00000000) SecureXL Routing (sxl_routing)
9: -7f800000 (00007fd748ef22e2) (ffffffff) IP Options Strip (in) (ipopt_strip)
10: - 1fffff8 (00007fd748ef2f40) (00000001) Stateless verifications (in) (asm)
11: - 1fffff7 (00007fd748ef23c0) (00000001) fw multik misc proto forwarding
12: 0 (00007fd748f0fb40) (00000001) fw VM inbound (fw)
13: 2 (00007fd748ef2a80) (00000001) fw SCV inbound (scv)
14: 4 (00007fd746bbff30) (00000003) QoS inbound offload chain module
15: 5 (00007fd748f26dc0) (00000003) fw offload inbound (offload_in)
16: 20 (00007fd748f26080) (00000001) fw post VM inbound (post_vm)
17: 100000 (00007fd748f05196) (00000001) fw accounting inbound (acct)
18: 22000000 (00007fd746bc1ab0) (00000003) QoS slowpath inbound chain mod (fg_sched)
19: 7f730000 (00007fd748ef324e) (00000001) passive streaming (in) (pass_str)
20: 7f750000 (00007fd748f43940) (00000001) TCP streaming (in) (cpas)
21: 7f800000 (00007fd748ef25c8) (ffffffff) IP Options Restore (in) (ipopt_res)
out chain (16):
0: -7f800000 (00007fd748ef22e2) (ffffffff) IP Options Strip (out) (ipopt_strip)
1: - 1fffff0 (00007fd748ef2dc0) (00000001) TCP streaming (out) (cpas)
2: - 1ffff50 (00007fd748ef324e) (00000001) passive streaming (out) (pass_str)
3: - 1f00000 (00007fd748ef2f40) (00000001) Stateless verifications (out) (asm)
4: 0 (00007fd748f0fb40) (00000001) fw VM outbound (fw)
5: 10 (00007fd748f26080) (00000001) fw post VM outbound (post_vm)
6: 15000000 (00007fd746bc0590) (00000003) QoS outbound offload chain modul (fg_pol)
7: 21000000 (00007fd746bc1ab0) (00000003) QoS slowpath outbound chain mod (fg_sched)
8: 7f000000 (00007fd748f05196) (00000001) fw accounting outbound (acct)
9: 7f700000 (00007fd748ef95d8) (00000001) TCP streaming post VM (cpas)
10: 7f800000 (00007fd748ef25c8) (ffffffff) IP Options Restore (out) (ipopt_res)
11: 7f900000 (0000000000000000) (00000000) SecureXL outbound (sxl_outbound)
12: 7fa00000 (0000000000000000) (00000000) SecureXL QOS outbound (sxl_qos_outbound)
13: 7fb00000 (0000000000000000) (00000000) SecureXL VPN before encryption (vpn_in_before_encrypt)
14: 7fc00000 (0000000000000000) (00000000) SecureXL VPN after encryption (vpn_in_after_encrypt)
15: 7fd00000 (0000000000000000) (00000000) SecureXL Deliver (sxl_deliver)

Maybe it is because the internal networks are set as "Internal" and the network between the gateways is set as "External", and the gateways are doing something that slows down the connection. I'm pretty sure I'm just missing something here and everything is working as expected.

So I would deeply appreciate it if anyone could enlighten me on this.

Cheers!

AkosBakos
Advisor

Hi @kamilazat 

  • Which blades are enabled on the GWs? Enabled blades degrade throughput; this is a known behaviour.
  • Are the CPUs highly utilized?
    • What is the CPU utilization when you test against GW1 directly, and what is it when you reach GW2 through GW1?
  • And the throughput values in both cases.
    • Though you said there is no bottleneck in your demo lab.

I think this is where the dog is buried.

The cpview command is your friend; monitor the values during the iperf test.
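Roughly like this; cpview is interactive, so just keep it open in a second terminal while iperf runs (the -t history mode is from memory, check whether it is available on your build):

cpview        # watch the Overview, CPU and Network views during the test
cpview -t     # history mode, to scroll back through the samples afterwards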

Akos

 

 

kamilazat
Contributor

Thank you!

1. Only Firewall, QoS, and Monitoring are enabled on both gateways. I enabled Monitoring later to try to find something out.

2. CPUs are highly utilized in both GW-GW and PC-GW scenarios. Here they are:

GW-GW (GW1 is server, cpview from it)

[screenshots attached: gw2gw.png, iperfgw2gw.png]

 

PC-GW (GW1 is again server)

[screenshots attached: pc2gw.png, iperfpc2gw.png]

The values are pretty close. All machines are vanilla installations, except GW1 has Take 76 and GW2 has no JHF. But I'm not sure if that's the issue; I'm updating it as well now anyway.

It's a VMware environment. I still have RAM and CPU left on the host. Both GWs have 4 GB RAM and 8 cores, and the PCs have 2 GB RAM and 4 cores.

Please ask for more information 🙂 

the_rock
Legend

I know in the "old days" of CP, lots of people would fix this by simply disabling CoreXL from cpconfig, rebooting, re-enabling CoreXL, and rebooting again.

Not sure that's necessarily needed in newer versions, but just wondering: can you maybe run the top and ps auxw commands, so we can see what exactly could be causing the CPU problem?
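Something along these lines is enough; the exact flags don't matter much:

top                                  # press 1 to see the per-core load
ps auxww | sort -nrk 3 | head -15    # top CPU consumers, sorted by %CPU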

Andy

AkosBakos
Advisor

Hi @kamilazat 

Isn't 4 GB of memory a little low? What does free -m say?

Akos

the_rock
Legend

I agree with @AkosBakos, 4 GB of RAM is NOT nearly enough; even if you had 8, I would say that is barely enough...

Yes, run the command below and send the output.

Andy

[Expert@CP-GW:0]# free -h
              total        used        free      shared  buff/cache   available
Mem:            22G        5.9G         11G         31M        5.6G         15G
Swap:          8.0G          0B        8.0G
[Expert@CP-GW:0]#

kamilazat
Contributor

Sure.

GW1
[Expert@GW1:0]# free -h
              total        used        free      shared  buff/cache   available
Mem:           3.6G        2.0G        283M         30M        1.3G        721M
Swap:          7.7G        4.2M        7.7G

GW2

[Expert@GW3:0]# free -h
              total        used        free      shared  buff/cache   available
Mem:           3.6G        2.0G        230M         20M        1.4G        782M
Swap:          8.0G        1.2M        8.0G

And I still don't understand why I can get 1 Gbps when I initiate the test from the PC, while between the GWs it's 300 Mbps. Just updated the other GW as well, btw; still the same.

the_rock
Legend

Less than 1 GB of free memory? I hate to say this, but that's not nearly enough, my friend :-(

kamilazat
Contributor

I completely agree with you. Increased both machines to 10GB and redid the test.

Here's the free -h:

[Expert@GW1:0]# free -h
              total        used        free      shared  buff/cache   available
Mem:           9.5G        2.2G        4.4G         20M        2.9G        6.3G
Swap:          7.7G          0B        7.7G

Both machines show the same numbers. 

I did watch -n 1 "ps -auxww" during the test and it didn't change. 

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
admin 1 0.0 0.0 2632 704 ? Ss 21:58 0:01 init [3]
admin 2 0.0 0.0 0 0 ? S 21:58 0:00 [kthreadd]
admin 3 0.0 0.0 0 0 ? S 21:58 0:00 [kworker/0:0]
admin 4 0.0 0.0 0 0 ? S< 21:58 0:00 [kworker/0:0H]
admin 6 1.9 0.0 0 0 ? S 21:58 0:28 [ksoftirqd/0]
admin 7 0.0 0.0 0 0 ? S 21:58 0:00 [migration/0]
admin 8 0.0 0.0 0 0 ? S 21:58 0:00 [rcu_bh]
admin 9 0.2 0.0 0 0 ? S 21:58 0:03 [rcu_sched]
admin 10 0.0 0.0 0 0 ? S 21:58 0:00 [rcuob/0]
admin 11 0.0 0.0 0 0 ? S 21:58 0:00 [rcuos/0]
admin 12 0.0 0.0 0 0 ? S< 21:58 0:00 [lru-add-drain]
admin 13 0.0 0.0 0 0 ? S 21:58 0:00 [watchdog/0]
admin 14 0.0 0.0 0 0 ? S 21:58 0:00 [watchdog/1]
admin 15 0.0 0.0 0 0 ? S 21:58 0:00 [migration/1]
admin 16 0.0 0.0 0 0 ? S 21:58 0:00 [ksoftirqd/1]
admin 18 0.0 0.0 0 0 ? S< 21:58 0:00 [kworker/1:0H]
admin 19 0.0 0.0 0 0 ? S 21:58 0:00 [rcuob/1]
admin 20 0.0 0.0 0 0 ? S 21:58 0:00 [rcuos/1]
admin 21 0.0 0.0 0 0 ? S 21:58 0:00 [watchdog/2]
admin 22 0.0 0.0 0 0 ? S 21:58 0:00 [migration/2]

And top showed:

[screenshot attached: 1.png – top output during the test]

It looks like all the traffic goes F2F, which is understandable. iperf uses TCP unless UDP is specified with -u, but with -u the bandwidth goes even lower (I played with the rate settings to increase it, but to no avail).

On the other hand, when the PC is the iperf client, the bandwidth is still close to 1 Gbps. I don't understand what changes.
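For the UDP attempts I was running something like this (the -b value here is just an example of the rates I tried):

iperf3 -c 192.168.1.254 -u -b 800M -P 4 -t 30    # -u for UDP, -b for the target bandwidth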

AkosBakos
Advisor

What are the outputs of these commands?

fwaccel stats -s

fwaccel stat

fw ctl affinity -l -r

Akos

kamilazat
Contributor

[Expert@GW1:0]# fwaccel stats -s
Accelerated conns/Total conns : 9/9 (100%)
LightSpeed conns/Total conns : 0/9 (0%)
Accelerated pkts/Total pkts : 3670/10364 (35%)
LightSpeed pkts/Total pkts : 0/10364 (0%)
F2Fed pkts/Total pkts : 6694/10364 (64%)
F2V pkts/Total pkts : 142/10364 (1%)
CPASXL pkts/Total pkts : 0/10364 (0%)
PSLXL pkts/Total pkts : 0/10364 (0%)
CPAS pipeline pkts/Total pkts : 0/10364 (0%)
PSL pipeline pkts/Total pkts : 0/10364 (0%)
QOS inbound pkts/Total pkts : 5779/10364 (55%)
QOS outbound pkts/Total pkts : 5046/10364 (48%)
Corrected pkts/Total pkts : 0/10364 (0%)

 

[Expert@GW1:0]# fwaccel stat
+---------------------------------------------------------------------------------+
|Id|Name |Status |Interfaces |Features |
+---------------------------------------------------------------------------------+
|0 |KPPAK |enabled |eth0,eth1 |Acceleration,Cryptography |
| | | | | |
| | | | |Crypto: Tunnel,UDPEncap,MD5, |
| | | | |SHA1,3DES,DES,AES-128,AES-256,|
| | | | |ESP,LinkSelection,DynamicVPN, |
| | | | |NatTraversal,AES-XCBC,SHA256, |
| | | | |SHA384,SHA512 |
+---------------------------------------------------------------------------------+

Accept Templates : enabled
Drop Templates : disabled
NAT Templates : enabled
LightSpeed Accel : disabled

 

[Expert@GW1:0]# fw ctl affinity -l -r
CPU 0:
CPU 1:
CPU 2:
CPU 3: fw_4 (active)
cprid lpd mpdaemon fwd core_uploader fgd50 in.asessiond rtmd cprid cpd msgd
CPU 4: fw_3 (active)
cprid lpd mpdaemon fwd core_uploader fgd50 in.asessiond rtmd cprid cpd msgd
CPU 5: fw_2 (active)
cprid lpd mpdaemon fwd core_uploader fgd50 in.asessiond rtmd cprid cpd msgd
CPU 6: fw_1 (active)
cprid lpd mpdaemon fwd core_uploader fgd50 in.asessiond rtmd cprid cpd msgd
CPU 7: fw_0 (active)
cprid lpd mpdaemon fwd core_uploader fgd50 in.asessiond rtmd cprid cpd msgd
All:
Interface eth0: has multi queue enabled
Interface eth1: has multi queue enabled

Just increased the SND cores to test, but still the same.

the_rock
Legend

You are saying that even with more RAM, though it now shows much more free memory, the result is the same?

Andy

kamilazat
Contributor

True..

the_rock
Legend

What does cpview show now? Anything different than before you added the ram?

Andy

kamilazat
Contributor

RAM usage doesn't change while doing the test, and the CPU goes crazy as before. At this point I started thinking that it's a VMware issue. Maybe it handles the traffic in a different way for different machines, despite the PC having only 2 GB RAM and 4 cores. I wouldn't be surprised. I'll try rebooting the host.

the_rock
Legend

I agree, that's probably a good idea.

kamilazat
Contributor

I couldn't find anything VMware-related either. Rebooted the host and restarted the VMware NAT service. Tomorrow I will use the lab on the company ESXi and try cranking up the specs on my machines there. I will update when I get anything new.

Thank you for your prompt reaction. I appreciate it!

the_rock
Legend

No worries mate, we are here to help. Yes, please let us know how the testing goes and what results you get.

Andy

AkosBakos
Advisor

Hi @kamilazat 

Change the cores and the memory to 4 cores and 8 GB.

Akos

the_rock
Legend

Personally, I would bump it to 6 cores / at least 12 GB RAM (16 if possible).

🙂

Andy

the_rock
Legend

Akos made all super valid points, I would certainly check those as well.

Andy

PhoneBoy
Admin

From what you've posted, RAM is probably not the issue (though I agree at least 8 GB should be used).
The issue appears to be you are stressing out the SND cores.

What hardware have you specified for the NICs in VMware?
You should be using vmxnet3 (instead of e1000), which has better performance.
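You can check from expert mode which driver an interface is actually using, for example:

ethtool -i eth0    # look at the "driver:" line – vmxnet3 vs e1000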

See the following SK for VMware-related tuning: https://support.checkpoint.com/results/sk/sk169252 

A "Super Seven" output during the test might also provide some additional insight: https://community.checkpoint.com/t5/Scripts/S7PAC-Super-Seven-Performance-Assessment-Commands/m-p/40... 

kamilazat
Contributor

Hi @PhoneBoy,

I attached the s7 output from both GWs. 

@the_rock I tried the test at work just now; it's ESXi and has more resources. I cranked up the specs, but the difference between PC-initiated traffic and GW-initiated traffic is still there.

I'm testing the load using these commands:
On PC: iperf3 -c 192.168.1.254 -P 9 -t 60

On GW2: ./iperf3-i386 -c 192.168.1.254 -P 9 -t 60

I noticed that when the GW initiates iperf, the load on the SND cores is not balanced.

[screenshot attached: gwinit.png – core load, GW-initiated test]

 

But when the PC initiates it, the load becomes more balanced.

[screenshot attached: pcinit.png – core load, PC-initiated test]

Maybe it's because I downloaded iperf3 for the PC from https://files.budman.pw/ (the first link that comes up when you Google it), and for the GWs I used the link Tim Hall provided here. I doubt there is much of a difference in the traffic-generation algorithm between these different builds for the PC and for Linux, but I stand to be corrected.
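In case it's useful, this is what I'm comparing on the gateway during the two runs to see the per-core distribution:

fw ctl multik stat        # per-FW-instance connection counts
cpstat os -f multi_cpu    # per-CPU usage snapshot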

the_rock
Legend

Let me review those files, but it seems like even cpview output would not differ much...

Andy

PhoneBoy
Admin

You can see this in the Super Seven output for GW2:

F2Fed pkts/Total pkts            : 395146/395146 (100%)

SecureXL does not accelerate traffic originating from the gateway.
Which is why you're seeing all the traffic on one core.
This is expected behavior.
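You can confirm it yourself by comparing the F2F line for a gateway-initiated run vs. a host-initiated run (if I recall correctly, fwaccel stats -r resets the counters so each run starts clean):

fwaccel stats -r               # reset the acceleration counters (from memory; check the help if unsure)
fwaccel stats -s | grep F2F    # check after each iperf run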

kamilazat
Contributor

I completely agree. But I'm still puzzled by the fact that that's not the case when I initiate the traffic from the PC. I just created a Linux Mint machine to try with, and it also sees 1 Gbps speeds.

F2F only happens when iperf traffic is initiated from the other GW.

Why do you think this could be?

 

AkosBakos
Advisor

I'm not 100% sure, but as far as I know, only transit traffic can be accelerated.
@the_rock @PhoneBoy will correct me if I'm wrong.

the_rock
Legend

kamilazat
Contributor

OK, I read the whole article twice, and the closest candidate that may imply 'transit traffic' is:
"All packets that match a rule, whose source or destination is the Security Gateway itself."

When I initiate an iperf test from one GW to the other, the source and the destination are both GWs. Although I have only one any-any-accept rule, that traffic hits this rule, and since the src and dst are both GWs, SecureXL does not accelerate it: the entry created in the connections table has GW addresses in both the src and dst positions.

And when I initiate iperf traffic from a non-GW machine (the PC or Linux Mint in my case), the traffic gets accelerated without problems, which results in high bandwidth.

Am I correct in my understanding? 

And if yes, is there a way to bypass this (maybe by adding this connection manually to the fast_accel connections table or something)?
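(To be clear, I mean something like the fw ctl fast_accel mechanism; I'm going from memory here, so the exact syntax below may well be off:)

fw ctl fast_accel enable
fw ctl fast_accel add 10.0.0.1 10.0.0.2 5201 6    # src, dst, dst port (iperf3 default), protocol (6 = TCP)
fw ctl fast_accel show_table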

PhoneBoy
Admin
Admin

SecureXL was designed to accelerate traffic traversing the gateway only.
It was not designed to nor will it accelerate traffic that directly originates from or terminates on the gateway.
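If you want to measure what the gateways can push (and what QoS does to it), keep both iperf endpoints on hosts behind them so the traffic transits both gateways; for example, with addresses assumed from your topology:

# on PC2 (192.168.4.0/24 side)
iperf3 -s

# on PC1 (192.168.1.0/24 side) – this traffic crosses GW1 and GW2 and can be accelerated
iperf3 -c 192.168.4.10 -P 4 -t 60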
