kamilazat
Contributor

iperf test speeds are different on internal and external for QoS testing

Hello all!

 

I've been playing with QoS to understand what goes on and how it goes on, and I noticed some weird behavior.

Here's my simple setup (all R81.20):

PC1 (192.168.1.0/24) -- GW1 ---(10.0.0.0/24)--- GW2 -- PC2 (192.168.4.0/24)

Everything is in an isolated VMware environment, so I don't care about security or systems breaking.

 

When I run iperf3 from PC1 to GW1, I get speeds up to 1 Gbps. But when I do the same thing between GW1 and GW2, I get a maximum of 315 Mbps.

At first I thought I wasn't setting up QoS properly. But when I set a rule with a 100 Mbps limit, everything works with a maximum of 100 Mbps. I tried parallel connections with iperf3 -c x.x.x.x -P 4, and with different -P values, but the result is always the same.

I looked at fw ctl chain to see if anything stood out, and noticed that the inbound and outbound chain module names for QoS are different.

[Expert@GW1:0]# fw ctl chain
in chain (22):
0: -7fffffff (0000000000000000) (00000000) SecureXL stateless check (sxl_state_check)
1: -7ffffffe (0000000000000000) (00000000) SecureXL VPN before decryption (vpn_in_before_decrypt)
2: -7ffffffd (0000000000000000) (00000000) SecureXL VPN after decryption (vpn_in_after_decrypt)
3: 6 (0000000000000000) (00000000) SecureXL lookup (sxl_lookup)
4: 7 (0000000000000000) (00000000) SecureXL QOS inbound (sxl_qos_inbound)
5: 8 (0000000000000000) (00000000) SecureXL inbound (sxl_inbound)
6: 9 (0000000000000000) (00000000) SecureXL medium path streaming (sxl_medium_path_streaming)
7: 10 (0000000000000000) (00000000) SecureXL inline path streaming (sxl_inline_path_streaming)
8: 11 (0000000000000000) (00000000) SecureXL Routing (sxl_routing)
9: -7f800000 (00007fd748ef22e2) (ffffffff) IP Options Strip (in) (ipopt_strip)
10: - 1fffff8 (00007fd748ef2f40) (00000001) Stateless verifications (in) (asm)
11: - 1fffff7 (00007fd748ef23c0) (00000001) fw multik misc proto forwarding
12: 0 (00007fd748f0fb40) (00000001) fw VM inbound (fw)
13: 2 (00007fd748ef2a80) (00000001) fw SCV inbound (scv)
14: 4 (00007fd746bbff30) (00000003) QoS inbound offload chain module
15: 5 (00007fd748f26dc0) (00000003) fw offload inbound (offload_in)
16: 20 (00007fd748f26080) (00000001) fw post VM inbound (post_vm)
17: 100000 (00007fd748f05196) (00000001) fw accounting inbound (acct)
18: 22000000 (00007fd746bc1ab0) (00000003) QoS slowpath inbound chain mod (fg_sched)
19: 7f730000 (00007fd748ef324e) (00000001) passive streaming (in) (pass_str)
20: 7f750000 (00007fd748f43940) (00000001) TCP streaming (in) (cpas)
21: 7f800000 (00007fd748ef25c8) (ffffffff) IP Options Restore (in) (ipopt_res)
out chain (16):
0: -7f800000 (00007fd748ef22e2) (ffffffff) IP Options Strip (out) (ipopt_strip)
1: - 1fffff0 (00007fd748ef2dc0) (00000001) TCP streaming (out) (cpas)
2: - 1ffff50 (00007fd748ef324e) (00000001) passive streaming (out) (pass_str)
3: - 1f00000 (00007fd748ef2f40) (00000001) Stateless verifications (out) (asm)
4: 0 (00007fd748f0fb40) (00000001) fw VM outbound (fw)
5: 10 (00007fd748f26080) (00000001) fw post VM outbound (post_vm)
6: 15000000 (00007fd746bc0590) (00000003) QoS outbound offload chain modul (fg_pol)
7: 21000000 (00007fd746bc1ab0) (00000003) QoS slowpath outbound chain mod (fg_sched)
8: 7f000000 (00007fd748f05196) (00000001) fw accounting outbound (acct)
9: 7f700000 (00007fd748ef95d8) (00000001) TCP streaming post VM (cpas)
10: 7f800000 (00007fd748ef25c8) (ffffffff) IP Options Restore (out) (ipopt_res)
11: 7f900000 (0000000000000000) (00000000) SecureXL outbound (sxl_outbound)
12: 7fa00000 (0000000000000000) (00000000) SecureXL QOS outbound (sxl_qos_outbound)
13: 7fb00000 (0000000000000000) (00000000) SecureXL VPN before encryption (vpn_in_before_encrypt)
14: 7fc00000 (0000000000000000) (00000000) SecureXL VPN after encryption (vpn_in_after_encrypt)
15: 7fd00000 (0000000000000000) (00000000) SecureXL Deliver (sxl_deliver)

Maybe it is because the internal networks are set as "Internal" and the network between the gateways is set as "External", and the gateways are doing something that slows down the connection. I'm pretty sure I'm just missing something here and everything is working as expected.

So I would deeply appreciate it if anyone could enlighten me on this.

Cheers!

40 Replies
kamilazat
Contributor

When I set up iperf as a server on the GW and initiate the test from the PC, doesn't the traffic terminate on the GW?

PhoneBoy
Admin

"When I set iperf as a server on the GW" what EXACTLY does this mean?
You're running an actual server on the gateway to "receive" iperf traffic?
It's possible we now accelerate traffic terminating on the gateway itself as I know we made some performance enhancements for Visitor Mode in R81 (Remote Access related, which DOES terminate on the gateway).

As a separate question, I'm curious why the gateways are endpoints in these tests at all.
In the real world, you would never generate traffic from the gateway itself (outside of maintenance-type operations).

kamilazat
Contributor

I actually started playing with iperf in order to understand QoS policies better. 

I put iperf3 on both GWs using the link Tim Hall provided here. Then I ran iperf3 on GW1 as a server using the -s flag, and ran the iperf3 test from GW2 as a client using the -c flag (and of course -P for parallel streams).
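
Roughly like this, in case anyone wants to reproduce it (x.x.x.x stands for GW1's address, and the -t 30 test length is just an arbitrary example):

[Expert@GW1:0]# iperf3 -s                      # GW1 listens as the server on the default port 5201
[Expert@GW2:0]# iperf3 -c x.x.x.x -P 4 -t 30   # GW2 runs the client with 4 parallel streams for 30 seconds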

Again, this is solely for playing around to see what happens when you do such a thing. Pointless? Pretty much, yes, since, as you also said, this is virtually not applicable in a real-world scenario. But a game is a game 🙂

the_rock
Legend

You got it!

Is there a way to bypass it? Yes sir, 100%.

You can use the below:

https://support.checkpoint.com/results/sk/sk156672
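
If I remember the sk correctly, the rules take source, destination, destination port and protocol, roughly like the below (the addresses and port here are just placeholders, double check the sk for exact usage):

fw ctl fast_accel enable
fw ctl fast_accel add x.x.x.x/32 y.y.y.y/32 5201 any   # force matching connections into the accelerated path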

PhoneBoy
Admin

SecureXL is not capable of accelerating traffic originating from or terminating on the gateway.
Which means this will not work.

AkosBakos
Advisor

Don't forget about security! With this you essentially turn the gateway into a Layer 3 switch for that traffic. But in a LAB it's OK. 🙂

kamilazat
Contributor

Alright! Glad I'm starting to understand this. Now, I moved the Linux machine to the external network and ran the test between the PC (still internal) and the Linux machine (now external). Since the traffic traverses the gateway, the test shows high bandwidth as expected.

On the other hand, the test between the gateways still shows low bandwidth, even though I added all possible combinations to the fast_accel table. Let me paste the current config:

[Expert@GW1:0]# cat /opt/CPsuite-R81.20/fw1/conf/fw_fast_accel_export_configuration.conf
fw ctl fast_accel enable
echo importExportMechanism | fw ctl fast_accel delete_all
echo importExportMechanism | fw ctl fast_accel add 192.168.1.254/32 192.168.1.253/32 5201 any
echo importExportMechanism | fw ctl fast_accel add 192.168.1.254/24 192.168.1.253/24 any any
echo importExportMechanism | fw ctl fast_accel add 192.168.1.253/32 192.168.1.254/32 5201 any
echo importExportMechanism | fw ctl fast_accel add 192.168.1.0/24 192.168.1.0/24 5201 any
echo importExportMechanism | fw ctl fast_accel add 192.168.1.253/24 192.168.1.254/24 any any

Since the test between the GWs is done via their internal interfaces (.253 and .254) and iperf uses port 5201, I tried to add all possible matching traffic to the table. But it still doesn't help with the bandwidth there. I guess that's not possible. Would you confirm that it's not possible?
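
For anyone following along, this is the sanity check to confirm the entries are actually loaded (show_table is from the same sk, assuming I'm reading it right):

[Expert@GW1:0]# fw ctl fast_accel show_table   # lists the currently configured fast_accel rules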

Timothy_Hall
Legend

Not possible.  You cannot fast_accel traffic that would otherwise go F2F/slowpath; you can configure the rules but they will simply not do anything.  Only Medium Path traffic (passive or active streaming) can be forced to the fastpath by fast_accel.
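
One way to see the split while the test is running is the SecureXL summary counters on the sending gateway; traffic that is not accelerated is counted under F2F:

[Expert@GW1:0]# fwaccel stats -s   # summary of accelerated vs PXL vs F2F packet and connection counters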

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
the_rock
Legend

Hey mate,

Ok, here is a trivial example I use with people all the time when explaining SecureXL, and it's NOT intended to make anyone feel diminished, for lack of a better term. Say you have 2 mountains and a short bridge between them that's maybe 10 metres (about 30 feet) wide. If you were to throw a stone from the bridge itself to either of the 2 mountains, it won't be "accelerated", but if you were to throw a stone from one mountain to the other, TRAVERSING the bridge, then it would be "accelerated".

Does that sort of make sense @kamilazat ?

Best,

Andy

PhoneBoy
Admin

As I said above, SecureXL does not accelerate traffic originating from the gateway.
When you run iperf from the gateway itself, it is originating the traffic, which is not accelerated.
When you run the iperf traffic from PC1, the traffic is NOT originating from the gateway and thus can be accelerated (which is also why SND utilization is more balanced).

Hope that's clear.

Timothy_Hall
Legend

OK I've looked through everything and there are two factors at play here:

1) The Dynamic Dispatcher

2) Multi-Queue and Receive Side Scaling (RSS)

Dynamic Dispatcher:

For your PC->GW scenario, multiple streams are being launched to the gateway itself.  These will go F2F; however, as the first iperf stream starts it slams the worker instance core assigned by the Dynamic Dispatcher.  The next stream gets assigned to a different worker instance core because the first assigned one is now very busy.  So the ultimate effect is that multiple F2F iperf streams get reasonably balanced across the numerous worker instance cores on the GW, increasing throughput to 900Mbps.

For your GW-GW scenario, the gateway initiating the iperf will only be able to use one worker core, I believe, as I don't think the iperf binary is multi-threaded, and traffic initiated by the gateway itself can only be handled by one worker instance core.  The target gateway will use the Dynamic Dispatcher to spread the streams across multiple worker instances, but that doesn't change the one-core bottleneck on the sending gateway, which is why you can only go about 300Mbps.  I'm vaguely remembering that traffic initiated F2F by the gateway itself is always handled on worker instance 0, or perhaps the highest-numbered instance only, but I may be mistaken.  I think if you look at the CPU utilization on the *sending* gateway for the GW-GW scenario you'll see a single worker instance slammed to 100%.
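
A rough way to check this on the sending gateway while the GW-GW test runs (generic commands, nothing specific to this scenario):

[Expert@GW1:0]# fw ctl multik stat      # connections and peak per worker instance
[Expert@GW1:0]# fw ctl affinity -l -r   # which cores are SNDs and which are worker instances
[Expert@GW1:0]# top                     # press 1 during the test to watch per-core utilization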

Multi-Queue/RSS:

As Dameon said, you need to be using the vmxnet3 driver and not e1000, which is usually the default in virtualized environments; e1000 is also getting deprecated in R82.  vmxnet3 supports Multi-Queue, which allows balancing of traffic between SND cores and is the primary mechanism trying to keep the load balanced between SND cores, while the Dynamic Dispatcher handles balancing for the worker instance cores.  If you are using e1000, only one SND core can be used for a single interface no matter how many SND instances you have, which is why trying to add more didn't help.  ethtool -i ethX will show you what driver is being used; it is almost certainly e1000.  However, the SNDs do not appear to be your primary bottleneck for the GW-GW scenario.  The cpview screen Advanced → SecureXL → Network-per-CPU can be used to view how well the SNDs are staying balanced by traffic load via Multi-Queue, assuming it is enabled.
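
For completeness, the quick checks (eth1 below is a placeholder for whichever interface carries the test traffic):

[Expert@GW1:0]# ethtool -i eth1   # "driver: e1000" vs "driver: vmxnet3"
[Expert@GW1:0]# ethtool -l eth1   # channel/queue counts, where the driver supports reporting them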

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
