Multiple cores for medium path traffic
Hi,
I'm doing some throughput tests on a vSEC gateway in network mode (basically just a VM with GAiA installed, afaik) in an NSX/ESXi environment.
The test is done with a basic setup: one gateway and two Ubuntu VMs acting as client and server. To measure throughput I'm using iperf (TCP, basic settings).
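For reference, the runs are basically like this (port and server address as in the iperf output further down):
iperf -s -p 8080 (on the server VM)
iperf -c 192.168.1.3 -p 8080 -t 20 (on the client VM: TCP, 20 seconds, single stream)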
The problem is when I enable both IPS and Application Awareness. With both blades enabled I'm only able to get a throughput of around 1.5 Gbps. With just one of the blades it's around 5 Gbps, and without any blades (except FW) it's 6 Gbps, which seems to be a driver limit (e1000; only 4.5 Gbps on VMXNET3).
I have tried to play around with the core allocation, but without luck. There is no difference whether the fw workers have a dedicated core or are able to use all available cores.
According to fwaccel stats -s, above 90% of the traffic hits PXL.
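For illustration, the summary output has roughly this shape (the numbers here are made up, only the shape matters):
Accelerated conns/Total conns : 2/10 (20%)
Accelerated pkts/Total pkts : 100/2000 (5%)
F2Fed pkts/Total pkts : 100/2000 (5%)
PXL pkts/Total pkts : 1800/2000 (90%)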
So my question is: is it possible to split the IPS and App Awareness processes onto different CPUs, or to load-share the PXL part even more?
Not sure how that would be possible, since IPS and Application Control use the same infrastructure behind the scenes.
In any case, what version of code are we talking about here, since you didn't mention that in the post?
What is the configuration of the vSEC instance (number of vCores, RAM allocated, etc.)?
Hi Phoneboy,
Sorry, forgot to mention that: it's the R77.30 release, build 060 (basic OVF version).
I have tested with different numbers of vCPUs and amounts of memory; currently I have 4 vCPUs and 4096 MB of RAM.
Current CPU allocation (I know the allocation is messy, but I tried to isolate fw_3 and let it share CPU2 and CPU3):
fw ctl affinity -l -v -r
CPU 0: eth0 (irq 75) eth4 (irq 83) eth1 (irq 99) eth2 (irq 115) eth3 (irq 123) eth6 (irq 91) eth5 (irq 107)
fw_0
mpdaemon vpnd rad fwd cprid cpd
CPU 1: fw_1 fw_2
CPU 2: fw_3
CPU 3: fw_3
All:
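In case it helps anyone reading along: this kind of split can be made persistent in $FWDIR/conf/fwaffinity.conf (read at boot). The lines below are only a sketch; check the CoreXL documentation for the exact syntax:
i eth0 0 (interface IRQs on CPU 0)
k 1 1 (kernel instance fw_1 on CPU 1)
k 2 1 (fw_2 shares CPU 1)
k 3 2 3 (fw_3 may run on CPU 2 and CPU 3)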
Hi, the CPU allocation does not look great at all. You are mixing SXL (accelerated) traffic on cpu0 with fwk instances. Not a great idea, at least not in a traditional firewall. I haven't worked much with vSEC, so I won't say too much, but try separating interfaces from fwk instances and allocate more cores if you can. CP is all about CPU, after all.
Out of curiosity, do you have CPU stats from when it maxes out? All flat out? Check if dynamic core allocation is enabled in your version of R77.30. If you can, add two more CPUs to your VM and then use 0 and 1 as generic cores and 2-5 as statically allocated cores for fwk0-3.
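I.e. something like this for the 6-vCPU layout I have in mind:
CPU 0: SND/IRQ (interfaces)
CPU 1: SND/IRQ (interfaces)
CPU 2: fw_0
CPU 3: fw_1
CPU 4: fw_2
CPU 5: fw_3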
I know the allocation is crap for a production firewall, but this is only used for testing purposes.
I could assign more cores, but I doubt it would help, because it's only fw_worker_3 that is using the CPU.
Top during an iperf run; it's random whether CPU2 or CPU3 is the one used, but it's always only one of them. So it does some kind of sharing, but it seems like one session can bring it to its knees if it matches all blades:
[Expert@mazcptest01:0]# top
top - 09:50:32 up 4 days, 53 min, 2 users, load average: 1.00, 0.52, 0.22
Tasks: 129 total, 5 running, 124 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.3%sy, 0.0%ni, 76.4%id, 0.0%wa, 1.3%hi, 21.9%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3907592k total, 2878280k used, 1029312k free, 219768k buffers
Swap: 2128604k total, 0k used, 2128604k free, 1398696k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4495 admin 18 0 0 0 0 R 99 0.0 10:38.32 fw_worker_3
> I could assign more cores, but I doubt it would help, because it's only fw_worker_3 that is using the CPU.
Load the latest GA Jumbo HFA onto the gateway, then turn on the Dynamic Dispatcher (fw ctl multik set_mode 9), reboot the system, and try your test again. The Dynamic Dispatcher is enabled by default on R80.10+ gateways but off by default in R77.30.
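Roughly this sequence (sk105261, if I recall the SK number correctly):
fw ctl multik get_mode (show the current dispatcher mode)
fw ctl multik set_mode 9 (enable the Dynamic Dispatcher)
reboot (required for the change to take effect)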
--
My book "Max Power: Check Point Firewall Performance Optimization"
now available via http://maxpowerfirewalls.com.
Hi Tim,
I did try the Dynamic Dispatcher in both mode 9 and mode 4; no big difference at all - at least not with a single iperf stream.
But to be sure, I just installed the latest HFA:
Was:
Check_Point_R77_30_JUMBO_HF_1_Bundle_T225_FULL.tgz
Now:
Check_Point_R77_30_JUMBO_HF_1_Bundle_T292_FULL.tgz
But it's still the same: CPU at 100% and around 1.5 Gbps throughput. However, if I use multiple streams, I utilise more CPUs now! (Not sure that was the case with Take 225, but it might be.)
Now I'm able to get around 2.6 Gbps in total, and I guess it would be higher with more CPUs/workers.
Do you guys think it's possible to get more than 1.5 Gbps per core with IPS+APP enabled?
Btw, I already have your book by my side - great work, I learnt a lot from it!
I think 1.5 Gbps per core WITH those blades is more than I would expect.
With a single iperf stream like that, all packets for that stream's connection must be processed on the same Firewall Worker core, regardless of whether the Dynamic Dispatcher is on or not. Letting the packets get handled by multiple workers would raise the specter of out-of-order delivery, which is a complete disaster from a TCP performance perspective.
Try an IPS profile of Default_Protection (called "Optimized" in R80+), which may help, but the presence of APCL guarantees that all that traffic will go Medium path. Also make sure you do not have an explicit Accept cleanup rule at the bottom of the APCL policy, and avoid using "Any" in the Source/Dest of any APCL rule; use explicit network objects in the Source and "Internet" in the Destination. That's about it.
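In other words, the APCL rulebase should have roughly this shape (object names here are just placeholders):
Source: Net_Internal_LAN | Destination: Internet | Applications: <specific apps/categories> | Action: Allow
Source: Any | Destination: Any | Applications: Any | Action: Drop (cleanup as Drop, not Accept)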
--
My book "Max Power: Check Point Firewall Performance Optimization"
now available via http://maxpowerfirewalls.com.
The IPS and APP profiles are already as described (which you also mention in Max Power).
But I managed to get above 5 Gbps (6 Gbps is the limit with the e1000 driver) with 8 cores: 6 fw workers (one core each) and 1 core for each interface used. iperf was also set to use 6 parallel streams.
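So the split was roughly (interface names here just illustrative):
CPU 0: eth1 (client side)
CPU 1: eth2 (server side)
CPU 2-7: fw_0 .. fw_5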
I got what I needed: 1.5 Gbps per core is OK, and it seems the Dynamic Dispatcher does its job OK. It's a bit random how well the connections are shared between the cores, but if I e.g. use 12 streams, all fw workers are doing their job.
Iperf output:
root@smokeping01:~# iperf -c 192.168.1.3 -p 8080 -t 20 -P 12
------------------------------------------------------------
Client connecting to 192.168.1.3, TCP port 8080
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 14] local 192.168.2.3 port 35138 connected with 192.168.1.3 port 8080
[ 5] local 192.168.2.3 port 35118 connected with 192.168.1.3 port 8080
[ 4] local 192.168.2.3 port 35120 connected with 192.168.1.3 port 8080
[ 6] local 192.168.2.3 port 35122 connected with 192.168.1.3 port 8080
[ 7] local 192.168.2.3 port 35124 connected with 192.168.1.3 port 8080
[ 8] local 192.168.2.3 port 35126 connected with 192.168.1.3 port 8080
[ 10] local 192.168.2.3 port 35128 connected with 192.168.1.3 port 8080
[ 11] local 192.168.2.3 port 35130 connected with 192.168.1.3 port 8080
[ 9] local 192.168.2.3 port 35132 connected with 192.168.1.3 port 8080
[ 13] local 192.168.2.3 port 35136 connected with 192.168.1.3 port 8080
[ 12] local 192.168.2.3 port 35134 connected with 192.168.1.3 port 8080
[ 3] local 192.168.2.3 port 35116 connected with 192.168.1.3 port 8080
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-20.0 sec 1.03 GBytes 441 Mbits/sec
[ 6] 0.0-20.0 sec 1.22 GBytes 525 Mbits/sec
[ 3] 0.0-20.0 sec 1.16 GBytes 500 Mbits/sec
[ 14] 0.0-20.0 sec 722 MBytes 303 Mbits/sec
[ 4] 0.0-20.0 sec 1.12 GBytes 480 Mbits/sec
[ 8] 0.0-20.0 sec 1.16 GBytes 499 Mbits/sec
[ 10] 0.0-20.0 sec 1.12 GBytes 482 Mbits/sec
[ 13] 0.0-20.0 sec 780 MBytes 327 Mbits/sec
[ 12] 0.0-20.0 sec 785 MBytes 329 Mbits/sec
[ 7] 0.0-20.0 sec 1.11 GBytes 476 Mbits/sec
[ 11] 0.0-20.0 sec 1.16 GBytes 497 Mbits/sec
[ 9] 0.0-20.0 sec 988 MBytes 414 Mbits/sec
[SUM] 0.0-20.0 sec 12.3 GBytes 5.27 Gbits/sec
I guess that's all. Thanks for the help and the replies!
