Luis_Miguel_Mig
Advisor

IPS bypass

I have started getting IPS bypass alerts since I upgraded to R80.40 Take 91. I didn't get IPS bypass events on Take 87.
There is almost no traffic in my environment (about 20 concurrent TCP sessions from one host I use for testing/browsing) and the CPU is idle most of the time.

 

I have 6 cores, 3 of them workers. Average CPU usage is 2% and occasionally goes to 20%, but in cpview I have noticed spikes that match the IPS bypass alerts (see below).


I am certain the issue has something to do with Take 91, but I was wondering if there is a way to get more verbose logging to see what is going on when CPU usage goes over the threshold.


I am running URL Filtering, Anti-Bot, Anti-Virus and IPS. I recently disabled HTTPS Inspection.
About 90% of my traffic goes through the slow path.

 

 

CPU Spikes

Overview (last minute):
  Total Spikes:                  3
  Average Spike Duration (Sec):  11
  Average Spike Usage:           95%

Top Spikes (last minute):
  Start Time           CPU   Spike Duration (Sec)   Average Usage
  18Feb2021 9:07:36    5     25                     100%
  18Feb2021 9:08:41    5     5                      93%
  18Feb2021 9:08:51    2     5                      92%

17 Replies
shais
Employee

Hi,
We have a number of ways to investigate this change. I would appreciate a remote session with you to understand the issue; I will contact you directly to arrange it.

Please open an SR as well 

Luis_Miguel_Mig
Advisor

Thanks. 
I would say SecureXL is okay. The traffic that goes through the slow path is expected, right?

[Expert@fw1:0]# fwaccel stats -p
F2F packets:
--------------
Violation               Packets      Violation               Packets
--------------------    ---------    --------------------    ---------
pkt has IP options      0            ICMP miss conn          0
TCP-SYN miss conn       0            TCP-other miss conn     50
UDP miss conn           197          other miss conn         0
VPN returned F2F        0            uni-directional viol    0
possible spoof viol     0            TCP state viol          0
SCTP state affecting    0            out if not def/accl     0
bridge, src=dst         0            routing decision err    0
sanity checks failed    0            fwd to non-pivot        0
broadcast/multicast     0            cluster message         10289
cluster forward         0            chain forwarding        0
F2V conn match pkts     0            general reason          0

shais
Employee

You mentioned that most of your traffic takes the slow path. This can trigger the IPS bypass, as it puts a high load on the CPU.

The statistics you showed above mean you don't have any SecureXL violations, which is good, but they are unrelated to the slow path.
You can see the slow path rate with "fwaccel stats -s".

Luis_Miguel_Mig
Advisor

When my testing VM is down and there is only management traffic (gateway cluster messages, NTP, DNS, SNMP, syslog, HTTP requests to the Check Point cloud through the proxy, etc.), almost 100% of the traffic is not accelerated. Is this behavior expected? Should any of this traffic be accelerated?

When I browse a bit with my test VM, I see the accelerated packets increase. See below.


 

Testing VM down

[Expert@fw1:0]# fwaccel stats -s
Accelerated conns/Total conns : 0/0 (0%)
Accelerated pkts/Total pkts : 0/3302 (0%)
F2Fed pkts/Total pkts : 3302/3302 (100%)
F2V pkts/Total pkts : 0/3302 (0%)
CPASXL pkts/Total pkts : 0/3302 (0%)
PSLXL pkts/Total pkts : 0/3302 (0%)
CPAS pipeline pkts/Total pkts : 0/3302 (0%)
PSL pipeline pkts/Total pkts : 0/3302 (0%)
CPAS inline pkts/Total pkts : 0/3302 (0%)
PSL inline pkts/Total pkts : 0/3302 (0%)
QOS inbound pkts/Total pkts : 0/3302 (0%)
QOS outbound pkts/Total pkts : 0/3302 (0%)
Corrected pkts/Total pkts : 0/3302 (0%)
[Expert@hqfw2b:0]# fwaccel stat
+---------------------------------------------------------------------------------+
|Id|Name |Status |Interfaces |Features |
+---------------------------------------------------------------------------------+
|0 |SND |enabled |eth0,eth2,eth3,eth5,eth6 |Acceleration,Cryptography |
| | | | |Crypto: Tunnel,UDPEncap,MD5, |
| | | | |SHA1,NULL,3DES,DES,AES-128, |
| | | | |AES-256,ESP,LinkSelection, |
| | | | |DynamicVPN,NatTraversal, |
| | | | |AES-XCBC,SHA256,SHA384 |
+---------------------------------------------------------------------------------+

Accept Templates : enabled
Drop Templates : disabled
NAT Templates : enabled

 

Testing VM up

[Expert@hqfw2b:0]# fwaccel stats -s
Accelerated conns/Total conns : 0/97 (0%)
Accelerated pkts/Total pkts : 4215/9061 (46%)
F2Fed pkts/Total pkts : 4846/9061 (53%)
F2V pkts/Total pkts : 110/9061 (1%)
CPASXL pkts/Total pkts : 0/9061 (0%)
PSLXL pkts/Total pkts : 4215/9061 (46%)
CPAS pipeline pkts/Total pkts : 0/9061 (0%)
PSL pipeline pkts/Total pkts : 0/9061 (0%)
CPAS inline pkts/Total pkts : 0/9061 (0%)
PSL inline pkts/Total pkts : 0/9061 (0%)
QOS inbound pkts/Total pkts : 0/9061 (0%)
QOS outbound pkts/Total pkts : 0/9061 (0%)
Corrected pkts/Total pkts : 0/9061 (0%)
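To compare the two captures above without reading the counters by eye, here is a minimal Python sketch that parses the "fwaccel stats -s" summary format shown in this thread and recomputes the ratios. The function names and the regular expression are my own; they are only assumed to match output laid out like the paste above.

```python
import re

# Sample lines copied from the "testing VM up" output in this thread.
SAMPLE = """\
Accelerated conns/Total conns : 0/97 (0%)
Accelerated pkts/Total pkts : 4215/9061 (46%)
F2Fed pkts/Total pkts : 4846/9061 (53%)
F2V pkts/Total pkts : 110/9061 (1%)
PSLXL pkts/Total pkts : 4215/9061 (46%)
"""

# Matches "<name>/Total conns|pkts : <num>/<den>"; assumes the layout above.
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)/Total (?:conns|pkts)\s*:\s*(?P<num>\d+)/(?P<den>\d+)"
)


def parse_fwaccel_summary(text):
    """Return {counter name: (count, total)} for every recognised line."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name = match.group("name").strip()
            counters[name] = (int(match.group("num")), int(match.group("den")))
    return counters


def percent(counters, name):
    """Percentage for one counter; 0.0 when the total is zero (e.g. 0/0 conns)."""
    count, total = counters[name]
    return 100.0 * count / total if total else 0.0


if __name__ == "__main__":
    stats = parse_fwaccel_summary(SAMPLE)
    for name in stats:
        print(f"{name}: {percent(stats, name):.1f}%")
```

Running it against both pastes makes the point of the replies easier to see: the F2F share drops once transit traffic from the test VM is present.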

 

 

Luis_Miguel_Mig
Advisor

Just reading sk32578:
When SecureXL is enabled, all packets should be accelerated, except packets that match the following conditions:
All packets that match a rule, whose source or destination is the Security Gateway itself.

So I guess in my environment, with only one user establishing connections, the percentage of accelerated traffic is expected to be low.
And if this user is down, then pretty much 100% of the packets should be non-accelerated.
I guess it would still be interesting to double-check this if possible.

shais
Employee

When your testing VM is down, the only traffic you have is local connections, which take the slow path by design.

So it looks like you don't have any issue related to SecureXL, but something is indeed triggering a high load, which causes IPS to enter bypass. We will continue offline to analyze it.

Timothy_Hall
Champion

Generally, enabling the IPS bypass feature is not a good idea. When monitoring the CPUs, if even one of them hits the CPU % threshold, IPS functions on ALL CPUs are bypassed. This was fine when firewalls only had a few cores, but it is not really appropriate with many cores. Really the IPS Bypass feature should average the CPU utilization of all the workers when deciding whether to bypass. See here:

sk107334: IPS Bypass is triggered even when CPU utilization is not over the defined threshold
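To make the difference concrete, here is a small Python sketch of the two decision rules described above: "any single core over the threshold" versus "average of all cores over the threshold". It is a simplified model of the behavior as described in this post, not Check Point's implementation, and the 90% threshold and the per-core numbers are illustrative assumptions.

```python
def bypass_any_core(core_usage, threshold):
    """Bypass as soon as one core reaches the threshold (behavior described above)."""
    return any(usage >= threshold for usage in core_usage)


def bypass_average(core_usage, threshold):
    """Bypass only when the mean utilization of all cores reaches the threshold."""
    return sum(core_usage) / len(core_usage) >= threshold


if __name__ == "__main__":
    # Hypothetical 6-core gateway: five nearly idle cores and one spiking core.
    cores = [2, 2, 2, 2, 2, 100]
    threshold = 90  # assumed value, for illustration only

    print("any-core rule :", bypass_any_core(cores, threshold))
    print("average rule  :", bypass_average(cores, threshold))
```

With one core pinned and the rest idle, the mean is about 18%, so only the any-core rule trips. The more cores a gateway has, the wider that gap becomes.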

As @shais said, it looks like something in T91 is causing occasional high CPU and triggering the IPS bypass, so the IPS bypass activating is a symptom of your problem, not the cause. Normally the next step is to figure out in what mode the CPU spikes occur (kernel vs. process space: us/sy/si/hi in top). You can use sar for that, but it looks like the spikes are too short for sar to pick up reliably. You'll have to catch whatever it is "in the act" with top, or look in the spike detective logs here: /var/log/spike_detective/spike_detective.log
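If sitting in front of top is not practical, one alternative is to sample the per-core counters in /proc/stat once a second and print the us/sy/si/hi split whenever a core gets busy. The Python sketch below does that under a few assumptions: a Python interpreter is available where you run it, /proc/stat uses the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal), and the 80% reporting threshold is an arbitrary choice of mine.

```python
import time

# Standard /proc/stat column order, using the labels top shows.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Split a 'cpuN ...' line into (name, list of tick counters)."""
    parts = line.split()
    return parts[0], [int(value) for value in parts[1:1 + len(FIELDS)]]


def breakdown(before, after):
    """Percentage of time spent in each mode between two samples of one core."""
    deltas = [new - old for old, new in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return {field: 0.0 for field in FIELDS}
    return {field: 100.0 * delta / total for field, delta in zip(FIELDS, deltas)}


def read_cores(path="/proc/stat"):
    """Read per-core counters (cpu0, cpu1, ...), skipping the aggregate 'cpu' line."""
    cores = {}
    with open(path) as handle:
        for line in handle:
            if line.startswith("cpu") and line[3:4].isdigit():
                name, ticks = parse_cpu_line(line)
                cores[name] = ticks
    return cores


def watch(interval=1.0, threshold=80.0, samples=60):
    """Print the mode split for any core whose busy time reaches the threshold."""
    previous = read_cores()
    for _ in range(samples):
        time.sleep(interval)
        current = read_cores()
        for name, ticks in current.items():
            if name not in previous:
                continue
            split = breakdown(previous[name], ticks)
            busy = 100.0 - split["id"] - split["wa"]
            if busy >= threshold:
                stamp = time.strftime("%H:%M:%S")
                print(f"{stamp} {name} busy={busy:.0f}% us={split['us']:.0f}% "
                      f"sy={split['sy']:.0f}% si={split['si']:.0f}% hi={split['hi']:.0f}%")
        previous = current


if __name__ == "__main__":
    watch()
```

A spike that is mostly sy/si/hi points at kernel-space work, while mostly us points at a user-space process, which is the distinction the post asks you to make.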

 

New 2021 IPS/AV/ABOT Immersion Self-Guided Video Series
now available at http://www.maxpowerfirewalls.com
Luis_Miguel_Mig
Advisor

Yeah absolutely.
/var/log/spike_detective/spike_detective.log doesn't say much though, just the duration of the spike and the core ID.
sar seems to capture stats only every 10 minutes, and the spikes last between 10 and 20 seconds.
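A quick back-of-the-envelope check shows why a 10-minute average hides these spikes. The Python sketch below uses numbers quoted in this thread (2% baseline, 95% average spike usage, a 15-second spike as the midpoint of 10 to 20 seconds, a 600-second interval); the function name is my own and the model assumes a single spike per interval.

```python
def interval_average(baseline_pct, spike_pct, spike_secs, interval_secs):
    """Mean CPU usage over one sampling interval that contains a single spike."""
    if not 0 <= spike_secs <= interval_secs:
        raise ValueError("spike must fit inside the sampling interval")
    quiet_secs = interval_secs - spike_secs
    return (spike_pct * spike_secs + baseline_pct * quiet_secs) / interval_secs


if __name__ == "__main__":
    # 15 s at 95% inside a 600 s interval that otherwise sits at 2%.
    print(f"10-minute average: {interval_average(2, 95, 15, 600):.2f}%")
    # The same spike seen through a 1-second sample is not diluted at all.
    print(f"1-second sample  : {interval_average(2, 95, 1, 1):.2f}%")
```

Under those assumptions the 10-minute figure comes out at roughly 4%, which looks like an idle core, so only short sampling intervals will show the spike.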

 

 

Zolo
Participant

@Luis_Miguel_Mig I have the same problem. Did you find the root cause?

Luis_Miguel_Mig
Advisor

It happens when the gateway loads the Anti-Bot/Anti-Virus signatures at the times scheduled in the SmartConsole configuration. You can reproduce it with fw load_sigs.

Timothy_Hall
Champion

This is expected behavior but it only spikes a single core, so the chances of affecting traffic handling are pretty low: sk174347: Software blade updates may cause single CPU spikes

Luis_Miguel_Mig
Advisor

Yeah, it is a known issue now, and it can affect only the traffic going through the firewall instance/CPU core where the signature loading process is running.
So if you have, for example, 4 cores/fw instances, 25% of the traffic can be affected.

Zolo
Participant

@Timothy_Hall: Thank you for the SK. But the customer's problem is that IPS goes into bypass and traffic is not inspected by IPS for 1 minute because of a little signature update on a 72-core appliance with low network traffic. BTW the Anti-Bot/Anti-Virus blades are off; only the IPS blade is on (and FW of course).

Is this expected behavior that we can't change, or can I turn off signature updates somehow?

Zolo
Participant

@Luis_Miguel_Mig: Thank you!!! That was my guess, but I could not find how to reproduce it by hand. Thanks again 😊

Luis_Miguel_Mig
Advisor

I was wondering if we could use affinity settings to make this process run on a specific CPU core. I have more CPU cores than fw workers.

Timothy_Hall
Champion

You could use affinity to do that, but it won't matter to the IPS Bypass feature, as all it takes is one saturated core (regardless of type) for IPS to get disabled. The IPS Bypass feature was a good idea in the days when firewalls only had 1-2 cores, not so much in today's world.

Luis_Miguel_Mig
Advisor

That would be okay with me. I don't mind getting an IPS bypass. I may disable the IPS bypass feature altogether.
But how could I set the affinity of the fw process to a specific core so that fw load_sigs runs on a core free of fw workers?
