Create a Post
cancel
Showing results for 
Search instead for 
Did you mean: 
Luis_Miguel_Mig
Advisor

ips bypass

I have started to get ips bypass alerts  since I upgraded to r80.40 take 91. I didn't use to get IPS bypass events in take 87.
There is almost not traffic in my environment - 20 concurrent tcp sessions coming from one host I use for testing/browsing - and the cpu is idle most of the time.

 

I have 6 cores - 3 workers. The average cpu is 2%, occasionally goes to 20% but looking at cpview I have notices spikes that match the IPS bypass alerts - see below.


I am certain the issue has to something to do with take 91 but I was wondering if there is a way to get more verbose logging to see what is going on when the cpu usage goes over the threshold.


 I  am running  URL filtering, Anti bot , Antivirus and IPS enabled. I have disabled HTTPS inspection recently. 
I am getting about 90% of traffic through the slow path.

 

 

Spikes |
|--------------------------------------------------------------------------------------------------------------------------------------------------|
| CPU Spikes |
|--------------------------------------------------------------------------------------------------------------------------------------------------|
| Overview (last minute): |
| |
| Total Spikes: 3 |
| Average Spike Duration (Sec): 11 |
| Average Spike Usage: 95% |
| ------------------------------------------------------------------------------------------------------------------------------------------------ |
| Top Spikes (last minute): |
| |
| Start Time CPU Spike Duration (Sec) Average Usage |
| 18Feb2021 9:07:36 5 25 100% |
| 18Feb2021 9:08:41 5 5 93% |
| 18Feb2021 9:08:51 2 5 92% |
|

0 Kudos
8 Replies
shais
Employee
Employee

Hi,
We have a number of ways to understand this change, will appreciate doing a remote session with you to understand the issue -  I will contact you directly to arrange a remote session.

Please open an SR as well 

0 Kudos
Luis_Miguel_Mig
Advisor

Thanks. 
I would say the securexl is okay. The traffic that goes through the slow path is expected, right?

[Expert@fw1:0]# fwaccel stats -p
F2F packets:
--------------
Violation Packets Violation Packets
-------------------- --------------- -------------------- ---------------
pkt has IP options 0 ICMP miss conn 0
TCP-SYN miss conn 0 TCP-other miss conn 50
UDP miss conn 197 other miss conn 0
VPN returned F2F 0 uni-directional viol 0
possible spoof viol 0 TCP state viol 0
SCTP state affecting 0 out if not def/accl 0
bridge, src=dst 0 routing decision err 0
sanity checks failed 0 fwd to non-pivot 0
broadcast/multicast 0 cluster message 10289
cluster forward 0 chain forwarding 0
F2V conn match pkts 0 general reason 0

0 Kudos
shais
Employee
Employee

You mentioned that most of your traffic is the slow path - this can trigger the IPS bypass as it will cause a high load on the CPU

The statistics you showed above mean you don't have any violations in SecureXL which is good but it's unrelated to the slow path.
You can see the slow path rate at "fwaccel stats -s" 

0 Kudos
Luis_Miguel_Mig
Advisor

When my testing vm is down and therefore there is only mgmt traffic meaning (gateways cluster messages, ntp, dns, snmp, syslog, http request to the checkpoint cloud through the proxy, etc) almost 100% of the traffic is not accelerated. Is this behavior expected? Should any of this traffic be accelerated?

When I browse a bit with my test vm I see the accelerated packets increase. See below


 

testing vm  down

[Expert@fw1:0]# fwaccel stats -s
Accelerated conns/Total conns : 0/0 (0%)
Accelerated pkts/Total pkts : 0/3302 (0%)
F2Fed pkts/Total pkts : 3302/3302 (100%)
F2V pkts/Total pkts : 0/3302 (0%)
CPASXL pkts/Total pkts : 0/3302 (0%)
PSLXL pkts/Total pkts : 0/3302 (0%)
CPAS pipeline pkts/Total pkts : 0/3302 (0%)
PSL pipeline pkts/Total pkts : 0/3302 (0%)
CPAS inline pkts/Total pkts : 0/3302 (0%)
PSL inline pkts/Total pkts : 0/3302 (0%)
QOS inbound pkts/Total pkts : 0/3302 (0%)
QOS outbound pkts/Total pkts : 0/3302 (0%)
Corrected pkts/Total pkts : 0/3302 (0%)
[Expert@hqfw2b:0]# fwaccel stat
+---------------------------------------------------------------------------------+
|Id|Name |Status |Interfaces |Features |
+---------------------------------------------------------------------------------+
|0 |SND |enabled |eth0,eth2,eth3,eth5,eth6 |Acceleration,Cryptography |
| | | | |Crypto: Tunnel,UDPEncap,MD5, |
| | | | |SHA1,NULL,3DES,DES,AES-128, |
| | | | |AES-256,ESP,LinkSelection, |
| | | | |DynamicVPN,NatTraversal, |
| | | | |AES-XCBC,SHA256,SHA384 |
+---------------------------------------------------------------------------------+

Accept Templates : enabled
Drop Templates : disabled
NAT Templates : enabled

 

testing vm is up

[Expert@hqfw2b:0]# fwaccel stats -s
Accelerated conns/Total conns : 0/97 (0%)
Accelerated pkts/Total pkts : 4215/9061 (46%)
F2Fed pkts/Total pkts : 4846/9061 (53%)
F2V pkts/Total pkts : 110/9061 (1%)
CPASXL pkts/Total pkts : 0/9061 (0%)
PSLXL pkts/Total pkts : 4215/9061 (46%)
CPAS pipeline pkts/Total pkts : 0/9061 (0%)
PSL pipeline pkts/Total pkts : 0/9061 (0%)
CPAS inline pkts/Total pkts : 0/9061 (0%)
PSL inline pkts/Total pkts : 0/9061 (0%)
QOS inbound pkts/Total pkts : 0/9061 (0%)
QOS outbound pkts/Total pkts : 0/9061 (0%)
Corrected pkts/Total pkts : 0/9061 (0%)

 

 

0 Kudos
Luis_Miguel_Mig
Advisor

Just reading at sk32578
When SecureXL is enabled, all packets should be accelerated, except packets that match the following conditions:
All packets that match a rule, whose source or destination is the Security Gateway itself.

So I guess in my environment with only one user establishing connections, the percentage of  accelerated traffic is expected to be low.
And if this user is down, then pretty much 100% of the packets should be non accelerated.
I guess it would still be interesting to double check it if it is possible.

0 Kudos
shais
Employee
Employee

When your testing VM is down, the traffic you have is only local connections - this is a slow path (by design)

So it looks like you don't have any issue here related to SecureXL but indeed something triggers a high load which cause IPS to enter bypass - we will continue offline to analyze it

0 Kudos
Timothy_Hall
Champion
Champion

Generally enabling the IPS bypass feature is not a good idea.  When monitoring the CPUs if even one of them hits the CPU % threshold, IPS functions on ALL CPUs are bypassed.   This was fine when firewalls only had a few cores, but not really appropriate with many cores. Really the IPS Bypass feature should average the CPU utilization of all the workers when making the decision of whether to bypass.  See here:

sk107334: IPS Bypass is triggered even when CPU utilization is not over the defined threshold

As @shais said it looks like something in T91 is causing occasional high CPU and triggering the IPS bypass; so the IPS bypass activating is just a symptom of your problem but not the cause.  Normally the next step is to figure out in what mode the CPU spikes are (kernel vs. process space - us/sy/si/hi in top); you can use sar for that but it looks like the spikes are too short for sar to reliably pick up.  You'll have to catch whatever it is "in the act" with top, or look in the spike detective logs here: /var/log/spike_detective/spike_detective.log

 

"Max Capture: Know Your Packets" Video Series
now available at http://www.maxpowerfirewalls.com
0 Kudos
Luis_Miguel_Mig
Advisor

Yeah absolutely.
/var/log/spike_detective/spike_detective.log doesn't say too much though. Just the duration of the spike and the core id.
Sar seems to capture stats only every 10 mins and the spikes last between 10 and 20 secs.

 

 

0 Kudos