Kaspars_Zibarts
Employee

First impressions of R80.30 on gateway - one step forward, one (or two) back

OK, we were finally "forced" to go ahead and upgrade our gateways from R80.10 to R80.30 for fairly small reasons: we wanted to be able to use the O365 Updatable Object (instead of home-grown scripts) and to fix the Domain (FQDN) object performance issue where all FWK cores were making DNS queries, causing a lot of alerts (see https://community.checkpoint.com/t5/General-Topics/Domain-objects-in-R80-10-spamming-DNS/m-p/19786).

Positive things - upgrades were smooth and painless - both on regular gateways and VSX.

All regular gateways seem to be performing as before, but I have to be honest that they are "over-dimensioned", with rather powerful HW for the job: 5900 with 16 cores.

VSX, though, threw a couple of surprises.

SXL medium path usage. CPU jumped from <30% to above 50% on the busiest VS that only has FW and IA blades enabled. Ok, there is also VPN but only one connection:

image.png

I haven't spent enough time digging into it, but for some reason 1/3 of all connections took the medium path, whereas before in R80.10 nearly all of them were fully accelerated. Most of it was HTTPS (95%), with LDAP-SSL the next most used (2%).

I used the SXL fast accelerator feature (thanks @HeikoAnkenbrand  https://community.checkpoint.com/t5/General-Topics/R80-x-Performance-Tuning-Tip-SecureXL-Fast-Accele...) to exclude our proxies and some other nets, and you can see that on Friday the CPU load was reduced by 10%, but it is nowhere near what it used to be.

I just find it impossible to explain why a gateway with only the FW blade enabled would start to throw all (by the looks of it) traffic via PXL. And the statistics are a bit funny too:

image.png

FQDN alerts in logs. I can definitely confirm that only one core is now doing DNS lookups (against all the DNS servers you have defined, in our case 2). But we are still getting a lot of alerts like this one: Firewall - Domain resolving error. Check DNS configuration on the gateway (0)

 

image.png

Especially after I enabled the updatable object for O365 in the rulebase.

As said before, I have not spent too much time on this as we had other "fun" stuff to deal with on our chassis, so it's fairly "raw". I will report more once I have some answers.

 

Accepted Solutions
Kaspars_Zibarts
Employee

I got a "tip off" from inside CP! Verifying if I'm allowed to publish it here but seems like my PXL issue is resolved! Yeehaa! Power of community! Thanks to @Ilya_Yusupov 

 

And the "secret stuff" here:

  1. Regarding medium path – you see most traffic in the medium path due to a known bug we have had since R80.20: the TLS parser is enabled when the following combination of blades is enabled:
    1. FW + IDA and/or Monitoring and/or VPN (exactly our case!)
  2. You can validate this by running the following command: “fw ctl get int tls_parser_enable”. It will return 1.
  3. As a WA you can disable it on the fly by running “fw ctl set int tls_parser_enable 0”. To disable it permanently, put tls_parser_enable=0 in $FWDIR/boot/modules/fwkern.conf and reboot.
  4. The above will make the traffic fully accelerated again, as in the previous version.
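
The check, the on-the-fly change and the persistent setting from the steps above could be scripted roughly as follows. This is a minimal sketch, not an official procedure: `set_kernel_param` is a hypothetical helper name introduced here, and only the `fw ctl get int` / `fw ctl set int` commands and the fwkern.conf path come from the steps above. The function only edits the file; the gateway commands are left as comments because they have to be run on the gateway itself.

```shell
#!/bin/bash
# Hypothetical helper (not a Check Point tool): make sure a fwkern.conf-style
# file contains exactly one "name=value" line for the given kernel parameter.
set_kernel_param() {
    local file="$1" name="$2" value="$3" tmp
    touch "$file" || return 1
    tmp="$(mktemp)" || return 1
    # Keep every line except an earlier assignment of the same parameter.
    grep -v "^${name}=" "$file" > "$tmp" || true
    echo "${name}=${value}" >> "$tmp"
    # Overwrite in place so the original file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# On the gateway (commands taken from the steps above):
#   fw ctl get int tls_parser_enable      # 1 means the TLS parser is enabled
#   fw ctl set int tls_parser_enable 0    # disable on the fly, lost at reboot
#   set_kernel_param "$FWDIR/boot/modules/fwkern.conf" tls_parser_enable 0
```

Running the helper twice leaves a single tls_parser_enable line, so it is safe to re-run. As noted further down the thread, the line has to be removed again if blades that need the TLS parser are enabled later.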


25 Replies
Kaspars_Zibarts
Employee

Just had a closer look at the IPs that are being sent to medium path and all points to O365 / MS.

Strange as O365 object is fully removed now from rules and DB. 

Kaspars_Zibarts
Employee

Ouch! Penny just dropped, not even sure how I overlooked the fact that CPUSE upgrade changed our hyper-threading from OFF to ON but (!) kept original manual affinity settings. So not surprising that CPU usage was screwed.  I.e. our multiqueue was running on 6 "half" cores instead of 6 "full"! etc etc

Something to watch out for if you are using manual affinities on VSX!
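
One way to spot this after an upgrade is to check whether hyper-threading (SMT) is active and compare that with the configured affinities. A rough sketch, assuming the gateway's kernel exposes the standard Linux sysfs topology file (older kernels may not); `smt_state` is a hypothetical helper name introduced here:

```shell
#!/bin/bash
# Hypothetical helper: report whether CPU 0 has SMT siblings.
# $1 optionally overrides the sysfs path (handy for testing off-box).
smt_state() {
    local f="${1:-/sys/devices/system/cpu/cpu0/topology/thread_siblings_list}"
    local siblings
    siblings="$(cat "$f" 2>/dev/null)" || return 2
    case "$siblings" in
        # "0,8" or "0-1": CPU 0 shares its physical core with another thread.
        *,*|*-*) echo "on" ;;
        # "0": CPU 0 is the only thread on its core.
        *)       echo "off" ;;
    esac
}

# On the gateway, compare the result with the current affinities, e.g.:
#   smt_state
#   fw ctl affinity -l
```

If the state changed from "off" to "on" during the upgrade, manual affinities written for physical cores now point at logical threads and need to be reviewed.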

HeikoAnkenbrand
Champion

 

Hi @Kaspars_Zibarts 

The Fast Acceleration feature (picture 1, green) lets you define trusted connections that bypass deep packet inspection on gateways running R80.20 JHF103 and above. This feature significantly improves throughput for these trusted high-volume connections and reduces CPU consumption.

During my tests, I could reduce CPU (core) usage by about 10%-30%. That is logical, since no content inspection is executed for these connections anymore.
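
As an illustration, a rule for one trusted flow could be built like this. It is a sketch under assumptions: the argument order (source, destination, destination port, IP protocol number) is assumed for `fw ctl fast_accel add` and should be verified against Check Point's fast_accel documentation; `fast_accel_cmd` is a hypothetical helper and the addresses are made-up examples. The function only prints the command so it can be reviewed before being run on the gateway.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) a fast_accel rule for review.
# Args: source, destination, destination port, IP protocol number (default 6 = TCP).
fast_accel_cmd() {
    local src="$1" dst="$2" port="$3" proto="${4:-6}"
    # Reject anything that is not a plain decimal number.
    case "$port" in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    # Reject numbers outside the valid port range.
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "invalid port: $port" >&2
        return 1
    fi
    echo "fw ctl fast_accel add ${src} ${dst} ${port} ${proto}"
}

# Example with made-up addresses (internal proxy net to one server, HTTPS):
#   fast_accel_cmd 10.1.1.0/24 192.0.2.10 443
```

Keeping rules this narrow (specific source, destination and port) limits how much traffic skips inspection.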

I like that you showed that graphically.👍

fast_accel_3 (1).PNG

 

➜ CCSM Elite, CCME, CCTE
Timothy_Hall
Champion

Hi Kaspars,

Thanks for your report, a few comments:

1) What do you have set in the Track field of your rules?  If using Detailed or Extended logging this can pull traffic into PSLXL to provide the extra detail being requested.  Found out about this one while writing the third edition of my book.

2) Do you have any services with Protocol Signature enabled in the Network/Firewall policy if using ordered layers, or in the top level of rules if using inline?  This can also cause some of what you are seeing and you should try to stick to simple services (just a port number) in those layers if possible, then call for Protocol Signatures and applications/URLs/content in subsequent layers.

3) As far as that wacky Accelerated Conns percentage goes, you must have a very large amount of stateless traffic, see sk109467: 'Accelerated conns' value is higher than 'Accelerated pkts' in the output of 'fwaccel stat....

4) As you noticed the gateway is much more dependent on speedy DNS starting in R80.20 due to Updatable objects, rad, wsdnsd and a lot of other daemons.

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Timothy_Hall
Champion

Wow nice one Kaspars, don't think I would have ever figured that one out.  Will disabling the TLS parser as shown cause issues with other blades should they get enabled later?

 

Kaspars_Zibarts
Employee

@Timothy_Hall as far as I understood, R&D are working on a proper long-term solution to fix it.

As for the FQDN alerts, I can now confirm that the O365 updatable object is definitely causing them, but only on our busy VSX. I haven't seen the same issue on regular gateways.

According to CP, the alert is issued when the resolver cannot get a response to a checkpoint.com query. I took a tcpdump and confirmed that DNS is actually responding, but it does generate a wsdnsd log entry. Here's an example of the packet capture and the matching wsdnsd.elg entry:

image.png

[wsdnsd 32546]@vsx1-ext[20 Jan 9:10:33] Warning:cp_timed_blocker_handler: A handler [0xf6f213d0] blocked for 44 seconds.
[wsdnsd 32546]@vsx1-ext[20 Jan 9:10:33] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libResolver.so], Function offset [0x2b3d0].
[wsdnsd 32546]@vsx1-ext[20 Jan 9:10:33] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_Z10Sock_InputiPv], offset [0x2b3d0].

 

Still digging through my packet capture to see if I can find any strange names / responses etc.

Ilya_Yusupov
Employee

@Timothy_Hall - Indeed, when you enable blades that require the TLS parser you will need to remove the WA I suggested.

The WA is currently only for the combinations I sent.

Timothy_Hall
Champion

That makes sense, thanks.  Will add this workaround to the upcoming R80.40 addendum but be careful to add caveats for which blades are enabled.

 

Timothy_Hall
Champion

Note that the long-term fix for the TLS parser being inappropriately invoked with certain blade combinations has been included in R80.40 Jumbo HFA Take 78+.  This fix is also going to be backported into the R80.20 and R80.30 Jumbo HFAs, as mentioned in my R80.40 addendum for Max Power 2020.  It is always preferable to have this fix present if possible rather than manually tampering with the TLS parser, as doing so can cause further problems.

rob123
Explorer

👍

Kaspars_Zibarts
Employee

Just a quick update on FQDN object alerts.

It was all caused by a missing rule that would permit DNS requests over TCP from the gateway. I have added full details in the corresponding thread about FQDN here:

https://community.checkpoint.com/t5/General-Management-Topics/Domain-Objects-FQDN-An-Unofficial-ATRG...
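
A quick check for this kind of problem is to test whether the gateway can open a TCP connection to its DNS servers on port 53. A minimal sketch using bash's built-in /dev/tcp redirection: `dns_tcp_open` is a hypothetical helper, the server address in the usage line is a placeholder, and the `timeout` utility is assumed to be available on the system. On VSX it would have to be run in the context of the relevant VS. It only shows that a TCP handshake succeeds, not that the server answers queries.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to server:port can be opened.
# Uses bash's /dev/tcp pseudo-device; port defaults to 53 (DNS).
dns_tcp_open() {
    local server="$1" port="${2:-53}"
    # Run in a child bash so the socket is closed when the child exits;
    # give up after 3 seconds if packets are silently dropped.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$server" "$port" 2>/dev/null
}

# Usage (placeholder address):
#   dns_tcp_open 192.0.2.53 && echo "TCP/53 reachable" || echo "TCP/53 blocked or closed"
```

A failure here while UDP lookups work would point at a rulebase or path problem for DNS over TCP rather than at the DNS servers themselves.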

 

Khalid_Aftas
Contributor

We upgraded to R80.30 with the latest HF, and even with the parser disabled (we don't use other blades) 49% of traffic is going through the medium path. How can we check further?

Case open atm.

Timothy_Hall
Champion

Output of enabled_blades please.

Khalid_Aftas
Contributor

FW, Identity Awareness, IPS.

Got info from TAC (Canada) that tcp/445 and HTTPS will use the medium path on R80.30 regardless of whether the PSL parser is disabled or not.

Kaspars_Zibarts
Employee

That's correct - file shares (445) are forced via PXL. There's a procedure available to exclude them, but from memory it's fairly complex.

Khalid_Aftas
Contributor

Well, we used fast_accel to bypass the big flows we had, and the PXL % has now dropped to 20%.

That traffic is now overloading the SND core... and the fwks are still quite high...

Khalid_Aftas
Contributor

More than 4 months later, after multiple sessions with TAC, there is still no solution (one special hotfix did not help). This seems to be a known issue, and TAC is not able to sync with R&D to get a fix? Wondering if anyone in the community was able to get a fix in the end.

Ilya_Yusupov
Employee

Hi @Khalid_Aftas ,

 

The fix exists in R80.30 JHF Take 219 and in R80.20 JHF 141.

 

Thanks,

Ilya 

Khalid_Aftas
Contributor

Hi Ilya, are you referring to the fix from sk166700, which is integrated into Take 219 of the Jumbo for R80.30 (look for the article or PRJ-14368, PRJ-15747, or PRHF-10818 in sk153152)? I ask because in case 6-0002031094 we had a private fix for those on our current JHF. I would like to avoid making a change and giving my client hope for nothing. Thanks a lot.

Ilya_Yusupov
Employee

Hi @Khalid_Aftas ,

 

Yes, I'm referring to the fix described in sk166700.

 

Thanks,

Ilya 

Khalid_Aftas
Contributor

Hi Ilya, after installing JHF 219 the issue is still the same: I still need to force fast_accel to accelerate tcp 443/445, otherwise the fwks are saturated.

Ilya_Yusupov
Employee

Hi @Khalid_Aftas ,

 

You mention port 445, which is the SMB protocol. I'm not sure it is the same issue, as I'm sure the TLS parser issue described in the SK is solved in JHF 219.

For the SMB protocol, if the majority of your traffic is of this kind, the only possible solution I'm aware of is putting it under fast_accel.

Did you try putting only 445 under fast_accel?

Khalid_Aftas
Contributor

I used fast_accel with a combination of specific sources/destinations and SMB (the top talkers) and saw a decrease directly.

Using fast_accel for all SMB traffic is another story; with this workaround all the security features are useless...

Is this a known limitation on R80.30, that SMB traffic is not accelerated? Because we had 0 issues on R77.30.

Ilya_Yusupov
Employee

Hi @Khalid_Aftas ,

As far as I know, we had the same behavior in R77.30. I suggest opening a TAC case for further investigation.

 

Thanks,

Ilya 
