First impressions of R80.30 on gateways: one step forward, one (or two) back


OK, we were finally "forced" to upgrade our gateways from R80.10 to R80.30 for fairly small reasons: we wanted to be able to use the O365 Updatable Object (instead of home-grown scripts) and to fix the Domain (FQDN) object performance issue where all FWK cores were making DNS queries, causing a lot of alerts (see https://community.checkpoint.com/t5/General-Topics/Domain-objects-in-R80-10-spamming-DNS/m-p/19786)

Positive things: the upgrades were smooth and painless, both on regular gateways and on VSX.

All regular gateways seem to be performing as before, but to be honest they are "over-dimensioned", with rather powerful hardware for the job: 5900 appliances with 16 cores.

VSX, though, threw a couple of surprises.

SXL medium path usage. CPU jumped from <30% to above 50% on the busiest VS, which only has the FW and IA blades enabled. OK, there is also VPN, but with only one connection:

[image: CPU usage graph of the busiest VS]

I haven't spent enough time digging into it, but for some reason 1/3 of all connections now take the medium path, whereas in R80.10 nearly all of them were fully accelerated. Most of it was HTTPS (95%), followed by LDAP-SSL (2%).

I used the SXL Fast Acceleration feature (thanks @HeikoAnkenbrand https://community.checkpoint.com/t5/General-Topics/R80-x-Performance-Tuning-Tip-SecureXL-Fast-Accele...) to exclude our proxies and some other networks, and you can see that on Friday the CPU load dropped by 10%, but it is nowhere near what it used to be.
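Conceptually, each Fast Acceleration entry is a trusted source/destination/port/protocol tuple, and a connection matching any entry skips deep inspection. The Python sketch below only models that matching idea; it is not Check Point's implementation. The `FastAccelRule` class and `is_fast_accelerated` helper are hypothetical, treating `None` as "any" is an assumption of this sketch, and the proxy subnet is made up. On the gateway itself, entries are managed with the `fast_accel` tooling described in the linked post.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FastAccelRule:
    """Hypothetical model of one trusted-connection entry (not Check Point code)."""
    src: str               # source network in CIDR notation
    dst: str               # destination network in CIDR notation
    port: Optional[int]    # destination port; None means "any" (assumption)
    proto: Optional[int]   # IP protocol number (6 = TCP); None means "any"

    def matches(self, src_ip: str, dst_ip: str, port: int, proto: int) -> bool:
        return (
            ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.src)
            and ipaddress.ip_address(dst_ip) in ipaddress.ip_network(self.dst)
            and (self.port is None or self.port == port)
            and (self.proto is None or self.proto == proto)
        )


def is_fast_accelerated(rules: Iterable[FastAccelRule], src_ip: str,
                        dst_ip: str, port: int, proto: int) -> bool:
    """A connection bypasses deep inspection if any trusted entry matches it."""
    return any(r.matches(src_ip, dst_ip, port, proto) for r in rules)


# Example: trust HTTPS from a (made-up) proxy subnet to any destination.
RULES = [FastAccelRule("10.1.1.0/24", "0.0.0.0/0", 443, 6)]

print(is_fast_accelerated(RULES, "10.1.1.5", "52.96.0.10", 443, 6))
print(is_fast_accelerated(RULES, "10.2.0.5", "52.96.0.10", 443, 6))
```

Keeping entries narrow (specific sources such as proxies, specific ports) limits how much traffic skips inspection, which is the trade-off this feature makes.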

I just find it impossible to explain why a gateway with only the FW blade enabled would start to send all (by the looks of it) traffic via PXL. And the statistics are a bit funny too:

[image: SecureXL statistics output]
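To track the medium path share over time rather than reading it off screenshots, the summary counters can be parsed. The sketch below assumes summary lines shaped like `PSLXL pkts/Total pkts : <count>/<total> (<pct>%)`, as printed by `fwaccel stats -s` on R80.20+ gateways; verify that format on your own version. The helper names are hypothetical and the sample numbers are invented for illustration.

```python
import re

# Assumed summary line shape: "<label>/Total <conns|pkts> : <count>/<total> (<pct>%)"
LINE_RE = re.compile(
    r"^\s*(?P<label>.+?)/Total (?:conns|pkts)\s*:\s*(?P<count>\d+)/(?P<total>\d+)"
)


def parse_summary(text: str) -> dict:
    """Map each counter label (e.g. 'PSLXL pkts') to a (count, total) pair."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stats[m.group("label").strip()] = (int(m.group("count")), int(m.group("total")))
    return stats


def share(stats: dict, label: str) -> float:
    """Percentage for one counter, recomputed from the raw numbers."""
    count, total = stats[label]
    return 100.0 * count / total if total else 0.0


# Invented sample output, for illustration only.
SAMPLE = """\
Accelerated conns/Total conns : 120/180 (66%)
Accelerated pkts/Total pkts   : 6000/9000 (66%)
PSLXL pkts/Total pkts         : 3000/9000 (33%)
"""

stats = parse_summary(SAMPLE)
print(f"PSLXL (medium path) share: {share(stats, 'PSLXL pkts'):.1f}%")
```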

 

FQDN alerts in logs. I can definitely confirm that only one core is now doing DNS lookups (against all the DNS servers you have defined, in our case 2). But we are still getting a lot of alerts like this one: Firewall - Domain resolving error. Check DNS configuration on the gateway (0)

[image: log entries showing the "Domain resolving error" alerts]

This got especially bad after I enabled the Updatable Object for O365 in the rulebase.

As I said before, I have not spent much time on this, as we had other "fun" stuff to deal with on our chassis, so it's fairly "raw". I will report more once I have some answers.

Accepted Solution

Authority

I got a "tip-off" from inside CP! I'm verifying whether I'm allowed to publish it here, but it seems my PXL issue is resolved! Yeehaa! Power of community! Thanks to @Ilya_Yusupov

And here is the "secret stuff":

  1. Regarding the medium path: you see most traffic in the medium path due to a known bug present since R80.20. The TLS parser is enabled when the following blade combinations are enabled:
    1. FW + IDA and/or Monitoring and/or VPN (exactly our case!)
  2. You can verify this by running “fw ctl get int tls_parser_enable”; it will return 1.
  3. As a workaround (WA), you can disable it on the fly by running “fw ctl set int tls_parser_enable 0”. To disable it permanently, add tls_parser_enable=0 to $FWDIR/boot/modules/fwkern.conf and reboot.
  4. This brings the traffic back to being fully accelerated, as in previous versions.
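The check and the permanent workaround above can be scripted. The Python sketch below assumes `fw ctl get int` prints a single `name = value` line and that fwkern.conf holds one `name=value` entry per line, as in step 3; all helper names are hypothetical and `fw_example_param` is a made-up placeholder. It only manipulates text, so writing the result back to $FWDIR/boot/modules/fwkern.conf and rebooting is still up to you. `remove_kernel_param` covers the case raised later in the thread, where the WA has to be removed once blades that need the TLS parser are enabled.

```python
import re


def parse_get_int(output: str) -> tuple:
    """Parse 'tls_parser_enable = 1' style output (assumed format) into (name, value)."""
    m = re.fullmatch(r"\s*(\w+)\s*=\s*(-?\d+)\s*", output)
    if m is None:
        raise ValueError(f"unexpected output: {output!r}")
    return m.group(1), int(m.group(2))


def _key(line: str) -> str:
    """Parameter name of a 'name=value' line ('' for blank lines)."""
    return line.split("=", 1)[0].strip()


def remove_kernel_param(conf_text: str, name: str) -> str:
    """Drop every 'name=...' line, e.g. when re-enabling blades that need the parser."""
    kept = [line for line in conf_text.splitlines() if _key(line) != name]
    return "\n".join(kept) + "\n" if kept else ""


def set_kernel_param(conf_text: str, name: str, value: int) -> str:
    """Return fwkern.conf text with exactly one 'name=value' line for this name."""
    return remove_kernel_param(conf_text, name) + f"{name}={value}\n"


current = "fw_example_param=1\n"  # hypothetical existing content
new_conf = set_kernel_param(current, "tls_parser_enable", 0)
print(new_conf, end="")
```

Rewriting the file this way is idempotent: running it twice leaves a single `tls_parser_enable=0` line.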


21 Replies

Authority

I just had a closer look at the IPs that are being sent to the medium path, and they all point to O365/MS.

Strange, as the O365 object has now been fully removed from the rules and the DB.

Authority

Ouch! The penny just dropped. Not even sure how I overlooked the fact that the CPUSE upgrade changed our hyper-threading from OFF to ON but (!) kept the original manual affinity settings. So it's not surprising that CPU usage was skewed: our multi-queue was running on 6 "half" cores instead of 6 "full" ones, etc.

Something to watch out for if you use manual affinities on VSX!
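A quick sanity check after an upgrade is whether hyper-threading is now on, since manual affinities written for physical cores then land on logical siblings. On x86 Linux, /proc/cpuinfo reports `siblings` (logical CPUs per package) and `cpu cores` (physical cores per package), and hyper-threading is on when the first exceeds the second. The sketch below parses that text; the function name is hypothetical and the sample is trimmed, invented content, not output from the gateways in this thread. On a real box you would pass it the contents of /proc/cpuinfo.

```python
def hyperthreading_enabled(cpuinfo_text: str) -> bool:
    """True if the first package in /proc/cpuinfo reports more siblings than cores."""
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None:
            return siblings > cores
    raise ValueError("'siblings' / 'cpu cores' fields not found")


# Trimmed, invented sample: one package with 8 physical cores and 16 siblings (HT on).
SAMPLE = """\
processor\t: 0
physical id\t: 0
siblings\t: 16
core id\t\t: 0
cpu cores\t: 8
"""

print(hyperthreading_enabled(SAMPLE))
```

If this flips from False to True across an upgrade, any manual affinity settings are worth reviewing before trusting CPU graphs.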


Hi @Kaspars_Zibarts 

The Fast Acceleration feature (picture 1, green) lets you define trusted connections that bypass deep packet inspection on R80.20 JHF Take 103 and above gateways. This feature significantly improves throughput for these trusted high-volume connections and reduces CPU consumption.

In my tests, I could reduce CPU (core) usage by about 10%-30%. This is also logical, since no content inspection is executed anymore.

I like that you showed that graphically.👍

[image: fast_accel_3.PNG]

 

Champion

Hi Kaspars,

Thanks for your report, a few comments:

1) What do you have set in the Track field of your rules? If you are using Detailed or Extended logging, this can pull traffic into PSLXL to provide the extra detail being requested. I found out about this one while writing the third edition of my book.

2) Do you have any services with Protocol Signature enabled in the Network/Firewall policy layer (if using ordered layers), or in the top-level rules (if using inline layers)? This can also cause some of what you are seeing. Try to stick to simple services (just a port number) in those layers if possible, then call for Protocol Signatures and applications/URLs/content in subsequent layers.

3) As for that wacky Accelerated Conns percentage, you must have a very large amount of stateless traffic; see sk109467: 'Accelerated conns' value is higher than 'Accelerated pkts' in the output of 'fwaccel stat....

4) As you noticed, starting in R80.20 the gateway is much more dependent on speedy DNS due to Updatable Objects, rad, wsdnsd and a lot of other daemons.

R80.40 addendum for book "Max Power 2020" now available
for free download at http://www.maxpowerfirewalls.com

Champion

Wow, nice one Kaspars, I don't think I would have ever figured that one out. Will disabling the TLS parser as shown cause issues with other blades if they get enabled later?

Authority

@Timothy_Hall as far as I understood, R&D are working on a proper long-term fix.

As for the FQDN alerts, I can now confirm that the O365 Updatable Object is definitely causing them, but only on our busy VSX. I haven't seen the same issue on regular gateways.

According to CP, the alert is issued when the resolver cannot get a response to a checkpoint.com query. I took a tcpdump and confirmed that DNS is actually responding, yet wsdnsd still logs warnings. Here's an example of the packet capture and the matching wsdnsd.elg entry:

[image: packet capture of the DNS queries and responses]

[wsdnsd 32546]@vsx1-ext[20 Jan 9:10:33] Warning:cp_timed_blocker_handler: A handler [0xf6f213d0] blocked for 44 seconds.
[wsdnsd 32546]@vsx1-ext[20 Jan 9:10:33] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libResolver.so], Function offset [0x2b3d0].
[wsdnsd 32546]@vsx1-ext[20 Jan 9:10:33] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_Z10Sock_InputiPv], offset [0x2b3d0].
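When lining up wsdnsd.elg with a packet capture, it can help to pull out only the "blocked for N seconds" warnings and their timestamps. The sketch below is a hypothetical helper whose regular expression is written against the log lines quoted above; other wsdnsd messages, or other versions, may be formatted differently.

```python
import re

BLOCKED_RE = re.compile(
    r"^\[(?P<proc>\w+) (?P<pid>\d+)\]@(?P<host>[^\[\s]+)\[(?P<ts>[^\]]+)\] "
    r"Warning:cp_timed_blocker_handler: A handler \[(?P<handler>0x[0-9a-fA-F]+)\] "
    r"blocked for (?P<secs>\d+) seconds\."
)


def blocked_handlers(log_text: str, min_seconds: int = 0) -> list:
    """Return (timestamp, handler, seconds) for each 'blocked for N seconds' warning."""
    events = []
    for line in log_text.splitlines():
        m = BLOCKED_RE.match(line)
        if m and int(m.group("secs")) >= min_seconds:
            events.append((m.group("ts"), m.group("handler"), int(m.group("secs"))))
    return events


# Two of the lines quoted above; only the first is a "blocked for" warning.
LOG = """\
[wsdnsd 32546]@vsx1-ext[20 Jan 9:10:33] Warning:cp_timed_blocker_handler: A handler [0xf6f213d0] blocked for 44 seconds.
[wsdnsd 32546]@vsx1-ext[20 Jan 9:10:33] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libResolver.so], Function offset [0x2b3d0].
"""

for ts, handler, secs in blocked_handlers(LOG, min_seconds=10):
    print(f"{ts}: handler {handler} blocked for {secs}s")
```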

 

Still digging through my packet capture to see if I can find any strange names/responses etc.

Employee+

@Timothy_Hall - Indeed, when you enable blades that require the TLS parser, you will need to remove the WA I suggested.

The WA is currently only for the combinations I sent.

Champion

That makes sense, thanks. I will add this workaround to the upcoming R80.40 addendum, being careful to add caveats about which blades are enabled.

Champion

Note that the TLS parser being inappropriately invoked with certain blade combinations has been fixed long-term in R80.40 Jumbo HFA Take 78+. As mentioned in my R80.40 addendum for Max Power 2020, this fix is also going to be backported into the R80.20 and R80.30 Jumbo HFAs. It is always preferable to have this fix present if possible rather than manually tampering with the TLS parser, as doing so can cause further problems.

Explorer

👍

Authority

Just a quick update on FQDN object alerts.

It was all caused by a missing rule permitting DNS requests over TCP from the gateway. I have added full details in the corresponding FQDN thread here:

https://community.checkpoint.com/t5/General-Management-Topics/Domain-Objects-FQDN-An-Unofficial-ATRG...
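Why TCP matters: a DNS client normally asks over UDP, but when an answer does not fit in the UDP response it comes back with the TC (truncated) flag set and the client retries over TCP port 53, where each message carries a 2-byte length prefix (RFC 1035, section 4.2.2). If TCP/53 from the gateway is not permitted, that retry fails. Whether large answers are what triggered the retries here is an assumption. The Python sketch below shows the framing; the helper names are hypothetical and `query_over_tcp` needs a reachable DNS server of your choosing.

```python
import socket
import struct


def build_dns_query(qname: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Minimal standard query with RD set; qtype 1 = A, class 1 = IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # one question
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """Over TCP, every DNS message is preceded by its 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message


def is_truncated(message: bytes) -> bool:
    """TC is bit 0x0200 of the flags word; if set, the client should retry over TCP."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & 0x0200)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since TCP may deliver the message in pieces."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        buf += chunk
    return buf


def query_over_tcp(server: str, qname: str, timeout: float = 5.0) -> bytes:
    """Send one query to server:53 over TCP and return the raw response message."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_dns_query(qname)))
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)
```

Calling `query_over_tcp` with your DNS server's address and "checkpoint.com" returns the raw answer when TCP/53 is allowed along that path, and raises an OSError (for example a timeout) when it is blocked.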

 

Contributor

We upgraded to R80.30 with the latest HF, and even with the parser disabled (we don't use other blades), 49% of traffic is going through the medium path. How can we check further?

We have a case open at the moment.

Champion

Output of enabled_blades please.

Contributor

FW, Identity Awareness, IPS.

Got info from TAC (Canada) that tcp/445 and HTTPS will use the medium path on R80.30 regardless of whether the PSL parser is disabled or not.

Authority

That's correct: file shares (445) are forced via PXL. There's a procedure available to exclude them, but from memory it's fairly complex.

Contributor

Well, we used fast_accel to bypass the big flows we had, and PXL has now dropped to 20%.

That traffic is now overloading the SND cores... and FWK load is still quite high...

Contributor

More than 4 months later, after multiple sessions with TAC, there is still no solution (one special hotfix did not help). This seems to be a known issue, yet TAC is not able to sync with R&D to get a fix? I wonder if anyone in the community was able to get a fix in the end.

Employee+

Hi @Khalid_Aftas,

The fix exists in R80.30 JHF Take 219 and in R80.20 JHF Take 141.

Thanks,

Ilya 
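For upgrade planning, the takes quoted here for the sk166700 fix (R80.30 JHF Take 219, R80.20 JHF Take 141) can be turned into a small check. This is a hypothetical helper, not a Check Point tool: it only knows the two versions named in this reply and assumes you have already read the installed Jumbo take from the gateway.

```python
# Jumbo Hotfix takes that include the sk166700 fix, as quoted in this thread.
SK166700_FIX_TAKE = {"R80.20": 141, "R80.30": 219}


def has_sk166700_fix(version: str, installed_take: int) -> bool:
    """True if the installed Jumbo take is at or above the take quoted for the fix."""
    if version not in SK166700_FIX_TAKE:
        raise ValueError(f"no fix take recorded for {version}")
    return installed_take >= SK166700_FIX_TAKE[version]


print(has_sk166700_fix("R80.30", 219))
print(has_sk166700_fix("R80.20", 120))
```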

Contributor

Hi Ilya, are you referring to the fix from sk166700, which is integrated into Take 219 of the Jumbo for R80.30 (look for the article or PRJ-14368, PRJ-15747, or PRHF-10818 in sk153152)? I ask because in case 6-0002031094 we had a private fix for those on our current JHF. I would like to avoid making a change and giving my client hope for nothing. Thanks a lot

Employee+

Hi @Khalid_Aftas,

Yes, I'm referring to the fix described in sk166700.

Thanks,

Ilya 
