Kaspars_Zibarts
Employee

Cluster Full sync taking very long time R80.40 T161

Just wondering if anyone else has any thoughts on the subject..

We have a cluster of 28000 series running R80.40 T161 with IPS, APCL, URLF, AB, AV and HTTPS interception turned ON.

Yesterday we were forced to reboot the standby member during the day and observed that full sync took nearly half an hour, which seemed quite excessive:

Oct 25 09:55:42 2022 fw1 fwk: CLUS-120120-1: Fullsync started
Oct 25 10:20:21 2022 fw1 fwk: CLUS-120122-1: Fullsync completed successfully
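
For reference, the exact duration can be read off the two log lines above. The following is a minimal sketch, assuming the timestamp always occupies the first four whitespace-separated fields as it does in this excerpt; the function names are illustrative and not part of any Check Point tooling.

```python
from datetime import datetime

# The two log lines quoted in the post.
LOG_LINES = [
    "Oct 25 09:55:42 2022 fw1 fwk: CLUS-120120-1: Fullsync started",
    "Oct 25 10:20:21 2022 fw1 fwk: CLUS-120122-1: Fullsync completed successfully",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS YYYY' fields of a log line."""
    stamp = " ".join(line.split()[:4])
    return datetime.strptime(stamp, "%b %d %H:%M:%S %Y")


def fullsync_duration_seconds(lines) -> float:
    """Return the seconds between 'Fullsync started' and 'Fullsync completed'."""
    start = end = None
    for line in lines:
        if "Fullsync started" in line:
            start = parse_timestamp(line)
        elif "Fullsync completed" in line:
            end = parse_timestamp(line)
    if start is None or end is None:
        raise ValueError("need both a start and a completion line")
    return (end - start).total_seconds()


duration = fullsync_duration_seconds(LOG_LINES)
print(f"full sync took {duration:.0f} s")
```

For these two lines the result is 1479 seconds, a little under 25 minutes.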

Performance figures at that point:

  • total throughput ~15Gbps
  • internet ~4Gbps
  • HTTPS inspected ~2Gbps
  • Threat prevention applied to external traffic only
  • 600,000 concurrent connections
  • 10,000 new connections per second

It seemed that the sync protocol was not able to keep up with the new connection rate: we could see from the connections table size on the standby that it was growing very slowly. And no obvious errors were reported by cphaprob syncstat.
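
As a rough back-of-envelope check of that impression, the figures from the post can be combined. This is only a sketch under strong assumptions: it treats the standby's connections table as reaching the full 600,000 entries exactly when full sync completed (1479 seconds, from 09:55:42 to 10:20:21 in the log), and it ignores every other kernel table that full sync transfers.

```python
def average_fill_rate(table_entries: int, duration_s: float) -> float:
    """Average entries per second if the whole table took duration_s to fill."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return table_entries / duration_s


CONCURRENT_CONNECTIONS = 600_000  # from the post
NEW_CONNECTIONS_PER_SEC = 10_000  # from the post
FULLSYNC_DURATION_S = 1479        # from the two log timestamps in the post

rate = average_fill_rate(CONCURRENT_CONNECTIONS, FULLSYNC_DURATION_S)
ratio = NEW_CONNECTIONS_PER_SEC / rate

print(f"average fill rate: {rate:.0f} entries/s")
print(f"new connections arrive roughly {ratio:.0f}x faster than that")
```

Under those assumptions the table filled at an average of about 406 entries per second, well over an order of magnitude below the reported new-connection rate, which is consistent with the slow growth described above but does not by itself identify the cause.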

It's a fairly new cluster and we are still in the "tuning" phase (new boxes and new functionality). So we disabled sync for DNS connections and delayed HTTP/S connection sync by 30 seconds, which should help of course.
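
To illustrate why a sync delay reduces load: with delayed synchronization, a connection is only synchronized if it still exists once the delay has elapsed, so short-lived connections never reach the peer. The sketch below assumes exactly that behaviour, and the connection lifetimes are hypothetical sample values, not measurements from this cluster.

```python
def synced_fraction(lifetimes_s, delay_s: float) -> float:
    """Fraction of connections that live long enough to be synchronized."""
    if not lifetimes_s:
        return 0.0
    synced = sum(1 for lifetime in lifetimes_s if lifetime >= delay_s)
    return synced / len(lifetimes_s)


# Hypothetical HTTP/S connection lifetimes in seconds (illustration only).
SAMPLE_LIFETIMES_S = [0.4, 1.2, 3.0, 8.0, 15.0, 29.0, 45.0, 120.0, 600.0, 3600.0]

print(synced_fraction(SAMPLE_LIFETIMES_S, delay_s=0.0))   # no delay: everything is synced
print(synced_fraction(SAMPLE_LIFETIMES_S, delay_s=30.0))  # 30 s delay: only long-lived ones
```

The trade-off is that any connection younger than the delay is lost on failover, which is usually acceptable for short web requests.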

I just wanted to hear whether anyone else is pushing high-end appliances close to these numbers and has seen anything like this.

Has anyone noticed "performance" improvements after upgrading to R81.10 on gateways? I know management gets "faster" but gateways?

I realize that we are getting close to box MAX:

 

image.png

1 Solution

Accepted Solutions
Kaspars_Zibarts
Employee

it's fixed in T1543 😄 

10 Replies
_Val_
Admin

600K connections is A LOT. I would look into setting up delayed sync for at least some of the traffic.

Kaspars_Zibarts
Employee

If it was a FW blade only, it would not be that much. Especially when you look at the datasheet of 28000 🙂

 

image.png

_Val_
Admin

Full sync sends over all kernel tables for 600K connections. It is quite a chunk of data. 

the_rock
Legend

I agree, that's way too much time. Personally, I would open a TAC case to investigate further.

Alexander_Wilke
Advisor

  • ~400,000 concurrent connections
  • ~6,000 new connections per second
  • 162000 appliance
  • R80.40 Take 156
  • Firewall blade only

 

Nov 2 09:51:34 2022 xxxxx fwk: CLUS-120120-1: Fullsync started
Nov 2 09:52:04 2022 xxxxx fwk: CLUS-120122-1: Fullsync completed successfully

 

You have many blades and perhaps much more to sync than a firewall-only gateway. However, it should not take so long.

Check that the MTU size matches on both sync interfaces.

Open a ticket.

Timothy_Hall
Legend

Sounds like an unhealthy or overloaded sync network. For both members, can you post the output of cphaprob syncstat, along with fw ctl pstat in case the firewalls are experiencing other memory issues?

Kaspars_Zibarts
Employee

Sorry, Elvis has left the building. I'm no longer with the company and can't get any logs. But I'm 101% sure that the sync network was intact. From memory, it's a dark fiber between DCs approx. 1 km apart, running at merely 100 Mbps of the 1 Gbps available.

the_rock
Legend

But come on, now that you work for CP, that's more pressure to fix the issue ;-)

Kaspars_Zibarts
Employee

it's fixed in T1543 😄 

the_rock
Legend

🤣🤣🤣
