CheckMates › Products › Quantum › Security Gateways › R81.10 and BGP
R81.10 and BGP
I have upgraded from R80.20 to R81.10. I currently have two eBGP peers and one iBGP peer.
When failing over from active to standby, the (old active) now-standby cluster member briefly goes into Down state, and ROUTED on the now-standby member runs at high CPU (~65% of one CPU) for over 60 minutes.
Status so far:
- lots of debugs and a cpinfo collected
- Check Point TAC's solution (ticket open two weeks) was to remove graceful restart, which only causes all connections to be dropped plus the same high CPU. I will continue to work with TAC.
FYI: in R80.20 the cluster lost all connections for 30 seconds when failing over from active to standby. Check Point said the solution was to turn on graceful restart, and enabling it did resolve the 30-second connection loss in R80.20.
But now Check Point TAC claims removing graceful restart will fix the issue.
Is anyone else using iBGP on R81.10? Do you have any ideas?
Leo
---
How many routes are in the BGP table, and do the adjacent peer(s) have GR configured on their side?
Which JHF take is this gateway/cluster running?
---
400,000+ routes; GR is enabled on both sides (see below); members are at JHF Take 30.
PeerID AS Routes ActRts State InUpds OutUpds Uptime
12.122.NNN.NNN 7018 46809 40356 Established 11888 3 06:57:37
50.220.NNN.NNN 7922 7222 5110 Established 1936 3 06:57:01
4.53.NNN.NNN 21NNN 408564 392414 Established 126974 2 06:56:33
----- Peer 12.122
State Established (Uptime: 07:00:38)
Peer Type eBGP Peer
Remote AS 7018
Peer Capabilities IPv4 Unicast,Route Refresh,Cisco Route Refresh,Graceful Restart,4-Byte AS Extension
Our Capabilities IPv4 Unicast,Route Refresh,Graceful Restart,4-Byte AS Extension,Enhanced Route Refresh
----- Peer 50.220
State Established (Uptime: 07:00:02)
Peer Type eBGP Peer
Remote AS 7922
Peer Capabilities IPv4 Unicast,Route Refresh,Cisco Route Refresh,Graceful Restart,4-Byte AS Extension
Our Capabilities IPv4 Unicast,Route Refresh,Graceful Restart,4-Byte AS Extension,Enhanced Route Refresh
----- Peer 4.53
State Established (Uptime: 06:59:40)
Peer Type iBGP Peer
Remote AS 21NNN
Peer Capabilities IPv4 Unicast,Route Refresh,Cisco Route Refresh,Graceful Restart,4-Byte AS Extension,Enhanced Route Refresh
Our Capabilities IPv4 Unicast,Route Refresh,Graceful Restart,4-Byte AS Extension,Enhanced Route Refresh
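For anyone comparing peer state before and after a failover, the summary output above can be turned into structured records with a short script. This is an illustrative sketch only: the column layout is assumed from the output shown, and the sample peer IP and AS number are placeholders (the thread masks the real values).

```python
# Hypothetical sketch: parse "PeerID AS Routes ActRts State InUpds OutUpds Uptime"
# rows (as shown above) into dicts, so peer state can be diffed across failovers.

def parse_peer_summary(lines):
    """Parse summary rows; skip the header and anything malformed."""
    peers = []
    for line in lines:
        parts = line.split()
        # A data row has 8 columns and a numeric AS number in column 2.
        if len(parts) != 8 or not parts[1].isdigit():
            continue
        peers.append({
            "peer": parts[0],
            "as": int(parts[1]),
            "routes": int(parts[2]),
            "active_routes": int(parts[3]),
            "state": parts[4],
            "uptime": parts[7],
        })
    return peers

# Placeholder peer (documentation IP, masked AS) mirroring the iBGP row above.
sample = [
    "PeerID AS Routes ActRts State InUpds OutUpds Uptime",
    "192.0.2.1 21000 408564 392414 Established 126974 2 06:56:33",
]
peers = parse_peer_summary(sample)
print(peers[0]["routes"])  # → 408564, the iBGP peer carrying the bulk of the table
```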
---
From an external viewpoint, 400,000 routes over iBGP seems high for most environments.
Has TAC provided guidance on whether the situation would improve by reducing this, e.g. by employing route optimization strategies downstream?
Out of interest, which gateway appliance models are in use here?
---
In R80.20 I demonstrated to TAC that the issue went away when I filtered the iBGP routes. I mentioned the iBGP route-table size to TAC, but TAC did not seem interested; I think TAC believes it is a configuration issue. In R80.20 a custom ROUTED was created to fix the iBGP route issue. We are using open hardware.
---
My issue has been open with Sirius since February and with TAC for two weeks. You have been asking some very good questions. I can try adding the route filtering tomorrow between 6 and 7 pm ET; that is our slow time during the week. I have assumed from the beginning that it is iBGP and the number of routes. TAC keeps saying that was fixed in R80.
---
If you have the SR number for the same issue under R80.20, you should be able to request a portfix via TAC if a hotfix was provided.
Where possible I would suggest both strategies be employed to ensure stability.
---
Check Point R&D now claims that the standby cluster member's hours of high CPU (ROUTED) are caused by having only a 1 Gb heartbeat interface. They said I need to upgrade to a 10 Gb heartbeat connection... Very interesting, given that Cisco says "Cisco typically recommends a minimum of 512 MB of RAM in the router to store a complete global BGP routing table from one BGP peer." 512 MB needs a 10 Gb connection?
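A back-of-envelope check puts the 10 Gb claim in perspective. The ~100 bytes per route entry used below is an assumption for illustration, not a measured figure for Check Point's sync protocol, but even with generous padding the full table is tens of megabytes, which a 1 Gb link moves in well under a second:

```python
# Rough sizing of a full-table sync over the heartbeat link.
# BYTES_PER_ROUTE is an assumed serialized size, not a measured value.

ROUTES = 400_000
BYTES_PER_ROUTE = 100          # assumption: prefix + attributes on the wire
LINK_1G = 1_000_000_000        # bits/s, the existing heartbeat interface
OBSERVED = 141_000_000         # bits/s, the peak reported on the switch

payload_bits = ROUTES * BYTES_PER_ROUTE * 8   # 320 million bits (~40 MB)
t_1g = payload_bits / LINK_1G                 # seconds at full 1 Gb/s
t_obs = payload_bits / OBSERVED               # seconds at the observed peak

print(f"{t_1g:.2f} s at 1 Gb/s, {t_obs:.2f} s at 141 Mb/s")
```

Even at the observed 141 Mbps peak the transfer completes in a few seconds, which makes link bandwidth an unlikely explanation for hours of high CPU.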
---
My switch says the heartbeat interface maxed out at 141 Mbps. 10 Gb?
---
Can you please share your SR number for the TAC case with me in private?
(P.S. How did you go with the route filtering / summarization?)
---
holy zoinks bat scoob!
That is an impressive number of routes. I'm assuming those aren't all RFC 1918 prefixes?
---
They are all Internet routes. I have been filtering out the RFC 1918 routes since R77.30 (the good old days).
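The RFC 1918 filter described here is done natively in the gateway's routing policy; purely as an illustration of the idea, a minimal sketch with the Python standard library looks like this (the sample prefixes are made up):

```python
# Minimal sketch of an RFC 1918 inbound filter, for illustration only.
import ipaddress

RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(prefix: str) -> bool:
    """True if the prefix falls inside any RFC 1918 block."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(block) for block in RFC1918)

# Hypothetical received prefixes: keep only the public ones.
received = ["10.1.0.0/16", "8.8.8.0/24", "172.20.0.0/16", "203.0.113.0/24"]
accepted = [p for p in received if not is_private(p)]
print(accepted)  # → ['8.8.8.0/24', '203.0.113.0/24']
```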
---
Today's update is that ROUTED crashed on the active cluster member (HA1), and the now-standby member (HA1) was in CUL (Cluster Under Load) non-stop for 4 hours and 20 minutes.
---
Despite this occurrence, I want to come back to your original statement briefly.
"When switching from active to standby the (old active) now standby cluster member goes into down status briefly. ROUTED on the now standby member uses high (CPU 65% one cpu) for over 60 minutes."
What operational problem is this creating for you, and how many cores/CPUs are assigned to the machine?
---
Message me privately if you need more help with this... I have BGP running in my lab on R81.10 and I have not seen these issues at all. Personally, I don't see the logic in why you were asked to remove the graceful restart option; that can only help in a situation like yours.
---
Can you set up your lab with one peer configured as iBGP and then have it send in the full BGP route table? In R80.20, if I restricted the routes from my iBGP peer (default only), the issue went away. I never had any issues with R77.30.
---
I can, but it might take some time, since I gave lab access to a lot of my colleagues, as it's a very good setup. I will try to do it some time this week. In the meantime, feel free to message me privately and we can do a remote session tomorrow if you have time. I'm in the EST time zone (currently GMT-4).
---
I see what you are saying... I tested it in R81.10: same issue. I wonder if it's some kind of bug...
---
Out of interest, what about your internal topology needs the full routing table versus fewer summarized routes?
Perhaps cBit is an alternative to GR that may assist, per sk175923.
---
We need the full routing table or else traffic is not routed correctly. Due to the use of BGP, the same subnets are reachable via many carriers.
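The reason a summarized or default-only table misroutes here is longest-prefix matching: forwarding can only prefer a specific carrier's route if that specific prefix is present. A small sketch (the prefixes and carrier names are illustrative, not from this network):

```python
# Longest-prefix-match demo: why the full table matters when the same
# address space is reachable via multiple carriers. Entries are made up.
import ipaddress

table = {
    "0.0.0.0/0": "carrier-A",        # default route via carrier A
    "12.122.0.0/16": "carrier-B",    # more-specific route learned over iBGP
}

def next_hop(dst: str) -> str:
    """Return the carrier for the longest matching prefix."""
    ip = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in table
               if ip in ipaddress.ip_network(p)]
    best = max(matches, key=lambda n: n.prefixlen)   # longest prefix wins
    return table[str(best)]

print(next_hop("12.122.1.1"))   # → carrier-B (needs the specific route)
print(next_hop("8.8.8.8"))      # → carrier-A (falls back to the default)
```

Drop the /16 from the table and everything falls through to the default, which is exactly the misrouting described above.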
---
My ROUTED crashed again, and after over 30 days Check Point R&D has stated: "We are suspect large route update creates a bottle neck, we are working on confirming this possibility." No kidding; it was the same issue in R80.20. How long does it take to compare an R80.20 ticket to an R81.10 ticket?
---
Personally, I doubt that's the issue, just my 2 cents. I have seen people in the past advertise way more routes than you do and never have any problems.
---
Are the routes you have seen used over iBGP or eBGP?
---
The issue is with receiving the iBGP routes, not advertising them.
---
iBGP... as a matter of fact, I saw someone use 900,000+ with no issues. If I were you, I would ask R&D for more details, because unless there is clear-cut proof of what they told you, I can't see that being the real reason for your issue. Of course, needless to say, it would be much better if you advertised 10 rather than 400K routes, but I would definitely inquire further.
---
Is there a way to get in touch with the other person who is receiving over 900,000 routes from an iBGP peer, so we can compare?
I had the same issues in R80.20, and it required a custom ROUTED from Check Point R&D.
---
As I explained before, if you have the SR number for your previous case, you should be able to request that the fix be ported to a newer version where applicable.
---
Already did that.
---
So the issue is different or the fix is still being prepared?
---
R&D is still working on it; the response is: "We are suspect large route update creates a bottle neck, we are working on confirming this possibility."
