the_rock
Legend

Policy push overwrote default route on cluster active gateway

Hey guys,

I really hope someone can shed some light on this. One of our colleagues went into a client's environment (they use Smart-1 Cloud and a 6000 series cluster) and simply added a couple of IP addresses to a block group. Once the policy was applied, we noticed that the active member could not be accessed.

At this point, thankfully, SSH to the backup worked fine, so once we SSH-ed to the active member from the backup, we noticed that the default route was gone. Now, in my 15 years with CP, I had NEVER seen or heard of a problem like this. Keep in mind, failover never happened; however, there was an Internet outage, as the default route was gone. The default route was added back via clish afterwards, and we pushed policy a couple of times after that and it was fine.
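For anyone who wants a quick check after a policy push, here is a minimal sketch that tests whether a default route is still present. It assumes expert mode with the standard Linux `ip` tool available; the function name `has_default_route` is made up for illustration and is not part of any Check Point tooling.

```shell
#!/bin/bash
# Hypothetical helper: reads `ip route` style output on stdin and
# succeeds only if a default route entry is present.
has_default_route() {
    grep -Eq '^(default|0\.0\.0\.0/0)[[:space:]]'
}

# Example use from expert mode on each member (commented out so that
# sourcing this file has no side effects):
# ip -4 route show | has_default_route || echo "default route is MISSING on $(hostname)"
```

Reading the routing table from stdin keeps the check easy to try against saved output as well as against a live gateway.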

Now, just to try and figure this out ourselves, we downloaded audit.log from the /var/log/audit directory, but it was not useful at all, as it does not have any timestamps. We searched it for words such as route, default, and delete, but no luck. We are 99.99% sure that something other than the policy push caused this, but it is really hard to say what at this point.
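One thing that may help with the timestamp problem: if the file follows the standard Linux auditd record format, each record carries an epoch time inside `msg=audit(<epoch>.<ms>:<serial>)`, which is easy to overlook. The sketch below searches for keywords and prefixes each hit with a readable UTC time. It assumes that record format and GNU `date`; the function name `audit_grep` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print lines of an auditd-style log that match a
# pattern, prefixed with a human-readable UTC timestamp.
# Assumes records look like: type=... msg=audit(1700000000.123:456): ...
audit_grep() {
    local pattern="$1" file="$2" line epoch
    grep -Ei -- "$pattern" "$file" | while IFS= read -r line; do
        # Extract the epoch seconds from "audit(<epoch>.<ms>:<serial>)".
        epoch=$(printf '%s\n' "$line" |
            sed -n 's/.*audit(\([0-9]\{1,\}\)\.[0-9]*:[0-9]*).*/\1/p')
        if [ -n "$epoch" ]; then
            # GNU date is assumed here for the "-d @epoch" form.
            printf '%s  %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "$line"
        else
            printf '%s  %s\n' "(no timestamp)" "$line"
        fi
    done
}

# Example (commented out so that sourcing this file does nothing):
# audit_grep 'route|default|delete' /var/log/audit/audit.log
```

With readable times, the hits can be lined up against the policy installation time to see whether anything touched the routing table around then.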

I also checked the /var/log/messages files, but no luck there either. Nobody was even logged into the firewalls before this issue happened, so it raises the question of HOW this happened.

We ended up opening a TAC case for it, but after a Zoom meeting, the engineer told us he would consult further internally and see what else can be done to find the reason.

If anyone else has an idea or any other file(s) we could check, it would be greatly appreciated!

Thanks as always.

38 Replies
the_rock
Legend

Thanks man! But for now, should we leave the # exit 1 line in the cpisp_update file commented out? Because as you know, that line disabled the ISPR script, so technically, if the client's primary ISP link failed, I don't believe the 2nd one would take effect at all. Correct?

Ilya_Yusupov
Employee

Yes, please leave it as # exit 1; otherwise the failover will indeed not work.

Since the issue is a race condition that happened twice in 2 weeks, and you already know how to overcome it, I guess we can leave it in that state. I hope we can provide you with a newer script soon, unless you think it's a huge issue and we can't run like that? If so, I suggest applying your workaround and changing the script like you did in the meantime.

Thanks,

Ilya 

the_rock
Legend

OK, deal. For anyone out there: IF you run ISPR and you happen to have this issue, here is what worked for me in the lab:

Modify the $FWDIR/bin/cpisp_update script (the ISPR script file).

Change ANY line that shows clish -c "set static-route default off" to say on instead, as per the attached screen cap.

Screenshot_1.png

This works 100%. I tested it in 2 labs, and ISP redundancy WILL bring up the 2nd link if there is a failover. There are 2 lines you need to change, that's it.

Hope this helps others if they ever encounter this problem.

Disclaimer: This is more of a WORKAROUND, rather than true solution.
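The edit described above could be scripted roughly as follows. This is only a sketch of the workaround, not an official fix: the function name `flip_default_route_off_to_on` is made up, GNU `sed -i` is assumed, and it should be tried on a lab copy of the script first. It takes a timestamped backup before changing anything.

```shell
#!/bin/bash
# Hypothetical helper for the workaround: in the given script file,
# change every 'set static-route default off' to '... on'.
flip_default_route_off_to_on() {
    local script="$1"
    [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }
    # Keep a timestamped backup next to the original before editing.
    cp -p "$script" "$script.bak.$(date +%Y%m%d%H%M%S)" || return 1
    # Only the exact clish command text is rewritten (GNU sed assumed).
    sed -i 's/set static-route default off/set static-route default on/g' "$script"
    # Print the affected lines so the result can be reviewed.
    grep -n 'set static-route default' "$script"
}

# Example, in expert mode (commented out so nothing runs on sourcing):
# flip_default_route_off_to_on "$FWDIR/bin/cpisp_update"
```

The printed lines at the end should show the 2 changed entries mentioned in the post; if the count differs, restore the backup and compare against the screen cap.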

Andy

BikeMan
Contributor

Is the fallback also working?

the_rock
Legend

Yup!

the_rock
Legend

Well, you can correspond with the TAC engineer helping us, and if there is a newer script, we will be happy to use it. In the meantime, I will email the client and let them know about these changes, so we can hopefully test it this week.

BikeMan
Contributor

Well, seems to be resolved by sk176424

the_rock
Legend

Definitely NOT :). I mean, OK, let's be fair and honest about it and say part of it is true, as I tested the updated script and it does work fine, BUT the jumbo hotfix part is wrong, because the customer is on R81.10 jumbo 78, so way higher than 38.

the_rock
Legend

@Ilya_Yusupov provided me with an updated cpisp_update script for the $FWDIR/bin directory, and it worked fine in my lab, so that is the solution!

Cheers and thanks again Ilya for all your efforts, truly grateful!! 🙌
