Ryan_Ryan
Advisor

local probing

Hi all,

 

After upgrading to R81.10, all our clusters are showing:

Local Probing PNOTE ON

then within the same second we get:

Local Probing PNOTE OFF

It's happening across multiple interfaces and on all our R81.10 clusters (about 12 times per day per cluster). We never saw that error on R80.20.

 

The cluster members are split across multiple datacenters. If my understanding is correct, the default CCP timer is 10ms, and doing a ping I can see round-trip times getting up to 8ms, so I suspect we are exceeding 10ms at times.
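To put numbers on that suspicion, one quick check is to ping the peer member over the sync/CCP network and look at the maximum rather than the average; a sketch below, where the peer address is a placeholder for your environment.

```shell
# Measure round-trip time to the peer cluster member over the sync network.
# 192.0.2.2 is a placeholder for the peer's sync interface IP.
# The max/mdev values in the summary matter more than the average:
# occasional spikes past the CCP timer are what would trigger probing.
ping -c 100 -i 0.2 192.0.2.2
```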

 

Although we never needed to tweak the value on the old version, I think this could have been resolved by changing fwha_timer_cpha_res from 1 to 2 (meaning a 20ms timeout); however, this is no longer changeable in R81.10. I tried changing fwha_timer_sync_res but it didn't help the issue.
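For reference, reading the parameter and attempting a runtime change uses the standard `fw ctl` kernel-parameter commands; whether R81.10 still honours a set on this particular parameter is exactly what's in question above, so treat this as a sketch.

```shell
# Read the current CCP timer resolution (1 = 10ms base tick).
fw ctl get int fwha_timer_cpha_res

# Runtime change attempt (not persistent across reboot);
# per the post above, R81.10 no longer accepts this for fwha_timer_cpha_res.
fw ctl set int fwha_timer_cpha_res 2
```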

 

Any ideas on what we can change?

 

7 Replies
Chris_Atkinson
Employee

Are you able to share some more about the clusters? e.g.

* Are they physical appliances or VMs?

* Which JHF takes are used/installed?

* Are they all running with Unicast CCP?

CCSM R77/R80/ELITE
Ryan_Ryan
Advisor

They are all VMs running Take 45. They all have CCP mode: Manual (Unicast), and VMAC is enabled on them all too.

 

thanks

Chris_Atkinson
Employee

Thanks - VMware/ESX or Hyper-V, and with multiple vCPUs assigned to each instance, not single?

 

Can you provide the output of the following:

fw ctl get int fwha_dead_timeout_multiplier
fw ctl get int fwha_if_problem_tolerance

 

 

CCSM R77/R80/ELITE
Ryan_Ryan
Advisor

It's VMware; we have at least 2 vCPUs in each instance. Dynamic Dispatcher is enabled.

 

fwha_dead_timeout_multiplier = 3

fwha_if_problem_tolerance = 0

 

 

_Val_
Admin

Did you look into sk93454 yet?

Ryan_Ryan
Advisor

I hadn't found that, no. It does look relevant, but I can confirm I don't see "cluster_info" events, and the cluster's routed info history log doesn't show any ClusterXL events either. I believe we are only losing 1 or 2 of the 3 CCP packets, so the interface never actually gets marked as dead; probing starts, we get a response, and it's back to normal.
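For anyone watching this live, the standard ClusterXL views for pnote and interface state are the usual place to look; a sketch with the common `cphaprob` calls:

```shell
# List registered pnotes and their state; "Local Probing" should read OK
# outside the brief ON/OFF flaps described above.
cphaprob -l list

# Show per-interface cluster state and the configured CCP mode.
cphaprob -a if
```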

 

 

Yair_Shahar
Employee

Hi,

 

You can try following sk171844.

 

Yair

