
ROUTED with strange errors on some standby VSes causing cluster failover

This started completely out of the blue one day (I suspect a problem with an outside switch or bond).

Now two of the four standby VSes become active for less than a second and send gratuitous ARP.

There are no PNOTES and cphaprob looks OK (the transition is too short to catch on screen).

We are running R80.10, Take 203.

2019-07-05_16-11-55.jpg

 

The ROUTED debug/trace output reports this odd error:

api_get_member_info(155): failed to get memberinfo for member(x) - probably member does not exists.

Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(3745): Entering...
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(): my instance id is 4 and runtime instance id from cp_get_current_ctx_id() is 4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(3839): no change in cluster-ip for wrp257
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(3839): no change in cluster-ip for wrp256
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(): cxl_get_cluster_ip_from_member_if_name() failed, rval = -2 for lo4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses_v6(3598): cxl_get_cluster_ip_from_member_if_name_ctx_v6() failed, rval = -1 for wrp257 for instance 4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses_v6(3598): cxl_get_cluster_ip_from_member_if_name_ctx_v6() failed, rval = -1 for wrp256 for instance 4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses_v6(3598): cxl_get_cluster_ip_from_member_if_name_ctx_v6() failed, rval = -1 for lo4 for instance 4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(4017): Cluster interface is up
Jul 5 13:25:32.337386 cpcl_evt_slave_slave
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(2) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(3) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(4) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(5) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(6) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(7) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(8) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(9) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(10) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(11) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(12) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(13) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(14) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(15) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(16) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(17) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(18) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(19) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(20) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(21) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(22) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(23) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(24) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(25) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(26) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(27) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(28) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(29) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(30) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(31) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info: failed to get memberinfo. selector = 837957d, data = 0
Jul 5 13:25:32.337386 cpcl_clear_recv_queue(1660): entering
Jul 5 13:25:32.337386 entering cpcl_slave_init()
Jul 5 13:25:32.337386 cpcl_slave_init(4326): instance 4 dynamic routing not enabled therefore not connecting to master
Jul 5 13:25:32.337386 KRT_CLUSTER: cluster transition <1> <0> <0>
Jul 5 13:25:32.337386 cpcl_request_initial_state(6415): Entering...
Jul 5 13:25:32.337386 cpcl_request_initial_state(6420): slave task is NULL, returning
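
The run of member(N) failures drowns out the rest of the trace, so it can help to fold it into a single summary line before reading further. Below is a minimal Python sketch, assuming the trace has been saved as plain text in the format pasted above. The function name `collapse_member_noise` and the summary format are my own and not part of any Check Point tooling.

```python
import re

# Matches lines such as:
#   api_get_member_info(155): failed to get memberinfo for member(7) - ...
MEMBER_RE = re.compile(
    r"api_get_member_info\(\d+\): failed to get memberinfo for member\((\d+)\)"
)


def collapse_member_noise(lines):
    """Fold each consecutive run of member(N) failures into one summary line."""
    out = []
    run = []  # member ids seen in the current consecutive run

    def flush():
        # Emit a summary for the pending run, if any, then reset it.
        if run:
            out.append(
                f"[{len(run)} x api_get_member_info failures, "
                f"members {min(run)}-{max(run)}]"
            )
            run.clear()

    for line in lines:
        match = MEMBER_RE.search(line)
        if match:
            run.append(int(match.group(1)))
            continue
        flush()
        out.append(line.rstrip("\n"))
    flush()
    return out
```

Passing the lines of a saved trace file to `collapse_member_noise` returns the same trace with the member(2) through member(31) block reduced to one line, while the final `api_get_member_info: failed to get memberinfo. selector = ...` line is left untouched because it does not name a member.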

 

Any idea how the api_get_member_info call is made? Knowing that would probably help resolve it.

The node was fully rebooted, to no avail. If I force these two VSes (using VSLS) to be active on VSX2, they run without any errors.


Re: ROUTED with strange errors on some standby VSes causing cluster failover

If you only have two cluster members, I’m thinking the “failed to get memberinfo for member(X)” messages are a red herring.
We still need to understand why the VSes are becoming active when they shouldn't. I recommend engaging TAC.
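
While waiting on TAC, it may be worth pulling the `KRT_CLUSTER: cluster transition` lines out of the trace with their timestamps, so they can be compared against the times the gratuitous ARPs were seen. The following is a rough Python sketch that assumes the line format shown in the pasted trace. The function name `find_transitions` is hypothetical, and the three bracketed values are returned as raw text because their meaning is not stated in this thread.

```python
import re

# Matches lines such as:
#   Jul 5 13:25:32.337386 KRT_CLUSTER: cluster transition <1> <0> <0>
TRANSITION_RE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:.]+)\s+KRT_CLUSTER: cluster transition\s+(.*)$"
)


def find_transitions(lines):
    """Return (timestamp, raw_fields) for every cluster transition line."""
    hits = []
    for line in lines:
        match = TRANSITION_RE.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits
```

Running this over traces from both members gives a short list of transition times per VS, which is usually easier to line up with switch or capture logs than the full debug output.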

Re: ROUTED with strange errors on some standby VSes causing cluster failover

Yep, fully agree. I will open a TAC case after the holidays. Time for summer break!