Kaspars_Zibarts
MVP Gold CHKP

ROUTED with strange errors on some standby VSes causing cluster failover

This started completely out of the blue one day (the suspect is a problem with an outside switch/bond).

But now two out of four standby VSes become active for less than a second (sending gratuitous ARPs).

There are no PNOTES, and cphaprob looks all OK (the event is too short to catch on screen).

We are running R80.10 Take 203.

[Screenshot: 2019-07-05_16-11-55.jpg]

The ROUTED debug/trace reports this weird error:

api_get_member_info(155): failed to get memberinfo for member(x) - probably member does not exists.

Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(3745): Entering...
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(): my instance id is 4 and runtime instance id from cp_get_current_ctx_id() is 4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(3839): no change in cluster-ip for wrp257
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(3839): no change in cluster-ip for wrp256
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(): cxl_get_cluster_ip_from_member_if_name() failed, rval = -2 for lo4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses_v6(3598): cxl_get_cluster_ip_from_member_if_name_ctx_v6() failed, rval = -1 for wrp257 for instance 4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses_v6(3598): cxl_get_cluster_ip_from_member_if_name_ctx_v6() failed, rval = -1 for wrp256 for instance 4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses_v6(3598): cxl_get_cluster_ip_from_member_if_name_ctx_v6() failed, rval = -1 for lo4 for instance 4
Jul 5 13:25:32.337386 cpcl_add_cxl_addresses(4017): Cluster interface is up
Jul 5 13:25:32.337386 cpcl_evt_slave_slave
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(2) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(3) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(4) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(5) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(6) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(7) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(8) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(9) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(10) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(11) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(12) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(13) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(14) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(15) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(16) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(17) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(18) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(19) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(20) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(21) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(22) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(23) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(24) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(25) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(26) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(27) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(28) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(29) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(30) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo for member(31) - probably member does not exists.
Jul 5 13:25:32.337386 api_get_member_info: failed to get memberinfo. selector = 837957d, data = 0
Jul 5 13:25:32.337386 cpcl_clear_recv_queue(1660): entering
Jul 5 13:25:32.337386 entering cpcl_slave_init()
Jul 5 13:25:32.337386 cpcl_slave_init(4326): instance 4 dynamic routing not enabled therefore not connecting to master
Jul 5 13:25:32.337386 KRT_CLUSTER: cluster transition <1> <0> <0>
Jul 5 13:25:32.337386 cpcl_request_initial_state(6415): Entering...
Jul 5 13:25:32.337386 cpcl_request_initial_state(6420): slave task is NULL, returning
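To see at a glance which member IDs the trace complains about, a small filter over the saved trace can pull them out. This is a minimal sketch, assuming the trace has been copied into a text file or string; the regular expression is written against the line format visible in this trace only and is not a documented Check Point log format. The helper names `failed_member_ids` and `summarize` are hypothetical. A contiguous run such as 2..31 would fit the idea that ROUTED walks a fixed table of member slots in which only the real cluster members exist, but that is an interpretation, not confirmed behaviour.

```python
import re

# Matches the per-member failure lines seen in this ROUTED trace (format assumed from the sample above).
# Lines such as "member(x)" or the "selector = ..." summary line do not match.
FAIL_RE = re.compile(
    r"api_get_member_info\(\d+\): failed to get memberinfo for member\((\d+)\)"
)


def failed_member_ids(trace: str) -> list[int]:
    """Return the member IDs api_get_member_info failed for, in log order."""
    return [int(m.group(1)) for m in FAIL_RE.finditer(trace)]


def summarize(ids: list[int]) -> str:
    """Describe the failing IDs and whether they form one unbroken range."""
    if not ids:
        return "no api_get_member_info failures found"
    lo, hi = min(ids), max(ids)
    contiguous = sorted(set(ids)) == list(range(lo, hi + 1))
    return f"{len(ids)} failures for member IDs {lo}..{hi} (contiguous: {contiguous})"


if __name__ == "__main__":
    # Rebuild the failure lines from the trace above (member 2 through 31).
    sample = "\n".join(
        "Jul 5 13:25:32.337386 api_get_member_info(155): failed to get memberinfo "
        f"for member({i}) - probably member does not exists."
        for i in range(2, 32)
    )
    print(summarize(failed_member_ids(sample)))
```

Run against the lines in this trace, it should report 30 failures for IDs 2..31 with no gaps.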

 

Any ideas how the api_get_member_info call is made? That would probably resolve it.

The node was fully rebooted, to no avail. If I force these two VSes (using VSLS) to be active on VSX2, they run without any errors.

PhoneBoy
Admin

If you only have two cluster members, I’m thinking the “failed to get memberinfo for member(X)” messages are a red herring.
We still need to understand why the VSes are becoming active when they shouldn't; I recommend engaging the TAC.
Kaspars_Zibarts
MVP Gold CHKP

Yep, fully agree. I will open a TAC case after the holidays. Time for summer break!