Issue with Azure platform communication (168.63.129.16)

RickyDan · 2022-07-28

I am having an issue communicating with the Azure platform IP 168.63.129.16. When I run fw unloadlocal, the errors on the console stop and the Azure Portal no longer reports that the Azure agent on the VM is not working. However, I tried the following in my firewall policy, and none of it solved the issue:

Allow HTTP, DNS and TCP/32526 from the cluster object to 168.63.129.16.
Bypass HTTPS inspection (this shouldn't have an effect, since communication with the Azure platform is HTTP).
Apply a new policy package with no Threat Prevention.
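
One way to check whether rules like the ones above actually let traffic through is to open a TCP connection to the platform address from the affected host. The following is a minimal sketch, assuming a Python 3 interpreter is available there; it is not a Check Point or Azure tool. The names `probe_tcp` and `probe_all` and the 3-second timeout are illustrative assumptions, and the ports are the two TCP ports named in the rules above. DNS over UDP is not covered.

```python
import socket

AZURE_PLATFORM_IP = "168.63.129.16"  # address from the post
PORTS = (80, 32526)                  # HTTP and TCP/32526, as in the rules above


def probe_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable routes and timeouts.
        return False


def probe_all(host, ports, timeout=3.0):
    """Probe each port once and return a {port: reachable} mapping."""
    return {port: probe_tcp(host, port, timeout) for port in ports}
```

Calling `probe_all(AZURE_PLATFORM_IP, PORTS)` with the policy loaded and again after fw unloadlocal would show whether the TCP handshake itself is what the policy is blocking.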

Edit: The default routes on the gateway VMs have not been changed.

Some errors seen on the command line:

2022/07/28 12:50:45.964617 WARNING ExtHandler [PERIODIC] [IMDS_CONNECTION_ERROR] Unable to connect to IMDS endpoint 169.254.169.254
2022/07/28 12:52:17.057629 WARNING ExtHandler [PERIODIC] [IMDS_CONNECTION_ERROR] Unable to connect to IMDS endpoint 168.63.129.16
2022/07/28 12:53:48.151508 WARNING ExtHandler HealthService: could not report observations: [HttpError] [HTTP Failed] POST http://168.63.129.16:80/HealthService -- IOError timed out -- 6 attempts made
2022/07/28 12:54:19.489577 ERROR ExtHandler ProtocolError processing goal state, giving up [[ProtocolError] [Wireserver Exception] [HttpError] [HTTP Failed] GET http://168.63.129.16/machine/?comp=goalstate -- IOError timed out -- 6 attempts made]
2022/07/28 12:54:19.491258 WARNING ExtHandler Exception retrieving extension handlers: [ProtocolError] Exceeded max retry updating goal state
2022/07/28 12:54:19.492048 ERROR ExtHandler Event: name=WALinuxAgent, op=ExtensionProcessing, message=Exception retrieving extension handlers: [ProtocolError] Exceeded max retry updating goal state [<FrameSummary file /usr/lib/waagent/azurelinuxagent/ga/exthandlers.py, line 230 in run>, <FrameSummary file /usr/lib/waagent/azurelinuxagent/common/protocol/wire.py, line 153 in get_ext_handlers>, <FrameSummary file /usr/lib/waagent/azurelinuxagent/common/protocol/wire.py, line 112 in update_goal_state>, <FrameSummary file /usr/lib/waagent/azurelinuxagent/common/protocol/wire.py, line 837 in update_goal_state>], duration=0
2022/07/28 12:56:19.317619 ERROR ExtHandler [ProtocolError] [Wireserver Exception] [HttpError] [HTTP Failed] POST http://168.63.129.16/machine?comp=telemetrydata -- IOError timed out -- 6 attempts made
2022/07/28 13:01:02.607474 WARNING ExtHandler HealthService: could not report observations: [HttpError] [HTTP Failed] POST http://168.63.129.16:80/HealthService -- IOError timed out -- 6 attempts made
2022/07/28 13:03:33.775584 ERROR ExtHandler [ProtocolError] [Wireserver Exception] [HttpError] [HTTP Failed] POST http://168.63.129.16/machine?comp=telemetrydata -- IOError timed out -- 6 attempts made
2022/07/28 13:03:52.867623 ERROR ExtHandler ProtocolError processing goal state, giving up [[ProtocolError] [Wireserver Exception] [HttpError] [HTTP Failed] GET http://168.63.129.16/machine/?comp=goalstate -- IOError timed out -- 6 attempts made]
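
To see at a glance which endpoints are timing out, log lines like the ones above can be grouped by target. This is a rough sketch that assumes the line layout shown in the excerpt (date, time, level, `ExtHandler`, message). The function name `summarize` and the regular expressions are my own and are not part of the agent.

```python
import re
from collections import Counter

# "<date> <time> <LEVEL> ExtHandler <message>", as in the excerpt above.
LINE_RE = re.compile(r"^\S+ \S+ (WARNING|ERROR) ExtHandler (.*)$")
# An HTTP method followed by a URL, e.g. "POST http://168.63.129.16:80/HealthService".
REQUEST_RE = re.compile(r"\b(GET|POST) (http://\S+)")
# The IMDS warnings name only an address, not a URL.
IMDS_RE = re.compile(r"IMDS endpoint (\S+)")


def summarize(lines):
    """Count failures per (level, target) pair; lines naming no target are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        level, message = m.groups()
        req = REQUEST_RE.search(message)
        imds = IMDS_RE.search(message)
        if req:
            target = f"{req.group(1)} {req.group(2)}"
        elif imds:
            target = f"IMDS {imds.group(1)}"
        else:
            continue  # e.g. the "Exceeded max retry" follow-up lines
        counts[(level, target)] += 1
    return counts
```

`summarize` accepts any iterable of lines, such as an open log file, and `summarize(f).most_common()` lists the most frequently failing targets first.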

Chris_Atkinson · 2022-07-28

What do the drop logs look like? Have you tried Any as the source?

Do you have the fwkern.conf entries per sk171584?

CCSM R77/R80/ELITE

RickyDan · 2022-07-28

I confirmed the file is configured correctly. It works when I unload the firewall policy, so I believe something broke when I turned on blades and configured the security policy.

The output of fw ctl zdebug -m cluster cloud shows that only the active member is replying to health probes, and it is replying only on eth1, not on eth0.

Also, how do I check the active/passive state on CloudGuard, since cphaprob stat is not relevant?

The built-in static route to 168.63.129.16 is via eth0, though.
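
To double-check which interface the kernel would actually pick for 168.63.129.16, the output of `ip route get` can be parsed for the `dev` field. This is only a sketch, assuming the Linux `ip` utility and Python 3 are available on the gateway. The function names `egress_interface` and `lookup` are illustrative.

```python
import re
import subprocess


def egress_interface(route_output):
    """Return the interface named after 'dev' in `ip route get` output, or None."""
    m = re.search(r"\bdev (\S+)", route_output)
    return m.group(1) if m else None


def lookup(ip="168.63.129.16"):
    """Ask the kernel which route it would use for ip and return the egress interface."""
    result = subprocess.run(
        ["ip", "route", "get", ip],
        capture_output=True, text=True, check=True,
    )
    return egress_interface(result.stdout)
```

If `lookup()` returns eth0 while the probe replies are only seen on eth1, that mismatch would be worth including when posting debug output.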
