
Azure NIC issues - possibly waagent related

Hi all,

I have noticed recurring issues with our Check Point R80.20 cluster in Azure and was wondering whether anyone else has seen this behavior.

Basically, the interfaces related to Azure Accelerated Networking unregister and may come back up under a different name, which breaks traffic completely.

Although this was supposed to be fixed by Jumbo HF Take 17, it occurred again.

I believe it may be related to the outdated, buggy version of the Microsoft Azure Linux Agent (waagent) installed on the VM, v2.2.11 (the latest available version is v2.2.42).

Now waiting for my SR to be picked up...

Two other issues with the agent that are resolved in newer versions:

- The agent's logs fill up the Azure Serial Console, making it unusable

- The agent does not use the configured proxy server

Entries in /var/log/messages:

 kernel: kernel: hv_netvsc 000d3a25-c27e-000d-3a25-c27e000d3a25 eth0: Data path switched from VF: enP1p0s2

 kernel: kernel: hv_netvsc 000d3a25-c27e-000d-3a25-c27e000d3a25 eth0: VF unregistering: enP1p0s2

 kernel: kernel: [SIM4];cphwd_api_forward_packet: sim_mgr_prepare_packet failed

 kernel: kernel: [SIM4];simlinux_br_port: dev == NULL !!!!!
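For anyone who wants to check their own gateways for the same pattern, here is a rough Python sketch that scans log lines for the hv_netvsc VF events quoted above and flags a synthetic NIC whose VF showed up under more than one name. The function names are made up for illustration, and the "switched to VF" and "VF registering" variants are an assumption about what the driver logs when the VF comes back; only the two "from VF" and "unregistering" forms appear in my logs.

```python
import re

# Pattern for the hv_netvsc lines quoted above. The "to VF" and
# "VF registering" alternatives are assumptions, not taken from my logs.
VF_EVENT = re.compile(
    r"hv_netvsc (?P<device>\S+) (?P<synthetic>\S+): "
    r"(?P<event>Data path switched (?:to|from) VF|VF (?:un)?registering): "
    r"(?P<vf>\S+)"
)


def find_vf_events(lines):
    """Return (synthetic NIC, event, VF name) for every matching log line."""
    events = []
    for line in lines:
        match = VF_EVENT.search(line)
        if match:
            events.append((match["synthetic"], match["event"], match["vf"]))
    return events


def vf_names_changed(events):
    """Return synthetic NICs whose VF appeared under more than one name."""
    seen = {}
    for synthetic, _event, vf in events:
        seen.setdefault(synthetic, set()).add(vf)
    return {nic: sorted(names) for nic, names in seen.items() if len(names) > 1}


def scan_log(path="/var/log/messages"):
    """Read a log file and return the VF events found in it."""
    with open(path, errors="replace") as log:
        return find_vf_events(log)
```

Calling `vf_names_changed(scan_log())` on the gateway would return an empty dict if the VF always kept the same name, and a mapping such as `eth0` to a list of names if it did not.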



5 Replies

Re: Azure NIC issues - possibly waagent related

From what I can tell, you will need to get a hotfix from TAC for this.

Re: Azure NIC issues - possibly waagent related



The issues you describe are similar to known issues related to Azure maintenance operations, which are currently under investigation. sk160992 contains additional information. Note that they are not related to the version of the Linux agent that we deploy.

Our latest available version in Azure, R80.30, contains the necessary fixes for the issue, and we recommend upgrading. If upgrading is not possible, please contact TAC.






Re: Azure NIC issues - possibly waagent related

Hi Dimitri,
thanks for your feedback. The issues do look similar, but the kernel messages differ from those in sk160992.
After the first incident we installed the suggested Jumbo HF from sk146212 to address the GAIA-5479 issue described below:

“Azure maintenance operations on the Azure Hosts can cause the NIC driver to be reloaded. Our SW did not correctly handle all the use cases and configurations in the event of a reload operation when the gateway VM is in "started" state in Azure. This fix (introduced in Take_17) fixes this issue and makes sure that even if the driver is reloaded during regular operation, the NIC and the Security Gateway will be configured correctly.”

However, the issue recurred, and on both occasions Azure support confirmed that there were no Azure maintenance operations, no issues and no changes on the hosting servers.

As for the outdated waagent, I do think it might be relevant and should be updated, considering this agent is responsible for “Linux provisioning and VM interaction with the Azure Fabric Controller. Ensures the stability of the network interface name”

Even the latest R80.30 image includes the old agent version with existing bugs.

Re: Azure NIC issues - possibly waagent related

There were actually two separate issues. The first was resolved in JHF Take 17. The second is related to a component of our operating system and is fixed in R80.30; according to the output you've sent, this is the one you are hitting. For R80.20, it is possible to obtain a hotfix for the second issue by contacting TAC (and it will be included in a future JHF take).

This specific issue does not seem related to WALinuxAgent, but we do have plans to update it to a newer version in the future.


Please let me know if the issue is resolved by upgrade/hotfix deployment.




Re: Azure NIC issues - possibly waagent related

Thanks again Dimtry, this is good news. I searched through the release notes, but nothing similar came up.
I did open a new SR for this problem last week, but it is still pending...