Microsoft Azure’s Virtual Machine service today suffered an outage lasting more than six hours. Between 05:12 UTC and 11:45 UTC on 13 Oct 2021, a subset of Azure customers using Windows Virtual Machines saw failures when performing service management operations such as start, create, update, and delete. Deployments of new VMs and updates also failed. Linux-based VMs and already-running Windows VMs were not impacted. Microsoft published the following root cause for the outage:
We identified that calls made during service management operations were failing as a required artifact version data could not be queried. Our investigation focused on the backend compute resource provider (CRP) to determine why the calls were failing, and identified that a required VMGuestAgent could not be queried from the repository. The VM Guest Agent Extension publishing architecture was being migrated (as part of a migration of legacy service management backend systems) to a new platform which leverages the latest Azure Resource Manager (ARM) capabilities.