Azure Storage service outage affected multiple Azure services last night

Microsoft's Azure Storage service experienced an outage for several hours last night. Along with Azure Storage, dependent services such as Azure Search, Azure Service Bus, Azure EventHub and Azure Stream Analytics also went down for many customers. Microsoft quickly found the root cause of the issue and started the mitigation process. The service is now up and running without any issues. Microsoft will also provide a detailed Root Cause Analysis in approximately 72 hours. Read the summary of the impact below.

From 22:42 on 15 Mar to 00:00 UTC on 16 Mar 2017, due to an underlying storage incident, other Azure services that leverage Storage may have experienced service management issues. This incident was limited to service management operations, and existing Storage resources were not impacted.

Virtual Machines or Cloud Services customers may have experienced failures when attempting to provision resources. Storage customers would have been unable to provision new Storage resources or perform service management operations on existing resources. Azure Search customers may have been unable to create, scale, or delete services. Azure Monitor customers may have been unable to turn on diagnostic settings for resources. Azure Site Recovery customers may have experienced replication failures. API Management service activation in South India may have experienced a failure. Azure Batch customers will have been unable to provision new resources; during this time, all existing Azure Batch pools would have scheduled tasks as normal. EventHub customers using a service called ‘Archive’ may have experienced failures. Customers using Visual Studio Team Services Build will have experienced failures. Azure Portal may have been unable to access storage account management operations and would have been unable to deploy new accounts.

Microsoft has cited a software error as the potential root cause of this outage. It also noted that engineers observed a success rate of around 50% during the impacted window. Most customers would have succeeded upon retrying.
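To illustrate why retries would have helped at a success rate of roughly 50%, here is a minimal Python sketch. It assumes each attempt succeeds independently with the same probability, which is a simplification that Microsoft's summary does not confirm. The names `success_probability` and `retry` are hypothetical helpers, not part of any Azure SDK, and `operation` stands in for whatever service management call a customer was making.

```python
import random
import time


def success_probability(per_attempt_rate, attempts):
    """Chance that at least one of `attempts` tries succeeds.

    Assumes attempts are independent, which may not hold in a real outage.
    """
    return 1.0 - (1.0 - per_attempt_rate) ** attempts


def retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it returns or `max_attempts` is reached.

    `operation` is a hypothetical zero-argument callable that raises an
    exception on failure. Waits use exponential backoff with random jitter.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


print(success_probability(0.5, 1))  # 0.5
print(success_probability(0.5, 3))  # 0.875
print(success_probability(0.5, 5))  # 0.96875
```

Under that independence assumption, three attempts would succeed about 88% of the time and five attempts about 97% of the time, which is consistent with the statement that most customers would have succeeded upon retrying. The backoff delay is a common courtesy toward a struggling service rather than something the incident summary prescribes.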
