On September 8, Microsoft enterprise services such as Office 365 and Windows Live services such as Hotmail and SkyDrive suffered an outage that affected many users. Service was restored to normal in less than 4 hours. Microsoft then said that the interruption was caused by DNS problems and that it was investigating the root cause. Today the company explained the reason behind the outage:
We determined the cause to be a corrupted file in Microsoft’s DNS service. The file corruption was a result of two rare conditions occurring at the same time. The first condition is related to how the load balancing devices in the DNS service respond to a malformed input string (i.e., the software was unable to parse an incorrectly constructed line in the configuration file). The second condition was related to how the configuration is synchronized across the DNS service to ensure all client requests return the same response regardless of the connection location of the client. Each of these conditions was tracked to the networking device firmware used in the Microsoft DNS service.
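The first condition, software failing on an incorrectly constructed line, is the kind of problem that validating a configuration file before distributing it can catch. Microsoft has not published the file format or its tooling, so the Python below is only a rough sketch: the `name ttl address` line format and the `parse_line` and `validate_config` functions are hypothetical and made up for illustration.

```python
import ipaddress


def parse_line(line):
    """Parse one hypothetical 'name ttl address' record.

    Raises ValueError when the line is malformed, instead of
    passing bad data along.
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    name, ttl, address = fields
    if not (ttl.isascii() and ttl.isdigit()):
        raise ValueError(f"TTL is not a non-negative integer: {ttl!r}")
    ipaddress.ip_address(address)  # raises ValueError on a bad address
    return name, int(ttl), address


def validate_config(text):
    """Return (records, errors) for a whole configuration file.

    Blank lines and lines starting with '#' are skipped.
    Each error is a (line_number, message) pair.
    """
    records, errors = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        try:
            records.append(parse_line(stripped))
        except ValueError as exc:
            errors.append((number, str(exc)))
    return records, errors
```

In a setup like this, a deployment step would push the file out to other servers only when `errors` comes back empty, so a single bad line is rejected up front rather than synchronized everywhere.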
Microsoft is also making service improvements around monitoring, problem identification, and recovery.
Hope we don’t see any such outages in the future.
More details on this can be found here.