Microsoft seems to be having quite a few outages lately. Following last week's Azure AD outage, service disruptions have continued this week. Yesterday, October 6, a service outage related to Azure Front Door had a ripple effect on Azure DevOps. Today, October 7, an outage related to an internal Azure WAN (Wide Area Network) caused traffic to “route non-optimally” and also appears to have affected Azure DevOps. There was also a period when the Azure status history page was down, making it impossible to check Azure service health.
Based on the history of the recent Azure DevOps outages, these Azure service disruptions affected customers in the United States and Europe. However, on the Azure status history page, these recent Azure issues are listed under the Region category of “Global”. It’s not clear how many customers experienced issues from these Azure service disruptions, as Microsoft hasn’t published that information.
Azure Outages for October 6 – 7, 2020
Here are the details that Microsoft has published on these Azure service disruptions:
Issues accessing Microsoft or Azure services – Mitigated (Tracking ID 8TY8-HT0)
Summary of Impact: Between 18:20 UTC and approximately 21:45 UTC on 07 Oct 2020, a subset of customers may have experienced issues connecting to resources that leverage Azure Network infrastructure across regions. Resources with local dependencies in the same region should not have been impacted.
Preliminary Root Cause: A change was made to an internal service that controls routing across the Azure Wide Area Network (WAN). A bug in the new version of the service caused traffic to route non-optimally across the WAN, causing network congestion and packet loss.
Mitigation: At 18:42 UTC the service self-recovered and all packet loss was resolved. To ensure the bug does not repeat, the change was rolled back at 19:30 UTC. By approximately 21:45 UTC, all Azure services reported full recovery.
Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences, and a full Post Incident Report (PIR) will be published within the next 72 hours. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.
Azure Front Door – Mitigated (Tracking ID 8KND-JP8)
Summary of Impact: Between 17:00 UTC and 21:19 UTC on 6 Oct 2020, a subset of customers may have experienced traffic routing to unhealthy backends.
Preliminary Root Cause: A configuration change was deployed causing the incorrect routing of traffic to unhealthy backends.
Mitigation: We reverted the recent change to a previous healthy configuration.
Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences.
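To put the two impact windows side by side, here is a minimal sketch that computes how long each incident lasted from the UTC times in the summaries above. The dictionary layout and the `impact_minutes` helper are illustrative names, not anything Microsoft publishes, and the 21:45 end time for the WAN incident is only approximate per Microsoft's own wording.

```python
from datetime import datetime, timezone

# Impact windows copied from the Microsoft summaries above (all times UTC).
# The 21:45 end time for 8TY8-HT0 is stated as approximate in the source.
INCIDENTS = {
    "8TY8-HT0": ("2020-10-07 18:20", "2020-10-07 21:45"),  # Azure WAN routing
    "8KND-JP8": ("2020-10-06 17:00", "2020-10-06 21:19"),  # Azure Front Door
}

def impact_minutes(start: str, end: str) -> int:
    """Return the length of an impact window in whole minutes."""
    fmt = "%Y-%m-%d %H:%M"
    started = datetime.strptime(start, fmt).replace(tzinfo=timezone.utc)
    ended = datetime.strptime(end, fmt).replace(tzinfo=timezone.utc)
    return int((ended - started).total_seconds() // 60)

# Map each tracking ID to its impact duration in minutes.
durations = {tid: impact_minutes(*window) for tid, window in INCIDENTS.items()}

for tid, minutes in durations.items():
    print(f"{tid}: {minutes // 60}h {minutes % 60}m")
```

By these published times, the Front Door incident had the longer impact window, even though packet loss in the WAN incident was reported resolved at 18:42 UTC, well before all services reported full recovery.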