Unplanned VM unavailability (US1)

Incident Report for Happydance

Postmortem

Incident Overview

On 11 February 2026, between 00:29 and 01:43 GMT (74 minutes), customer websites hosted on US1 (Central US) experienced service disruption following an unexpected virtual machine availability event.

Azure Resource Health reported the virtual machine as unavailable during this period. Although the platform began recovering, the operating system did not start cleanly, which prolonged service instability before normal operation resumed.

No data loss occurred during the incident.

Root Cause

Azure reported an unplanned platform-level availability event affecting the underlying infrastructure hosting the US1 virtual machine.

During the event:

The US1 virtual machine became temporarily unavailable
The VM underwent a platform-level recovery/restart process
The Windows operating system did not initialise cleanly
The server entered an unstable state during startup, delaying full service restoration

Although the Azure platform recovered, the operating system showed signs of instability after the restart. This behaviour is consistent with corruption or inconsistency introduced during an abrupt platform-level interruption.

The operating system and attached data disks were preserved throughout the event. No database or customer data corruption was identified.

Incident Response

The availability issue was detected immediately via monitoring and Azure Resource Health alerts
Engineering confirmed the impact was isolated to workloads hosted on US1
Platform health and VM state were reviewed through Azure diagnostics
Services gradually began restoring as the VM stabilised
Once service stability was verified, the incident was marked as resolved
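
As an illustration only, the "stability verified" step above can be thought of as requiring several consecutive healthy checks before an incident is closed, so that a briefly recovering server is not declared stable too early. The sketch below is not the tooling used during this incident; the function names, thresholds, and URL are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_seconds: float = 5.0) -> bool:
    """Hypothetical probe: True if the URL answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.status == 200


def is_stable(
    probe: Callable[[], bool],
    required_successes: int = 5,
    max_attempts: int = 20,
    interval_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `probe` succeeds `required_successes` times in a row.

    Any failed or raising probe resets the streak, so a service that keeps
    flapping between healthy and unhealthy is never reported as stable.
    """
    streak = 0
    for _ in range(max_attempts):
        try:
            healthy = probe()
        except Exception:
            # Connection errors, timeouts, and HTTP errors count as unhealthy.
            healthy = False
        streak = streak + 1 if healthy else 0
        if streak >= required_successes:
            return True
        sleep(interval_seconds)
    return False


# Example usage (hypothetical URL, assumed 30-second spacing between checks):
# stable = is_stable(lambda: http_probe("https://customer-site.example/"),
#                    required_successes=5, interval_seconds=30)
```

Counting consecutive successes rather than total successes is the design choice that matters here: it matches the situation described in this report, where the platform had recovered but the operating system was still unstable.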

Following recovery, a deeper assessment identified a residual risk of operating system instability.

Mitigation Steps

Immediate actions taken:

Verified integrity of hosted websites and application services
Confirmed no data loss or disk corruption
Reviewed Azure Service Health notifications for Central US

Preventative action (in progress):

Planned migration of impacted websites to a new compute resource
Provisioning of a clean virtual machine instance
Controlled migration of workloads to eliminate residual OS-level risk
Decommissioning of the affected compute resource once migration is complete

This approach is intended to provide long-term stability rather than relying on an OS instance that went through an abnormal recovery.

Conclusion

This incident was caused by an unplanned Azure platform availability event that interrupted the US1 virtual machine. Although Azure restored platform availability, the operating system entered an unstable recovery state, which extended the service disruption.

All services have been stable since 01:43 GMT on 11 February 2026.

As a precautionary measure, we are migrating affected workloads to a new compute resource to eliminate any potential residual instability and further strengthen service resilience.

We will continue to monitor platform health and take proactive steps to minimise the impact of future infrastructure-level events.

Posted Feb 11, 2026 - 14:28 UTC

Resolved

The earlier unplanned availability issue affecting the US1 server has now been fully resolved.

Services have been stable since 01:43 GMT, and monitoring confirms that all systems are operating normally.

This incident was caused by a temporary platform-level event reported by Azure. No further impact is expected.

We will continue routine monitoring as part of our standard operational procedures.

If you experience any lingering issues, please contact support.

Posted Feb 11, 2026 - 08:49 UTC

Monitoring

We’re seeing monitoring services gradually restore to normal levels following the earlier unplanned availability event affecting the US1 server.

Recovery is still in progress, and some monitoring checks may continue to appear delayed or inconsistent while the platform fully stabilises.

We’ll continue to monitor closely and will provide further updates if anything changes.

Posted Feb 11, 2026 - 01:36 UTC

Investigating

We’re currently investigating an unplanned availability issue affecting one of our US1 servers.

At 00:37 GMT on Wednesday, 11 February 2026, Azure reported that the underlying virtual machine became temporarily unavailable. This was detected by Azure’s platform monitoring.

At this stage:

The issue is platform-side

No action is required from customers

Azure is actively assessing the root cause and recovery status

We’re closely monitoring the situation and will provide updates here as soon as more information becomes available.

Posted Feb 11, 2026 - 00:43 UTC

This incident affected: HappyDance Infrastructure (Microsoft Azure).