Acquia has detected an interruption of service for some Acquia Cloud Site Factory, Acquia Cloud Professional, and Acquia Cloud Enterprise sites
Incident Report for Acquia, Inc.
Postmortem

Acquia has detected an interruption of service for some Acquia Cloud Site Factory, Acquia Cloud Professional, Acquia Cloud Enterprise sites and Acquia UI Services

Purpose of This Report

This is a summary and analysis of an issue that occurred with the delivery of an Acquia product or service. The purpose of this document is to share details about what happened and why, so there is a common understanding of what is required to prevent a future occurrence if at all possible.

What happened

On August 6, 2019 Acquia detected an interruption of service for some Acquia Cloud Site Factory, Acquia Cloud Professional, and Acquia Cloud Enterprise sites. Acquia also received reports from customers regarding possible interruption in service.

What we did about it

Acquia initiated our highest level of escalation and the reported interruption was investigated by a cross functional team consisting of Support, Operations, Product and Engineering. During the investigation Acquia contacted our network and edge partners for information and assistance. Acquia personnel noted that our DNS provider had scheduled maintenance occuring with the times of the reported interruption in service.

Identified Root Cause

On August 6th, 2019, our third party DNS providers’ engineers began to see unexpected query traffic patterns against their Managed DNS platform across all their global datacenters. Our DNS provider internal monitoring graphs showed unusual traffic reductions or floods across all regions. During the investigation, they determined that changes related to their on-going network maintenance inadvertently caused all DNS query traffic to be routed to two of their global Points of Presence (PoP). The errant traffic routing was the result of a configuration error made by our DNS provider in an attempt to alter transit provider announcements at each site as part of the maintenance. This resulted in unexpected DNS failures.

All affected sites have been restored. All services are operational at this time.

The issue was cleared: August 6, 2019 at 16:41 UTC

Next Steps

Acquia is working with all involved teams to correct our internal workflow to prevent any delay in identifying similar situations in the future.

The Acquia Cloud products and engineering teams are actively investigating a number of enhancements to our DNS management systems to mitigate the risk of customer impact in the event of a similar DNS vendor outage in the future.

Posted Aug 08, 2019 - 19:02 UTC

Resolved
All services are operational at this time. We will provide additional information in a postmortem to follow.
Posted Aug 06, 2019 - 17:09 UTC
Monitoring
We have identified the service interruption for some Acquia Cloud Site Factory, Acquia Cloud Professional, and Acquia Cloud Enterprise sites. We are currently monitoring this service interruption. We will provide additional updates when services are fully restored.
Posted Aug 06, 2019 - 16:41 UTC
Investigating
We are currently investigating an interruption of service for some Acquia Cloud Site Factory, Acquia Cloud Professional, and Acquia Cloud Enterprise sites. We will provide additional information as it becomes available.
Posted Aug 06, 2019 - 16:02 UTC
This incident affected: Cloud Platform Enterprise, Cloud Platform Professional, and Acquia Site Factory.