Acquia has detected an interruption of service for some Acquia Cloud Site Factory, Acquia Cloud Professional, and Acquia Cloud Enterprise sites, and for Acquia UI Services
Purpose of This Report
This is a summary and analysis of an issue that occurred with the delivery of an Acquia product or service. The purpose of this document is to share details about what happened and why, so there is a common understanding of what is required to prevent a future occurrence if at all possible.
What happened
On August 6, 2019, Acquia detected an interruption of service for some Acquia Cloud Site Factory, Acquia Cloud Professional, and Acquia Cloud Enterprise sites. Acquia also received reports from customers regarding a possible interruption in service.
What we did about it
Acquia initiated our highest level of escalation, and the reported interruption was investigated by a cross-functional team consisting of Support, Operations, Product, and Engineering. During the investigation, Acquia contacted our network and edge partners for information and assistance. Acquia personnel noted that our DNS provider had scheduled maintenance occurring at the time of the reported interruption in service.
Identified Root Cause
On August 6, 2019, our third-party DNS provider’s engineers began to see unexpected query traffic patterns against their Managed DNS platform across all their global datacenters. Our DNS provider’s internal monitoring graphs showed unusual traffic reductions or floods across all regions. During the investigation, they determined that changes related to their ongoing network maintenance inadvertently caused all DNS query traffic to be routed to two of their global Points of Presence (PoPs). The errant traffic routing was the result of a configuration error made by our DNS provider in an attempt to alter transit provider announcements at each site as part of the maintenance. This resulted in unexpected DNS failures.
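For illustration only, the kind of traffic concentration described above could be flagged with a check along the following lines. This is a minimal sketch and not the DNS provider's or Acquia's actual monitoring: the function names, the PoP labels, and the 90% threshold are all hypothetical assumptions.

```python
# Hypothetical sketch: flag when DNS query traffic collapses onto a few PoPs.
# Names, PoP labels, and the threshold are assumptions made for illustration.

def traffic_shares(queries_per_pop):
    """Return each PoP's fraction of the total query volume."""
    total = sum(queries_per_pop.values())
    if total == 0:
        return {pop: 0.0 for pop in queries_per_pop}
    return {pop: count / total for pop, count in queries_per_pop.items()}


def concentrated_pops(queries_per_pop, top_n=2, threshold=0.9):
    """Return the top_n busiest PoPs if together they carry at least
    `threshold` of all queries; otherwise return an empty list."""
    shares = traffic_shares(queries_per_pop)
    top = sorted(shares, key=shares.get, reverse=True)[:top_n]
    combined = sum(shares[pop] for pop in top)
    return top if combined >= threshold else []


# Evenly spread traffic: nothing is flagged.
normal = {"pop-a": 100, "pop-b": 110, "pop-c": 90, "pop-d": 100}
# Nearly all traffic on two PoPs: both are flagged.
skewed = {"pop-a": 195, "pop-b": 200, "pop-c": 3, "pop-d": 2}

print(concentrated_pops(normal))
print(concentrated_pops(skewed))
```

A check of this shape only looks at relative share, so it would flag a collapse onto two PoPs even when the total query volume looks normal.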
All affected sites have been restored. All services are operational at this time.
The issue was cleared: August 6, 2019 at 16:41 UTC
Next Steps
Acquia is working with all involved teams to correct our internal workflow to prevent any delay in identifying similar situations in the future.
The Acquia Cloud product and engineering teams are actively investigating a number of enhancements to our DNS management systems to mitigate the risk of customer impact in the event of a similar DNS vendor outage in the future.