Investigating service interruption
Incident Report for Acquia, Inc.
Postmortem

Purpose of This Report

This is a summary and analysis of an issue that occurred with the delivery of an Acquia product or service. The purpose of this document is to share details about what happened and why, so there is a common understanding of what is required to prevent a future occurrence if at all possible.

What happened

On Feb 5, 2019, Acquia released a component as part of unplanned maintenance. The release included changes to allow independent release of a file called prepend.php. During the release, environments that directly required the file site-info.php in their code base experienced PHP fatal errors on Drupal requests.

Rolling back the component release triggered additional failures for some customers whose PHP reload happened after the prepend.php file had already been removed.

What we did about it

When Acquia detected that the component release was causing some customer sites to return 500 errors, the component release was rolled back.

For some customers, the rollback caused their sites to return a different 500 error, due to the missing prepend.php file. This was fixed by restarting PHP.

Identified Root Cause

Acquia’s unplanned maintenance component release moved some files (/var/www/site-scripts/*). These files are included automatically where needed, and the path change was not expected to affect applications. However, some customers on Acquia Cloud Enterprise and Acquia Cloud Site Factory had a path to site-info.php hard-coded in their application, which caused a PHP fatal error when the impacted applications tried to load the file from the old path.

When the scale of the impact was realized, Acquia initiated a rollback to the preceding version. This rollback caused temporary fatal errors for any customers that saw traffic between the package downgrade and the PHP reload. Unexpectedly, for some customers, the PHP reload did not execute correctly, causing PHP fatal errors as php.ini still tried to load the file that had been removed.
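
For teams that want to check whether their own code base contains the kind of hard-coded path described above, the following is a minimal sketch in Python, not an Acquia tool. It walks a directory tree and reports every literal reference to a file under /var/www/site-scripts/. The function name find_hard_coded_paths and the list of file extensions are assumptions made for illustration only.

```python
import os
import re

# The legacy directory named in this report. The character class after it
# matches a plain file name such as site-info.php.
LEGACY_PATTERN = re.compile(r"/var/www/site-scripts/[\w.\-]+")

# Assumption: file extensions commonly used for PHP code in Drupal projects.
PHP_EXTENSIONS = (".php", ".inc", ".module", ".install")


def find_hard_coded_paths(root):
    """Return (file, line_number, matched_path) for each literal reference."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(PHP_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    for match in LEGACY_PATTERN.finditer(line):
                        hits.append((path, number, match.group(0)))
    return hits
```

Calling find_hard_coded_paths("docroot") on a hypothetical checkout would list each file and line that still points at the old location. Only literal paths are detected, so paths assembled at runtime from variables would not be found.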

Posted Feb 12, 2019 - 19:26 UTC

Resolved
Service is fully restored for Acquia Cloud Enterprise, Acquia Site Factory, and Acquia Cloud Professional.

*Apologies for the incorrect update at Feb 06, 2019 - 04:24 UTC
Posted Feb 06, 2019 - 04:32 UTC
Investigating
We are currently investigating this issue.
Posted Feb 06, 2019 - 04:24 UTC
Update
Service for some Acquia Cloud Enterprise, Acquia Site Factory, and Acquia Cloud Professional sites is currently interrupted. A solution is being rolled out to the platform. We will update this status page once the rollout is complete.
Posted Feb 06, 2019 - 02:59 UTC
Identified
Service for some Acquia Cloud Enterprise, Acquia Site Factory, and Acquia Cloud Professional sites is currently interrupted. We have identified the issue and are working to resolve this service interruption at this time. We will provide additional updates when services are fully restored.
Posted Feb 06, 2019 - 01:50 UTC
Investigating
Service for some Acquia Cloud Enterprise and Acquia Site Factory sites is currently interrupted. We are working to resolve this service interruption at this time. We will provide additional updates when services are fully restored.
Posted Feb 06, 2019 - 01:11 UTC
This incident affected: Cloud Platform Enterprise, Cloud Platform Professional, and Acquia Site Factory.