Episerver Digital Experience Cloud™ Service (DXC Service) is Episerver's cloud-based offering, built on Microsoft cloud technology. It delivers high availability and performance, easy connectivity with other cloud services and existing systems, the ability to manage spikes in customer demand, and a platform that is ready to seamlessly adopt the latest technology updates.
Between December 11th and December 14th, 2018, a subset of customers hosted in the Europe and West US Digital Experience Cloud regions experienced HTTP 500 responses on their websites. This report describes the event in more detail.
December 11th, 2018
6:46 AM CET The first alert is received and an incident support ticket is created with level 1 support.
3:59 PM CET It is identified that the issue is not isolated to a single client but affects a subset of clients hosted in the Digital Experience Cloud Europe region. Since the issue is surfacing on a Patch Tuesday and the error message indicates that this is an OSE related issue, a priority A case is opened with Microsoft. The error messages show a failure to load the third-party component React.
December 12th, 2018
08:30 AM CET It is identified that upgrading React to the latest version resolves the error. This is communicated to affected clients as a workaround while we continue to investigate the root cause with Microsoft.
4:29 PM CET The first alert related to an affected client in the Digital Experience Cloud West US region is received.
December 13th, 2018
09:00 AM CET: A preliminary root cause is identified. The likely cause of the interruptions is a critical security update of Microsoft's JavaScript engine ChakraCore (CVE-2018-8624), rolled out by Microsoft on Azure App Services throughout all Azure regions. The majority of client services have recovered by implementing the workaround, but we are still receiving occasional reports from clients.
December 14th, 2018
10:30 AM CET: No more incidents have been reported and the incident is marked as resolved. We continue to work with Microsoft to map the events leading up to the incident.
January 7th, 2019
11:21 AM CET: The full root cause analysis report is received from Microsoft and the incident ticket is closed.
The investigation determined that the default JavaScript engine, MSIE, was being loaded properly before the latest update occurred. After the update, however, MSIE failed to load and the default JavaScript engine fell back to V8, which did not have all the components required to support all features of older versions of React. The exception that surfaced therefore pointed at React rather than at the engine that had failed to load, which made it misleading. The underlying problem was discovered to be that the new release did not include the newer chakra.dll (the native library backing the MSIE engine).
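The fallback behavior described above can be illustrated with a minimal sketch. It is not the code used by Azure App Service, React, or any real engine-switching library: the class names, the feature labels ("legacy-api", "es5") and the error texts are all hypothetical. It only shows how a silent fallback to a second engine can produce an error about a missing feature, and how carrying the original load failure along makes that error less misleading.

```python
class EngineLoadError(Exception):
    """Raised when a JavaScript engine cannot be initialised (hypothetical)."""


class Engine:
    """Toy stand-in for a JavaScript engine; not a real API."""

    def __init__(self, name, features, loadable=True):
        self.name = name
        self.features = set(features)
        self.loadable = loadable

    def load(self):
        # Simulates a missing native library such as chakra.dll.
        if not self.loadable:
            raise EngineLoadError(f"{self.name}: native library missing")
        return self


def pick_engine(candidates):
    """Return the first engine that loads, plus the load errors seen on the way."""
    skipped = []
    for engine in candidates:
        try:
            return engine.load(), skipped
        except EngineLoadError as exc:
            skipped.append(str(exc))
    raise EngineLoadError("no engine could be loaded: " + "; ".join(skipped))


def render(engine, required_features, skipped):
    """Fail with a message that also names the engines skipped earlier."""
    missing = sorted(set(required_features) - engine.features)
    if missing:
        message = f"{engine.name} lacks {missing}"
        if skipped:
            message += f" (fell back after: {'; '.join(skipped)})"
        raise RuntimeError(message)
    return f"rendered with {engine.name}"


# Hypothetical setup: the preferred engine cannot load, the fallback can.
msie = Engine("MSIE", {"legacy-api", "es5"}, loadable=False)
v8 = Engine("V8", {"es5"})
engine, skipped = pick_engine([msie, v8])
```

Calling `render(engine, ["legacy-api"], skipped)` in this sketch raises an error that names both the missing feature on the fallback engine and the earlier load failure. Without the `skipped` list, only the missing feature would be reported.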
The issue mainly affected implementations of React leveraging server-side components.
Episerver worked closely with Microsoft on this issue and has been informed that the following corrective and preventive actions have been or will be undertaken by Microsoft.
Engineers reviewed the deployment release package for the latest update and found a bug that prevented the newer chakra.dll from being utilized properly. A fix was immediately put together, tested and deployed on December 13, 2018 11:53:29 AM to mitigate the issue for all customers who had the potential to experience it.
The postmortem of this incident includes repair items for the App Service release process and code reviews.
We apologize for the impact to affected customers. We have a strong commitment to delivering high availability for our services and we will do everything we can to learn from the event and to avoid a recurrence in the future.
We also want to emphasize how important it is for customers to regularly upgrade the third-party components used in their solutions. The workaround for this incident was to upgrade to the latest version of React, and solutions already using the latest version of React remained unaffected by this event.