Date: August 20, 2018
Time: 3:40pm - 7:55pm PDT
*New tests failed to start on all VM platforms.
*Why did it happen:
*While mitigating a database issue, we encountered a network issue that prevented Operations from completing configuration changes. This left the system in a backed-up state, which was made worse by a bug in how it handles a large backlog of tests.
*How did we fix it:
*We manually completed the configuration changes and terminated the tests in the backlog.
*What are we doing to prevent it from happening again:
*We are introducing more process around database changes. The network issue should be resolved by this weekend’s maintenance window. We are also working to make the system more resilient when load causes a backlog in test execution.