Date: November 30, 2017
Time: 7:04 - 8:06 AM PST
Sauce Labs’ VM capacity dropped, causing high wait times for test VMs.
Why did it happen:
A bug in the system used to predict VM demand caused the management system responsible for pre-booting VMs to stop requesting new resources, which in turn caused a drop in available capacity.
What did we do to fix it:
We manually set the VM demand values and allowed our cloud to catch up.
What are we doing to prevent it from happening again:
We've corrected the initial bug in our prediction service and hardened the management system so that it is easier to debug and reacts more quickly to bad inputs.
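
For illustration only, here is a minimal sketch of one way a pre-boot manager could guard against bad demand inputs. It is not Sauce Labs' actual implementation: the function name `sanitize_demand`, the bounds `MIN_DEMAND` and `MAX_DEMAND`, and the fall-back-to-last-good-value policy are all assumptions made for this example.

```python
import math
from typing import Optional

# Hypothetical bounds; the real system's limits are not stated in this report.
MIN_DEMAND = 0
MAX_DEMAND = 10_000


def sanitize_demand(predicted: Optional[float], last_good: int) -> int:
    """Return a usable VM demand value, falling back to last_good on bad input."""
    # Missing values and booleans are never valid demand figures.
    if predicted is None or isinstance(predicted, bool):
        return last_good
    # Reject non-numeric values, NaN, and infinities.
    if not isinstance(predicted, (int, float)) or not math.isfinite(predicted):
        return last_good
    # Reject values outside the assumed plausible range.
    if predicted < MIN_DEMAND or predicted > MAX_DEMAND:
        return last_good
    return int(round(predicted))
```

In this sketch, a missing, non-finite, or out-of-range prediction leaves the manager requesting capacity at the last known good level instead of stopping, which is the failure mode described above.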