2017-April-05 Service Incident
Incident Report for Sauce Labs Inc

Date: April 5, 2017
Time: 6:40am PDT - 7:57am PDT

What Happened:
Wait times for VMs exceeded 30 seconds, which can cause some tests to error out or fail to start.

Why it Happened:
One of the VMs running two core Sauce cloud management services went down. There are redundant VMs with a failover mechanism, but an inefficiency in the component that starts and stops VMs led to a reduction in the number of VMs available to customers.

What we did to fix it:
We deployed a change to the component that starts/stops customer VMs to better handle increased load.

What we are doing to prevent this from happening again:
- Investigate whether it’s possible to detect this problem sooner, which could prevent this situation from causing an outage.
- Continue to investigate the clustering system (and failover mechanism) to look for opportunities for further improvements.
- Make other performance enhancements to related services.

Posted Apr 19, 2017 - 11:28 PDT

This incident has been resolved.
Posted Apr 05, 2017 - 08:13 PDT
We have been experiencing high wait times on automated and manual tests. Corrective action has been taken and things are returning to normal.
Posted Apr 05, 2017 - 08:01 PDT
Our system monitoring has detected a problem. We are determining its scope and will provide more details soon.
Posted Apr 05, 2017 - 07:01 PDT
This incident affected: Sauce Automated and Sauce Manual.