2017-April-05 Service Incident
Incident Report for Sauce Labs Inc

Date: April 5, 2017
Time: 6:40am PDT - 7:57am PDT

What Happened:
Wait times for VMs exceeded 30 seconds, which can cause some tests to error out or fail to start.

Why it Happened:
One of the VMs running two core Sauce cloud management services went down. There are redundant VMs with a failover mechanism, but an inefficiency in the component that starts and stops VMs led to a reduction in the number of VMs available to customers.

What we did to fix it:
We deployed a change to the component that starts/stops customer VMs to better handle increased load.

What we are doing to prevent this from happening again:
- Investigate whether it’s possible to detect this problem sooner, which could prevent this situation from causing an outage.
- Continue to investigate the clustering system (and failover mechanism) to look for opportunities for further improvements.
- Make other performance enhancements to related services.

Posted Apr 19, 2017 - 11:28 PDT

Resolved
This incident has been resolved.
Posted Apr 05, 2017 - 08:13 PDT
Monitoring
We have been experiencing high wait times on automated and manual tests. Corrective action has been taken and things are returning to normal.
Posted Apr 05, 2017 - 08:01 PDT
Investigating
Our system monitoring has detected a problem. We are determining its scope and will provide more details soon.
Posted Apr 05, 2017 - 07:01 PDT
This incident affected: Sauce Automated and Sauce Manual.