2017-April-05 Service Incident
Incident Report for Sauce Labs Inc

Date: April 5, 2017
Time: 6:40am PDT - 7:57am PDT

What Happened:
Wait times for VMs exceeded 30 seconds, which can cause some tests to error out or fail to start.

Why it Happened:
One of the VMs running two core Sauce cloud management services went down. There are redundant VMs with a failover mechanism, but an inefficiency in the component that starts and stops VMs led to a reduction in the number of VMs available to customers.

What we did to fix it:
We deployed a change to the component that starts/stops customer VMs to better handle increased load.

What we are doing to prevent this from happening again:
- Investigate whether it’s possible to detect this problem sooner, which could prevent this situation from causing an outage.
- Continue to investigate the clustering system (and failover mechanism) to look for opportunities for further improvements.
- Make other performance enhancements to related services.
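
As an illustration of what earlier detection could look like, here is a minimal sketch in Python. It is not Sauce Labs' actual monitoring; the class name `WaitTimeMonitor`, the window size, and the warning fraction are all assumptions made for the example. Only the 30-second threshold comes from the report above. The idea is to watch the median of recent VM wait times and raise a warning before the median crosses the threshold.

```python
from collections import deque
from statistics import median

# From the report: waits above 30 seconds can cause tests to error or not start.
WAIT_THRESHOLD_SECONDS = 30.0


class WaitTimeMonitor:
    """Hypothetical helper that tracks recent VM wait times."""

    def __init__(self, window_size=20, warn_fraction=0.5):
        # Keep only the most recent `window_size` samples (assumed value).
        self.samples = deque(maxlen=window_size)
        # Warn once the median reaches this fraction of the threshold (assumed value).
        self.warn_fraction = warn_fraction

    def record(self, wait_seconds):
        """Add one observed wait time, in seconds."""
        self.samples.append(wait_seconds)

    def status(self):
        """Return 'ok', 'warn', or 'alert' based on the median recent wait."""
        if not self.samples:
            return "ok"
        mid = median(self.samples)
        if mid > WAIT_THRESHOLD_SECONDS:
            return "alert"
        if mid > WAIT_THRESHOLD_SECONDS * self.warn_fraction:
            return "warn"
        return "ok"
```

Using the median rather than a single sample keeps one slow VM start from triggering a false alarm, while the intermediate "warn" state gives operators a chance to act before customers see errors.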

Posted Apr 19, 2017 - 11:28 PDT

Resolved
This incident has been resolved.
Posted Apr 05, 2017 - 08:13 PDT
Monitoring
We have been experiencing high wait times on automated and manual tests. Corrective action has been taken and things are returning to normal.
Posted Apr 05, 2017 - 08:01 PDT
Investigating
Our system monitoring has detected a problem. We are determining its scope and will provide more details soon.
Posted Apr 05, 2017 - 07:01 PDT
This incident affected: Sauce Automated and Sauce Manual.