2018-October-3 Service Incident
Incident Report for Sauce Labs Inc
Postmortem

Date: October 3, 2018
Time: 12:01 pm - 2:02 pm PDT; degraded service from 2:02 pm to 2:55 pm PDT

What happened:
Wait times for VMs in our PC cloud were above normal.

Why it happened:
We discovered an issue in our VM dispatcher service that is triggered under higher-than-normal load. The issue caused the dispatcher to underreport the total number of available VMs.

How we fixed it:
The problem resolved itself as load dropped.

What we are doing to prevent it from happening again:
We have changed the dispatcher's retry logic to stabilize our capacity under high load. We are continuing to improve the logic the dispatcher service uses for its distributed reporting so that it is more resilient under load. We are also adding capacity.

Posted Oct 11, 2018 - 13:13 PDT

Resolved
Wait times for our PC cloud have returned to normal levels. All services are fully operational.
Posted Oct 03, 2018 - 15:18 PDT
Monitoring
Wait times for our PC cloud are falling toward normal levels but remain elevated. We expect a full recovery and are monitoring closely.
Posted Oct 03, 2018 - 13:41 PDT
Update
Wait times for our PC cloud are still high. Our investigations continue.
Posted Oct 03, 2018 - 13:01 PDT
Investigating
We are experiencing high wait times on all PC platforms. We are investigating.
Posted Oct 03, 2018 - 12:31 PDT
This incident affected: Automated VM Testing (Automated PC Testing).