2018-October-9 Service Incident
Incident Report for Sauce Labs Inc
Postmortem

Date: October 9, 2018
Time: 11:50 am to 1:08 pm PDT

*What happened:*
Wait times for VMs in our PC cloud were above normal.

*Why it happened:*
We discovered an issue in our VM dispatcher service that is triggered under higher-than-normal load. The issue caused the dispatcher to underreport the total number of available VMs.

*How we fixed it:*
The problem resolved itself as load dropped.

*What we are doing to prevent it from happening again:*
We’ve changed the dispatcher’s retry logic to stabilize our capacity under high load. We’re also continuing to make the dispatcher’s distributed capacity reporting more resilient under load, and we are adding more capacity.

Posted Oct 17, 2018 - 11:24 PDT

Resolved
Wait times for our PC Cloud have returned to normal levels. All services are fully operational.
Posted Oct 09, 2018 - 13:49 PDT
Update
Wait times on our PC cloud have improved significantly but are still above normal levels. We are continuing to monitor the situation.
Posted Oct 09, 2018 - 13:31 PDT
Investigating
We are experiencing high wait times for tests run on our PC cloud. We are investigating.
Posted Oct 09, 2018 - 12:33 PDT
This incident affected: Automated VM Testing (Automated PC Testing).