2023-September-7 Resolved Service Incident
Incident Report for Sauce Labs
Postmortem

Dates:

Thursday, September 7, 2023, 22:36 - 22:55 UTC

What happened:

Users experienced long wait times, followed by elevated error rates, both while jobs timed out and, after service was restored, while we worked through the backlog of requests. This impacted Windows, Mac, iOS Simulator, and Android Emulator tests in the US West region.

Why it happened:

An internal service responsible for maintaining virtual device state across our platforms (the cloud state service) crashed when MemoryStore failed over during routine GCP maintenance. The service did not recover gracefully, and its failure left another internal service, which pre-launches virtual devices based on demand, unable to connect to the cloud state service. Because of this connectivity issue, devices could not be pre-launched for the duration of the incident, which led to long device wait times.

How we fixed it:

We restarted the cloud state service and then the pre-launching service. After the restarts, all connections were re-established and both services resumed normal operation.

What we are doing to prevent it from happening again:

We are enhancing the cloud state service to handle MemoryStore outages more gracefully, specifically by adding retry logic for re-establishing connectivity.
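To illustrate what retry logic for re-establishing connectivity can look like, here is a minimal sketch, assuming a Python client where a hypothetical `connect` callable raises `ConnectionError` while the backing store is failing over. The function name `connect_with_retry`, its parameters, and the backoff policy (exponential backoff with full jitter) are illustrative assumptions, not Sauce Labs's actual implementation.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 8,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, backing off exponentially with full jitter.

    `connect` is a hypothetical zero-argument callable that opens a connection
    to the backing store and raises ConnectionError when the store is unreachable.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break  # out of attempts; surface the failure below
            # Exponential backoff capped at max_delay, with full jitter so that
            # many clients reconnecting after a failover do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the function easy to test without real delays, and the jitter spreads reconnect attempts from many service instances so they do not all hit the store the moment it becomes available again.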

Posted Sep 21, 2023 - 10:38 UTC

Resolved
Between 22:39 and 22:58 UTC, we experienced elevated wait times affecting tests in our US-West-1 data center. We have identified the issue and taken remedial action.
Posted Sep 07, 2023 - 23:49 UTC