2022-December-16 Resolved Service Incident 2
Incident Report for Sauce Labs
Postmortem

Dates:

Friday December 16th 2022, 20:51 - 21:31 UTC

What happened:

All tests waiting to be assigned to a virtual device during the incident were marked as failed. The error rate reached approximately 70-80%, and customers who requested specific test devices were the most affected.

Why it happened:

Demand for macOS exceeded available capacity. As a result, new tests began to queue up, which triggered a clearing of the new tests queue, and every test waiting in that queue was marked as failed. Although the underlying issue was limited to macOS and iOS capacity, it affected all test types, because the new tests queue is shared across platforms and had backed up.
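To illustrate why a capacity shortage on one platform affected every test type, here is a minimal Python sketch of a single queue of new tests that is shared by all platforms and cleared as a whole. The `Job` and `SharedQueue` names, the platform labels, and the wholesale `clear_all` behavior are illustrative assumptions, not a description of the actual Sauce Labs scheduler.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    platform: str  # illustrative labels such as "macOS", "iOS", "Windows"
    status: str = "queued"


class SharedQueue:
    """Hypothetical model: one queue of new tests shared by all platforms."""

    def __init__(self) -> None:
        self.jobs: deque[Job] = deque()

    def enqueue(self, job: Job) -> None:
        self.jobs.append(job)

    def clear_all(self) -> list[Job]:
        # Clearing the whole queue fails every waiting job,
        # regardless of which platform caused the backlog.
        failed = list(self.jobs)
        for job in failed:
            job.status = "failed"
        self.jobs.clear()
        return failed


queue = SharedQueue()
# macOS and iOS demand exceeds capacity, so those jobs pile up...
for i in range(6):
    queue.enqueue(Job(i, "macOS" if i % 2 == 0 else "iOS"))
# ...while jobs for other platforms wait in the same shared queue.
queue.enqueue(Job(6, "Windows"))
queue.enqueue(Job(7, "Linux"))

failed = queue.clear_all()
print(sorted({job.platform for job in failed}))  # ['Linux', 'Windows', 'iOS', 'macOS']
```

Even though only macOS and iOS capacity was short in this model, the Windows and Linux jobs are failed too, because they shared the queue that was cleared.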

How we fixed it:

Clearing the new tests queue relieved the starvation and restored the system; usually no further action is required. In some cases, the starvation recurs after the queue has been cleared, and another clearing is performed.

What we are doing to prevent it from happening again:

We are looking at ways to increase our capacity specifically for macOS and iOS. We are also looking into ways to clear jobs from queues by image name or platform, rather than clearing the whole queue.
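A selective clear of the kind described above might look roughly like the following Python sketch. Only jobs matching a platform and/or image name filter are failed, and the remaining jobs keep their place in the queue. The `Job` fields, the image names, and the `clear_matching` and `by_platform_or_image` helpers are all hypothetical and only illustrate the idea.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    job_id: int
    platform: str  # hypothetical platform label
    image: str     # hypothetical image name
    status: str = "queued"


def clear_matching(queue: deque[Job], predicate: Callable[[Job], bool]) -> list[Job]:
    """Fail and remove only the queued jobs that match `predicate`,
    preserving the order of the jobs that remain."""
    failed: list[Job] = []
    kept: deque[Job] = deque()
    for job in queue:
        if predicate(job):
            job.status = "failed"
            failed.append(job)
        else:
            kept.append(job)
    queue.clear()
    queue.extend(kept)
    return failed


def by_platform_or_image(
    platform: Optional[str] = None, image: Optional[str] = None
) -> Callable[[Job], bool]:
    # Build a filter on platform and/or image name; None means "match any".
    def predicate(job: Job) -> bool:
        return (platform is None or job.platform == platform) and (
            image is None or job.image == image
        )
    return predicate


queue = deque([
    Job(1, "macOS", "macos-13"),
    Job(2, "Windows", "windows-11"),
    Job(3, "iOS", "ios-16-sim"),
    Job(4, "macOS", "macos-12"),
])
failed = clear_matching(queue, by_platform_or_image(platform="macOS"))
print([job.job_id for job in failed])  # [1, 4]
print([job.job_id for job in queue])   # [2, 3]
```

Filtering in a single pass keeps unrelated jobs (here the Windows and iOS ones) queued in their original order instead of failing them along with the starved platform.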

Posted Jan 24, 2023 - 10:06 UTC

Resolved
Between 20:53 UTC and 21:14 UTC, we experienced a spike in job errors on our Virtual Device Cloud in our US West data center. All services are fully operational.
Posted Dec 16, 2022 - 23:46 UTC