2023-January-11 Service Incident
Incident Report for Sauce Labs
Postmortem

Dates:

Wednesday January 11th 2023, 16:41 - 17:09 UTC

What happened:

Customers could not launch Appium or Live tests running on real devices for approximately thirty minutes.

Why it happened:

During normal operation of our Kubernetes cluster, the pods running the service responsible for Live and Appium real device tests began a graceful shutdown and restart. As part of the shutdown procedure, the embedded Jetty server disabled some application components. One of these was the WebSocket client, which meant that connections to real devices could not be established, and thus no new Appium or Live tests could be started. After the graceful shutdown period of ~25 minutes, Kubernetes forcefully restarted the pods, and all functionality recovered. There appears to be an issue with how WebSocket connections are managed during the shutdown process, which ultimately caused the shutdown to time out.
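The failure mode above is a shutdown that waits on WebSocket connections that never finish closing, so it only ends when the Kubernetes grace period expires. The following minimal sketch, written in Python rather than the Java/Jetty stack the service runs on, shows one general pattern for avoiding that: ask every open session to close, but bound the wait with a drain timeout well below the grace period and abandon whatever is still open. It is an illustration under stated assumptions, not Sauce Labs' implementation. The names `DeviceSession`, `graceful_shutdown`, and `drain_timeout` are hypothetical.

```python
import asyncio


# Hypothetical stand-in for a long-lived WebSocket session to a real device.
# These names are illustrative only; they are not Sauce Labs or Jetty APIs.
class DeviceSession:
    def __init__(self, name: str, closes_cleanly: bool):
        self.name = name
        self.closes_cleanly = closes_cleanly
        self.closed = asyncio.Event()

    async def close(self) -> None:
        # A well-behaved session acknowledges the close immediately; a stuck
        # one never does, which is what makes an unbounded wait hang.
        if self.closes_cleanly:
            self.closed.set()
        await self.closed.wait()


async def graceful_shutdown(sessions, drain_timeout: float) -> list[str]:
    """Ask every session to close, but never wait longer than drain_timeout.

    Returns the names of sessions that had to be abandoned. The timeout is
    assumed to sit well below the pod's grace period, so the process exits on
    its own instead of waiting for Kubernetes to force-kill it.
    """
    tasks = {asyncio.create_task(s.close()): s for s in sessions}
    _, pending = await asyncio.wait(list(tasks), timeout=drain_timeout)
    for task in pending:
        task.cancel()  # give up on sessions that did not close in time
    # Wait for the cancellations to finish; CancelledError is collected, not raised.
    await asyncio.gather(*pending, return_exceptions=True)
    return sorted(tasks[t].name for t in pending)


async def main() -> list[str]:
    sessions = [DeviceSession("device-a", True), DeviceSession("device-b", False)]
    return await graceful_shutdown(sessions, drain_timeout=0.1)


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints ['device-b']
```

Here the stuck session (`device-b`) is abandoned after 0.1 seconds instead of holding up the shutdown indefinitely. In a real service, the drain timeout would be chosen relative to the pod's configured grace period.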

How we fixed it:

Kubernetes forced the shutdown process to complete and scheduled new pods, which resolved the issue.

What we are doing to prevent it from happening again:

We are looking into what can be done to manage WebSocket connections more effectively during the shutdown process.

Posted Feb 20, 2023 - 08:59 UTC

Resolved
After we took remedial action, both the EU-Central-1 and US-West-1 Real Device Clouds have full device availability, and sessions are starting as expected.
This issue is now resolved, and all services are fully operational.
Posted Jan 11, 2023 - 17:59 UTC
Update
We are continuing to investigate this issue.
Posted Jan 11, 2023 - 17:58 UTC
Update
The Real Device Cloud in the EU-Central-1 datacenter has recovered.
We are still experiencing decreased availability in our Real Device Cloud for the US-West-1 datacenter. Live sessions are slow to load. We are investigating.
Posted Jan 11, 2023 - 17:33 UTC
Investigating
We are currently experiencing decreased availability in our Real Device Cloud for the US-West-1 & EU-Central-1 datacenters. We are investigating.
Posted Jan 11, 2023 - 17:09 UTC
This incident affected: Automated Real Device Testing (US-West, EU-Central) and Live Real Device Testing (US-West, EU-Central).