2023-Feb-22 Service Incident
Incident Report for Sauce Labs
Postmortem

Dates:

Wednesday, February 22nd 2023, 14:20 - 16:13 UTC

What happened:

Some sessions running on iOS devices were disconnected, and some iOS devices appeared as “busy” for the duration of the incident, preventing new sessions from being created on them.

Why it happened:

We were gradually upgrading the cluster that runs our production device pools, starting with EU iOS. After ~50 pools had been upgraded, we started getting alerts that pool containers were down, which prompted us to stop the upgrade process. Once the process was halted, we saw that the just-upgraded pools were in an “Error” state and were throwing errors referencing network connectivity.

How we fixed it:

We immediately stopped the upgrade process and rolled back to the previous version, first on a single pool as a test and then on all the affected pools. This returned the pools to a “Running” state, but network connectivity errors were still being thrown from inside the pools. After further troubleshooting, we restarted the affected servers, which restored network connectivity.

What we are doing to prevent it from happening again:

This upgrade had run successfully in our staging environment, with no degradation in device availability or network connectivity, so we were confident in performing it in production. Going forward, we will roll out these upgrades more slowly and gradually to better expose any issues they may introduce.
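
For illustration only, a slower gradual rollout of this kind could be sketched as below. This is not the tooling used in the incident: the function `staged_rollout`, its callbacks (`upgrade`, `is_healthy`, `rollback`), and the batch sizes are all hypothetical. The sketch assumes the health check probes network connectivity from inside the pool rather than trusting the container state alone, since the pools above reported “Running” while connectivity errors persisted.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    pools: Sequence[str],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    batch_sizes: Sequence[int] = (1, 2, 5),
) -> List[str]:
    """Upgrade pools in growing batches; halt on the first unhealthy batch.

    All callbacks are hypothetical hooks supplied by the caller.
    Returns the pools that were upgraded and left on the new version.
    """
    upgraded: List[str] = []
    index = 0
    step = 0
    while index < len(pools):
        # Start with a tiny batch and grow; reuse the last size once exhausted.
        size = batch_sizes[min(step, len(batch_sizes) - 1)]
        batch = list(pools[index:index + size])
        for pool in batch:
            upgrade(pool)
        # Assumed to check connectivity from inside the pool,
        # not only whether the container reports a running state.
        if not all(is_healthy(pool) for pool in batch):
            for pool in batch:
                rollback(pool)
            break  # stop the rollout; earlier batches stay upgraded
        upgraded.extend(batch)
        index += len(batch)
        step += 1
    return upgraded
```

A real rollout would likely also wait between batches so that delayed failures have time to surface before the next, larger batch begins.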

Posted Feb 23, 2023 - 23:10 UTC

Resolved
After taking remedial action, availability of our iOS real devices has returned to normal levels. This issue is now resolved, and all services are fully operational.
Posted Feb 22, 2023 - 16:34 UTC
Update
We are experiencing ongoing availability issues with our iOS Real devices for Live & Automated tests in the EU-Central-1 datacenter. We are still investigating.
Posted Feb 22, 2023 - 15:44 UTC
Investigating
We are experiencing availability issues with our iOS Real devices for Live & Automated tests in the EU-Central-1 datacenter. We are currently investigating.
Posted Feb 22, 2023 - 14:41 UTC
This incident affected: Automated Real Device Testing (EU-Central), Live Real Device Testing (EU-Central), and Native Framework Mobile App Testing (EU-Central).