Sauce Labs Maintenance Windows
Sauce Labs will have a previously scheduled four-hour maintenance window on Wednesday, July 10th, from 04:00 UTC to 08:00 UTC. During this window, we will be updating infrastructure and services in our US West data center. This work may cause portions of the service (including running automated and manual tests) to be unavailable for up to four hours.
2023-October-3 Service Incident
Incident Report for Sauce Labs
Postmortem

Dates:

Monday, October 2nd, 2023, 11:51 UTC - Wednesday, October 4th, 2023, 15:20 UTC

What happened:

A small percentage of customer Appium tests failed when certain commands were executed, such as capturing screenshots or executing custom scripts.

Why it happened:

A connection pool used by our Appium server was intermittently exhausted during spikes in the usage of mid-session install scripts. During these spikes, we were leaking HTTP client connections, which caused timeouts and eventual connection issues to the Appium server. 
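
The failure mode described above can be illustrated with a minimal sketch. This is not the Appium server's code: `BoundedPool`, `leaky_request`, and `safe_request` are hypothetical names, and a small in-memory queue stands in for a real HTTP client connection pool. The sketch only shows how a connection that is never returned on an error path can drain a fixed-size pool until callers time out, and how releasing in a `finally` block avoids that.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Toy fixed-size pool (hypothetical); stands in for an HTTP connection pool."""

    def __init__(self, size):
        self._free = queue.Queue()
        for conn_id in range(size):
            self._free.put(conn_id)

    def acquire(self, timeout=0.05):
        # Raises queue.Empty when every connection is checked out,
        # which a caller would experience as a timeout.
        return self._free.get(timeout=timeout)

    def release(self, conn_id):
        self._free.put(conn_id)

    def available(self):
        return self._free.qsize()

    @contextmanager
    def connection(self):
        conn_id = self.acquire()
        try:
            yield conn_id
        finally:
            # Always return the connection, even if the request fails.
            self.release(conn_id)


def leaky_request(pool, fail):
    """Leaks a connection whenever the request raises."""
    conn_id = pool.acquire()
    if fail:
        raise RuntimeError("request failed")  # conn_id is never released
    pool.release(conn_id)


def safe_request(pool, fail):
    """Same request, but the connection is released on every path."""
    with pool.connection():
        if fail:
            raise RuntimeError("request failed")
```

With a pool of size two, two failing calls to `leaky_request` leave no connections available and the next `acquire` times out, while any number of failing calls to `safe_request` leave the pool full.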

How we fixed it:

We restored the functionality of the service in two steps:

  • First, we gradually recycled pods to see whether cleanly restarting these processes would clear the issue.
  • This bought us time to debug further, and we discovered the leak of HTTP client connections that led to the timeouts. The team then put together a fix and deployed it to production.

What we are doing to prevent it from happening again:

During the incident, we addressed the HTTP client connection leak and enhanced monitoring to better identify this specific issue should it recur.
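
The report does not describe the monitoring that was added. As one hypothetical illustration of how sustained pool exhaustion could be detected, the sketch below flags a pool whose utilization stays at or above a threshold for several consecutive samples. `PoolSample`, `utilization`, `should_alert`, and the default threshold and sample count are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PoolSample:
    in_use: int    # connections currently checked out
    capacity: int  # maximum pool size


def utilization(sample: PoolSample) -> float:
    """Fraction of the pool that is checked out."""
    return sample.in_use / sample.capacity


def should_alert(samples: Sequence[PoolSample],
                 threshold: float = 0.9,
                 sustained: int = 3) -> bool:
    """Alert only when the last `sustained` samples all reach `threshold`.

    Requiring consecutive samples avoids alerting on a brief usage spike
    that drains the pool momentarily and then recovers.
    """
    if len(samples) < sustained:
        return False
    return all(utilization(s) >= threshold for s in samples[-sustained:])
```

A single full-pool sample between healthy ones would not trigger an alert under these defaults; a leak, where utilization climbs and stays high, would.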

Posted Oct 23, 2023 - 12:06 UTC

Resolved
We have resolved the issue which caused timeouts when running iOS and Android real device tests in the US West and EU Central datacenters. All services are fully operational.
Posted Oct 03, 2023 - 18:33 UTC
Investigating
We are seeing an increase in timeouts when running iOS and Android real device tests in the US West and EU Central datacenters. We are investigating.
Posted Oct 03, 2023 - 15:03 UTC
This incident affected: Automated Real Device Testing (US-West, EU-Central), Live Real Device Testing (US-West, EU-Central), and Native Framework Mobile App Testing (US-West, EU-Central).