2022-August-11 Resolved Service Incident

Incident Report for Sauce Labs

Postmortem

Dates:

Thursday, August 11, 2022, 00:24 - 14:50 UTC

What happened:

During a period of high request volume, the service that handles incoming Selenium WebDriver commands for both new and active sessions intermittently returned HTTP 503 errors: “No server is available to handle this request”.

Why it happened:

The main driver behind the increased error rate was CPU throttling on the nodes running the primary service that handles these command requests. The throttling caused the service to scale up and down throughout the incident, during which we saw:

  •  The graceful shutdown of replicas taking upwards of 30 minutes
  •  Uneven load balancing across the replicas for the service

Together, these conditions limited the capacity available for command requests, which resulted in the HTTP 503 errors.
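
As a rough illustration of how these two conditions can each produce rejected requests even when nominal capacity looks sufficient, here is a minimal sketch. It is not a model of the actual service: the function name, the replica counts, the traffic weights, and the per-replica capacity are all hypothetical values chosen for the example.

```python
# Illustrative only: every name and number here is hypothetical,
# not taken from the incident or from the service's real configuration.

def rejected_requests(incoming, weights, per_replica_capacity):
    """Return how many requests are rejected (for example with HTTP 503).

    incoming: total requests arriving in one interval
    weights: relative share of traffic sent to each replica that still accepts work
    per_replica_capacity: requests a single replica can serve per interval
    """
    total_weight = sum(weights)
    rejected = 0.0
    for weight in weights:
        assigned = incoming * weight / total_weight
        # Anything above what the replica can serve is rejected.
        rejected += max(0.0, assigned - per_replica_capacity)
    return rejected

# Four replicas, even balancing: 90 requests each, nothing rejected.
even = rejected_requests(360, [1, 1, 1, 1], 100)

# Same load and replica count, but one replica receives a double share.
uneven = rejected_requests(360, [2, 1, 1, 1], 100)

# One replica is stuck in graceful shutdown, so only three accept traffic.
draining = rejected_requests(360, [1, 1, 1], 100)

print(even, uneven, draining)  # 0.0 44.0 60.0
```

In this toy setup the total load (360) is below the nominal capacity of four replicas (400), yet both uneven balancing and a replica that is slow to shut down lead to rejected requests.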

How we fixed it:

The team responded by terminating the replicas that were taking a long time to shut down during the period of high request volume, which increased available capacity.

What we are doing to prevent it from happening again:

As part of the stabilization effort, we have disabled auto-scaling until:

  • Load balancing is improved for this service
  • The impact that the long graceful shutdown has on the service’s availability is reduced

Additionally, we identified steps to improve future detection and response times in similar cases.

Posted Sep 22, 2022 - 14:40 UTC

Resolved

Between 05:00 UTC and 13:15 UTC, we experienced an elevated 5xx error rate on all automated tests (Virtual and Real Device Cloud) in our US-West-1 Data Center. This issue is now resolved, and all services are fully operational.
Posted Aug 11, 2022 - 13:00 UTC