2022-December-06 Service Incident

Incident Report for Sauce Labs

Postmortem

Dates:

Tuesday, December 6th 2022, 14:53 UTC - Wednesday, December 7th 2022, 17:30 UTC

What happened:

We experienced intermittent connection timeouts between internal services in our US-West region. There was no perceived or reported customer impact, but these timeouts may have caused a slightly elevated error rate.

Why it happened:

An internal service that proxies requests between other services was being CPU-throttled, which caused intermittent latency and made some requests time out. The throttling occurred because the service was hitting its allocated resource limits.

How we fixed it:

We increased the resource limits for the service and saw requests return to normal.

What we are doing to prevent it from happening again:

Although our synthetic monitoring and alerting made us aware of this issue, we are adding better observability and alerting for this particular service at both the application and infrastructure levels. This will give us a more direct indication of where the underlying problem lies.
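
As an illustration of what an infrastructure-level throttling signal could look like, here is a minimal sketch. It is not our actual tooling. It assumes the service runs on Linux under cgroup v2, where the kernel exposes the nr_periods and nr_throttled counters in a cpu.stat file. The function names and the 25% alert threshold are hypothetical and chosen only for this example.

```python
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of CPU enforcement periods, between two samples,
    in which the cgroup was throttled."""
    periods = after.get("nr_periods", 0) - before.get("nr_periods", 0)
    throttled = after.get("nr_throttled", 0) - before.get("nr_throttled", 0)
    if periods <= 0:
        return 0.0  # no new periods elapsed, nothing to report
    return throttled / periods


def should_alert(ratio: float, threshold: float = 0.25) -> bool:
    """Hypothetical rule: alert when the throttled share reaches the threshold."""
    return ratio >= threshold


# Two samples of cpu.stat contents taken some time apart (made-up values).
sample_before = "usage_usec 5000000\nnr_periods 1000\nnr_throttled 50\nthrottled_usec 120000\n"
sample_after = "usage_usec 5900000\nnr_periods 1100\nnr_throttled 90\nthrottled_usec 480000\n"

ratio = throttled_ratio(parse_cpu_stat(sample_before), parse_cpu_stat(sample_after))
print(f"throttled in {ratio:.0%} of periods, alert={should_alert(ratio)}")
```

A signal like this points at the resource limit itself, whereas synthetic monitoring only sees the resulting timeouts.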

Posted Jan 10, 2023 - 12:39 UTC

Resolved

After a few hours of monitoring, we believe this issue’s impact is negligible and should not affect testing. All services are fully operational.

Posted Dec 06, 2022 - 19:13 UTC

Update

We believe the customer impact of this issue is negligible, but our investigation continues.

Posted Dec 06, 2022 - 16:22 UTC

Investigating

We are investigating intermittent SSL connection timeouts on our ondemand.us-west-1.saucelabs.com endpoint. This may result in errors when attempting to run automated tests in our US-West region.

Posted Dec 06, 2022 - 15:14 UTC

This incident affected: Automated Browser Testing (US-West), Automated Virtual Mobile Device Testing (US-West), Automated Real Device Testing (US-West), and Native Framework Mobile App Testing (US-West).