Friday May 5th 2023, 10:05 - 11:02 UTC
For the duration of the incident, customers experienced slow performance in the Sauce Labs UI and when executing test requests in all regions.
A database node became unresponsive due to a higher-than-normal number of requests. When this happens, the database failover mechanism normally triggers, but this time it did not fully complete because of a configuration issue.
How we fixed it:
The faulty database node was manually disabled. As a result, all services connected to another node.
The failover mechanism has been fixed. If a similar situation occurs again, our services will automatically switch to a secondary database node.
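
For readers unfamiliar with the idea, automatic failover of this kind can be sketched roughly as follows. This is an illustrative sketch only, not Sauce Labs' actual implementation: the node names, the `pick_node` function, and the health-check callable are all hypothetical, and a real setup would typically rely on the database's own failover tooling rather than client-side code.

```python
from typing import Callable, Sequence


class NoHealthyNodeError(RuntimeError):
    """Raised when every candidate node fails its health check."""


def pick_node(nodes: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first node, in priority order, that passes the health check."""
    for node in nodes:
        try:
            if is_healthy(node):
                return node
        except Exception:
            # A health check that raises (for example, a timeout) is
            # treated the same as an unhealthy node.
            continue
    raise NoHealthyNodeError("no healthy database node available")


# Hypothetical node names; the primary is simulated as unresponsive.
nodes = ["db-primary", "db-secondary"]
chosen = pick_node(nodes, lambda name: name != "db-primary")
print(chosen)  # prints "db-secondary"
```

The point of the sketch is that an unresponsive node is skipped without manual intervention, which is the behavior the fix is intended to restore.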