Tuesday August 16th 2022, 10:07 - 16:04 UTC
A small percentage (< 1%) of users who tried to load the test results dashboard in the web UI received an error message, and the dashboard was unresponsive for them.
The API gateway that serves our user authentication services was experiencing intermittent issues that caused requests to those services to fail.
We redeployed the pods for the authentication services and eventually restarted the API gateway, which restored service.
We have added monitoring that will detect this specific error condition in the future. We are also expanding error rate alerting across our system, and our API gateway has been a focus of that effort.