Friday December 6th 2022, 14:53 - Saturday December 7th 17:30 UTC
We experienced intermittent connection timeouts between internal services in our US region. There was no perceived or reported customer impact, but these timeouts may have caused a slightly elevated error rate.
An internal service responsible for proxying requests between services was being CPU throttled, which caused intermittent latency and made some requests time out. The throttling occurred because the service was hitting the resource limits allocated to it.
We increased the resource limits for the service and saw requests return to normal.
Although our synthetic monitoring and alerting made us aware of this issue, we are adding better observability and alerting for this particular service at both the application and infrastructure levels. This will give us a more direct indication of where the underlying problem lies.
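
As an illustration of what an infrastructure-level signal for this kind of problem could look like, here is a minimal sketch in Python. It is not our actual tooling. It assumes the service runs in a Linux cgroup with a CPU limit, where the kernel's cpu.stat file exposes the nr_periods and nr_throttled counters. The function names and the 25% alert threshold are hypothetical choices made for the example.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the contents of a cgroup cpu.stat file into a dict of counters."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Each line is expected to be "<name> <integer value>"; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of CPU enforcement periods that were throttled between two samples."""
    periods = after.get("nr_periods", 0) - before.get("nr_periods", 0)
    throttled = after.get("nr_throttled", 0) - before.get("nr_throttled", 0)
    if periods <= 0:
        # No enforcement periods elapsed (or counters reset): nothing to report.
        return 0.0
    return throttled / periods


def should_alert(ratio: float, threshold: float = 0.25) -> bool:
    """Hypothetical alert rule: fire when the throttled fraction reaches the threshold."""
    return ratio >= threshold
```

In use, a collector would read cpu.stat twice, some seconds apart, pass both samples through parse_cpu_stat, and alert when should_alert(throttled_ratio(before, after)) is true. Comparing two samples matters because the counters are cumulative, so a single reading cannot show whether throttling is happening now.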