Thursday April 6th 2023, 21:52 - 23:44 UTC
The service responsible for starting tunnels and assigning jobs to tunnels failed due to increased memory usage. This affected all jobs in the US West data center, because each job needed to be checked for tunnel access.
There was a bottleneck in our tunnel services that caused increased memory usage under heavy load.
We modified the tunnel services to distribute requests better and to use less memory while handling those requests.
The initial condition that caused the increased memory usage on the tunnel services has been resolved. We will also improve our monitoring to alert earlier when memory usage is high.
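
To illustrate what alerting earlier on high memory usage can look like, here is a minimal sketch. It is not the monitoring used for the tunnel services. It assumes a two-level scheme in which a warning fires well before the critical level is reached; the names `MemoryThresholds` and `classify_memory_usage` and the 70% and 90% thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryThresholds:
    # Hypothetical values; real thresholds depend on the service.
    warn_fraction: float = 0.70
    critical_fraction: float = 0.90


def classify_memory_usage(
    used_bytes: int,
    limit_bytes: int,
    thresholds: MemoryThresholds = MemoryThresholds(),
) -> Optional[str]:
    """Return "critical", "warning", or None for one memory sample."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    fraction = used_bytes / limit_bytes
    # Check the more severe level first so it takes precedence.
    if fraction >= thresholds.critical_fraction:
        return "critical"
    if fraction >= thresholds.warn_fraction:
        return "warning"
    return None
```

The lower warning level is what provides the earlier signal: it gives time to react while the service is still healthy, instead of alerting only once memory is nearly exhausted.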