Tuesday 08/02/2022 07:34 - 09:07 UTC
Users attempting to access the TestFairy dashboard received an error. There was no data loss during this time, and application distribution and session data were unaffected.
A key data service that TestFairy relies on to quickly read session metrics ran out of memory and crashed. Once this service went down, the TestFairy dashboard could no longer query it for metrics, and dashboard requests timed out.
Once the incident was recognized, the service was restarted. It then resumed processing queued requests, and the dashboard was again able to query it for metrics.
We are reviewing the deployment and configuration of this data service to ensure that it is up to date and configured to restart automatically when it runs out of memory. We are also looking at ways the TestFairy dashboard can better handle the data service being unavailable should it happen again.
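As an illustration of what better handling on the dashboard side could look like, here is a minimal sketch of a metrics lookup that gives up after a short timeout and returns a fallback value instead of failing the whole page. It is not TestFairy's actual code: the function name `load_session_metrics`, the `fetch` callable, the 2-second default, and the `fallback` parameter are all assumptions made for the example.

```python
import concurrent.futures
import time

# Shared worker pool so a slow metrics call cannot block the caller
# for longer than the timeout. The pool size is an arbitrary choice.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def load_session_metrics(fetch, timeout_s=2.0, fallback=None):
    """Call fetch() but stop waiting after timeout_s seconds.

    `fetch` is a hypothetical callable that queries the metrics service.
    Returns a (metrics, degraded) tuple; degraded is True when the
    fallback value was used instead of a live result.
    """
    future = _POOL.submit(fetch)
    try:
        return future.result(timeout=timeout_s), False
    except concurrent.futures.TimeoutError:
        # The service did not answer in time; serve the fallback.
        return fallback, True
    except Exception:
        # The service was unreachable or returned an error.
        return fallback, True


def _hung_service():
    # Stand-in for a metrics service that is not responding.
    time.sleep(0.5)
    return {"sessions": 0}


if __name__ == "__main__":
    metrics, degraded = load_session_metrics(_hung_service, timeout_s=0.05)
    print(metrics, degraded)
```

With a pattern like this, the dashboard could render the rest of the page and show a "metrics temporarily unavailable" notice when `degraded` is true, rather than returning an error for the whole request. Note that the timed-out call keeps running in its worker thread, so a real implementation would also want a timeout on the underlying network request.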