Wednesday January 11th 2023, 16:41 - 17:09 UTC
Customers could not launch Appium or Live tests running on real devices for approximately thirty minutes.
During normal operation of our Kubernetes cluster, the pods running the service responsible for Live and Appium real-device tests began a graceful shutdown and restart. As part of the shutdown procedure, the embedded Jetty server disabled some application components. One of these was the WebSocket client, which meant that connections to real devices could not be established, and so no new Appium or Live tests could be started. After the graceful shutdown period of ~25 minutes, Kubernetes forcefully restarted the pods, and all functionality recovered. There appears to be an issue with how WebSocket connections are managed during the shutdown process, which leads to the shutdown ultimately timing out.
Kubernetes forced the shutdown process to complete and scheduled new pods, which resolved the issue.
We are looking into what can be done to manage WebSocket connections more effectively during the shutdown process.
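As an illustration only, one common pattern for this kind of problem is to bound the time spent draining open connections: stop accepting new ones, wait a short while for existing ones to finish, then force-close whatever remains so that shutdown completes well before the orchestrator's grace period expires. The sketch below is not the service's actual code. `ConnectionDrainer`, `FakeConnection`, and the timeout values are hypothetical names and numbers chosen for the example.

```python
import time


class FakeConnection:
    """Hypothetical stand-in for a WebSocket connection to a real device."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ConnectionDrainer:
    """Tracks open connections and closes them within a bounded window."""

    def __init__(self, drain_timeout_s):
        self.drain_timeout_s = drain_timeout_s
        self.accepting = True
        self._open = set()

    def register(self, conn):
        # Refuse new connections once shutdown has started.
        if not self.accepting:
            raise RuntimeError("shutting down: not accepting new connections")
        self._open.add(conn)

    def release(self, conn):
        # Called when a connection finishes on its own.
        self._open.discard(conn)

    def shutdown(self, poll_interval_s=0.1):
        """Stop accepting, wait up to drain_timeout_s, then force-close leftovers.

        Returns the number of connections that had to be force-closed.
        """
        self.accepting = False
        deadline = time.monotonic() + self.drain_timeout_s
        while self._open and time.monotonic() < deadline:
            time.sleep(poll_interval_s)
        leftovers = list(self._open)
        for conn in leftovers:
            conn.close()
            self._open.discard(conn)
        return len(leftovers)


if __name__ == "__main__":
    drainer = ConnectionDrainer(drain_timeout_s=0.5)
    conn = FakeConnection()
    drainer.register(conn)
    forced = drainer.shutdown()
    print(f"force-closed {forced} connection(s); closed={conn.closed}")
```

In a design like this, the drain timeout would be set comfortably below the pod's termination grace period, so the process exits on its own rather than waiting to be killed.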