Date: April 18, 2017
Time: 10:30am PDT - 2:47pm PDT
Tests could not be run and Sauce Connect tunnels could not be created.
Why it Happened:
We deployed a change to improve VM boot performance that triggered a problem creating Mac/iOS VMs and Sauce Connect Proxy tunnels on some hosts. The failures resulted in a cycle that subsequently prevented creation of VMs of all OS types.
What we did to fix it:
We reverted the change and patched a core cloud component to stop the cascading failure cycle.
What we are doing to prevent this from happening again:
- Improvements to predictive monitoring.
- Improvements to cloud deploy strategy to allow for more fine grained control, resilience and safer rollbacks.
- Significant investments into performance and reliability of cloud management services