Date: March 8, 2018
Time: 3:37 PM - 5:18 PM PST
Wait times for VMs were high and there were periods of tests not starting. This impacted all VM clouds.
Why did it happen:
A service responsible for delivering payloads to our VMs started experiencing issues after a deploy due to a preexisting bug.
What did we do to fix it:
We reverted our payload delivery system to a proven legacy service, which then allowed us to continue booting VM’s normally.
What we are doing to prevent this from happening again:
We’ve fixed the aforementioned bug in the code and are working on deeper refactoring effort, as well as expanded load and functional testing, to eliminate the possibility of this issue reoccurring.