Thursday May 26th 2022, 17:37 - 18:18 UTC
In our US-West and US-East data centers, the operation of automated Virtual Device and Real Device tests was impacted after a third party migrated part of our cloud infrastructure to a new platform service.
After investigation, Sauce Labs identified that the third party had rate limited our traffic to 100Mbps on our interconnects after the migration. This caused communication between our Cloud infrastructure provider and on-premise nodes to intermittently fail due to bandwidth contention problems. Normally traffic over a single interconnect is ~1Gbps utilization.
The incident was resolved when the third party reverted their migration changes of our components inside our Cloud infrastructure.
We asked the third party to migrate some interconnects for us to a new platform so that we could leverage BFD, which would give us faster failure detection times, which would improve the reliability of hybrid communications.