Friday February 27th 2023, 03:00 - 14:32 UTC
A failure on the primary node of our Europe-based IPsec cluster caused customer tests that use IPsec tunnels to fail in that region. Throughout the incident, tests with a destination inside a customer network were intermittently blackholed.
Our IPsec tunnel health checks failed because BGP was down towards the primary node of the IPsec firewall cluster. As a result, tests attempting to reach customer networks encountered network errors on the IPsec cluster.
The secondary node in the IPsec cluster was first rebooted to confirm that it had no failures of its own. After verifying that services would remain stable, we initiated a failover from the primary to the secondary node, which restored BGP on the cluster.
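The ordering of the recovery steps above (confirm the secondary is sound, then fail over) can be sketched as a simple gate. This is only an illustrative sketch, not our actual tooling: the `NodeStatus` type, its fields, the node names, and the `should_fail_over` function are all hypothetical, and the only outside fact assumed is that a working BGP session reports the state "Established".

```python
from dataclasses import dataclass


# Hypothetical model of one cluster node; all field names are illustrative.
@dataclass(frozen=True)
class NodeStatus:
    name: str
    bgp_state: str         # BGP session state name, e.g. "Established" or "Idle"
    services_stable: bool  # outcome of the post-reboot verification


def bgp_up(node: NodeStatus) -> bool:
    """A BGP session is only usable once it reaches the Established state."""
    return node.bgp_state == "Established"


def should_fail_over(primary: NodeStatus, secondary: NodeStatus) -> bool:
    """Fail over only when the primary has lost BGP and the secondary
    has already been verified as stable."""
    return (not bgp_up(primary)) and secondary.services_stable


# Example values loosely mirroring the incident; they are not real telemetry.
primary = NodeStatus(name="eu-primary", bgp_state="Idle", services_stable=True)
secondary = NodeStatus(name="eu-secondary", bgp_state="Idle", services_stable=True)

if __name__ == "__main__":
    print(should_fail_over(primary, secondary))
```

The gate deliberately does not require BGP to be up on the secondary beforehand, since in this incident BGP was only restored on the cluster after the failover completed.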
A change scheduled for the March 25 EU maintenance window will upgrade the IPsec firewall in our Europe region, as recommended by the vendor. We performed the same upgrade on our US region’s IPsec firewall during the US maintenance window on March 18.