Date: August 31, 2018
Time: 2:41 am - 4:20 am PDT
*Tests did not start on our Virtual platforms. The Web Application and REST API were unavailable.
*Why it happened:
*Our primary database server had a kernel-level issue that made the database unavailable. This caused many of our services to fail. The failover to our secondary database server began but did not complete quickly enough to prevent a significant impact on our services.
*How we fixed it:
*We completed the failover to the secondary database server manually.
*What we are doing to prevent it from happening again:
*We have added logging to the servers so that we can capture more detailed information if the issue recurs. Additionally, we’re implementing an improved failover system that will allow us to shift seamlessly from unhealthy to healthy database nodes in the event of an issue. Finally, we’ll continue to work with our support vendor to harden our DB implementation and attempt to determine a root cause for this specific issue.