I recall that back in December 2017, I was five months into my new job at ICF Next and had just fully taken over a major Azure customer that had a critical web application hosted on Azure Classic.
We had started hearing about Meltdown and Spectre earlier in the week, and I also recall reading a Microsoft post saying that they would schedule a maintenance window by the end of the month. Then suddenly, on Friday, the same day our Azure customer had a major event planned for the evening, with up to 1.5 million visitors expected on the site within 2 hours, Microsoft sent a message: since the vulnerability had been made public, they were starting the patching process right away, and we should not see any application downtime as long as the environment was properly architected (highly available, availability sets and all). I let our customer know and told them not to worry: this was a well-architected, highly available environment, and the site would not go down while some servers got patched and rebooted. Little did I know.
The peak of our traffic was expected at 9:00pm EST, and at exactly 8:00pm I started seeing VMs rebooting within our Resource Groups. I was feeling a bit smug watching the VMs and even the WAFs reboot while the site kept handling traffic. Then, at exactly 8:40pm EST, when we had around 24,000 active users on the site, our IaaS SQL Server VM went down for a reboot. What I did not know at that point was that it was a single point of failure for the environment, and BOOM, down went the whole website with an ugly IIS message. It turned out our SQL Server configuration was not active/active but active/passive. It was an awkward 30 to 45 minutes explaining to my customer why this was happening and that we couldn’t really do anything until the database came back up.
That day taught me lessons that I will hold on to for the rest of my life.
At Ignite, Microsoft had a great session about how they handled all of this on their end. If you do not have time to watch all four parts of the video, you 100% need to watch the third one. It clearly shows Microsoft’s commitment to its customers.
Spectre // Meltdown: A retrospective, 1/4: https://streamable.com/r3frc
Spectre // Meltdown: A retrospective, 2/4: https://streamable.com/zncrl
Spectre // Meltdown: A retrospective, 3/4: https://streamable.com/nfbpl
Spectre // Meltdown: A retrospective, 4/4: https://streamable.com/2kl6h
The links were found on this Microsoft post: https://techcommunity.microsoft.com/t5/Microsoft-Ignite-Content-2019/Spectre-Meltdown-An-Azure-retrospective/m-p/946375