This post is going to be riddled with fail, most of it mine. The short version is, my site has been down for over two weeks and it took me a week and a half to even notice. That, of course, describes fail number one and fail number two, but there’s plenty more fail where that came from!
It all began when I started shifting around some of my own accounts in order to free up some IP addresses. Some of the domains had expired, some did not need their own IP, and then there was this site, echoreply.us, which had its own IP for reasons I could not figure out. So I moved it to the main system IP, went to bed and forgot all about it.
What I completely forgot was that I had a 4 1/2-year-old trial SSL certificate installed, and the stupid dolt that lives on my server (we’ll just say its name rhymes with zeb ghost scavenger) happily let me move the site to the main shared IP, despite the server name vhost also using it. That’s right: anyone who visited ‘/’ on this site for nearly two weeks got redirected to the Apache success page.
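For what it's worth, a failure like this is easy to miss because the server still answers happily; it just answers with the wrong page. Here is a minimal sketch of a front-page check, assuming you know some string that only the real site would contain. The names `fetch_body`, `front_page_ok` and the marker text are hypothetical, made up for illustration.

```python
from urllib.request import urlopen


def fetch_body(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def front_page_ok(body: str, expected_marker: str) -> bool:
    """Return True if the page contains text that only the real site has.

    expected_marker is an assumption: pick something from your own
    template (a site title, a footer line) that a default server page
    would never contain.
    """
    return expected_marker in body
```

Run from cron, something like `front_page_ok(fetch_body("http://echoreply.us/"), "my site title")` returning False would be the cue to go look, where the marker is whatever string you chose.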
To fix it, I just removed the account and was ready to restore from a day-old backup, when I realized... oh crap, the backups are corrupt. I think we’re somewhere near fail number 11 at this point; I completely lost count.
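The lesson about backups is the usual one: a backup nobody has tried to read is only a rumor. Below is a rough sketch of a readability check, assuming the backups are tar archives (optionally gzip, bzip2 or xz compressed), which may not match how your backups are actually stored. The function name `backup_is_readable` is hypothetical.

```python
import tarfile
import zlib


def backup_is_readable(path: str) -> bool:
    """Return True if every regular file in the archive reads end to end."""
    try:
        # "r:*" lets tarfile detect the compression on its own.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                extracted = archive.extractfile(member)
                # Read in 1 MiB chunks so large files don't fill memory.
                while extracted.read(1024 * 1024):
                    pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Missing, truncated or damaged archives all land here.
        return False
```

This only proves the archive can be read, not that its contents are the right ones, so an occasional test restore is still worth doing.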
Thankfully, I store most parts of the site under version control. I was able to retrieve it from my build bot installation and restore the database from a week-old copy that I received via e-mail.