Tony's ramblings on Open Source Software, Life and Photography

Disaster Action Plan

Tonight was a good test of our disaster action plan. Lightning struck either our building or our water tower and took down two of our three phases of power. The one remaining phase was producing power so dirty that I was afraid it would fry anything plugged into it.

For our network, in an extended power outage I really only have one server that MUST stay up and running: the server that our customers use.

For our two racks I have four large rack-mounted UPS units. On average I get around 45 minutes of power out of them before we have to switch to the generator. That gives me enough time to get to the office, flip a switch and fire up the generator.

This time, however, we found that we could get clean power out of one of the subpanels, so I decided to run an extension cord to power just the server I needed, the network switch, the router and the firewall. The firewall and switch pull very little power, and the server I need to run is a 1U box with only two hard drives, so it isn't pulling a heavy load either.

So I set about shutting down the rest of the network until the main power was back up. That's when I learned that, for all my planning, my server could not be an island.

The first thing I noticed was that e-mail couldn't get out. So I thought, "No big deal, I'll go ahead and fire up the internal e-mail server." I hadn't realized that the customer server was configured to relay all of its e-mail through our internal mail server.

Next, I noticed that e-mail still wasn't getting out. That's when I realized that all of our DNS was handled by two other servers, neither of which was online. So I had to bring up one of the DNS servers. Really, I should have brought one of them online anyway, because one of the DNS servers also runs our phone system.

Then I found that my paging e-mails still weren't getting out. For some reason, at that exact moment, Sprint decided that I wasn't allowed to talk to the US Cellular DNS server to find out where their mail server was. Oddly, the lookup still worked over my Verizon line and from my BlackBerry. So I used my Verizon line to look up the DNS entry I needed and temporarily set up a master zone on our DNS server so the e-mails could be delivered. It worked, but it's an ugly hack.

Net result: there's a new disaster action plan brewing in my head that involves making the one server that must stay up an island unto itself, just in case. In a worst-case scenario I want to be able to pick that server up, drive it to our other office and plug it in without having to change anything other than its IP address.

In our effort to build a well-planned network, we forgot to put every service it requires on that one server so it wouldn't depend on everything else.

We had very little downtime (maybe 15 minutes' worth), but it was still a great learning experience.