I stopped relying on RAID-5 long ago, after two separate installs in which the controller card corrupted more than one disk in the array at the same time, causing total data loss.
When you're dealing with more than a terabyte of data, restoring from any sort of backup medium becomes a painful process. All of my data is backed up on DVD (yeah yeah, I've heard the complaints before, but you don't know what I know about DVD backups) but restoring a terabyte of data from DVDs can take a week or so.
Enter DRBD. DRBD stands for Distributed Replicated Block Device. Essentially it's RAID-1 that works over Ethernet. DRBD rides on top of whatever physical storage medium and network you have, but below the file system level. You run it on multiple machines, and set up an identical hard drive configuration on each machine. The DRBD partition is automatically replicated from the primary server to the secondary. Using tools like "heartbeat" you can even monitor this system automatically and promote the secondary server to primary in the event of a failure.
Personally, I'm more interested in not having to restore from backup than I am instant failover, so I elected to configure my system without heartbeat, mostly because it's easier. I'm using two 1U servers each configured with four 500 GB hard drives. One of the two network ports in each server is on a separate subnet and uses a crossover cable to connect to the other server for replication. Each server has dual hotswap power supplies, each supply fed from a different battery backup UPS.
Installing and configuring drbd really isn't that hard. I found a nice tutorial describing how to install it in Ubuntu Hardy. There are a couple of other things you should know, however.
Realize, before taking my advice, that I'm a complete noob, and these are more notes of how I configured my own servers than any description of best practices. Most of the documentation and tutorials I found made assumptions that didn't fit with my particular needs.
First, when I installed my Ubuntu servers, I gave each one three 500 GB hard drives dedicated to be used with drbd. I configured those three to be a software RAID-0 to provide more storage space and more speed. The fourth hard drive in the server is where the OS is installed.
I learned that you don't want to automatically mount the drbd partition at boot, nor do you want the OS to check the filesystem automatically because it will try to do so before the partition is available. Make sure your fstab entry looks something like this:
/dev/drbd0 /media/mybigpartition ext3 relatime,noauto 0 0
The noauto keeps the OS from mounting it, and the last zero tells it not to do a filesystem check.
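If you want to double-check an entry before rebooting, here is a minimal sketch of a checker. The function name check_fstab_line is made up for illustration; it only looks at the options field and the sixth (fsck pass) field, and treats a missing sixth field as 0, which is how fstab defaults it.

```shell
#!/bin/sh
# check_fstab_line (hypothetical helper): reads one fstab line on stdin and
# prints "ok" if the options include noauto and the fsck pass field is 0,
# otherwise prints "unsafe".
check_fstab_line() {
    awk '{
        n = split($4, opts, ",")
        noauto = 0
        for (i = 1; i <= n; i++) if (opts[i] == "noauto") noauto = 1
        # A missing sixth field defaults to 0 (no filesystem check).
        pass = (NF >= 6) ? $6 : "0"
        if (noauto && pass == "0") print "ok"
        else print "unsafe"
    }'
}

# Example against the live file:
# grep '^/dev/drbd0[[:space:]]' /etc/fstab | check_fstab_line
```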
Then on the primary server I edited my /etc/rc.local file to add the following:
drbdadm primary all
mount /media/mybigpartition
That way after all is said and done, drbd is told that this server is always primary (works for my environment where the second server is JUST a backup mirror - may not be appropriate for you) and then mounts the device in the filesystem.
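A slightly more defensive version of those two steps might look like the sketch below. It isn't what I actually run; the function name promote_and_mount is invented for the example, and the only thing it adds is that the mount is skipped if the promotion fails.

```shell
#!/bin/sh
# Sketch only: same two steps as above, but the mount is skipped when
# drbdadm reports an error. promote_and_mount is a made-up name.
MOUNTPOINT=/media/mybigpartition

promote_and_mount() {
    # Promote every configured DRBD resource on this host to primary.
    if ! drbdadm primary all; then
        echo "drbd promotion failed; not mounting $MOUNTPOINT" >&2
        return 1
    fi
    # Mount via the noauto entry in /etc/fstab.
    mount "$MOUNTPOINT"
}

# In /etc/rc.local you would then call:
# promote_and_mount || echo "DRBD volume not mounted" >&2
```

The same function would also work by hand on the secondary if the primary ever dies.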
The secondary server never actually mounts the device. Normally you'd use heartbeat to switch back and forth, but I didn't want that happening for me, not to mention it adds a layer of complexity.
Now, if one of my drives fails, the whole RAID-0 on that machine is gone (striping has no redundancy), so I replace the drive and then rebuild the RAID-0 on the dead machine:
mdadm --stop /dev/md0
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1 --auto=md
After creating the RAID array again, I then issue the following on the server that still has valid data, causing it to use drbd to replicate the data to the other server:
drbdadm -- --overwrite-data-of-peer primary all
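While the resync runs you can watch it in /proc/drbd. The sketch below pulls out just the connection state and disk states for each device. It assumes the DRBD 8.x layout of that file, where each device line starts with a minor number and carries cs: and ds: fields; the function name drbd_state is made up, and other versions may format things differently.

```shell
#!/bin/sh
# drbd_state (hypothetical helper): reads /proc/drbd-style text on stdin and
# prints "<minor> <connection state> <local/peer disk state>" per device.
# Assumes DRBD 8.x formatting of the device lines.
drbd_state() {
    awk '$1 ~ /^[0-9]+:$/ {
        cs = ""; ds = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^cs:/) cs = substr($i, 4)
            if ($i ~ /^ds:/) ds = substr($i, 4)
        }
        sub(/:$/, "", $1)
        print $1, cs, ds
    }'
}

# Example on a live system:
# drbd_state < /proc/drbd
```

Once the disk state reads UpToDate on both sides, the mirror is back in sync.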
And there you have it. With the above config setup, my primary server will always start even if the secondary isn't available. If the primary dies, I will need to issue the drbdadm primary all command on the secondary server and mount the partition there in order to access the data. This means it's not an instantaneous failover, and requires manual intervention.
