20120616

Resurrection of the RAID

Have you had a day like this?

Hmm... My RAID was working the last time I rebooted.  Why can I not assemble it now?  Only two devices available?  Nonsense...there should be five.  What?  What's this?!  The others are SPARES?!!  No they're not...they're part of the RAID!  Awwww F*** F*** F*** F*** F*** F***....


Thank the heavens for these critical links, which I repeat here for posterity:
http://maillists.uci.edu/mailman/public/uci-linux/2007-December/002225.html
http://www.storageforum.net/forum/archive/index.php/t-5890.html (saves more lives!)
http://neil.brown.name/blog/20120615073245 !!! important read !!!
As you can probably surmise, one of my arrays decided to get a little fancy on me this week.  It all started when I was trying to reallocate some drives to better utilize their storage.  I had moved all my data onto a makeshift array composed of three internal SATA and two USB-connected hard drives.  I know, a glutton for punishment.  The irony of that is the USB drives were the only two with accurate metadata, as will be discussed below.

I really don't know what happened, but somewhere between moving ALL my data onto that makeshift array and booting into my new array, the metadata of the three internal SATA drives got borked....badly.  I discovered this while attempting to load up my makeshift array for data migration.  Upon examining the superblocks of all the drives (mdadm -E), I discovered that the three internals received a metadata change that turned them into nameless spares.  The two USB drives escaped destruction, so thankfully I had some info handy to verify how the array needed to be rebuilt.
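For anyone repeating the superblock comparison, here is a rough sketch, not the exact commands from that day. It assumes version-1.x metadata, where mdadm -E prints fields such as "Array UUID", "Raid Devices", "Events" and "Device Role". The function names superblock_summary and examine_all are made up for illustration.

```shell
# Reduce `mdadm -E` output (read from stdin) to the fields worth comparing
# across members. Field names assume version-1.x metadata output.
superblock_summary() {
    grep -E '^[[:space:]]*(Array UUID|Name|Raid Level|Raid Devices|Chunk Size|Data Offset|Events|Device Role)[[:space:]]*:'
}

# Print a summary block for every device named on the command line.
# Needs root, since mdadm reads the raw devices.
examine_all() {
    for dev in "$@"; do
        echo "== $dev"
        mdadm -E "$dev" | superblock_summary
    done
}
```

Running examine_all /dev/sdf1 /dev/sdg2 /dev/sdi2 /dev/sdj1 /dev/sdk1 puts the roles side by side, so a member that reports itself as a spare stands out next to the ones still claiming an active slot.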

After grieving a while, and pondering a while longer, I did a search for something along the lines of "resurrect mdadm raid" and eventually came up with the first two links.  The third was an attempt to find the author of mdadm, just to see if he had anything interesting on his blog.  Turns out he did!  Now, I honestly don't know if I was bit by the bug he mentions in that post, but whatever happened did its job quite well.

So, I started a Google Doc called "raid hell" and started recording important details, like which drives and partitions were in use by that array and what info I was able to glean from mdadm -E, and eventually I felt almost brave enough to try a reassembly.  But before I got the balls for that, I imaged the three drives with dd, piped the output through gzip, and dumped the images onto the new array that thankfully had just enough space to hold them.  Now, if something bad happened, I had at least one life in reserve.
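The imaging step can be sketched roughly like this. The function names and the destination path in the usage note are placeholders, not what was actually typed, and bs=1M assumes GNU dd.

```shell
# Image one source (block device or plain file) into a gzip-compressed file.
image_drive() {
    src=$1
    dest=$2
    # Never clobber an existing image: it may be the only good copy.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    # The pipeline reports gzip's exit status, so a dd read error can go
    # unnoticed here; verify_image below is the real safety check.
    dd if="$src" bs=1M 2>/dev/null | gzip -c > "$dest"
}

# Succeed only if the decompressed image is byte-identical to the source.
verify_image() {
    gzip -dc "$2" | cmp -s - "$1"
}
```

Something like image_drive /dev/sdf1 /mnt/newarray/sdf1.img.gz followed by verify_image /dev/sdf1 /mnt/newarray/sdf1.img.gz, repeated per drive, gives a fallback that has been checked rather than merely hoped for.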

The next step was now to attempt an array recreation with the --assume-clean option: this option would eliminate the usual sync-up that takes place on array creation, thereby preventing destructive writing from taking place while experimentation was happening.

Oh, did I mention it was a RAID-5?

Of course, with most of the devices suffering amnesia, the challenge now was to figure out what order the drives were originally in.  It would have been as easy as letter-order, maybe, if I had not grown the array two separate times during the course of the previous run of data transfers.  So, to make life a little easier, I wrote a shell-script.  Device names like /dev/sdf1 and /dev/sdi2 were long and hard and ugly, so I replaced them with variables F, G, I, J, and K (the letters of the drives for my array).  It went something like this:

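# Short names for the array members (drive letters F, G, I, J, K)
F="/dev/sdf1"
G="/dev/sdg2"
I="/dev/sdi2"
J="/dev/sdj1"
K="/dev/sdk1"
# Stop the array before recreating it
mdadm -S /dev/md4
# Recreate with the candidate order; --assume-clean skips the initial sync
mdadm --create /dev/md4 --assume-clean -l 5 -n 5 --metadata=1.2 "$F" "$G" "$I" "$J" "$K"
# Dump the start of the array to look for the LVM header
dd if=/dev/md4 bs=256 count=1 | xxd
# See whether LVM recognizes a physical volume
pvscan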
With this, I could easily copy/paste and comment out mdadm --create lines that had the incorrect drive order, and keep track of what permutations I had attempted.  Now the array was originally part of LVM, so I was looking for some LVM header funk with the dd command.  I only knew it was there because I had been running xxd on the array previously, and seen it go whizzing by.  pvscan would be my second litmus test, and should the right first drive appear, the physical volume and subsequently the volume group and logical volumes would all become recognized.  With this, it took only five runs to get K as the first drive.  I now had four drives left to reorder.
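The eyeball test with xxd can also be automated. The sketch below rests on an assumption that is not from the original notes: that LVM2 marks a physical volume with the ASCII string "LABELONE" in one of the first four 512-byte sectors, normally the second. For that reason it reads 2048 bytes instead of 256. The name has_lvm_label is invented here.

```shell
# Succeed if an LVM2 label string shows up in the first four sectors.
# Assumption: the label sector begins with the ASCII text "LABELONE".
has_lvm_label() {
    # -a makes grep treat the binary data as text; -q keeps it silent.
    dd if="$1" bs=512 count=4 2>/dev/null | grep -qa 'LABELONE'
}
```

After each mdadm --create attempt, has_lvm_label /dev/md4 && pvscan would run the slower LVM scan only when the first drive looks plausible.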

To accomplish this, dd and pvscan would no longer help - they did their work on the first drive only.  Since I had two easily-accessible LVs on that array - root and home - it would be easy to run fsck -fn to determine if the file system was actually readable.  This assumed, of course, that all the data on the array was in good shape.  I honestly had no reason to believe otherwise, and Neil Brown's post gave me a great deal of hope in that regard.  Basically, if only the metadata was getting changed, the rest of the array should be A-OK.  I knew it was not being written to at the time of the last good shutdown...because the array was basically a transfer target and not even supposed to be operating live systems.
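With K pinned as the first member, the other four drives can be arranged in 4! = 24 ways. Rather than tracking those by hand, the candidate lines can be generated up front. This is only a sketch: permute and candidate_lines are invented names, and the output is deliberately commented out so nothing runs until a line is uncommented on purpose.

```shell
# permute PREFIX ITEM... : print PREFIX followed by every ordering of ITEMs.
# The body is a subshell (parentheses, not braces), so recursive calls
# cannot clobber each other's variables.
permute() (
    prefix=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "$prefix"
        exit 0
    fi
    i=1
    for pick in "$@"; do
        # Collect every item except the one at position i.
        rest=
        j=1
        for item in "$@"; do
            if [ "$j" -ne "$i" ]; then
                rest="$rest $item"
            fi
            j=$((j + 1))
        done
        # Unquoted $rest is intentional: device paths contain no spaces.
        permute "$prefix $pick" $rest
        i=$((i + 1))
    done
)

# Turn each ordering into a commented-out mdadm --create line.
candidate_lines() {
    permute "$@" |
        sed 's|^|#mdadm --create /dev/md4 --assume-clean -l 5 -n 5 --metadata=1.2 |'
}
```

candidate_lines /dev/sdk1 /dev/sdf1 /dev/sdg2 /dev/sdi2 /dev/sdj1 prints 24 lines, each starting with /dev/sdk1, ready to paste into the script and try one at a time with fsck -fn as the judge.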

It took another half-dozen or so attempts before I managed to get the latter four drives in the correct order.  Finally, fsck returned no errors and a very clean pair of file systems.  Of course, this was done with -n, so nothing was actually written to the array.  I kicked off a RAID-check with an

echo check > /sys/block/md4/md/sync_action
and then the power went out.  After rebooting, and reassembling the RAID (this time without having to recreate it, since the metadata was now correct on all the drives), I re-ran the check.  It completed about three and a half hours later.  Nothing was reported amiss, so finally it was time: I performed some final fsck's without the -n to ensure everything was ultra-clean, and started mounting file systems.
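One way to confirm that "nothing amiss" is to read the counters md exposes next to sync_action. A small sketch, assuming the usual sysfs layout in which mismatch_cnt sits in the same md directory; check_status is an invented name.

```shell
# Report the outcome of an md check run.
# $1 is the array's sysfs md directory, e.g. /sys/block/md4/md.
# Returns 0 if idle with no mismatches, 1 if mismatches were counted,
# 2 if a check or other sync action is still in progress.
check_status() {
    md=$1
    action=$(cat "$md/sync_action")
    if [ "$action" != "idle" ]; then
        echo "still running: $action"
        return 2
    fi
    mismatches=$(cat "$md/mismatch_cnt")
    echo "mismatch_cnt: $mismatches"
    [ "$mismatches" -eq 0 ]
}
```

Calling check_status /sys/block/md4/md once the check has finished gives a yes-or-no answer before any fsck is allowed to write.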

SUCCESS!!!

Since root and home both weigh in around 30-60G each, it was easy to believe they touched every device in the array.  If something else had been out of order, I should have seen it (let's hope I'm right!!).  Now, with the volumes unmounted, I am migrating all the data off the makeshift array...after all, it has a habit of not actually assembling on boot, probably because of the two USB devices.

It Lives Again.
