20120724

Reinventing the...cog?

WANTED: Error Detecting and Correcting Drive Surface

I am looking for a strong solution, and I fear I may have to implement it myself.  I want to boost the error-detection capability of typical hard drives.  Someone on a thread asked about creating a block device to do this for any given underlying device, like what LUKS does.  I want to at least detect, and _maybe_ (big maybe) correct, errors during reads from a drive or set of drives.  After having some drives corrupt a vast amount of data, I'm looking to subdue that threat for a very, very, very long time.

Error Detection


To avoid reinventing RAID - which, by the way, doesn't seem to care if the drive is spitting out bad data on reads, and from what I read that's evidently _correct RAID operation_ - I would propose writing a block driver that "resurfaces" the disk with advanced error detection/correction.  So, for detection, suppose that each 4k sector had an associated SHA-512 hash we could verify.  We'd want to store these hashes either with the sector or away from it; in the latter case, consolidating hashes into dedicated hash-sectors might be handy.  The block driver would transparently verify sectors on reads and rewrite hashes on writes - all for the cost of around 1.5625% of your disk (a 64-byte digest for every 4096-byte sector).  Where this meets RAID is that the driver would simply fail any read of a sector that didn't match its hash, forcing RAID to reconstruct the sector from its hopefully-good stash of other disks...where redundant RAID is used, anyway.
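
To make the detection half concrete, here's a minimal user-space sketch (plain Python rather than kernel code, and the names are mine) of what the driver would do per sector - the 64-byte SHA-512 digest per 4k sector is also where the 1.5625% figure comes from:

    import hashlib

    SECTOR_SIZE = 4096   # bytes per data sector
    HASH_SIZE = 64       # SHA-512 digest length in bytes

    # 64 / 4096 = 0.015625 -- the ~1.5625% overhead quoted above.
    OVERHEAD = HASH_SIZE / SECTOR_SIZE

    def hash_sector(data):
        """Digest a single 4k sector on write."""
        assert len(data) == SECTOR_SIZE
        return hashlib.sha512(data).digest()

    def verify_sector(data, stored_digest):
        """On read, check the sector against its stored digest.  A mismatch
        would be reported upward as a failed read so that RAID (or whatever
        sits above) can reconstruct the sector from its other disks."""
        return hash_sector(data) == stored_digest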

Error Correction


Error correction using LDPC codes would be even more awesome, and I found some work from 2006 that is basically a general-use LDPC codec, LGPL'd.  Perhaps I'd need to write my own, to avoid any issues with eventual inclusion in the kernel.  The codes would be stored in a manner similar to the hashes, though probably conglomerated into chunks on the disk and stored with their own error-detection hash.
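
A real LDPC codec is far too involved to sketch here, so as a stand-in for the "code chunks stored with their own error-detection hash" idea, here's a toy version using plain XOR parity (emphatically not LDPC - it can only rebuild one sector per block, and only when the per-sector hashes have already fingered the bad one); the helper names are made up:

    import hashlib

    SECTOR_SIZE = 4096

    def xor_parity(sectors):
        """XOR all sectors in a block together.  Deliberately NOT LDPC --
        just a stand-in strong enough to rebuild one sector per block."""
        out = bytearray(SECTOR_SIZE)
        for s in sectors:
            for i, b in enumerate(s):
                out[i] ^= b
        return bytes(out)

    def build_code_chunk(sectors):
        """Pack the parity with its own SHA-512 digest, so a damaged code
        chunk is detected before anyone trusts it for correction."""
        parity = xor_parity(sectors)
        return parity + hashlib.sha512(parity).digest()

    def recover_sector(good_sectors, chunk):
        """Rebuild the one missing sector from the surviving sectors and
        the stored parity; returns None if the code chunk itself is bad."""
        parity, digest = chunk[:-64], chunk[-64:]
        if hashlib.sha512(parity).digest() != digest:
            return None
        return xor_parity(good_sectors + [parity])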

Questions, Questions, Questions


Anyway, lots of questions/thoughts arise from this concept:
  • How do we make this work with bootloaders?  Perhaps it would work regardless of the bootloader if only used on partitions.
  • It's an intermediary between the drive and whatever system will actually use it, so it HAS to be in place and working before an end user gets hold of the media.
    • In other words, suppose we did use this between mdadm and the physical drive - how do we prevent mdadm from trying to assemble the raw drives instead of the protected drives (via the intermediary block device)?  If it assembles the raw drives and does a sync or rebuild, it could wipe out the detection/correction codes, or (worse) obliterate valuable file system contents.
  • Where would be the best place for LDPC/hash codes?  Before a block of sectors, or after it?
  • How sparsely should LDPC/hash codes be distributed across the disk surface?
  • Is it better to inline hash codes with sectors and cause two sector reads where one would ordinarily suffice, or push hash codes into their own sector and still do two sector reads, but a little more cleanly?  (There's a layout sketch after this list.)
    • The difference is that in the former, a logical sector of data would most likely end up straddling two physical sectors - sounds rather ugly.
    • The latter keeps data sectors aligned to physical sectors, possibly allowing the drive to operate a little more efficiently than in the former case.
  • How much space is required for LDPC to correct a sector out of a block of, say, 64 4k sectors?  How strong can we make this, and how quickly does that strength eat into the storage savings?
  • If a sector starts getting a history of being bad, do we relocate it, or let the system above us handle that?
  • How best do we "resurface" (format) the underlying block device?  I would imagine writing out code sectors that either indicate nothing has been written there yet, or are generated from whatever content currently exists on disk.  A mode for just accepting what's there (--assume-clean, for example) should probably be available too, but then do we seed the code sectors with magic values and check that everything is how it SHOULD be before participating in system reads/writes?
  • Do we write out the code sector before the real sector?  How do we detect faulty code sectors?  What happens if we're writing a code sector when the power dies?
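
A handy coincidence for the dedicated-hash-sector layout: 64 SHA-512 digests fill a 4k sector exactly, so groups of 64 data sectors plus 1 hash sector fall out naturally (1 overhead sector in every 65).  A hypothetical logical-to-physical mapping under that layout might look like:

    SECTOR_SIZE = 4096
    HASH_SIZE = 64
    HASHES_PER_SECTOR = SECTOR_SIZE // HASH_SIZE   # 64 digests fit exactly

    GROUP_DATA = HASHES_PER_SECTOR                 # 64 data sectors per group
    GROUP_TOTAL = GROUP_DATA + 1                   # plus 1 hash sector = 65

    def logical_to_physical(lsn):
        """Map an exposed (logical) sector number to a physical sector on
        the underlying device, skipping over the interleaved hash sectors."""
        group, offset = divmod(lsn, GROUP_DATA)
        return group * GROUP_TOTAL + offset

    def hash_location(lsn):
        """Physical sector holding this sector's digest, plus the byte
        offset of the digest within that hash sector."""
        group, offset = divmod(lsn, GROUP_DATA)
        return group * GROUP_TOTAL + GROUP_DATA, offset * HASH_SIZE
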
I guess this really boils down to the following: RAID doesn't verify data integrity on read, and that is bugging me.  Knowing it would be a major performance hit to touch every drive in the array for every read, I can understand why it's that way.  If we could do the job underneath RAID, however, maybe it wouldn't be so bad?  Most file systems also don't seem to know or care when bad data shows up at the party, and writing one that does (like the folks working on btrfs) is clearly no easy task.

And I guess the ultimate question is this: has anyone done this already?  Is it already in the kernel and I just haven't done the right Google search yet?

Please say yes.
