That which wasn't virus-laden needed to be captured. I must admit, I really enjoy how seamlessly iSCSI can make drive space available. As I snooped around my office for a spare drive and a USB adapter, the thought of copying gigs of critical data off a user's machine and onto an aging piece of storage just didn't seem appealing. So I logged into my HA iSCSI cluster, created a new resource, and mounted it from the user's machine (via a secure, CD-booted operating system... read: KNOPPIX). Moments later... well, about an hour later... the data was secure and the machine was ready to nuke.
But there were a few takeaways from this. First, it was a royal pain in the balls to get my new resource up. It's not that it was mysterious or difficult, but that there were six or seven different components that had to be instantiated and configured (copy-paste-change) just to get the target online. I would have dearly loved to say "create a target for me with this much space, on this IP, named this" and have all the appropriate cluster configuration done for me. This isn't so much a complaint as a wish: I have no issue with the cluster configuration as it stands, and I see it as powerful and flexible. But automation has to show a positive ROI, and manufacturing HA virtualization clusters is exactly the sort of repetitive work where the right automation pays for itself. The manual overhead is, in fact, one of the things preventing me from just spawning an independent iSCSI target for each of my VMs.
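Just to scratch the itch, the non-cluster half of that wish might look something like this hypothetical helper. It assumes the tgtd/tgtadm userland from scsi-target-utils, a volume group named vg0, and a made-up IQN prefix; the genuinely painful part, wiring the new target into the HA cluster (resource agents, IP failover), is exactly what it doesn't do:

```python
#!/usr/bin/env python
# Hypothetical "make me a target" helper: carve out an LV and expose it
# over iSCSI. All names (vg0, the IQN prefix) are assumptions.
import subprocess, sys

def run(cmd):
    print("+ " + cmd)
    subprocess.check_call(cmd, shell=True)

def create_target(name, size, tid):
    lv = "/dev/vg0/%s" % name
    run("lvcreate -L %s -n %s vg0" % (size, name))          # backing store
    iqn = "iqn.2008-01.com.example:%s" % name               # made-up naming
    run("tgtadm --lld iscsi --op new --mode target --tid %d -T %s" % (tid, iqn))
    run("tgtadm --lld iscsi --op new --mode logicalunit --tid %d --lun 1 -b %s"
        % (tid, lv))
    run("tgtadm --lld iscsi --op bind --mode target --tid %d -I ALL" % tid)  # lab only: no ACLs

if __name__ == "__main__":
    # e.g.: ./mktarget.py rescue 40G 7
    create_target(sys.argv[1], sys.argv[2], int(sys.argv[3]))
```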
Maybe that's A Good Thing (tm).
On another front, failing hard drives (at least one confirmed) and transparent, unnoticed file system corruption have me thumbing through the ZFS manual again, trying to reconcile its use with the system I currently have built. I have some hard requirements: DRBD replication and heavy-duty data encryption, and I get both with my current RAID+LVM stack. But RAID, it seems, does not really care about data integrity except during scans and rebuilds; on an ordinary read it hands back whatever the disk returns, so silent corruption sails straight through, whereas ZFS checksums every block and verifies it on every read. As a matter of fact, whatever corruption took place, I'm not even sure the hard drives were at fault, except for the one that is spitting out the SMART message: "FAILURE IMMINENT!" It could also have been the RAID adapter (which is only doing JBOD at the moment), a file system fluke... or perhaps a resync over DRBD went horribly wrong in just the wrong place.
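If I do get ZFS into the stack, the payoff would be making integrity checking routine instead of forensic. A minimal, cron-able sketch, assuming a pool named tank:

```python
#!/usr/bin/env python
# Kick off a background scrub and complain if any pool is currently unhealthy.
# The pool name 'tank' is an assumption. 'zpool scrub' returns immediately,
# so the results of this scrub will be visible on a later run.
import subprocess

subprocess.check_call("zpool scrub tank", shell=True)
status = subprocess.Popen("zpool status -x", shell=True,
                          stdout=subprocess.PIPE).communicate()[0]
if b"all pools are healthy" not in status:
    print(status.decode())  # in real life, mail this somewhere
```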
We may never know.
I can happily say that the vast majority of my static data appears to be intact. I'm thankful, because that static data is data I only had one highly-redundant copy of. Thus the case for tape drives, I guess. I'm considering trying something along the lines of RAID+LVM+ZFS+DRBD+crypto+FS. Does that seem a bit asinine? Perhaps one of these instead (the first of which is sketched after the list):
- RAID+LVM+DRBD+crypto+ZFS
- RAID+LVM+DRBD+ZFS+crypto+FS
- ZFS+DRBD+crypto+FS
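To make that first ordering concrete, here's a minimal bottom-to-top sketch. Every name in it (md0, vg0, the DRBD resource r0, the 'secure' mapping, the 'tank' pool) is an assumption, the DRBD resource must already be defined in drbd.conf on both nodes, and the cluster integration is omitted entirely:

```python
#!/usr/bin/env python
# Hypothetical bottom-to-top build of RAID+LVM+DRBD+crypto+ZFS.
import subprocess

def run(cmd):
    print("+ " + cmd)
    subprocess.check_call(cmd, shell=True)

run("mdadm --create /dev/md0 --level=5 --raid-devices=4 "
    "/dev/sdb /dev/sdc /dev/sdd /dev/sde")               # RAID (disks assumed)
run("pvcreate /dev/md0")                                  # LVM on the array
run("vgcreate vg0 /dev/md0")
run("lvcreate -L 200G -n r0 vg0")
run("drbdadm create-md r0")                               # DRBD over /dev/vg0/r0
run("drbdadm up r0")
run("drbdadm -- --overwrite-data-of-peer primary r0")     # initial sync, primary node only
run("cryptsetup luksFormat /dev/drbd0")                   # crypto above replication (interactive)
run("cryptsetup luksOpen /dev/drbd0 secure")
run("zpool create tank /dev/mapper/secure")               # ZFS as the top-level file system
```

With crypto above DRBD, the replication traffic and the secondary's disks only ever see ciphertext; the flip side is that the surviving node needs the key before the pool can come up after a failover.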
The problem I am running into is keeping DRBD in the mix, so the convoluted hierarchy may be the only one that makes sense. Of course, if ZFS's own replication to WAN-connected secondary servers is good enough, maybe going purely ZFS would be better.
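It's worth being precise about what "ZFS replication" would actually mean, though: it is not DRBD-style synchronous mirroring, but periodic snapshot shipping with zfs send/receive. A minimal sketch, with pool and host names assumed:

```python
#!/usr/bin/env python
# Snapshot-shipping sketch: point-in-time replication, so anything written
# after the last snapshot is not yet on the secondary. All names assumed.
import subprocess, time

snap = "tank/data@repl-%d" % int(time.time())
subprocess.check_call("zfs snapshot %s" % snap, shell=True)
subprocess.check_call("zfs send %s | ssh backup.example.com zfs receive -F tank/data"
                      % snap, shell=True)
```

Incremental sends (zfs send -i against the previous snapshot) would keep the WAN traffic sane after the first full pass. That then raises the question: stick with Linux+ZFS-FUSE, or go with OpenSolaris?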
I'm not sure I'm ready for OpenSolaris.