20120621

The Good and Bad of OCFS2

It's my own fault, really, for not having yet purchased an ethernet-controlled PDU.  I've been busy and time slips by, and the longer things run without incident, the easier it is to forget how fragile it all is.

Whatever is causing the hiccups, it's pretty nasty when it happens.  I now have three hosts in my VM cluster.  I still run my two storage nodes, as a separate cluster.  There are two shared-storage devices, accessed by each VM host node via iSCSI, meant to distribute the load between the two storage nodes.  OCFS2 is the shared-storage file system for this installation.

Long story short, when one node dies, they all die.  15 VMs die with them, all at once.  Again, STONITH would fix this issue.  But what worries me more is the frequency of kernel oopses.  I really can't have my VM hosts going AWOL on me just because they're tired of running load averages into the 60s.  I am beginning to rethink my design.  Here I will discuss a few pros and cons of the two approaches under consideration.

OCFS2 - Pros

  • Easy to share data between systems, or have a unified store that all systems can see.
  • VM images are files in directories, all named appropriately for their target VMs - No confusion, very little chance of human error.
  • Storage node configuration is easy - Set up the store, initialize with OCFS2, and you're done!  (A rough sketch of that follows this list.)
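
For the curious, here is roughly what that initialization amounts to.  This is only a sketch: the cluster name, node names, addresses, and device path are placeholders for whatever your environment actually uses, and the resulting /etc/ocfs2/cluster.conf has to exist on every node that will mount the volume.

    # Rough sketch of "set up the store": write an o2cb cluster.conf and
    # format the shared device.  All names, addresses, and the device path
    # are placeholders.
    import subprocess

    CLUSTER = "vmcluster"
    NODES = [("vmhost0", "192.168.1.10"),
             ("vmhost1", "192.168.1.11"),
             ("vmhost2", "192.168.1.12")]
    DEVICE = "/dev/sdb"   # the volume the VM hosts will see over iSCSI

    # One cluster: stanza, then one node: stanza per member.
    conf = ("cluster:\n"
            "\tnode_count = %d\n"
            "\tname = %s\n\n" % (len(NODES), CLUSTER))
    for number, (name, ip) in enumerate(NODES):
        conf += ("node:\n"
                 "\tip_port = 7777\n"
                 "\tip_address = %s\n"
                 "\tnumber = %d\n"
                 "\tname = %s\n"
                 "\tcluster = %s\n\n" % (ip, number, name, CLUSTER))

    with open("/etc/ocfs2/cluster.conf", "w") as f:
        f.write(conf)

    # -L sets the volume label, -N reserves slots for concurrent mounters.
    subprocess.check_call(["mkfs.ocfs2", "-L", "vmstore",
                           "-N", str(len(NODES)), DEVICE])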

OCFS2 - Cons

  • Fencing is so massively important that you might as well not even use clustering without it.  Right now the cluster itself is about as stable as my "big VM host" that has motherboard and/or memory issues and regularly locks up for no apparent reason.
  • You have to configure the kernel to reboot on a panic, and to panic on an oops, per the OCFS2 1.6 documentation (see the sketch after this list).  I'm not really uncomfortable with that, but the prevalence of these system failures leaves me wondering about the stability of everything.  I can't necessarily pin it on OCFS2 without better logging, or at least some hammering while watching the system monitor closely.
  • One of my systems refuses to reboot on a panic, even though it says it's going to.  Don't have any idea what that's about.
  • The DLM is not terrible, but sometimes I wonder how great it is in terms of performance.  I may be misusing OCFS2.  Of course, I have only one uplink per storage node to the lone gigabit switch in the setup, and the ethernet adapters are of the onboard variety.  Did I mention I need to purchase some badass PCI-e ethernet cards??
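
For reference, the reboot-on-panic / panic-on-oops settings mentioned above are just a pair of sysctls.  A minimal sketch, with the caveat that the 30-second panic delay is my own placeholder rather than a number lifted from the OCFS2 documentation:

    # Apply the two kernel knobs mentioned above.  Run as root; the 30-second
    # panic delay is a placeholder value.
    SETTINGS = {
        "/proc/sys/kernel/panic_on_oops": "1",   # escalate an oops to a full panic
        "/proc/sys/kernel/panic": "30",          # reboot 30s after a panic (0 = hang)
    }

    for path, value in SETTINGS.items():
        with open(path, "w") as f:
            f.write(value + "\n")

    # To survive a reboot, the equivalent lines for /etc/sysctl.conf are:
    #   kernel.panic_on_oops = 1
    #   kernel.panic = 30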

The alternative to OCFS2, when you want to talk about virtualization, is of course straight-up iSCSI.  Libvirt actually has support for this, though I'm not certain how well it works or how robust it is to failures.  However, from what I've read and seen, I'd be very willing to give it a shot.
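
To give a flavor of it, here is roughly what defining an iSCSI pool through the libvirt Python bindings looks like.  The pool name, portal host, and target IQN are invented; substitute whatever the storage nodes actually export.

    # Sketch: define, autostart, and start an iSCSI storage pool via libvirt.
    # Pool name, portal host, and IQN below are hypothetical placeholders.
    import libvirt

    POOL_XML = """
    <pool type='iscsi'>
      <name>vmstore0</name>
      <source>
        <host name='storage0.example.com'/>
        <device path='iqn.2012-06.com.example:storage0.vms'/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
      </target>
    </pool>
    """

    conn = libvirt.open("qemu:///system")
    pool = conn.storagePoolDefineXML(POOL_XML, 0)  # persistent definition
    pool.setAutostart(1)                           # have libvirtd start it at boot
    pool.create(0)                                 # start it now (performs the iSCSI login)

    # Every LUN on the target shows up as a volume; a guest's <disk> element
    # points at one of these instead of an image file on shared storage.
    for name in pool.listVolumes():
        print(name)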

LIBVIRT iSCSI Storage Pool - Pros

  • STONITH is "less necessary" (even though it is STILL necessary) for the nodes in question, because they no longer have to worry so much about corrupting entire file systems.  They would only be at risk for corrupting a limited number of virtual machines...although, given the right circumstances I bet we could corrupt them all.
  • Single node failures do not disrupt the DLM, because there is no DLM.
  • iSCSI connections are on a per-machine basis, though it would be interesting to see how well this scales out.
  • No shared storage means that the storage nodes themselves can use more traditional or possibly more robust file systems, like ext4 or jfs.

LIBVIRT iSCSI Storage Pools - Cons

  • Storage configuration for new and existing virtuals will require an iSCSI LUN for each one.  To keep them segregated, we could also introduce an iSCSI Target for each one, but that would become a cluster-management nightmare on the storage nodes (see the sketch after this list).  It's already bad enough to think about pumping out new LUNs for the damn things.
  • Since LUNs would be the thing to use, there is greater risk of human error when configuring a new virtual machine (think: Did I start the installer on the right LUN?  Hmmmm....)
  • Changing to this won't necessarily solve the problems with the ethernet bottleneck.  In fact, it could very well exacerbate them.
  • There is no longer a "shared storage" between machines.  No longer a place to store all data and easily migrate it from machine to machine.  At present I keep all VM configuration on the shared storage and update the hosts every so often.  This would become significantly less pleasant without shared storage.
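
To make the LUN-per-VM bookkeeping concrete, here is a sketch of what it turns into on a storage node, assuming tgtd (scsi-target-utils); the IQN, volume group, and guest names are all invented, and other target stacks use different syntax for the same idea.

    # Print tgtd-style target stanzas for a LUN-per-guest layout.  Assumes
    # the targets.conf syntax of tgtd; IQN, volume group, and guest names
    # are made up.

    GUESTS = ["mail", "web", "db", "ldap"]        # one logical volume per guest
    IQN_BASE = "iqn.2012-06.com.example:vmstore0"

    # One target, one backing-store (LUN) per guest:
    print("<target %s>" % IQN_BASE)
    for guest in GUESTS:
        print("    backing-store /dev/vg_vms/%s" % guest)
    print("</target>")

    # Or a separate target per guest -- tidier isolation, but every new VM
    # now means another IQN to create on the storage nodes and log into
    # from every host:
    for guest in GUESTS:
        print("<target %s.%s>" % (IQN_BASE, guest))
        print("    backing-store /dev/vg_vms/%s" % guest)
        print("</target>")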

It would probably be in my best interest to simply keep the current configuration until I can get my STONITH devices and really see how well the system stays online.  It would also behoove me to configure the VM cluster to monitor and protect the virtuals themselves (a rough sketch of that follows below).  I tested this with one VM, but haven't done much to toy with all the features and functions.
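
Assuming a Pacemaker stack, the usual route for that is the ocf:heartbeat:VirtualDomain resource agent.  A rough sketch, with the guest name and domain XML path as placeholders:

    # Put a guest under cluster control via the crm shell.  Assumes Pacemaker
    # with the ocf:heartbeat:VirtualDomain agent; the guest name and domain
    # XML path are placeholders.
    import subprocess

    GUEST = "mail"
    DOMAIN_XML = "/etc/libvirt/qemu/%s.xml" % GUEST

    subprocess.check_call([
        "crm", "configure", "primitive", "vm_" + GUEST,
        "ocf:heartbeat:VirtualDomain",
        "params", "config=" + DOMAIN_XML,
        "op", "monitor", "interval=30s",
        "op", "start", "timeout=120s",
        "op", "stop", "timeout=120s",
    ])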

So much to do, so little time.
