20120802

VM Cluster...now with STONITH! (On a Budget)

For some reason my Ubuntu 11.10 VM host servers have been misbehaving.  When one of the nodes died, it caused the DLM to hang and my other nodes quickly perished afterwards.  I've blogged about this a time or two now.  STONITH is the only answer, but how do you do it without spending $450 on an ethernet-controlled PDU?  Well, I "have" $450, but I have yet to put the stupid purchase request through to management...what can I say?  I'm busy...and it grieves me greatly.

What follows is what I did to make super-cheap STONITH a reality.  It does require some hardware, but if you have a lot of servers, you probably have a lot of UPSs, and you might even have some recent APC units with the little USB connection for your system.

In my case, I had several APC BackUPS ES (500 and larger) units lying around, all with fresh new batteries.  Most of them had USB connectivity.  These USB-capable units formed my new, albeit temporary, fencing solution.

Configuring NUT

First, a NUT server is needed.  I chose a non-cluster system for this job, but every system in your cluster could be a NUT server and serve to shoot other nodes.  The chosen system in my configuration just collects statistics and monitors other servers, so it's actually a very nice server for the job of shooting nodes.  For our example, we will call the NUT server stonithserver.

root@stonithserver:~#  apt-get install nut-server
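Depending on your NUT version and packaging, the daemons may refuse to start until /etc/nut/nut.conf tells them what role this box plays.  Assuming your package ships that file (the Ubuntu one does, if memory serves), a minimal sketch:

# /etc/nut/nut.conf - run the drivers and upsd, and serve the network
MODE=netserver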

/etc/nut/ups.conf was configured for each APC device as follows:

[cn01]
  driver = usbhid-ups
  port = auto
  serial = "BB0........3"


[cn02]
  driver = usbhid-ups
  port = auto
  serial = "BB0........5"

[cn03]
  driver = usbhid-ups
  port = auto
  serial = "BB0........8"

(The serial numbers here are obfuscated for security reasons, as are the cluster node names.  Your devices' serial numbers should be all alphanumeric characters.  Methods other than serial numbers can be used to distinguish between devices - consult the NUT documentation for more details.)
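After editing ups.conf, restart NUT and poke each UPS with upsc to make sure the driver actually found the hardware.  The init script may be named differently on your system; on mine it's just "nut":

root@stonithserver:~# service nut restart
root@stonithserver:~# upsc cn01

The second command should dump the UPS's variables (battery.charge, ups.status, and friends).  If it can't find the device, fix that before going any further.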

You can grab the serial numbers via "lsusb -v | less".  Once you get NUT configured for a new UPS (or all of them), use "upscmd" to test them, first to make sure you didn't screw something up, and second to make sure it's going to work correctly when it needs to work.

root@stonithserver:~# upscmd -l cn01
root@stonithserver:~# upscmd cn01 load.off

The first command should return a list of available commands for your UPS.  The second will, on my APC BackUPS ES units, cause the UPS to switch off for about 1 second.  Use the command appropriate for your unit.  My units switch back on automatically, perhaps because they're still being fed mains power.
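If you'd rather not page through the whole lsusb dump, a grep along these lines should pull out just the interesting bits (the pattern is a convenience - adjust to taste):

root@stonithserver:~# lsusb -v 2>/dev/null | grep -E 'iProduct|iSerial'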

It's probably important to secure your NUT server in /etc/nut/upsd.users, although I imagine packet sniffing would end that pretty quickly:

[stonithuser]
  password = ThisIsNotThePasswordYouAreLookingFor
  instcmds = ALL

Note that the above configuration is a very quick and simple (and probably stupid) one.  Review the relevant documentation for a more secure configuration.
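Once upsd.users is in place, instant commands have to go through that user.  upscmd takes -u and -p for this (check your version's man page), so an authenticated test looks something like:

root@stonithserver:~# upscmd -u stonithuser -p ThisIsNotThePasswordYouAreLookingFor cn01 load.off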

Make sure that /etc/nut/upsd.conf is configured to listen for incoming connections:

LISTEN 0.0.0.0 3493


Now each node in the cluster needs the nut-client package installed, or else it won't be able to talk to the NUT server:

root@cn01:~#  apt-get install nut-client


root@cn02:~#  apt-get install nut-client


...
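Before wiring anything into the cluster, it's worth checking from each node that it can actually reach upsd on the NUT server.  Using the same ups@host:port form the STONITH primitive will use below, something like this should come back with "OL" while the UPS is on line power:

root@cn01:~# upsc cn01@stonithserver:3493 ups.status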

Configuring STONITH on the Cluster

Finally, some cluster configuration.  On Ubuntu, the NUT binaries are not where they are on Redhat/CentOS.  Also, my UPSs don't understand the "reset" command, so I had to change the reset command to "load.off".  It's enough to nuke a running server, and perhaps the best part is that if the server auto-powers-on (a BIOS option), you have yourself a handy way to remote-reboot any failed machine.  Add Wake-on-LAN, and it's like having IPMI power control...without the nice user interface.
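If you're not sure where your distribution hides the client binaries, just ask:

root@cn01:~# which upsc upscmd

On my nodes those come back as /bin/upsc and /bin/upscmd, which is what the primitive below points at.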

For each cluster node, a STONITH primitive is needed:



primitive p_stonith_cn01 stonith:external/nut \
        params hostname="cn01" \
               ups="cn01@stonithserver:3493" \
               username="stonithuser" \
               password="ThisIsNotThePasswordYouAreLookingFor" \
               upscmd="/bin/upscmd" \
               upsc="/bin/upsc" \
               reset="load.off" \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor start-delay="15" interval="15" timeout="15" \
        meta target-role="Started"


A STONITH primitive is like any other primitive - it runs and can be started and stopped.  Therefore it needs a node to run on.  Restrict them so that they don't run on the machines they are supposed to kill - that is, a downed node can't (or shouldn't be expected to) shoot itself:

location l_stonith_cn01 p_stonith_cn01 -inf: cn01
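The constraints for the other nodes follow the same pattern (assuming you named their primitives p_stonith_cn02 and p_stonith_cn03 along the lines of the one above):

location l_stonith_cn02 p_stonith_cn02 -inf: cn02
location l_stonith_cn03 p_stonith_cn03 -inf: cn03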

Re-enable STONITH in the cluster options, because, frankly, if you're reading this then you've probably had it disabled this whole time:


property $id="cib-bootstrap-options" \
        stonith-enabled="true" \
...

Test the cluster by faking downed nodes.  Do this one machine at a time, and recover your cluster before testing another machine!  If you have three nodes, nuke one, then bring it back to life and let the cluster become stable again, and then nuke the second one.  Repeat for the third one.  This can easily be done by pulling network cables and watching the machines reboot.  Every machine should get properly nuked.
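If pulling cables isn't convenient, you can also fake a dead node from the node itself - crashing the kernel via sysrq is a reasonably honest simulation of a failure (assuming sysrq is enabled, and obviously only on a node you're prepared to lose):

root@cn02:~# echo c > /proc/sysrq-trigger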


NB:  BEFORE you enable STONITH in Pacemaker, make sure you have a clean CIB.  I had a few stale machines (ex-nodes) defined in my CIB.  Pacemaker thought they were unclean and tried to STONITH them.  But since they really didn't exist and also didn't have any STONITH primitives defined, the fencing failed, and in doing so prevented pretty much all my resources from loading throughout the cluster.  (I would classify that as a feature, not a bug.)  Once the defunct node definitions were removed, everything came up beautifully.
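For what it's worth, the crm shell can remove a defunct node definition directly - double-check the name first, since this isn't something you want to fat-finger ("deadnode01" here is just a placeholder for whatever stale name is stuck in your CIB):

root@cn01:~# crm node delete deadnode01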