20121124

Broken cman init script?!

Nothing is going right in Ubuntu 12.04 for cluster-aware file systems.

OCFS2 seems borked beyond belief.
(Update 2013-02-25: I believe I have made progress on the OCFS2 front:  http://burning-midnight.blogspot.com/2013/02/quick-notes-on-ocfs2-cman-pacemaker.html)

GFS2 has its own issues, some of which I will detail here.

You CAN get these two file systems running on 12.04.  Whether or not the cluster will remain stable when you have to put a node on standby is another question entirely, and a very good question.  It shouldn't even BE a question, but it is, and the answer is a resounding FUCKING NO!  Well, at least, as far as OCFS2 is concerned.  The problems there lie in who manages the ocfs2_controld daemon.  CMAN ought to do it, but CMAN doesn't want to.  Starting it in Pacemaker causes horrible heartburn when you put a node into standby, and things just all fall apart from there.

I decided to try out GFS2.  After installing all the necessary packages, and manually running bits here and there to see things work, I could not get Pacemaker to mount the GFS2 volume.  The first problem was CLVM: if you want to be able to shut down a node without shooting the fucker, you'll need to make sure the LVM system can deactivate volume groups.  The standard method of vgchange -an MyVG doesn't work for the cluster-aware LVM.  It complains loudly about "activation/monitoring=0" being an unacceptable condition for vgchange.  This is detailed in this bug: https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/833368

The solution suggested there, at least where the OCF script is concerned, works: change the lines that use "vgchange" to include "--monitor y" on the command-line, and it will magically work again.
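
For reference, the change boils down to every vgchange call in the agent carrying "--monitor y".  Here's a minimal sketch of the idea, with vgchange stubbed out so it's safe to run anywhere; the stub and the vg_set_active wrapper name are mine, not anything from the actual OCF script:

```shell
# Stub vgchange so this sketch just prints the command lines instead of
# touching a real volume group; the patched OCF script calls the real binary.
vgchange() { echo "vgchange $*"; }

# Hypothetical helper mirroring the patched agent: every activation or
# deactivation carries --monitor y, which is what clustered LVM demands.
vg_set_active() {
    local state="$1" vg="$2"        # state: y (activate) or n (deactivate)
    vgchange -a "$state" --monitor y "$vg"
}

vg_set_active y cdata   # prints: vgchange -a y --monitor y cdata
vg_set_active n cdata   # prints: vgchange -a n --monitor y cdata
```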

My cluster starts a DRBD resource, promotes it (dual-primary), then starts up clvmd (ocf:lvm2:clvmd), activates the appropriate LVM volumes (ocf:heartbeat:LVM), then mounts the GFS2 file system (ocf:heartbeat:Filesystem).  These are all cloned resources.
primitive p_clvmd ocf:lvm2:clvmd \
    op start interval="0" timeout="100" \
    op stop interval="0" timeout="100" \
    op monitor interval="60" timeout="120"
primitive p_drbd_data ocf:linbit:drbd \
    params drbd_resource="data" \
    op start interval="0" timeout="240" \
    op promote interval="0" timeout="90" \
    op demote interval="0" timeout="90" \
    op notify interval="0" timeout="90" \
    op stop interval="0" timeout="100" \
    op monitor interval="15s" role="Master" timeout="20s" \
    op monitor interval="20s" role="Slave" timeout="20s"
primitive p_fs_vm ocf:heartbeat:Filesystem \
    params device="/dev/cdata/vm" directory="/opt/vm" fstype="gfs2"
primitive p_lvm_cdata ocf:heartbeat:LVM \
    params volgrpname="cdata"
ms ms_drbd_data p_drbd_data \
    meta master-max="2" clone-max="2" interleave="true" notify="true"
clone cl_clvmd p_clvmd \
    meta clone-max="2" interleave="true" notify="true" globally-unique="false" target-role="Started"
clone cl_fs_vm p_fs_vm \
    meta clone-max="2" interleave="true" notify="false" globally-unique="false" target-role="Started"
clone cl_lvm_cdata p_lvm_cdata \
    meta clone-max="2" interleave="true" notify="true" globally-unique="false" target-role="Started"
colocation colo_lvm_clvm inf: cl_fs_vm cl_lvm_cdata cl_clvmd ms_drbd_data:Master
order o_lvm inf: ms_drbd_data:promote cl_clvmd:start cl_lvm_cdata:start cl_fs_vm:start

The LVM clone is necessary so that you can deactivate the VG before disconnecting DRBD during a standby; failing to do this will get the node STONITHed.  The "--monitor y" change is absolutely necessary, or you won't even bring the VG online.  Starting clvmd inside Pacemaker might not be strictly necessary, but in this instance it seems to work very well.  It's also important to note that most of the init.d scripts related to this conundrum have been disabled: clvmd and drbd, to name two.
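
Since Pacemaker owns those daemons now, the init scripts have to get out of the way.  On 12.04 that means update-rc.d; here's a dry-run sketch that echoes the commands instead of running them (service names assumed from a stock install; drop the echo when you're satisfied):

```shell
# Dry run: print the update-rc.d invocations rather than executing them,
# so nothing changes until the echo is removed.
for svc in clvmd drbd; do
    echo update-rc.d "$svc" disable
done
```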

The GFS2 file system will not mount without gfs_controld running.  gfs_controld won't start on a clean Ubuntu Server 12.04 system because it seems the cman init script is fucked up.  Can't understand it, but inside /etc/init.d/cman you'll find a line that reads:
gfs_controld_enabled && cd /etc/init.d && ./gfs2-cluster start
Comment out this line and add this below it:
if [[ gfs_controld_enabled ]]; then
    cd /etc/init.d && ./gfs2-cluster start
fi
This will make the cman script actually CALL the gfs2-cluster script and thus start the gfs_controld daemon.  Shutdown seems to work correctly with no additional modifications.  You will find that once all these pieces are in place, GFS2 is viable on Ubuntu 12.04 AND you can bring your cluster up and down without watching your nodes commit creative suicide.
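
Incidentally, the reason the wrapped version behaves differently is a bit of a cheat: in bash, [[ gfs_controld_enabled ]] is a non-empty-string test, not a call to the function, so it always succeeds and gfs2-cluster gets started unconditionally.  A stub-based demonstration you can run anywhere (the stub function is mine, standing in for the real cman helper):

```shell
# Stub standing in for cman's helper; pretend GFS support is "disabled".
gfs_controld_enabled() { return 1; }

# [[ word ]] tests whether the literal string is non-empty: always true,
# regardless of what the function of the same name would return.
if [[ gfs_controld_enabled ]]; then
    echo "string test: branch taken"
fi

# A bare command actually invokes the function and honors its exit status.
if gfs_controld_enabled; then
    echo "function call: branch taken"
else
    echo "function call: branch skipped"
fi
```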

I honestly don't know why this is the way it is.  I wouldn't know where to even assign blame.  In the Ubuntu Server 12.04 Cluster Guide (work-in-progress), they suggest this resource:
primitive resGFSD ocf:pacemaker:controld \
        params daemon="gfs_controld" args="" \
        op monitor interval="120s"
This seems rather like a bastardization of what this resource agent is really for, but perhaps it works for them.  However, I would highly suspect this might suffer from the same issues that I ran into with OCFS2: that if CMAN isn't running the controld, putting a node into standby will wreak havoc on the node and cluster.  With OCFS2, the issue was in the ocfs2_controld daemon, which CMAN was all too happy to try to bring offline but would NOT under any circumstances that I could find start it up.
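
Whichever way the controld daemons get launched, it pays to check they are actually up before trusting a mount to succeed.  A trivial pgrep-based check (adjust the daemon list to whichever file system you're fighting with):

```shell
# Report whether the controld daemons the cluster file systems depend on
# are currently running on this node.
for d in dlm_controld gfs_controld ocfs2_controld; do
    if pgrep -x "$d" >/dev/null 2>&1; then
        echo "$d: running"
    else
        echo "$d: NOT running"
    fi
done
```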

Once started by Pacemaker, you also cannot seem to take it down, meaning the resource fails to stop and becomes a disqualifying offense for the node.  This issue seems unrelated to the missing killproc command (non-standard among distributions), because even when you fix/fake it, the thing does not seem to accomplish anything: ocfs2_controld continues to run in the background, and cman will fail to shut down correctly after you try bringing a node down gracefully.  No ideas yet on how to fix this, but I might tackle it next.  I had detailed making a working Ubuntu 12.04 OCFS2 cluster in a previous post...I will be double-checking those steps...
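
For what it's worth, here is the sort of minimal killproc stand-in I mean; Ubuntu doesn't ship the RHEL-style helper, this shim is my own sketch, and as noted above it did not actually persuade ocfs2_controld to die:

```shell
# Bare-bones killproc: signal every process whose name matches exactly.
# Returns 1 if no such process exists, roughly mimicking the RHEL helper.
killproc() {
    local daemon="$1" sig="${2:-TERM}"
    local pids
    pids=$(pgrep -x "$daemon") || return 1
    # $pids deliberately unquoted so multiple PIDs split into arguments.
    kill -s "$sig" $pids
}
```

Usage would be killproc ocfs2_controld, or killproc ocfs2_controld KILL if you're feeling vindictive.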
