20120504

DRBD + Pacemaker

I followed these instructions fairly closely to get a basic DRBD + Pacemaker configuration working.  A few notes from the effort:

  • clone-max must be 2, or else the other node won't come up.
  • Having the wrong file system on the target drive really hoses things up.
During my escapades, I had forgotten that the DRBD device I wanted to use for this example was already formatted as OCFS2.  Since that file system requires some special massaging to mount, I had specified ext4 in the DRBD resource configuration.  Unfortunately, when Pacemaker failed to mount the file system, it got stuck in a state I didn't know how to get out of.  The easy solution was to restart Corosync on each node.  Once the resource was properly configured, the first node to restart came up instantly with the mount.
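
For reference, here is a minimal sketch of what a single-primary DRBD + file system setup tends to look like in crm syntax.  It is not the exact configuration from the instructions I followed.  The resource names (p_drbd_r0, ms_drbd_r0, p_fs_r0), the DRBD resource name r0, the device /dev/drbd0 and the mount point /mnt/data are all placeholders I made up for illustration.

```
# Hypothetical names throughout; adjust to your own DRBD resource and paths.
primitive p_drbd_r0 ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s"
# clone-max="2" so that an instance runs on both nodes.
ms ms_drbd_r0 p_drbd_r0 \
    meta master-max="1" master-node-max="1" \
         clone-max="2" clone-node-max="1" notify="true"
# fstype must match what is actually on the device.
primitive p_fs_r0 ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/mnt/data" fstype="ext4"
# Mount only where DRBD is Master, and only after promotion.
colocation c_fs_on_drbd inf: p_fs_r0 ms_drbd_r0:Master
order o_drbd_before_fs inf: ms_drbd_r0:promote p_fs_r0:start
```

The fstype parameter is the one that bit me: if it does not match the file system on the device, the mount fails and the Filesystem resource ends up in a failed state.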

I now feel I have a basic working knowledge of crm's command syntax, and the kinds of resources I am able to configure.  I still lack in-depth knowledge about items like ms (master-slave) and its meta fields, and other finer details.  I believe they're in the parts of the Pacemaker documentation I have not yet come to, though I have diligently read through a good portion of it before starting this adventure.

I've now reconfigured the resource to work in dual-primary mode.  That was easy - just change the master-max to 2, leave clone-max at 2, and remove the other options (don't know if I needed them or not - will find out later).  Next, OCFS2 Pacemaker support.  First order of business was examining the ocf:pacemaker:controld info.  I noticed this line:
"It assumes that dlm_controld is in your default PATH."
Habawha?!  OK.  I go to the prompt, type dlm_controld, and find nothing.  But Ubuntu is nice enough to point out that I should install the cman package if I want this command.  So I do so, and allow apt-get to install all the extra packages it believes it needs.
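
Going back to the dual-primary change mentioned above, it might look roughly like this in crm syntax.  This is only a sketch: ms_drbd_r0 and p_drbd_r0 are placeholder names, and keeping notify="true" is my assumption, since the linbit DRBD agent expects notifications to be enabled.

```
# Hypothetical names; the only real change is master-max going from 1 to 2.
ms ms_drbd_r0 p_drbd_r0 \
    meta master-max="2" clone-max="2" notify="true"
```

Note that DRBD itself also has to permit two primaries (the allow-two-primaries option in the resource's net section), or the second promotion will be refused.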

Following the DRBD OCFS2 guide, I noticed one change I needed - ocf:ocfs2:o2cb is actually ocf:pacemaker:o2cb.  I took a gamble and configured the o2cb resource with the parameter stack="cman".  Sadly, I endured mucho failure when I committed my changes.  None of the new OCFS2 resources seemed to start, and they complained loudly about something being "not installed."  I answered by installing the dlm-pcmk package on both servers, and four of the errors went away (two per machine).  I was left with two monitor errors that still complained that something was "not installed."
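
A rough sketch of the OCFS2 management resources described above, in crm syntax.  The names p_controld, p_o2cb, g_ocfs2mgmt and cl_ocfs2mgmt and the monitor intervals are placeholders of mine, not values taken from the guide, and stack="cman" is the gamble mentioned above rather than a recommendation.

```
# Hypothetical names and intervals.
primitive p_controld ocf:pacemaker:controld \
    op monitor interval="120s"
primitive p_o2cb ocf:pacemaker:o2cb \
    params stack="cman" \
    op monitor interval="120s"
# Start controld before o2cb, and run the pair on every node.
group g_ocfs2mgmt p_controld p_o2cb
clone cl_ocfs2mgmt g_ocfs2mgmt \
    meta interleave="true"
```

Both agents depend on binaries from separate packages (dlm_controld for controld, the OCFS2 tools for o2cb), so a "not installed" error from either one usually points to a missing package rather than a configuration mistake.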

Of course, it would have been AWESOME if I had just read further on the Ubuntu wiki page to see the full apt-get line for supporting OCFS2 - one or two more packages later, and that fixed the problem.  Still, it was valuable to learn about CMAN, and I may migrate the cluster in that direction since it may help protect against internal split-brain.

I will now reformat the shared data store as OCFS2, modify the Filesystem resource, and prepare for cluster goodness.  Tonight or tomorrow I might try to get the iSCSI target working under Pacemaker.
