20120510

Cluster Building - 12.04 part 2

This is an incomplete article, and has been SUPERSEDED BY http://burning-midnight.blogspot.com/2012/07/cluster-building-ubuntu-1204-revised.html
 

Managing CMAN

Attempting to follow the instructions from both wiki.ubuntu.com and clusterlabs.org led to some minor frustration.  That's past now, thanks to watching /var/log/syslog scroll by while I started and stopped the cman service repeatedly.  The problem was that the nodes were simply not talking to each other.  All the configuration looked OK, but there was a deeper problem.  The Ubuntu setup typically puts an entry like this in the hosts file:
127.0.1.1      l6.sandbox   l6
cman, trying to be all intelligent, was attempting to use the hostname reverse-resolution result to determine which adapter it should work with.  It chose the loopback every time.  Even after adding entries specific to the machines, the loopback adapter was chosen first.  I believe I didn't encounter this problem previously because I tend to use FQDNs when I specify hosts in files such as /etc/hostname.  I don't know if that's best-practice or worst-practice, but at the moment it does what I need it to do.  After updating l6 and l7 appropriately, and modifying /etc/cluster/cluster.conf to match the FQDNs, I was able to achieve full connectivity.
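For reference, here is roughly what the relevant /etc/hosts entries look like after the fix.  This is only a sketch: the 192.168.1.x addresses and the l7.sandbox name are placeholders, not the actual values from my network.
Example:
127.0.0.1      localhost
# 127.0.1.1    l6.sandbox   l6     <- commented out so cman stops picking the loopback
192.168.1.6    l6.sandbox   l6
192.168.1.7    l7.sandbox   l7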

Reconstructing the Cluster

With the new 12.04 systems now talking and playing together, reconfiguring the resources has been the big task. So as to not forget this stuff in the future, here are a few laundry lists:

DRBD Backing Store Preparation

  1. Create volumes for data
  2. Create volumes for metadata, if using external metadata (see meta-disk and flexible-meta-disk)
  3. Configure DRBD resource file(s)
    1. Make sure the hostnames are correct - they MUST match `uname -n`, just like every other part of this cluster.
    2. Make sure the target volumes are correctly specified
    3. Double check the file!!!
  4. Initialize the storage: drbdadm create-md
  5. Force the initial sync from one node only: drbdadm -- -o primary (the -o is the overwrite-data-of-peer flag; see the command sketch after the example below)
  6. After syncing, shut down DRBD and disable its startup links
  7. Add appropriate DRBD management statements to the cluster configuration
Example:
primitive p_drbd_ds00 ocf:linbit:drbd \
        params drbd_resource="ds00" \
        operations $id="drbd-operations" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="240"
ms ms_drbd_ds00 p_drbd_ds00 \
        meta resource-stickiness="100" notify="true" master-max="2" interleave="true" target-role="Started"
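Here's a rough command-level sketch of steps 4 through 6 for the ds00 resource used above.  The first two commands run on both nodes, the forced promotion on one node only; the service and update-rc.d names assume the stock 12.04 init scripts.
Example:
drbdadm create-md ds00                            # step 4, both nodes
drbdadm up ds00                                   # bring the resource up, both nodes
drbdadm -- --overwrite-data-of-peer primary ds00  # step 5, ONE node only
watch cat /proc/drbd                              # wait for UpToDate/UpToDate
service drbd stop                                 # step 6
update-rc.d drbd disable                          # step 6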

OCFS2 Mount Preparation - Cluster-version

  1. Create target mount-points (/srv/ for example)
  2. Add appropriate DLM and O2CB statements to the cluster configuration
  3. Start the resources - you should now have DRBD resources and the DLM stuff active in the cluster.  If you don't, try rebooting all the nodes.
  4. Run mkfs.ocfs2 on all resources that are to be utilized this way (see the sketch after the example below).
    1. NOTE: The ocfs2_controld.cman seems to go a little nuts on first boot if it doesn't have a friend.  In trying to format ds00 on l6, mkfs.ocfs2 didn't seem to be able to see the cluster until l7 came online.  top also showed the ocfs2_controld.cman taking up to 50% CPU!  That also went away after l7 popped onto the scene.  When building a cluster, take it slow.
Example:
primitive p_dlm ocf:pacemaker:controld \
        params daemon="dlm_controld" \
        op monitor interval="120s" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s"
primitive p_o2cb ocf:pacemaker:o2cb \
        params stack="cman" \
        op monitor interval="120s" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100"
group g_dlm-o2cb p_dlm p_o2cb
clone cl_dlm-o2cb g_dlm-o2cb \
        meta globally-unique="false" interleave="true" target-role="Started"
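For step 4, the format itself is one command per resource.  A sketch for ds00, using the device path from the file-system example further down; the volume label and two node slots are my choices, not something from the original setup.
Example:
mkfs.ocfs2 -L ds00 -N 2 /dev/drbd/by-res/ds00
Per the note above, run this from one node while both nodes are online, or ocfs2_controld.cman may not be able to see the cluster.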

File System Mount

  1. Add the appropriate statements to the cluster configuration.
  2. Double-check all mount sources and targets!
  3. Make sure that DRBD is actually running correctly, and that the resource is PRIMARY on the intended machine (or Primary on both nodes for dual-primaries).  A quick verification sketch follows the notes below.
Example:
primitive p_fs_ds00 ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/ds00" directory="/opt/data/ds00" fstype="ocfs2" \
        op monitor interval="120s" timeout="60s" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s"
clone cl_fs_ds00 p_fs_ds00 \
        meta interleave="true" ordered="true" target-role="Started"
colocation colo_fs_ds00 inf: cl_fs_ds00 ms_drbd_ds00:Master cl_dlm-o2cb
order o_drbd-ds00 0: ms_drbd_ds00:promote cl_dlm-o2cb cl_fs_ds00
Notes:
  • The above order statement specifies a "should" ordering.  So far I've observed, at least with Pacemaker 1.0, that "must" orderings (inf:) tend to cause total start failure.  I haven't tried with 1.1 yet.
  • The colocation statement defines a "must" relationship.
  • Cloning the p_fs_ds00 resource is only necessary for dual-primaries.  In fact, I think most of the clones are unnecessary if you're not going to use those resources on any other system in the cluster.
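A quick way to verify step 3 before the Filesystem clone tries to start, sketched for the ds00 resource from the examples above:
Example:
cat /proc/drbd        # should show Primary/Primary and UpToDate/UpToDate for dual-primary
drbdadm role ds00     # same information, per resource
crm_mon -1            # ms_drbd_ds00 should be promoted (Master) on the intended node(s)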

System Startup Scripts

It's getting hard to know what to start and what not to start automatically.  I'll try to keep track of it here.  Note that at the moment I have l6 not auto-starting cman or pacemaker, just because I want to be able to strangle the machine before it starts hosing the cluster after a reboot.  But I would think the ideal thing would be to have it auto-start when things are nice again.  You can always fence by disconnecting the NIC!  (Although, NIC-fencing would make remote management of the node REALLY HARD...)
  • drbd: disabled
  • iscsitarget: ENABLED - make sure to flip the flag in /etc/default/iscsitarget to 'true'!
  • open-iscsi: ENABLED
  • corosync: disabled
  • cman: ENABLED
  • pacemaker: ENABLED
  • o2cb: disabled
  • ocfs2: ENABLED - not sure what this script does, or if it's even useful...
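These are all plain sysvinit scripts on 12.04, so the enabled/disabled states above can be set with update-rc.d.  A sketch, run on each node; the ISCSITARGET_ENABLE variable name is what I believe the iscsitarget package uses, so double-check /etc/default/iscsitarget before trusting the sed.
Example:
update-rc.d drbd disable
update-rc.d corosync disable
update-rc.d o2cb disable
update-rc.d cman enable
update-rc.d pacemaker enable
# flip the iscsitarget flag mentioned above
sed -i 's/ISCSITARGET_ENABLE=false/ISCSITARGET_ENABLE=true/' /etc/default/iscsitarget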

Resource Access Notes

  • The iSCSILogicalUnit resource looks like it fails when trying to connect to a device that has already been mounted.  I take that to mean it wants exclusive block-level access to the device.  That being said, we'll probably need to follow Linbit's instructions for setting up a second iSCSI target if we really want Active/Active.
  • Local access to an iSCSI mount will probably need to take place using an iSCSI initiator.  At least this abstracts the location of the target, and will most likely allow seamless migration of virtuals from one host to another (assuming that's really gonna work with OCFS2 - this remains to be configured and tested).
Having a successful connection to the iSCSI target now, I must resolve the issue of mounting the file system.  Since it's OCFS2, and the cluster definition is managed by the cluster, a joining node must also be part of that cluster.  OR, the resource must be managed by a second cluster: the storage cluster would be a cluster unto itself and only manage the storage (just the iSCSI and DRBD stuff, with no need for DLM and all that), while the second cluster would focus only on virtual device management and OCFS2 access.  Should a node want to do both, either it has to be a member of both clusters (not likely, because I do not want to go there), or both clusters must be integrated into one unified cluster.
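For the initiator-based access mentioned above, the open-iscsi side is just discovery plus login.  A sketch; the portal address and IQN are placeholders, not the real target.
Example:
iscsiadm -m discovery -t sendtargets -p 192.168.1.6
iscsiadm -m node -T iqn.2012-05.sandbox:ds00 -p 192.168.1.6 --login
After login, the LUN shows up as an ordinary local block device.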

Taking the latter route, my guess is that the clusters must run the same, or at least similar, cluster stacks.  To avoid unnecessary suffering, let's go with that.  All my hosts will need to migrate over to Ubuntu Server 12.04.

Let the fun begin!





