For starters, I dropped back to Ubuntu 11.10. It's a nice intermediate step between 11.04 (with its broken iscsitarget-dkms build) and 12.04 (with its endless CMAN frustration). Basically, on 12.04 with CMAN running, putting a node in standby was a death sentence for that node. It seems to be related to the DLM, but I haven't tested enough to confirm that the DLM alone is the cause. I couldn't find anything in the forums to help, and I don't feel like registering for accounts just to report what may be my own stupidity, so falling back to 11.10 was a good compromise.
Why not just go without CMAN in 12.04? I couldn't find the dlm-pcmk package! It's gone... possibly integrated into something else, but for lack of time and/or patience, I did not find it. It might be there, well hidden.
I watched a great video today from the three guys behind the majority of this tech: "High Availability Sprint: from the brink of disaster to the Zen of Pacemaker" (on YouTube). Really cool stuff; you watch a cluster get built before your eyes!
After further trial and error, today I finally managed to build and mount an HA iSCSI file store! What's better? On my two-node cluster, I successfully tested transparent fail-over during catastrophic node failure, while writing to the store. Using wget, I pulled down an Ubuntu ISO (I know, I know... but they're easy to find) and then hammered the cluster a bit. Eventually things got kinda hairy and funky; maybe some 11.10 goodness to be fixed in 12.04? But for the most part, things ran great, and I was pretty brutal with the ups-and-downs of the resources and nodes. Chances are, Corosync just had a rough time catching up.
I did notice something strange: Pacemaker seemed to think nodes were back online even though Corosync was the only thing running on the recovered node.
A few words of caution:
- if your resource isn't starting, and you have constraints (like colocations, orders, etc), try lowering their scores or removing them entirely.
- remember that you have to enable each resource explicitly (with a location constraint) on an asymmetric cluster (symmetric-cluster="false" in the cluster options)
- groups are handy ways to lump things together for location statements (where applicable)
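- use the ( ) syntax in order constraints to make semi-explicit orderings: resources inside the parentheses are not ordered relative to each other, but the set as a whole is still ordered against whatever comes before and after it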
- When using iscsitarget stuff, pick an implementation: iscsitarget or tgt - do NOT install both!
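
To make a few of those cautions concrete, here is a rough sketch of how they might look in the crm shell (crm configure). This is not my actual config: every resource and node name below (p_target, p_lun, p_ip, p_first, p_a, p_b, p_last, g_iscsi, node1, node2) is made up for illustration, and the primitives are assumed to be defined elsewhere. The "inf:" score form in the order statement is the older crm shell syntax; newer versions may prefer a kind such as "Mandatory:".

```
# Hypothetical names throughout; the primitives are assumed to exist already.

# Opt-in (asymmetric) cluster: nothing runs anywhere unless a location
# constraint explicitly allows it.
property symmetric-cluster="false"

# One group, so a single pair of location statements covers all members.
group g_iscsi p_target p_lun p_ip

# Explicitly enable the group on each node; the higher score is preferred.
location l_iscsi_node1 g_iscsi 200: node1
location l_iscsi_node2 g_iscsi 100: node2

# Semi-explicit ordering: p_first starts first, then p_a and p_b in no
# particular order relative to each other, then p_last.
# (On an opt-in cluster these would need location constraints of their own.)
order o_example inf: p_first ( p_a p_b ) p_last
```

If a resource refuses to start with something like this loaded, the first caution applies: drop a score from inf to a finite number, or remove the constraint, and see whether the resource comes up.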