The problem seems non-deterministic, and I haven't pinned down exactly where the failure occurs, but here's the landscape:
We have two database server nodes attached via iSCSI to our little home-made SAN. On each machine we have /etc/iscsi/nodes directories that are chock FULL of various targets. However they got there, they're there, and they're not going away on their own. Reboot as much as you like. Now, something happens... In my case, it was some combination of the SAN going completely down, one of the two attached nodes surviving the outage, and the other requiring a reboot (because its VM image was...well...also on the SAN).
I had learned from prior experience that when the nodes directory is loaded with junk, something fails when Pacemaker tries to reconnect the iSCSI resource. Maybe it's the resource script. Maybe it's open-iscsi. Who knows! And better yet, it's not guaranteed to fail, although I noticed a lot of failures when I was testing the fencing of my nodes. Node would go down, node would come back up, iSCSI would NOT reconnect. Errors galore.
What I do know is that cleaning out the /etc/iscsi/nodes folder on boot tends to make this problem go away, 99.999% guaranteed.
On some clusters I have a shell script called from /etc/rc.local that kills off anything left lingering in the /etc/iscsi/{nodes,send_targets}/ folders. Here's another way - add the following to /etc/fstab:
none /etc/iscsi/nodes ramfs defaults 0 2
none /etc/iscsi/send_targets ramfs defaults 0 2
The contents of these two folders, which appear to be relatively inconsequential (if you're not using any automatic iSCSI targets), will go away on reboot, because a ramfs mount lives only in memory. They don't take up much room anyway, so hopefully a ramdrive is within your budget.
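For comparison, the rc.local cleanup mentioned earlier might look roughly like the sketch below. This is not the exact script from my clusters: the function name clean_iscsi_dirs and the optional base-directory argument are made up for illustration, and it assumes GNU find (which Ubuntu ships).

```shell
#!/bin/sh
# Sketch: empty the open-iscsi nodes and send_targets folders.
# clean_iscsi_dirs is a hypothetical helper name. The base directory
# defaults to /etc/iscsi and can be overridden to try it on a scratch copy.
clean_iscsi_dirs() {
    base="${1:-/etc/iscsi}"
    for dir in "$base/nodes" "$base/send_targets"; do
        # Skip quietly if the folder does not exist on this machine.
        [ -d "$dir" ] || continue
        # Remove everything inside the folder but keep the folder itself.
        find "$dir" -mindepth 1 -delete
    done
}
```

Calling clean_iscsi_dirs with no argument from /etc/rc.local would wipe both folders under /etc/iscsi on every boot. Passing a scratch directory first is a cheap way to check that it does what you expect.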
Applies to Ubuntu 11.10 and 12.04.