20130115

Hot Add/Remove Strangeness

I'm encountering a strange phenomenon.  I was playing with the ZFS add/remove/online/offline functions to get a better feel for how it does its thing.  (On that note, it seems to me that ZFS has to decide for itself that a device is actually BAD before it will initiate replacement with a hot spare.  I can't find a way to force it, so maybe I don't really understand how ZFS views hot spares.  Better to keep some spare devices on hand, I guess.)

I did the following experiment:

  1. Offline a disk via zpool.
  2. Remove said disk by deleting it from the system.
  3. Pop the disk out of the array, then pop it back in so the controller will think it was replaced.
  4. Rescan the SCSI buses, then run udevadm trigger.
  5. If the disk was found, bring it back into the zpool.
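
For anyone wanting to repeat this, the five steps above could be scripted roughly as follows.  This is only a sketch under assumptions: the pool name "tank", the disk name "sdx", and the DRY_RUN switch are placeholders of mine, and deleting the disk through /sys/block/<disk>/device/delete is one common way to do step 2, not necessarily the only one.  With DRY_RUN=1 (the default) the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch of the offline / delete / rescan / online cycle.
# POOL, DISK and DRY_RUN are hypothetical placeholders, not values from the post.
POOL="${POOL:-tank}"
DISK="${DISK:-sdx}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

hot_swap_cycle() {
    # 1. Offline the disk in the pool.
    run "zpool offline $POOL $DISK"
    # 2. Delete the disk from the kernel's view.
    run "echo 1 > /sys/block/$DISK/device/delete"
    # 3. Physical step: pull the disk and reinsert it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "# (pull the disk and reinsert it here)"
    else
        printf 'Pull and reinsert the disk, then press Enter: '
        read -r reply
    fi
    # 4. Rescan every SCSI host, then replay udev events.
    for host in /sys/class/scsi_host/host*; do
        [ -e "$host" ] || continue
        run "echo '- - -' > $host/scan"
    done
    run "udevadm trigger"
    # 5. Only online the disk if it actually came back.
    if [ "$DRY_RUN" = "1" ] || [ -b "/dev/$DISK" ]; then
        run "zpool online $POOL $DISK"
    else
        echo "disk /dev/$DISK not found after rescan" >&2
        return 1
    fi
}
```

Calling hot_swap_cycle with DRY_RUN=1 just lists what would be run.  Note that a re-detected disk may come back under a different name, so a real run would be safer using a persistent /dev/disk/by-id name in the pool.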

What I found interesting was that the device was not always fully attached to the system.  Specifically, when searching for the device directory under /sys (find /sys -iname "6:1:5:0" in this case), I would normally see three entries:

  • /sys/class/scsi_device/6:1:5:0
  • /sys/class/bsg/6:1:5:0
  • /sys/class/scsi_disk/6:1:5:0

Occasionally only the first two would appear.  With the third missing, the device never showed up to the kernel as a disk; the only trace was a report in the log that the "scsi generic" device was added.  There would be no drive letter assigned, no report on its write-caching, etc.  Feels like a race condition.
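
A quick way to spot the half-attached state is to test for all three entries directly rather than eyeballing find output.  The sketch below assumes the entries live under /sys/class, as they do on the kernels I know of; SYSFS_ROOT is a hypothetical override that only exists so the function can be pointed at a test tree.

```shell
#!/bin/sh
# Report which of the three per-device sysfs class entries exist for a
# SCSI address (host:channel:target:lun).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

check_scsi_sysfs() {
    addr="$1"
    missing=0
    for class in scsi_device bsg scsi_disk; do
        if [ -e "$SYSFS_ROOT/class/$class/$addr" ]; then
            echo "present: $class/$addr"
        else
            echo "MISSING: $class/$addr"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when all three entries were found.
    [ "$missing" -eq 0 ]
}
```

For example, "check_scsi_sysfs 6:1:5:0" returns non-zero and prints a MISSING line for scsi_disk when the device is in the state described above.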

To get the device to appear, you can issue an "echo 1 > /sys/class/scsi_device/6\:1\:5\:0/device/delete" and then rescan the buses AGAIN.  It should find it.  Or not.  Race condition...yes.... ;-)
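
Since the workaround may need more than one attempt, it can be wrapped in a retry loop.  This is a sketch under assumptions, not a fix for the underlying race: the /sys/class paths, the SYSFS_ROOT override, the SETTLE_SECS delay, and the default of three tries are all my own choices.

```shell
#!/bin/sh
# Retry the delete-and-rescan workaround until the scsi_disk entry shows up.
# SYSFS_ROOT and SETTLE_SECS are hypothetical knobs, not from the post.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
SETTLE_SECS="${SETTLE_SECS:-2}"

reattach_disk() {
    addr="$1"
    tries="${2:-3}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Done as soon as the disk side of the device exists.
        if [ -e "$SYSFS_ROOT/class/scsi_disk/$addr" ]; then
            return 0
        fi
        # Drop the half-attached device, if it is there.
        del="$SYSFS_ROOT/class/scsi_device/$addr/device/delete"
        if [ -w "$del" ]; then
            echo 1 > "$del"
        fi
        # Ask every SCSI host to rescan all channels, targets and LUNs.
        for scan in "$SYSFS_ROOT"/class/scsi_host/host*/scan; do
            if [ -w "$scan" ]; then
                echo "- - -" > "$scan"
            fi
        done
        # Give the kernel a moment to finish probing.
        sleep "$SETTLE_SECS"
        i=$((i + 1))
    done
    # Final check after the last attempt.
    [ -e "$SYSFS_ROOT/class/scsi_disk/$addr" ]
}
```

Run as root, "reattach_disk 6:1:5:0 5" would try up to five times and return non-zero if the disk never fully attaches.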

I honestly don't know if this is a driver issue, a kernel issue, or a controller issue.  That the kernel SEES the device suggests the controller is not at fault.  What populates the scsi_disk portion of the sys tree?  That may be what is failing here.  I would have to dig deeper to know for certain, but am unsure where in the source to start...

For reference: this is on Ubuntu 12.04.1 LTS, currently running kernel 3.2.0-35-generic x86_64.
