OK, time to gracefully reboot the box again.
I've moved a few users off the machine and increased the timeout for devices connecting to Xen's backend disk device driver. That should eliminate most of the "failed to get domid" error messages. However:
http://lists.xensource.com/archives/htm ... 00170.html
Once the machine gets into this state, no new block devices can be attached. That's clearly (to me, anyway) a bug in Xen's backend block driver, but I suspect the small timeout value (10 seconds) is what pushes the machine into that bad state in the first place.
I'm going to shut down the nodes and reboot the host in a few minutes.
-Chris