Seriously, a few bizarre things happened today - all at the same time! What gives?!
We had a runaway Linode process on host4 that slowed it to a crawl. It just so happened that we had started a handful of migrations from host4 to host8 around the same time. After a while, and a lot of convincing myself that I should restart the Linode in question, I did. That fixed host4, and the Linodes on host4 looked a lot happier.
By that time the migrations (using scp, which was a bad idea in the first place because of performance) were hosed, with only one migration completing. I un-migrated the rest back to where they were. "Something" also caused host8 to either drop off the network or freeze completely. I wasn't able to get console access, so I had to reboot host8.
In the process of getting host8 rebooted, "someone" (*whistle*, looks around innocently) managed to power cycle host6. D'oh!
I'm disappointed because I thought the issue with host4 was hardware related. I'm now starting to believe that it's a software/kernel condition, since both host4 and host8 have had similar freezes. I need to capture an oops. Why haven't I already, you ask?
The one remote-console unit I have at ThePlanet is quirky. It only stays alive for a few minutes at a time and then drops off the network. Baytech wants me to send the bad module in for repair work, but I just bought a bunch of new console units for a new rack I'm building. I'm awaiting shipment of the new console units, and I'm sending two of them up to ThePlanet as replacements.
The host4 issue also caused a few shutdowns to not complete, so I had to follow up on those, as well.
With regard to shutdowns hanging, there seems to be a bug that's been hit about 7 or 8 times over the last few days. The job gets stuck because a call to uml_mconsole (a management utility for UMLs) hangs.
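One general way to keep a job from getting stuck behind a hung external call is to run that call with a timeout. Here's a minimal sketch in Python. This is not how the job system here is actually implemented; the function name run_with_timeout, the 30-second limit, and the UML id "linode123" in the usage note are all made up for illustration.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run an external command, giving up after timeout_s seconds.

    Returns (finished, returncode). If the command did not finish in
    time, it is killed and (False, None) is returned, so the caller can
    flag the job for follow-up instead of blocking forever.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return False, None
    return True, result.returncode
```

Usage would look something like run_with_timeout(["uml_mconsole", "linode123", "halt"], 30). A result of (False, None) means the call hung and the shutdown needs a manual look. One caveat: if the hung process is stuck somewhere it can't be killed, the timeout alone won't rescue it.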
I'm also questioning the stability of the UML kernels I released. I now know (after the fact) that there were bugs in 2.4.22-1um and -2um. Hopefully 2.4.22-3um (linode9) is working fine. Just so you guys don't think I'm a total flake, I do perform a kernel compile inside the new UML kernels as a test. Lesson learned: If it ain't broke, don't fix it. I'll keep making newer kernels available, but I won't be pointing "Latest 2.4 Kernel" to anything that you guys haven't approved of first.
Our 2.4.21 kernel has uptimes of months (and counting).
I won't touch anything else today, I promise.
-Chris