Quote:
The load average displayed by uptime and top is for your node only, not the host. This means that if the average is high and your node is sluggish then *you* are the one killing performance.
That is not necessarily true, as can be evidenced on a physical machine. Let's say:
Your hard drive begins to fail
The kernel is repeatedly timing out IO requests and retrying them
A process (let's say ls) tries to read /home/you
It blocks until the kernel reads the block
Another process (now we pick vi) tries to write /tmp/lolcats.txt
That blocks until the kernel finishes with ls and retries the write
If that lasts long enough your load is going to go up to 2. Now most people would agree that 2 is not optimal. I think most people would also say that running ls and vi at the same time is not *me* killing performance. Now add in all the normal IO that happens on a healthy system...
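If you want to check this on your own node, here is a rough sketch in Python. It relies on the fact that the Linux load average counts tasks in uninterruptible sleep (state D, usually waiting on IO) as well as runnable ones (state R). The function names are my own, and it counts processes rather than individual threads, so treat the numbers as a hint and not an exact breakdown of the load figure.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_state(stat_text):
    """Return the one-letter state from /proc/<pid>/stat text.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so we split after the last ')'.
    """
    return stat_text[stat_text.rindex(")") + 1:].split()[0]


def count_states(proc_root="/proc"):
    """Count processes by state letter (R = runnable, D = blocked, usually on IO)."""
    counts = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                state = parse_state(f.read())
        except OSError:
            continue  # process exited while we were looking
        counts[state] = counts.get(state, 0) + 1
    return counts


# Linux only: skip quietly if /proc is not available.
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load averages:", parse_loadavg(f.read()))
    states = count_states()
    print("runnable (R):", states.get("R", 0))
    print("uninterruptible sleep (D):", states.get("D", 0))
```

If the load is high while the D count is high and the R count is near zero, your processes are mostly waiting on the disk rather than burning CPU, which is what the ls and vi scenario above looks like.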
Linode staff were all over this and had an answer for us yesterday: someone on newark21 was thrashing/grinding their disk, and that caused IO access on the other VMs to tank. So all of our loads went up.
For posterity, I'll also mention that they are monitoring this sort of thing now to prevent it from becoming a problem in the future. Gotta love Linode, these guys must never sleep!