Guspaz wrote:
I've not had that problem, but I've had something of a different one. My graphs are all scaled to 400%, but idle has always been nan. Not that it matters.
The graphs scale to 400% because a Linode "sees" 4 processors, so the maximum CPU figures (busy, idle) will be up to 400% given how Linux measures such things. But on the paravirt kernels, without the idle time fix, idle will measure up to the full 800% (8 processor host) and so extends beyond the rest of the processor metrics. Since the graph limits are set based on the visible CPUs, you pretty much end up unable to see the top of the idle bar varying, since it's above the graph.
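To make the 400% vs. 800% arithmetic concrete, here's a rough Python sketch of how I understand the percentages fall out of the counters on the "cpu" line of /proc/stat. It assumes the counters tick at 100 per second (the usual USER_HZ), and the function names and the sample numbers are made up purely for illustration. It is not munin's actual plugin code.

```python
# Sketch only: turn two samples of the /proc/stat "cpu" line into percentages.
# Assumption: counters are in units of 1/USER_HZ seconds, with USER_HZ = 100.

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line into a dict of counter values."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not the aggregate cpu line: %r" % line)
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after, interval_s, user_hz=100):
    """Per-field rate between two samples, expressed in percent of one CPU."""
    return {
        name: (after[name] - before[name]) / interval_s * 100.0 / user_hz
        for name in before
    }


def graph_limit(visible_cpus):
    """Upper limit of the graph when it is based on the visible CPU count."""
    return visible_cpus * 100.0


# Hypothetical samples taken 300 seconds apart on a guest that sees 4 CPUs
# but whose idle counter advances as if it were counting 8 host CPUs.
sample_a = parse_cpu_line("cpu 1000 0 500 100000 0 0 0 0")
sample_b = parse_cpu_line("cpu 4000 0 2000 334000 0 0 0 0")
pct = cpu_percentages(sample_a, sample_b, interval_s=300)
idle_off_graph = pct["idle"] > graph_limit(4)
```

With those invented numbers, idle works out well above the 400% limit derived from the 4 visible CPUs, which is the "top of the bar is off the graph" effect described above.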
With that said, I've definitely had other problems with the idle metric in general with munin, and in fact one of my paravirt kernels, after having just fine stats for a few months, suddenly switched idle to nan. More commonly though, and with any kernel, I hit cases where the idle metric shows some enormous (like 80 digit) value. I suspect, but have yet to track down, that it's due to an overflow in some calculation being performed along the way, but once present it sort of infects all the graphs, since you've got that bad (measured or calculated) data point for all time. It's possible the nan reading is a similar calculation overflow.
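For what it's worth, this is the kind of sanity check I have in mind for spotting such points. It's a small Python sketch that flags a CPU reading as implausible if it's nan, infinite, negative, or beyond what the visible CPUs could produce. The function name and the 5% slack are my own invention, not anything munin or rrdtool provides.

```python
import math


def is_plausible_cpu_value(value, visible_cpus, slack=1.05):
    """Return True if a CPU percentage reading looks sane.

    Hypothetical helper: the ceiling is visible_cpus * 100%, with a little
    slack (an arbitrary 5%) to allow for sampling jitter.
    """
    if value is None or math.isnan(value) or math.isinf(value):
        return False
    return 0.0 <= value <= visible_cpus * 100.0 * slack


# Illustrative readings: a normal value, a nan, and an 80 digit monster.
readings = [250.0, float("nan"), 1e80]
flags = [is_plausible_cpu_value(r, visible_cpus=4) for r in readings]
```

A check like this only tells you a bad point exists. It doesn't explain where the bad point came from.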
So far my only recourse has been to reset data collection and the rrd files. I've had no luck searching for help on this behavior; given how frequently it happens to me, I'd have thought I could find more references.
Now this is with Ubuntu 8.04 LTS, where the repository munin (and probably rrdtool) isn't the latest and greatest, so I figured I'd try a locally built version at some point. Other metrics all seem to be fine, so for now I just get a bit annoyed when viewing the CPU graphs on some nodes...
-- David