OHEC:
OBSERVE linode seems unresponsive at times
HYPOTHESIZE my UML isn't getting CPU
EXPERIMENT I ran the following program on my linode under screen to prevent network blocking from being an issue:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Convert a timeval to milliseconds. */
unsigned long
tv2ms(struct timeval tv)
{
    return (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
}

int
main(void)
{
    unsigned long ms, old_ms;
    struct timeval tv;

    gettimeofday(&tv, NULL);
    old_ms = tv2ms(tv);
    while (1) {
        gettimeofday(&tv, NULL);
        ms = tv2ms(tv);
        if (ms < old_ms) {
            /* clock appears to have gone backwards; ignore truncation bug... */
            printf("burp: %lu\n", old_ms - ms);
        } else if (old_ms + 1000 < ms) {
            /* the 500 ms sleep took more than a second of wall-clock time */
            printf("timeout: %lu\n", ms - old_ms);
            fflush(stdout); /* keep our line ahead of the output of date */
            system("date");
        }
        old_ms = ms;
        usleep(500000);
    }
    return 0;
}
After a few hours it caught little blips where I'd lose CPU for just a couple of seconds, which occasionally corresponded to load on our own server (apt-get, for example)... no big deal. But then I caught a period of about 10 minutes where I was losing CPU, and it peaked at almost 30 seconds (the timeout values below are in milliseconds):
timeout: 3590
Sun Dec 7 17:31:30 EST 2003
timeout: 3630
Sun Dec 7 17:32:01 EST 2003
timeout: 5780
Sun Dec 7 17:32:47 EST 2003
timeout: 6770
Sun Dec 7 17:34:30 EST 2003
timeout: 3390
Sun Dec 7 17:34:35 EST 2003
timeout: 27230
Sun Dec 7 17:35:02 EST 2003
timeout: 2600
Sun Dec 7 17:36:22 EST 2003
timeout: 1440
Sun Dec 7 17:38:25 EST 2003
So I'm pretty sure we're running into scheduling trouble... but I'm just posting this for discussion.
But I'm optimistic about this kernel upgrade. After the upgrade I'll run the same test again, this time with realtime scheduling inside our UML, and I'll also add a ping of the 64.62.190.1 gateway to see whether there is independent networking flakiness. Will update.
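For anyone who wants to try the follow-up test in the meantime, here is a rough sketch of the extra pieces it might need. Nothing here has been run on a linode. The helper names (elapsed_ms, request_realtime, gateway_reachable) are made up for illustration, the choice of SCHED_FIFO at minimum priority is an assumption, and the ping flags (-c for count, -W for a timeout in seconds) assume the Linux iputils ping. elapsed_ms does its arithmetic in 64 bits, which should sidestep the truncation issue noted in the program above.

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* Milliseconds from *start to *end. The arithmetic is done in
 * long long so it cannot wrap where long is only 32 bits. */
long long
elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    long long sec = (long long)end->tv_sec - (long long)start->tv_sec;
    long long usec = (long long)end->tv_usec - (long long)start->tv_usec;

    return (sec * 1000000LL + usec) / 1000LL;
}

/* Ask for realtime (SCHED_FIFO) scheduling for this process at the
 * lowest realtime priority (an assumption; any valid priority works).
 * Returns 0 on success, -1 on failure. Normally requires root. */
int
request_realtime(void)
{
    struct sched_param sp;
    int prio = sched_get_priority_min(SCHED_FIFO);

    if (prio == -1)
        return -1;
    sp.sched_priority = prio;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* Send one ping to ip with a 2 second timeout.
 * Returns 1 if a reply came back, 0 if not, -1 if the command
 * could not be built. ip must be a trusted constant, since it is
 * passed to the shell unescaped. */
int
gateway_reachable(const char *ip)
{
    char cmd[128];
    int n;

    assert(ip != NULL);
    n = snprintf(cmd, sizeof cmd, "ping -c 1 -W 2 %s >/dev/null 2>&1", ip);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    return system(cmd) == 0 ? 1 : 0;
}
```

The idea would be to call request_realtime() once at startup (and print a warning if it fails), then call gateway_reachable("64.62.190.1") whenever a timeout is logged. If the ping succeeds while the timer shows a stall, that points at CPU scheduling rather than the network.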
- Greg