Linode Forum
Linode Community Forums
Posted: Tue Feb 24, 2004 3:27 pm
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
Hi all. I started keeping track of the load on my Linode last week when the system was being unresponsive. I've accumulated a week's worth of load data and it's rather interesting.

The way this works is, I run a shell script that appends a line with the date and the current "uptime" output to a file, then goes to sleep for 15 seconds, then repeats. Thus I have accumulated load information for my Linode every 15 seconds for a week. I wrote a script which plots this data using gnuplot. Here is the result:

Image

My system is almost completely unloaded; on its own, I would never expect its load to go over 1, and certainly never 2. I would attribute all of the spikes to activity of other Linodes on the host system.

It's nice to see that the uptime graph accurately reflects the load that occurs on host5 every night at about 1:20 am (note the consistent spikes to load 3 or 4 at this time). What is also interesting is that last night's load was quite low - will we see an improvement in this hotspot in the future? Only time will tell ...

Also notice the spikes early last week. Holy crap, I've never seen a load of 18+ on a Linux box before! I started keeping this information around the time of that first load 4 because I was having some performance problems and these are readily reflected in the big spikes in the graph over the next day or so.

But it's really quieted down since then and is more like what I have traditionally found to be the performance on host5 - quite good 95% of the time, but with spikes late at night when everyone's "updatedb" cron jobs run ...

BTW, I know that there are more elegant ways to accumulate this data than my stupid little script (MRTG/RRDtool?), but this took me all of 5 minutes to hack together, and I haven't set aside a block of time yet to install/configure those other tools ... pointers would be most appreciated!
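
For anyone who wants to reproduce the logging loop described above, here is a minimal sketch. It is not the original shell script: it assumes Python is available and reads the load averages with os.getloadavg() (the same three numbers "uptime" prints) instead of parsing "uptime" output. The file name load.log and the function names are made up for illustration.

```python
import os
import time
from datetime import datetime


def format_sample(now, load1, load5, load15):
    """Return one log line: ISO timestamp followed by the three load averages."""
    return f"{now.isoformat(timespec='seconds')} {load1:.2f} {load5:.2f} {load15:.2f}"


def parse_sample(line):
    """Split a log line back into (timestamp, 1-minute load) for plotting."""
    stamp, load1, _load5, _load15 = line.split()
    return datetime.fromisoformat(stamp), float(load1)


def log_forever(path="load.log", interval=15):
    """Append one sample to `path` every `interval` seconds until interrupted."""
    while True:
        line = format_sample(datetime.now(), *os.getloadavg())
        with open(path, "a") as log:
            log.write(line + "\n")
        time.sleep(interval)
```

Calling log_forever() starts the 15-second loop; the resulting file has one whitespace-separated sample per line, which gnuplot can read directly once it is told the time format.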


Posted: Tue Feb 24, 2004 3:38 pm
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
Interesting!

A little history on this topic:

All of the cron jobs (mostly just updatedb and makewhatis and other "not-really-worth-it" jobs) were left at their default times when I first created the distro templates (used by the distro wizard). The first few hosts that were deployed, and the customers who are on them, deployed their Linux installs with the cron jobs running at the same time.

After realizing this, I modified the majority of the template distros and moved the cron jobs to weekly. So now it just gets hammered on Sundays. The two biggest problem hosts are host3 (Linode 128) and host5 (Linode 64). The hosts that were added later don't seem to exhibit this problem at all.

The only reason an "idle" Linode's loadavg goes up is processes blocked (waiting) for disk access. Each process waiting for the disk adds 1 to loadavg.
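
To see this on a running system: the Linux load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D), and D is usually a disk wait. The sketch below counts both by reading /proc/<pid>/stat. It is an illustration only, assuming a Linux /proc filesystem and Python; the function names are made up.

```python
import glob


def state_from_stat(stat_text):
    """Extract the one-letter process state from the text of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_by_state(stat_texts):
    """Count runnable ('R') and uninterruptible ('D') processes."""
    counts = {"R": 0, "D": 0}
    for text in stat_texts:
        state = state_from_stat(text)
        if state in counts:
            counts[state] += 1
    return counts


def read_all_stats():
    """Read /proc/<pid>/stat for every process that still exists (Linux only)."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                texts.append(f.read())
        except OSError:
            pass  # the process exited between listing and reading
    return texts
```

Running count_by_state(read_all_stats()) during one of the nightly spikes should show the "D" count rising while "R" stays near zero, if the disk-wait explanation holds.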

I don't really like messing with people's filesystems, but I've considered a script which edits the FS the next time the Linode is rebooted. Other options include sending an email to those on host3 and host5 with a few commands they can run to lighten the load.

The biggest reason why I'm pushing 2.6 on the hosts is its fairer I/O scheduler. Still, though, running updatedb, etc. and sucking up disk bandwidth is wasteful.

I'm open to suggestions.

-Chris


Posted: Tue Feb 24, 2004 4:47 pm
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
Chris,

Thank you for your response!

Three things:

1. I think you should send an email out to customers on host3 and host5, rather than modifying people's filesystems without their knowledge. I think that a round of emails just letting people know how they can benefit from changing their cron job times would be sufficient to solve most of the problem (after all, it's for their own good too - their own updatedb will run faster at a time when the Linode host is not loaded down).

2. What about "randomizing" the cron times on a disk image before deploying it for a particular Linode? I imagine that right now, when a user selects the deployment of a particular distribution, the host just copies the filesystem over into their UML partition "file" and then resizes the filesystem. What about adding a step where the filesystem is mounted and the cron times are "randomized" - you could just have a script that opens a filesystem, and writes a randomized /etc/crontab out into it. By "randomized" I mean that daily scripts are run at a random time between say 2 and 4 am EST, weeklies a random time on either Saturday or Sunday between 4 and 6 am, etc.

3. I think that if point 2 were done, then updatedb and the other jobs that normally run daily should be moved back to daily.
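
A rough sketch of the "randomizing" in point 2, in Python. This is not anything Linode actually runs; it assumes an /etc/crontab that starts the daily and weekly jobs through run-parts as root (common on many distros, but the exact lines vary), and the function name is made up. The hours are whatever the Linode's local clock says, so the "EST" part depends on the image's timezone.

```python
import random


def randomized_crontab(rng):
    """Return /etc/crontab lines with start times drawn from `rng`.

    Daily jobs land between 02:00 and 03:59; weekly jobs land on Saturday (6)
    or Sunday (0) between 04:00 and 05:59, as suggested in point 2.
    """
    daily_minute, daily_hour = rng.randrange(60), rng.randrange(2, 4)
    weekly_minute, weekly_hour = rng.randrange(60), rng.randrange(4, 6)
    weekly_day = rng.choice([6, 0])
    return [
        f"{daily_minute} {daily_hour} * * * root run-parts /etc/cron.daily",
        f"{weekly_minute} {weekly_hour} * * {weekly_day} root run-parts /etc/cron.weekly",
    ]
```

The deployment step would call randomized_crontab(random.Random()) once per new disk image and write the returned lines into the mounted filesystem's /etc/crontab, so no two Linodes are likely to start at the same minute.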

I hope that my graph demonstrates that for 95% of the time, Linode performance is really awesome. It's just those predictable spikes that I'd like to see if we can do something about, and I appreciate your enthusiasm in this endeavor!

Best wishes,
Bryan


Posted: Tue Mar 02, 2004 2:54 pm
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
Here's this week's graph:

Image

What's very interesting is that the nightly spike increases in severity linearly up to a maximum on Feb. 27 (Friday), and then decreases linearly from there. Very strange.

What's the status on addressing this issue? Have emails been sent out to host5 Linode owners asking them to change their cron times?

(Edited to change graph to use the same scale as the previous graph; I'll use 0 - 20 as my load scale from now on so that all graphs can be easily compared)


Posted: Tue Mar 02, 2004 5:41 pm
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
bji wrote:
What's the status on addressing this issue? Have emails been sent out to host5 Linode owners asking them to change their cron times?

Not yet. What I need to do is go through each distro and figure out which files to move and to where. Once I have a set of instructions, I'll send out the emails.

-Chris


Post subject: This week's graph
Posted: Tue Mar 09, 2004 8:59 pm
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
Still no improvement.

Image


Post subject: Re: This week's graph
Posted: Wed Mar 10, 2004 12:19 am
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
bji wrote:
Still no improvement.

Image


Whoops, I didn't realize that I have to keep those images hosted on my server in order for them to show up in this forum. I lost the most recent graph because it wasn't backed up, but I restored the other graphs. I lost about half a day's worth of data at the end of the most recent graph too ... my backups were only up to last night ... sorry :(


Posted: Sat Mar 13, 2004 7:23 am
Senior Newbie

Joined: Sat Mar 13, 2004 7:18 am
Posts: 8
Those results seem very weird. What could be causing the nightly load to be symmetrical like that?

-Mike


Posted: Sat Mar 13, 2004 12:46 pm
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
myrealbox wrote:
Those results seem very weird. What could be causing the nightly load to be symmetrical like that?

-Mike


Didn't you read the posts above? It's caused by everyone's "updatedb" cron jobs running at the same time; this cron job puts a heavy disk burden on the Linode, and a bunch of them at once is really bad for the whole system.

The very best solution would be a kernel which somehow fairly allocated disk bandwidth, so that no one would ever "starve" for disk I/O like this.

A secondary solution would be to change the cron times so that they are staggered instead of everyone running them at the same time. I have changed my Linode's cron time, but the majority of people on host5 seem to be oblivious to this problem and have not done so.


Posted: Sat Mar 13, 2004 3:57 pm
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
bji wrote:
What's the status on addressing this issue? Have emails been sent out to host5 Linode owners asking them to change their cron times?

Emails sent to host3 and host5 members. I've asked people to ack back when they make a change -- we'll see how much of a difference it makes. Of course, tonight/tomorrow morning is cron.weekly day, so might have to wait a few days to see.

Looking forward to your graphs after people make these changes...

Also, I'm working on the host-reboot-to-2.6 schedule. So look for that in the next week or two. 2.6 has been running great on host18 and host19. This is big! :)

-Chris


Posted: Sun Mar 14, 2004 12:37 am
Senior Newbie

Joined: Sat Mar 13, 2004 7:18 am
Posts: 8
bji wrote:
Didn't you read the posts above? It's caused by everyone's "updatedb" cron jobs running at the same time; this cron job puts a heavy disk burden on the Linode, and a bunch of them at once is really bad for the whole system.


But as far as I can see, this does not explain why the load is cyclic and always symmetrical about a particular, but differing, day of the week.

-Mike


Posted: Sun Mar 14, 2004 2:18 am
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
myrealbox wrote:
But as far as I can see, this does not explain why the load is cyclic and always symmetrical about a particular, but differing, day of the week.

-Mike


Ah. Yes, that is an interesting question. Sorry, I didn't understand what you had meant before. I am looking forward to seeing people change their cron times after Chris' email (I hope he asked people to randomize their cron minutes and possibly hours rather than just moving stuff to cron.weekly). I hope that we never have to figure out why the graphs look like that :) ...


Post subject: This week's graph
Posted: Tue Mar 16, 2004 11:54 pm
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
Here it is:

Image

It's hard to tell if there has been any improvement since Caker's email went out. The spikes are small but there were periods of smaller spikes in previous graphs as well. I hope to see them getting even smaller next week :) ...


Post subject: Re: This week's graph
Posted: Wed Mar 17, 2004 1:07 am
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
bji wrote:
It's hard to tell if there has been any improvement since Caker's email went out. The spikes are small but there were periods of smaller spikes in previous graphs as well. I hope to see them getting even smaller next week :) ...

Good. We'll get a good sample over the next week or so before host5 is rebooted onto 2.6.

-Chris


Post subject: A definite improvement
Posted: Tue Mar 23, 2004 5:38 pm
Senior Member

Joined: Thu Aug 28, 2003 12:57 am
Posts: 273
There has been a definite improvement; this week's peak spike is small. The spikes are still regular but they are definitely getting smaller overall. Will a 2.6 kernel for host5 help? Let's keep our fingers crossed ...

Image

