Linode Forum
Linode Community Forums
 Post subject: More CPU / IO spikes..
PostPosted: Wed May 11, 2011 7:34 am 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
I've seen a lot of people with the same problem on this forum. My server had been running perfectly since January, then it suddenly started crashing last month. So far, however, I haven't read about a fix, if there is one.

The CPU suddenly spikes to 400% and IO goes through the roof. It happens about once a day now, and the server stops responding to SSH or Lish.

I've tried to change everything that I thought might cause this spike, but nothing works.

It's a Linode 1024.

I changed the MySQL settings to the ones listed here:

http://library.linode.com/troubleshooti ... networking

In apache2.conf I have these settings for mpm_prefork_module (if that's the right one):

StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 50
MaxRequestsPerChild 500

I'm running a Drupal 7 site with Piwik, and normally the server hardly ever goes over 20% CPU usage in the Linode graph. Pages load fast, as far as I can tell, and it doesn't seem to have a problem loading a lot of pages at once.

The easiest fix would be to have it reboot after spiking (if I'm sleeping), but I'm not sure if that's possible. The best fix would be to not have the spikes at all, obviously.. :) Do I need a bigger server?


 Post subject:
PostPosted: Wed May 11, 2011 7:49 am 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
MaxClients 50 is way too high. Even if you could allocate all 1024MB to Apache, that would be about 20MB per Apache process, which Drupal can easily chew up. Set it to 10, restart Apache, and see if it starts working; if it does, you can gradually increase it.
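As a rough sketch of that arithmetic in Python: the helper name, the 300MB reserved for MySQL and the OS, and the 60MB per Drupal worker are all made-up assumptions for illustration, not measurements from this node.

```python
def max_clients(total_mb, reserved_mb, per_process_mb):
    """Largest number of Apache workers that fits in the memory left over
    after reserving some for everything else (MySQL, the OS, caches)."""
    available = total_mb - reserved_mb
    if available <= 0 or per_process_mb <= 0:
        return 0
    return int(available // per_process_mb)


# With nothing reserved at all, 50 workers on a 1024MB node get ~20MB each.
print(1024 / 50)

# Hypothetical figures: 300MB reserved, 60MB per Drupal worker.
print(max_clients(1024, 300, 60))
```

Plug in your own measured per-process size; the point is only that MaxClients times the per-process size, plus everything else on the box, has to stay under the node's RAM.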

_________________
Paid support
How to ask for help
1. Give details of your problem
2. Post any errors
3. Post relevant logs.
4. Don't hide details i.e. your domain, it just makes things harder
5. Be polite or you'll be eaten by a grue


 Post subject:
PostPosted: Wed May 11, 2011 8:28 am 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
Ok, thanks obs! I'll try and report back..


 Post subject:
PostPosted: Wed May 11, 2011 2:01 pm 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
Lassie has rebooted the node about 8 times in the 7 hours since I changed this. Can it be related?


 Post subject:
PostPosted: Wed May 11, 2011 2:18 pm 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
Unlikely. Check the contents of /var/log/messages to see if it gives a reason for your Linode shutting down.



 Post subject:
PostPosted: Wed May 11, 2011 2:47 pm 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
Seems like it's just the boot sequence over and over in /var/log/messages, nothing in between..


 Post subject:
PostPosted: Wed May 11, 2011 4:14 pm 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
Anything else I can check? Lassie keeps rebooting it now. As if since I changed the apache config file, it can't handle some requests anymore. Like when I hit the Piwik dashboard, it crashes the server sometimes..


 Post subject:
PostPosted: Wed May 11, 2011 4:23 pm 
Offline
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
Cheek wrote:
Anything else I can check? Lassie keeps rebooting it now. As if since I changed the apache config file, it can't handle some requests anymore. Like when I hit the Piwik dashboard, it crashes the server sometimes..

Check your console via LISH to see what's been logged at the point of crash. OOM errors can turn into kernel panics if memory can't be freed fast enough, in which case continuing to tune your configuration will also resolve the crashes.

Or, probably less likely: there have also been one or two threads recently about crashes with a recent 2.6 kernel (I can't remember the specifics), where either switching kernels or, in some cases, raising a kernel memory parameter helped (so that is also related to available memory). I found one of the threads: see viewtopic.php?t=6952, for example, if your console crash information matches it.

-- David


 Post subject:
PostPosted: Wed May 11, 2011 4:37 pm 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
I've got this from logview:

Code:
Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled

Pid: 2451, comm: apache2 Not tainted 2.6.38.3-linode32 #1
Call Trace:
 [<c063cfbf>] ? panic+0x57/0x13e
 [<c0181414>] ? out_of_memory+0x2c4/0x2f0
 [<c018488c>] ? __alloc_pages_nodemask+0x65c/0x670
 [<c01862bd>] ? __do_page_cache_readahead+0xed/0x230
 [<c018641e>] ? ra_submit+0x1e/0x30
 [<c017eeae>] ? filemap_fault+0x34e/0x420
 [<c01941c6>] ? __do_fault+0x56/0x520
 [<c0184325>] ? __alloc_pages_nodemask+0xf5/0x670
 [<c019529b>] ? handle_pte_fault+0x9b/0xac0
 [<c0196f61>] ? handle_mm_fault+0x101/0x1a0
 [<c011e81b>] ? do_page_fault+0xfb/0x3e0
 [<c016f218>] ? compat_irq_eoi+0x8/0x10
 [<c016fa7c>] ? handle_fasteoi_irq+0x8c/0xe0
 [<c0105b37>] ? xen_force_evtchn_callback+0x17/0x30
 [<c0138070>] ? __do_softirq+0x0/0x130
 [<c0106314>] ? check_events+0x8/0xc
 [<c010630b>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c010af92>] ? do_softirq+0x42/0xb0
 [<c011e720>] ? do_page_fault+0x0/0x3e0
 [<c063fea6>] ? error_code+0x5a/0x60
 [<c011e720>] ? do_page_fault+0x0/0x3e0


So it's not the kernel?


 Post subject:
PostPosted: Wed May 11, 2011 5:03 pm 
Offline
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
Cheek wrote:
So it's not the kernel?

Don't think so - more likely an OOM condition that got so bad the kernel gave up.

You'll need to turn down your Apache configuration (or other processes on your node) so that you can manage peak load (you can stress test with something like ab against a URL that involves your full application stack) without exceeding your available memory. I realize that may slow down your possible request rate, but a higher rate won't help if the entire node crashes...

Once you have your system operating within the available resources, then you can focus on improving performance (which may have to include a larger Linode, but I wouldn't jump to that point initially).

As your first post notes, there are a number of tuning threads on the forum that you may wish to review. (Contrary to your first comment, I think most do in fact cover a "fix", which is tuning your configuration to fit available resources.)

-- David


 Post subject:
PostPosted: Wed May 11, 2011 5:51 pm 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
Thanks for all the advice David.

The weird thing is, it was working fine for months without crashing. It just crashed again with CPU and I/O spiking through the roof.

I'd rather have Lassie rebooting my node, than the CPU spiking. Because that could crash my server for hours when I'm not around.

But anyway, there's not much else running besides apache and mysql. And as I said, I've set the mysql to the values as in the library and MaxClients to just 10, which seems pretty low.

Is there anything else that could cause this? I was really happy with my Linode, but I'm just one guy building a website and can't spend most of the day fixing the server.

I've got 3 options: find a fix, double the Linode or go with something like a Mediatemple DV server. It'll more than double the costs but maybe it's the best option for me?


 Post subject:
PostPosted: Wed May 11, 2011 6:15 pm 
Offline
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
Cheek wrote:
The weird thing is, it was working fine for months without crashing. It just crashed again with CPU and I/O spiking through the roof.

That's not that unusual. Traffic load changes, database performance changes as the data grows, and so on. You may have been very close to exceeding your resources for a while without knowing it. Or something may have bumped the request load on your node up significantly (a link from some site?) without your knowing it.

Quote:
I'd rather have Lassie rebooting my node, than the CPU spiking. Because that could crash my server for hours when I'm not around.

You should at least get an eventual email about the CPU usage exceeding the notification limits (I do). The problem is that depending on kernel configuration, a panic doesn't actually halt the box (the kernel is still running, just in a tight loop, which leads to the CPU usage) so Lassie doesn't consider it down.

There's a kernel parameter (kernel.panic) that you can set to the number of seconds after which the box will reboot itself after a panic. As a general matter there could be some risk of always restarting depending on the cause of the panic, but in a scenario like this it's probably preferable rather than staying in the panic'd state. You can save adjustments to that value in /etc/sysctl.conf.

Quote:
But anyway, there's not much else running besides apache and mysql. And as I said, I've set the mysql to the values as in the library and MaxClients to just 10, which seems pretty low.

Really the only way to know is to test. You may have a stack that is using even more memory than you think, so even MaxClients of 10 may be too much. You need to actually monitor your resource usage under load to identify what you actually use. The other threads cover ways of doing that in far greater detail, but basically you want to observe how much actual memory each Apache process is using when handling requests.
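For what it's worth, here is a minimal Python sketch of that kind of measurement. It assumes a Linux `ps` that accepts `-eo rss,comm` and that the workers show up as `apache2` (as in the panic output earlier in the thread); the function names are my own. Note that RSS counts shared pages once per process, so the total will overstate real usage somewhat.

```python
import subprocess


def rss_by_command(ps_output, command):
    """Return the resident set size in MB of every process named `command`,
    given the text printed by `ps -eo rss,comm` (RSS is reported in KB)."""
    sizes = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        rss_kb, comm = parts
        # The header row ("RSS COMMAND") is skipped because "RSS" is not a number.
        if rss_kb.isdigit() and comm.strip() == command:
            sizes.append(int(rss_kb) / 1024.0)
    return sizes


def summarize(sizes):
    """Count, total and largest process size, all in MB."""
    if not sizes:
        return {"count": 0, "total_mb": 0.0, "max_mb": 0.0}
    return {"count": len(sizes), "total_mb": sum(sizes), "max_mb": max(sizes)}


def sample_live(command="apache2"):
    """Take one sample from the running system."""
    out = subprocess.check_output(["ps", "-eo", "rss,comm"]).decode()
    return summarize(rss_by_command(out, command))
```

Calling `sample_live()` every few seconds while a load test is running gives a rough idea of the largest worker size to plan MaxClients around.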

Quote:
Is there anything else that could cause this? I was really happy with my Linode, but I'm just one guy building a website and can't spend most of the day fixing the server.

Going into an OOM condition? Nope - pretty much means you're using too much memory.

Assuming it's the Apache configuration is an educated guess, but as it's the leading cause of this scenario in just about all cases brought to the forum, it's a good first bet.

Quote:
I've got 3 options: find a fix, double the Linode or go with something like a Mediatemple DV server. It'll more than double the costs but maybe it's the best option for me?

Only you can answer that for yourself. Certainly just throwing more resources at the problem (the bigger Linode - I know nothing about Mediatemple) is "simpler", but I can't say it's guaranteed to solve the problem without your first identifying the root cause. It certainly might put off your having to deal with it until later, though.

For example, let's say that you were still at MaxClients of 50, and your request load used them all, but each Apache process was using 100MB (all extreme values). Just bumping your Linode to a 2048 wouldn't solve the problem, just let you get a few more simultaneous requests before keeling over the same way.

-- David


 Post subject:
PostPosted: Wed May 11, 2011 6:23 pm 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
I guess you are right. I'll check the other threads for tuning methods I haven't tried.

How about nginx?


 Post subject:
PostPosted: Wed May 11, 2011 6:28 pm 
Offline
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
Cheek wrote:
I've got this from logview:

Code:
Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled

Pid: 2451, comm: apache2 Not tainted 2.6.38.3-linode32 #1
Call Trace:

BTW, I just re-read the above from your earlier post while writing my previous response. I hadn't noticed earlier but this looks like you've set the vm.panic_on_oom parameter on your box. This forces an immediate panic on any OOM condition, rather than the regular OOM processing to kill off processes to free memory. I don't believe this is the default configuration, and while killing random processes - the default behavior - isn't necessarily conducive to normal operations, an automatic panic definitely isn't :-)

If you're going to do this, then you do likely want to combine it with a non-zero kernel.panic parameter (as per my other note) which essentially means you want to reboot on OOM. See also http://www.linode.com/wiki/index.php/Rebooting_on_OOM
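To make the interaction between the two parameters concrete, here is a small Python sketch that reads sysctl.conf-style text and describes the resulting behavior. The function names and the sample values are assumptions of mine, not copied from the wiki page, and only positive kernel.panic values are treated as a timed reboot.

```python
def parse_sysctl(text):
    """Parse `key = value` lines in the style of /etc/sysctl.conf,
    skipping blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def oom_behaviour(settings):
    """Describe what happens on an OOM condition; both parameters default to 0."""
    panic_on_oom = int(settings.get("vm.panic_on_oom", 0))
    panic_timeout = int(settings.get("kernel.panic", 0))
    if panic_on_oom == 0:
        return "OOM killer frees memory by killing processes"
    if panic_timeout > 0:
        return "panic on OOM, then reboot after %d seconds" % panic_timeout
    return "panic on OOM with no timed reboot configured"


# Hypothetical sample values for illustration only.
sample = "vm.panic_on_oom = 1\nkernel.panic = 10\n"
print(oom_behaviour(parse_sysctl(sample)))
```

The last case (panic on OOM with kernel.panic left at 0) is the one to avoid, since the box then sits in the panic'd state until something reboots it from outside.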

-- David


 Post subject:
PostPosted: Wed May 11, 2011 6:38 pm 
Offline
Junior Member

Joined: Wed May 11, 2011 7:13 am
Posts: 32
db3l wrote:
Cheek wrote:
I've got this from logview:

Code:
Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled

Pid: 2451, comm: apache2 Not tainted 2.6.38.3-linode32 #1
Call Trace:

BTW, I just re-read the above from your earlier post while writing my previous response. I hadn't noticed earlier but this looks like you've set the vm.panic_on_oom parameter on your box. This forces an immediate panic on any OOM condition, rather than the regular OOM processing to kill off processes to free memory. I don't believe this is the default configuration, and while killing random processes - the default behavior - isn't necessarily conducive to normal operations, an automatic panic definitely isn't :-)

If you're going to do this, then you do likely want to combine it with a non-zero kernel.panic parameter (as per my other note) which essentially means you want to reboot on OOM. See also http://www.linode.com/wiki/index.php/Rebooting_on_OOM

-- David

I actually set those parameters a couple of weeks ago, when my server started crashing while I was asleep, using the same two lines as in that wiki note. Should I unset them?

