Linode Forum
Linode Community Forums
Posted: Fri Aug 05, 2011 5:06 pm
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
The kernel manages swap, so a change isn't surprising. I wouldn't worry, though: using swap can be good. It moves memory belonging to processes that aren't used very often onto disk so that the RAM can be used for more useful things. Bad things happen when you start swapping in and out a lot; your munin graphs will show how much you swap in and out.
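
If you want to watch swap activity between munin samples, something like the following might help. It is only a rough sketch: it assumes a Linux box where /proc/vmstat exposes the pswpin and pswpout counters (pages swapped in and out since boot), and the function names and the 5 second interval are made up for illustration.

```python
import time


def parse_vmstat(text):
    """Pick the pswpin/pswpout counters out of /proc/vmstat-style text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Pages swapped in and out per second between two samples."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in ("pswpin", "pswpout")
    }


def read_vmstat(path="/proc/vmstat"):
    """Read the current counters from the kernel."""
    with open(path) as f:
        return parse_vmstat(f.read())


def watch(samples=12, interval=5):
    """Print swap-in/swap-out rates a few times; both defaults are arbitrary."""
    previous = read_vmstat()
    for _ in range(samples):
        time.sleep(interval)
        current = read_vmstat()
        rates = swap_rates(previous, current, interval)
        print("swap in: %.1f pages/s, swap out: %.1f pages/s"
              % (rates["pswpin"], rates["pswpout"]))
        previous = current
```

Calling watch() prints one line per interval. Numbers that stay near zero mean swap is just holding idle pages; sustained non-zero rates in both directions are the "swapping in and out a lot" case.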

_________________
Paid support
How to ask for help
1. Give details of your problem
2. Post any errors
3. Post relevant logs.
4. Don't hide details i.e. your domain, it just makes things harder
5. Be polite or you'll be eaten by a grue


Posted: Fri Aug 05, 2011 6:19 pm
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
On the nulls in the log file: usually this is because the log file was being written to when the system crashed. Not too unusual.

With regard to munin: it only takes a snapshot every five minutes. If things go awry in under five minutes, munin will look completely normal.

On kernel output: The "logview" command in the lish shell will spit out the console output from the last run, and can be handy for troubleshooting kernel crashes and the like.

If worse comes to worst, SSHing in and leaving 'htop' or 'top' running can give you a bit of a snapshot of the moment before it keels over again!
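
Along the same lines, a small sampler that writes to disk every few seconds can cover the gap between munin's five-minute snapshots. This is a sketch under assumptions, not a tested tool: it reads /proc/loadavg and /proc/meminfo (standard on Linux), while the function names, the chosen fields and the 5 second interval are made up for illustration. Each line is flushed and fsynced so the last samples have a better chance of surviving a hard freeze.

```python
import os
import time

# Fields to pull out of /proc/meminfo; an arbitrary choice for this sketch.
FIELDS = ("MemFree", "SwapFree")


def parse_meminfo(text, fields=FIELDS):
    """Return the selected /proc/meminfo fields as integers (kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            values[name] = int(rest.split()[0])
    return values


def format_sample(timestamp, loadavg_text, meminfo):
    """Build one log line from the 1-minute load average and memory values."""
    load1 = loadavg_text.split()[0]
    mem = " ".join("%s=%dkB" % (key, meminfo[key]) for key in sorted(meminfo))
    return "%s load1=%s %s" % (timestamp, load1, mem)


def log_samples(path, interval=5, count=None):
    """Append a sample every `interval` seconds; count=None means run forever."""
    taken = 0
    with open(path, "a") as log:
        while count is None or taken < count:
            with open("/proc/loadavg") as f:
                loadavg = f.read()
            with open("/proc/meminfo") as f:
                meminfo = parse_meminfo(f.read())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(format_sample(stamp, loadavg, meminfo) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk right away
            taken += 1
            time.sleep(interval)
```

Running log_samples("/root/samples.log") inside screen or similar (the path is just an example) leaves a trail you can read after the next reboot.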

_________________
Code:
/* TODO: need to add signature to posts */


Posted: Tue Aug 09, 2011 8:23 am
Senior Newbie

Joined: Thu Aug 04, 2011 9:26 am
Posts: 15
It happened again: after 4 days of uptime the system crashed. In Lish I was able to view this:
Code:
[<c011f3bf>] ? do_page_fault+0x24f/0x3a0                                                           
 [<c0105c27>] ? xen_force_evtchn_callback+0x17/0x30                                                 
 [<c0106404>] ? check_events+0x8/0xc                                                               
 [<c01063fb>] ? xen_restore_fl_direct_reloc+0x4/0x4                                                 
 [<c011f170>] ? mm_fault_error+0x130/0x130                                                         
 [<c06bfc66>] ? error_code+0x5a/0x60                                                               
 [<c012007b>] ? try_preserve_large_page+0x7b/0x340                                                 
 [<c011f170>] ? mm_fault_error+0x130/0x130                                                         
 [<c01ab8a8>] ? swap_count_continued+0x158/0x180                                                   
 [<c01abe22>] ? __swap_duplicate+0xc2/0x160                                                         
 [<c01abb04>] ? add_swap_count_continuation+0x54/0x130                                             
 [<c01abee4>] ? swap_duplicate+0x14/0x40                                                           
 [<c01a068b>] ? copy_pte_range+0x45b/0x500                                                         
 [<c0106404>] ? check_events+0x8/0xc                                                               
 [<c01a08c5>] ? copy_page_range+0x195/0x200                                                         
 [<c0132756>] ? dup_mmap+0x1c6/0x2c0                                                               
 [<c0132b88>] ? dup_mm+0xa8/0x130                                                                   
 [<c01335fa>] ? copy_process+0x98a/0xb30                                                           
 [<c01337ef>] ? do_fork+0x4f/0x280                                                                 
 [<c010f780>] ? sys_clone+0x30/0x40                                                                 
 [<c06c000d>] ? ptregs_clone+0x15/0x48                                                             
 [<c06bf6f1>] ? syscall_call+0x7/0xb                                                               
 [<c06b0000>] ? sctp_backlog_rcv+0xf0/0x100                                                         
INFO: rcu_sched_state detected stall on CPU 2 (t=60000 jiffies)                                     
INFO: rcu_sched_state detected stall on CPU 1 (t=60000 jiffies)                                     
INFO: rcu_sched_state detected stall on CPU 3 (t=240030 jiffies)                                   
INFO: rcu_sched_state detected stall on CPU 2 (t=240031 jiffies)                                   
INFO: rcu_sched_state detected stall on CPU 1 (t=240031 jiffies)                                   
INFO: rcu_sched_state detected stall on CPU 1 (t=420061 jiffies)                                   
INFO: rcu_sched_state detected stall on CPU 2 (t=420061 jiffies)                                   
INFO: rcu_sched_state detected stall on CPU 1 (t=600091 jiffies)     

Lish was not responsive; I was not able to type anything there. And as usual: no SSH, no web, nothing.

Ideas?


Posted: Tue Aug 09, 2011 8:28 am
Senior Newbie

Joined: Thu Aug 04, 2011 9:26 am
Posts: 15
I just rebooted, and it happened again:
Code:
[<c06bf28d>] ? rwsem_down_failed_common+0x9d/0x110                                                 
 [<c06bf353>] ? call_rwsem_down_read_failed+0x7/0xc                                                 
 [<c06bea6a>] ? down_read+0xa/0x10                                                                 
 [<c01683f5>] ? acct_collect+0x35/0x160                                                             
 [<c0137fbd>] ? do_exit+0x27d/0x350                                                                 
 [<c011f170>] ? mm_fault_error+0x130/0x130                                                         
 [<c010b7e1>] ? oops_end+0x71/0xa0                                                                 
 [<c011ef8f>] ? bad_area_nosemaphore+0xf/0x20                                                       
 [<c011f3bf>] ? do_page_fault+0x24f/0x3a0                                                           
 [<c0105c27>] ? xen_force_evtchn_callback+0x17/0x30                                                 
 [<c0106404>] ? check_events+0x8/0xc                                                               
 [<c01063fb>] ? xen_restore_fl_direct_reloc+0x4/0x4                                                 
 [<c011f170>] ? mm_fault_error+0x130/0x130                                                         
 [<c06bfc66>] ? error_code+0x5a/0x60                                                               
 [<c012007b>] ? try_preserve_large_page+0x7b/0x340                                                 
 [<c011f170>] ? mm_fault_error+0x130/0x130                                                         
 [<c01ab8a8>] ? swap_count_continued+0x158/0x180                                                   
 [<c01abe22>] ? __swap_duplicate+0xc2/0x160                                                         
 [<c01abee4>] ? swap_duplicate+0x14/0x40                                                           
 [<c01a068b>] ? copy_pte_range+0x45b/0x500                                                         
 [<c01a08c5>] ? copy_page_range+0x195/0x200                                                         
 [<c0132756>] ? dup_mmap+0x1c6/0x2c0                                                               
 [<c0132b88>] ? dup_mm+0xa8/0x130                                                                   
 [<c01335fa>] ? copy_process+0x98a/0xb30                                                           
 [<c01337ef>] ? do_fork+0x4f/0x280                                                                 
 [<c06bf395>] ? _raw_spin_lock+0x5/0x10                                                             
 [<c01c2cf0>] ? set_close_on_exec+0x40/0x60                                                         
 [<c01c3804>] ? do_fcntl+0x2c4/0x3b0                                                               
 [<c010f780>] ? sys_clone+0x30/0x40                                                                 
 [<c06c000d>] ? ptregs_clone+0x15/0x48                                                             
 [<c06bf6f1>] ? syscall_call+0x7/0xb               


Posted: Tue Aug 09, 2011 9:02 am
Senior Newbie

Joined: Thu Aug 04, 2011 9:26 am
Posts: 15
I found out why the server kept crashing even after a reboot: multiple MySQL tables had crashed, so MySQL started using 400% CPU, Apache started to back up, and so on. As a result, all the RAM and swap were used up. The tables are repaired and now everything is smooth again.

Most likely the tables crashed when the server froze. However, the question remains: why did it crash in the first place?


Posted: Wed Aug 10, 2011 2:28 am
Senior Newbie

Joined: Sun Feb 05, 2006 1:21 pm
Posts: 10
Location: United Kingdom
zumzum wrote:
I found out why the server kept crashing even after a reboot: multiple MySQL tables had crashed, so MySQL started using 400% CPU, Apache started to back up, and so on. As a result, all the RAM and swap were used up. The tables are repaired and now everything is smooth again.

Most likely the tables crashed when the server froze. However, the question remains: why did it crash in the first place?


This is similar to, if not the same as, my issue (http://forum.linode.com/viewtopic.php?t=7538). I wonder?

_________________
-=L9NUX=-


Posted: Mon Aug 29, 2011 10:07 pm

Joined: Mon Aug 29, 2011 10:04 pm
Posts: 1
Having the exact same issue here. Tried both 3.0.0 and 2.6.39 with no success. Getting these crashes something like once an hour (though not with any regularity or pattern). Did you ever find a solution?


Posted: Tue Aug 30, 2011 2:59 am
Senior Newbie

Joined: Thu Aug 04, 2011 9:26 am
Posts: 15
What I have found on other forums is that the kernel sometimes does this when the system is under heavy load. It need not be CPU load; it can be, for example, I/O load. A few possible reasons: an OOM problem caused by too many Apache processes, or some crashed MySQL tables.

What I did: I found out that some of my tables grow insanely fast, and when they reach ~3GB they start to crash. The server then very quickly starts to swap, and the whole server crashes without any evidence of a problem. I have set up cron jobs to clean up the tables regularly and downgraded to an older kernel, and it has now been 6 days without any problem. I don't know if that solved the problem permanently or I'm just having good luck, but it is working for now.
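
For anyone who wants a warning before a table gets that big, here is a rough sketch of a size check that could run from cron. It rests on assumptions not confirmed in this thread: that the tables are MyISAM (so each table has a .MYD data file and a .MYI index file on disk) and that the script is allowed to read the MySQL data directory. The function names are made up for illustration.

```python
import os


def table_sizes(datadir):
    """Map 'database/table' to the combined size in bytes of its .MYD and .MYI files."""
    sizes = {}
    for root, _dirs, files in os.walk(datadir):
        for filename in files:
            table, ext = os.path.splitext(filename)
            if ext not in (".MYD", ".MYI"):
                continue  # skip .frm files, logs and anything else
            key = os.path.join(os.path.relpath(root, datadir), table)
            size = os.path.getsize(os.path.join(root, filename))
            sizes[key] = sizes.get(key, 0) + size
    return sizes


def oversized_tables(datadir, limit_bytes):
    """Return the names of tables at or above the limit, sorted alphabetically."""
    return sorted(
        name
        for name, size in table_sizes(datadir).items()
        if size >= limit_bytes
    )
```

For example, oversized_tables("/var/lib/mysql", 2 * 1024**3) would list every table over 2 GiB. Both the path (a common default, but check your own setup) and the threshold are only examples; the output could be mailed by cron or used to decide which tables to clean up.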

