Linode Forum
Linode Community Forums
Posted: Sat Dec 05, 2009 12:19 am
Junior Member

Joined: Mon Sep 22, 2008 12:28 am
Posts: 23
I realize that's kind of a pathetic question :-)

I have had a Linode 360 running for about a week as a primary MX for various scattered computers. It is a pretty vanilla, minimal Gentoo installation, with qmail being the main point of its existence. I use the qmail-spp plugin to implement a simple screen against bogus incoming email by verifying RCPT TO addresses.

According to the logs (UTC), on Friday 04 Dec 2009 at about 0420, CPU usage shot up to 107%. It dropped to 100% at about 0530, and I noticed it at about 0745. Disk I/O and network traffic were at zero for the entire period.

I tried ssh but could not get in and used the dash to reboot it. I should have tried LISH but I am new to this and didn't think of it.

My first thought was that my little C plugin had gone into an infinite loop, but I don't see how that could have blocked everything else, including all disk I/O and net traffic. Further incoming port 25 connections would have started a new qmail-smtpd session. Besides, the plugin had been running for at least several hours with no problems.

Does anyone have any ideas on what could make a Linode virtual server go haywire like that, 100% (actually 105.73%!) CPU and zero disk/net?


Posted: Sat Dec 05, 2009 11:33 am
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
That sounds like a kernel panic.

From the lish shell (e.g. ssh to linodexxxxx@citynameyyy.linode.com, then detach with ^A-d), run the "logview" command... this will show you the last ~250 lines from the previous boot, along with the last ~100 lines from the current boot. A kernel panic will be obvious if it's there.


Posted: Sat Dec 05, 2009 11:54 am
Junior Member

Joined: Mon Sep 22, 2008 12:28 am
Posts: 23
I should have mentioned that I checked /var/log/messages and saw ... nothing. The last log entry before hanging is from an iptables rule, the first one after reboot is syslog-ng startup.


Posted: Sat Dec 05, 2009 12:29 pm
Senior Member

Joined: Fri Dec 07, 2007 1:37 am
Posts: 385
Location: NC, USA
hoopycat wrote:
From the lish shell (e.g. ssh to linodexxxxx@citynameyyy.linode.com, then detach with ^A-d), run the "logview" command...

Scarecrow wrote:
I should have mentioned that I checked /var/log/messages

These are not the same thing. If your Linode was OOMing or had panicked, it would not have been able to write to your log files, but it may very well have been able to write an error to the console, which you can see with the lish logview command.
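If the console output is long, a small filter can help pick out the interesting lines. This is only a rough sketch: it assumes you have saved the logview output to a text file yourself, and the two search strings are just common examples of what a panic or OOM looks like on the console, not a complete list. The function name scan_console is made up for this example.

```python
import re

# Strings that commonly show up on the console around a crash.
# Illustrative assumption only; real output varies by kernel version.
PATTERNS = {
    "panic": re.compile(r"Kernel panic", re.IGNORECASE),
    "oom": re.compile(r"Out of memory|oom-killer", re.IGNORECASE),
}


def scan_console(text):
    """Return (line_number, label, line) for each line matching a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
                break  # one label per line is enough
    return hits
```

Calling scan_console on the saved text and printing the result would give the line numbers worth reading first.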


Posted: Sat Dec 05, 2009 2:15 pm
Junior Member

Joined: Mon Sep 22, 2008 12:28 am
Posts: 23
Didn't realize that, but I'll try to remember if it happens again.

Any ideas about what might cause the problem? I don't see how my program could have caused all three symptoms at once.


Posted: Sat Dec 05, 2009 2:59 pm
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
You can still gather the "logview" data from lish now, as long as you've rebooted exactly once since the problem occurred. That will be the quickest way to figure out exactly what happened.

A halted kernel (e.g. one that has panicked while automatic reboot on panic is disabled, i.e. the kernel.panic sysctl is 0) will exhibit all of those symptoms... the question is, what halted the kernel? :-)
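To check how a given box is set up, the setting that controls this on Linux is, as far as I know, the kernel.panic sysctl, exposed as /proc/sys/kernel/panic. Here is a small sketch that reads it and spells out the meaning. The function names are made up for the example, and the interpretation follows the kernel's sysctl documentation (0 means loop forever, positive means reboot after that many seconds, negative means reboot immediately); older kernels may treat negative values differently.

```python
def describe_panic_setting(raw):
    """Explain a value as read from /proc/sys/kernel/panic."""
    seconds = int(raw.strip())
    if seconds == 0:
        return "no automatic reboot: a panicked kernel stays halted"
    if seconds < 0:
        return "reboot immediately after a panic"
    return f"reboot {seconds} seconds after a panic"


def read_panic_setting(path="/proc/sys/kernel/panic"):
    """Read the live setting (Linux only) and describe it."""
    with open(path) as handle:
        return describe_panic_setting(handle.read())
```

A value of 0 would fit the symptoms in this thread: the kernel spins in place, so the CPU stays busy while disk and network go quiet.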


Posted: Sat Dec 05, 2009 5:14 pm
Junior Member

Joined: Mon Sep 22, 2008 12:28 am
Posts: 23
I had two bogus entries in inittab, typos, and init kept trying to respawn them.

Although I don't understand why that would lock up the CPU, since it tries a few times and then says it is waiting 5 minutes. I would think that 5 minutes would have been plenty of time to launch new SMTP connections. Even if tcprules had died and itself could not handle incoming connections, why would sshd not have taken clients? Why would the CPU peg solid rather than for a split second then off for 5 minutes?

And why was it trying to re-init anyway after running for several days since the previous reboot?
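For catching this kind of inittab typo before it bites, something like the following might do. It is a minimal sketch that assumes the standard sysvinit line layout of id:runlevels:action:process, and it only checks that the first word of each respawn command exists on disk. The function names are made up for the example, and lines that do not split into four fields are silently skipped rather than reported.

```python
import os


def parse_inittab(text):
    """Yield (id, runlevels, action, process) for each inittab entry."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The process field may contain colons, so split at most 3 times.
        parts = line.split(":", 3)
        if len(parts) == 4:
            yield tuple(parts)


def suspicious_respawns(text, exists=os.path.exists):
    """Return ids of respawn entries whose executable cannot be found."""
    bad = []
    for entry_id, _runlevels, action, process in parse_inittab(text):
        if action != "respawn":
            continue
        words = process.split()
        # A leading '+' tells init to skip utmp/wtmp accounting; drop it.
        command = words[0].lstrip("+") if words else ""
        if not command or not exists(command):
            bad.append(entry_id)
    return bad
```

Running suspicious_respawns on the contents of /etc/inittab should return an empty list on a healthy system; any id it returns is worth a second look.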

I don't think I have actually found the problem, but I did learn something.

The AJAX console also showed the problem, although lish was easier to use.

Thanks. I like the tools, but I think I will have to wait for it to bork again.

