Linode Forum
Linode Community Forums
PostPosted: Thu Jun 17, 2010 1:53 pm 
Offline
Senior Newbie

Joined: Thu Jun 17, 2010 1:10 pm
Posts: 16
Website: http://www.nerdkits.com/
Hi all,

I'm currently bouncing between kernel versions because over the last few weeks, I haven't found one that is issue-free for me.

2.6.33-linode24: The best of the bunch, but twice now the time has "frozen", stopping all CPU timers, cronjobs, etc. This may be fixed now thanks to some advice from the Linode staff (relates to Xen and clocksources), but it's very hard to tell because it only happened somewhat randomly, and only after 10-15 days of uptime. Probably the most frustrating kind of bug -- intermittent, non-reproducible, and totally fatal! (Nevertheless, this is the kernel that I'm sticking with at the moment.)

2.6.32.12-linode25: Every few hours -- seemingly randomly and uncorrelated to cronjobs or external load -- we'd see a huge spike in load average up to 30-40 or so, just for a few seconds, but enough to set off our monitoring software and for the server to be barely responsive to other tasks for 30 to 60 seconds or so. No noticeable spikes on any of the Linode graphs during these events.

2.6.35-rc3: My latest attempt was to compile a custom kernel, stock from kernel.org, using the .config file from http://linode.com/src/2.6.32.12-linode25.tar.bz2 as my starting point. Thanks to the excellent article "Running Custom Kernels with PV-GRUB", I had no problem compiling the kernel and getting it started.

Everything seems to run great on 2.6.35-rc3, including the web server (lighttpd), database (mysql), e-mail (qmail), asterisk, etc... with one major exception: my Python Django FastCGI processes will run, but will only seem to take one or two requests from lighttpd. After that, lighttpd continues to try to pass them requests (via tcp localhost:3303), but there's no answer. In the lighttpd logs, I get:

"establishing connection failed: Connection timed out socket: tcp:127.0.0.1:3303"

The python processes continue running and don't seem to use any CPU.
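For anyone who wants to check the backend independently of lighttpd, here is a minimal Python sketch that simply attempts a few TCP connects to the FastCGI port and reports how each one ends. The helper name `probe_tcp` and the 3-second timeout are assumptions made for illustration, not part of the setup described above; the host and port are passed on the command line.

```python
import socket
import sys
import time


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connect; return (connected, seconds_taken, error_text)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)


# Example: python probe.py 127.0.0.1 3303
if __name__ == "__main__" and len(sys.argv) == 3:
    host, port = sys.argv[1], int(sys.argv[2])
    for attempt in range(1, 6):
        ok, took, err = probe_tcp(host, port)
        status = "connected" if ok else "failed"
        print(f"attempt {attempt}: {status} in {took:.2f}s {err}")
        time.sleep(1)
```

If the probe times out rather than being refused, that would suggest the listening socket still exists but connections are not being accepted, which would be consistent with the lighttpd log line above. That is only a hint, not a diagnosis.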

Since the userland was *exactly* the same, and only the kernel was changing, my only thought so far was that it might be firewall related (arno-iptables-firewall). However, I tried disabling the firewall entirely, but still had identical results!

Any ideas / clues? Hoping to make the custom kernel work. Why would this one piece out of everything be affected so dramatically by a new kernel? Thanks in advance!

Mike


PostPosted: Thu Jun 17, 2010 2:34 pm 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
Re: 2.6.33-linode24: I had the same issue.

Re: 2.6.32.12-linode25: I had a different issue. For me, nginx under this kernel had page faults.

So what I've done is go the custom kernel route: I'm using Ubuntu 9.10 with the 2.6.31-307-ec2 kernel and I have no problems.

What distro are you using?


PostPosted: Thu Jun 17, 2010 2:43 pm 
Offline
Senior Newbie

Joined: Thu Jun 17, 2010 1:10 pm
Posts: 16
Website: http://www.nerdkits.com/
Hi obs,

Thanks for the reply -- very interesting to hear that you've experienced similar issues with those two kernels (2.6.33-linode24 and 2.6.32.12-linode25). I thought I was going nuts!

Still surprised to see that there are such issues that are so dependent on the kernel version.

I'm using Debian stable (lenny).

I'll have to try a handful of different versions of custom kernels and see if it's specific to this 2.6.35-rc3.

Thanks again,

Mike


PostPosted: Thu Jun 17, 2010 2:56 pm 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
Try using the default Debian kernel, linux-image-2.6.26-2-486 (assuming you're using 32-bit). It might not boot; the Ubuntu default kernel doesn't, but the ec2 one does.


PostPosted: Thu Jun 17, 2010 2:57 pm 
Offline
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
Give me a few and I'll roll an "official" 2.6.34 to test.

-Chris


PostPosted: Thu Jun 17, 2010 5:20 pm 
Offline
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
OK - 2.6.34-linode26 and 2.6.34-x86_64-linode13 are out there. Test away.

-Chris


PostPosted: Thu Jun 17, 2010 5:33 pm 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
caker wrote:
OK - 2.6.34-linode26 and 2.6.34-x86_64-linode13 are out there. Test away.

-Chris


Sounds good. I'm going to be unavailable for a few days; compumike, if you get a chance to test it, I'd be interested in your results.


PostPosted: Thu Jun 17, 2010 5:55 pm 
Offline
Senior Newbie

Joined: Thu Jun 17, 2010 1:10 pm
Posts: 16
Website: http://www.nerdkits.com/
Hi Chris (& obs),

Thanks for your quick replies! Currently spending a few hundred bucks trying out reddit.com ads today, so this isn't a good time to experiment, but my plan is to give it a shot early tomorrow. Will write in and let you know how it goes.

Just curious -- when you built the kernel, did you do anything other than get the stock kernel from kernel.org, copy in a .config from one of your recent linode kernels (like 2.6.32.12-linode25), run "make oldconfig" and answer with the defaults, and then build? Any patches to apply or special config options? Just trying to track down my issues with 2.6.35-rc3.

(Also noticed that http://www.linode.com/src/ hasn't been updated. If you get a chance, I'd really just like the .config file so I know we're working from the same point.)

(One more "quickie" -- http://www.linode.com/irc/logs/ permissions issue? A lot of my search results ended up pointing there, but it's 403'ed.)

Thanks again,

Mike


PostPosted: Thu Jun 17, 2010 6:24 pm 
Offline
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
I've been doing Xen a long time and have learned a few things along the way - it can be sensitive to config options and even certain toolchain versions. Fortunately, mainline support for pv_ops means no external patching or other shenanigans, so it's pretty much kernel.org it, copy my precious working-config from a version past, and then make oldconfig, etc.

I'll push up the tarballs once I know the kernels will stick around for more than a few days. For now, zcat /proc/config.gz :)
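To see how a custom build's .config differs from the running kernel's config, something like the following Python sketch may help. It reads /proc/config.gz (available only when the running kernel was built with that option) and compares it with a .config file given on the command line. The function names `parse_config` and `diff_configs` are hypothetical helpers written for this example.

```python
import gzip
import sys

NOT_SET = " is not set"


def parse_config(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            # Strip the leading "# " and the trailing " is not set".
            options[line[2:-len(NOT_SET)]] = "n"
    return options


def diff_configs(old, new):
    """Return {name: (old_value, new_value)} for differing options.

    None means the option does not appear in that config at all.
    """
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Example: python configdiff.py /path/to/custom/.config
if __name__ == "__main__" and len(sys.argv) == 2:
    with gzip.open("/proc/config.gz", "rt") as fh:
        running = parse_config(fh.read())
    with open(sys.argv[1]) as fh:
        custom = parse_config(fh.read())
    for name, (a, b) in diff_configs(running, custom).items():
        print(f"{name}: running={a} custom={b}")
```

Expect many differences when the two configs come from different kernel versions, since "make oldconfig" adds new options; the Xen and clocksource-related entries are probably the ones worth reading first.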

I fixed the /irc/ folder permissions on one of our loadbalancers. Thanks for the heads-up.

-Chris


PostPosted: Fri Jun 18, 2010 2:06 am 
Offline
Senior Newbie

Joined: Thu Jun 17, 2010 1:10 pm
Posts: 16
Website: http://www.nerdkits.com/
Hi Chris,

Just booted with the 2.6.34-linode26 kernel (32-bit) and everything seems to be fine! (All userspace services working properly. No issue with the lighttpd/django fastcgi intercommunication, even when stressed via "ab".)

It is yet to be seen whether the issues I experienced with 2.6.33-linode24 (with CPU timers stopping) and with 2.6.32.12-linode25 (with random load average spikes) will repeat themselves, as those seemed to happen over the course of weeks and hours respectively. But for at least 15 minutes of uptime, I can safely say that it's actually working, which was definitely not the case for my attempt to build 2.6.35-rc3.

Will watch it carefully this weekend and report back if there are any issues.

Thanks!

Mike


PostPosted: Fri Jun 18, 2010 1:12 pm 
Offline
Senior Newbie

Joined: Thu Jun 17, 2010 1:10 pm
Posts: 16
Website: http://www.nerdkits.com/
11+ hours of uptime and all is still running fine.

At this point I would have expected to experience the 2.6.32.12-linode25 load average spike issue, so it's very good news that I haven't.

Mike


PostPosted: Mon Jun 21, 2010 8:02 pm 
Offline
Senior Newbie

Joined: Thu Jun 17, 2010 1:10 pm
Posts: 16
Website: http://www.nerdkits.com/
Hi all,

Now at 3 days, 18 hours of uptime, and it's been the most rock-solid I've experienced. Finally had a nice quiet weekend without issues -- seriously, this has made a tremendous change.

Still will have to see if it has the same once-every-few-weeks "timer stopped" fatal issues as with 2.6.33-linode24, but I'm hoping not!

I recommend trying 2.6.34-linode26 if you are having issues with one of the other kernels similar to those I've described earlier in this thread.

Thanks!

Mike


PostPosted: Tue Jun 22, 2010 1:56 pm 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
OK, I've cloned a linode, booted it up with the new kernel, and asked my users to go hammer it to see if it has issues. I'll leave it up for a few days and let you know what happens.


PostPosted: Thu Jun 24, 2010 1:48 pm 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
It's been running for 2 days with no issues! This kernel's a keeper!


PostPosted: Fri Jun 25, 2010 6:27 am 
Offline
Newbie

Joined: Fri Jun 25, 2010 6:08 am
Posts: 2
2.6.34-linode26 has frozen the clock for me on 2 different linodes.
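Since the frozen-clock problem was tied earlier in the thread to Xen and clocksources, it may be worth recording which clocksource a Linode is using when reporting a freeze. The thread does not say what the staff advice actually was, so the Python sketch below only reads the standard Linux sysfs entries and changes nothing; `read_clocksource` is a hypothetical helper name.

```python
from pathlib import Path

# Standard sysfs location for clocksource information on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return the active and available clocksources (empty if unreadable)."""
    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            return ""

    return {
        "current": read("current_clocksource"),
        "available": read("available_clocksource").split(),
    }


if __name__ == "__main__":
    info = read_clocksource()
    print("current:", info["current"] or "(unknown)")
    print("available:", ", ".join(info["available"]) or "(unknown)")
```

Posting this output along with the kernel version and uptime at the time of the freeze would make reports easier to compare.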

