Hi all,
I'm currently bouncing between kernel versions because over the last few weeks, I haven't found one that is issue-free for me.
2.6.33-linode24: The best of the bunch, but twice now the system clock has "frozen", stopping all CPU timers, cronjobs, etc. This may be fixed now thanks to some advice from the Linode staff (relates to Xen and clocksources), but it's very hard to tell because it only happened somewhat randomly, and only after 10-15 days of uptime. Probably the most frustrating kind of bug -- intermittent, non-reproducible, and totally fatal! (Nevertheless, this is the kernel that I'm sticking with at the moment.)
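For anyone comparing notes on the clocksource angle, here is a minimal sketch for recording which clocksource a kernel is actually using. It assumes the standard Linux sysfs files under /sys/devices/system/clocksource/clocksource0; the function name read_clocksources is just an illustration, and it does not reflect any specific setting the Linode staff recommended.

```python
from pathlib import Path

# Standard Linux sysfs location for clocksource info (assumed to be present;
# it may be missing on non-Linux systems or unusual kernel configs).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:  ", current)
    print("available:", " ".join(available))
```

Logging this output once per boot makes it easier to tell, after the next freeze, which clocksource was active at the time.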
2.6.32.12-linode25: Every few hours -- seemingly at random and uncorrelated with cronjobs or external load -- we'd see a huge spike in load average, up to 30-40 or so. The spike itself lasted just a few seconds, but that was enough to set off our monitoring software and leave the server barely responsive to other tasks for 30 to 60 seconds. No noticeable spikes on any of the Linode graphs during these events.
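Since the Linode graphs don't show these spikes, one option is to sample /proc/loadavg directly. The following is a rough sketch only: the names parse_loadavg and sample and the threshold of 10 are hypothetical choices, not anything our monitoring software does.

```python
import time


def parse_loadavg(text):
    """Parse a /proc/loadavg line such as '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),    # 1-minute load average
        "load5": float(fields[1]),    # 5-minute load average
        "load15": float(fields[2]),   # 15-minute load average
        "runnable": int(runnable),    # scheduling entities runnable right now
        "total": int(total),          # scheduling entities that exist
    }


def sample(path="/proc/loadavg"):
    """Read and parse the current load averages."""
    with open(path) as f:
        return parse_loadavg(f.read())


if __name__ == "__main__":
    THRESHOLD = 10.0  # hypothetical cutoff; tune for the machine in question
    try:
        stats = sample()
    except OSError as exc:
        print("could not read /proc/loadavg:", exc)
    else:
        if stats["load1"] >= THRESHOLD:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), stats)
```

Run every few seconds from a loop or a supervisor, the "runnable" field gives an instantaneous count that can be compared against the smoothed 1-minute figure when a spike hits.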
2.6.35-rc3: My latest attempt was to compile a custom kernel, stock from kernel.org, using the .config file from http://linode.com/src/2.6.32.12-linode25.tar.bz2 as my starting point. Thanks to the excellent article "Running Custom Kernels with PV-GRUB", I had no problem compiling the kernel and getting it started.
Everything seems to run great on 2.6.35-rc3, including the web server (lighttpd), database (mysql), e-mail (qmail), asterisk, etc... with one major exception: my Python Django FastCGI processes will run, but will only seem to take one or two requests from lighttpd. After that, lighttpd continues to try to pass them requests (via tcp localhost:3303), but there's no answer. In the lighttpd logs, I get:
"establishing connection failed: Connection timed out socket: tcp:127.0.0.1:3303"
The python processes continue running and don't seem to use any CPU.
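To take lighttpd out of the picture, a small probe can connect to the FastCGI port directly and report whether the attempt connects, is refused, or times out. This is a minimal sketch: the function name probe and the 3-second timeout are arbitrary, and it only tests the TCP handshake, not the FastCGI protocol itself.

```python
import socket


def probe(host="127.0.0.1", port=3303, timeout=3.0):
    """Try one TCP connection; return 'connected', 'refused', 'timeout' or 'error: ...'."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # No reply to the connection attempt within `timeout` seconds.
        return "timeout"
    except ConnectionRefusedError:
        # Nothing is listening on the port (the kernel answered with a reset).
        return "refused"
    except OSError as exc:
        return "error: %s" % exc
    conn.close()
    return "connected"


if __name__ == "__main__":
    print(probe())
```

A "timeout" while the Python processes are still running would suggest that the port is still bound but connections are not being accepted, whereas "refused" would mean the listener is gone. Either result narrows down where to look next, though neither is conclusive on its own.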
Since the userland was *exactly* the same, and only the kernel was changing, my only thought so far was that it might be firewall related (arno-iptables-firewall). However, disabling the firewall entirely gave identical results!
Any ideas / clues? Hoping to make the custom kernel work. Why would this one piece out of everything be affected so dramatically by a new kernel? Thanks in advance!
Mike