Linode Community Forums
PostPosted: Thu Apr 02, 2009 3:45 pm 
Offline
Linode Staff
User avatar

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
Kernels 2.6.29-linode18 and 2.6.29-x86_64-linode6 have been pushed out. I'd appreciate another round of testing. I'm also curious if time drifts on these kernels and if I should ping the devs about it.

Thanks!
-Chris


PostPosted: Thu Apr 02, 2009 7:12 pm 
Offline
Senior Member

Joined: Fri Dec 07, 2007 1:37 am
Posts: 385
Location: NC, USA
I can confirm that it boots :)

Time drift still looks to be present with 2.6.29-x86_64-linode6. On boot the time was dead-on, but I'm seeing about 130ms/hr of linear drift relative to NIST.

Code:
$ grep ^server /etc/ntp.conf
server time-b.nist.gov minpoll 8 noselect
$ uptime
 19:10:50 up  3:09,  1 user,  load average: 0.00, 0.00, 0.00
$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 time-b.nist.gov .ACTS.           1 u   51  256  273   19.989  525.194  39.836
$ cat /var/lib/ntp/ntp.drift
0.000
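For reference, 130ms/hr works out to roughly 36 ppm; a quick back-of-the-envelope conversion (assuming the drift really is linear, as it looks):

```python
# Convert a linear clock drift of 130 ms per hour into parts per million (ppm).
drift_seconds_per_hour = 0.130
ppm = drift_seconds_per_hour / 3600.0 * 1_000_000
print(f"{ppm:.1f} ppm")  # 36.1 ppm
```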


PostPosted: Fri Apr 03, 2009 1:57 pm 
Offline
Senior Member

Joined: Fri Dec 07, 2007 1:37 am
Posts: 385
Location: NC, USA
Got the following repeated five times in my logs. It occurred within the first few seconds of a Gentoo emerge --update. The emerge seems to have finished successfully, so it apparently wasn't a fatal error.
Code:
Apr  3 12:47:33 flup swapper: page allocation failure. order:0, mode:0x20
Apr  3 12:47:33 flup Pid: 0, comm: swapper Not tainted 2.6.29-x86_64-linode6 #1
Apr  3 12:47:33 flup Call Trace:
Apr  3 12:47:33 flup <IRQ>  [<ffffffff802815c9>] __alloc_pages_internal+0x389/0x480
Apr  3 12:47:33 flup [<ffffffff802436e7>] local_bh_enable_ip+0x77/0xb0
Apr  3 12:47:33 flup [<ffffffff802a3449>] cache_alloc_refill+0x2f9/0x5a0
Apr  3 12:47:33 flup [<ffffffff802a37f7>] __kmalloc+0x107/0x110
Apr  3 12:47:33 flup [<ffffffff805d24c8>] __alloc_skb+0x68/0x140
Apr  3 12:47:33 flup [<ffffffff80642677>] tcp_send_ack+0x27/0xe0
Apr  3 12:47:33 flup [<ffffffff80640fb2>] tcp_rcv_established+0x162/0x600
Apr  3 12:47:33 flup [<ffffffff80646e8c>] tcp_v4_do_rcv+0xdc/0x230
Apr  3 12:47:33 flup [<ffffffff80244067>] local_bh_enable+0x77/0xb0
Apr  3 12:47:33 flup [<ffffffff80648ead>] tcp_v4_rcv+0x71d/0x810
Apr  3 12:47:33 flup [<ffffffff8062a984>] ip_local_deliver_finish+0xc4/0x240
Apr  3 12:47:33 flup [<ffffffff8062a669>] ip_rcv_finish+0x129/0x380
Apr  3 12:47:33 flup [<ffffffff8062ae39>] ip_rcv+0x219/0x300
Apr  3 12:47:33 flup [<ffffffff80589a41>] xennet_poll+0xb01/0xcb0
Apr  3 12:47:33 flup [<ffffffff80717ca6>] error_exit+0x36/0x70
Apr  3 12:47:33 flup [<ffffffff807176ee>] _spin_lock_irqsave+0x2e/0x40
Apr  3 12:47:33 flup [<ffffffff805dba88>] net_rx_action+0x78/0x140
Apr  3 12:47:33 flup [<ffffffff80243ae2>] __do_softirq+0x92/0x160
Apr  3 12:47:33 flup [<ffffffff8021279c>] call_softirq+0x1c/0x30
Apr  3 12:47:33 flup [<ffffffff80214239>] do_softirq+0x49/0x90
Apr  3 12:47:33 flup [<ffffffff80243a45>] irq_exit+0x85/0x90
Apr  3 12:47:33 flup [<ffffffff80214303>] do_IRQ+0x83/0x110
Apr  3 12:47:33 flup [<ffffffff80278fec>] __rcu_process_callbacks+0x15c/0x270
Apr  3 12:47:33 flup [<ffffffff805520e5>] xen_evtchn_do_upcall+0xf5/0x140
Apr  3 12:47:33 flup [<ffffffff802127ee>] xen_do_hypervisor_callback+0x1e/0x30
Apr  3 12:47:33 flup <EOI>  [<ffffffff802093aa>] _stext+0x3aa/0x1000
Apr  3 12:47:33 flup [<ffffffff802093aa>] _stext+0x3aa/0x1000
Apr  3 12:47:33 flup [<ffffffff8020e400>] xen_vcpuop_set_next_event+0x0/0x70
Apr  3 12:47:33 flup [<ffffffff8020ddcc>] xen_safe_halt+0xc/0x20
Apr  3 12:47:33 flup [<ffffffff8020c1a6>] xen_idle+0x26/0x50
Apr  3 12:47:33 flup [<ffffffff8020ff49>] cpu_idle+0x49/0x90
Apr  3 12:47:33 flup Mem-Info:
Apr  3 12:47:33 flup DMA per-cpu:
Apr  3 12:47:33 flup CPU    0: hi:    0, btch:   1 usd:   0
Apr  3 12:47:33 flup CPU    1: hi:    0, btch:   1 usd:   0
Apr  3 12:47:33 flup CPU    2: hi:    0, btch:   1 usd:   0
Apr  3 12:47:33 flup CPU    3: hi:    0, btch:   1 usd:   0
Apr  3 12:47:33 flup DMA32 per-cpu:
Apr  3 12:47:33 flup CPU    0: hi:   90, btch:  15 usd:  76
Apr  3 12:47:33 flup CPU    1: hi:   90, btch:  15 usd:  35
Apr  3 12:47:33 flup CPU    2: hi:   90, btch:  15 usd:  85
Apr  3 12:47:33 flup CPU    3: hi:   90, btch:  15 usd:  13
Apr  3 12:47:33 flup Active_anon:8639 active_file:20588 inactive_anon:12730
Apr  3 12:47:33 flup inactive_file:19196 unevictable:0 dirty:3183 writeback:0 unstable:0
Apr  3 12:47:33 flup free:554 slab:23674 mapped:2095 pagetables:541 bounce:0
Apr  3 12:47:33 flup DMA free:1360kB min:36kB low:44kB high:52kB active_anon:0kB inactive_anon:28kB active_file:2352kB inactive_file:8kB unevictable:0kB present:5788kB pages_scanned:0 all_unreclaimable? no
Apr  3 12:47:33 flup lowmem_reserve[]: 0 339 339 339
Apr  3 12:47:33 flup DMA32 free:856kB min:2336kB low:2920kB high:3504kB active_anon:34556kB inactive_anon:50892kB active_file:80000kB inactive_file:76776kB unevictable:0kB present:347440kB pages_scanned:32 all_unreclaimable? no
Apr  3 12:47:33 flup lowmem_reserve[]: 0 0 0 0
Apr  3 12:47:33 flup DMA: 2*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1384kB
Apr  3 12:47:33 flup DMA32: 1*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 844kB
Apr  3 12:47:33 flup 39937 total pagecache pages
Apr  3 12:47:33 flup 139 pages in swap cache
Apr  3 12:47:33 flup Swap cache stats: add 251, delete 112, find 5043/5057
Apr  3 12:47:33 flup Free swap  = 736616kB
Apr  3 12:47:33 flup Total swap = 737272kB
Apr  3 12:47:33 flup 92160 pages RAM
Apr  3 12:47:33 flup 4086 pages reserved
Apr  3 12:47:33 flup 38289 pages shared
Apr  3 12:47:33 flup 54125 pages non-shared


PostPosted: Fri Apr 03, 2009 3:37 pm 
Offline
Senior Member

Joined: Sat Mar 28, 2009 4:23 pm
Posts: 415
Website: http://jedsmith.org/
Location: Out of his depth and job-hopping without a clue about network security fundamentals
Stever wrote:
Apr 3 12:47:33 flup <IRQ> [<ffffffff802815c9>] __alloc_pages_internal+0x389/0x480

Originated here, in the source. The stack says that Xen handled an IRQ (networking), the IP stack got it, began handling TCP, TCP intended to reply, attempted to allocate some memory to do so, and somehow you ended up at a label in the Linux source code that says "nopage". Reading the source code (I'm hazy on Linux) leads me to believe that this may be due to an OOM situation, which is odd because you have a large amount of swap available. Also in that function:
Quote:
/* The OOM killer will not help higher order allocs so fail */

It appears to circumvent the OOM killer, so the kernel may not have reported to you that you were OOM other than these log entries.
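As a side note on the failure line itself: if I'm reading 2.6.29's include/linux/gfp.h right, mode:0x20 is GFP_ATOMIC, i.e. an allocation that cannot sleep or wait for reclaim, which would explain why a momentary shortage fails on the spot instead of swapping. A quick decode (flag values are my reading of that header, so treat them as an assumption):

```python
# Decode the gfp_mask from "page allocation failure. order:0, mode:0x20".
# Flag bits as (I believe) defined in 2.6.29's include/linux/gfp.h.
GFP_FLAGS = {0x10: "__GFP_WAIT", 0x20: "__GFP_HIGH", 0x40: "__GFP_IO", 0x80: "__GFP_FS"}
mode = 0x20
set_flags = [name for bit, name in GFP_FLAGS.items() if mode & bit]
print(set_flags)  # ['__GFP_HIGH'], i.e. GFP_ATOMIC: no sleeping, no direct reclaim
```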
Stever wrote:
Apr 3 12:47:33 flup Free swap = 736616kB
Apr 3 12:47:33 flup Total swap = 737272kB
Apr 3 12:47:33 flup 92160 pages RAM
Apr 3 12:47:33 flup 4086 pages reserved
Apr 3 12:47:33 flup 38289 pages shared
Apr 3 12:47:33 flup 54125 pages non-shared

That's more pages allocated than are in RAM, so swap was just starting to get tickled (that is reflected there). I/O ops, maybe? Do earlier messages of this nature have a higher number for free swap? All of these are educated guesses, just throwing 'em out there and seeing what sticks.


PostPosted: Fri Apr 03, 2009 4:36 pm 
Offline
Senior Member

Joined: Fri Dec 07, 2007 1:37 am
Posts: 385
Location: NC, USA
The message I posted is repeated five times, exactly the same each time. However, it looks like the log only caught the tail end of these since there is a partial message at the beginning of the event.

No other signs of OOM - this is just my testing linode with only sshd and fail2ban running, and as far as I can tell everything kept running. The emerge --update command that triggered these log entries completed without errors, and I have been compiling on the node for the last couple of hours without any recurrence.


PostPosted: Tue Apr 28, 2009 1:50 am 
Offline
Senior Member

Joined: Sat May 03, 2008 4:01 pm
Posts: 569
Website: http://www.mattnordhoff.com/
FWIW:

I just migrated to Xen on dallas153 a few days ago. I've been using 2.6.29-linode18 for 27.5 hours, and my clock drift is 30 PPM, and dropping. That's not so bad, is it? My home computer is a lot worse.


PostPosted: Tue Apr 28, 2009 11:54 am 
Offline
Senior Member

Joined: Fri Dec 07, 2007 1:37 am
Posts: 385
Location: NC, USA
30ppm isn't bad (about 2.5s/day), but the old kernels had NO drift because they stayed in sync with the host, which was presumably running ntpd.

So to keep good time on the newer kernels, you need to run your own ntpd.

FWIW, since the original issue I reported I've had this kernel running on my test machine for 22 days with no problems.
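To spell out the arithmetic behind "about 2.5s/day" (same assumption of a steady, linear drift):

```python
# 30 ppm of uncorrected drift, accumulated per day and over three weeks.
ppm = 30
per_day = ppm * 1e-6 * 86400                   # seconds gained/lost per day
print(f"{per_day:.2f} s/day")                  # 2.59 s/day
print(f"{per_day * 21:.0f} s after 3 weeks")   # 54 s after 3 weeks
```

So after a few weeks of uptime without ntpd, that's already approaching a minute of error.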


PostPosted: Wed Apr 29, 2009 4:33 pm 
Offline
Senior Member

Joined: Sat Mar 28, 2009 4:23 pm
Posts: 415
Website: http://jedsmith.org/
Location: Out of his depth and job-hopping without a clue about network security fundamentals
I've mentioned this in IRC, but I haven't experienced any time drift in my running kernel. Yesterday I set up time.jedsmith.org in the NTP pool, and it's a stratum 2 with minimal offset:

[Image: NTP pool monitoring graph showing minimal offset]

Note that from the first moment I joined the pool and ran ntpd (I started them both at the same time), I was only about 6ms away from "the real time". I'm not sure what I'm doing differently, but my kernel isn't drifting at all.
Code:
jed@frodo:~$ uname -a
Linux frodo 2.6.29-linode18 #1 SMP Wed Apr 1 20:22:21 EDT 2009 i686 GNU/Linux
jed@frodo:~$ uptime
 20:34:51 up 13 days, 12:49,  2 users,  load average: 0.00, 0.00, 0.00
jed@frodo:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+clock.fmt.he.ne .PPS.            1 u   51   64  377    1.443    0.452   0.290
+clock.sjc.he.ne .CDMA.           1 u   18   64  377    1.838    0.219   0.033
xclock.isc.org   .GPS.            1 u   45   64  377    2.093   -6.272   0.086
*clock.via.net   .GPS.            1 u    5   64  377    0.976    0.188   0.026
-nist1-sj.witime .ACTS.           1 u    2   64  377    2.125    2.334   0.070


PostPosted: Wed Apr 29, 2009 6:16 pm 
Offline
Senior Member

Joined: Fri Dec 07, 2007 1:37 am
Posts: 385
Location: NC, USA
jed wrote:
I'm not sure what I'm doing differently, but my kernel isn't drifting at all.

What do you have in your ntp.drift file? That should show the drift in ppm that ntpd is correcting for you. For example, on my linode running 2.6.29-x86_64-linode6 I get:
Code:
 $ cat /var/lib/ntp/ntp.drift
44.004


Unless yours is zero, then your kernel is drifting and ntpd is compensating.

The first thing ntpd will do if you are more than about 128ms off at startup is to perform a step correction. I don't think it will serve time to others until it considers itself synchronized, so an external monitoring system would probably not see the initial offset unless it was very small.
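For what it's worth, here's a simplified model of that step-vs-slew decision (ignoring ntpd's stepout interval and any -x or tinker overrides):

```python
# Simplified: ntpd steps the clock when the offset exceeds its default
# step threshold of 128 ms; smaller offsets are gradually slewed away.
STEP_THRESHOLD = 0.128  # seconds

def correction(offset_seconds):
    return "step" if abs(offset_seconds) > STEP_THRESHOLD else "slew"

print(correction(0.525))  # step (e.g. the 525 ms offset in the ntpq output earlier)
print(correction(0.006))  # slew (a ~6 ms offset is corrected gradually)
```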


PostPosted: Wed Apr 29, 2009 7:11 pm 
Offline
Senior Member

Joined: Sat Mar 28, 2009 4:23 pm
Posts: 415
Website: http://jedsmith.org/
Location: Out of his depth and job-hopping without a clue about network security fundamentals
Stever wrote:
What do you have in your ntp.drift file?

18.429. Not terrible for a virtualized kernel, but I see your point.

I was merely noticing that other users had drifts on the order of several minutes and mine was not showing symptoms of that with this kernel.


PostPosted: Wed Apr 29, 2009 10:31 pm 
Offline
Senior Member

Joined: Fri Dec 07, 2007 1:37 am
Posts: 385
Location: NC, USA
jed wrote:
I was merely noticing that other users had drifts on the order of several minutes and mine was not showing symptoms of that with this kernel.

Yeah - I've never seen more than a few seconds per day of drift, but without ntpd running it always seems to be cumulative. After a few weeks of uptime you start to notice the time being off by minutes.

I don't think these drifts are any worse than I get on real hardware systems - it is just that before pv_ops we didn't have to worry about it at all. I suppose if I think of it that way, this is making a linode behave more like real hardware ;)


PostPosted: Tue May 12, 2009 4:33 pm 
Offline

Joined: Tue May 12, 2009 4:30 pm
Posts: 1
I have a linode 360 with jaunty 9.04, but when I change the kernel in the configuration and reboot, the linode doesn't come up. The Lish boot log is empty. I've tried this one and 2.6.28.


PostPosted: Wed May 13, 2009 3:45 pm 
Offline
Senior Newbie

Joined: Mon May 11, 2009 4:44 pm
Posts: 11
Website: http://www.spottedairplanes.com
Location: New York
jzuijlek wrote:
I have a linode 360 with jaunty 9.04, but when I change the kernel in the configuration and reboot, the linode doesn't come up. The Lish boot log is empty. I've tried this one and 2.6.28.


Me too, except I'm running Gentoo.


PostPosted: Wed May 13, 2009 6:47 pm 
Offline
Senior Member

Joined: Fri Dec 07, 2007 1:37 am
Posts: 385
Location: NC, USA
I'm having no problems with mine running gentoo on 2.6.29-x86_64-linode6, and I just rebooted into the same kernel with no problem.


PostPosted: Sat May 16, 2009 10:04 am 
Offline
Senior Member
User avatar

Joined: Tue Apr 13, 2004 6:54 pm
Posts: 833
FWIW, 2.6.29-linode18 froze hard on me and required a hard reboot :-(

_________________
Rgds
Stephen
(Linux user since kernel version 0.11)

