Linode Forum
Linode Community Forums
 FAQFAQ    SearchSearch    MembersMembers      Register Register 
 LoginLogin [ Anonymous ] 
Post new topic  Reply to topic
Author Message
PostPosted: Sun May 04, 2014 6:48 pm 
Offline
Senior Newbie
User avatar

Joined: Sat Oct 30, 2010 12:53 pm
Posts: 19
Website: http://www.whyaskwhy.org/
Location: USA
Hi all,

Thanks for taking the time to read this post, and thanks in advance for any help you can offer. I opened a ticket with Linode Support, but they "politely" told me to come here for support. Without dwelling too much on it, I'll just say I was insulted and disappointed that they wouldn't take the time to assist a paying customer. That was not what I've come to expect from Linode.

So I'm hoping you guys can help pick up their slack. Again, any help you can provide is appreciated.

------

I don't recall exactly when this started, but if I had to guess I'd say sometime in the last month or so.

On reboot my Atlanta Linode comes up with the time (presumably) synchronized to the host system's clock and then ntpdate/ntp fires and corrects the time. This involves a backwards time jump which seems to affect Dovecot the most, but I've also had blocked SSH connections until the system settles out.

I have a second Linode in another data center which after checking I found is off by 2 minutes. I compared the ntp.conf files between the two and found them to be identifical. To prove to myself that the issue is not directly related to a ntp configuration I booted the Atlanta node to a configuration profile that has init=/bin/bash and ran 'date'. This was an attempt to boot the Linode without networking support and before other processes had loaded so I could see what the host clock was set to. The assumption on my part is that the VM and the host are synchronized at boot, although I have no proof either way.

To the point, I took the value given from running the date command and compared it against my local laptop (which also uses NTP), several phones that sync against cell networks and my other Linode with a properly synchronized clock and found the Atlanta node to be off about 5 minutes.

From everything I've read this means the time has to be corrected in "BIOS", which for the Xen VPS setup I assume Linode runs would mean the host system that my Atlanta Linode runs on. Linode Support tells me that they've confirmed the host system's clock is correct.

By this point I've read various forum postings, searched via Google and read the official Dovecot wiki page here:

http://wiki2.dovecot.org/TimeMovedBackwards

I've been running ntpd for years now, presumably without this issue (although I wasn't running many services that would have immediately complained), so installing/configuring ntpd has already been taken care of.

Here's an original log snippet before I made changes to ntp.conf:

Code:
May  3 20:21:12 atlanta ntpd[1226]: ntpd 4.2.6p3@1.2290-o Tue Jun  5 20:12:11 UTC 2012 (1)
May  3 20:21:12 atlanta ntpd[1227]: proto: precision = 0.972 usec
May  3 20:21:12 atlanta ntpd[1227]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
May  3 20:21:12 atlanta ntpd[1227]: unable to bind to wildcard address 0.0.0.0 - another process may be running - EXITING
May  3 20:21:12 atlanta dovecot: master: Dovecot v2.0.19 starting up (core dumps disabled)
May  3 20:21:13 atlanta dovecot: lmtp(1271): Connect from local
May  3 20:21:13 atlanta dovecot: auth-worker: mysql(127.0.0.1): Connected to database mailserver_v1
May  3 20:15:59 atlanta ntpdate[671]: step time server 50.116.38.157 offset -314.012644 sec
May  3 20:15:59 atlanta ntpd[1302]: ntpd 4.2.6p3@1.2290-o Tue Jun  5 20:12:11 UTC 2012 (1)
May  3 20:15:59 atlanta ntpd[1303]: proto: precision = 0.974 usec
May  3 20:15:59 atlanta ntpd[1303]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
May  3 20:15:59 atlanta ntpd[1303]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen and drop on 1 v6wildcard :: UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen normally on 2 lo 127.0.0.1 UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen normally on 3 eth0 74.207.228.106 UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen normally on 4 eth0 fe80::fcfd:4aff:fecf:e46a UDP 123
May  3 20:15:59 atlanta ntpd[1303]: Listen normally on 5 lo ::1 UDP 123
May  3 20:15:59 atlanta ntpd[1303]: peers refreshed
May  3 20:15:59 atlanta ntpd[1303]: Listening on routing socket on fd #22 for interface updates
May  3 20:15:59 atlanta dovecot: log: Warning: Time moved backwards by 314 seconds.
May  3 20:15:59 atlanta dovecot: master: Warning: Time moved backwards by 314 seconds, waiting for 180 secs until new services are launched again.
May  3 20:15:59 atlanta dovecot: ssl-params: Warning: Time moved backwards by 313 seconds.
May  3 20:16:00 atlanta dovecot: anvil: Warning: Time moved backwards by 313 seconds.
May  3 20:16:00 atlanta dovecot: auth: Warning: Time moved backwards by 313 seconds.
May  3 20:16:00 atlanta dovecot: config: Warning: Time moved backwards by 313 seconds.
May  3 20:16:00 atlanta dovecot: lmtp(1271): Fatal: Time just moved backwards by 313 seconds. This might cause a lot of problems, so I'll just kill myself now. http://wiki2.dovecot.org/TimeMovedBackwards


I attempted to work around the early exit and restart of ntpd by adding a line to ignore the wildcard interface, but that caused synchronization to fail. I have confirmed that by reverting to the previous configuration that synchronization does work, even if the earlier exit and restart of ntpd returns.

root@atlanta:~# ntpq -p
Code:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+palpatine.steve 231.146.174.254  3 u    9   64  377   61.853    0.154   1.585
+samur.ulak.net. 192.36.144.23    2 u    5   64  377  145.905   -1.958   1.479
*time3.chpc.utah 132.163.4.103    2 u   67   64  376   56.198    1.018   1.111
-time01.muskegon 64.113.32.5      2 u   13   64  377   60.461    3.472   1.545
-europium.canoni 140.203.204.77   2 u    1   64  377   87.009   -4.037   0.870


root@atlanta:~# grep -Ev '#|^$' /etc/ntp.conf
Code:
driftfile /var/lib/ntp/ntp.drift
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
server 0.ubuntu.pool.ntp.org
server 1.ubuntu.pool.ntp.org
server 2.ubuntu.pool.ntp.org
server 3.ubuntu.pool.ntp.org
server ntp.ubuntu.com
restrict -4 default nomodify nopeer noquery notrap
restrict -6 default nomodify nopeer noquery notrap
restrict 127.0.0.1
restrict ::1


I'm not sure what else to try. I found the /etc/init/hwclock-save upstart job and ran it, but it error'd out. Light research seems to show that the /dev/rtc0 device does not exist for Xen VMs. Too bad as I was hoping that I could save the updated time to a local clock (however it is virtualized) so the next boot would be current.


Top
   
PostPosted: Sun May 04, 2014 6:53 pm 
Offline
Senior Member

Joined: Mon Jul 05, 2010 5:13 pm
Posts: 392
Xen provides the initial clock point to the Linode. That clock is initially based on the host's time, but it can skew over time and there isn't really a good way to stop that. All our hosts keep their clocks in sync with NTP; that doesn't stop Xen from skewing the initial boot time under some circumstances. There really isn't another fix for this besides running NTP inside your Linode. If you have services that don't properly start before NTP has updated your Linode's internal clock, you may want to tune your init system to start them only after NTP has done its thing.

- Les


Top
   
PostPosted: Sun May 04, 2014 7:42 pm 
Offline
Senior Newbie
User avatar

Joined: Sat Oct 30, 2010 12:53 pm
Posts: 19
Website: http://www.whyaskwhy.org/
Location: USA
akerl wrote:
Xen provides the initial clock point to the Linode. That clock is initially based on the host's time, but it can skew over time and there isn't really a good way to stop that.


Hi Les, thanks for the reply.

As I noted in my post the clock is 5 minutes fast immediately upon reboot of my Atlanta Linode. To prove it's not my setup that is the problem, I booted from Finnix and it also reports the time as off by five minutes.

akerl wrote:
All our hosts keep their clocks in sync with NTP; that doesn't stop Xen from skewing the initial boot time under some circumstances. There really isn't another fix for this besides running NTP inside your Linode.


When you say under some circumstances, do you know of certain scenarios that yield worse time skews than others? I currently run a 32-bit OS and use a distro kernel via pv_grub. As part of the guide I originally followed I disabled the Xenify Distro option. Does that have any bearing?

akerl wrote:
If you have services that don't properly start before NTP has updated your Linode's internal clock, you may want to tune your init system to start them only after NTP has done its thing.

- Les


No offense to you, but that seems more like a workaround than an actual fix to the problem. I'll look modifying the init job order like you suggested, but I can't help but feel something is being overlooked.


Top
   
PostPosted: Sun May 04, 2014 7:57 pm 
Offline
Senior Member

Joined: Mon Jul 05, 2010 5:13 pm
Posts: 392
No, the kernel choice, architecture, or other options internal to your Linode don't impact this. The reality is that the clock used by the hypervisor to give guests their initial time can and will skew over time. We run NTP, which corrects the dom0's clock, not the hypervisor.

- Les


Top
   
PostPosted: Mon May 05, 2014 10:56 am 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
You really ought to run ntpd inside your linode, since your linode's clock will drift otherwise. This will ensure that your clock is always accurate regardless of what's going on on the host system. If you get a big skew right on boot, you can set ntpdate to run on boot, which will give you the gross correction, and then rely on ntpd to cancel out any drift (ntpd basically adjusts how fast time passes to counter drift, without making any sudden adjustments).


Top
   
PostPosted: Mon May 05, 2014 11:25 am 
Offline
Senior Newbie
User avatar

Joined: Sat Oct 30, 2010 12:53 pm
Posts: 19
Website: http://www.whyaskwhy.org/
Location: USA
Guspaz wrote:
You really ought to run ntpd inside your linode


I know it's long, but the OP mentions that I have been running ntpd in the Linode for years already. The problem I have with the workarounds mentioned thus far is finding a way to get ntpdate to run before all other services that depend on reliable time.

Dovecot has an Upstart job, it looks like Postfix and ntpd have init.d scripts and the ntpdate command is run as soon as eth0 comes online (/etc/network/if-up.d/ntpdate).

I have not had time to work on this again today, but i was going to try calling ntpdate from rc.local. I do not recall where that falls in the boot order, but I figured it was worth a try before I had to try and write an Upstart job (which I have no experience with).

For the record, this is Ubuntu 12.04 that I am working with.


Top
   
PostPosted: Mon May 05, 2014 12:36 pm 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
I'm not sure I understand the problem then. The host system runs ntpd too. The only way you'd see a time jump is if your linode has drifted, which it normally would. If you also run ntpd on your linode, it won't drift, so there won't be any time jump when you reboot.


Top
   
PostPosted: Mon May 05, 2014 12:54 pm 
Offline
Senior Newbie
User avatar

Joined: Sat Oct 30, 2010 12:53 pm
Posts: 19
Website: http://www.whyaskwhy.org/
Location: USA
Guspaz wrote:
I'm not sure I understand the problem then. The host system runs ntpd too.


That's what I was told via the support ticket. They also mentioned that dom0 runs ntpd, but the hypervisor is handing over the wrong time to the guest. I don't have Xen training, so I don't know the different layers that make up a Xen infrastructure. Googling around I saw mention of some older versions of Xen having a bug which causes the problem described here, but I've no way of knowing how Linode staff have their host systems configured.

As I also mentioned in this thread I booted the VM using Finnix which had the same initial time that my VM's current OS has: 5 minutes too fast. Since I'm unable to set the equivalent of BIOS/CMOS time like I would in a VMware or VirtualBox VM, this tells me the problem lies with the host configuration. In this situation I would suspect that either a software or a misconfiguration issue is to blame. Linode Support says otherwise.

Guspaz wrote:
The only way you'd see a time jump is if your linode has drifted, which it normally would. If you also run ntpd on your linode, it won't drift, so there won't be any time jump when you reboot.


My earlier comments address this, but for whatever reason (again, my ignorance of Xen doesn't help things) the current time isn't saved properly on shutdown/reboot like it would on another setup (VMware/VirtualBox/real hardware), so the time given by the host (or hypervisor as I've been told) is what the VM has to start with when it boots.

I noticed that there are Upstart jobs dealing with loading and saving the system clock on startup/reboot, but they reference a Real Time Clock device that it appears Xen doesn't support.

All in all, the time given to my Linode on boot is incorrect and to work around it I have to have ntpdate run before any other service that requires accurate time (all of them?). Until I can figure that out I'll continually have problems every time the box is rebooted.


Top
   
PostPosted: Tue May 06, 2014 9:06 am 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
I had the same problem on one node. You could try asking Linode to move you to another host, I've only see this happen once so odds are you'll move to a node that has the correct time or a time skewed in the past which should cause less problems.

_________________
Paid support
How to ask for help
1. Give details of your problem
2. Post any errors
3. Post relevant logs.
4. Don't hide details i.e. your domain, it just makes things harder
5. Be polite or you'll be eaten by a grue


Top
   
PostPosted: Tue May 06, 2014 2:24 pm 
Offline
Senior Newbie
User avatar

Joined: Sat Oct 30, 2010 12:53 pm
Posts: 19
Website: http://www.whyaskwhy.org/
Location: USA
obs wrote:
I had the same problem on one node. You could try asking Linode to move you to another host


That's a good suggestion. Is that what you did to resolve your issue? I'm a little surprised that Linode Support hasn't made that offer to me as a workaround for the problems with their setup.

In my testing so far it is looking like I'd have to use a combination of a new init.d script and modifications to the Dovecot Upstart job to ensure that correct time is set before Postfix, Dovecot and the like come online. It seems that Dovecot is the only daemon to raise such a fuss, but reading over their notes on the issue I imagine that a backwards time jump can't be good for other daemons either.


Top
   
PostPosted: Tue May 06, 2014 2:38 pm 
Offline
Senior Member

Joined: Mon Jul 05, 2010 5:13 pm
Posts: 392
Moving to a new host is not a solution to this issue. Any other host is just as likely to experience this issue, and statistically all hosts will eventually experience this issue.

- Les


Top
   
PostPosted: Tue May 06, 2014 5:42 pm 
Offline
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
I didn't move host I just ignored it since it only caused issues with fsck on boot up which I could work around by running fsck via finnix. However even though Les states that moving host isn't the solution I've only encountered this once and I've had a lot of nodes over the years.

_________________
Paid support
How to ask for help
1. Give details of your problem
2. Post any errors
3. Post relevant logs.
4. Don't hide details i.e. your domain, it just makes things harder
5. Be polite or you'll be eaten by a grue


Top
   
PostPosted: Wed May 07, 2014 1:29 am 
Offline
Senior Newbie
User avatar

Joined: Sat Oct 30, 2010 12:53 pm
Posts: 19
Website: http://www.whyaskwhy.org/
Location: USA
As a workaround to the hypervisor issue giving the wrong time I've put together an Upstart job, but it still needs work. It successfully sets the time prior to starting up key daemons, but on reboot it appears to hang the system. I've tracked that down to me using the 'rc' Upstart job as a start event.

Anyone know how to safely key off of that job?

While I already have what I need to workaround just Dovecot having the wrong initial time I'd like to correct the issue for all of the daemons that I can.

What I had:

Code:
start on (
    runlevel 2
    and static-network-up
    and starting cron
    and starting mysql
    and starting dovecot
    and starting ssh
    and starting rc
)


What I have now:
Code:
start on (
    runlevel 2
    and static-network-up
    and starting cron
    and starting dovecot
    and starting ssh
)


I removed rc because of the hanging issue and I removed mysql because Postfix gets upset when it starts via the SysV script and Upstart hasn't yet started the MySQL instance it needs to resolve addresses. Once I find a safe way to key off of rc so I can delay the SysV scripts I'll add it back to the list to delay that daemon's launch as well.

I tried creating a wrapper around the ntpdate command (only run on runlevel 2), but didn't have any luck. I thought about creating a SysV script that _only_ paused execution of the other scripts there, but didn't think it would have any effect.

I keep telling myself that this should be really simple.

Thanks in advance!


Top
   
PostPosted: Wed May 07, 2014 1:36 am 
Offline
Senior Member
User avatar

Joined: Sun Dec 27, 2009 11:12 pm
Posts: 1038
Location: Colorado, USA
Just curious, how often are you rebooting to make this such a big problem - and why?

_________________
Either provide enough details for people to help, or sit back and listen to the crickets chirp.
Security thru obscurity is a myth - and really really annoying.


Top
   
PostPosted: Wed May 07, 2014 2:09 am 
Offline
Senior Newbie
User avatar

Joined: Sat Oct 30, 2010 12:53 pm
Posts: 19
Website: http://www.whyaskwhy.org/
Location: USA
vonskippy wrote:
Just curious, how often are you rebooting to make this such a big problem - and why?


Mainly just when applying kernel, OpenSSL or similar patches. However I've had unexpected results because of the time jumps and I'm tired of it, so I'd like to go ahead and knock out a solid workaround and be done with it.

If you have any tips/suggestions I'm all ears. Thanks.


Top
   
Display posts from previous:  Sort by  
Post new topic  Reply to topic


Who is online

Users browsing this forum: No registered users and 4 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
RSS

Powered by phpBB® Forum Software © phpBB Group