Linode Forum
Linode Community Forums
Forum locked  This topic is locked, you cannot edit posts or make further replies.
 Post subject: Reboot: newark41
PostPosted: Thu Jul 16, 2009 2:31 pm 
Offline
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
newark41 became unresponsive to our monitoring and required a reboot. We took this opportunity to upgrade it to our latest Xen stack and Linodes are coming up now.

-Chris


Top
   
 Post subject: reboot: newark41
PostPosted: Thu Jul 16, 2009 2:41 pm 
Offline
Senior Newbie

Joined: Wed Dec 07, 2005 10:57 pm
Posts: 17
Location: Philadelphia, PA
Excellent. A+ on your response time and communication.


Top
   
 Post subject:
PostPosted: Fri Jul 17, 2009 4:26 pm 
Offline
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
I was wondering whether any information could be made available on how monitoring operates at Linode (such as what led to this restart), and what sort of response times should be expected for host outages?

From what I can glean across various logs on my VPS (which was on this host), it would appear that operations had halted perhaps 45 minutes prior to the restart, which seems a bit extreme for a full outage. My VPS would have been essentially idle, but periodic stuff like munin polls, syslog hourly marks, etc. had all stopped. Of course, I suppose individual VPSes may have issues before monitoring of the host itself is impacted, depending on the type of problem, or type of monitoring used.

The SLA question in the FAQ is pretty generic ("the quickest fashion possible"), but 45 minutes is longer than I would have anticipated for a monitoring system and/or response to a hung host.

Thanks.

-- David


Top
   
 Post subject:
PostPosted: Fri Jul 17, 2009 4:38 pm 
Offline
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
David,

Consider the sequence: potentially a few minutes for our monitoring to detect the condition and issue alerts; a few more minutes for us to respond, investigate and diagnose the problem, come up with a plan of attack and actually initiate it; then a few minutes waiting on a host reboot, performing some integrity checks and potentially updating the stack (another reboot). From time of failure to ready-state, I think there's the potential for anywhere between 5 and 15 minutes of elapsed time. After that, the Linodes boot serially, so depending on where your Linode is in the list, it may boot sooner or later.

-Chris


Top
   
 Post subject:
PostPosted: Fri Jul 17, 2009 5:50 pm 
Offline
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
Thanks for the response. I hope I'm not coming across as just complaining - I'd like to use this event to better understand my exposure down the road, as this particular VPS is a test box in advance of a production system. Would you consider this outage a representative event, or was this one that might have taken longer to trigger alerts on the host level monitoring, or encountered any unusual delays along the way?

The monitoring/diagnostic times above seem reasonable, especially when trying to determine whether a full host reboot is actually required, since a reboot can itself be intrusive if an alternative measure is possible. Hopefully the timing for most events skews towards the low end, but the worst case range is understandable. For my reference, do you happen to know the actual timing involved with this specific event?

The VPS restart latency is a major point I hadn't run the calculations on previously, and upon reflection is a notable difference between a VPS and, say, shared hosting, though the latter isn't something I'd consider for many other reasons. The Linode Manager job log for the reboot yesterday shows 27 seconds of host time, which I assume would be the granularity of the timing of the serial nature of the individual VPS restarts. I believe my VPS ought to be fairly representative of a typical configuration, so extrapolating from that, if I'm at the back of the VPS list on a host (a 360 in this case, say 40 VPSes), any restart of the host could cost me up to perhaps 15-20 minutes of downtime.
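The extrapolation above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the boots are strictly serial and that every VPS takes the same host time. The 40-VPS count is the guess from the post, not a confirmed figure, and the function name is made up for illustration.

```python
def worst_case_boot_delay_minutes(vps_count: int, seconds_per_boot: float) -> float:
    """Minutes until the last VPS in a strictly serial boot queue has started.

    Assumes every VPS takes the same host time and that nothing overlaps.
    """
    return vps_count * seconds_per_boot / 60.0


# 27 s is the host time from the job log; 40 VPSes per host is a guess.
loaded_estimate = worst_case_boot_delay_minutes(40, 27)

# 11 s is the host time from a single test reboot on an otherwise idle host.
idle_estimate = worst_case_boot_delay_minutes(40, 11)

print(f"loaded host: {loaded_estimate:.1f} min, idle host: {idle_estimate:.1f} min")
```

With the 27-second figure the last VPS in the list would wait about 18 minutes, which sits inside the 15-20 minute range estimated above.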

A test reboot I just did now took 11 seconds of host time, but I assume the host would be more loaded during a full restart.

Is there any randomization of (or thought of randomizing in the future) that restart list to avoid always penalizing the same VPSes on a given host during a restart?

I'm still a little concerned about the total duration of this event though, even given the above. Even at the edge of the expectation to take action, and even if at the end of the VPS list, I would think I should have been up more like 30-35 minutes than 45. I'd be happier if I could account for or understand that extra time.

There's also an interesting time anomaly with the restart job shown in Linode Manager. It claims my box was up 5 minutes earlier than my own logs show it was. My test restart timestamps match to within seconds, so I don't think it's time inaccuracy. Perhaps that's due to overlapping VPS systems booting up (past the point where the restart jobs imposed a serial sequence) and thus being delayed by shared disk access or some other resource? In my case the time discrepancy is from the job log to the first kernel log message.

-- David


Top
   
 Post subject:
PostPosted: Sat Jul 18, 2009 8:33 pm 
Offline
Senior Newbie

Joined: Tue Feb 03, 2009 3:38 pm
Posts: 9
Just figured I'd chip in here as I too have a Linode on newark41.

As far as I'm aware (looking through the forum posts), this is only the second time this host has needed to be rebooted since it was put on-line. In my eyes that's excellent. Yes, a shared host would be back on-line quicker, but the extra downtime is understandable when the host machine has to boot 40 virtualised OSes one-by-one.

As for the 45 minute downtime, I don't believe the host was down that long. I run a ventrilo server on my Linode and myself and a friend were talking the moment it went down. From that point to the full-boot of my Linode was around 15-20 minutes at the absolute maximum, and I'm fairly certain my Linode is one of the last to come up.

Perhaps whatever caused the issue affected your Linode earlier than mine, I've no way of knowing, but I do know my Linode wasn't off-line anywhere near 45 minutes.

All this aside, I want to express my thanks to Linode once again. Your service is unmatched and despite the fact I barely use my Linode, you've got a customer for as long as I can afford to be one.

All the best.


Top
   
 Post subject:
PostPosted: Sun Jul 19, 2009 6:05 pm 
Offline
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
Nexx wrote:
As far as I'm aware (looking through the forum posts), this is only the second time this host has needed to be rebooted since it was put on-line. In my eyes that's excellent.

I hesitated responding since it'll sound negative again (I'm really not, honest), but the above would depend on how long it's been operational. A forum search seems to show three reboots for newark41 (Sep 1 2008, Mar 21 2009 and now). I only got my Linode in May, so I'm assuming newark41 is a relatively new host, but I don't know when prior to Sept it was brought online. But labeling 3 outages over 10 months as excellent is a stretch - or ought to be - in the context of a commercial data center. Then again, perhaps I'm being too harsh, as most of my prior experience has been with dedicated servers and/or colocation.

I do expect that on average host uptimes are significantly higher, and postings I have seen certainly reflect that.

Quote:
As for the 45 minute downtime, I don't believe the host was down that long. I run a ventrilo server on my Linode and myself and a friend were talking the moment it went down. From that point to the full-boot of my Linode was around 15-20 minutes at the absolute maximum, and I'm fairly certain my Linode is one of the last to come up.

Thanks, that's helpful to know - not sure what to say, as I was sleeping during the outage, but I have munin logging at 5 minute intervals, so in addition to syslogs, it was fairly easy to see post-mortem the period when normal operations had halted, at least internal to my Linode.
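For anyone wanting to do the same kind of post-mortem, here is a rough Python sketch of finding the silent period from periodic timestamps. It assumes you have already pulled the sample times out of your own logs into a list. The sample data below is invented purely for illustration, and `find_gaps` is a made-up helper, not part of munin.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(minutes=5), slack=timedelta(minutes=1)):
    """Return (last_seen, next_seen, length) for every gap longer than expected + slack."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > expected + slack:
            gaps.append((prev, cur, cur - prev))
    return gaps


# Invented sample data: 5-minute polls, a silent stretch, then polls resume.
base = datetime(2000, 1, 1, 13, 0)
samples = [base + timedelta(minutes=5 * i) for i in range(4)]        # 13:00 to 13:15
samples += [base + timedelta(minutes=60 + 5 * i) for i in range(3)]  # 14:00 to 14:10

gaps = find_gaps(samples)
for last_seen, next_seen, length in gaps:
    print(f"no samples between {last_seen} and {next_seen} ({length})")
```

Keep in mind that a gap measured this way is only accurate to within one polling interval at each end.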

How can you judge your Linode being one of the last to come up? Actually that's an interesting thought - my "restart" job for the outage shows a host start date of 07/16/2009 02:36:34 PM - any chance of checking yours?

At some level, I suppose it isn't critical, since even if I'm having trouble accounting for an extra 5-10 minutes of outage, I'm not sure that changes all that much in terms of the impact of the outage.

Quote:
All this aside, I want to express my thanks to Linode once again. Your service is unmatched and despite the fact I barely use my Linode, you've got a customer for as long as I can afford to be one.

I'm in agreement with this - I've been extremely pleased with the setup, management and operations of my Linode and plan to continue forward with my production server shortly.

-- David


Top
   
 Post subject:
PostPosted: Wed Jul 22, 2009 7:55 am 
Offline
Senior Newbie

Joined: Tue Feb 03, 2009 3:38 pm
Posts: 9
db3l wrote:
Nexx wrote:
As far as I'm aware (looking through the forum posts), this is only the second time this host has needed to be rebooted since it was put on-line. In my eyes that's excellent.

I hesitated responding since it'll sound negative again (I'm really not, honest), but the above would depend on how long it's been operational. A forum search seems to show three reboots for newark41 (Sep 1 2008, Mar 21 2009 and now). I only got my Linode in May, so I'm assuming newark41 is a relatively new host, but I don't know when prior to Sept it was brought online. But labeling 3 outages over 10 months as excellent is a stretch - or ought to be - in the context of a commercial data center. Then again, perhaps I'm being too harsh, as most of my prior experience has been with dedicated servers and/or colocation.
Apologies, you're correct, it has been 3 reboots now, but considering the complexity of running 40 virtualised OSes I don't think that's bad at all. My previous hosting experience has been with shared hosting, and the awful speed (due to massively oversold servers) as well as the uptime issues are why I moved to Linode.

Quote:
Quote:
As for the 45 minute downtime, I don't believe the host was down that long. I run a ventrilo server on my Linode and myself and a friend were talking the moment it went down. From that point to the full-boot of my Linode was around 15-20 minutes at the absolute maximum, and I'm fairly certain my Linode is one of the last to come up.

Thanks, that's helpful to know - not sure what to say, as I was sleeping during the outage, but I have munin logging at 5 minute intervals, so in addition to syslogs, it was fairly easy to see post-mortem the period when normal operations had halted, at least internal to my Linode

How can you judge your Linode being one of the last to come up? Actually that's an interesting thought - my "restart" job for the outage shows a host start date of 07/16/2009 02:36:34 PM - any chance of checking yours?
Sure.

Start date: 07/16/2009 07:47:09 PM
Finish date: 07/16/2009 07:48:18 PM

I'm assuming the 5 hour difference is due to the Linode dashboard correcting for time zone (I'm in the UK). It certainly wasn't offline that long ;)

As for why I believe my Linode to be one of the last? When I signed up for my Linode account there were no 360's available in Newark. The staff told me they'd be installing more hosts in the next few days but someone must have removed their 360 as a single one became available and I grabbed it.

I'm assuming that put me at the end of the list, and looking at the difference in our timestamps I believe it was a fair assumption.
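To put a number on that difference, the two start times can be normalised to a common zone before subtracting. This is a sketch under assumptions: it treats the earlier timestamp as US Eastern daylight time (UTC-4) and mine as British Summer Time (UTC+1), which matches the 5 hour offset guessed above but is not confirmed by the dashboard.

```python
from datetime import datetime, timedelta, timezone

# Assumed display zones (fixed offsets valid for July 2009); not confirmed.
EDT = timezone(timedelta(hours=-4), "EDT")
BST = timezone(timedelta(hours=1), "BST")

# Format of the start dates as quoted from the job log.
FMT = "%m/%d/%Y %I:%M:%S %p"


def parse(stamp: str, tz: timezone) -> datetime:
    """Parse a job-log timestamp and attach the assumed time zone."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=tz)


first_start = parse("07/16/2009 02:36:34 PM", EDT)
second_start = parse("07/16/2009 07:47:09 PM", BST)

# Subtracting timezone-aware datetimes compares them in absolute time.
delta = second_start - first_start
print(delta)  # 0:10:35
```

Under those assumptions the two restart jobs started about ten and a half minutes apart.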

Quote:
At some level, I suppose it isn't critical, since even if I'm having trouble accounting for an extra 5-10 minutes of outage, I'm not sure that changes all that much in terms of the impact of the outage.

Quote:
All this aside, I want to express my thanks to Linode once again. Your service is unmatched and despite the fact I barely use my Linode, you've got a customer for as long as I can afford to be one.

I'm in agreement with this - I've been extremely pleased with the setup, management and operations of my Linode and plan to continue forward with my production server shortly.

-- David
If you have any further questions, just ask =)


Top
   