Linode Forum
Linode Community Forums
 Post subject:
Posted: Tue Oct 27, 2009 9:08 pm
Junior Member

Joined: Tue Jun 29, 2004 2:27 pm
Posts: 34
OverlordQ wrote:
A) Unplanned outage, how do they warn against those?
B) Do you want them to fix your box or post here?


Usually "unplanned outage" would imply that something unexpected happened outside of the control of the administrators. Chris initially said it was due to "a shared library update distributed to our hosts". Based on the thoroughness I've observed in the past from Linode, I would expect that that sort of update is (1) scheduled by Linode staff and (2) tested on a staging host before pushing to production hosts. If (1) is true then the update was planned, even if the outage was not. I think the point of many posters here is that such maintenance should be announced, even if no outage is expected. If they aren't doing (2), they should be, although that doesn't always catch the problem.

--John


 Post subject:
Posted: Tue Oct 27, 2009 9:12 pm
Junior Member

Joined: Sun Nov 16, 2008 4:35 am
Posts: 38
OverlordQ wrote:
A) Unplanned outage, how do they warn against those?


The original outage was unplanned, the maintenance that caused it was not.

OverlordQ wrote:
D) Run it yourself if you think you can do better.


I do. My Linode is for my personal business use.

I've been responsible for one particular corporate production service with thousands of customers since early 2006.

You know what each customer is paying us? The equivalent of a few US dollars per month. You know what our contractual uptime obligations are to them? Nothing. You know how much impact 24 hours of downtime would have on our customers? 95% probably wouldn't even notice.

You know how many people are involved with this service? At peak, it was 4. Now it's just me.

And yet, in all that time, all maintenance has taken place during off-peak hours, as has all planned downtime (which was communicated to customers well in advance). We have had approximately 30 minutes' worth of unplanned downtime in that period, and about two hours of "partial" downtime due to flapping BGP at one of our upstream ISPs (causing approximately 50% of customers to have intermittent difficulty connecting to the service).

I can do better, and I have done better. I do better every day. I'm confident Linode can and will, too, but they have seriously dropped the ball today, and need to be held accountable for it.


 Post subject:
Posted: Tue Oct 27, 2009 9:12 pm
Newbie

Joined: Tue Oct 27, 2009 8:06 pm
Posts: 3
dmuth wrote:
How hard is it to set up a mailing list of some sort that we can subscribe to to get announcements of upcoming maintenance? A former ISP of mine did just that, and it worked out great. They would email out a description of maintenance that was to be performed, who would be impacted, and a time window.

Not only would it save a lot of customer frustration, but it would make me look good to the other members of the non-profit I host when I warn them of upcoming maintenance. Right now it has the opposite effect, making it look to them like *I* dropped the ball. Not cool.


I feel your pain, as I've had four angry customers call me and I'm left holding the bag, but remember that it's always hard to warn someone of unexpected downtime ;)


 Post subject:
Posted: Tue Oct 27, 2009 9:13 pm
Junior Member

Joined: Tue Jan 25, 2005 10:45 pm
Posts: 33
Infinito wrote:
Oh, newbies... take it easy. For me this is the first time there has been a problem (at all, as relatively minor as it is) since I signed up in 2007 (my Linode is in Fremont, btw). And this isn't even a real problem, so it seems, just some upgrade that went awry. Over two years up, I believe that fucking beats Amazon services. :)


+1. I've been with Linode since Oct 2003 (also in Fremont) and this is the first real outage I'm aware of. The only other issues I've ever experienced are short outages due to DDoS attacks, etc.

Fortunately my box is up and running again :D


 Post subject:
Posted: Tue Oct 27, 2009 9:21 pm
Newbie

Joined: Tue Oct 27, 2009 8:05 pm
Posts: 2
OverlordQ wrote:
A) Unplanned outage, how do they warn against those?

Did you read the OP?
Quote:
To recover from this we may be issuing host reboots to upgrade their software to our latest stack, and then bringing the Linodes to their last state. We're working on this now and expect to have additional updates shortly. We'll also be notifying those affected via our support ticket system.


So no, I didn't get notified via the support system.

OverlordQ wrote:
B) Do you want them to fix your box or post here?

Oh yeah, because that's an either/or thing right?

I'm not bashing Linode and overall I've been very happy with the service, but today they fell short of my expectations.


 Post subject:
Posted: Tue Oct 27, 2009 9:22 pm
Senior Member

Joined: Thu Dec 04, 2008 10:55 am
Posts: 57
Location: New Jersey
JobID: 1505768 - Host initiated restart
Job Entered: 01/04/1974 12:00:00 AM
Status: In Queue
Host Start Date:
Host Finish Date:
Host Duration: waiting on host
Host Message:

Thought that was kinda funny! My Newark node is all sorts of borked now. :shock:


 Post subject:
Posted: Tue Oct 27, 2009 9:25 pm
Senior Member

Joined: Thu Apr 03, 2008 12:02 am
Posts: 103
AOL: derole
spearson wrote:
Job Entered: 01/04/1974 12:00:00 AM
Status: In Queue
Host Start Date:
Host Finish Date:
Host Duration: waiting on host
Host Message:

Thought that was kinda funny! My Newark node is all sorts of borked now. :shock:


Nah, they do that to force the boot job to the front of the queue.


 Post subject:
Posted: Tue Oct 27, 2009 9:27 pm
Senior Newbie

Joined: Sun Oct 21, 2007 10:07 pm
Posts: 8
Ever since the downtime today I cannot get my Linode to boot up correctly. I cannot reach it by SSH. When I use the AJAX console, it seems stuck, and this is the message:
Code:
INIT: /etc/inittab[33]: rlevel field too long (max 11 characters)
INIT: /etc/inittab[34]: rlevel field too long (max 11 characters)
INIT: /etc/inittab[35]: rlevel field too long (max 11 characters)
INIT: /etc/inittab[36]: missing action field
Enter runlevel:


If I enter runlevel 3, it gives me this error:
Code:
INIT: Entering runlevel 3
INIT: no more processes left in this runlevel



What can I do at this point? I need help bad here. All my websites are down. I am dead in the water...
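For readers hitting the same console errors: `/etc/inittab` entries take the form `id:runlevels:action:process`, so the messages above suggest that lines 33 to 36 of the file are malformed (a second field that is too long, or a missing third field). The Python sketch below mimics only those two checks, taking the 11-character limit from the error text itself; it is a rough illustration under that assumption, not a reimplementation of sysvinit's parser, and the sample lines are made up.

```python
# Rough sketch: two /etc/inittab sanity checks modeled on the errors above.
# Assumption: only "rlevel field too long" and "missing action field" are
# mimicked; this is not sysvinit's real parser.

MAX_RLEVEL_LEN = 11  # limit quoted in the "rlevel field too long" message


def check_inittab(text: str) -> list[str]:
    """Return one init-style error message per malformed inittab line."""
    errors = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # id:runlevels:action:process (the process field may itself contain ':').
        fields = line.split(":", 3)
        rlevel = fields[1] if len(fields) > 1 else ""
        action = fields[2] if len(fields) > 2 else ""
        if len(rlevel) > MAX_RLEVEL_LEN:
            errors.append(
                f"/etc/inittab[{lineno}]: rlevel field too long "
                f"(max {MAX_RLEVEL_LEN} characters)"
            )
        elif not action:
            errors.append(f"/etc/inittab[{lineno}]: missing action field")
    return errors


if __name__ == "__main__":
    # Made-up sample: line 4 has no colons at all, line 5 has a bogus runlevel field.
    sample = (
        "# default runlevel\n"
        "id:2:initdefault:\n"
        "1:2345:respawn:/sbin/getty 38400 tty1\n"
        "garbage line without colons\n"
        "x:this-is-not-a-runlevel:wait:/bin/true\n"
    )
    for err in check_inittab(sample):
        print(err)
```

Running something like this against a copy of the file (for example from a rescue environment) would only point at suspicious line numbers; the actual fix is still to restore or hand-correct those entries.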


 Post subject: Re: Ouch
Posted: Tue Oct 27, 2009 9:28 pm
Senior Newbie

Joined: Sat Apr 18, 2009 6:21 pm
Posts: 8
Location: Europe
hiscom wrote:
Still, it would have been better to have this particular Stuff Happen at 3 or 4 AM when nobody would notice ...


Which 3 or 4 AM?

From Linode's 'Interesting stats': "131 countries customer diversity".


 Post subject: Not smart of you.
Posted: Tue Oct 27, 2009 9:29 pm
Senior Member

Joined: Thu Jun 21, 2007 7:13 pm
Posts: 100
Website: http://neo101.org
Both my Linodes (on different hosts) went down because of your mistake. One of them is still down. Why didn't you try your updates on just one host and then decide whether or not to push them to the others? You don't do this kind of thing on all hosts at once. You would have saved a lot of people a lot of trouble if you had used common sense!


 Post subject:
Posted: Tue Oct 27, 2009 9:30 pm
Senior Member

Joined: Mon Apr 27, 2009 7:36 pm
Posts: 59
Website: http://www.xenscale.com
Location: Boise, ID
Of course no one likes downtime, but it's a reality of the business.

I myself have to answer to my customers, whom I referred to Linode and who pay me to make sure their services stay online. However, they understand that outages are a reality of the business.

Even the big guys like Rackspace, Amazon, The Planet, Google, and Facebook have outages.

It's not a matter of if... it's a matter of when. It could be worse.

Two of my customers whom I emailed to pass along information about this incident replied that this is nothing compared to the outages they endured at Media Temple, where it was not uncommon for servers to remain offline for an entire day or more without resolution.


Last edited by H3LR4ZR on Tue Oct 27, 2009 9:31 pm, edited 1 time in total.

 Post subject:
Posted: Tue Oct 27, 2009 9:31 pm
Newbie

Joined: Wed Nov 05, 2008 2:10 pm
Posts: 2
Earlier this week, I was bragging about my linode uptime on a mailing list. Oh well, that will teach me! Karma has its way ...

My Linode appeared to be up again, according to the dashboard. I got "connection refused" when SSHing, so I was fairly confident it was indeed running (otherwise, it would have timed out). I used the AJAX console to inspect why the SSH service was down, and lo and behold, it was stuck at the initramfs prompt complaining about a failed fsck. I ran fsck -y, and then... lost connection to the console, which has been down since then. Arrrgh!!!11

Rebooting now. Hopefully it will come up just fine. Hopefully.

_________________
"For Bruce Schneier, SHA-1 is merely a compression algorithm."
http://geekz.co.uk/schneierfacts/fact/164


 Post subject:
Posted: Tue Oct 27, 2009 9:31 pm
Senior Newbie

Joined: Tue Oct 27, 2009 9:26 pm
Posts: 15
While I am not impressed with the lack of advance notification, I am even more displeased by the lack of reboot notification.

It was stated that those affected would be notified if their hosts required a reboot. Thus far, 2 of my Linodes have been restarted without any notification, after I'd read that I would be notified.

If you say you are going to notify customers, you should do just that. I was expecting notification if downtime was going to occur - not just the downtime!


 Post subject: Re: Ouch
Posted: Tue Oct 27, 2009 9:31 pm
Senior Member

Joined: Mon Feb 02, 2009 1:43 am
Posts: 67
Website: http://fukawi2.nl
Location: Melbourne, Australia
andrewz wrote:
What can I do at this point? I need help bad here. All my websites are down. I am dead in the water...

Open a new thread or a support ticket...

Rogi wrote:
hiscom wrote:
Still, it would have been better to have this particular Stuff Happen at 3 or 4 AM when nobody would notice ...


Which 3 or 4 AM?

From Linode's 'Interesting stats': "131 countries customer diversity".

Countries exist outside the USA?

:lol:


 Post subject:
Posted: Tue Oct 27, 2009 9:32 pm
Senior Newbie

Joined: Fri Jun 05, 2009 11:31 am
Posts: 5
Location: Australia
OverlordQ wrote:
A) Unplanned outage, how do they warn against those?


By creating maint@linode, which lists _all_ maintenance, without exception. That way I can look at that mail archive, see today's date and 'upgrading libraries on xen hosts', and go... ahhhh!

Quote:
B) Do you want them to fix your box or post here?


Actually, I want them to post here first, then fix the problem.

I've just had to tell a customer "your servers seem OK but _may_ be rebooted at some point" because I don't know if they are going to reboot all hosts. I had to tell them that because there is _NO_ official information that I can find on the status of the problem (or even much on the cause, fix ETA, etc.).

Take 5 mins to post _all_ the info you have. Say what is going to happen to rebooted hosts (are they now OK?) and what about un-rebooted hosts (will they be rebooted later, or are they fine?).

Part of taking the credit and fanboy love we have for this wonderful thing (and I think Linode is great) is also taking responsibility for the fsck-ups that happen along the way.

I'm a professional sysadmin, so I understand things 'go wrong', and that is fine; people have to live with that. But what you *can do* is be open, honest, and fully informative about the problem. It takes 5 mins to do, and often stopping and thinking about the problem enough to lay it out clearly can actually help.

For example, Xen instances can be _saved_ to disk. Is there some reason that admins can't do the following:
* save all Xen instances on a host
* reboot the host
* restore the Xen instances
If that were workable, then maybe you could have a 2 minute 'hang' for each host and _not need_ a guest reboot. *shrugs* Maybe Linode should look at trying that (xm save _domain_ _savefile_) and see if it could be used to reduce the impact next time.
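The save / reboot / restore idea can be sketched as a dry-run plan. This is a minimal Python sketch under loud assumptions: the guest names and the save directory are hypothetical, nothing is executed, and it relies only on the Xen 3.x `xm save <domain> <file>` and `xm restore <file>` subcommands plus a plain host `reboot`. Because the host reboot sits in the middle, a real tool could not run this as one script; the plan is just printed to show the ordering.

```python
# Dry-run sketch of a save / reboot / restore cycle for one Xen host.
# Hypothetical: SAVE_DIR and the guest names; the commands are only printed.
import shlex
from pathlib import PurePosixPath

SAVE_DIR = PurePosixPath("/var/lib/xen/save")  # hypothetical location


def maintenance_plan(domains: list[str]) -> list[str]:
    """Return the shell commands, in order, for a save/reboot/restore cycle."""
    save_files = {d: SAVE_DIR / f"{d}.chk" for d in domains}
    plan = []
    # 1. Write each guest's state to a file; the guest stops running on the host.
    for d in domains:
        plan.append(f"xm save {shlex.quote(d)} {shlex.quote(str(save_files[d]))}")
    # 2. Reboot the host onto the updated software stack.
    plan.append("reboot")
    # 3. Once the host is back, resume each guest from its saved state.
    for d in domains:
        plan.append(f"xm restore {shlex.quote(str(save_files[d]))}")
    return plan


if __name__ == "__main__":
    for cmd in maintenance_plan(["linode1234", "linode5678"]):  # hypothetical names
        print(cmd)
```

Keeping the save and restore loops over the same list keeps the restore order identical to the save order; whether the resulting per-guest "hang" is short enough in practice would depend on guest memory sizes and disk speed.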

Quote:
C) You get what you pay for.
D) Run it yourself if you think you can do better.


I pay for a service and part of that service involves updating the:
* forums
* outage announcements
* blogs
* twitter
None of which have any useful information.

I have taken 5 mins to email my clients and say "linode hosted servers are to be taken as 'unreliable' until further notice". :-) There. I did better. ;-)

_________________
---
Monkeys for the win....
.... but Labradors for the snooze.

