Linode Community Forums
Posted: Wed Oct 28, 2009 8:49 am
Senior Newbie

Joined: Tue Aug 12, 2008 9:39 am
Posts: 6
Website: http://www.sandycovesw.com
Location: CT, USA
Honestly, I've been very happy with Linode since I signed up with them a year and a half ago. However, as someone who was a professional UNIX admin for 5 years, there are two things that really bother me about this outage:

1. There was zero communication about the maintenance being done. I understand Linode has international customers, so they can't schedule downtime that will make everyone happy. On the other hand, it would have been nice to know this was coming so we wouldn't have all been caught unaware.

2. That every server in every datacenter was upgraded at once. Even if this upgrade was tested prior to pushing it out to production, there was no way of knowing for sure that there would be zero problems. Even the best tested upgrade can go awry. My business isn't big enough yet to have taken a serious hit from what looks like 4 hours of downtime. When it did get that big, I was planning on buying a second Linode in another datacenter, but if upgrading everything at once is going to be the policy going forward, I'll probably end up buying my secondary server from a different VPS provider.


Posted: Wed Oct 28, 2009 9:47 am
Senior Member

Joined: Fri Jan 09, 2009 5:32 pm
Posts: 634
craversp wrote:
2. That every server in every datacenter was upgraded at once. Even if this upgrade was tested prior to pushing it out to production, there was no way of knowing for sure that there would be zero problems.


This is what bugs me as well. No matter how well it's been tested, upgrades should at the very least be broken up by datacenter and performed with some gap between them (i.e., more than 5 minutes) so that each upgrade can bake in. In this case, only one DC would have been affected.
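
To make the "one datacenter at a time, with a gap" idea concrete, here is a minimal sketch. It is not how Linode deploys anything; the `upgrade` and `healthy` callables, the datacenter names, and the bake time are all hypothetical placeholders you would supply yourself.

```python
import time
from typing import Callable, Iterable, List


def staged_rollout(
    datacenters: Iterable[str],
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
    bake_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Upgrade one datacenter at a time and stop at the first unhealthy one.

    Returns the datacenters that were touched, including a failed one.
    All arguments are hypothetical hooks, not a real provider API.
    """
    touched: List[str] = []
    for dc in datacenters:
        upgrade(dc)
        touched.append(dc)
        sleep(bake_seconds)  # let the upgrade "bake in" before judging it
        if not healthy(dc):
            break  # halt here: the remaining datacenters keep the old version
    return touched
```

With a bake time well over 5 minutes, a bad upgrade would stop after the first datacenter instead of reaching all of them.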

To address someone else's comment (too lazy to go back and find it): would I rather someone post a communication or work on the problem? To a sysadmin, the answer isn't intuitive, but it's to post a communication. Working for a very large company, I've been angered when my boss has pulled me off fixing things to communicate about the outage, but in the end, it's the right move. Taking 2 minutes to dash off a communication calms people and gets them off your back, which is worth far more than the 2 minutes it costs you in getting the last server up.


Posted: Wed Oct 28, 2009 10:10 am
Senior Newbie

Joined: Fri Jul 31, 2009 6:47 am
Posts: 12
glg wrote:
This is what bugs me as well. No matter how well it's been tested, upgrades should at the very least be broken up by datacenter and performed with some gap between them (i.e., more than 5 minutes) so that each upgrade can bake in. In this case, only one DC would have been affected.


It could be that Linode's structure is such that pushing it to all datacentres at once was vital. But we won't know until they tell us :)


Posted: Wed Oct 28, 2009 10:26 am
Senior Member

Joined: Fri Oct 24, 2003 3:51 pm
Posts: 965
Location: Netherlands
gnummep-martin wrote:
It could be that Linode's structure is such that pushing it to all datacentres at once was vital. But we won't know until they tell us :)

I sure as hell hope not. If this turns out to be the case, then it adds serious weight to the "your hot standby machine should be with another provider" argument. If this is the case, Linode should (and most likely will) address the problem by changing the structure that required simultaneous upgrades.

_________________
/ Peter


Posted: Wed Oct 28, 2009 10:33 am
Senior Newbie

Joined: Fri Jul 31, 2009 6:47 am
Posts: 12
pclissold wrote:
gnummep-martin wrote:
It could be that Linode's structure is such that pushing it to all datacentres at once was vital. But we won't know until they tell us :)

I sure as hell hope not. If this turns out to be the case, then it adds serious weight to the "your hot standby machine should be with another provider" argument. If this is the case, Linode should (and most likely will) address the problem by changing the structure that required simultaneous upgrades.


Well, perhaps it was the nature of the upgrade, I don't know. And we won't know unless they publish a better explanation of some sort.


Posted: Wed Oct 28, 2009 10:41 am
Junior Member

Joined: Thu Feb 05, 2009 12:48 pm
Posts: 24
My Linode was bounced; however, I cannot find a support ticket from the Linode team in my email or in their support system.


Posted: Wed Oct 28, 2009 10:44 am
Senior Newbie

Joined: Tue Apr 18, 2006 9:43 am
Posts: 13
....Such as "General Discussion"?

I get notified of every post to "System and Network Status", and I suspect that I'm not the only one.

I want to be notified about status changes, but in the last 12 hours my inbox has been overflowing with notifications about postings that offer no additional information.


Posted: Wed Oct 28, 2009 10:46 am
Senior Member

Joined: Thu Apr 03, 2008 12:02 am
Posts: 103
AOL: derole
cirric wrote:
I want to be notified about status changes, but in the last 12 hours my inbox has been overflowing with notifications about postings that offer no additional information.


There is a link in the lower left corner of the page saying "Stop watching this topic" - maybe that's what you're looking for?


Post subject: Create a support ticket.
Posted: Wed Oct 28, 2009 11:12 am
Senior Member

Joined: Thu Jun 21, 2007 7:13 pm
Posts: 100
Website: http://neo101.org
A tip to everyone whose Linode is still down: Create a support ticket asking them to fix your host. I did that, and 15 minutes later my Linode was alive and kicking. So now I've had a downtime of 18.5 hours, but at least it's up again. I hope someone is bringing food to their admins, as I imagine they've been working nonstop since the problem was first reported.

Good luck everyone!

EDIT:

Btw: This is the response I got from my ticket:

Quote:
Hello,

I repaired /etc/inittab and /etc/fstab and issued a boot job -- and the Linode appears to have booted correctly. We apologize for this inconvenience.

Please let us know if there's anything else we can assist you with.

Regards,
-Chris


So if you can mount your disk image with the Finnix rescue disk image (create a profile for booting the Finnix disk image and mount your real disk image as the second disk image), you could maybe fiddle around with /etc/inittab and /etc/fstab and perhaps fix the problem yourselves. That is probably not possible, though, because if it were, Linode staff would probably have posted instructions on how we could fix our own Linodes. But someone who knows more than me and still has a nonfunctioning Linode could at least try while waiting.
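
For anyone who does get their disk image mounted, here is a rough sketch of a sanity check for an fstab file. Nobody here knows what was actually wrong with the affected files, so this only catches generic breakage: per fstab(5), each entry has six whitespace-separated fields, of which the last two (dump and pass) are optional and numeric. The function name and the sample entries in the usage note are my own illustration.

```python
from typing import List, Tuple


def check_fstab(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for fstab lines that look malformed."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are fine
        fields = stripped.split()
        if not 4 <= len(fields) <= 6:
            problems.append((number, f"expected 4 to 6 fields, found {len(fields)}"))
            continue
        for value in fields[4:]:  # optional dump and pass fields
            if not value.isdigit():
                problems.append((number, f"dump/pass field is not a number: {value!r}"))
    return problems
```

You would feed it the contents of the fstab on the mounted image, for example `check_fstab(open("/mnt/etc/fstab").read())`, where `/mnt` is only an assumed mount point. An empty list means nothing obviously malformed was found, not that the file is correct.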

EDIT2:

You can look at a copy of my inittab and fstab files at the URLs below if you want to see a functioning version. I run my Linode on Debian and use ext3 as my filesystem.

http://neo101.org/fstab
http://neo101.org/inittab

EDIT3:

Can someone post a non-working fstab and inittab file? It would be interesting to see what the differences are. Maybe we could make a short howto for those who would want to fix the problem themselves instead of waiting for the admins.
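
If someone does post a broken copy, a small sketch like this could show the differences against a working one. It uses Python's standard `difflib`; the function name and the file labels are just placeholders of mine.

```python
import difflib


def config_diff(working: str, broken: str, name: str = "fstab") -> str:
    """Return a unified diff between a working and a broken config file."""
    lines = difflib.unified_diff(
        working.splitlines(keepends=True),
        broken.splitlines(keepends=True),
        fromfile=f"{name} (working)",
        tofile=f"{name} (broken)",
    )
    return "".join(lines)
```

Pass in the text of the two files; an empty string back means they are identical.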


Posted: Wed Oct 28, 2009 1:22 pm
Senior Member

Joined: Wed Apr 11, 2007 8:23 pm
Posts: 76
I'll just throw in my two cents here:

Those who are complaining about communication problems were obviously looking in the wrong places. Their Twitter account got several updates during the downtime. The staff was always on IRC giving us updates.

As I understand it, the problem was that Linode was doing a shared library update of the kind that occurs every once in a while with no downtime. I assume that this type of thing has been done in the past without error and this was just another routine host update. Something about the libraries or the way they were installed caused issues on the host. They then decided to fix every host one at a time. Having over 500 hosts with a staff as small as Linode's means some downtime.


Posted: Wed Oct 28, 2009 1:49 pm
Senior Newbie

Joined: Tue Aug 12, 2008 9:39 am
Posts: 6
Website: http://www.sandycovesw.com
Location: CT, USA
My complaint isn't so much about communication while they work the issue, they did a decent job there. My complaint is that they don't often announce it ahead of time when they are going to be doing system maintenance. And don't tell me that "it was minor" or "what could you have done in advance anyway". EVERY type of system maintenance, no matter how small, has SOME chance of bringing things to a screeching halt. And even if we couldn't have done anything, letting customers know in advance when there is a maintenance window, at a minimum, buys the sys admins a little working time before people start expecting updates if something goes wrong.


Posted: Wed Oct 28, 2009 4:24 pm
Senior Newbie

Joined: Tue Apr 18, 2006 9:43 am
Posts: 13
oliver wrote:
There is a link in the lower left corner of the page saying "Stop watching this topic" - maybe that's what you're looking for?


That option doesn't appear until you reply to a particular thread. But even before I replied, I was subscribed to the forum. So, I got notification of any replies posted to the forum.

The notification email includes a link to stop watching this forum, but I don't want to disable that, because I'll miss the original postings of announcements by staff.

Yes, it's a limitation of the forum software. I'm just asking that folks use "General Discussion" for general discussion, and reserve "System and Network Status" for actual status updates.


Post subject: Business burden
Posted: Fri Oct 30, 2009 3:52 pm
Junior Member

Joined: Wed Jan 23, 2008 9:49 pm
Posts: 34
While I admit that my Linode is not much more than an experimental and vanity ground, I do not understand why people are so quick to blame Linode for their loss of business. Linode is not responsible for your revenue stream. You are. Stop making Linode responsible for your business. If you have a mission critical ZOMG CANT HAVE ANY DOWNTIME type of application, then for Pete's sake, have some redundancy. Don't just trust caker for your income. Trust no single point of failure. Get other servers somewhere else. Make it so that if caker and co suddenly decide it's not worth their time to run Linode anymore, you're not left with nothing. The Linode staff is responsible for their own business. Take care of yours yourself.

Cheers,
Antonio

It's not a question of if the ball will get dropped.... Ohhh, it will, believe you me. Like in all respects when dealing with humans, it's a question of when. So be prepared.

