Linode Community Forums
PostPosted: Fri May 06, 2011 11:09 pm
waldo wrote:
They do. Have you enabled Lassie? Log into the Linode manager, click on the Settings tab for your node.

It's been so long since I looked, I can't recall if it's enabled by default or not.

Description:
Quote:
Lassie is a Shutdown Watchdog that monitors your Linode and will reboot it if it powers off unexpectedly. It works by issuing a boot job when your Linode powers off without a shutdown job being responsible.
To prevent a loop, Lassie will give up if there have been more than 5 boot jobs issued within 15 minutes.

I see, thanks. I must have disabled lassie for some reason. Did it work in this case?
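The loop-prevention rule in the quoted description can be sketched roughly like this (the class and constant names are hypothetical; the real Lassie logic runs in Linode's host infrastructure, not on your node):

```python
import time
from collections import deque

BOOT_LIMIT = 5           # give up after more than this many boots...
WINDOW_SECONDS = 15 * 60  # ...within this window

class ShutdownWatchdog:
    """Sketch of a Lassie-style watchdog: reboot on unexpected power-off,
    but give up if too many boot jobs pile up in a short window."""

    def __init__(self):
        self.boot_times = deque()  # timestamps of recent boot jobs

    def should_reboot(self, now=None):
        now = time.time() if now is None else now
        # Forget boot jobs older than the window.
        while self.boot_times and now - self.boot_times[0] > WINDOW_SECONDS:
            self.boot_times.popleft()
        if len(self.boot_times) >= BOOT_LIMIT:
            return False  # boot loop suspected: give up
        self.boot_times.append(now)
        return True
```

So five boots inside fifteen minutes are allowed, the sixth is refused, and once the window drains the watchdog starts rebooting again.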


PostPosted: Fri May 06, 2011 11:18 pm
Linode and Linode support are wonderful, but I can't believe power outages repeatedly knock out a world-class datacenter like HE.

From the November outage RFO:

"The Fremont facility is consulting with the UPS manufacturer to make sure the system is more robust in order to protect against similar failures in the future. We plan to follow up with them and ensure that the reliability of Linode's infrastructure meets our expectations."


PostPosted: Fri May 06, 2011 11:26 pm
kenyon wrote:
I see, thanks. I must have disabled lassie for some reason. Did it work in this case?


Yep.


PostPosted: Sat May 07, 2011 12:38 am
Given this is three strikes for HE, what steps will Linode be taking to ensure that its provider is fit to provide a Tier 1 service?


PostPosted: Sat May 07, 2011 4:56 am
waldo wrote:
kenyon wrote:
I see, thanks. I must have disabled lassie for some reason. Did it work in this case?


Yep.


Worked for me too. I just spent an hour going through logs looking for the reason for the restart and came up blank. Came to the forums to ask a question, and here's my answer :)


PostPosted: Sat May 07, 2011 11:29 am
Being in this DC is starting to worry me :|

I had done some work on my Linode the night before, and I usually sync backups to another machine at my place. Add to that, I had turned off the machine for the night. Just my luck that the DC would go down and those same configuration files were lost :lol:
Oh well. That is what I get for not backing things up. :D I'll have to rewrite them when I can.


PostPosted: Sat May 07, 2011 12:01 pm
JeremyD wrote:
Being in this DC is starting to worry me :|

I had done some work on my Linode the night before, and I usually sync backups to another machine at my place. Add to that, I had turned off the machine for the night. Just my luck that the DC would go down and those same configuration files were lost :lol:
Oh well. That is what I get for not backing things up. :D I'll have to rewrite them when I can.


Um, what? How were your configurations lost? The power went down; I didn't see anything about any hosts blowing up.


PostPosted: Sat May 07, 2011 3:28 pm
dbuckley wrote:
I'm convinced the highly paid, highly qualified, highly regarded engineers who design these systems are morons, because they have failed to notice that the systems they design and specify fail time and time again.


I'm not defending HE here because they did screw up, but building redundant power is way harder than it sounds. UPSes fail at the worst times even though they pass weekly tests. Diesel generators fail even though they pass weekly tests. Switching equipment jams, things overheat. Idiots wire dual-PSU servers on one power circuit and then swear blind they used both. Racks of servers all start up at the same time because the BIOS random start-up delay wasn't set and management didn't pay 10 times more for staged start-up PDUs. Air conditioning gets a bit old and starts to draw more current than the specs say. And humans normally screw things up big time when they realize they don't actually have a procedure for the current situation and start panicking.

Everything that can go wrong will go wrong. Everything that can't go wrong will go wrong anyway. And every bit of equipment that solves one problem introduces another one.

I don't think it's actually possible to do better than one rackmount UPS per server sitting in the rack right next to that server. That's what EMC do with their storage arrays. It's pretty expensive and hard to manage but it's the only thing I've seen that actually works.


PostPosted: Sun May 08, 2011 9:07 am
sednet wrote:
I don't think it's actually possible to do better than one rackmount UPS per server sitting in the rack right next to that server. That's what EMC do with their storage arrays. It's pretty expensive and hard to manage but it's the only thing I've seen that actually works.

One rackmount UPS per power supply, not per server. Obviously with redundant power supplies.

_________________
rsk, providing useless advice on the Internet since 2005.


PostPosted: Sun May 08, 2011 10:06 am
sednet wrote:
Everything that can go wrong will go wrong. Everything that can't go wrong will go wrong anyway.


To paraphrase the late, great Douglas Adams: the only difference between something that can go wrong and something that can't go wrong is that when something that can't go wrong does go wrong, it's much harder to fix.


PostPosted: Sun May 08, 2011 2:37 pm
JshWright wrote:
To paraphrase the late, great Douglas Adams... The only difference between something that can go wrong and something that go wrong is that when something that can't go wrong does go wrong, it's much harder to fix.
I don't get you, or the other dude.. haha.. If Fremont can't handle an occasional storm, it might be time to look for another facility near Fremont that can. And that might sound harsh, but you'd think by now they'd have some sort of plan that doesn't involve downtime.

Additionally, I think I'd be less worried about it if it happened with all the Linode data centers; instead it seems like Fremont has this uptime issue pretty often. And apparently.. I'm not the only one who isn't cool with downtime. :roll:


PostPosted: Mon May 09, 2011 10:05 am
superfastcars wrote:
And apparently.. I'm not the only one who isn't cool with downtime. :roll:
If you can't stand downtime, then you have already bought additional server(s) in other physical location(s) and set up failover so that one facility going down (which happens because, obviously, nothing is perfect) doesn't screw you over. If you haven't set that up, then you don't actually care about downtime.
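The failover being prescribed here boils down to a health-check-and-switch loop. A minimal sketch, assuming hypothetical /health endpoints in two separate facilities (a real switch would flip a low-TTL DNS record or a load balancer backend, which is elided here):

```python
import urllib.request

# Hypothetical health endpoints in two physically separate facilities.
PRIMARY = "http://fremont.example.com/health"
STANDBY = "http://newark.example.com/health"

def healthy(url, timeout=5):
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def pick_backend(probe=healthy):
    """Prefer the primary facility; fall back to the standby if it's down.

    `probe` is injectable so the decision logic can be exercised without
    real network traffic.
    """
    if probe(PRIMARY):
        return PRIMARY
    if probe(STANDBY):
        return STANDBY
    return None  # both facilities down: time to page a human
```

The check itself should of course run from a third location; a monitor sitting in the facility that just lost power can't fail anything over.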


PostPosted: Mon May 09, 2011 3:01 pm
Alucard wrote:
superfastcars wrote:
And apparently.. I'm not the only one who isn't cool with downtime. :roll:
If you can't stand downtime, then you have already bought additional server(s) in other physical location(s) and set up failover so that one facility going down (which happens because, obviously, nothing is perfect) doesn't screw you over. If you haven't set that up, then you don't actually care about downtime.


That's a very nice example of the "No True Scotsman" logical fallacy. You are claiming that people who "really" care about downtime would do the thing you suggest, and anyone who doesn't doesn't "really" care about downtime.

The logical fallacy is in the presumption that "caring about downtime" implies caring about downtime to the exclusion of all else. Obviously, different people balance their needs differently, and it's perfectly possible to care about downtime to the extent that you buy into a service with a reasonable belief that it will maintain a given level of uptime and then find later that it does not meet your expectations. This does not mean that you didn't care about downtime to begin with just because you didn't choose the most extreme option for minimizing it; it just means that your initial assessment of the service was wrong, either because you assumed too much or because the service promised too much.

I am not sure which is the case here, but it certainly does seem that HE does not meet the same level of uptime as other data centers. There has been a history of problems with HE exceeding that of other data centers, and I think it is reasonable to express concern about this.


PostPosted: Mon May 09, 2011 4:20 pm
The SANS NewsBytes had a great quote about the Amazon outage that fits perfectly for this situation.

John Pescatore wrote:
Anyone who plans on using cloud without planning on workarounds for outages is not doing their due diligence.


PostPosted: Mon May 09, 2011 6:38 pm
carmp3fan wrote:
The SANS NewsBytes had a great quote about the Amazon outage that fits perfectly for this situation.

John Pescatore wrote:
Anyone who plans on using cloud without planning on workarounds for outages is not doing their due diligence.


s/cloud/Internet/ and I'm pretty much on the same page.

The Internet (and the various cloud-computing technologies within) exist on real equipment in a real world. There will be failures. It sucks when they happen, but they will, and usually not in the way you're expecting.

I'm not excusing this outage by any stretch, but, well, it'll happen again. Maybe not Fremont again (although I said that last time, didn't I?!), but electrical power is particularly tricky to do right. I'm personally a big fan of DC, but it's one of those IPv6-like chicken-and-egg problems, except actual real capex is involved and there are no dominant standards yet.

This is anticipated to change this year, though it would be unwise to get your hopes up for immediate adoption, and this is just one failure mode of many.

(I do find it interesting that the most intricate and failure-prone utility spawns the biggest outrage when it breaks; if a failover router had blown a turboencabulator and seized the common-mode ambaphascient lunar wain shaft, taking out network reachability for a similar period of time, this thread probably would have not gone on this long.)

_________________
Code:
/* TODO: need to add signature to posts */


Powered by phpBB® Forum Software © phpBB Group