Linode Forum
Linode Community Forums
Author Message
 Post subject:
PostPosted: Tue May 20, 2008 9:45 pm 
Offline
Senior Member
User avatar

Joined: Tue Aug 17, 2004 11:37 pm
Posts: 262
Website: http://www.our-lan.com
WLM: nf@our-lan.com
Location: Brisbane, Australia
For reference, all the jobs dated 1974 are there to ensure they run before anything else. The date is the Epoch of Caker.

_________________
ServerAdmin - www.our-lan.com
"Diplomacy is the art of saying nice doggy whilst looking for a really big stick"
"In my experience, any attempt to make any system idiot proof will only challenge God to make a better idiot"


Top
   
 Post subject:
PostPosted: Tue May 20, 2008 9:49 pm 
Offline
Senior Member

Joined: Thu Apr 03, 2008 12:02 am
Posts: 103
AOL: derole
ah, ok, thought it was an init problem

btw, now it says the restart failed but the system is running (which I can confirm, I'm on the box via ssh).


Last edited by oliver on Wed May 21, 2008 8:50 am, edited 1 time in total.

Top
   
 Post subject:
PostPosted: Tue May 20, 2008 10:04 pm 
Offline
Linode Staff
User avatar

Joined: Tue Apr 15, 2003 10:32 pm
Posts: 246
Location: NJ, USA
I'd just like to point out that *all* data centers experience issues from time to time. While I am not suggesting Linode is ok with this recent outage, it is part of the business. Dallas and Fremont have both had their share of power and network problems in the past.

I think when evaluating any provider, the important thing to consider is how well the issue was communicated and how fast it was resolved.

We will accept any requests to be migrated to a different facility, however we do not suggest this outage as being the reason behind it.

Regards,
-Tom


Top
   
 Post subject:
PostPosted: Tue May 20, 2008 10:43 pm 
Offline
Linode Staff
User avatar

Joined: Tue Apr 15, 2003 10:32 pm
Posts: 246
Location: NJ, USA
RFO from AtlantaNAP:

Quote:
Severe storm cells came through North Georgia Region this evening. AtlantaNAP experienced an over current fault outage on one of our 2 main feeds. The feed is the original feed that has the most load currently connected to it. The amount of systems connected to the load is the amount of lightning and over current that will try to be passed to the system – i.e. if you don’t have very much load on it - like our new feed is currently only at 1/6th load - then current does not try to flow to it very much. Our first system is currently at 65% load so it tried to absorb much more of the lightning strike than the other one and hence the main breaker going into over current fault.

I have spoken with all of our key electrical engineers associated with the building at this point. According to Georgia power / our PSSI and Cummins engineers – we likely took a lightning strike to the utility very near the facility which caused an over current fault on our main incoming breaker on our first set of switchgear. The breaker is designed to trip in the event of this kind of fault to protect the gear (your computers) inside the building from being burned up by the lightning strike.

When this type of fault happens - the computer will not start the generators until an engineer verifies where the fault is. This is because a fault inside the wiring plant could also cause this kind of over current in the event of a main short if a feeder wire of main current in the building were to become damaged.
In that case it would be very dangerous to turn the power back on manually or to force a manual start of the gen sets and push current to the system with a fault remaining. Lives and machinery could be lost.

We dispatched several of our staff to visually inspect for faults (we did not want to turn something on and have it fry everyone’s gear). They found none, verified it was likely a lightning strike, and manually started the generators to restore power. Unfortunately the UPS system is only designed to carry that load for 10 minutes, which was not enough time for us to safely verify and do a manual start.

This is apparently a rare event – to get a direct utility strike like this – that close that does not get dissipated before it hits us. The farther away from your site the strike occurs - the more other load and grounds it has to dissipate before it gets to you.

The good news is we did not burn up any equipment.

Some of you did not lose power because you were connected to the other lightly loaded feed coming in and it was not enough load source to overwhelm the breaker since it is only 18% loaded at this point.

Some of you lost network connectivity because downstream feeder switches that your computers are connected to are only single power supply units.

They have told me that under normal operating conditions there is really nothing we could have done and we should simply be glad we had good equipment installed that kept our computers from being fried.


Top
   
 Post subject:
PostPosted: Tue May 20, 2008 11:05 pm 
Offline
Senior Newbie

Joined: Tue Apr 29, 2008 8:31 pm
Posts: 7
In short: due to random unidentified circumstances that might be lightning related, our automated procedures were bypassed for safety.

Not abnormal at all given the circumstances.

What is important is how well the DC manages load on its cooling and power. Over the past few years, as IT has started to grow by leaps and bounds (web 2.0 = scalability), a *lot* of facilities have done poor jobs managing capacity. Now they are scrambling to keep customers as the top tier of providers bring more and more capacity online.

(not saying that's the case here, just been the trend over the past 2-3 years)

If uptime needs to be above 3 nines then you should be geographically redundant. Cost savings might dictate otherwise, but the reality is you can have any two: cheap, scalable, redundant/resilient/stable.
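To put numbers on the "3 nines" remark, here is a rough sketch of the arithmetic. The function names are made up for illustration, and the two-site figure assumes the sites fail independently and that failover is instant and perfect, which is optimistic for any real setup.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (3 -> 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of `sites` sites where any single one can carry the service.

    Assumption: failures are independent and failover costs no time.
    """
    return 1 - (1 - site_availability) ** sites


if __name__ == "__main__":
    for n in (2, 3, 4):
        budget = allowed_downtime_minutes(n)
        print(f"{n} nines: about {budget:.1f} minutes of downtime per year")
    two_sites = combined_availability(0.99, 2)
    print(f"two independent 99% sites: about {two_sites:.4%} combined")
```

Three nines works out to roughly 525.6 minutes (under 9 hours) of downtime per year, so a single multi-hour facility outage can use up a large share of that budget. Under the independence assumption, two 99% sites combine to about 99.99%, which is the case for geographic redundancy.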

The common approach these days is to go scalable and cheap while trying to get around the stability issue by overbuilding, which, as Google is showing, is possible with proper engineering. However, the facilities bit is what's usually missing. (aka the forgotten cost)

/ramble


Top
   
 Post subject:
PostPosted: Wed May 21, 2008 1:55 am 
Offline
Senior Member

Joined: Wed May 16, 2007 12:46 am
Posts: 71
tasaro wrote:
I think when evaluating any provider, the important thing to consider is how well the issue was communicated and how fast it was resolved.


I agree and have no complaints in regards to that. My ticket was responded to very quickly and my node was up shortly thereafter. I can't say that I've ever had such a positive support experience with a provider before (and I've had my share).

Thanks!


Top
   
 Post subject:
PostPosted: Wed May 21, 2008 3:53 am 
Offline
Senior Member

Joined: Mon May 14, 2007 8:20 am
Posts: 81
It would be nice to have a fourth choice of data center, maybe in New England, Chicago, or Canada, or Western Europe, or in Asia, so that failover systems can be more effective.


Top
   
 Post subject:
PostPosted: Wed May 21, 2008 5:36 am 
Offline
Junior Member

Joined: Wed May 21, 2008 5:34 am
Posts: 46
Website: http://www.eve-razor.com/forum
Location: Austin, Tx
tasaro wrote:
RFO from AtlantaNAP:



I just want to say I am very impressed with this. I work for Cisco ROS and we have to deal with vendors and carriers all day. And day after day I have to make phone calls for RFOs and get utter crap fed to me by them.


It gives me a warm fuzzy feeling to see such a detailed RFO :)


Top
   
 Post subject:
PostPosted: Wed May 21, 2008 10:39 am 
Offline
Senior Member
User avatar

Joined: Sat Jul 01, 2006 7:36 am
Posts: 50
Location: Ghent, Belgium
jcr wrote:
It would be nice to have a fourth choice of data center, maybe in New England, Chicago, or Canada, or Western Europe, or in Asia, so that failover systems can be more effective.
I'm from Belgium, so I would find it great! But I don't think there will be a fourth data center in Western Europe, because of the higher cost of bandwidth and the €. And all of the Linode staff live in America.


Top
   