Linode Community Forums
Post subject: Data Centre Outages
Posted: Wed Jan 09, 2008 5:40 pm
Senior Member

Joined: Wed Jan 24, 2007 12:04 am
Posts: 90
Website: http://www.smiffysplace.com
Location: Rural South Australia
In view of the recent issues at Atlanta...

A little while ago, we were inconvenienced by a series of DDoS attacks on Hurricane Electric (Fremont). I'm sure that The Planet (Dallas) has also had its problems.

The simple fact is that you will never get 100% uptime, anywhere. Buildings burn down (or in the case of HE, fall into the San Andreas Fault ;-)) and other disasters can occur.

No matter how many redundant anythings you have in a data centre, you will get outages. If we, as users, have mission-critical sites/applications running, it falls to us to provide contingencies on top of those provided by Chris and Team, data centre staff, etc.

My approach is to have a redundant Linode but NOT in the same data centre. I don't have an automatic failover system, but I dump my HE databases every night and transfer them over to The Planet. Likewise, any uploaded files get rsync'd over. These dumps/files are also copied down to the server in my office as part of the process. Hey, if we lost the USA, I could run the whole lot off my laptop, although I wouldn't like to say what sort of shape the InterWeb would be in ;-)
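A nightly job along these lines might look something like the sketch below. To be clear, every hostname, path, and database name here is a placeholder, not my actual setup:

```shell
#!/bin/sh
# Nightly backup sketch: dump the databases on the primary Linode, then
# push the dumps and any uploaded files to the standby in the other data
# centre. All names and paths are hypothetical placeholders.
set -eu

BACKUP_DIR="${BACKUP_DIR:-/var/backups/nightly}"
STANDBY="${STANDBY:-standby.example.com}"   # secondary Linode, different DC
UPLOADS="${UPLOADS:-/var/www/uploads}"

# One dated dump file per night, e.g. mydb-2008-01-09.sql.gz
dump_name() {
    echo "mydb-$(date +%Y-%m-%d).sql.gz"
}

nightly_backup() {
    mkdir -p "$BACKUP_DIR"
    # Dump everything and compress it; add credentials as needed.
    mysqldump --all-databases | gzip > "$BACKUP_DIR/$(dump_name)"
    # rsync only transfers what changed, so repeated runs stay cheap.
    rsync -az "$BACKUP_DIR/" "$STANDBY:$BACKUP_DIR/"
    rsync -az "$UPLOADS/" "$STANDBY:$UPLOADS/"
}

# Only do the real work when invoked with "run" (e.g. from cron), so the
# script is safe to read through or source without touching anything.
if [ "${1:-}" = "run" ]; then
    nightly_backup
fi
```

Dropping a line like `30 2 * * * /usr/local/bin/nightly-backup run` into crontab would run it at 02:30 each night; copying the dumps down to an office machine is just one more rsync in the same vein.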

If I get an outage that looks like it's going to persist, I load up the databases from the dumps at The Planet, change DNS (I have a short TTL set), and about half an hour later I'm running on the secondary.
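The failover itself stays manual. Restoring the newest dump on the standby could be sketched as below; again, the paths and database name are made-up placeholders, and the DNS change is done by hand afterwards:

```shell
#!/bin/sh
# Manual failover sketch: restore the newest dump on the standby node,
# then repoint DNS at it. Paths and names are hypothetical placeholders.
set -eu

BACKUP_DIR="${BACKUP_DIR:-/var/backups/nightly}"

# Newest dump in the backup directory; dated filenames sort chronologically.
latest_dump() {
    ls "$BACKUP_DIR"/mydb-*.sql.gz | sort | tail -n 1
}

restore_standby() {
    # Load the most recent dump into MySQL on this (standby) node.
    gunzip -c "$(latest_dump)" | mysql
    # The DNS change is manual: point the zone's A records at this node.
    # With a short TTL set, clients follow within half an hour or so.
    echo "Restored $(latest_dump); now update DNS to point here."
}

if [ "${1:-}" = "run" ]; then
    restore_standby
fi
```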

There are more elegant ways in which this can be done - multiple replicated databases, round-robin DNS, etc., but these are not something that I or my clients (all small businesses) can afford.

What we should not do is to turn round and blame Linode when we ourselves have failed to identify and make contingencies for a single point of failure in a critical system.


Posted: Wed Jan 09, 2008 5:45 pm
Senior Member
Joined: Mon Dec 10, 2007 4:30 pm
Posts: 341
Website: http://markwalling.org
I nominate this for sticky status


Posted: Wed Jan 09, 2008 5:46 pm
Junior Member

Joined: Tue Sep 25, 2007 3:04 pm
Posts: 27
I agree completely; you can never be too careful with regard to things like this.

This was pretty much entirely out of linode's hands. They handled everything they could as well as they could. The server failed and it was replaced.

However, let me put this clearly.

With the exception of a major catastrophe or barring other extremely rare circumstances, what happened with the power at the Atlanta Datacenter is completely unacceptable.

Rackspace had an outage a while back, but that was a big combination of multiple points of failure, as well as problems on the part of the electric company.

A few batteries failing in the UPS system is unacceptable, especially given their excuse that they check the thing every 6 months.

I work in a datacenter, and at today's biweekly meeting on happenings in our datacenter (any issues, outages, coverage issues with technicians), we laughed at the reasons Atlanta gave for their problems.

Needless to say, plenty of businesses try to cut corners like they did, but it has now bitten them in the ass (as it eventually will any business that tries it).


Posted: Wed Jan 09, 2008 5:59 pm
Senior Member

Joined: Tue Apr 27, 2004 5:10 pm
Posts: 212
Bravo!


Posted: Wed Jan 09, 2008 6:39 pm
Senior Member

Joined: Mon May 14, 2007 8:20 am
Posts: 81
One of the reasons we use linode.com is their ability to assess a datacenter. I'd like to know their thoughts on their supplier in Atlanta. Are they going to dump that supplier? Does Linode think the Atlanta datacenter guys were just unlucky? Incompetent?

On a more positive note, I love that linode.com is so open about those problems and uses the forum to communicate.


Posted: Wed Jan 09, 2008 6:49 pm
Senior Newbie

Joined: Sat Jun 30, 2007 7:01 pm
Posts: 5
Website: http://www.silverblade.co.uk
Location: Oxfordshire, UK
A-KO wrote:
A few batteries failing in the UPS system is unacceptable especially given their excuse that they check the thing every 6 months.


I just re-read the copy of the message posted in the thread in the announcements section, and noticed the way they worded this at the end:

Quote:
we are increasing the battery pm schedule to monthly from biannual.


I always thought bi-annual meant every other year. Like bi-weekly means every other week. But it makes sense to a degree if they meant twice a year ;)

_________________
Silver Blade
www.silverblade.co.uk


Posted: Wed Jan 09, 2008 6:53 pm
Senior Member

Joined: Wed Jan 24, 2007 12:04 am
Posts: 90
Website: http://www.smiffysplace.com
Location: Rural South Australia
Biennial = every other year
Bi-annual = twice per year

- and I hope that the DC has got this the right way round!


Posted: Wed Jan 09, 2008 7:26 pm
Senior Member

Joined: Thu Jun 21, 2007 7:13 pm
Posts: 100
Website: http://neo101.org
Will the disk image cloning feature of Linode work if I move one of my Linodes to a different datacenter? And will a cloning be "free" in terms of the traffic per month limit? Having your eggs in different baskets seems like a good idea.


Posted: Wed Jan 09, 2008 7:56 pm
Senior Member

Joined: Tue Apr 13, 2004 6:54 pm
Posts: 833
"Like bi-weekly means every other week."

This is the problem with the "bi-" terminology. Bi-weekly could, logically, mean "twice a week". Indeed, in England, it's more likely to mean that because we have the perfectly usable word "fortnightly" to mean "every 2 weeks".

Ain't the English language fun!

_________________
Rgds
Stephen
(Linux user since kernel version 0.11)


Posted: Thu Jan 10, 2008 2:07 pm
Linode Staff

Joined: Tue Apr 15, 2003 10:32 pm
Posts: 246
Location: NJ, USA
harmone wrote:
Will the disk image cloning feature of Linode work if I move one of my Linodes to a different datacenter? And will a cloning be "free" in terms of the traffic per month limit? Having your eggs in different baskets seems like a good idea.


Yes, you are able to clone images between data centers. Using the cloning utility does not count against your monthly bandwidth quota.

-Tom


Posted: Thu Jan 10, 2008 2:14 pm
Linode Staff

Joined: Tue Apr 15, 2003 10:32 pm
Posts: 246
Location: NJ, USA
jcr wrote:
One of the reasons, we may use linode.com is their ability to assess a datacenter. I'd like to know their thoughts on their supplier from Atlanta. Are they going to dump that supplier? Does linode think the Atlanta Dadacenter guys were just unlucky? Incompetent?

On a more positive note, I love that linode.com is so open about those problems and uses the forum to communicate.


The dust is still settling here and we have yet to form a final opinion. We did visit the data center in person before deploying, and we were impressed with it. They have also been extremely reliable over the past year, the last month aside. Two things did happen yesterday as a result of this, however:

1 - Ten brand new hosts were to be picked up yesterday morning by FedEx, on their way to Atlanta. FedEx was turned around and these hosts will now go to Dallas.

2 - We signed a new contract which effectively doubles our cage size in Dallas.


-Tom


Posted: Thu Jan 10, 2008 4:08 pm
Senior Member

Joined: Sat Mar 24, 2007 6:09 pm
Posts: 59
Location: South Africa
Hi,

Anybody thinking of moving away from Atlanta should think twice, especially after the incident. Here's why I think so:

Atlanta has been very reliable over the past while, but an unfortunate incident occurred which seems to be very rare. Their backup systems *are* very impressive, and you can be sure that because of this incident they are now especially vigilant, perhaps even more so than another DC where something like this is just waiting to happen.

--deckert


Posted: Mon Jan 14, 2008 12:33 pm
Linode Staff

Joined: Tue Apr 15, 2003 10:32 pm
Posts: 246
Location: NJ, USA
On Friday we had a very lengthy conversation with Jeff Hinkle, president of the facility we use in Atlanta. Our feeling is that the reliability we came to expect from them has been restored.

A few key points from our conference call:

The intermittent network troubles we've been experiencing since early December have been caused by their Global Crossing backbone. Atlanta has been working with Global Crossing over the past month to troubleshoot the problem with no luck. It may not sound like much to you or me, but shutting off a backbone is a big deal to a datacenter. Not only in terms of the additional bandwidth now going over the other providers, but from a financial aspect as well. On Thursday afternoon Jeff made the decision to shut Global Crossing down. Since then we've noticed increased throughput and no latency issues. Hopefully this sticks. They are also adding Level3 to their network on February 1st.

Regarding the power outage, they have already implemented steps to help prevent this from happening again. They have purchased their own testing equipment and started conducting tests on a more frequent basis (daily and monthly, in addition to their regularly scheduled vendor maintenance). They have also installed a third string of batteries. When Chris and I toured the data center prior to deploying there, we were impressed with their redundant power configuration. Their cooling system is also rigged to their generators, to prevent overheating in such an event (I think I read somewhere that a data center has ~ 15 minutes before an unacceptable operating temperature is reached and they need to start shutting equipment down -- not so in Atlanta).

Outages (network, power, hardware failure, etc) are inevitable at any data center - no matter how many 9's they stick in their SLA. I think Atlanta just had a series of compounding issues, which we believe are now resolved.

-Tom


Posted: Mon Jan 14, 2008 1:11 pm
Senior Member

Joined: Fri Oct 24, 2003 3:51 pm
Posts: 965
Location: Netherlands
Before everyone raises a ticket to leave the Atlanta datacenter, it's worth remembering that there was a massive, 8-hour power outage at the Dallas datacenter on 31 March 2005, again caused by multiple failures and/or configuration errors in a (supposedly) redundant power system. ThePlanet presumably learned their lesson, because they have been rock steady ever since.

_________________
/ Peter

