Linode Forum
Linode Community Forums
 Post subject: Fremont reliability
Posted: Mon Aug 08, 2011 1:55 am
Junior Member

Joined: Mon Feb 22, 2010 9:40 pm
Posts: 37
Creating a new thread for this to get a bit more visibility... We need a response from Linode on what they are going to do about the HE datacenter.

I know that it's not Linode's equipment that's failing, but in the end it doesn't matter: it's Linode's reputation and service level that are going down the tubes at this DC.

At this point, the usual response of "We're talking to them about what they are doing to prevent this" isn't good enough. It clearly didn't solve the issues after the last 3 outages. There need to be some real steps taken to improve the service from fmt1. I want to see a full explanation of how a single breaker can take the whole DC down, and exactly what is being done to fix what is clearly a systemic issue, not a series of isolated incidents.

I don't really want to move DC because (as seems to be the case with most of the people who haven't moved) my clients are in Australia/NZ, so the lower RTT is advantageous. This could probably be resolved best by Linode getting space in another DC with good latency to these areas.

Jordan


Posted: Mon Aug 08, 2011 10:19 am
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
At this point it almost seems like Linode's only solution, if they want to stay in the same facility, is to invest in their own in-rack UPS systems, but boy can that get insanely expensive (and bulky) when you need two or more hours of runtime to survive all the HE outages.

It's a bit silly: the HE facility seems to lose power every single time there's a power outage in Fremont. It's like their UPS/genset is useless.

EDIT: Which is particularly worrying what with this:

http://hardware.slashdot.org/story/11/0 ... lar-Storms

Before you laugh at the idea of solar storms causing issues: in 1989, solar storms caused Hydro-Québec's network (which, like Texas, has its own interconnect) to lose power for 9 hours.

Basically, the storms caused a surge in the region where most of the generation capacity was, taking out 9.6 gigawatts of capacity, which was about half the capacity at the time. The sudden loss of so much capacity caused the network to begin load-shedding by shutting off parts of the network, which caused voltage swings that took out the rest of the generating capacity.

The network has since been hardened against this sort of thing, but not everywhere in North America is...

EDIT2: Of course, as was pointed out in another thread, an in-rack UPS won't keep HE's network equipment up, but if the Linodes never went down, the downtime would be reduced to just the actual length of the power outage, and not the power outage plus the 2+ hours it takes to bring all Linodes back up (not to mention hardware damage and data loss from sudden shutdowns).
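
To put rough numbers on that last point, here is a minimal sketch in Python. The function name and the 30-minute outage length are hypothetical, chosen only for illustration; the 2-hour recovery figure is simply the "2+ hours" estimate above, not a measured value.

```python
def total_downtime_hours(outage_hours, recovery_hours, hosts_stay_up):
    """Customer-visible downtime for one power event (illustrative model only)."""
    # If the hosts ride through on an in-rack UPS, only the outage itself counts,
    # because the upstream network gear is still dark for that long.
    # Otherwise the time needed to boot every Linode again is added on top.
    if hosts_stay_up:
        return outage_hours
    return outage_hours + recovery_hours


outage = 0.5    # hypothetical 30-minute power loss
recovery = 2.0  # the "2+ hours" bring-up estimate quoted above

print(total_downtime_hours(outage, recovery, hosts_stay_up=False))  # 2.5
print(total_downtime_hours(outage, recovery, hosts_stay_up=True))   # 0.5
```

Under these assumed numbers, most of the downtime comes from the recovery and not from the power loss itself, which is the whole argument for keeping the hosts powered.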


Posted: Mon Aug 08, 2011 12:34 pm
Senior Member

Joined: Fri Dec 11, 2009 7:09 pm
Posts: 168
An aside: http://www.zdnet.com/blog/saas/lightnin ... ag=nl.e539
Quote:
Summary: A lightning strike last night knocked out servers at Amazon’s only European data center and the provider has warned some of those affected face delays of up to two days before they get back online.


While nothing guards 100% against a lightning strike, it seems that hardware vendors would be making more servers like Google's, which have on board battery backup.

_________________
--
Chris Bryant


Posted: Mon Aug 08, 2011 2:06 pm
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
bryantrv wrote:
While nothing guards 100% against a lightning strike, it seems that hardware vendors would be making more servers like Google's, which have on board battery backup.


Batteries are big and tend to require replacement every few years, so in-server batteries are probably not too feasible in this situation. With the hardware Linode uses, there's barely room for a RAID BBU.

I think the best compromise nowadays is probably Facebook's Open Compute Project, but this is probably unrealistic for general-purpose multi-tenant datacenters.

_________________
Code:
/* TODO: need to add signature to posts */


Posted: Tue Aug 16, 2011 8:35 pm
Junior Member

Joined: Mon Feb 22, 2010 9:40 pm
Posts: 37
Wow, another outage. And this time we GMT+12 people weren't lucky enough to have it happen in the middle of the night.


Posted: Tue Aug 16, 2011 8:37 pm
Senior Member

Joined: Tue Jun 21, 2011 4:25 pm
Posts: 118
Website: http://www.alohatone.com
Location: Hawaii
I think HE.net said something about a bad UPS which they're looking at changing out.


Posted: Tue Aug 16, 2011 8:38 pm
Junior Member

Joined: Mon Feb 22, 2010 9:40 pm
Posts: 37
I certainly hope they're doing a lot more than looking at it, given how long the power issues have been going on in this DC...


Posted: Tue Aug 16, 2011 8:42 pm
Senior Member

Joined: Tue Jun 21, 2011 4:25 pm
Posts: 118
Website: http://www.alohatone.com
Location: Hawaii
Here is the HE report from 9 days ago: http://prgmr.com/~lsc/incident08072011.pdf

