Linode Forum
Linode Community Forums
PostPosted: Fri May 06, 2011 8:02 am 
Offline
Newbie

Joined: Fri May 06, 2011 7:55 am
Posts: 2
Website: http://www.micheas.net/
Location: Oakland CA
From the page at PG&E's website

http://www.pge.com/myhome/customerservice/energystatus/outagemap/

The Fremont outage is scheduled for resolution at about 6:15 local time, which is 13:15 UTC.
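
For anyone checking the conversion, here is a minimal sketch, assuming "local time" means Pacific Daylight Time (UTC-7) on 6 May 2011. The helper name `to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Fremont local time on this date is PDT, a fixed UTC-7 offset.
PDT = timezone(timedelta(hours=-7), "PDT")


def to_utc(local_time):
    """Convert an offset-aware datetime to UTC."""
    return local_time.astimezone(timezone.utc)


# PG&E's estimated resolution time, 6:15 local.
estimate = datetime(2011, 5, 6, 6, 15, tzinfo=PDT)
print(to_utc(estimate).strftime("%H:%M UTC"))  # prints "13:15 UTC"
```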


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 8:11 am 
Offline
Senior Newbie

Joined: Tue Jun 22, 2010 11:26 pm
Posts: 15
Quote:
7:53am (EDT): Power appears to have been restored and we are working on bringing Linodes up now.


http://status.linode.com/2011/05/outage ... ility.html


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 9:05 am 
Offline
Junior Member

Joined: Fri Apr 22, 2011 11:53 pm
Posts: 29
Downtime lasted 2.2 hrs. I am glad it's back up.


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 10:26 am 
Offline
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Do these facilities not have UPS/genset? It seems relatively common for datacenters (not Linode's in particular) to suffer from power failures, regardless of whether they have backup power or not.


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 11:27 am 
Offline
Senior Newbie

Joined: Wed Jan 20, 2010 6:13 am
Posts: 6
Guspaz wrote:
Do these facilities not have UPS/genset?

I've wondered the same thing, as I recall that there have been several power incidents at the Fremont DC in the last year or two.


Top
   
 Post subject: Power supply
PostPosted: Fri May 06, 2011 12:24 pm 
Offline
Senior Newbie

Joined: Fri Apr 30, 2010 1:06 am
Posts: 6
Hi!

Can someone at Linode weigh in on this question:

1) Can you confirm that Linode does NOT use UPS services?
2) Can you confirm that Linode does NOT use generator backup?

And therefore since both of those are true, can we assume that Linode will go down even with the slightest of power interruptions?

Thanks,


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 2:42 pm 
Offline
Senior Newbie

Joined: Tue Mar 23, 2004 6:15 pm
Posts: 15
Like pretty much every colocation centre, HE has UPS and generator backup facilities.

Like pretty much every colocation centre, they either failed when needed to deliver the goods, or some previous incident damaged part or all of the UPS system, and so the facility was running on street power whilst the UPS systems were being repaired, and then when street power goes, so do the servers.

I'm convinced the highly paid, highly qualified, highly regarded engineers who design these systems are morons, because they have failed to notice that the systems they design and specify fail time and time again.


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 4:33 pm 
Offline
Senior Member

Joined: Fri Oct 24, 2003 3:51 pm
Posts: 965
Location: Netherlands
dbuckley wrote:
I'm convinced the highly paid, highly qualified, highly regarded engineers who design these systems are morons, because they have failed to notice that the systems they design and specify fail time and time again.

The last time there was a major outage at HE Fremont 1 (20–21 November 2010), it was caused by a lightning strike that took out a bunch of UPS units. There was another outage on 23 November, caused by a one-second break in utility power that could not be protected against since the UPS systems were under repair.

Designers have to balance the cost of power protection systems against the severity of the events that they can withstand. No system that we would want to pay a part of the costs for will survive a lightning strike on a nearby switching station or utility pole. Also, testing production power protection systems is notoriously difficult; nobody wants to pull a breaker just to see if everything works, but no amount of simulation and half-assed exercises can really prove the system. Let’s wait for the RFO before calling them morons.

That being said, this is the second time in six months that Fremont 1 has had a power outage. I await the RFO with interest. If the cause of the outage was anything less than some kind of natural catastrophe, HE has some explaining to do.

_________________
/ Peter


Top
   
PostPosted: Fri May 06, 2011 7:58 pm 
Offline
Newbie

Joined: Fri Nov 19, 2010 5:41 pm
Posts: 4
Has anyone else noticed that the times/dates of updates on status.linode.com are backdated? I.e., the 9:18 AM EDT update showed up sometime between 3 PM and 5 PM PDT. Is there some totally innocuous reason behind this or is it just to keep up appearances?

-John


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 8:06 pm 
Offline
Senior Newbie

Joined: Tue Mar 23, 2004 6:15 pm
Posts: 15
pclissold wrote:
Let’s wait for the RFO before calling them morons.


Sorry Peter, I failed to make myself clear. I'm not suggesting that the specific folks responsible for the design at HE are more or less morons due to this one incident; I'm saying that decades of experience with datacentres that (if you believe their owners) are as unsinkable as the Titanic says that they simply aren't that good, and their designers are - as a group - morons.

The mistake that all the big datacentres I've seen make is that there is an opinion that there is economy of scale in power systems, and really there isn't; to get the so-called economy of scale cost efficiencies, sacrifices to system integrity are made, and availability suffers as a result.

At the data centre at my place of work (OK, it's not a big facility, it's about 400 kVA, basically a tier 2 facility with tier 3 power, but only a single genset), on the first Wednesday of every month the supply to the datacentre is pulled (from upstream distribution, not even in the datacentre building) for an hour or two, just to see what happens.

This little datacentre has suffered from the "economy of scale" problem I mentioned above; it was originally commissioned with 200 kVA UPSs with 400 kVA infrastructure, but the UPSs were upgraded to 400 kVA by paralleling another 200 kVA set. Paralleled UPSs are less reliable than single UPSs, so additional risk has been accepted for a lower cost upgrade. Only time will tell if this has a deleterious effect on availability.
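
One way to read the paralleling point is as a rough sketch. It assumes the two units fail independently and that the load has grown so that both are needed to carry it. The per-unit availability figure is hypothetical, and the model ignores any extra failure modes in the paralleling gear itself.

```python
def both_required(unit_availability):
    """Availability when two identical units must BOTH work (paralleled for capacity)."""
    return unit_availability * unit_availability


def either_sufficient(unit_availability):
    """Availability when EITHER unit alone can carry the load (paralleled for redundancy)."""
    return 1 - (1 - unit_availability) ** 2


a = 0.999  # hypothetical per-unit availability, not a measured figure

print(both_required(a))      # lower than a single unit
print(either_sufficient(a))  # higher than a single unit
```

Under those assumptions, paralleling for capacity multiplies the two availabilities together and so comes out below a single unit, while paralleling for redundancy comes out above it.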

When the power protection was needed in anger, twice (during the two big earthquakes that damaged Christchurch in New Zealand), with widespread and prolonged utility outages, the datacentre (and all the IT services) didn't miss a beat.

I'm reasonably convinced the datacentre will survive a lightning strike to the distribution; it is an anticipated possible event (even though it has never happened historically) and the protection is in place in case it should.

But even this little datacentre, which was designed by guys (and a girl!) with many years in the high availability power field, responsible for many facilities in London, still has unfixed flaws. In the early days there was a flaw (now fixed) which caused the cooling systems to shut down, and an internal power outage to some systems. Despite the fact that these engineers are really nice people, and seem really competent, and have bags of experience and history under their collective belts, they still made design errors I was seeing twenty years ago.

And that is largely the reason I call this group collectively morons. They aren't learning from history, they are to this day building systems with the same shortcomings that we discovered 20 years ago that we know will lead to outages.

/rant


Top
   
PostPosted: Fri May 06, 2011 9:54 pm 
Offline
Senior Member

Joined: Thu May 21, 2009 3:19 am
Posts: 336
john.bloom wrote:
Has anyone else noticed that the times/dates of updates on status.linode.com are backdated? I.e., the 9:18 AM EDT update showed up sometime between 3 PM and 5 PM PDT. Is there some totally innocuous reason behind this or is it just to keep up appearances?

-John


I don't understand what you're talking about. http://status.linode.com/ is hosted at http://www.sixapart.com/ on the West Coast, somewhere around Oakland. The blog post for 9:18 AM EDT showed up at 6:18 AM PDT. So the different posts showed up between 3:12 AM PDT and 6:18 AM PDT, or, adding 3 hours (the offset between EDT and PDT), between 6:12 AM EDT and 9:18 AM EDT.


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 10:23 pm 
Offline
Newbie

Joined: Fri Nov 19, 2010 5:41 pm
Posts: 4
I didn't mean to get timezones confused in this. What I wanted to know was whether other people were seeing updates to status.linode.com appear long after the time they were posted. At this point I've seen it on more than one internet connection, on Firefox, Chrome and Safari.

-John


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 10:31 pm 
Offline
Senior Member

Joined: Thu May 21, 2009 3:19 am
Posts: 336
Ah, looking at other posts, I see. Their timezone is correct (EDT/EST). They made the Fremont post at 6:18 AM EDT with the first update for 6:12 AM EDT. It was just a coincidence that they made the original post exactly 3 hours before the issue was resolved.

Looks like Typepad doesn't change the dates of posts when you go back and edit them.


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 10:52 pm 
Offline
Senior Newbie

Joined: Fri Oct 22, 2010 4:13 am
Posts: 6
Website: http://kenyonralph.com
Location: California
Jeez, would be nice if Linodes automatically booted after failures like this so they aren't down all day until I can log in to the Manager and press the Boot button.


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 10:55 pm 
Offline
Senior Member

Joined: Thu May 21, 2009 3:19 am
Posts: 336
They do. Have you enabled Lassie? Log into the Linode Manager and click on the Settings tab for your node.

It's been so long since I looked, I can't recall if it's enabled by default or not.

Description:
Quote:
Lassie is a Shutdown Watchdog that monitors your Linode and will reboot it if it powers off unexpectedly. It works by issuing a boot job when your Linode powers off without a shutdown job being responsible.
To prevent a loop, Lassie will give up if there have been more than 5 boot jobs issued within 15 minutes.
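
The behaviour described in that quote can be sketched roughly as below. This is not Linode's code: the class and method names are hypothetical, and only the "5 boot jobs within 15 minutes" limit comes from the description. It reads "more than 5" literally, so a sixth boot job is still issued and the seventh is refused.

```python
from collections import deque
from datetime import datetime, timedelta

MAX_BOOTS = 5                    # limit taken from the quoted description
WINDOW = timedelta(minutes=15)   # window taken from the quoted description


class ShutdownWatchdog:
    """Hypothetical sketch of a Lassie-style watchdog; not Linode's implementation."""

    def __init__(self):
        self.boot_times = deque()  # timestamps of boot jobs this watchdog issued

    def on_power_off(self, now, shutdown_job_responsible):
        """Return True if a boot job is issued for this power-off event."""
        if shutdown_job_responsible:
            return False  # deliberate shutdown: leave the node off
        # Forget boot jobs that have fallen out of the 15-minute window.
        while self.boot_times and now - self.boot_times[0] > WINDOW:
            self.boot_times.popleft()
        if len(self.boot_times) > MAX_BOOTS:
            return False  # more than 5 recent boot jobs: give up to avoid a loop
        self.boot_times.append(now)
        return True


watchdog = ShutdownWatchdog()
start = datetime(2011, 5, 6, 6, 0)
# A node that powers off once a minute: the first six boot jobs go out, then it gives up.
print([watchdog.on_power_off(start + timedelta(minutes=i), False) for i in range(7)])
```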


Top
   