Linode Forum
Linode Community Forums
Posted: Sun Jun 19, 2005 11:48 am
Linode Staff

Joined: Fri Oct 17, 2003 12:38 am
Posts: 287
Location: Dr Wierd's Lab, South Jersey Shore
This morning hosts 39-43 went down due to a power issue. ThePlanet hooked these hosts to a power strip rather than the remote power unit they were connected to, causing a breaker to blow. This caused hosts 27-43 to lose power, as well as the uplink switch used by our other cabinets at ThePlanet. During the power outage, all Linodes at ThePlanet were unreachable. Hosts 27-43 have been restored, and Linodes on these hosts should be booted or finishing booting, with the exception of hosts 35 and 37. I am still waiting on these two hosts and will post another update regarding them.

Michael


Posted: Sun Jun 19, 2005 12:05 pm
Linode Staff

Joined: Fri Oct 17, 2003 12:38 am
Posts: 287
Location: Dr Wierd's Lab, South Jersey Shore
Linodes on hosts 35 and 37 should now be coming up.

Michael


Posted: Sun Jun 19, 2005 4:21 pm
Senior Newbie

Joined: Mon Oct 27, 2003 5:33 pm
Posts: 17
This needs to stop happening. These power outages are far too frequent.

-jbl


Posted: Sun Jun 19, 2005 7:53 pm
Junior Member

Joined: Fri Mar 18, 2005 11:04 pm
Posts: 32
AOL: surferdude18213
Location: the ssh window
My TP node on host4 wasn't affected... I guess I'm just lucky :)

_________________
-- Surferdude


Posted: Sun Jun 19, 2005 10:07 pm
Senior Newbie

Joined: Sat Feb 12, 2005 6:33 pm
Posts: 6
I am on host 42, and while my CP shows that the Linode is running, my websites are not available and I cannot connect through SSH either. When I log in through Lish, I get the following errors upon rebooting:

Your system appears to have shut down uncleanly
Press Y within 1 seconds to force file system integrity check...
Checking root filesystem
set_thread_area failed when setting up thread-local storage
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/ubda

This looks like it happened as a result of the power failure. Any ideas about how I can bring my Linode back to life?


Posted: Sun Jun 19, 2005 10:46 pm
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
Yeah, I'm working on it per your ticket request .. I'll update it in a few.

-Chris


Posted: Sun Jun 19, 2005 11:14 pm
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
Just wanted to follow up on the power outage.

What happened was that a breaker blew on one of our remote power control (RPC) units. This same unit's breaker blew some time ago, and I moved a host off of it to reduce the amperage load on that RPC. That should have solved the problem. What I suspect is that, even with the reduced load, this unit has a touchy breaker, and so it blew again. We could have easily done something about this -- either by moving one or two hosts off this unit onto another on the same circuit, or by powering down one of the currently unused hosts on this circuit until we could move it. But here's the part where ThePlanet made things worse:

We filed a ticket with TP to have them reset the RPC unit. They power cycled the RPC unit, and when the machines didn't come back online, they disconnected the power plugs from the RPC and plugged them into the other two 20A circuits we have in that cabinet. Those two 20A circuits are already loaded with sets of hosts. Can you guess what happened next? Both of the remaining 20A breakers tripped (theirs, not ours). Their tech didn't ask us before changing our hardware configuration, something I have a MAJOR issue with, nor did she know NOT to use already loaded circuits.
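To make the overload concrete, here is a rough sketch of the arithmetic. Only the 20A circuit rating comes from the post above; the per-host current draw and the host counts are made-up figures for illustration, since no actual amperage numbers were given.

```python
BREAKER_RATING_AMPS = 20.0  # from the post: each remaining circuit is 20A


def circuit_load(host_draws_amps):
    """Total current drawn by all hosts plugged into one circuit."""
    return sum(host_draws_amps)


def would_trip(existing_draws, added_draws, rating=BREAKER_RATING_AMPS):
    """True if adding hosts pushes the circuit past its breaker rating."""
    return circuit_load(existing_draws) + circuit_load(added_draws) > rating


# Hypothetical figures, not from the thread:
existing = [2.5] * 6  # six hosts already on the circuit -> 15.0A
moved = [2.5] * 3     # three hosts re-plugged from the RPC -> 7.5A more

print(would_trip(existing, []))     # circuit as it was before the re-plug
print(would_trip(existing, moved))  # circuit after the extra hosts were added
```

With these assumed numbers the circuit is fine on its own at 15A and goes over the 20A rating once the extra hosts are added, which is the failure described above.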

This cabinet has the main switch that feeds our other two cabinets, which is why those two cabinets were affected. I'm going to set up STP (Spanning Tree Protocol) in a loop, with each feed from ThePlanet's HSRP setup going to two switches, so that if one cabinet goes down, network connectivity for the other two is not affected.
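The reasoning behind the loop can be sketched as a simple reachability check. This is only an illustration under assumptions: the node names (`uplink`, `cab1`, `cab2`, `cab3`) and the exact link layout are hypothetical, and the sketch does not model STP itself (which blocks redundant links to prevent loops and re-enables them on failure). It only shows why a second path matters.

```python
from collections import deque

CABINETS = {"cab1", "cab2", "cab3"}  # hypothetical names for the three cabinets

# Current layout as described: one main switch (cab1) feeds the other two.
star = [("uplink", "cab1"), ("cab1", "cab2"), ("cab1", "cab3")]

# Assumed planned layout: uplink feeds two switches, switches form a loop.
ring = [("uplink", "cab1"), ("uplink", "cab2"),
        ("cab1", "cab2"), ("cab2", "cab3"), ("cab3", "cab1")]


def reachable(links, start, down=frozenset()):
    """Breadth-first search from start, skipping any node listed in down."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start in down:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def survivors(links, failed):
    """Cabinets that still reach the uplink when one cabinet switch is down."""
    return reachable(links, "uplink", down={failed}) & CABINETS


print(sorted(survivors(star, "cab1")))  # []
print(sorted(survivors(ring, "cab1")))  # ['cab2', 'cab3']
```

In the star layout, losing the main cabinet cuts off everything behind it; in the assumed loop layout, any single cabinet can fail and the other two still reach the uplink.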

I have a ticket in with ThePlanet to make sure they don't mess with our hardware configuration without being instructed to do so, something I think should be their policy in the first place. I'm pretty upset with ThePlanet for dropping the ball like this.

Sorry this had to happen, but .. live and learn. We'll do our best to prevent something like this from happening again.

-Chris


Posted: Sun Jun 19, 2005 11:15 pm
Linode Staff

Joined: Tue Apr 15, 2003 6:24 pm
Posts: 3090
Website: http://www.linode.com/
Location: Galloway, NJ
Forgot to mention, the twitchy RPC unit has been bypassed, so it's out of the picture for now.

-Chris


Posted: Mon Jun 20, 2005 6:07 am
Senior Newbie

Joined: Wed Jul 21, 2004 10:49 am
Posts: 5
surferdude wrote:
My TP node on host4 wasn't affected... I guess I'm just lucky :)


I agree... Last week this happened twice in the span of 3 days or so. It is also impossible to find out what is going on because the linode.com main site was down as well.

Perhaps some sort of a secondary static backup page should be set up that we can all go to. The page shouldn't be hosted on linode.


Posted: Mon Jun 20, 2005 9:14 am
Junior Member

Joined: Thu Apr 21, 2005 12:41 pm
Posts: 43
Website: http://www.jamesl.info
WLM: sipherx@gmail.com
Yahoo Messenger: sipherx598
AOL: sipherx1023
Location: Florida
I know there is a way to run MySQL with a master and a slave server, so that whenever the master db is updated, the slave is updated automatically. So what you could do (if you use MySQL) is have Linode's entire MySQL db updating a slave db on an entirely different network and server, and whenever the linode.com site goes down, just change the DNS to point to the new server, which would be an exact, up-to-date copy. From there, you could set restrictions on this site so people can't log in to the Members section, since their Linode may or may not be down, and things like that. Just a suggestion; I could be way off, and you guys may not even use MySQL.

_________________
James Lenhart.


Posted: Mon Jun 20, 2005 9:33 am
Senior Newbie

Joined: Fri Apr 01, 2005 11:24 am
Posts: 8
Sipherx wrote:
I know there is a way to run MySQL with a master and a slave server, so that whenever the master db is updated, the slave is updated automatically. So what you could do (if you use MySQL) is have Linode's entire MySQL db updating a slave db on an entirely different network and server, and whenever the linode.com site goes down, just change the DNS to point to the new server, which would be an exact, up-to-date copy. From there, you could set restrictions on this site so people can't log in to the Members section, since their Linode may or may not be down, and things like that. Just a suggestion; I could be way off, and you guys may not even use MySQL.


So what you're saying, in more words, is:
1. Become a customer of someone other than Linode for your "backup."
2. Most linode outages are long enough to exceed your DNS TTL.

As for #1, I sure don't want to bring in another provider until it's absolutely necessary. It may be approaching that.

As for #2, my DNS TTLs are mostly around one day, in some cases longer. My DNS zones are quite static. If there were a Linode outage longer than a day, your approach would probably not be wrong -- I'd be migrating my MySQL servers to another provider for good.
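The TTL point can be put as a quick back-of-the-envelope check. This is a simplified sketch, not a model of real resolver behavior: it assumes the worst case where a resolver cached the old record just before the change and holds it for the full TTL. The function name and the example outage lengths are made up for illustration; only the one-day TTL comes from the paragraph above.

```python
ONE_DAY = 24 * 60 * 60  # seconds; the "around one day" TTL mentioned above


def failover_helps(outage_seconds, ttl_seconds, switch_delay_seconds=0):
    """Worst case: a client keeps the old address for a full TTL after the
    record is changed, so DNS failover only pays off for longer outages."""
    worst_case_propagation = switch_delay_seconds + ttl_seconds
    return outage_seconds > worst_case_propagation


three_hours = 3 * 60 * 60  # hypothetical outage length

print(failover_helps(three_hours, ONE_DAY))  # one-day TTL
print(failover_helps(three_hours, 300))      # hypothetical 5-minute TTL
```

Under these assumptions a one-day TTL makes DNS failover pointless for an outage of a few hours, while a short TTL would let most clients reach the backup well before the outage ends.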

Neither addresses the fact that Linode seems to be using a lot of equipment that shouldn't be in a for-profit environment. A *known* flaky power unit shouldn't be left in service. ATA/SATA drives might not be the best choice for these servers if: 1. they have a documented higher failure rate than SCSI/FC, and 2. the RAID controllers used in Linode servers do not support hotswap replacements or rebuilding. (A failed disk led to two recent outages because of the fix procedure provided by Linode.)


Posted: Mon Jun 20, 2005 12:24 pm
Junior Member

Joined: Fri Mar 18, 2005 11:04 pm
Posts: 32
AOL: surferdude18213
Location: the ssh window
ThePlanet has been going downhill lately from what I have heard. It almost seems like they are under new management.

_________________
-- Surferdude


Posted: Mon Jun 20, 2005 1:27 pm
Senior Newbie

Joined: Sat Feb 12, 2005 6:33 pm
Posts: 6
Just an update to mention that my problem was due to some libraries that were preventing a clean boot of my Linode, and it has been resolved now.

Thanks again, Chris.


Posted: Mon Jun 20, 2005 1:30 pm
Junior Member

Joined: Thu Apr 21, 2005 12:41 pm
Posts: 43
Website: http://www.jamesl.info
WLM: sipherx@gmail.com
Yahoo Messenger: sipherx598
AOL: sipherx1023
Location: Florida
Quote:
So what you're saying, in more words, is:
1. Become a customer of someone other than Linode for your "backup."
2. Most linode outages are long enough to exceed your DNS TTL.

As for #1, I sure don't want to bring in another provider until it's absolutely necessary. It may be approaching that.

As for #2, my DNS TTLs are mostly around one day, in some cases longer. My DNS zones are quite static. If there were a Linode outage longer than a day, your approach would probably not be wrong -- I'd be migrating my MySQL servers to another provider for good.

Neither addresses the fact that Linode seems to be using a lot of equipment that shouldn't be in a for-profit environment. A *known* flaky power unit shouldn't be left in service. ATA/SATA drives might not be the best choice for these servers if: 1. they have a documented higher failure rate than SCSI/FC, and 2. the RAID controllers used in Linode servers do not support hotswap replacements or rebuilding. (A failed disk led to two recent outages because of the fix procedure provided by Linode.)


Dude, read my post. I wasn't talking about you guys using this method as an alternative; I was talking about linode.com's website. And if you read above, someone had suggested that Linode set up a static site not hosted on a Linode. Read shit before you start flaming people.

_________________
James Lenhart.


Posted: Sun Jun 26, 2005 3:44 am
Senior Member

Joined: Sat Jun 05, 2004 12:49 am
Posts: 333
lol, almost makes me wish I took that job at ThePlanet anyways :( Then I could be beating on caker's boxes for him lol

