Linode Community Forums
Posted: Thu Oct 28, 2010 11:26 am
Senior Newbie

Joined: Fri Oct 15, 2010 8:09 pm
Posts: 6
Hi there,

A strange issue is (seemingly) randomly occurring. After running just fine in the HA configuration (from here) for a while, the servers simply stop talking to each other.

If ha1 is the master, ha2 starts trying to take over the configured services. It tries to bring up the "floating" IP which it manages (and thus causes all my sites to go down), but it can't mount the DRBD drive as it's already mounted on ha1.

On ha1, crm_mon shows that ha1 is online and ha2 is OFFLINE.
On ha2, crm_mon shows that ha2 is online and ha1 is OFFLINE.

I'm not sure which logs I should be looking at, so if anyone could help, that'd be appreciated.

Rebooting ha2 seems to fix things, so I'm guessing it might be something to do with that server... I have not tried it the other way around yet.


Posted: Thu Nov 04, 2010 8:51 am
Senior Newbie

Joined: Fri Oct 15, 2010 8:09 pm
Posts: 6
This happened again this morning. I asked Linode support, and while it's of course not something they can really help with, they did point me to my LISH console, where there were errors about DRBD split brain. However, I think that may be a result of whatever is going wrong in the cluster management rather than the cause.

I followed the instructions (here) to manually fix the DRBD split brain, but crm_mon on each node still showed the other node as offline. After a while, ha2 somehow took over completely and started all its services, so I had to reboot it from the Linode console to fix things, and it's fine again.

I could really do with some help on this, as my Google-fu is not bringing up many useful leads.


Posted: Fri Nov 05, 2010 1:22 pm
Senior Member

Joined: Sun Oct 30, 2005 7:52 pm
Posts: 97
While I don't have much experience with the HA aspects of things, my first guess would be the communications between the two nodes (heartbeat). One node thinks the other is no longer available and is doing its configured job of taking over.
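
To make that guess a bit more concrete, here is a rough Python sketch of the symptom you describe. It is not anything Heartbeat or Pacemaker provides: the function name classify_views and the dict layout are made up for illustration, and you would have to fill in the dict by hand from what crm_mon shows on each node. It only checks whether any node reports a peer as offline while that peer reports itself as online, which is what a lost link between the nodes would look like.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout: reporting node -> {node name: "online" or "offline"}.
# Nothing here talks to a real cluster; the states are typed in by hand.
Views = Dict[str, Dict[str, str]]


def classify_views(views: Views) -> Tuple[str, List[Tuple[str, str]]]:
    """Return ("consistent", []) or ("partitioned", [(reporter, peer), ...]).

    A (reporter, peer) pair is listed when the reporter sees the peer as
    offline although the peer considers itself online.
    """
    disagreements = []
    for reporter, seen in views.items():
        for node, state in seen.items():
            # Skip the reporter's opinion of itself and nodes we have no view from.
            if node == reporter or node not in views:
                continue
            peer_self_state = views[node].get(node)
            if state == "offline" and peer_self_state == "online":
                disagreements.append((reporter, node))
    if not disagreements:
        return "consistent", []
    return "partitioned", sorted(disagreements)
```

Feeding it the situation from the first post (ha1 sees ha2 offline, ha2 sees ha1 offline, both see themselves online) gives "partitioned" with both pairs listed. That would point at the link between the nodes rather than at either node actually being down.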

Hopefully someone else here will chime in.

--
Travis


Posted: Wed Dec 01, 2010 7:29 am
Senior Newbie

Joined: Fri Oct 15, 2010 8:09 pm
Posts: 6
Yeah that sounds about right, otherbbs, that's about all I can gather too.

I opened a Linode ticket. They can't really do much, but they did point me in the direction of the LISH shell, which had an error about DRBD split brain, as I mentioned in my previous post. Unfortunately, as I suspected, that's not the cause of the problem, just a consequence of it. The underlying error is something to do with the cluster management stuff, which I'm clueless about.

At the moment I've set ha2 to standby (crm node standby ha2), which has stopped it from taking down all my sites, but the error still exists. It's worth noting that because ha2 hasn't tried taking over, the DRBD split brain situation hasn't arisen, hence my logic that it's not the root of the problem.

Even stranger is that yesterday ha2 was standby + OFFLINE, but today (without restarting ha2), it is just standby (therefore online).

I don't even know what to look for in the logs... I'm considering just dropping the second Linode completely and going back to a single Linode. This hassle just isn't worth the extra money I'm spending...

