Linode Forum
Linode Community Forums
Posted: Sun Apr 13, 2014 5:12 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
For those of you wondering why you got unexpected "Host initiated restart" notices overnight, here's why:
http://status.linode.com/

One of our servers is still down. I can ping it but cannot connect to it via LISH, ssh, etc. Longview shows no recent activity, but the graphs in the Linode Manager show a little CPU activity (including a spike from a cron job).

Anyone else having issues?


Posted: Sun Apr 13, 2014 7:11 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
Update:

I can connect via LISH, but nothing else.
- all normal services are running (including ssh, ftp, http, etc).
- I've turned off iptables in case it was a firewall issue.
- I've rebooted.
- I've recreated /etc/resolv.conf and restarted the networking service.

Any ideas?

I've tried connecting to it from one of our test VPSs located in the same data center. The test VPS is running normally.
- the test VPS cannot connect to the problem VPS via LISH, ssh, ftp, http, etc.
- the problem VPS cannot connect to the test VPS via LISH, ssh, ftp, http, etc.
- the problem VPS can ping domains not on the problem server and get the response.

I can use wget on the problem VPS to get webpages from sites located on the problem VPS, but not from any other server.
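
The service checks above (ssh, ftp, http from another machine) can be scripted so they are easy to repeat after each change. A minimal sketch in Python, assuming the standard ports 22, 21 and 80; the helper names `probe_port` and `probe_ports` are made up for illustration:

```python
import socket

# Standard ports for the services mentioned above; adjust if yours differ.
DEFAULT_PORTS = {"ssh": 22, "ftp": 21, "http": 80}


def probe_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def probe_ports(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Probe each named service and return {name: reachable}."""
    return {name: probe_port(host, port, timeout) for name, port in ports.items()}
```

Running `probe_ports(...)` against the problem VPS from the test VPS, and again in the other direction, would show whether every TCP service fails the same way. This only tests TCP connects; it says nothing about ping (ICMP).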

Support has suggested booting into 'Rescue Mode' and performing a filesystem check. I'm currently cloning the file system and will try rescue mode.


Posted: Sun Apr 13, 2014 8:30 am
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
Hrm, could be a myriad of things. What are the contents of your network config files? What's the output of route -n?

_________________
Paid support
How to ask for help
1. Give details of your problem
2. Post any errors
3. Post relevant logs.
4. Don't hide details i.e. your domain, it just makes things harder
5. Be polite or you'll be eaten by a grue


Posted: Sun Apr 13, 2014 8:38 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
obs,
I'll check 'route -n' once the fsck is done. The 'e2fsck -f' has been running for over an hour and it's still on 'Pass 1'. It's an 82 GB file system image.

I've never run into an fsck that has taken this long. Ugh.


Posted: Sun Apr 13, 2014 8:41 am
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
That's not good. I've run fsck on 7 boxes in Newark in less time than that. What's on the box? Are there lots of files? Is the disk pretty full?

Posted: Sun Apr 13, 2014 8:46 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
It's a production web server with a few dozen websites. I think the free space on the drive was about 30%.

I'd hate to lose the hour that it's been running. Is there any way to check if the VPS is still running in recovery mode without losing the progress of the fsck (if there has been any)?


Posted: Sun Apr 13, 2014 8:49 am
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
If you're running via LISH and haven't started SSH in rescue mode then nope, you've only got one terminal session which you can access. You could try asking support if they can see what's going on.

Posted: Sun Apr 13, 2014 8:53 am
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
Or check the charts on the dashboard; fsck should show up as some nontrivial amount of disk I/O

_________________
Code:
/* TODO: need to add signature to posts */


Posted: Sun Apr 13, 2014 9:06 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
The fsck finished. I had lost the LISH connection again (it's only been lasting a few minutes at a time, and it doesn't always respond when I try to reconnect).
Here's the output of 'route -n':
Code:
[root@www ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         198.74.60.1     0.0.0.0         UG    0      0        0 eth0
0.0.0.0         66.175.213.1    0.0.0.0         UG    0      0        0 eth0
0.0.0.0         66.175.212.1    0.0.0.0         UG    0      0        0 eth0
0.0.0.0         66.175.210.1    0.0.0.0         UG    0      0        0 eth0
0.0.0.0         50.116.48.1     0.0.0.0         UG    0      0        0 eth0
50.116.48.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
66.175.210.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
66.175.212.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
66.175.213.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
198.74.60.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0


I'm not sure why 198.74.60.1 is in that list, though I assume it's our gateway at Linode (resolves to gw-li557.linode.com).
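
One way to read that table: each default gateway falls inside one of the directly connected networks listed below it, which suggests 198.74.60.1 is simply the gateway for the 198.74.60.0/24 range. A small Python sketch using the standard `ipaddress` module, with the addresses copied from the output above (the function name `subnet_for` is only for illustration):

```python
import ipaddress

# Values copied from the `route -n` output above.
GATEWAYS = ["198.74.60.1", "66.175.213.1", "66.175.212.1", "66.175.210.1", "50.116.48.1"]
CONNECTED = [
    ("50.116.48.0", "255.255.255.0"),
    ("66.175.210.0", "255.255.255.0"),
    ("66.175.212.0", "255.255.255.0"),
    ("66.175.213.0", "255.255.255.0"),
    ("169.254.0.0", "255.255.0.0"),
    ("198.74.60.0", "255.255.255.0"),
]


def subnet_for(gateway, connected=CONNECTED):
    """Return the connected network that contains gateway, or None."""
    addr = ipaddress.ip_address(gateway)
    for dest, mask in connected:
        net = ipaddress.ip_network(f"{dest}/{mask}")
        if addr in net:
            return net
    return None


# Print which connected network each default gateway belongs to.
for gw in GATEWAYS:
    print(gw, "->", subnet_for(gw))
```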


Posted: Sun Apr 13, 2014 9:08 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
hoopycat wrote:
Or check the charts on the dashboard; fsck should show up as some nontrivial amount of disk I/O

I didn't see any activity on the graphs during the 1 1/2 hours the server was in rescue mode.


Posted: Sun Apr 13, 2014 9:34 am
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
You should only have one entry starting with 0.0.0.0. What are the contents of your network config file? And what's the primary IP of the node (i.e. the one assigned to eth0)?
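
To check this on a box, the `route -n` output can be scanned for rows whose destination and genmask are both 0.0.0.0. A rough Python sketch; `default_gateways` is a made-up helper name, and the sample text is a trimmed copy of the table posted earlier in the thread:

```python
def default_gateways(route_output):
    """Return the gateways of all default routes found in `route -n` output."""
    gateways = []
    for line in route_output.splitlines():
        fields = line.split()
        # Rows have 8 columns: Destination Gateway Genmask Flags Metric Ref Use Iface
        if len(fields) != 8:
            continue
        dest, gateway, genmask, flags = fields[:4]
        if dest == "0.0.0.0" and genmask == "0.0.0.0" and "G" in flags:
            gateways.append(gateway)
    return gateways


SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         198.74.60.1     0.0.0.0         UG    0      0        0 eth0
0.0.0.0         66.175.213.1    0.0.0.0         UG    0      0        0 eth0
50.116.48.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
"""

found = default_gateways(SAMPLE)
if len(found) > 1:
    print("multiple default routes:", ", ".join(found))
```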

Posted: Sun Apr 13, 2014 9:39 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
obs,
The primary IP is 50.116.48.0. The 66.175.X.X IPs are additional IPs used for SSLs for ecommerce sites on that server.

Which config file(s) would you like to see?


James


Posted: Sun Apr 13, 2014 9:41 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
I really suspect that the problem is in Linode's network somewhere. Perhaps a piece of network gear or a server that failed due to the power failure.

We've never had any problems with this server in the past and I haven't changed the configuration on this server for quite some time.


Posted: Sun Apr 13, 2014 9:46 am
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
The network ones. I don't know what OS you're using, but if it's Ubuntu it'd be /etc/network/interfaces. I suspect you have multiple gateway lines when you should only have one; see here: https://library.linode.com/networking/c ... ian-ubuntu

The routing table should look something like this
Code:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         50.116.33.1     0.0.0.0         UG    100    0        0 eth0
50.116.33.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
50.116.37.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
50.116.38.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
50.116.39.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
173.230.133.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.128.0   0.0.0.0         255.255.128.0   U     0      0        0 eth0
198.74.52.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0


This is from a box with multiple SSL certs on it.

Posted: Sun Apr 13, 2014 10:42 am
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
To All,
Thank you for your help. This issue has been resolved by the Linode Support staff (who have graciously put up with my pestering nature while dealing with the aftermath of last night's power outage). Support has resolved a configuration issue on their end and now everything is responding correctly.

obs,
This VPS is running CentOS. I am in the planning stages of moving all the sites to Ubuntu LTS servers so I'm not going to try to figure out why my routing table seems to be a bit funky (though it may turn out to be a rabbit I chase anyway). I'm going to wait and see the reviews of 14.04 LTS before deciding whether to go with 14.04 or if I should stick with 12.04 (which I have on other VPSs).


Thanks again,
James

