Linode Community Forums
Author Message
 Post subject: BOTs attack
PostPosted: Fri Jun 25, 2010 2:22 am 
Offline
Junior Member

Joined: Sun Apr 11, 2010 7:06 am
Posts: 22
Massive attack by bots and crawlers over the last month, basically a result of the site being indexed and getting into the top 100k sites.

The bots are now pulling huge CPU load, making it impossible to browse the site.

I need a solution to ban every user agent other than MSN, Google, Yahoo, and a couple of others. I don't care whether they are valid or not; I just don't want them taking any of my resources.

robots.txt and .htaccess have both failed to limit these bad bots.

I'm told there are scripts that run in the background on Linux that can detect bad-bot behavior and automatically ban them, adding them to a bad list and thus increasing the efficiency of the site.


If you guys have ideas or such scripts, please do help.

Thanks


 Post subject:
PostPosted: Fri Jun 25, 2010 2:28 am 
Offline
Senior Newbie

Joined: Sun Dec 27, 2009 6:34 am
Posts: 10
Do you mean referral site? Also, couldn't you just use something like fail2ban, which would automatically ban that traffic if you configure it right?


 Post subject:
PostPosted: Fri Jun 25, 2010 2:48 am 
Offline
Senior Newbie

Joined: Sun Jan 24, 2010 4:21 pm
Posts: 16
Location: Herning
As fresbee says, fail2ban would be an okay start: http://edin.no-ip.com/category/tags/fail2ban

It has a bad-bot filter built in if you use Debian.
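For reference, enabling that bundled filter usually means adding a jail section; here is a sketch assuming Debian's default apache-badbots filter and log path (verify both on your system, and note the bantime/maxretry values are just suggestions):

```ini
# /etc/fail2ban/jail.local -- sketch, Debian defaults assumed
[apache-badbots]
enabled  = true
port     = http,https
filter   = apache-badbots
logpath  = /var/log/apache2/access.log
maxretry = 1
bantime  = 86400
```

The stock filter only matches a fixed list of known bad user agents, so you may need to extend its regex for the specific bots you're seeing.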


 Post subject:
PostPosted: Fri Jun 25, 2010 3:53 am 
Offline
Junior Member

Joined: Sun Apr 11, 2010 7:06 am
Posts: 22
Thanks a lot.

I am trying out fail2ban.

Hopefully it will reduce the bots/crawlers.


 Post subject:
PostPosted: Fri Jun 25, 2010 3:59 am 
Offline
Junior Member

Joined: Sun Apr 11, 2010 7:06 am
Posts: 22
/var/log/apache2/access.log is empty.

Any ideas why it would be blank?

Do I need to enable anything in apache2.conf for the log to start filling?

thanks in advance


 Post subject:
PostPosted: Fri Jun 25, 2010 4:42 am 
Offline
Senior Newbie

Joined: Sun Jan 24, 2010 4:21 pm
Posts: 16
Location: Herning
Try checking the conf files in /etc/apache2/sites-available/*

If you have used any of the Linode Library guides, your logs will be located elsewhere.
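Concretely, each vhost decides where it logs via a CustomLog directive; if that line is missing or points somewhere else, the default access.log stays empty. A hypothetical vhost sketch (the file name, ServerName, and DocumentRoot are placeholders):

```apache
# /etc/apache2/sites-available/example (hypothetical vhost)
<VirtualHost *:80>
    ServerName   example.com
    DocumentRoot /var/www/example
    # Without a CustomLog line, nothing is written to this access.log:
    CustomLog /var/log/apache2/access.log combined
    ErrorLog  /var/log/apache2/error.log
</VirtualHost>
```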


 Post subject:
PostPosted: Fri Jun 25, 2010 8:05 am 
Offline
Junior Member

Joined: Sun Apr 11, 2010 7:06 am
Posts: 22
OK folks,

fail2ban is not working.

CPU loads are continuously at elevated levels.

Any more ideas, or any powerful script to ban spiders?


 Post subject:
PostPosted: Fri Jun 25, 2010 9:08 am 
Offline
Senior Member
User avatar

Joined: Fri Dec 11, 2009 7:09 pm
Posts: 168
It's still in beta, but you could look at http://www.projecthoneypot.org/httpbl.php

_________________
--
Chris Bryant


 Post subject:
PostPosted: Fri Jun 25, 2010 10:30 am 
Offline
Junior Member

Joined: Thu Jun 03, 2010 4:44 pm
Posts: 35
You could block the IP addresses with iptables. You will have to look at your logs to find the IP addresses, though...
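As a sketch (placeholder addresses from the documentation range; run as root, and note these rules don't survive a reboot unless saved), that looks like:

```shell
# Drop all traffic from one offending IP (placeholder address)
iptables -A INPUT -s 192.0.2.10 -j DROP
# Or drop an entire Class C (/24) network
iptables -A INPUT -s 192.0.2.0/24 -j DROP
# List current INPUT rules to confirm
iptables -L INPUT -n
```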


 Post subject:
PostPosted: Fri Jun 25, 2010 11:18 am 
Offline
Senior Newbie

Joined: Sun Jan 24, 2010 4:21 pm
Posts: 16
Location: Herning
Is there any way you can post some logs of what is causing the spikes?


 Post subject:
PostPosted: Fri Jun 25, 2010 1:56 pm 
Offline
Junior Member
User avatar

Joined: Thu Apr 29, 2010 3:32 pm
Posts: 44
Website: http://devjonfos.net
Location: Oregon
If you find specific IP addresses, you can add them to your .htaccess for the affected websites:

Code:
Order Deny,Allow
Deny from xxx.xxx.xxx.xxx

And if you see multiple attempts from different hosts on the same Class C network, you can even block at the network level:

Code:
Order Deny,Allow
Deny from xxx.xxx.xxx

I do this to block IPs that are trying to break through my CAPTCHA at one of my websites.


 Post subject:
PostPosted: Fri Jun 25, 2010 4:47 pm 
Offline
Senior Newbie

Joined: Sun Dec 27, 2009 6:34 am
Posts: 10
If we can see some of the logs, or maybe a screenshot or five of htop, we can probably figure something else out.
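In the meantime, a quick way to see which crawlers dominate the access log is to count user agents directly. A minimal sketch, assuming Apache's combined log format where the user agent is the last quoted field (adjust the log path to wherever your vhost actually writes):

```python
import re
from collections import Counter

# In combined log format the user agent is the final quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

def top_user_agents(lines, n=5):
    """Return the n most common user-agent strings in combined-format log lines."""
    counts = Counter()
    for line in lines:
        m = UA_RE.search(line.rstrip())
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)

# Usage on the server (hypothetical path):
#   with open("/var/log/apache2/access.log") as f:
#       for ua, hits in top_user_agents(f):
#           print(hits, ua)
```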


 Post subject:
PostPosted: Fri Jun 25, 2010 10:50 pm 
Offline
Junior Member

Joined: Sun Apr 11, 2010 7:06 am
Posts: 22
I'm trying to install AWStats to study the spiders that are creating the problem.


While installing AWStats on Debian, the paths are not all updated in the awstats configuration file. In particular, I need help updating the following path variables, which are documented for an earlier version of Debian.

$AWSTATS_PATH='';
$AWSTATS_ICON_PATH='/usr/share/awstats/icon';
$AWSTATS_CSS_PATH='/usr/share/doc/awstats/examples/css';
$AWSTATS_CLASSES_PATH='/usr/share/doc/awstats/examples/classes';
$AWSTATS_CGI_PATH='/usr/lib/cgi-bin';
$AWSTATS_MODEL_CONFIG='/etc/awstats/awstats.model.conf'; # Used only when configure ran on linux
$AWSTATS_DIRDATA_PATH='/var/lib/awstats';


Please point me to any updated documentation on AWStats for Debian, or if anyone has it installed with these parameters set, please help.

I'm not able to see the stats through the browser, which I think is because these variables are not set correctly.


 Post subject:
PostPosted: Sat Jun 26, 2010 1:46 am 
Offline
Senior Newbie

Joined: Sun Jan 24, 2010 4:21 pm
Posts: 16
Location: Herning
http://www.debianadmin.com/apache-log-f ... ebian.html

The awstats_configure.pl script should more or less make sure that all paths correspond to what you have on your system. Did you run it?
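Roughly, the Debian sequence looks like the following; the script path below is where the Debian package has traditionally shipped it, and example.com stands in for your site's config name, so verify both before running:

```shell
# Re-run the bundled configure script so it rewrites the path variables
perl /usr/share/doc/awstats/examples/awstats_configure.pl
# Then build the stats for your site config and watch for errors
perl /usr/lib/cgi-bin/awstats.pl -config=example.com -update
```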


Top
   
 Post subject:
PostPosted: Sun Jun 27, 2010 11:45 am 
Offline
Junior Member

Joined: Sun Apr 11, 2010 7:06 am
Posts: 22
The mysqld process is a runaway: within 5 minutes of restarting, it reaches 400% CPU usage, virtually blocking everything else.
I have made all possible changes to my.cnf and apache2.conf to get a stable system, but to no avail.

I checked AWStats and found that Yahoo Slurp is creating the maximum trouble.

So I blocked Yahoo Slurp through robots.txt and .htaccess.

Slurp somehow still manages to hit my site. I even blocked its IP range 67.195.*.*, but now I find it crawling from another address:
*.crawl.yahoo.net.

Any way to block Yahoo completely off my site? It is one @#$@ hole company.
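Since Slurp identifies itself in the User-Agent header, one option beyond IP blocking is to deny by user agent in .htaccess. A sketch in Apache 2.2 syntax (assumption: this only catches crawlers that honestly send "Slurp" in their agent string; forged agents slip through):

```apache
# Tag any request whose User-Agent contains "Slurp", case-insensitively,
# then deny requests carrying that tag.
SetEnvIfNoCase User-Agent "Slurp" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

This requires mod_setenvif to be enabled, which it is by default on Debian's Apache.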


output from top command

lollipop:~# htop
-bash: htop: command not found
lollipop:~# top
top - 15:43:38 up 1 day, 2:30, 1 user, load average: 24.34, 22.14, 18.43
Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.1%us, 81.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.1%st
Mem: 1417440k total, 839028k used, 578412k free, 9284k buffers
Swap: 524280k total, 3996k used, 520284k free, 161652k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7650 mysql 18 0 483m 44m 4576 S 399 3.2 58:13.00 mysqld
7872 root 15 0 2268 1140 880 R 0 0.1 0:00.84 top
1 root 15 0 1992 568 540 S 0 0.0 0:00.00 init
2 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1
5 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1
6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/2
7 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2
8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/3
9 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/0
11 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/1
12 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/2
13 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/3
14 root 20 -5 0 0 0 S 0 0.0 0:00.00 khelper
15 root 11 -5 0 0 0 S 0 0.0 0:00.00 kthread
17 root 11 -5 0 0 0 S 0 0.0 0:00.00 xenwatch



Powered by phpBB® Forum Software © phpBB Group