Linode Forum
Linode Community Forums
 Post subject: Rampant Web Spider
Posted: Fri Sep 21, 2007 12:00 pm
Junior Member

Joined: Thu Jun 16, 2005 12:28 pm
Posts: 33
Has anyone else been getting massive numbers of connections from a web spider called Twiceler? I'm getting hit literally thousands of times per "session", and judging by my logs it's been happening for months. Within each of its fits, every request comes from the same IP, but the IP is hardly ever the same from one fit to the next.

They're not actually loading pages, though; the bots, or at least certain ones, get hung up on the 302 my site sends to redirect to the www. version of the URL. Thank goodness they're not actually loading the root of my website that many times, or I'm pretty sure it'd be unusable. And I came to the conclusion today that this probably means they can't even fetch robots.txt (since they can't get past the redirect), because Twiceler has been blocked there for a while now. What's odd is that I've seen certain Twiceler bots crawl my site properly.

I wrote to the company before, but nothing's been done. They asked for my log files despite me telling them exactly what the problem was, with lots of info on attempts and IPs. The log file was full of nearly identical lines for identical attempts, sometimes even with the same timestamp since it happens so quickly (as I explained and showed them), but I sent them the huge log of thousands of lines anyway.

I even checked Google a moment ago, and it appears I'm not the only one getting hit by this thing.

How annoying is it that people can't control their bots, and don't even pull them down despite knowing they have a problem? I'd bet this happens to a lot of folks, many without even realizing it, since a lot of websites have redirections in place. I really don't know what to do about it short of banning the dozens of IPs I've seen so far, or telling them to add me to a block list the guy mentioned. But that's not a solution, especially if it's still affecting others.

I wrote them again today because I'm pretty much growing tired of it (over 4000 attempts to crawl my site cluttering up my logs when I got up this morning), but I won't hold my breath after all the months that've gone by seeing the problem.


Last edited by FyberOptic on Fri Sep 21, 2007 2:47 pm, edited 1 time in total.

Posted: Fri Sep 21, 2007 12:46 pm
Senior Member

Joined: Thu Jun 21, 2007 7:13 pm
Posts: 100
Website: http://neo101.org
Can you make everything redirect but the robots.txt file? Maybe that would solve your problem for bots that cannot understand redirects.
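
A rough sketch of what that could look like, assuming the www redirect is done with mod_rewrite in the top-level .htaccess (the thread doesn't say how it's actually set up) and using example.com as a stand-in for the real domain:

```apache
RewriteEngine On

# Serve robots.txt as-is on every hostname (no redirect), so crawlers that
# do not follow a 302 can still read it.
RewriteRule ^robots\.txt$ - [L]

# Everything else: send bare-domain requests to the www host.
# "example.com" is a placeholder, not the poster's actual domain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=302,L]
```

With something like this in place, a request for /robots.txt on the bare domain should get the file directly instead of a 302, while every other path still redirects.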


Posted: Fri Sep 21, 2007 2:46 pm
Junior Member

Joined: Thu Jun 16, 2005 12:28 pm
Posts: 33
harmone wrote:
Can you make everything redirect but the robots.txt file? Maybe that would solve your problem for bots that cannot understand redirects.


That's a good idea!


Though I just got an email back from them, and they said that the IP from this morning, and many of the others in the log file I sent before, weren't theirs. They also put a list of IPs on their site, and they all resolve to *.cuill.com. So that's odd; according to them, it seems many bots are masquerading as theirs.

If that's the case, I almost feel bad for thinking poorly of them! But now I'm left to try and just ban all those other IPs I guess, since they're fakes, and probably the same fakes causing others problems too.


EDIT: Hmm, checking Google, it seems other people having problems are getting them from the real bots. So I really dunno what to think anymore.


Posted: Sat Sep 22, 2007 6:47 am
Senior Member

Joined: Fri Feb 18, 2005 4:09 pm
Posts: 594
My current top-level .htaccess:

# SetEnvIfNoCase patterns are unanchored, case-insensitive regexes,
# so a plain substring of the User-Agent is enough.
SetEnvIfNoCase User-Agent "noxtrumbot" spambot=1
SetEnvIfNoCase User-Agent "Indy Library" spambot=1
SetEnvIfNoCase User-Agent "Zeus" spambot=1
SetEnvIfNoCase User-Agent "linko" spambot=1
SetEnvIfNoCase User-Agent "imagefetch" spambot=1
SetEnvIfNoCase User-Agent "urniti" spambot=1
SetEnvIfNoCase User-Agent "kuloko-bot" spambot=1
SetEnvIfNoCase User-Agent "nameprotect" spambot=1
SetEnvIfNoCase User-Agent "grub-client" spambot=1
SetEnvIfNoCase User-Agent "WebCopier" spambot=1
SetEnvIfNoCase User-Agent "Zyborg" spambot=1
SetEnvIfNoCase User-Agent "WebZIP" spambot=1
SetEnvIfNoCase User-Agent "Downloader" spambot=1
SetEnvIfNoCase User-Agent "Ninja" spambot=1
SetEnvIfNoCase User-Agent "OmniExplorer_Bot" spambot=1
SetEnvIfNoCase User-Agent "omni-explorer" spambot=1
SetEnvIfNoCase User-Agent "NG/2\.0" spambot=1
SetEnvIfNoCase User-Agent "WebStripper" spambot=1
# One entry covers both "mafin" and "MAFin" because matching ignores case.
SetEnvIfNoCase User-Agent "mafin" spambot=1
SetEnvIfNoCase User-Agent "Snapbot" spambot=1
SetEnvIfNoCase User-Agent "QihooBot" spambot=1
# Likewise covers both "Baiduspider" and "baiduspider".
SetEnvIfNoCase User-Agent "Baiduspider" spambot=1
SetEnvIfNoCase User-Agent "iaskspider" spambot=1
SetEnvIfNoCase User-Agent "Scanner" spambot=1
SetEnvIfNoCase User-Agent "IRLbot" spambot=1
SetEnvIfNoCase User-Agent "HTTrack" spambot=1
SetEnvIfNoCase User-Agent "MSNPTC" spambot=1

SetEnvIfNoCase Referer "www\.addresses\.com" spambot=1
SetEnvIfNoCase Referer "www\.bwdow\.com" spambot=1

<Limit GET POST PUT>
Order allow,deny
Deny from 207.210.101.49
Deny from 210.82.118.14
Deny from 208.77.96.98
Deny from 82.99.30
Deny from 207.210.101.4
Deny from 64.79.219.5
Deny from 193.1.100.110
Deny from 86.95.251.198
Deny from 87.210.41.139
Deny from 164.100.111.69
Deny from 81.168.228.218
Deny from 200.73.70.195
Deny from 66.16.63.44
Deny from 88.45.219.250
Deny from 194.7.161.130
Deny from 65.112.42.83
Deny from 209.191.123.34
Deny from 212.241.248.10
Deny from 88.151.114.33
Deny from 65.88.178.10
Deny from 66.195.77.130
Deny from 194.27.13.195
Deny from 141.11.234.60
Deny from 65.19.150
Deny from 6.234.139
Deny from 38.99.203.110
Deny from 204.9.204.202
Deny from 60.28.17.43
Deny from 210.173.180.145
Deny from 64.27.31.205
Deny from 137.82.84.97
Deny from 88.151.114.37
Deny from env=spambot
Allow from all
</Limit>
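
For the Twiceler case discussed earlier in the thread, the same User-Agent approach could be extended with one more line. This is only a sketch: it assumes the bots really do send "Twiceler" in their User-Agent header, and it would turn away the genuine crawler along with any impostors.

```apache
# Flag any client whose User-Agent contains "Twiceler" (case-insensitive),
# so the existing "Deny from env=spambot" rule refuses it.
SetEnvIfNoCase User-Agent "Twiceler" spambot=1
```

Matching on the User-Agent avoids having to chase each new IP, at the cost of also blocking legitimate Twiceler visits.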

