Linode Forum
Linode Community Forums
 Post subject: High CPU by web crawler
Posted: Wed Aug 13, 2014 10:26 am
Senior Member

Joined: Fri May 02, 2014 5:20 pm
Posts: 58
Website: http://www.sturmkrieg.ru
Location: Richmond
I'm running a web crawler that is now using nearly all of the available CPU.

https://github.com/KrasnayaSecurity/Wik ... crawler.py

Earlier versions did not use this much CPU, and I think the increase comes from the extra features I added to prevent recrawling the same pages. Memory usage is better now that redundant URLs are removed from the crawl list. Before that change, I projected that the crawler could not run for more than a day, because memory would fill up with the URLs in the crawl list.
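
To illustrate the kind of de-duplication described above, here is a minimal sketch. It is not the code from the linked crawler.py; the names `Frontier` and `normalize` are hypothetical, and stripping the `#fragment` is just one assumed normalization rule. The point it shows is that membership checks against a Python list scan the whole list on every lookup, which gets more expensive as the list grows, while a set keeps each check at roughly constant cost.

```python
from collections import deque
from urllib.parse import urldefrag


def normalize(url):
    """Drop the #fragment so the same page is not queued twice (assumed rule)."""
    return urldefrag(url)[0]


class Frontier:
    """Hypothetical crawl queue that remembers every URL it has accepted."""

    def __init__(self):
        self._queue = deque()  # URLs waiting to be fetched, in FIFO order
        self._seen = set()     # every URL ever queued; O(1) average lookup

    def add(self, url):
        """Queue url unless it was queued before; return True if it was added."""
        url = normalize(url)
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pop(self):
        """Return the next URL to fetch, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        # Number of URLs still waiting, not the number ever seen.
        return len(self._queue)
```

With this layout, the crawl loop would call `add()` for every link it extracts and `pop()` for the next page to fetch. The set still grows with the number of distinct URLs, so it trades some memory for the cheaper lookups.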

If multiple crawlers or other important processes were running at the same time, would the CPU time simply be shared with them, making the crawlers run slightly slower, or would it cause a problem?

[Image]

EDIT:

Memory usage seems to be starting to plateau, and the crawler seems to be slowing down, as if it has already fetched almost all of the pages on that site.

