Linode Forum
Linode Community Forums
 Post subject: PHP Crawler Dev Help
Posted: Sat Nov 13, 2010 8:27 pm
Junior Member

Joined: Sun Nov 07, 2010 10:36 pm
Posts: 33
Website: http://www.rent-matcher.com
Hi everyone,

I was wondering if there is a good resource for PHP development help. I've done a lot of web searching, and most of what I'm finding is rather inadequate. By "resource" I mean something along the lines of hiring a coder, or even just some hand-holding while I iron out the kinks in my code, as some of what I'm doing is still beyond my current skills.

Any ideas/suggestions would be greatly appreciated. Thanks!


Posted: Sat Nov 13, 2010 8:35 pm
Senior Newbie

Joined: Sat Oct 02, 2010 3:31 pm
Posts: 7
Hello,

The PHP manual at http://www.php.net/manual/en/, together with the full function documentation available on that site, makes a great reference.

There are thousands upon thousands of resources available on the internet regarding PHP. If you cannot find what you are searching for, you are most likely not being specific enough in your search terms.

If you are a beginner, picking up a book might be an idea.

If you have specific questions, I am sure someone will be able to give a hint or two here, or in the IRC channel.

Regards,
Ovron


Posted: Sun Nov 14, 2010 9:06 am
Senior Newbie

Joined: Wed Apr 28, 2010 6:23 am
Posts: 10
Don't do it in PHP. It has horrible memory management and plenty of bugs that leak memory. If you want to run long-running processes (like crawlers), PHP is your enemy.


Posted: Sun Nov 14, 2010 11:31 am
Senior Newbie

Joined: Sat Oct 02, 2010 3:31 pm
Posts: 7
mst wrote:
Don't do it in PHP. It has horrible memory management and plenty of bugs that leak memory. If you want to run long-running processes (like crawlers), PHP is your enemy.


{{citation needed}}


Posted: Sun Nov 14, 2010 11:39 am
Senior Newbie

Joined: Wed Apr 28, 2010 6:23 am
Posts: 10
Ovron wrote:
{{citation needed}}


Years of experience and specialization in screen scraping. PHP is excellent for web applications, shell scripts and practically anything that doesn't run for a long time or require multithreading. I have written crawlers in PHP several times. The results are always horrible, and if you can't afford the performance hit of restarting the process (e.g. when batch processing the contents of 250k+ URLs), PHP isn't your friend.


Posted: Sun Nov 14, 2010 1:56 pm
Junior Member

Joined: Sun Nov 07, 2010 10:36 pm
Posts: 33
Website: http://www.rent-matcher.com
mst wrote:
Ovron wrote:
{{citation needed}}


Years of experience and specialization in screen scraping. PHP is excellent for web applications, shell scripts and practically anything that doesn't run for a long time or require multithreading. I have written crawlers in PHP several times. The results are always horrible, and if you can't afford the performance hit of restarting the process (e.g. when batch processing the contents of 250k+ URLs), PHP isn't your friend.


First of all, thanks for the helpful response. Secondly, what would you recommend doing it in instead of PHP? I'm not attached to the idea, but I was hoping to keep it in PHP for the simple reason that I'd like to be able to initiate a crawl from the admin section of my site. Thanks again!


Posted: Sun Nov 14, 2010 4:46 pm
Senior Member

Joined: Tue Mar 17, 2009 5:11 am
Posts: 129
Location: UK
As you're running a VPS, I'd personally use Python:

http://www.example-code.com/python/spid ... rawler.asp


Posted: Mon Nov 15, 2010 6:51 pm
Senior Newbie

Joined: Mon Nov 15, 2010 6:30 pm
Posts: 18
Website: http://www.michaelhart.me/
mst wrote:
Don't do it in PHP. It has horrible memory management and plenty of bugs that leak memory. If you want to run long-running processes (like crawlers), PHP is your enemy.


While this is true, I think the biggest failing here is assuming that a crawler needs to be a "long-running process." I would suggest writing a scheduler script, using a database like MySQL to hold a queue, and running many short-lived PHP tasks. The result? Better resource usage, and each task need only handle one small job at a time. The aggregate data can be stored in another database. There's very little reason a crawler would need cross-crawl data, and just about any reason I can think of can be handled with a queuing database. A very simple structure would be:

queueUid (primary key) | queuePriority (allows for more complex queuing) | queueData (serialized array; can include instructions, cookie data, referrer, or anything else you would want to pass on)
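A minimal sketch of that queue layout, for illustration only. It is written in Python with SQLite standing in for MySQL so that it runs without a database server. The column names come from the structure above. The function names, the use of JSON as the serialization format, and the rule that a higher queuePriority is served first are all assumptions.

```python
import json
import sqlite3


def open_queue(path=":memory:"):
    """Create the queue table (SQLite here as a stand-in for MySQL)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " queueUid INTEGER PRIMARY KEY AUTOINCREMENT,"
        " queuePriority INTEGER NOT NULL DEFAULT 0,"
        " queueData TEXT NOT NULL)"
    )
    return db


def enqueue(db, data, priority=0):
    """Serialize a task (a dict: URL, cookies, referrer, ...) and queue it."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO queue (queuePriority, queueData) VALUES (?, ?)",
            (priority, json.dumps(data)),
        )


def claim_next(db):
    """Remove and return the next task, or None if the queue is empty.

    Assumption: higher queuePriority first, oldest first within a priority.
    The DELETE row count tells us whether this worker won the task; if
    another worker deleted it first, we simply try the next row.
    """
    while True:
        row = db.execute(
            "SELECT queueUid, queueData FROM queue"
            " ORDER BY queuePriority DESC, queueUid ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        uid, payload = row
        with db:
            deleted = db.execute(
                "DELETE FROM queue WHERE queueUid = ?", (uid,)
            ).rowcount
        if deleted == 1:
            return json.loads(payload)
```

Each short-lived task would call something like claim_next() once, fetch that one URL, enqueue any newly discovered links, and exit. An admin page could start a crawl simply by enqueueing the first URL.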

This, imo, is far better than running a single long-lived process in Python. For one, parallelism is far easier: you can very easily control how many tasks run at once. That is simpler than multi-threading a single long-running process in any language.
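As a rough illustration of controlling the number of concurrent tasks, here is a sketch of a scheduler loop in Python that starts each task as its own short-lived process and never runs more than a fixed number at once. The function name run_workers and its parameters are made up for this example. In a real setup each command might be something like a PHP CLI script that handles one queue entry, which is an assumption and not something specified in this thread.

```python
import subprocess
import sys
import time


def run_workers(commands, max_workers=4, poll_interval=0.05):
    """Run each command as a separate process, at most max_workers at a time.

    Returns the exit codes in the same order as the commands.
    """
    pending = list(enumerate(commands))
    running = {}  # command index -> Popen object
    exit_codes = [None] * len(commands)

    while pending or running:
        # Start new workers until the cap is reached.
        while pending and len(running) < max_workers:
            index, cmd = pending.pop(0)
            running[index] = subprocess.Popen(cmd)

        # Collect workers that have finished.
        for index, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                exit_codes[index] = code
                del running[index]

        if running:
            time.sleep(poll_interval)

    return exit_codes


if __name__ == "__main__":
    # Demo: three trivial "tasks" that each just exit successfully.
    demo = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print(run_workers(demo, max_workers=2))
```

Because every task is a fresh process, any memory it leaks is returned to the system when it exits, which sidesteps the long-running-process concern raised earlier in the thread.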


Posted: Tue Nov 16, 2010 7:15 am
Senior Member

Joined: Mon Dec 07, 2009 6:46 am
Posts: 331
Code:
import os

print("Hello from process A")

# os.fork() returns 0 in the child and the child's PID in the parent
if not os.fork():
    print("Hello from process B")


How much easier does it get?


Posted: Tue Nov 16, 2010 11:16 am
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Azathoth wrote:
Code:
import os

print("Hello from process A")

# os.fork() returns 0 in the child and the child's PID in the parent
if not os.fork():
    print("Hello from process B")


How much easier does it get?


Code:
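<?php
$pid = pcntl_fork();  // needs the pcntl extension; returns -1 on failure
if ($pid === -1)
    die("fork failed\n");
elseif ($pid)
    echo "Hello from process A\n";  // parent: $pid is the child's PID
else
    echo "Hello from process B\n";  // child: $pid is 0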


PHP


Posted: Tue Nov 23, 2010 5:57 am
Newbie

Joined: Tue Nov 23, 2010 5:44 am
Posts: 4
Hello jefe78,

Although mst is right to some extent, I would still suggest you stick with something easy and comfortable. If you have some experience with PHP, stick with it. It's easier to code in something you already know and later port it to a different language for performance.

I'm working on a couple of PHP projects myself. If you get stuck somewhere, just shoot me a message and I will try my best to guide you.

Take Care. :)

