Quote:
Why would a search engine indexer cause much disk IO except to do database updates?
Exactly, it's database updates. With a 4GB main index table and 720MB memory it's going to use some disk I/O. The spider is continuously downloading pages and updating them into the database.
I'm looking at ways to optimize this all the time, so any advice is welcome.
The particular problem I have is this:
Big main index table of 4GB with only 720MB ram makes for super slow queries. If someone searches for "web hosting in Europe" there will be a query into the database for "web", "hosting" and "Europe" asking for 100 entries sorted by score, the record is [keywordid=int] [siteid=int] [score=float] and there's an index on keyword-score for the sorting
A query for one keyword can take up to 30 seconds, not good.
My solution - I made a cache, a separate table with the record
[keywordid=int indexed] [sitedata=text]
Here the sitedata is basically text list of the top 150 keyword from the previous table.
It's super quick to query that table for searches and the search engine can answer a query for "web hosting in Europe" in 0.227 seconds instead of 37 seconds.
The problem is updating that cache, it has to be updated at the same time as I update the main index as it's basically a subset of the main index arranged for quick query.
The update means reading sitedata which is a 2KB text string, parsing it, updating it and writing the 2KB string back to the database. If ther's 400 keywords in a webpage it means read/write of 400 2KB records.
Doing this 4 times a minute generates the 6000-8000IO/sec
That's a read/write of 300*2KB*4 / 60 = 40K data/second which to me does not sound like a lot but it generates that IO rate.
By caching these updates to a file on disk and only updating to the database after 50 file updates I have been able to reduce the data rate to a third.
Somehow I have the feeling that amount of data should not generate that much IO activity but then again I don't know the database operations translates to IO.
Any MySQL guru that want to comment?
[/quote]