Linode Forum
Linode Community Forums
 FAQFAQ    SearchSearch    MembersMembers      Register Register 
 LoginLogin [ Anonymous ] 
Post new topic  Reply to topic
Author Message
PostPosted: Wed Sep 30, 2009 6:09 am 
Offline
Senior Newbie

Joined: Sun Sep 20, 2009 11:14 am
Posts: 8
Hullo, Linoders... Linudes... Lin...something-or-other. What DO we call ourselves, anyways?

Well, to the topic at hand. Disk I/O--it's something that I've never even really thought about before running a server, and I just don't really know much about it. What is a "good" value? "Bad?" Or does it depend on what type of Linode I have? What does it affect? Does it affect other Linodes if I have a "bad" value?

Apologies for the dumb questions, it's just that Google hasn't been very helpful.


Top
   
 Post subject:
PostPosted: Wed Sep 30, 2009 7:02 am 
Offline
Senior Member

Joined: Sun Aug 02, 2009 1:32 pm
Posts: 222
Website: https://www.barkerjr.net
Location: Connecticut, USA
There's no specific answer for this, as you found on Google. Myself, I used to hover 60-80 on 2.6.18, and now hover at 15-30 on 2.6.30. But it all depends on what you're running. If Apache is busy, it'll use a lot of IO writing access_log. If you server large files, then you'll spend a lot of IO reading files from disk (if they're too big to fit in cache).


Top
   
 Post subject:
PostPosted: Wed Sep 30, 2009 10:09 am 
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
I wouldn't worry about I/O unless I was on a small plan and constantly doing 500+


Top
   
 Post subject:
PostPosted: Wed Sep 30, 2009 1:29 pm 
Offline
Senior Member

Joined: Sat Mar 28, 2009 4:23 pm
Posts: 415
Website: http://jedsmith.org/
Location: Out of his depth and job-hopping without a clue about network security fundamentals
Generally,

(let me say that again before I'm quoted authoritatively on this in the future)

Generally,

Four digits of I/O, sustained, is when you should be thinking about it. I won't say if it's good or bad, just have it on your radar if your I/O is sustained at 1,000+ on Xen hosts.

_________________
Disclaimer: I am no longer employed by Linode; opinions are my own alone.


Top
   
PostPosted: Thu Jan 07, 2010 3:45 am 
Offline
Newbie

Joined: Thu Jan 07, 2010 1:45 am
Posts: 2
jed wrote:
Generally,

(let me say that again before I'm quoted authoritatively on this in the future)

Generally,

Four digits of I/O, sustained, is when you should be thinking about it. I won't say if it's good or bad, just have it on your radar if your I/O is sustained at 1,000+ on Xen hosts.


Which command should be used to measure this benchmark? vmstat? iostat? And what is the unit of measurement?


Top
   
 Post subject:
PostPosted: Thu Jan 07, 2010 4:00 pm 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
It's reported in the linode control panel, that's the easiest way to check it.


Top
   
 Post subject:
PostPosted: Thu Jan 07, 2010 7:19 pm 
Offline
Senior Newbie

Joined: Wed Jul 29, 2009 12:34 pm
Posts: 10
Website: http://www.simonbyholm.com
Location: Finland
Talking about "right" IO rate I have been wondering about that for some time. I got the answer today though when a support ticket from Linode alerted me that my IO rate of 6000 sustained was too high on a 720. According to staff a rate of 1000-2000 is the max allowed.

It would of course be useful if Linode could post a table of recommended and max allowed peak/mean IO rates for the different size linodes. It would help planning growth and where to optimize.

If you wonder how I'm able to get a 6000 in IO it's because I run a search engine with indexer and I pretty much use as much CPU and disk cycles as I can get, would be great though to know how much is allowed :)

Simon
SecretSeachEngineLabs.com


Top
   
 Post subject:
PostPosted: Fri Jan 08, 2010 11:55 am 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
sbyholm wrote:
Talking about "right" IO rate I have been wondering about that for some time. I got the answer today though when a support ticket from Linode alerted me that my IO rate of 6000 sustained was too high on a 720. According to staff a rate of 1000-2000 is the max allowed.

It would of course be useful if Linode could post a table of recommended and max allowed peak/mean IO rates for the different size linodes. It would help planning growth and where to optimize.

If you wonder how I'm able to get a 6000 in IO it's because I run a search engine with indexer and I pretty much use as much CPU and disk cycles as I can get, would be great though to know how much is allowed :)

Simon
SecretSeachEngineLabs.com


Why would a search engine indexer cause much disk IO except to do database updates? There are ways to mitigate that (commit to MEMORY tables and periodically update the on-disk table, or set up a database cluster).


Top
   
 Post subject:
PostPosted: Fri Jan 08, 2010 1:02 pm 
Offline
Senior Member

Joined: Sat Mar 28, 2009 4:23 pm
Posts: 415
Website: http://jedsmith.org/
Location: Out of his depth and job-hopping without a clue about network security fundamentals
sbyholm wrote:
According to staff a rate of 1000-2000 is the max allowed.

This is why I was careful to say generally above, because it's not that we "allow" a certain amount of I/O. There are a number of variables that influence our recommendation (or, in your case, request). Other customers being affected are one of the only reasons we will make such a request.

If you punish the disk for a while and nobody else is affected enough to complain, we don't get involved. I/O consumption is not an exact science, and we have to call it as we see it in each case.

As a generality, upgrading usually means you can get away with using more disk -- there are fewer Linodes sharing a host.

_________________
Disclaimer: I am no longer employed by Linode; opinions are my own alone.


Top
   
 Post subject:
PostPosted: Fri Jan 08, 2010 4:12 pm 
Offline
Senior Newbie

Joined: Wed Jul 29, 2009 12:34 pm
Posts: 10
Website: http://www.simonbyholm.com
Location: Finland
Quote:
Why would a search engine indexer cause much disk IO except to do database updates?


Exactly, it's database updates. With a 4GB main index table and 720MB memory it's going to use some disk I/O. The spider is continuously downloading pages and updating them into the database.

I'm looking at ways to optimize this all the time, so any advice is welcome.

The particular problem I have is this:

Big main index table of 4GB with only 720MB ram makes for super slow queries. If someone searches for "web hosting in Europe" there will be a query into the database for "web", "hosting" and "Europe" asking for 100 entries sorted by score, the record is [keywordid=int] [siteid=int] [score=float] and there's an index on keyword-score for the sorting

A query for one keyword can take up to 30 seconds, not good.

My solution - I made a cache, a separate table with the record

[keywordid=int indexed] [sitedata=text]

Here the sitedata is basically text list of the top 150 keyword from the previous table.

It's super quick to query that table for searches and the search engine can answer a query for "web hosting in Europe" in 0.227 seconds instead of 37 seconds.

The problem is updating that cache, it has to be updated at the same time as I update the main index as it's basically a subset of the main index arranged for quick query.

The update means reading sitedata which is a 2KB text string, parsing it, updating it and writing the 2KB string back to the database. If ther's 400 keywords in a webpage it means read/write of 400 2KB records.

Doing this 4 times a minute generates the 6000-8000IO/sec

That's a read/write of 300*2KB*4 / 60 = 40K data/second which to me does not sound like a lot but it generates that IO rate.

By caching these updates to a file on disk and only updating to the database after 50 file updates I have been able to reduce the data rate to a third.

Somehow I have the feeling that amount of data should not generate that much IO activity but then again I don't know the database operations translates to IO.

Any MySQL guru that want to comment?
[/quote]


Top
   
 Post subject:
PostPosted: Fri Jan 08, 2010 4:17 pm 
Offline
Senior Newbie

Joined: Wed Jul 29, 2009 12:34 pm
Posts: 10
Website: http://www.simonbyholm.com
Location: Finland
Quote:
This is why I was careful to say generally above, because it's not that we "allow" a certain amount of I/O


Ok, I'll raise the data rate slowly next time so noone noteices :)

No seriously I'll have to sit down for a couple of optimization sessions, the new caching system I did for searches increased disk IO with at least 600% and I feel the data moving I do should not have to generate that much IO.

Anyone know if you read one 2K record (one record) of data from a database how much disk I/O ops should that generate, about.


Top
   
 Post subject:
PostPosted: Fri Jan 08, 2010 6:14 pm 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
I assume that you can't use MySQL's fulltext search functionality due to the scoring?

It sounds like you have database schema issues, but your schema isn't clear enough from your description for me to comment (I'm also not a MySQL guru :P)

If I understand it correctly, each keyword has an id, and you've got a table:

[keywordid=int] [siteid=int] [score=float]

With an index on (keywordid, score)

And you do something like:

SELECT * FROM table WHERE keywordid = 5 ORDER BY score DESC LIMIT 100

?

I don't see why that would possibly take 30 seconds for that query to run...


Top
   
 Post subject:
PostPosted: Sat Jan 09, 2010 4:56 am 
Offline
Senior Newbie

Joined: Wed Jul 29, 2009 12:34 pm
Posts: 10
Website: http://www.simonbyholm.com
Location: Finland
Code:
 $query = "SELECT * from " . $this->dbprefix . "keywordindex where keywordid='" . $keyword_id . "' order by score desc limit " . $limit . ";";


Where limit is normally 100 and the largest query times i get when there is like 50000 entries for that keyword and I fetch only the 100 first. A keyword with only 1000 enties will be a lot faster to get the 100 first.

Affecting the speed could be that the indexer is doing updates to the same database doing this

Code:
      $query = "DELETE from " . $this->dbprefix . "keywordindex where pageid=" . $pageid . ";\n";
      $query = "INSERT into " . $this->dbprefix . "keywordindex values \n";


Where the amount if inserted values can be anything between 100 and 2000 and these queries takes from 1 second up to 60 seconds to complete.


Top
   
 Post subject:
PostPosted: Sun Jan 10, 2010 5:51 am 
Offline
Senior Newbie

Joined: Wed Jul 29, 2009 12:34 pm
Posts: 10
Website: http://www.simonbyholm.com
Location: Finland
Code:
$query = "SELECT * from " . $this->dbprefix . "keywordindex where keywordid='" . $keyword_id . "' order by score desc limit " . $limit . ";";


Actually I found the problem with that query now, it's not using the index for the order by part only for the select. I don't quite understand though why it's not using the index.

This is the explain output

Code:
EXPLAIN SELECT * FROM `sbs_keywordindex` where keywordid=20 order by score desc limit 100

id    select_type    table    type    possible_keys    key    key_len    ref    rows    Extra
1    SIMPLE    sbs_keywordindex    ref    keywordid    keywordid    4    const    92347    Using where


Any ideas, anyone?


Top
   
 Post subject:
PostPosted: Sun Jan 10, 2010 7:38 am 
Offline
Senior Newbie

Joined: Tue Mar 03, 2009 4:12 pm
Posts: 5
sbyholm: It looks like MySQL does use index for sorting, you'd have Using filesort in the Extra column otherwise.

See: http://dev.mysql.com/doc/refman/5.0/en/ ... ation.html


Top
   
Display posts from previous:  Sort by  
Post new topic  Reply to topic


Who is online

Users browsing this forum: No registered users and 0 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
RSS

Powered by phpBB® Forum Software © phpBB Group