Linode Forum
Linode Community Forums
 FAQFAQ    SearchSearch    MembersMembers      Register Register 
 LoginLogin [ Anonymous ] 
Post new topic  Reply to topic
Author Message
PostPosted: Tue Jan 13, 2009 10:52 am 
Offline
Senior Newbie

Joined: Mon Dec 08, 2008 10:59 am
Posts: 10
Website: http://www.tumbledesign.com
AOL: Nickmunstr
Location: Troy, NY
Hi all,

I am hosting www.nicetranslator.com on my linode (this is the only live site its hosting).

My goal is to have it logging every phrase translated (after the user waits 1.5/s without typing).

I have a table of phrases:

phraseID | phrase | phraseLang | sourceCount | resultCount
(sourceCount is the number of times the phrase is the source, result is number of times its the result)


And a table of translations

translationID | ip | timeSubmitted | sourceID | resultID

The following is the PHP I've got (including the functions I've used, rowExists() and getID() ):
Code:
<?php
   require_once('../startFiles/StartPhp.dinc.php');
   $userIP = $_GET['ip'];
   $source = addslashes($_GET['source']);
   $result = addslashes($_GET['result']);
   $srcLang = $_GET['srcLang'];
   $resLang = $_GET['resLang'];
   if(($source == '') || ($result == '')){
      die();
   }
   
   /* =Insert phrase if its new, otherwise increment its count
   ---------------------------------------------*/
   if(rowExists("phrases", "phrase = '$source' AND phraseLang = '$srcLang'")){
      $incSrcPhrase = mysql_query("UPDATE phrases SET sourceCount = sourceCount + 1 WHERE phrase = '$source' AND phraseLang ='$srcLang'")or die(mysql_error());
   }
   else{
      $insertSrcPhrase = mysql_query("INSERT INTO phrases(phraseLang, phrase, sourceCount) VALUES('$srcLang', '$source', 1)")or die(mysql_error());
   }
   
   if(rowExists("phrases", "phrase = '$result' AND phraseLang = '$resLang'")){
      $incResPhrase = mysql_query("UPDATE phrases SET resultCount = resultCount + 1 WHERE phrase = '$result' AND phraseLang ='$resLang'");
   }
   else{
      $insertResPhrase = mysql_query("INSERT INTO phrases(phraseLang, phrase, resultCount) VALUES('$resLang', '$result', 1)");
   }

   /* =Log translation [ I've kept this commented out as of recent testings, doesn't seem to lessen the load ]
   ---------------------------------------------*/
   $srcPhraseID= getID("phrases", "phrase", "phrase = '$source'");
   $resPhraseID= getID("phrases", "phrase", "phrase = '$result'");
   $insertTranslation = mysql_query("INSERT INTO translations(ip, sourceID, resultID) VALUES('$userIP', '$srcPhraseID', '$resPhraseID')");
?>
<?php
   function rowExists($table, $where){
      $numQuery = mysql_query("SELECT * FROM $table WHERE $where") or die(mysql_error());
      $numRows = mysql_num_rows($numQuery);
      return ($numRows>0);
   }
   function getID($table, $prefix, $where){
      $theID = $prefix . "ID";
      $queryRow = mysql_fetch_row(mysql_query("SELECT $theID FROM $table WHERE $where")) or die(mysql_error());
      return $queryRow[0];
   }
?>


When I run this, my CPU use instantly spikes to over 100%. I tried commenting out all the code querying the translation table, so it is only saving phrases and I get the same result.

I have run the optimization, which seemed to have some small affect when I first ran it, but no longer does.

The phrase table now has about 230,000 rows.

To be honest, this is my first project to draw such an audience so my first time dealing with a database this size. I was under the impression, though, that mySQL was quite apt to handle such a job.

What am I missing? Your help is very much appreciated.

Thanks for your time,
Nicky


Top
   
 Post subject:
PostPosted: Tue Jan 13, 2009 12:08 pm 
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
1. Do you have a primary key and some indexes defined on your tables? Every column that appears in your WHERE clause must have an index on it, otherwise MySQL has to scan all the rows.

2. Is addslashes() causing a problem? You might get unpredictable behavior if you escape a string that has already been escaped by magic quotes. addslashes() also has a problem with non-ASCII character sets, which you'll come across pretty soon as your site is about translating to/from foreign languages. Turn off magic quotes if you can, and use mysql_real_escape_string() instead of addslashes() for real escaping.

3. rowExists() seems rather inefficient for your purpose. What about INSERT ... ON DUPLICATE KEY UPDATE ?

4. Does MySQL have enough key_buffer to keep all the indexes in RAM?

etc.


Top
   
 Post subject:
PostPosted: Tue Jan 13, 2009 2:59 pm 
Offline
Senior Newbie

Joined: Mon Dec 08, 2008 10:59 am
Posts: 10
Website: http://www.tumbledesign.com
AOL: Nickmunstr
Location: Troy, NY
Here's the table structure

Code:
CREATE TABLE `phrases2` (
  `phraseID` bigint(20) unsigned NOT NULL auto_increment,
  `phrase` text character set utf8 collate utf8_roman_ci NOT NULL,
  `phraseLang` text character set utf8 collate utf8_roman_ci NOT NULL,
  `sourceCount` bigint(20) unsigned NOT NULL default '0',
  `resultCount` bigint(20) unsigned NOT NULL default '0',
  PRIMARY KEY  (`phraseID`),
  KEY `sourceCount` (`sourceCount`),
  KEY `resultCount` (`resultCount`),
  FULLTEXT KEY `phrase` (`phrase`),
  FULLTEXT KEY `phraseLang` (`phraseLang`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=19312 ;


As you can see I have made 'phrases2' (identical to the original phrases table) which has 19312 entries since about an hour ago when I made the new table... The CPU is running, of course, much lower now but steadily climbing as entries increase.

hybinet:

Thanks for the tip on magic quotes/mysql_reaL_escape_string(); using that now.

As for INSERT ... ON DUPLICATE KEY UPDATE, I don't think this will work in my situation because the phrase itself is not a unique key. The phrase-language combination is unique. So, if a phrase happens to be common among two languages, I wouldn't want it to be reported as otherwise. Any other ideas on that though?

I hadn't known about the key_buffer before, but looked it up and set it to 64M as the manual suggests for systems with over 256MB of RAM... Does that seem to make sense?


Given what you have seen from the situation, does it seem like there is definitely a problem or that we are truly approaching the limit of our CPU?

I imagine this will force us to design a smarter way of storing phrases, but I am still very curious about what the limits of servers of this size actually are.

Thanks again for the help


Top
   
 Post subject:
PostPosted: Tue Jan 13, 2009 6:16 pm 
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
MyISAM fulltext indexes are only helpful if you're trying to look up a few keywords from a large chunk of text and then rank the results by relevance. Since you're only interested in finding out if an exact phrase already exists in the table, the fulltext index is totally useless. You'll get much better performance by changing the two "text" columns to "varchar" and creating a regular index on those two columns.

The phrase itself might not be unique, but the combination of the phrase and the language is, right? In that case, you can have a char(40) column as your primary key and put sha1($srcLang.$source) into it. That'll serve as a unique key across all phrases and languages, and spare you from having to keep large indexes on the phrases themselves. (This might or might not work for you, depending on what else you're trying to do with this table.)

Here's another way to avoid having to perform two queries at every single insert. Do an UPDATE first, and call mysql_affected_rows() to see if any rows were updated. If the result is zero, then go ahead and do an INSERT.

Use utf8-unicode-ci throughout if you expect any non-latin characters (such as Japanese) to be inserted into the DB. Also look into mysql_set_charset() if you haven't already done so. Foreign languages need special care!

64M key_buffer seems more than enough for the purpose. Just get rid of those meaningless fulltext indexes.

Hitting the limits of the CPU? No way... I used to have a 4.5GB table with 300K rows in it. With proper indexing, I could perform SELECT's on it all day long without ever hitting 10% CPU. And that was on a Linode 360.


Top
   
 Post subject:
PostPosted: Tue Jan 13, 2009 7:27 pm 
Offline
Senior Member

Joined: Thu Apr 03, 2008 12:02 am
Posts: 103
AOL: derole
As hybinet pointed out, keys are absolutely essential to get some decent query performance from your DB.

But try to avoid indices on columns that don't need an index on (e.g. the count columns). The index needs to be updated for each modification of the table --> expensive.

Another thing you could improve is the way you collect your stats.
Do you really need 1oo% real-time information about which phrase was translated how often ? I doubt it... Row updates are expensive and appending new rows is much cheaper. You could log the phrase/language records to a separate table as they are queried from the web app. Then aggregate the data every 5/10/60/whatever minutes, delete the old log records and store the results in table.


Top
   
 Post subject:
PostPosted: Wed Jan 14, 2009 1:09 am 
Offline
Senior Newbie

Joined: Mon Dec 08, 2008 10:59 am
Posts: 10
Website: http://www.tumbledesign.com
AOL: Nickmunstr
Location: Troy, NY
Wow, amazing stuff.

Using sha1() and changed phrases to a varchar as you suggested (using on duplicate key) and that seems to have completely eliminated the problem; running smooth as silk at the moment.

Thanks so much hybinet

oliver, I think I've cleaned up the indices now. I like your suggestion, and that's sord of what I had in mind when I was talking about coming up with a smarter way of handling this data. If there are future problems we'll likely go ahead with something along those lines.

Thanks again for all the help, very nicely done.


Top
   
 Post subject:
PostPosted: Wed Jan 14, 2009 9:31 am 
Offline
Senior Member

Joined: Thu Apr 03, 2008 12:02 am
Posts: 103
AOL: derole
For future reference, this is an excellent article covering pretty much what you're trying to do:

http://www.mysqlperformanceblog.com/2008/12/22/high-performance-click-analysis-with-mysql/


Top
   
Display posts from previous:  Sort by  
Post new topic  Reply to topic


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
RSS

Powered by phpBB® Forum Software © phpBB Group