Linode Forum
Linode Community Forums
 FAQFAQ    SearchSearch    MembersMembers      Register Register 
 LoginLogin [ Anonymous ] 
Post new topic  Reply to topic
Author Message
PostPosted: Mon Oct 03, 2011 1:19 pm 
Offline
Senior Newbie

Joined: Sat Aug 07, 2010 12:57 pm
Posts: 16
Hello!

I am writing an API that'll be running on my Apache Linode instance. This API will not be calculation heavy, but it could be very read-heavy (many users requesting stored data). Since I have not written the database component of the project yet, I am looking at not using MySQL. There are several ports or improvements upon MySQL out there, and I was wondering what others have thought of the newer systems available.

For example: Percona Server offers itself as a "backwards-compatible replacement for MySQL that is much faster and more scalable, easier to monitor and tune, and has features to make operational tasks easier." Meanwhile, Drizzle is a MySQL port that has "removed non-essential code, re-factored the remaining code, and converted the code to modern C++ and modern libraries." Both claim to have better performance, and you can find plenty of testimonials talking about the new systems. Frankly, given the stagnation of MySQL I'd be happier using an alternative. The main concerns I have are:

1) Ease of use for multiple applications: Whatever I end up using, I'd like to use for future projects as well. The current project will probably talk to the database with Perl, which I suspect can connect to any database, but large compatibility is a plus.

2) Ease of installation: I'm running Debian 6, and (for example) it appears Drizzle is not yet available through apt-get. None of my work is production-critical so I am okay playing with early versions of software, as long as it's not too painful to install and maintain.

3) Efficient use of memory: I am currently using a Linode 512, which also hosts a few other small programs. I will upgrade to a 1024 if my API ends up being very widely used, but I'd like to remain on a small host if possible.

4) Choosing a sane system: I'll be honest, I don't know enough about SQL to make an informed decision. For example. the benchmarks for Percona are more impressive, but it feels very "astroturfed" and it's hard to get solid information about it. And I've only looked at these two systems, I'm sure there are many more.


So, my open-ended question: which do I actually use?!
I only mentioned the two most obvious choices, but of late the SQL world seems to have exploded.
I would be interested to hear any thoughts or comments.

Thank you for your time :D


Top
   
 Post subject: great question
PostPosted: Mon Oct 03, 2011 2:47 pm 
Offline
Junior Member

Joined: Fri Jul 08, 2011 7:46 pm
Posts: 44
Website: http://ericsonwilkinson.me
Location: United States
If you're willing to try newer software for better performance, try nginx instead of apache.

I'd also like to know a more scalable/faster mysql product to use.


Top
   
 Post subject:
PostPosted: Mon Oct 03, 2011 5:26 pm 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
You left out MariaDB, which is seen by many as the "official" continuation of MySQL, since it's developed by Monty Widenius, the creator of MySQL (My and Maria are the names of his daughters).

Drizzle seems to be more for cloud use (and while linode is a cloud provider, one VPS does not a cloud make), and isn't updating from MySQL as it goes; they want to use MySQL only as a starting point. It's also not stable; they're still in "beta".

That leaves Percona and MariaDB. My understanding is that MariaDB ships with Percona's XtraDB storage engine (replaces InnoDB) anyhow, so that's not really an advantage either way. I'd personally lean towards MariaDB since it seems to want to hew closer to MySQL in terms of compatibility, but they're both drop-in replacements, so I can't really say which is better.


Top
   
 Post subject:
PostPosted: Mon Oct 03, 2011 6:00 pm 
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
Are you limited to databases that are compatible with MySQL?

If not, you might want to take a look at PostgreSQL.


Top
   
 Post subject:
PostPosted: Mon Oct 03, 2011 6:25 pm 
Offline
Senior Member
User avatar

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
+1 for PostgreSQL. I know a heck of a lot about tuning MySQL and working around its problems, but very little about tuning PostgreSQL. It just does its job, and supports (most) all of the stuff you expect a RDBMS to support.

Also, one word: caching. You don't want to do the same (read) query twice if you can avoid it. Any decent framework/ORM supports memcached(*), which is a great way to get key-value performance with relational consistency. -rt

(*) I'm using the word "decent" here like I'd use it in "Any decent RDBMS supports transactions", which is to imply that any RDBMS that doesn't support transactions is crap. (Hi there, defaults for MySQL pre-5.5)

_________________
Code:
/* TODO: need to add signature to posts */


Top
   
 Post subject:
PostPosted: Tue Oct 04, 2011 10:48 am 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
hoopycat wrote:
Also, one word: caching. You don't want to do the same (read) query twice if you can avoid it. Any decent framework/ORM supports memcached(*), which is a great way to get key-value performance with relational consistency. -rt


I haven't played with memcached, but most RDBMS will do query caching if you do pass it the same read query twice; I'd tend to lean towards letting the RDBMS worry about caching rather than doing it by hand, but perhaps there are big performance improvements from using memcached rather than the database server's query cache?

hoopycat wrote:
(*) I'm using the word "decent" here like I'd use it in "Any decent RDBMS supports transactions", which is to imply that any RDBMS that doesn't support transactions is crap. (Hi there, defaults for MySQL pre-5.5)


Defaults or not, let's be fair, MySQL has supported transactions with InnoDB for over a decade. Then again, I frequently mock Apache for being completely out of touch with reality and choosing ludicrous httpd default settings, so default settings do have some weight.


Top
   
 Post subject:
PostPosted: Tue Oct 04, 2011 3:09 pm 
Offline
Senior Member
User avatar

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
On caching: There's ordinarily some processing that happens after the query results are returned, which can be fairly intensive depending on your application. It's not too much of a stretch to think of a HTML page being built of 10-20 queries and perhaps a half-dozen components (header, footer, content, content tools, ads, ...). The closer to the end you can cache the output, the happier things will be. (tl;dr: cache the steak, not the grass)

On the MyISAM jab: Yes, I was hesitant to include it, but it's an important reflection on MySQL-as-a-project's decision-making process, and I did qualify it to reflect the change in 5.5. That, and Apache, have assured me that even blatantly dangerous and poorly-justified default settings in software won't get you sued out of existence. :-)


Top
   
 Post subject:
PostPosted: Tue Oct 04, 2011 4:32 pm 
Offline
Senior Member

Joined: Mon Dec 07, 2009 6:46 am
Posts: 331
Postgresql does not have query caching because such thing is theoretically impossible in the context of a transaction.

HOWEVER, Pgsql relies a lot on filesystem cache and mmaps, so at least something is cached.

In my Python apps I can easily integrate Beaker caching framework with SQLAlchemy and basically have query-level caching.


Top
   
 Post subject:
PostPosted: Tue Oct 04, 2011 5:39 pm 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Azathoth wrote:
Postgresql does not have query caching because such thing is theoretically impossible in the context of a transaction.


Umm, no, don't be silly. Transactions don't make query caching terribly more difficult, and a lot of transaction-supporting databases (SQL Server, MySQL, etc) do support it.


Top
   
 Post subject:
PostPosted: Tue Oct 04, 2011 6:56 pm 
Offline
Senior Member

Joined: Mon Dec 07, 2009 6:46 am
Posts: 331
Guspaz wrote:
Umm, no, don't be silly. Transactions don't make query caching terribly more difficult, and a lot of transaction-supporting databases (SQL Server, MySQL, etc) do support it.


Well, I could be wrong of course, but as far as I know, PostgreSQL does not have query cache like MySQL for the same reason it can't cache the row count but has to count row by row -- because each transaction can see different set of available rows. PostgreSQL's caching is done differently and relies on OS system cache.


Top
   
 Post subject:
PostPosted: Tue Oct 04, 2011 8:23 pm 
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
MySQL invalidates the query cache every time something changes in the same table. So if your app does anything as simple as incrementing a hit-count field at the beginning of a request, every query that involves that table will have to start from scratch instead of using the query cache. This happens on top of the caching that the OS already does, so there is a bit of overhead. According to the Manual, the overhead can be as large as 13% if none of your queries are cache-able.

PostgreSQL works differently. It just tells the OS to cache parts of the database file(s), and then reads from those files as usual. The OS decides whether to read from RAM or the hard drive. Unlike MySQL, PostgreSQL does not maintain a separate cache for itself, except for some things that make sense to keep in RAM at all times. This approach makes the caching system simpler and sometimes even faster, especially in write-heavy use cases. But you often need to fiddle with kernel parameters in order for PostgreSQL to cache things properly.

Or at least that's what I remember. Databases change all the time, so the caching behavior may be different in current versions.


Last edited by hybinet on Tue Oct 04, 2011 8:26 pm, edited 1 time in total.

Top
   
 Post subject:
PostPosted: Tue Oct 04, 2011 8:26 pm 
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
[repeat post deleted]


Top
   
 Post subject:
PostPosted: Wed Oct 05, 2011 10:13 am 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
hybinet wrote:
PostgreSQL works differently. It just tells the OS to cache parts of the database file(s), and then reads from those files as usual. The OS decides whether to read from RAM or the hard drive. Unlike MySQL, PostgreSQL does not maintain a separate cache for itself, except for some things that make sense to keep in RAM at all times. This approach makes the caching system simpler and sometimes even faster, especially in write-heavy use cases. But you often need to fiddle with kernel parameters in order for PostgreSQL to cache things properly.


That's just caching the disk access, not any of the data manipulation. A query cache doesn't cache the source data that is being accessed, it caches the *results* of the query. If there is math involved, or sorting, or even if you're just selecting a small amount of data from a large table, the query cache is going to save you all that work. A disk cache doesn't help with any of that.


Top
   
 Post subject:
PostPosted: Wed Oct 05, 2011 11:26 am 
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
Guspaz wrote:
That's just caching the disk access, not any of the data manipulation. A query cache doesn't cache the source data that is being accessed, it caches the *results* of the query. If there is math involved, or sorting, or even if you're just selecting a small amount of data from a large table, the query cache is going to save you all that work. A disk cache doesn't help with any of that.

You're absolutely right. But unless you want your database to return stale data, the database needs to keep track of a lot of things if it is going to maintain a cache for query results. MySQL does this in a rather rudimentary fashion: whenever anything changes in a table, invalidate all cache entries that are related to that table. Even this approach results in a non-negligible amount of overhead.

A disk cache, on the other hand, works fine even after a table changes, because the changed blocks are also cached by the OS. There is no need to invalidate any cache entries. So MySQL's query caching would be faster in read-heavy use cases, whereas PostgreSQL's approach would be faster in write-heavy use cases. (Of course, there are many other factors to DB performance, too.)

I don't think this characterization is too far off from the way the two databases are usually thought of. It seems, however, that there is now a program called PostgreSQL Query Cache. I wonder if this would give us the best of both worlds.


Top
   
 Post subject:
PostPosted: Wed Oct 05, 2011 11:39 am 
Offline
Junior Member

Joined: Tue Aug 16, 2011 7:58 pm
Posts: 25
Guspaz wrote:
Transactions don't make query caching terribly more difficult


Can't speak to transactions, but MVCC, which both Innodb and Postgres use, do. See http://bit.ly/rdASo5.

The net of all of this is that caches some times help and sometimes hurt, they're hard to get right, and MySQL has query cache but Postgres does not. (There's a recent add-on query cache for Postgres but it's not well vetted at this point.)


Top
   
Display posts from previous:  Sort by  
Post new topic  Reply to topic


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
RSS

Powered by phpBB® Forum Software © phpBB Group