Linode Community Forums
PostPosted: Sun May 08, 2011 11:01 am 
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
And the table has fewer than 500 rows. I'm lost.


PostPosted: Sun May 08, 2011 12:06 pm 
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
Still the same pattern though.

Most SELECT/UPDATE queries on that table execute within 3 ms, while some (identical in structure) take over 1 s. I don't get it.


PostPosted: Sun May 08, 2011 1:14 pm 
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
How about with an index on 'id' alone?

Still sounds weird, whatever's going on here... is it reproducible for particular queries, or does it happen on a periodic basis (every x seconds/minutes)?

Might be worthwhile watching "vmstat 1" and seeing if there's a spike in I/O around the time of a slow query. Sounds like there's also a fair amount of writes to go along with the reads, so maybe it is picking that moment to sync to disk...

(I really oughta read up on PostgreSQL more. I have a fairly active database running on it, sure, but it hasn't needed a lot of tuning, and most situations like this get covered up by considerable memcacheding.)

_________________
Code:
/* TODO: need to add signature to posts */


PostPosted: Sun May 08, 2011 1:26 pm 
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
I tried an index on id; it didn't change anything.

Then I removed the index altogether, and I still have slow queries, but they're in the 100-200 ms range instead of 1 s.


I am obviously monitoring I/O, CPU load and other metrics... the server is really underused.


PostPosted: Sun May 08, 2011 2:09 pm 
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
On my bookshelf lives SQL Antipatterns, which does, indeed, suggest that indexes can make things worse. This might be one of those times.

How closely are you watching IO? This is one of those cases where you almost need as much resolution as you can get; five minutes won't cut it. Watching vmstat on a one-second interval should help cut through the noise and point towards where a problem might be, without capturing too much data and creating its own IO storm.



PostPosted: Sun May 08, 2011 4:09 pm 
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
When it comes to trying to tune PostgreSQL I'd probably suggest you try the PostgreSQL mailing lists as you'll have a far greater pool of experienced readers there.

With that said, I suspect that focusing on the table structure, or indices, or certainly the data types is not going to make any difference. From what I can see you're talking quite small amounts of data, and even full scans probably shouldn't be subject to that much variation. Certainly worrying about an integer vs. 64 character strings as an index, or btree vs. hash indices is way premature optimization at this point (and unlikely to be a space-I/O-performance issue until you have a much larger data set). Yes, a large and frequent update cycle will create some bloat due to dead rows but if that was your problem I'd expect a gradual and overall slowdown in transaction rate, not spikes.

My first inclination is that, given the sporadic nature and the significant deviations in time, you're just running into host contention for disk I/O. That is, it's not really your schema structure at all.

However, it's also possible that there are some things you can check for that could be increasing the odds of your Linode having a spike in I/O itself, which could then make any host contention worse. A few thoughts (roughly in decreasing priority):
  • If you haven't, enable long duration query logging to ensure that the delay is internal to the database. Plus you'll then have timestamps to correlate with other database logging.
  • If you haven't, enable checkpoint logging and see if it correlates at all to the increased timings. If your data patterns are creating large and brief checkpoints that can delay other I/O, you could increase checkpoint frequency and lengthen a checkpoint duration to more evenly distribute the load.
  • Do you have autovacuum enabled and is it running? Enable logging for it if not sure. Depending on your access patterns its default configuration may not be aggressive enough for a table that is frequently updated. Or conversely, it may be kicking in too aggressively and interfering a little, so you might wish to take more explicit control and only do the vacuums when you're not doing heavy updates. (Though again, total quantity of your data would seem to make this not a big deal either way).
  • When checking your query, be sure to use "explain analyze" and not just "explain" so it can include actual results vs. estimates. This is a pretty trivial planner result, but if you have a complicated one, http://explain.depesz.com/ can help highlight bottlenecks.
  • Does your Linode have enough memory allocated to buffer/cache, and have you configured PostgreSQL reasonably? Its default configuration is extremely conservative. At a minimum, check effective_cache_size so it has a reasonable estimate of the system cache available to it. (Although given your data size and the query explain you posted, I doubt this is an issue specifically in this case.)
  • Check what I/O scheduler your kernel is using. If it's defaulting to cfq, change to deadline or even noop. There's little benefit from cfq in a Linode environment (and with a shared RAID array with BBU) and it can delay PostgreSQL I/O in favor of other processes.
  • Try testing with fsync=off. Part of the bottleneck may be PostgreSQL waiting on WAL or table syncs, but the Linode RAID+BBU environment makes the need to wait on a sync much less of an issue. I don't think I'd run with fsync=off, but it is a useful test to see if it changes behavior. If you're using 8.3+, synchronous_commit=off should help with performance just as much but be safe from a consistency perspective (and data loss perspective given the BBU in the host RAID).
  • Do you have any long-lived transactions? (I suspect not, but...) If so, they can keep additional dead rows alive until they exit, since they technically need visibility as of when they started.
  • Is anything else on your Linode generating sporadic I/O traffic? You might try something like iotop or collectl to watch per-process I/O, at a high frequency, while you try to generate a delayed query. If the increase in I/O at that point in time isn't huge, then there are good odds that it's something beyond your control.
  • If you can watch your I/O wait % during such a problem and it jumps very high, that's also a good indication that you're purely waiting on I/O - which can be normal in some circumstances, but only if you are generating a lot of it.
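For reference, a minimal sketch of the diagnostic logging settings suggested in the first few bullets, as they might appear in postgresql.conf (parameter names are for 8.3+; the thresholds are illustrative, not recommendations):

```ini
# Illustrative diagnostic settings - thresholds are examples only
log_min_duration_statement = 20    # log statements slower than 20 ms (value in ms)
log_checkpoints = on               # log checkpoint completions with write/sync timings
log_autovacuum_min_duration = 0    # log every autovacuum action (when autovacuum is on)
#synchronous_commit = off          # safer alternative to fsync=off when testing
```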

And finally, in terms of testing the basic setup, another choice is to spin up another Linode, clone your current one over there, and try the same tests. If you get different behavior, it again may point at heavier than usual host load. Another similar alternative would be to set up a VM at home with the same configuration - where you know there is no additional contention for disk I/O - and see how the same configuration and data performs there.

Assuming you find a discrepancy compared to your current Linode under the same configuration, at that point you could open a ticket and Linode will probably offer to migrate you to another host. Or you could just do it yourself by switching to your second Linode as primary and deleting the first if feasible.

-- David


PostPosted: Sun May 08, 2011 9:55 pm 
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
Hi David,

Thank you for all these suggestions. I enabled query logging for long queries as a first step, I'll see tomorrow what the results are when users come back to the site.
Then I'll try to play with the linux I/O scheduler and fsync as you advised.

Thanks!


PostPosted: Sun May 08, 2011 9:57 pm 
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
I'm measuring I/O with iostat every 5 min indeed. I tried running "vmstat 1" but there's really nothing going on, a few blocks out here and there but not much.


PostPosted: Sun May 08, 2011 11:07 pm 
Senior Member

Joined: Fri Jan 09, 2009 5:32 pm
Posts: 634
sob wrote:
I switched to a 32 bit integer for id, generated by rand().


why rand instead of using auto_increment?

though if you go auto_increment, do not use another value in the primary key


PostPosted: Sun May 08, 2011 11:54 pm 
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
sob wrote:
I'm measuring I/O with iostat every 5 min indeed. I tried running "vmstat 1" but there's really nothing going on, a few blocks out here and there but not much.

5 minute cycles are not going to be anywhere near granular enough for something like this. You'll want at least every second, or more frequently if the tool permits and you're able to reproduce the issue in a relatively tight time frame.

One thing I sometimes do is run a tool like vmstat, iostat, etc. in a small shell loop: let it run once a second for maybe 60 seconds, generate a timestamp, and then loop again. To avoid polluting the I/O with the stat logging itself, just leave an ssh connection running the script and log the information on a remote host.

That way if you have a timestamp (in this case from the database long duration logging) you can find the nearest timestamp in the stat output logging and see what was happening in the vicinity.
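A minimal sketch of that loop (the monitoring command and round count are parameters; on the Linode you'd pipe the output over ssh, as described above, to keep the logging I/O off the guest):

```shell
#!/bin/sh
# log_stats: interleave timestamps with a monitoring tool's output so the
# stat log can later be lined up with the database's long-duration query log.
log_stats() {
    cmd=$1      # e.g. "vmstat 1 60" or "iostat 1 60"
    rounds=$2   # how many rounds to capture
    i=0
    while [ "$i" -lt "$rounds" ]; do
        date        # correlation timestamp between rounds
        $cmd
        i=$((i + 1))
    done
}

# On the real host, something like:
#   log_stats "vmstat 1 60" 1440 | ssh loghost 'cat >> vmstat.log'
# Bounded demo here with a harmless stand-in command:
log_stats "echo demo-sample" 2
```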

Oh, and to your prior response, if you're enabling any logging, you may as well ensure that all of long duration queries, checkpoints and autovacuums are all logged - no reason to wait to at least turn on the logging for diagnostic purposes.

-- David


PostPosted: Mon May 09, 2011 12:04 am 
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
>glg

rand() so that they're not predictable

>db3l

of course 5 min is not granular enough, but I did run vmstat 1 in screen after your comment, trying to find spikes in I/O, and didn't see anything crazy

I enabled logging for long queries and reduced I/O from other sources on the guest (such as debug logging of a couple of custom apps); that way I hope to verify where the delays come from (postgres or outside)

autovacuum is disabled (I prefer to run it myself at night; there's not much to vacuum anyway, as I only move rows at night in my archiving batch, and VACUUM ANALYZE afterwards)


PostPosted: Mon May 09, 2011 12:33 am 
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
sob wrote:
of course 5 min is not granular enough, but I did run vmstat 1 on screen after your comment, trying to find spikes in I/O, and didn't see anything crazy

Did you get one of the longer queries occurring while you were running the vmstat? If so, and if there wasn't any noticeable increase in I/O load at the time, that's probably a reasonably good indicator that you're running into contention on the host external to your own guest. Though in that case I would expect you to see a noticeable spike in I/O wait percentage (final column in vmstat).

Of course, if you didn't have a long query simultaneous with the vmstat logging, then that particular run isn't all that useful.

-- David


PostPosted: Mon May 09, 2011 4:22 pm 
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
To rule out someone else on your linode causing this, spinning up a second linode to do the tests is a good idea. It's cheap, too. Pro-rated by the hour, I think, so 3 hours of testing with a 512 should cost you 8 or 9 cents?


PostPosted: Mon May 09, 2011 9:10 pm 
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
Back with more info.

I'm logging queries > 20 ms in PostgreSQL, and the queries that PHP logs as > 200 ms (for instance) don't even show up as > 20 ms in the PostgreSQL logs. That points toward contention for host I/O, right?

Checkpoints and long queries don't coincide, although I must admit I'm not sure what to make of debug output such as:

Code:
2011-05-09 16:30:00 EDT LOG:  checkpoint complete: wrote 194 buffers (6.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=38.671 s, sync=0.002 s, total=38.675 s


I have 3 segments per checkpoint, of 16 MB each. So does this mean the 2*16 MB write was spread over 38 s?

I set up "vmstat 1 60 && date" in a loop to see whether I get huge I/O spikes around the time the long queries take place.

Thanks again.


PostPosted: Mon May 09, 2011 9:48 pm 
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
sob wrote:
I'm logging queries > 20 ms in PostgreSQL, and the queries that PHP logs as > 200 ms (for instance) don't even show up as > 20 ms in the PostgreSQL logs. That points toward contention for host I/O, right?

Actually, no, that argues for no contention, unless your PHP script is also doing a lot of I/O. In other words, the database is able to do all of its work (including I/O) for all the queries made while that logging threshold was in place without exceeding the threshold, which means that out of those 200 ms, less than 10% is the database and disk I/O.

I suppose if I were you I might very briefly drop the threshold a bit further - something that has to be exceeded - just to make sure the logging was in fact working.

What sort of processing is the PHP code doing - are there any potential bottlenecks or ways it could get delayed?

Quote:
I'm not sure what to make of such debug:

Code:
2011-05-09 16:30:00 EDT LOG:  checkpoint complete: wrote 194 buffers (6.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=38.671 s, sync=0.002 s, total=38.675 s


I have 3 segments per checkpoint, of 16 MB each. So does this mean the 2*16 MB write was spread over 38 s?

Not quite. You get a checkpoint when either checkpoint_segments WAL segments have been written or the checkpoint timeout has been exceeded (whichever comes first). But how much data needs to be forced out at checkpoint time depends on how well the background writer has been keeping up with the actual changes.

See http://developer.postgresql.org/pgdocs/ ... ation.html for some details (adjust as needed for your PostgreSQL release), and http://www.westnet.com/~gsmith/content/ ... bgw-83.htm has some useful stuff from when the newer checkpoint logic was being released.

At the point of a checkpoint, any remaining data that hasn't made it to the actual table/index files is flushed, at which point the WAL records are no longer needed for recovery (a checkpoint record is written to the WAL at that point). It is also at that point that earlier WAL segment files are permitted to be re-used.

In terms of the checkpoint, the log indicates how many remaining buffers were dirty and needed to be written, and the overall time to do so. The latter aims to spread your checkpoint over a time equal to the configured completion target as a percentage of the overall checkpoint timeout. This helps spread the I/O overhead.

So in your case, at the checkpoint time you had 194 remaining buffers to flush. I think a buffer is 8K, so that's about 1.5MB of data. The I/O was spread over 38 seconds. And the final fsync to ensure it was on the disk took almost no time. That latter point in particular is a good indication that there is little contention, since otherwise you'd block there waiting for the disk to fully acknowledge the sync.
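The arithmetic is quick to verify, assuming the default 8 kB (8192-byte) PostgreSQL block size mentioned above:

```shell
# 194 dirty buffers at the default 8 kB block size
bytes=$((194 * 8192))
echo "$bytes"    # 1589248 bytes, i.e. roughly 1.5 MB
```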

If you wanted to minimize the checkpoint spike even further you could increase the checkpoint frequency (either by dropping the timeout or segment limit) which would then most likely have less dirty data to flush each time. But this is a pretty tiny amount of data.

To be honest, if the database isn't logging any queries beyond 20ms I'm not sure your issue is at the database level at all. You might need to work your way up the stack, and perhaps add some intermediate logging in your PHP code to try to determine where the remaining 180+ms is coming from in the longer queries. To minimize intrusion of the monitoring you could just store a timestamp at various points in the code path and then have a single top-level handler dump all the timestamps/deltas as part of some final output/log.

This assumes that you've definitely caught one of those 1000ms+ queries that started the thread. If not, then I suppose those extreme cases might still be contention within the database layer.

-- David


Powered by phpBB® Forum Software © phpBB Group