Linode Forum
Linode Community Forums
 FAQFAQ    SearchSearch    MembersMembers      Register Register 
 LoginLogin [ Anonymous ] 
Post new topic  Reply to topic
Author Message
PostPosted: Thu May 05, 2011 10:41 am 
Offline
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
So I have this table:

Quote:
=> \d tokens
Table "public.tokens"
Column | Type | Modifiers
------------+--------------------------+-----------
session_id | character varying | not null
username | character varying | not null
expires | timestamp with time zone | not null
id | character varying | not null
created | timestamp with time zone | not null
Indexes:
"tokens_pk" PRIMARY KEY, btree (id)


I measured the performance of UPDATEs/SELECTs on this table using the primary key in the where clause, and while most of the time the query completes in 2-3 ms, sometimes it can take 500-1000ms (0.3% of the time, but still).

My first two reactions were to REINDEX and DELETE/VACUUM/ANALYZE since old records are useless to me. That didn't change anything.

According to my measurement over the 250k last SELECTs/UPDATEs, the average query execution time was 3 ms on average with a standard deviation of 30 ms (variance of 1000 ms2).

Now something worth knowing is that my id is quite large and the query plan is the following:

Code:
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Index Scan using tokens_pk on tokens  (cost=0.00..8.27 rows=1 width=128)
   Index Cond: ((id)::text = '3aaa6faaab3aaadbaaaaaae181faaabc6aaaaa7c7aaaacce1da3aaa1222a5aaa'::text)
(2 rows)


I should add that the load on the server is quite low, there's CPU/RAM to spare and I/O is low also.

Would anyone know what's wrong with that table?


Top
   
 Post subject:
PostPosted: Thu May 05, 2011 2:32 pm 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
While I can't comment on the slowdowns, your schema is odd. Why is your id a varchar rather than a bigint, or bigserial? If your session key is a fixed length, why do you use a varchar?

Your indexes are also odd, or rather your complete lack of them on anything but your primary key. You only *ever* do selects/updates with the id, never with the session_id, username, expires, created, or anything else in the where (or order by) clause?


Top
   
 Post subject:
PostPosted: Thu May 05, 2011 2:50 pm 
Offline
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
id has to be random without collusion, it's as hash(rand()) to speak loosely

but I agree it's quite large (64 characters) and could/should very much be reduced by 3/4

the where clause is on (id,expires) 95% of the time and 5% on (id,username,expires) (the latter will change so that it's always on (id,expire)

keep in mind this works well in 99.% queries (<20 ms), and the table doesn't have a huge row count (it should oscillate between 5k and 20k)

for indexes, I assumed (maybe wrongly) that having one on id would be sufficient since id is unique and it's only a matter of checking if the token has expired or not

thanks for taking the time to answer


Top
   
 Post subject:
PostPosted: Thu May 05, 2011 3:14 pm 
Offline
Senior Member
User avatar

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
I would second the recommendation to store it as an integer... converting the hash to a hexadecimal representation then converting it to an ASCII string and storing it in a text column of varying length doesn't make the database's job easier. This will also reduce the size of the column by half: instead of 512 bits (64 characters at 8 bits/character), you've got 256 bits (64 hexadecimal digits each representing 2^4 possible states of 0000..1111).

That will very likely grease the weasel all the way to market, but if not, an index on expires or (id, expires) might help too.

_________________
Code:
/* TODO: need to add signature to posts */


Top
   
 Post subject:
PostPosted: Thu May 05, 2011 4:33 pm 
Offline
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
I absolutely agree that this design is by no means perfect. And I'd fix it immediately if it was critical. However, I have no reason whatsoever to think it's the cause of the issue I'm experiencing right now.

Edit: on second thought, the btree on that huge string is most likely sub-optimal, and if not balanced properly could lead to high lookup times for certain values (which could cause the issue) but again the table is small so even if the tree is the worse possible, I don't expect the performance to be that bad even in the worse case. Something to investigate anyway.


Top
   
 Post subject:
PostPosted: Thu May 05, 2011 5:02 pm 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Are you caching the entire table/index in memory? If you aren't, and your query happens in the middle of somebody else on the box doing a disk copy or other IO-intensive operation...

Is there anything that differentiates the slow queries from the fast queries? Are they only UPDATEs? Only SELECTs?


Top
   
 Post subject:
PostPosted: Thu May 05, 2011 5:05 pm 
Offline
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
both SELECT and UPDATES are slow, and even when the where clause is id,expires

the only thing that differs from a fast and a long query is the value of id... the rest of the clause being "expires > now()"

I'll try to see if some values cause of id are recurrently the source of issues

---

I didn't specify any caching of the table or index in memory, I'll look into it


Top
   
 Post subject:
PostPosted: Thu May 05, 2011 6:47 pm 
Offline
Senior Member
User avatar

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
sob wrote:
Edit: on second thought, the btree on that huge string is most likely sub-optimal, and if not balanced properly could lead to high lookup times for certain values (which could cause the issue) but again the table is small so even if the tree is the worse possible, I don't expect the performance to be that bad even in the worse case. Something to investigate anyway.


The ASCII representations for 0..9 and a..f don't lend themselves to a good random distribution, that's for sure. :-)

Caching is another good thing to check for. Unfortunately, I'm not too familiar with PostgreSQL's tuning knobs, but disk will take forever (comparatively speaking).

The more I think about it, the more I like the idea of sticking an index on (id, expires)... if you're just looking for the existence or not of a row where id=bleh,expires>now(), I have reason to believe a btree can just crap that out straight from the index. 'course, if you're doing SELECT session_id,username,etc,etc, you probably won't buy too much except on a miss.

_________________
Code:
/* TODO: need to add signature to posts */


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 2:27 pm 
Offline
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
I will change that field to int over the week end and see how it goes.

Another questions: indexes are btree by default but can be hash is specified.

It seems btree is good at comparisons < > and hash at equality test =

My where clause involving expires will obviously always be a comparison, whereas my id is always an equality test.

Should I have an index btree on id and hash on expires? or btree on (id, expires)?

It's not obvious to me what would work best...


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 2:27 pm 
Offline
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
I will change that field to int over the week end and see how it goes.

Another questions: indexes are btree by default but can be hash is specified.

It seems btree is good at comparisons < > and hash at equality test =

My where clause involving expires will obviously always be a comparison, whereas my id is always an equality test.

Should I have an index btree on id and hash on expires? or btree on (id, expires)?

It's not obvious to me what would work best...


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 2:39 pm 
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
If the ID is unique and you're always doing queries on (id, expires), there's no need for an index on expires. Postgres will probably just search its index for id, fetch one (and only one) matching row, and either return it or throw it away depending on the value of expires. In that case, an index on expires would have negligible benefits, and only slow down updates.


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 2:41 pm 
Offline
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
That makes perfect sense, which is why I originally had a btree on id only :)


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 5:33 pm 
Offline
Senior Member
User avatar

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Also, btree is the only index type that supports unique constraints.


Top
   
 Post subject:
PostPosted: Fri May 06, 2011 8:26 pm 
Offline
Senior Member
User avatar

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
I'm closer to CE than CS, so I'll defer to the algorithms folks on what to index and how to index it. I'll defend my statements on how to store the hashes, though ;-)

_________________
Code:
/* TODO: need to add signature to posts */


Top
   
 Post subject:
PostPosted: Sun May 08, 2011 10:50 am 
Offline
Junior Member

Joined: Wed May 04, 2011 8:06 pm
Posts: 35
I switched to a 32 bit integer for id, generated by rand().

As I don't want to run the risg of having two identical integers, my UNIQUE constraint is on id and expires.

Code:
pilotest=> \d tokens
               Table "public.tokens"
   Column   |           Type           | Modifiers
------------+--------------------------+-----------
 session_id | character varying        | not null
 username   | character varying        | not null
 expires    | timestamp with time zone | not null
 created    | timestamp with time zone | not null
 id         | integer                  | not null
Indexes:
    "tokens_id_key" UNIQUE, btree (id, expires)


And I still have queries that are extremely slow (some reach 1.5s), things actually seem to have gotten worse.


Top
   
Display posts from previous:  Sort by  
Post new topic  Reply to topic


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
RSS

Powered by phpBB® Forum Software © phpBB Group