Wednesday, December 15, 2004

PostgreSQL performance of "where exists"

Today I was looking into the performance of a PostgreSQL query with a "... where exists (select 1 from ...)" subquery:
select foo_id from foo

where exists (select 1 from bar where bar.foo_id = foo.foo_id)
I was surprised to find out that this query actually ran faster when I restructured it with a SELECT DISTINCT and a JOIN:
select distinct(foo_id) from bar

join foo on bar.foo_id=foo.foo_id
Some references on the web I've found suggest that EXISTS is the preferred way to write the above query in general. Because it's a boolean condition, in theory the database needs to scroll fewer rows because it can stop as soon as the first match is found; and the DISTINCT can be expensive if the results from the join version would not have been unique.

An ancient PostgreSQL mailing list post indicates that rewriting the query as a JOIN may be faster than EXISTS in PostgreSQL, because the join can take advantage of indexes while EXISTS does a nested loop. But, then again, I'm still using PostgreSQL 7.3.x, and EXISTS handling may well have been improved in 7.4.

1 comment:

Joseph Scott said...

I tried out these two different types of queries on some existing data that I have on PostgreSQL 7.4.6 and found that the first type of query is faster than the second.

http://joseph.randomnetworks.com/archives/2004/12/16/postgresql-performance-of-where-exists/