I ran into an interesting discussion at work yesterday regarding SQL and how a sane DBMS would handle the following:
SELECT * FROM a INNER JOIN b ON a.id = b.game_id WHERE a.name = 'wow'
Jeff Atwood has provided a very handy visual guide to how SQL joins work, and I feel I have a good grasp on the basics. What his examples do not show, however, is how these joins are performed when you throw a WHERE clause into the mix. I asked about this on Twitter yesterday, and I’m pretty sure that 140 bytes isn’t enough to clearly state the question.
Let’s populate our tables for illustrative purposes. First table “a”:
id name __ ____ 0 halo 1 halo2 2 halo3 3 wow
Then table “b”:
user_id game_id rating _______ _______ ______ 0 2 5 5 3 2 7 0 3 14 3 5
The result of a naked INNER JOIN produces only the set of records that match in both tables:
SELECT * FROM a INNER JOIN b ON a.id = b.game_id id name user_id game_id rating __ ____ _______ _______ ______ 0 halo 7 0 3 2 halo3 0 2 5 3 wow 5 3 2 3 wow 14 3 5
Adding a WHERE clause to that can reduce the result set even further:
SELECT * FROM a INNER JOIN b ON a.id = b.game_id WHERE a.name = 'wow' id name user_id game_id rating __ ____ _______ _______ ______ 3 wow 5 3 2 3 wow 14 3 5
In the interests of performance, I’m curious about when in the execution of this query the WHERE clause is taken into account. Specifically, does the naked join occur before the WHERE is even applied, or are most DBMSes smart enough to reduce the working set early like so:
SELECT * FROM (SELECT * FROM a WHERE a.name = 'wow') x INNER JOIN b ON x.id = b.game_id
…and only perform the join on the rows we know we want from the left side of the join?