Monday, October 19, 2009

Mini-Lecture: Why I hate "comma joins"

A colleague of mine who is not a DBA asked about comma joins in an email. I replied with the below:

From:
Sent: Monday, October 19, 2009 3:32 PM
To:
Subject: RE: SQL question

Happy to help. Thanks for the hyperbole. :)


There is no functional difference, however, the first method – using a “comma join” is undesirable for a few reasons.

1) Using “comma joins” is old code and only supported at the pleasure of Microsoft. Its considered bad syntax. Its considered lazy. JOINS are the standard for TSQL. It may not be supported in future versions of SQL server.

2) It mixes your JOIN logic with your SELECTION logic. Imagine you added WHERE EMPID = ‘10’ to the sample query. Both functionally and in business logic, the EMPID = ‘10’ and the A.EMPLID = C.EMPLID are two very different statements. Functionally, it will probably work (not 100%), yet it is far more difficult to understand and maintain.

3) It is really difficult to read. Imagine a much larger query that has a bunch of commas in the FROM and then a tangle of statements in the WHERE Clause. Using JOINs and ONs are far easier to read for the user that comes behind you.

4) Screw up your WHERE clause and you’ll end up with a Cartesian product. Row count explosion.


Attached is a lab to demonstrate the various types of joins from a functionality standpoint.



From:
Sent: Monday, October 19, 2009 3:23 PM
To:
Subject: SQL question


Hey man, I have a SQL question for you.

What is the difference between this:

FROM A, C
WHERE A.EMPLID = C.EMPLID

And this:

FROM A
INNER JOIN C on A.EMPLID = C.EMPLID

And the benefit of one over the other?


Hope all is well with the “Grand SQL Wizard”!!!


(Yeah, I'm not sure either if that last bit was sarcasm or hyperbole either. It certainly wasn't genuine or factual. :) )

Here's that handy .sql script I keep around for demonstrating basic JOIN principles and syntax. Note that all of the syntax in this script is valid, but a lot of it is unacceptable and improper - that is sort of the point. This lab usually goes along with a short, stern lecture :) (see above) on why proper JOIN syntax is preferable.


DECLARE @T1 TABLE
(
tkey int
)

DECLARE @T2 TABLE
(
tkey int
)

INSERT INTO @t1 VALUES (1)
INSERT INTO @t1 VALUES (2)
INSERT INTO @t1 VALUES (3)
INSERT INTO @t2 VALUES (3)
INSERT INTO @t2 VALUES (4)
INSERT INTO @t2 VALUES (5)

SELECT * FROM @T1 T1 INNER JOIN @T2 T2 ON T1.tkey = T2.tkey
SELECT * FROM @T1 T1 JOIN @T2 T2 ON T1.tkey = T2.tkey
SELECT * FROM @T1 T1 LEFT OUTER JOIN @T2 T2 ON T1.tkey = T2.tkey
SELECT * FROM @T1 T1 RIGHT OUTER JOIN @T2 T2 ON T1.tkey = T2.tkey
SELECT * FROM @T1 T1 CROSS JOIN @T2 T2
SELECT * FROM @T1 T1 FULL OUTER JOIN @T2 T2 ON T1.tkey = T2.tkey
SELECT * FROM @T1 T1 , @T2 T2
SELECT * FROM @T1 T1 , @T2 T2 WHERE T1.tkey = T2.tkey


"Programmers can be lazy." -Larry Wall

No comments: