A Closer Look at SQL Joins

SQL is by far one of the most popular and powerful languages for dealing with databases, and, to me, one of its greatest (and basic) features is the ability to join tables. Doing so is a necessity for anyone who finds himself working with two or more tables which are interconnected.

In the business world, it’s common practice to have, for example, a table with customer data and order ids, a second table containing – for each order id – the products ordered, and another table in which products descriptions are stored.

The problem arises whenever we have the need to find what products a customer, identified by a number, has ordered.

Table Setup

For clarity and ease of explanation and understanding, we’ll simplify the above situation just a bit.

Our company needs two tables: one, called customers, storing customer information, while the other one, called products is filled with our magnificent limited edition pens details (yeah, we sell unique pens…).

The resulting table layout of the query

CREATE TABLE `customers` (
	`customer_id` INT( 11 ) NOT NULL ,
	`customer_name` VARCHAR( 255 ) NOT NULL ,
	PRIMARY KEY ( `customer_id` )
) ENGINE = MYISAM;
 
<br />
CREATE TABLE `products` (
	`pen_color   ` VARCHAR( 255 ) NOT NULL ,
	`customer_id` INT( 11 ) ,
	PRIMARY KEY ( `pen_color   ` )
) ENGINE = MYISAM

will be something like:

+---------------+--------------+------+-----+---------+-------+
| FIELD         | Type         | NULL | KEY | DEFAULT | Extra |
+---------------+--------------+------+-----+---------+-------+
| customer_id   | int(11)      | NO   | PRI | NULL    |       |
| customer_name | varchar(255) | NO   |     | NULL    |       | 
+---------------+--------------+------+-----+---------+-------+
<br />
+--------------+--------------+------+-----+---------+-------+
| FIELD        | Type         | NULL | KEY | DEFAULT | Extra |
+--------------+--------------+------+-----+---------+-------+
| pen_color    | varchar(255) | NO   | PRI |         |       | 
| customer_id  | int(11)      | NO   |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+

It’s important to notice how both tables have a customer_id column that can tie them together. This field represents the customer details (customers table) or what customer has ordered that particular pen (products table).

Your first simple JOIN

Alright, we got a lot of orders from our fellow aficionados and we want to start putting things together and ship the first few packages.

We have the following tables:

+-------------+---------------+
| customer_id | customer_name |
+-------------+---------------+
|           1 | John          | 
|           2 | Tim           |
|           3 | David         | 
|           4 | Aaron         | 
+-------------+---------------+
 
<br />
+--------------+-------------+
| pen_color    | customer_id |
+--------------+-------------+
| red          |           1 |
| blue         |           3 |
| yellow       |           4 |
| purple       |           1 | 
| black        |           0 |
| green        |           1 | 
| white        |           3 |
| brown        |           0 | 
+--------------+-------------+

Hm, looks like someone wanted to test a few of our goodies!
So, to find out what colors John (ID #1), our longest-time customer has chosen this week, we’d run the following code:

SELECT * FROM customers
	JOIN products ON
		customers.customer_id = products.customer_id
			WHERE customers.customer_id = 1;

And here we go!

+-------------+---------------+--------------+-------------+
| customer_id | customer_name | pen_color    | customer_id |
+-------------+---------------+--------------+-------------+
|           1 | John          | red          |           1 |
|           1 | John          | purple       |           1 | 
|           1 | John          | green        |           1 |
+-------------+---------------+--------------+-------------+

On to LEFT JOIN

What the hell is a left join, you may be wondering.

Well, as the name very clearly states, it gives extra attention to the left table (being the first mentioned, on the left of the LEFT JOIN statement, customers in our example).

Each item of the left table will show up in the result, even thought there is no match with the right table.

Referring to the previous setup, the query

SELECT * FROM customers
	LEFT JOIN products
		ON customers.customer_id = products.customer_id;

yields

+-------------+---------------+--------------+-------------+
| customer_id | customer_name | pen_color    | customer_id |
+-------------+---------------+--------------+-------------+
|           1 | John          | red          |           1 |
|           1 | John          | purple       |           1 | 
|           1 | John          | green        |           1 |
|           2 | Tim           | NULL         |        NULL | 
|           3 | David         | blue         |           3 | 
|           3 | David         | white        |           3 | 
|           4 | Aaron         | yellow       |           4 | 
+-------------+---------------+--------------+-------------+

The result shows the query preserved every customers, including those without any product ordered.

RIGHT JOIN is just the same

This time, though, the dominating table is the right one, so we expect to see all the products, regardless of their order status.

SELECT * FROM customers
	RIGHT JOIN products
		ON customers.customer_id = products.customer_id;

results in

+-------------+---------------+--------------+-------------+
| customer_id | customer_name | pen_color    | customer_id |
+-------------+---------------+--------------+-------------+
|           1 | John          | red          |           1 | 
|           3 | David         | blue         |           3 | 
|           4 | Aaron         | yellow       |           4 | 
|           1 | John          | purple       |           1 | 
|        NULL | NULL          | black        |           0 | 
|           1 | John          | green        |           1 | 
|           3 | David         | white        |           3 |
|        NULL | NULL          | brown        |           0 | 
+-------------+---------------+--------------+-------------+

Selective RIGHT or LEFT JOIN

You can get a particular set of results specifing the WHERE clause after the JOIN statement.

SELECT * FROM customers
	RIGHT JOIN products
		ON customers.customer_id = products.customer_id
		WHERE products.customer_id = 0;

You’ll then select from the set you’d get from a regular RIGHT JOIN query all of the records that have not been bought yet.

+-------------+---------------+--------------+-------------+
| customer_id | customer_name | pen_color    | customer_id |
|        NULL | NULL          | black        |           0 | 
|        NULL | NULL          | brown        |           0 | 
+-------------+---------------+--------------+-------------+

FULL JOIN: get ‘em all!

Produces a of all records in the two tables, with matching records from both sides where available. If there is no match, though, the missing side will contain null.

SELECT * FROM customers
	FULL JOIN products
		ON customers.customer_id = products.customer_id;

results in…

+-------------+---------------+--------------+-------------+
| customer_id | customer_name | pen_color    | customer_id |
+-------------+---------------+--------------+-------------+
|           1 | John          | red          |           1 | 
|           3 | David         | blue         |           3 | 
|           4 | Aaron         | yellow       |           4 | 
|           1 | John          | purple       |           1 | 
|        NULL | NULL          | black        |           0 | 
|           1 | John          | green        |           1 | 
|           3 | David         | white        |           3 |
|        NULL | NULL          | brown        |           0 | 
|           2 | Tim           | NULL         |        NULL | 
+-------------+---------------+--------------+-------------+

INNER, OUTER and friends

You might have heard something about INNER and OUTER JOINS, but never understood what they really are used for.
By surprise, an INNER JOIN does a full join, thus acts in the same exact way as our first example does.

On the other side, the keyword OUTER is used when dealing with two or more tables only and may be added after the word LEFT or RIGHT or FULL in the last examples.

The reason of this all? ODBC compatibility!

CROSS JOIN

There’s also a so-called cartesian product available for SQL users.
This particular quesry joins everything to everuthing, resulting in a huge set of results that, sometimes can harm the system.

SELECT * FROM customers
	CROSS JOIN products;

The would result in 4 x 8 = 32 rows.
It should be pretty strightforward what would happen running this join against very large tables.

Self JOIN instead of a subquery

A Self join is basically a join on the same table.

Take the following example, and assume our customers table now looks like

+------------------+--------------+------+-----+---------+-------+
| FIELD            | Type         | NULL | KEY | DEFAULT | Extra |
+------------------+--------------+------+-----+---------+-------+
| customer_id      | int(11)      | NO   | PRI | NULL    |       | 
| customer_name    | varchar(255) | NO   |     | NULL    |       |
| customer_country | varchar(255) | NO   |     | NULL    |       | 
+------------------+--------------+------+-----+---------+-------+
 
<br />
+-------------+---------------+------------------+
| customer_id | customer_name | customer_country |
+-------------+---------------+------------------+
|           1 | John          | USA              | 
|           2 | Tim           | Germany          | 
|           3 | David         | Italy            |
|           4 | Aaron         | USA              |
+-------------+---------------+------------------+

Now, suppose we found out John has’t paid his order and all the people from his country have learned how to trick our payment system. We want to find out who they are. But first, we must check which country John is from.

SELECT * FROM customers
	WHERE customer_country = (SELECT customer_country
	                                 FROM customers
	                                 	WHERE customer_id = 1);

With the use of a subquery, the result is

+-------------+---------------+------------------+
| customer_id | customer_name | customer_country |
+-------------+---------------+------------------+
|           1 | John          | USA              | 
|           4 | Aaron         | USA              |
+-------------+---------------+------------------+

which is the same of (using aliases).

SELECT customers_1.* FROM customers AS customers_1
 
	JOIN customers AS customers_2
		ON customers_1.customer_country = customers_2.customer_country
			AND customers_2.customer_id = 1;

Joining and grouping

Have you ever wondered who is the customer making us always out of stock for certain colors?
A nice way to find out how many products a customer has ordered is by means of GROUP BY.

Here’s how to do it, instead of fetching all the customers and then, after each row, query again to count the products.

SELECT customers.customer_id, customers.customer_name, COUNT(products.pen_color) AS total_orders FROM customers
	JOIN products
		ON customers.customer_id = products.customer_id
			GROUP BY customers.customer_id;

And, to see people who haven’t ordered anything yet, use or beloved LEFT JOIN!

SELECT customers.customer_id, customers.customer_name, COUNT(products.pen_color) AS total_orders FROM customers
	LEFT JOIN products
		ON customers.customer_id = products.customer_id
			GROUP BY customers.customer_id;

+-------------+---------------+--------------+
| customer_id | customer_name | total_orders |
+-------------+---------------+--------------+
|           1 | John          |            3 | 
|           2 | Tim           |            0 | 
|           3 | David         |            2 | 
|           4 | Aaron         |            1 | 
+-------------+---------------+--------------+



17

Comments
  • Vasili says:

    YES. Thanks for this article/tutorial. I’ve never understood joins. Thanks! :D

  • U says:

    Excellent work in explaining. I’ve never come a cross a tutorial that explains joins as simple and clear like this article..

  • sven says:

    excellent!

  • Anya says:

    Great stuff! If you get a chance, can you do a tutorial on indexing?

  • Karsten says:

    Nice tutorial and good examples.

    I used this from Coding Horror back in 2007:
    http://www.codinghorror.com/blog/archives/000976.html

  • payam says:

    very nice
    thanks

  • parminder says:

    Great articles. I am one of those that have stayed away from joins.

    I did find certain parts were a little heavy without explanation such as the cross joins bit and the “which is the same of (using aliases).” bit. These lacke explanation I felt.

    But on the whole it was a great article.

  • parminder says:

    Would this also be applicable to sql server

  • Mack says:

    Joins are something some people never touch because you do not have too ever use them. Nothing absolutely requires a join. However, if you are an efficiency expert then this is something you will benefit from knowing well.

    I think you did a good job explaining them and your visuals were very helpful.

  • Ivan Novak says:

    In your example, what happens when one color has multiple customer id’s or (the reverse) when customer orders overlap pens of the same color.

    This must be where the order number table you describe in the introduction comes in, but I can’t seem to wrap my head around the fact that each order, while having one customer id, would have multiple pens. Would you get around this by providing multiple generic fields (product1, product2, etc) or would you want to store multiple product id’s in one field by concatenating them ($a = product1; $b = product2; $c = product 3; $delim = ‘,’; $storstrng = $a.$delim.$b.$delim.$c; //returns a string “product1,product2,product3″) and using a php explode to return the results in a readable format?

    I’m curious which would be optimal. I hope there is an easier, more appropriate solution.

  • dev says:

    Good tutorial but i still dont really understand coz i to new to SQL, hope to see more this type of tutorial.

  • Chris M says:

    Brilliant article, thank you so much for taking the time to share this with all of us, will definitely help in the future :)

  • sarfraz says:

    great stuff :)

  • ronverdonk says:

    Great article! I use SQL databases (almost) all the time and I wouldn’t know how to survive without JOINs.

    Of course I don’t agree at all with Mack. He/she either never uses relational databases with inter-related tables or he/she writes all the logic necessary to ‘join’ table data in his/er code.

    Thanks again.

  • ronverdonk says:

    Great article! I use SQL databases (almost) all the time and I wouldn’t know how to survive without JOINs.

    Of course I don’t agree at all with Mack. He/she either never uses relational databases with inter-related tables or he/she writes all the logic necessary to ‘join’ table data in his/her code.

    Thanks again.