NOT IN and NULL: a dangerous combination

Non-correlated subqueries (where the inner query does not have any column from the outer query tables) are usually implemented using the IN clause. But, when you use the NOT operator to construct a NOT IN query some unexpected results may arise…

Let’s consider two simple tables: employees and departments. Some of the columns of employees are id and name. The former is the table’s primary key and represents the employee number, while the latter stores the employee name. The departments table has several rows, including id (the department identification number), name (that stores the department’s name) and manager_id (which stores the id of the employee that is the department’s manager).

In the following, we present the current data of such tables:

Suppose you want to list the names of the employees that are managers of a table. You can do that by using the following query:

You can also use a non-correlated subquery in order to list the employees that are not managers of any department. A simple approach would be to use the NOT IN operator in the outer table:

Sophia is the only employee that is not a manager of any department.

A NULL value: what will happen?

Now, a new department is created: IT. Such department still don’t have a manager:

What would be the result if you re-execute the query that uses the NOT IN operator to list the employees that are not department managers? Let’s re-execute the query in the DBMS:

What happened to Sophia? She is not a manager, but after creating the IT department, the NOT IN query does not list her name!

Since NULL represents an unknown value, looking for values in the outer table that are not
unknown (i.e. employees’ ids that are not in a list that contains NULLs) would return 0 rows.

Missing employee

Let’s now see an example using Count queries. In the first one, we count all the rows in the Employees table. You may see that there 5 employees.

The next query shows that four of them are managers.

But the NOT IN query returns Zero! So, where is the 5th employee?!

Avoiding unexpected results

There are several ways you can avoid such “unexpected result”. One of them, would be to replace NULL values in the subquery. You can do that, using functions like Coalesce and NVL (see here our post on such functions to decide which one you should use).

If we use Coalesce in the inner table, Sophia would be listed:

But you can also avoid having the “unexpected result” generated by the NULL value, using a correlated subquery (i.e. referencing a column of the outer query in the inner query), just like in the following:

We can also solve this problem using a IS NOT NULL in the subquery, like we show in the next figure (where we solve the Count non-managers employees example).

If you usually writes noncorrelated subqueries (using IN) but are not familiar with correlated subqueries (with Exists), take a look at our post on how to rewrite noncorrelated subqueries (using IN) into correlated subqueries (with Exists) in 3 simple steps.

So remember: be careful when using NOT IN clauses!

3 comments

  1. Hello and thanks for the nice post. Just a small remark about a typo error in the first listing:
    select id, name, manager_id from departments order by 1;
    The manager_id of the “Operations” department should be 5, not 2, for the results to be consistent.

    Thanks again
    Ahmed

Leave a Reply

Your email address will not be published. Required fields are marked *