In SQL, The Count function counts the number of values in a set. Such function accepts a parameter, that specifies which should be counted. Common values for the count´s parameter, include *, a column name, a column from a table’s primary key and a constant. In this post, we discuss how Count works depending on the used parameter, presenting samples on queries with inner and outer joins.
The dataset
Our sample dataset is composed by two tables: Employees and Dependents, each one with just a few rows, as represented in the following figures.
The Empid column in the Dependents table contains the Id of the employee on which the depends on. The column Name stores the name of the dependent.
Counts and Inner Joins
Suppose we want to list the id and name of all the employees that have dependents together the dependents name. This can be achieved with the use of an inner join between such tables, as in the following query.
Let’s first take a look on the use of count together with several possible parameters and inner joins. The following query presents two of the most common uses of count: count(*) and count(constant). Both uses of count are equivalent: they return the count of all rows that would participate in the query’s result if it is transformed into a select * or into a select constant (instead of a select count). In old old days, count(constant) led to better performance than count(*), but in current versions of top DBMS both formats should have similar performance.
Count can be used together with table_name.* and with a table’s primary key, as in the following query. Some people say that this would return the number of rows in a table. In fact, these two parameters would provide the same result, which is the number of non-null rows from the considered table that would appear in the query’s result if it is transformed into a select * query. Note that in the following query, 3 is not the number of employees, it is the number of times a employee has a dependent (2 + 1: Michael has 2 dependents + Julia has 1 dependent).
In the following query, count(d.*) has the same result (i.e. 3) than count(e.*) in the previous image. This happens because the query contains an inner join between employees and dependents. Also count(d.empid) has the same result of count(e.id) due to the same reason. Latter in this post, we will discuss the use of counts together with outer joins. Then, we will show that count(d.*) and count(e.*) may have distinct results.
Also, in the following query, count(distinct d.empid) counts the number of distinct non-null values of empid (i.e. 2) that would appear if one used select * .
Counts and Outer Joins
Inner and outer joins lead to distinct results (see this post for the uses of inner and outer joins). Thus, the results of count in queries with outer joins may be different from the ones in queries with inner joins. For instance, in the following, counting e.* and e.id would return 5, because this query would return all the rows from the employees table no matter if the employees have or not dependents. The result of such counts when using an inner join was 3, as we presented before.
Suppose we want to list all the employees that does not have dependents. This can be achieved by the following query.
Let´s now replace the column names in the above query for some counts, as in the following query. Now, count(e.*) and count(d.*) would lead to distinct values (i.e. 2 and 0, respectively). Also, count(e.id) would provide a value distinct from count(d.empid). This happens because all such counts are counting non-null values, and there are non-null values from dependents table in the query’s result.
Let’s see all together in the following query. Count(*) and count(constant) have the same result. Count(table.*) and count(table primary key) have the same result. In, Count(column name) and count(distinct column name), only non-null values are considered (i.e. count(e.*) <> count(d.*)) .
Hope this is helpful. Cheers.