LEAD and LAG: accessing multiple rows without self join

LEAD and LAG are window functions that enable the simultaneous access to a table’s row and to other row related (that follows or precedes) to it without the use of a self join (i.e. join the table to itself). Avoiding the use of the self join can improve the performance of a SQL command because it may reduce the number of read operations (hint: always check the execution plan of your SQL command when trying to improve its performance).

In this post, we presente LEAD and LAG and how to use them to find missing values in a table (there are also other SQL constructions that can be used to list missing values – e.g. read our posts on using recursive queries when looking for missing numbers and dates). LEAD and LAG are available in several DBMS, including Oracle, MySQL, PostgreSQL and Microsoft SQL Server.

The LAG function

LAG is a window function that lets you access the value from a column in a row that lags (precedes) the current row. Such function receives three parameters: the first one is the column name you want to access (you may use a built-in function instead of a column). The second parameter determines an offset from the current row (it is an optional parameter and its default value is 1). The third parameter (optional with default value NULL) is the value to be returned if offset goes beyond the bound of the table. LAG is used together with OVER, where the Partition By and Order By clauses may be used.

Consider the following Products table:

Now, let’s use the LAG function in order to list each product’s Id, the Id of the precedent row and the Id from the product that precedes the current row by an offset of 2. The Order By clause in OVER is used to order the ids.

Now, let’s use LAG find the values that are missing in the ID column.

First of all, we need to look for rows whose gap between the current row and the one that precedes it is greater than 1. In order to do that, we will use a subquery:

The values returned as a result of the above query are the ones near to the bounds of the interval of missing values. Let’s now list the missing values. For intervals of missing values, we list only the boundaries (in the following query, we use Oracle’s TO_Char function and || concatenate operator to format the result, but similar behavior can be obtained in others DBMS that support the LAG function).

Using the LEAD function

While LAG provides access to a row that precedes the current row, LEAD function provides access to a row that follows the current row. It supports the same three parameters that may be used with LAG: (i) column name (or built-in function), (ii) offset value and (iii) default value for out of bounds access. LEAD is also used together with OVER and supports Partition by and Order by clauses.

In the following figure, we present how to use LEAD to count the number of missing Ids in Products table. The inner query lists each existing Ids and its subsequent value in the table. The outer query filters the rows where there are gaps and computes the number of missing values.

We can also use LEAD to list missing values, as we did with LAG. In the following, for intervals of missing values, we list only the boundaries, and use Oracle’s TO_Char function and || concatenate operator to format the result (similar behavior can be obtained in others DBMS that support the LEAD function):

Don’t forget: removing self join can lead to performance improvement, but you should compare execution plans in order to verify how the use of LEAD and LAG affects the performance of your queries!

Leave a Reply

Your email address will not be published. Required fields are marked *