How to delete millions of rows from a table

Deleting a great number of rows from a table can be a pain in the neck: a never-ending command that degrades performance, causes lock contention and generates a huge amount of I/O-intensive write operations, including on the archived/transaction logs. In this post we present some of the main actions to consider if you have to remove millions of rows from a table.

Check if you can use Truncate

If you need to remove all the rows from a table, the most efficient way is the Truncate Table command. This command usually deallocates the data pages used by the table instead of removing row by row as the Delete command does (in some DBMS, like MySQL, Truncate drops and re-creates the table), which makes it usually quicker than Delete. Truncate also usually does not fire any triggers on the table.

But be careful: in some DBMS, like Oracle and MySQL, the Truncate command does an implicit commit, so you cannot roll it back. In Oracle, you cannot even use a Flashback Table statement to recover the deleted rows.

The basic syntax of the Truncate command is:

Truncate table table_name;

When you need to remove almost all the rows from a huge table (e.g. more than 80%) and the rollback/recovery restriction is not an issue for you, you may also evaluate using Truncate. In this case, you should:

  • Create a new table with the rows that should not be truncated, using a Create Table as Select or a similar command in your DBMS (e.g. use Select * into Table from for MS SQL Server);
  • Truncate the original table;
  • Write the rows back to the original table, using an Insert as Select command;
  • Drop the table you created in the first step.

For example, suppose we want to remove all the rows from Sales where SalesYear <> 2019 (using Oracle syntax).
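The steps above might look like this (a sketch in Oracle syntax; the table and column names are illustrative):

```sql
-- 1. Keep the rows that should survive in a helper table
CREATE TABLE sales_keep AS
  SELECT * FROM sales WHERE salesyear = 2019;

-- 2. Truncate the original table (implicit commit -- no rollback!)
TRUNCATE TABLE sales;

-- 3. Write the preserved rows back
INSERT INTO sales SELECT * FROM sales_keep;
COMMIT;

-- 4. Drop the helper table
DROP TABLE sales_keep;
```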

Again, as there may be restrictions on data recovery, you MUST CAREFULLY TEST this procedure before using it.

Truncate may also be subject to additional restrictions; for example, the table cannot be referenced by enabled foreign keys. Check your DBMS-specific documentation to verify rollback options and execution restrictions.

Disable foreign keys that reference the table

Suppose you verified that Truncate is not for you; then you must use a Delete command. When a delete is executed over a table, the DBMS verifies all the foreign keys that reference that table in order to apply the On Delete rules (i.e. restrict, cascade and set null).

When a delete affects millions of rows, this verification may take considerable time and computational effort.

Also, in many DBMS, such a delete would cause a lock escalation (i.e. the DBMS would lock more than just the pages with rows that are being deleted) and may significantly impact DBMS users.

Therefore, if you can guarantee referential integrity while the delete is being executed, you may benefit from disabling all foreign keys that reference the table. The best way to do that is to use the system catalog (data dictionary) to generate the disable and enable commands.
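As a sketch, in MS SQL Server you could query the system catalog to generate those statements (the table name dbo.Sales is illustrative; run the generated output afterwards):

```sql
-- Generate ALTER TABLE ... NOCHECK statements for every foreign key
-- that references dbo.Sales
SELECT 'ALTER TABLE ' + OBJECT_NAME(fk.parent_object_id) +
       ' NOCHECK CONSTRAINT ' + fk.name + ';'
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.Sales');

-- After the delete, re-enable and re-validate them
SELECT 'ALTER TABLE ' + OBJECT_NAME(fk.parent_object_id) +
       ' WITH CHECK CHECK CONSTRAINT ' + fk.name + ';'
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.Sales');
```

Note that re-enabling WITH CHECK makes the DBMS re-validate the existing rows, so the delete must not have broken referential integrity.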

Disable (almost) all indexes

All indexes on a table must be updated during a delete. To reduce this overhead during a huge delete, you should disable the table's indexes before the delete and rebuild them after it.

There are just two exceptions to this rule:

– Clustered indexes – in some DBMS, such as MS SQL Server, a table becomes inaccessible when its clustered index is disabled.

– Indexes useful for the delete – before executing the delete, check its execution plan (using the Explain Plan command, for example). If you are deleting up to about 15% of the table’s data, the DBMS may use an index to select the rows to be removed. If it does, do not disable that index.
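In MS SQL Server, for example, a nonclustered index can be disabled and later rebuilt like this (the index and table names are illustrative):

```sql
-- Disable a nonclustered index before the big delete
ALTER INDEX IX_Sales_Customer ON dbo.Sales DISABLE;

-- ... run the delete here ...

-- Rebuild the index afterwards
ALTER INDEX IX_Sales_Customer ON dbo.Sales REBUILD;
```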

Delete chunks of data – verify partitioning rules

Executing a long-running delete of millions of rows can affect the system and end users in several ways, including in terms of locks. The effects (lock contention and performance degradation) on DBMS users are smaller if you execute several delete commands over “small” chunks of rows instead of a single long-running delete. After each delete you can commit the transaction, releasing locks and system resources. So, if you intend to delete 200 million rows from a table, you may evaluate executing 10 deletes of about 20 million rows each.
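A chunked delete can be sketched as follows in MS SQL Server syntax (the WHERE condition and the chunk size are illustrative; in autocommit mode each DELETE commits on its own, releasing its locks):

```sql
-- Delete in chunks of 100,000 rows until nothing is left
WHILE 1 = 1
BEGIN
    DELETE TOP (100000) FROM dbo.Sales
    WHERE SaleStatus = 'Cancelled';

    IF @@ROWCOUNT = 0 BREAK;  -- no matching rows left
END;
```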

The execution would also benefit greatly if you combine your chunking strategy with the table’s partitioning strategy.

For instance, suppose your table has columns SaleDate and SaleDep, is partitioned on the month/year of SaleDate, and you need to delete the rows where SaleDep = ‘Sports’.

You may execute several deletes, each one on SaleDep = ‘Sports’ and SaleDate within a certain month/year. If you have a great number of partitions, you may build your delete commands using a loop.
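For example, three monthly deletes for year 2000 could look like this (Oracle syntax; table and column names are illustrative):

```sql
-- One delete per monthly partition of SaleDate
DELETE FROM sales
WHERE saledep = 'Sports'
  AND saledate >= DATE '2000-01-01' AND saledate < DATE '2000-02-01';
COMMIT;

DELETE FROM sales
WHERE saledep = 'Sports'
  AND saledate >= DATE '2000-02-01' AND saledate < DATE '2000-03-01';
COMMIT;

DELETE FROM sales
WHERE saledep = 'Sports'
  AND saledate >= DATE '2000-03-01' AND saledate < DATE '2000-04-01';
COMMIT;
```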

The DBMS would execute each delete over a single table partition, which may increase the execution performance of each delete significantly. 

Verify your recovery model

In a production environment, the DBMS usually runs in a recovery model that enables the maximum recovery level. Examples are the Archived Log mode of Oracle (see our post on Archived Log Mode) or the Full Recovery Model of SQL Server. 

Executing a delete on hundreds of millions of rows in such a recovery model may significantly impact the recovery mechanisms used by the DBMS. For instance, it would generate a great number of archived logs in Oracle and a huge increase in the size of the transaction log in MS SQL Server.

If you are not in a production environment (or if point-in-time recovery during the delete execution is not a concern for you), you may consider reducing the recovery level of your instance during the delete. Again, think carefully before doing that and check your DBMS-specific documentation, as this weakens disaster recovery mechanisms.
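As a sketch, in MS SQL Server you could temporarily switch the database to the Simple recovery model, which lets log space be reused after each checkpoint between chunked deletes (the database name SalesDB is illustrative; note that this breaks the log backup chain, so take a full backup after switching back):

```sql
-- Switch to the Simple recovery model before the big delete
ALTER DATABASE SalesDB SET RECOVERY SIMPLE;

-- ... run the chunked deletes here ...

-- Switch back to Full and restart the backup chain
ALTER DATABASE SalesDB SET RECOVERY FULL;
BACKUP DATABASE SalesDB TO DISK = 'SalesDB_after_delete.bak';
```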

Hope this was helpful. Share with us if you have any other tip. Cheers!

2 comments

  1. The above syntax itself is wrong. It won’t work in SQL Server

    create table address1 as
    select * from [Person].[Address]
    where AddressID=1

    1. Hi Rajasekhar,

      This syntax is for Oracle DBMS. For MS SQL Server, use the following:

      select * into address1
      from [Person].[Address]
      where AddressID=1
