In a previous post, we present an introduction to database tuning. In this post, we’ll talk about SQL execution plans and how to rewrite queries to improve their performance, which is a key database performance tuning technique.
SQL Execution Plan
SQL is a declarative language, which means in SQL commands we tell the DBMS what we want but we do not tell how the DBMS can access such information in database files. Therefore, in order to execute a SQL command, the DBMS has to execute several steps, which include the parsing of the SQL command and its optimization. As a result of the SQL optimization, the DBMS prepares a command execution plan. Such execution plan (that is commonly represented as a tree) contains a series of steps that the DBMS should take in order to execute our SQL. The following image presents a sample of SQL execution plans from Oracle and MS SQL Server.
The execution cost usually has operations of several types: (i) access operations (i.e. table access full, index seek, index scan), (ii) joins algorithms (e.g. nested loops, hash join, merge join) and (iii) others (mostly related to sorting and grouping data). Although there is a certain equivalence between several operators of different DBMS, you should carefully read the documentation of each to be able to do a detailed analysis of the execution plans. Some usefull links are:
- Oracle
- Microsoft SQL Server:
During the optimization process, the DBMS usually evaluates several possible plans in order to execute a command and chooses the one with the lowest foreseen execution cost.
The execution cost is the execution time? Can I compare the execution cost of distinct DBMS?
The SQL execution cost is calculated in no specific unit. It is not the necessary time to execute the command. Indeed, each DBMS has its own formula to calculate the execution costs and its own unit of measure. So, if you got a command whose execution cost is of 1,000 in Oracle and of 100 in MS SQL Server, you cannot state that it would take longer to execute such command in Oracle than in SQL Server.
Distinct SQL commands can have the same execution plan?
Yes, two SQL commands may have the execution same plan. It means that the DBMS would execute the same steps in order to execute both commands and that executing each of them would have the same execution cost.
Distinct SQL commands that have the same result may have distinct plans? A higher execution cost leads to a higher execution time?
The DBMS optimizer choose a plan for each SQL command. Then, it may choose distinct execution plans for distinct commands, even if such commands have the same result.
If you compare the execution cost of two commands in the same DBMS, then you have great probability that the one with the lowest execution cost would execute faster than the other.
But it is not fully guaranteed that a higher cost will lead to a longer execution time. Several factors that can not be measured during assembly of the execution plan (and their costing) can influence the execution of a SQL command, such as the execution of other commands at the same time (which affects, for example, I / O time and CPU availability) and the preexistence of the data used in the query in the main memory.
Also, you should not forget that costs in execution plans are foreseen and that during the actual SQL execution they may turn out to be much higher (or lower) than they were foreseen (e.g. due to imprecise or not up to date statistics). That may make the execution time of a command A higher than the one of a command B, even though the execution cost of B is higher than the cost of A.
SQL Rewrite
The rewriting of SQL commands looking for new ones that leads to the same result of the original command but with lower costs is a common database tuning technique.
There are no rigid rules that lead to a lower-cost command, and for each command, you should evaluate your execution plan, estimated cost and, if necessary, run several tests on the runtime (be sure to clear the cache buffer between otherwise the second execution of the command will probably be faster than the first). However, some action tips to accomplish are:
- Test the use of Exists and Not Exists instead of In and Not In (see how to do that in SQL: from [Not] In to [Not] Exists in 3 steps). It can affect performance and results in some situations (see our post on Not In x Null). In some situations, subqueries can be replaced by outer joins (see this post on how to use outer joins);
- The Union operator eliminates repetitions and that has a cost. If it is guaranteed that there are no repetitions or if duplicate data in the result is not a problem, use Union ALL;
- If the same table appears more than once in the query, try rewriting so that it only appears once;
- Use Like only if necessary. A comparison with ‘=’ (equal) can be more efficient. And remove unnecessary wildcards (i.e.%);
- Use Distinct, Order By and Group By only if really necessary;
- Avoid using the ‘*’ in the Select. Some older versions of DBMS do not handle this well. Specify the columns you want in your result. It is a good technique and can aid in performance;
- Check if you really need Outer Joins or if they can be replaced by Inner Joins (see this post inner joins x outer joins);
- Check if you are using Having properly (see our post on Where x Having);
- Nested views, nested tables and recursive queries may be trick. Test carefully!
- Try, try, try…. Write several commands, verify if they have the same result… Test, test, test….
Do you have more tips? Share with us!
Adaptive query plans are not feasible for all kinds of plan changes. For example, a query with an inefficient join order might perform suboptimally, but adaptive query plans do not support adapting the join order during execution. At the end of the first execution of a SQL statement, the optimizer uses the information gathered during execution to determine whether automatic reoptimization has a cost benefit. If execution information differs significantly from optimizer estimates, then the optimizer looks for a replacement plan on the next execution. The optimizer uses the information gathered during the previous execution to help determine an alternative plan. The optimizer can reoptimize a query several times, each time gathering additional data and further improving the plan.