10 tips about SQL Server that every developer should know
Working as a DBA in a corporate environment for over 5 years, I have seen things that even the least experienced SQL Server developer would find hard to believe. For example, every developer who programs for SQL Server knows, or at least should know, that one of the main requirements for assuring the good performance of a query is to analyze its execution plan and to make sure it is making adequate use of the table's indexes.
However, what I have seen in my day-to-day work is that many developers still do not so much as worry about the table's indexes. What one notices, at least at first, is that the developer is more worried about making the query work and delivering the data to the user.
As a result, in the medium term what you get is a high waste of server resources and the much-hated slowness in the application. Of course, many other factors lower an application's performance, for example: outdated data access statistics, blocking between connections (most of the time due to the lack of indexes), excessive use of cursors, and so on.
But you can be certain: the poor use of indexes, or even their complete absence, is the greatest cause of performance problems in SQL Server applications. In this article I will present 10 important tips that every developer should know when working with SQL Server.
Tips on how to analyze the execution plan, methods for replacing cursors, use of sub-queries, use of indexed columns in the where clause; in short, tips that will certainly help you get more out of SQL Server.
Well, given this small introduction, let us move on to what matters.
1. Always analyze the query's execution plan.
As I said previously, execution plan analysis is one of the main requirements for assuring a query's good performance. The execution plan describes the path used by the SQL Server optimizer to arrive at the requested data and shows which operations were executed during the query's processing. In the execution plan, each operation is represented by an icon, and a set of arrows unites these icons. This makes it easy to understand a query's performance characteristics.
To see a query's execution plan, open SQL Server 2000's Query Analyzer or SQL Server 2005's Query Editor, write the query and press Ctrl+L, or select Query > Display Estimated Execution Plan in the menu bar.
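If you prefer a textual representation of the plan, you can also use the Set Showplan_Text option. A minimal sketch follows; the sample query over the Pubs database is only illustrative. While the option is on, the statements are not executed, only their plans are returned.

SET SHOWPLAN_TEXT ON
GO
SELECT au_id, au_lname FROM authors WHERE au_lname = 'Green'
GO
SET SHOWPLAN_TEXT OFF
GO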
You will notice that the query is not actually executed; only the execution plan is created. When analyzing an execution plan, keep in mind that it is generated based on the existing statistics for the tables and indexes used by the query; therefore it is very important that these statistics be up to date when you analyze the plan.
If the statistics are out of date, the execution plan will be generated on top of inconsistent data and may not reflect reality. You should always update the statistics after operations that move large volumes of data, or after the creation or alteration of indexes.
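For reference, a minimal sketch of how the statistics can be refreshed by hand; the table name comes from the Pubs database used throughout this article.

-- Refresh the statistics of a single table
UPDATE STATISTICS authors

-- Or refresh the statistics of every table in the current database
EXEC sp_updatestats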
In Figure 1 we have an example of the execution plan of a query executed over the Publishers and Titles tables of SQL Server 2000's Pubs database.
Figure 1. Example of an execution plan
Observe that each icon represents a specific operation and the arrows indicate the path to be followed. The red arrow indicates that the analysis must always be performed from right to left and from top to bottom.
2. When analyzing the execution plan, start by searching for operations with high consumption.
During the execution plan analysis, begin by looking for operations that have a high percentage of consumption. Looking for operations with high consumption allows you to prioritize which problem should be "attacked" first. Amongst the operations that have the greatest consumption, and that therefore must be avoided, we have:
- Table Scan and Index Scan operations;
- Operations connected by arrows that are too "thick";
- Bookmark Lookup operations;
- Sort operations.
Table and Index Scan operations
Table Scan and Index Scan operations are slow and generate high server consumption. This is because they navigate through all the rows of the table or the index, performing a sequential sweep and returning the rows that satisfy the where clause (assuming a where clause is used).
It is true that, depending on the size of the table, the number of rows being returned and the selectivity of the filter, a Table Scan may not indicate a problem. But when we speak of large tables, the Table Scan is the worst of all operations and indicates that the table has no index or, if it has one, that it is not being used adequately by the query.
Whenever you find a Table Scan in your execution plan, do not refrain from investigating it. The Index Scan and Clustered Index Scan perform a sequential sweep of a table's index pages. Since they act on an index, they are better than a Table Scan, but they also deserve investigation. This is because, in general, if you have an Index Scan, a great amount of data is being returned, and most of the time you do not need all of it.
Scans, for the most part, are solved through the modification or creation of proper indexes. Some solutions also involve modifying the queries to be more selective, that is, using the where clause to filter the returned rows as much as possible.
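As an illustration, assuming a query that filters the authors table of the Pubs database by last name, a non-clustered index on that column would typically let the optimizer replace the scan with an Index Seek. The index name is hypothetical.

-- Hypothetical index supporting searches by author last name
CREATE NONCLUSTERED INDEX ix_authors_au_lname ON authors (au_lname)

-- A selective query that can now seek on the index instead of scanning
SELECT au_id, au_lname FROM authors WHERE au_lname = 'Ringer'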
Figure 2 shows the graphic representation of the Table Scan, Clustered Index Scan and Index Scan operators.
Figure 2. Graphic representation of the Table Scan, Clustered Index Scan and Index Scan operators
Operations that have arrows too “thick”
The arrows are not operators: they simply connect one operator to another. Through the arrows we get an estimate of the number of rows affected by an operation, since an arrow's thickness is directly related to the number of rows returned by the operation.
The thicker the arrow, the greater the number of rows involved in the operation, or passed from one operator to another. To see the estimated quantity and size of the affected rows, simply place the cursor over the arrow.
When analyzing the execution plan, always pay special attention to the thickest arrows, for a very thick arrow can indicate heavy I/O and, consequently, contention in the server's disk subsystem. Another important point is that, most of the time, very thick arrows are associated with a Table Scan.
To solve this kind of problem we must once more make use of the where clause to filter the returned data and make the arrow as thin as possible. If the arrow is associated with a Table Scan, analyze the scan first, because solving the Table Scan will probably also solve the arrow's thickness. The suggestion here is: avoid retrieving more rows than necessary.
In Figure 3 we have the execution plan of a query (over a modified Employee table of the Pubs database) which returns almost 700 thousand records. Observe that the query does not use a where clause.
Figure 3. Execution plan for a query without a where clause
Bookmark Lookup operations
The Bookmark Lookup operator appears when an index can be used to satisfy the search criterion but does not contain all the data requested by the query. It normally occurs together with an Index Seek, when the query requires columns that are not part of the index key. In this scenario, look for Bookmark Lookups that have a high consumption percentage.
In Figure 4 we can observe that, to obtain the data requested by the query, SQL Server executed a Bookmark Lookup operation which consumed 41% of the query's total execution time.
Figure 4. Execution plan with a Bookmark Lookup
This happened because the fname and lname columns are not part of the index key, which is composed only of the hire_date column; because of that, SQL Server needs to access the table's data pages in order to obtain the fname and lname data.
If the Bookmark Lookup operation's cost is too high, check whether a clustered index or a non-clustered index composed of the searched columns can be used. In Figure 5, the creation of a clustered index composed of the fname and lname columns solved the problem.
Figure 5. Execution plan after the creation of a cluster index
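For reference, a sketch of the kind of covering index that removes the lookup in an example like this one, assuming the employee table and the columns mentioned above; the index name is hypothetical. In SQL Server 2005 the Include clause can also be used to carry the extra columns.

-- Hypothetical covering index: hire_date remains the leading search
-- column, while fname and lname travel in the index as well, so the
-- table's data pages no longer need to be visited
CREATE NONCLUSTERED INDEX ix_employee_hire_fname_lname
ON employee (hire_date, fname, lname)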
A word of advice: whenever possible, avoid the Bookmark Lookup operation in your queries. Although a Bookmark Lookup that deals with small amounts of data is not a problem, over large amounts of data this operation increases the I/O rate and consequently damages query performance.
Sort operations
The Sort operation arranges all the rows in ascending or descending order, depending on the Order By clause of the query. Sort operations, besides using the TEMPDB system database as a temporary storage area, also add a high I/O rate to the operations.
Therefore, if you are used to seeing the Sort operator frequently in your queries, and this operator has a high consumption, consider removing the Order By clause. On the other hand, if you know you will always order your query by a specific column, consider indexing it.
In the Create Index command you can determine the ordering direction (Asc or Desc) of a particular index. Figure 6 presents a Sort operation consuming 23% of the query's total execution time.
Figure 6. Sort operation with 23% consumption
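As an illustration, a sketch of an index created with an explicit ordering direction, assuming a query that always orders the employee table by hire date in descending order; the index name is hypothetical.

-- Hypothetical index stored in descending hire_date order; a query
-- that orders by hire_date DESC can then avoid an explicit Sort step
CREATE NONCLUSTERED INDEX ix_employee_hire_date_desc
ON employee (hire_date DESC)

SELECT fname, lname, hire_date FROM employee ORDER BY hire_date DESC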
3. Avoid the use of cursors and, whenever possible, substitute them with "while" loops.
The great problem with the use of cursors is that they are, by nature, slow and consume a great deal of server resources. This happens because relational databases are optimized to work with sets of records.
Each set of records is known as a Result Set and is treated as a single unit. For example, the set of records returned by a Select statement consists of all the rows that satisfy the where clause condition.
The cursor goes in the opposite direction of this concept, since it was developed with row-by-row work in mind. That is, you use it to navigate row by row inside a set of records, or Result Set, returned by a Select statement.
As a consequence of this use, we have a great volume of packets being sent through the network, high compilation and parse time for the Fetch statements, and blocking of connections due to locks on tables or records; in short, high consumption of the server's resources and low performance of the application.
In the face of this, some methods emerged in Transact-SQL that can be used to replace cursors. Below, I present two of these methods.
First, so that you can understand how a cursor is used, in Listing 1 we have a very simple example of a cursor that navigates row by row through the Authors table of the Pubs database, displaying the id, last name and first name fields.
Listing 1. Example of cursor utilization
1. DECLARE @au_id varchar(15)
2. DECLARE @au_fname varchar(15)
3. DECLARE @au_lname varchar(15)
4. DECLARE cur_authors CURSOR
5. FOR SELECT au_id, au_fname, au_lname FROM authors
6. OPEN cur_authors
7. FETCH NEXT FROM cur_authors INTO @au_id, @au_fname, @au_lname
8. WHILE @@FETCH_STATUS = 0
9. BEGIN
10. SELECT @au_id, @au_fname, @au_lname
11. FETCH NEXT FROM cur_authors INTO @au_id, @au_fname, @au_lname
12. END
13. CLOSE cur_authors
14. DEALLOCATE cur_authors
In the example, lines 1 to 3 declare the variables that will be used to store the data of the fields returned by the Select. It is important to observe that these variables must have a data type compatible with the columns of the table.
In lines 4 to 6, we have the declaration and opening of the cursor itself. Observe that the Result Set generated at the opening of the cursor will include all the records, but only the columns au_id, au_fname and au_lname, of the authors table. In line 7, Fetch Next takes charge of fetching the next record and filling the variables with the data obtained from it.
From lines 8 through 12, we have the loop, which will keep on executing for as long as there are records left (@@FETCH_STATUS = 0). In practice, line 8 verifies whether there are still records; if there are, line 10 "prints" the content of the variables on the screen and line 11 fetches the next record, again filling the variables with its data.
In line 13 the cursor is closed, and in line 14 the memory used by the cursor is released. Given the example, we will now see two methods that can be used to accomplish the same task, but without the use of cursors.
The first method, presented in Listing 2, makes use of a temporary table and the Top clause. With this method, you create a snapshot of the desired information by throwing the result of the Select into a temporary table.
Listing 2. Method to replace a cursor using a temporary table
1. DECLARE @au_id char(11)
2. SELECT au_id, au_fname, au_lname INTO #tb_tmp_authors FROM authors
3. SELECT TOP 1 @au_id = au_id FROM #tb_tmp_authors
4. WHILE @@rowcount <> 0
5. BEGIN
6. SELECT au_id, au_fname, au_lname FROM #tb_tmp_authors WHERE au_id = @au_id
7. DELETE #tb_tmp_authors WHERE au_id = @au_id
8. SELECT TOP 1 @au_id = au_id FROM #tb_tmp_authors
9. END
10. DROP TABLE #tb_tmp_authors
In the method of Listing 2, line 1 simply declares a variable to store the content of the au_id column. In line 2, we have the same Select that was used in Listing 1's cursor declaration. The difference here is that instead of throwing the result of the Select into a cursor, the result is thrown into a temporary table called #tb_tmp_authors.
After the temporary table is loaded, we can then work with it to handle the records row by row. Observe that in line 3 the Top clause is used with the value 1. The Top 1 clause assures that only the first record of the temporary table is returned by the Select, and consequently the value of the au_id column of that first record is stored in the @au_id variable.
In line 4 the while command creates the loop and checks the value of the @@rowcount global variable. This is a system variable which is automatically filled with the number of records affected by the last executed statement, in this case the Top 1 Select. Since we use Top 1, the Select always affects one record at a time, and @@rowcount will always be 1 until the temporary table is empty.
Lines 6 to 8 then perform all the desired processing; in this case we only print the data on the screen. Observe that in line 7 the record of the temporary table whose au_id was obtained by the Select of line 3 is deleted.
This makes the second record of the temporary table become the first. In line 8 the same Select of line 3 is executed again, taking the au_id value from the new first record. The process continues like this until the temporary table is empty and, with that, @@rowcount equals 0. At the end of the processing, line 10 drops the temporary table.
The second method, presented in Listing 3, does not use a temporary table; instead it uses the Min() function to take one record at a time from the Authors table.
Listing 3. Method to replace a cursor using the Min() function
1. DECLARE @au_id char(11)
2. SELECT @au_id = MIN(au_id) FROM authors
3. WHILE @au_id IS NOT NULL
4. BEGIN
5. SELECT au_id, au_fname, au_lname FROM authors
6. WHERE au_id = @au_id
7. SELECT @au_id = MIN(au_id) FROM authors WHERE au_id > @au_id
8. END
Since the method uses the Min() function, it is necessary to guarantee that the check is made over a column that is unique and ascending. This guarantees that new rows will always have a bigger identifier than the identifier of the row being processed.
In this example, line 1 declares the variable which will store the content of the au_id column. In line 2, the Select obtains from the Authors table the record that has the smallest value for the au_id column and stores this value in the @au_id variable.
In line 3 the while command creates the loop and verifies that the @au_id variable is not null; when it is null, it means no more records remain to be processed and we leave the loop. While @au_id is not null, we enter the loop and, from lines 5 to 7, perform the desired processing.
In this case we only print the data on the screen, and in line 7 we obtain a new record whose au_id value is larger than the value already stored in the @au_id variable. The processing continues like this until it reaches the last record of the table.
As we have seen, we do not always need to use a cursor to process our data inside SQL Server. If you execute the three examples presented here, you will see that the result is the same.
4. Substitute the UNION operator with UNION ALL whenever possible.
When you use the Union operator to combine the results of two queries, keep in mind that it performs a Select Distinct on the final result to remove possible duplicate records, even if there are no duplicates.
Given this, the advice is: if duplicate records are not possible, or if duplicates in the final result are not a problem for the application, use the Union All operator. Since this operator does not execute a Select Distinct on the final result, it uses fewer SQL Server resources and therefore improves query performance.
In Figure 7 we have two queries that perform the same operation over the Orders table of the Northwind database, one using the Union operator and the other, Union All.
Observe that the query with Union All will display all the records, including the duplicates, but it consumes less of the server's resources because it does not perform the Select Distinct on the final result.
Figure 7. Example of queries using the Union and Union All operators
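A sketch of the two forms, assuming the Orders table of the Northwind database mentioned above; the columns and filters are only illustrative, not necessarily the exact queries of Figure 7.

-- Union: a Select Distinct is applied to the combined result
SELECT CustomerID FROM Orders WHERE ShipCountry = 'Brazil'
UNION
SELECT CustomerID FROM Orders WHERE Freight > 100

-- Union All: the results are simply concatenated, duplicates included
SELECT CustomerID FROM Orders WHERE ShipCountry = 'Brazil'
UNION ALL
SELECT CustomerID FROM Orders WHERE Freight > 100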
5. Substitute sub-queries with joins.
Many Transact-SQL instructions that make use of sub-queries may be rewritten using joins. It is true that much of the time you will see no performance difference between sub-queries and joins, but in some cases, for example when the existence of a value needs to be verified, the use of joins produces better results.
So, whenever possible, look to substitute your sub-queries with joins. In Listing 4 we have two Select instructions, one written with a sub-query and the other with a join.
Listing 4. Select instructions using sub-query and join
-- SELECT instruction using sub-query
SELECT ProductID, SupplierID, ProductName
FROM Products WHERE SupplierID IN
     (SELECT SupplierID FROM Suppliers
      WHERE country = 'Brazil')

-- SELECT instruction using join
SELECT prd.ProductID, prd.SupplierID, prd.ProductName
FROM Products prd INNER JOIN Suppliers sup
     ON prd.SupplierID = sup.SupplierID
WHERE sup.country = 'Brazil'
Observe that when executed, both will produce the same result: a list of all the products of suppliers in Brazil.
6. In the WHERE clause, do not use indexed columns inside functions.
The simplest way to make a column non-indexable is to put it inside a function! In SQL Server, the use of the Substring function in the where clause is very common; however, what very few know is that when you place an indexed column inside a function, SQL Server ends up not using the index in a suitable fashion, and many times does not use it at all.
In these situations, the best thing to do is to move the function to the other side of the comparison in the where clause or, if possible, not use it at all. In Figure 8 we have an example of how the use of a function over an indexed column in the where clause can prevent SQL Server from using the index correctly.
Figure 8. Example of queries using the SUBSTRING function and the LIKE command
The two queries presented in Figure 8 have the objective of obtaining all employees whose first name starts with the characters "Ma". Observe that in the first query the Substring function is used to break the fname column, taking only the first two characters and comparing them to the string "Ma".
Since SQL Server cannot know in advance which characters the function will produce, the comparison has to be performed for each of the table's rows, and it ends up performing an Index Scan, sweeping all the pages of the index sequentially.
In the second query, the function was replaced with the Like command. In this case, since the indexed column is not wrapped in a function, SQL Server manages to use the index in a suitable fashion, performing an Index Seek on the index pages.
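A sketch approximating the two queries of Figure 8, assuming the employee table of the Pubs database with an index on the fname column:

-- Function over the indexed column: forces a sequential Index Scan
SELECT fname, lname FROM employee
WHERE SUBSTRING(fname, 1, 2) = 'Ma'

-- Equivalent filter with Like: allows an Index Seek on the 'Ma' prefix
SELECT fname, lname FROM employee
WHERE fname LIKE 'Ma%'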
7. Whenever possible, try to use operators that are capable of being indexed.
Similar to the problem of using indexed columns inside functions, there is also a set of operators which, when used, can prevent SQL Server from using the index in a suitable fashion. These are known as non-indexable operators.
The positive operators are generally indexable: =, >, >=, <, <=, Between, and Like when used in the form Like 'word%'. The negative operators are generally not indexable: <>, Not, Not Exists, Not In, Not Like, and Like when used in the form Like '%word'.
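When a negative operator is unavoidable, it can sometimes be rewritten in a form the optimizer handles better. A hedged sketch, reusing the Products and Suppliers tables of the Northwind example from tip 5: a Not In sub-query expressed as an outer join.

-- Negative operator: suppliers that have no products, using Not In
SELECT SupplierID FROM Suppliers
WHERE SupplierID NOT IN (SELECT SupplierID FROM Products)

-- The same result expressed as an outer join plus an IS NULL test,
-- a form the optimizer can usually satisfy with index operations
SELECT sup.SupplierID
FROM Suppliers sup LEFT JOIN Products prd
     ON sup.SupplierID = prd.SupplierID
WHERE prd.SupplierID IS NULL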
8. When you need to obtain the number of records in a table, avoid using Select Count(*).
Whenever we need to obtain the number of records in a table, the first T-SQL instruction that comes to mind is: "Select Count(*) From table".
The problem with this instruction is that most of the time it performs a Table or Index Scan to return the count. For large tables this is synonymous with a slow query and high consumption of server resources.
A simpler way to perform this same task without causing impact is to use the system table called sysindexes. This table has a column called rows which stores the total number of records for each table of your database.
So, whenever possible, use the following T-SQL instruction to obtain the number of records in a table.
SELECT rows FROM sysindexes
WHERE id = OBJECT_ID('table_name') AND indid<2
Since the sysindexes table does not store table names, only their ids, the instruction uses the Object_ID() function so that SQL Server can, from the name of the table, identify its respective id inside the sysindexes table.
9. Always use Set NoCount On inside your Stored Procedures.
This is a "best practice" which I rarely see developers use. Certainly, when running T-SQL instructions such as Select, Insert, Update and Delete, you have already seen the "nn row(s) affected" message as part of the result of your query.
Maybe you do not know it, but this apparently harmless message may have a great impact on the performance of your Stored Procedures. This is because when you execute an SP which has several T-SQL instructions, this message is sent to the client for every instruction inside the SP, which ends up generating unnecessary network traffic.
The Set NoCount option disables the sending of these messages. With this, SPs that contain several T-SQL instructions may present a significant performance improvement, since the network traffic will be greatly reduced.
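A minimal sketch of the practice; the procedure name and query are hypothetical.

CREATE PROCEDURE usp_list_authors
AS
BEGIN
    -- Suppress the "nn row(s) affected" message for every
    -- statement executed inside this procedure
    SET NOCOUNT ON

    SELECT au_id, au_fname, au_lname FROM authors
END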
10. When creating a composite index, order the columns in the index so as to satisfy the Where clause of most of your queries.
One thing that must always be kept in mind when working with composite indexes (indexes with two or more of the table's columns) is that the index will only be used by the query if the first column of the composite index's key is specified in the where clause.
Therefore, when working with composite indexes, the order of the columns in the index is very important. To understand this better, let us look at an example. Assume that your database has a table called tb_employee and that this table has an index composed of the columns last_name, first_name, in that order. When analyzing the execution plan of a query whose where clause is:
where last_name='Pinheiro'
you will see that the index was adequately used by the SQL Server query optimizer. But when analyzing the plan of a query whose where clause is:
where first_name='Nilton'
you will see that the index was not used. Therefore, when using composite indexes, make sure that the where clause always references the first column of the index.
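For reference, a sketch of the composite index described in this example; the index name is hypothetical.

-- Hypothetical definition: last_name is the leading column, so only
-- where clauses that reference last_name allow the index to be used
CREATE NONCLUSTERED INDEX ix_tb_employee_name
ON tb_employee (last_name, first_name)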