Do you know
how SQL queries are processed in SQL Server? What are the execution order steps? What are
the possible scopes where queries can be tuned and what are the limitations?
Here we are going to discuss about query life cycle in detail.
A sql statement which is passed as an input to database engine
is processed through many different phases before getting back the final
output. This topic describes what those phases are and where we can help to sql
server to improve query performance. Below diagram illustrates processing of SELECT
statement:
![]() |
Query Processing Steps SELECT |
Below diagram illustrates processing of DML statement:
![]() |
Query Processing Steps DML |
Note:
In the actual execution, order of execution may differ from the above diagram.
Here, we will discuss each phases in little details and scope of
performance improvement.
1.1.1 Input sql statement
What sql query is passed to sql server is totally depends on sql
developer and database engine has not any control over this. It is very
difficult to improve the performance of badly written queries. It is our
responsibility to write efficient queries.
1.1.2
Query parsing
In this step sql server checks the syntax of the query. If
syntax of the sql statement is incorrect then it throws an error message of
severity level 15 otherwise it goes to next step.
We can parse a sql query without compiling or executing it by
setting PARSEONLY. For example:
SET PARSEONLY ON
SELECT *
FROM sys.tables
SET PARSEONLY OFF
Note: In SSMS there Parse button to parse sql statements.
We can get the list of all syntax error
message sql server by following sql script:
SELECT [text]
FROM sys.messages
WHERE severity = 15
AND language_id =
1033
1.1.3
Syntax normalization or standardization
In this phase sql server standardize the sql queries. It also
check the existence of database objects like tables, views etc. If it doesn't
exist then it throws error of severity 16.
Examples queries standardization:
a. Qualifying the database tables with schema and database name.
For example:
SELECT *
FROM tblUser
Will modify to something like:
SELECT *
FROM
[Exact_Help].[dbo].[tblUser].[PK__tblUser__C48487AE76619304]
Here Exact_Help is database name, dbo is schema
name and PK__tblUser__C48487AE76619304 is a clustered index.
b. Replace the * by actual column name. For
example:
SELECT *
FROM tblUser
Will modify to something like:
SELECT [MFS].[dbo].[tblUser].[ntUserID]
,[MFS].[dbo].[tblUser].[vcUserName]
FROM tblUser
c. Add indentifer quote ([]) in the all
database objects like tables, columns, schema, database etc. For example:
SELECT vcUserName
FROM tblUser
WHERE "ntUserID" = 10
Will modify to something like:
SELECT [MFS].[dbo].[tblUser].[vcUserName]
FROM [MFS].[dbo].[tblUser].[PK__tblUser__C48487AE76619304]
WHERE MFS].[dbo].[tblUser].[ntUserID] = 10
d. In the below query, ntUserID (primary key
column) will be replaced by *:
SELECT COUNT(ntUserID)
FROM tblUser
Will modify to something like:
SELECT COUNT(*)
FROM MFS].[dbo].[tblUser].[PK__tblUser__C48487AE76619304]
e. Expand the view by the view definition. For
example, Lets assume there is view named view_GetOrder:
CREATE VIEW view_GetOrder
AS
SELECT ntUserID
,MAX(dtOrderDate) AS dtOrderDate
FROM tblOrder
GROUP BY ntUserID
Consider on below sql script:
SELECT *
FROM tblUser U
INNER JOIN view_GetOrder O
ON O.ntUserID =
U.ntUserID
Will modify to something like this:
SELECT [U].[ntUserID]
,[U].[vcUserName]
,[MFS].[dbo].[tblOrder].[ntUserID]
,MAX([MFS].[dbo].[tblOrder].[dtOrderDate])
FROM [MFS].[dbo].[tblUser].[PK__tblUser__C48487AE76619304]
AS [U]
INNER JOIN [MFS].[dbo].[tblOrder]..[PK__tblOrder__27C9BAA27A3223E8]
ON [MFS].[dbo].[tblOrder].[ntUserID]
= [U].[ntUserID]
GROUP BY [MFS].[dbo].[tblOrder].[ntUserID]
,[U].[ntUserID]
,[U].[vcUserName]
f. Syntax optimization
In this phase syntax of queries are also
optimized. For example, it modifies the order of prdicates in the WHERE clause,
ON clasue, HAVING cluase etc. It also modifies join order of tables in the JOIN
satement.
g. In
the following sql query:
SELECT *
FROM tblUser
WHERE ntUserID IS NULL
Sql server replace the above SQL query by
constant expression since it knows, ntUserID is a primary key column and it
cannot be null. Hence there is not need to fetch data from the table tblUser.
1.1.4
Statistics creation
When we execute any queries first time, sql
server may create the single column statistics for columns in filter conditions
if auto create statistics is enabled and there is not any existing statistics
on those columns .
There is separate topic on statistics. For now,
statistics keeps the statical informations like total number of distinct
values, density and many for this of column which is very help while
generating execution plan(There is also a separate topic on execution plan) .
For examples, Lets assume after executing below script:
SELECT *
FROM tblActor
WHERE vcActorName = 'Tom Cruise'
SQL server created a statistics named _WA_Sys_00000002_2645B050
Script to get statistical data from a statistics:
DBCC SHOW_STATISTICS (
"tblActor"
,_WA_Sys_00000002_2645B050
)
Sample output:
Header:
Name
|
Rows
|
Rows
Sampled
|
Steps
|
Density
|
Average key
length
|
String
Index
|
Filter
Expression
|
Unfiltered
Rows
|
_WA_Sys_00000002_2645B050
|
7
|
7
|
7
|
0
|
10.85714
|
YES
|
NULL
|
7
|
Density vector:
All density
|
Average Length
|
Columns
|
0.1428571
|
10.85714
|
vcActorName
|
Histogram:
RANGE_HI_KEY
|
RANGE_ROWS
|
EQ_ROWS
|
DISTINCT_RANGE_ROWS
|
AVG_RANGE_ROWS
|
Amy Adams
|
0
|
1
|
0
|
1
|
Diane Lane
|
0
|
1
|
0
|
1
|
Henry Cavill
|
0
|
1
|
0
|
1
|
Jane Levy
|
0
|
1
|
0
|
1
|
Mark Wahlberg
|
0
|
1
|
0
|
1
|
Robert Downey
|
0
|
1
|
0
|
1
|
Tom Cruise
|
0
|
1
|
0
|
1
|
1.1.5
Query compilation
In this phase Transact sql language statement are executed. For
examples:
a.
Variable declaration and assignment
b.
Execution of IF-ELSE statements, WHILE loops, CURSOR etc.
1.1.6
Generating the execution plan
In this
phase SQL Server query optimizer generates the execution plan of the
queries. SQL server query optimizer
try to generate best execution plan based on the cost of different resources (disk, memory etc) used by the query
during the execution. Execution plans
are dependent upon user created objects:
a. Statistics
b. Indexes
It also dependent on table definition and data constraints,
query hints, hardware etc. Indexes are
the objects which keep data which are sorted on different fields. If a table
has not sufficient indexes then it may lead to bad execution plan. It is task
of SQL developer to create indexes. If a table has too many indexes then query
optimizer may not analyze the all the indexes, it may lead sub optimal
execution plan
Statistics are used to get the information about total numbers
of records in a table, density and many things. If table has not sufficient
statistics or has statistics but not updated then it may lead to bad execution plan.
It is task of SQL developer to create statistics and keep it updated. After
generating the execution plan it is kept in cached plan. It is an optional step
i.e. If a query has already cached
execution then it will reuse otherwise it would be generated.
Script to get list of all cached execution plan:
SELECT ECP.objtype [ObjectType]
,EST.TEXT [Query]
,EQP.query_plan [QueryPlan]
,ECP.usecounts [PlanUseCount]
,(ECP.size_in_bytes) / 1024 AS
[PlanSizeKB]
FROM sys.dm_exec_cached_plans ECP
CROSS APPLY sys.dm_exec_sql_text(plan_handle) EST
CROSS APPLY sys.dm_exec_query_plan(plan_handle) EQP
1.1.7
Execution of execution plan
Execution plan keeps the physical operators. These physical operators
are actually algorithm (Nested loop joins, merge join, hash join, sorting,
aggregate etc) to process the data. In this phase those algorithms are executed.
1.1.8
Physical Read and Write
Loading data, index etc pages from disk to member buffer is called
physical read. Physical write is just its opposite. During the execution of
execution plan, if required pages are not present in the memory buffer then it
is loaded from the disk.
Script to
get pages in the memory buffer:
SELECT (
SELECT
NAME
FROM
sys.indexes
WHERE
object_id =
obj.object_id
AND index_id = obj.index_id
)
AS IndexName
,page_id
,page_type
,row_count
FROM sys.dm_os_buffer_descriptors AS
bd
INNER JOIN
(
SELECT
object_id
,index_id
,allocation_unit_id
FROM
sys.allocation_units AS
au
INNER
JOIN sys.partitions AS p
ON
au.container_id =
p.hobt_id
AND (
au.type = 1
OR au.type
= 3
)
UNION
ALL
SELECT
object_id
,index_id
,allocation_unit_id
FROM
sys.allocation_units AS
au
INNER
JOIN sys.partitions AS p
ON
au.container_id =
p.partition_id
AND
au.type = 2
)
AS obj
ON
bd.allocation_unit_id =
obj.allocation_unit_id
WHERE database_id = DB_ID()
AND object_name(object_id) = 'tblUser'
1.1.9
Logical Read and Write
Fetching the actual data from memory buffer to end user is
called logical read and logic write is just its opposite.
1.1.10
Transactional Log
In case of DML statement transactional logs are kept.
Transaction log operation is performed before the physical write. Transaction log is used to rollback data, point
in time data recovery etc.
Script to
get the transaction log of current database:
SELECT *
FROM sys.fn_dblog(NULL,NULL)
1 comment:
Excellent !! Thanks so much for such detailed explanation. It really helped a lot. Could you please do a post of SQL Architecture in Detail and Simple manner.
Post a Comment