Dec 20, 2014

Query Execution and process Life Cycle in SQL SERVER

Do you know how SQL queries are processed in SQL Server? What are the execution order steps? What are the possible scopes where queries can be tuned and what are the limitations? Here we are going to discuss about query life cycle in detail.

A sql statement which is passed as an input to database engine is processed through many different phases before getting back the final output. This topic describes what those phases are and where we can help to sql server to improve query performance.  Below diagram illustrates processing of SELECT statement:

Query Processing Steps SELECT

Below diagram illustrates processing of DML statement:

Query Processing Steps DML
Note: In the actual execution, order of execution may differ from the above diagram.  

Here, we will discuss each phases in little details and scope of performance improvement.

1.1.1         Input sql statement


What sql query is passed to sql server is totally depends on sql developer and database engine has not any control over this. It is very difficult to improve the performance of badly written queries. It is our responsibility to write efficient queries.

1.1.2         Query parsing


In this step sql server checks the syntax of the query. If syntax of the sql statement is incorrect then it throws an error message of severity level 15 otherwise it goes to next step.
We can parse a sql query without compiling or executing it by setting PARSEONLY. For example:

SET PARSEONLY ON

SELECT *
FROM sys.tables

SET PARSEONLY OFF

Note: In SSMS there Parse button to parse sql statements.
We can get the list of all syntax error message sql server by following sql script:

SELECT [text]
FROM sys.messages
WHERE severity = 15
     AND language_id = 1033

1.1.3         Syntax normalization or standardization


In this phase sql server standardize the sql queries. It also check the existence of database objects like tables, views etc. If it doesn't exist then it throws error of severity 16.   Examples queries standardization:

a. Qualifying the database tables with schema and database name. For example:

SELECT *
FROM tblUser

Will modify to something like:

SELECT *
FROM [Exact_Help].[dbo].[tblUser].[PK__tblUser__C48487AE76619304]

Here Exact_Help is database name, dbo is schema name and PK__tblUser__C48487AE76619304 is a clustered index.

b. Replace the * by actual column name. For example:

SELECT *
FROM tblUser

Will modify to something like:

SELECT [MFS].[dbo].[tblUser].[ntUserID]
     ,[MFS].[dbo].[tblUser].[vcUserName]
FROM tblUser

c. Add indentifer quote ([]) in the all database objects like tables, columns, schema, database etc. For example:

SELECT vcUserName
FROM tblUser
WHERE "ntUserID" = 10

Will modify to something like:
SELECT [MFS].[dbo].[tblUser].[vcUserName]
FROM [MFS].[dbo].[tblUser].[PK__tblUser__C48487AE76619304]
WHERE MFS].[dbo].[tblUser].[ntUserID] = 10

d. In the below query, ntUserID (primary key column) will be replaced by *:

SELECT COUNT(ntUserID)
FROM tblUser

Will modify to something like:

SELECT COUNT(*)
FROM MFS].[dbo].[tblUser].[PK__tblUser__C48487AE76619304]

e. Expand the view by the view definition. For example, Lets assume there is view named view_GetOrder:

CREATE VIEW view_GetOrder
AS
SELECT ntUserID
     ,MAX(dtOrderDate) AS dtOrderDate
FROM tblOrder
GROUP BY ntUserID

Consider on below sql script:

SELECT *
FROM tblUser U
INNER JOIN view_GetOrder O
ON O.ntUserID = U.ntUserID

Will modify to something like this:

SELECT [U].[ntUserID]
     ,[U].[vcUserName]
     ,[MFS].[dbo].[tblOrder].[ntUserID]
     ,MAX([MFS].[dbo].[tblOrder].[dtOrderDate])
FROM [MFS].[dbo].[tblUser].[PK__tblUser__C48487AE76619304] AS [U]
INNER JOIN [MFS].[dbo].[tblOrder]..[PK__tblOrder__27C9BAA27A3223E8]
ON [MFS].[dbo].[tblOrder].[ntUserID] = [U].[ntUserID]
GROUP BY [MFS].[dbo].[tblOrder].[ntUserID]
     ,[U].[ntUserID]
  ,[U].[vcUserName]

f. Syntax optimization

In this phase syntax of queries are also optimized. For example, it modifies the order of prdicates in the WHERE clause, ON clasue, HAVING cluase etc. It also modifies join order of tables in the JOIN satement. 

g.  In the following sql query:

SELECT *
FROM tblUser
WHERE ntUserID IS NULL

Sql server replace the above SQL query by constant expression since it knows, ntUserID is a primary key column and it cannot be null. Hence there is not need to fetch data from the table tblUser.

1.1.4         Statistics creation


When we execute any queries first time, sql server may create the single column statistics for columns in filter conditions if auto create statistics is enabled and there is not any existing statistics on those columns .

There is separate topic on statistics. For now, statistics keeps the statical informations like total number of distinct values, density and many for this of column which is very help while generating execution plan(There is also a separate topic on execution plan) . For examples, Lets assume after executing below script:

SELECT *
FROM tblActor
WHERE vcActorName = 'Tom Cruise'

SQL server created a statistics named _WA_Sys_00000002_2645B050  

Script to get statistical data from a statistics:

DBCC SHOW_STATISTICS (
          "tblActor"
          ,_WA_Sys_00000002_2645B050
          )

Sample output:

Header:
Name
Rows
Rows Sampled
Steps
Density
Average key length
String Index
Filter Expression
Unfiltered Rows
_WA_Sys_00000002_2645B050
7
7
7
0
10.85714
YES
NULL
7

Density vector:

All density
Average Length
Columns
0.1428571
10.85714
vcActorName

Histogram:

RANGE_HI_KEY
RANGE_ROWS
EQ_ROWS
DISTINCT_RANGE_ROWS
AVG_RANGE_ROWS
Amy Adams
0
1
0
1
Diane Lane
0
1
0
1
Henry Cavill
0
1
0
1
Jane Levy
0
1
0
1
Mark Wahlberg
0
1
0
1
Robert Downey
0
1
0
1
Tom Cruise
0
1
0
1

1.1.5         Query compilation


In this phase Transact sql language statement are executed. For examples:
a. Variable declaration and assignment
b. Execution of IF-ELSE statements,    WHILE loops, CURSOR etc.

1.1.6         Generating the execution plan


In this phase SQL Server query optimizer generates the execution plan of the queries.    SQL server query optimizer try to generate best execution plan based on the cost of different resources (disk, memory etc)   used by the query during the execution.  Execution plans are dependent upon user created objects:

a.     Statistics
b.     Indexes

It also dependent on table definition and data constraints, query hints, hardware etc.  Indexes are the objects which keep data which are sorted on different fields. If a table has not sufficient indexes then it may lead to bad execution plan. It is task of SQL developer to create indexes. If a table has too many indexes then query optimizer may not analyze the all the indexes, it may lead sub optimal execution plan
Statistics are used to get the information about total numbers of records in a table, density and many things. If table has not sufficient statistics or has statistics but not updated then it may lead to bad execution plan. It is task of SQL developer to create statistics and keep it updated. After generating the execution plan it is kept in cached plan. It is an optional step i.e.  If a query has already cached execution then it will reuse otherwise it would be generated. 

Script to get list of all cached execution plan:

SELECT ECP.objtype [ObjectType]
     ,EST.TEXT [Query]
     ,EQP.query_plan [QueryPlan]
     ,ECP.usecounts [PlanUseCount]
     ,(ECP.size_in_bytes) / 1024 AS [PlanSizeKB]
FROM sys.dm_exec_cached_plans ECP
CROSS APPLY sys.dm_exec_sql_text(plan_handle) EST
CROSS APPLY sys.dm_exec_query_plan(plan_handle) EQP

1.1.7         Execution of execution plan


Execution plan keeps the physical operators. These physical operators are actually algorithm (Nested loop joins, merge join, hash join, sorting, aggregate etc) to process the data. In this phase those algorithms are executed.

1.1.8         Physical Read and Write


Loading data, index etc pages from disk to member buffer is called physical read. Physical write is just its opposite. During the execution of execution plan, if required pages are not present in the memory buffer then it is loaded from the disk.

Script to get pages in the memory buffer:

SELECT (
          SELECT NAME
          FROM sys.indexes
          WHERE object_id = obj.object_id
              AND index_id = obj.index_id
          ) AS IndexName
     ,page_id
     ,page_type
     ,row_count
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN (
     SELECT object_id
          ,index_id
          ,allocation_unit_id
     FROM sys.allocation_units AS au
     INNER JOIN sys.partitions AS p
     ON au.container_id = p.hobt_id
          AND (
              au.type = 1
              OR au.type = 3
              )
    
     UNION ALL
    
     SELECT object_id
          ,index_id
          ,allocation_unit_id
     FROM sys.allocation_units AS au
     INNER JOIN sys.partitions AS p
     ON au.container_id = p.partition_id
          AND au.type = 2
     ) AS obj
     ON bd.allocation_unit_id = obj.allocation_unit_id
WHERE database_id = DB_ID()
  AND object_name(object_id) = 'tblUser'

1.1.9         Logical Read and Write


Fetching the actual data from memory buffer to end user is called logical read and logic write is just its opposite.  

1.1.10      Transactional Log


In case of DML statement transactional logs are kept. Transaction log operation is performed before the physical write.  Transaction log is used to rollback data, point in time data recovery etc.

Script to get the transaction log of current database:

SELECT *

FROM sys.fn_dblog(NULL,NULL)

1 comment:

  1. Excellent !! Thanks so much for such detailed explanation. It really helped a lot. Could you please do a post of SQL Architecture in Detail and Simple manner.

    ReplyDelete