Common
table expression (CTE) vs. derived table vs. temporary table vs. table variable
best practice and query optimization
We generally come across scenario where there is
need to store the data at the intermediate level before perform final
operation. Here we are going to discuss
about some interesting approaches which are followed by some developers. Before
to discuss about this, I want say that storing the data in the intermediate
table are costly operation and should be used only if it is really necessary.
a. Intermediate update to fetch only data:
Let’s consider a business need to fetch user name
and their point from the user table while their point should be 30% less than
current point. The approach we are going to discuss with you, actually I’m not
kidding with you but I found during one of my code review:
SELECT vcUserName
,ntUserPoint
INTO #TempUserPointTable
FROM tblUser
UPDATE #TempUserPointTable
SET ntUserPoint =
ntUserPoint - (ntUserPoint * 0.3)
SELECT *
FROM #TempUserPointTable
DROP TABLE
#TempUserPointTable
Total logical read: 367+ 237+ 237 = 847
Same business need, can also be full filled by below
script:
SELECT vcUserName
,ntUserPoint
- (ntUserPoint * 0.3) AS ntUserPoint
FROM tblUser
Total
logical read: 367
So,
here is use of intermediate temporary table was absolute unnecessary. If we
will compare the execution cost using execution plan, approach using
intermediate temporary table has cost
95% while without using intermediate temporary table has cost 5% compare
to the batch.
b. Unnecessary
use of temporary tables
Some of developers are very big fan of temporary
table. Let’s consider below script:
DECLARE @tblPostTemp TABLE (
ntUserID BIGINT
,ntPostID
BIGINT
,xlPostData
XML
)
DECLARE @tblUserPostCommentCount TABLE (
ntPostID BIGINT
,ntCommentCount
INTEGER
)
INSERT INTO
@tblPostTemp
SELECT ntUserID
,ntPostID
,xlPostData
FROM tblPost
INSERT INTO
@tblUserPostCommentCount
SELECT ntPostID
,COUNT(*)
FROM tblComment
GROUP BY
ntPostID
SELECT P.ntPostID
,P.ntUserID
,P.xlPostData
,UPCC.ntCommentCount
FROM @tblPostTemp P
INNER JOIN
@tblUserPostCommentCount UPCC
ON P.ntPostID
= UPCC.ntPostID
Total logical read: 18830
+ 23550 + 18751
+ 1300 = 62431
In this approach each output of first
two queries are stored in the intermediate temporary table and then join
operation is performed using intermediate table to get desire output.
second approach:
SELECT P.ntPostID
,P.ntUserID
,P.xlPostData
,UPCC.ntCommentCount
FROM tblPost P
INNER JOIN (
SELECT
ntPostID
,COUNT(*) AS ntCommentCount
FROM
tblComment
GROUP
BY ntPostID
)
UPCC ON P.ntPostID
= UPCC.ntPostID
Total logical read: 23550
+ 19157 = 42707
It is obvious second
approach is much better than first approach.
Then same basic question come to our mind, when to
use temporary objects and when not use?
A temporary
objects (Temporary table and table variable) may perform better when:
a. Use temporary objects when there is more than one
referenced of objects.
There
is a probability that an intermediate storage of result into a temporary object
may perform better when temporary object is used more than once. For example:
SELECT P.ntPostID
,P.ntUserID
,P.xlPostData
,UPCC.ntCommentCount
FROM tblPost P
INNER JOIN (
SELECT
dtCommentDate
,COUNT(*) AS ntCommentCount
FROM
tblComment
GROUP
BY dtCommentDate
)
UPCC ON P.dtPostDate
= UPCC.dtCommentDate
SELECT *
FROM tblUser U
INNER JOIN (
SELECT
dtCommentDate
,COUNT(*) AS ntCommentCount
FROM
tblComment
GROUP
BY dtCommentDate
)
UPCC ON U.dtCreateDate
= UPCC.dtCommentDate
Total
logical read: 148831
In
the above script same derived table UPCC has
executed two times. In this case it is better to save the output of derived
table UPCC into temporary table.
SELECT dtCommentDate
,COUNT(*) AS ntCommentCount
INTO #tblUserPostCommentCount
FROM tblComment
GROUP BY
dtCommentDate
SELECT P.ntPostID
,P.ntUserID
,P.xlPostData
,UPCC.ntCommentCount
FROM tblPost P
INNER JOIN
#tblUserPostCommentCount UPCC
ON P.dtPostDate
= UPCC.dtCommentDate
SELECT *
FROM tblUser U
INNER JOIN
#tblUserPostCommentCount UPCC
ON U.dtCreateDate
= UPCC.dtCommentDate
Total
logical read: 84378
So, logical read say in this case temporary table is
performing much better. But it is not always true. There is possibility even if
there is more than one reference of temporary object, derived table or CTE may
perform better. For example:
a. When
there is need to execute very complex queries
It
is recommend using temporary tables or table variables when there is need to
execute very costly query on a table. By using the temporary object it may
degrade the query performance to execute a single query. Then why it is
recommended to use temporary objects?
It
is due to locking of a table. In real world a table is referenced by many
queries. During the query execution a table is locked according to query,
isolation level etc. So we are going to execute a very costly query and it is
locking the table for significant amount that even if that queries is most
efficient from number of logical read it is better to choose little least
efficient queries to decrease the overall lock on a particular table if that
table is highly used by many queries. In that case we should store the data in
the temporary tables and costliest queries should be executed on temporary
object instead of the actual table.
b. When
there is very big join statement
There were many examples of real world hazards where
whole database has locked due to a single join statement. It is highly
recommend that a join statement should be very large. We should not join more
than four big tables a query (These numbers depends upon size of table, server
load, configuration etc.) As far as single query is concerned a single query
joined with multiple tables will perform better. So here this recommendation is
not to improve the performance of single query but to decrease lock time to
increase the overall performances of all the queries which uses the table. Solution is very simple uses temporary
objects and simplify the JOIN statement. But it is not as simple as look like. I
want to explain the complexity by a good example:
SELECT *
FROM tblUser U
INNER JOIN
tblCity C ON U.ntCityID
= C.ntCityID
INNER JOIN
tblUserType UT ON U.ntUserTypeID
= UT.ntUserTypeID
LEFT JOIN
tblUserPhoto UP ON U.ntUserID
= UP.ntUserID
LEFT JOIN
tblUserFavSport UFS ON U.ntUserID = UFS.ntUserID
LEFT JOIN
tblUserContact UC ON U.ntUserID
= UC.ntUserID
WHERE U.ntUserID
= 10
Total logical read: 10
Uses the temporary objects don’t mean populate data
of each table into temporary table, something like this:
SELECT *
INTO #tblUser
FROM tblUser
SELECT *
INTO #tblCity
FROM tblCity
SELECT *
INTO #tblUserType
FROM tblUserType
SELECT *
INTO #tblUserPhoto
FROM tblUserPhoto
SELECT *
INTO #tblUserFavSport
FROM tblUserFavSport
SELECT *
INTO #tblUserContact
FROM tblUserContact
SELECT *
FROM #tblUser U
INNER JOIN
#tblCity C ON U.ntCityID
= C.ntCityID
INNER JOIN
#tblUserType UT ON U.ntUserTypeID
= UT.ntUserTypeID
LEFT JOIN
#tblUserPhoto UP ON U.ntUserID
= UP.ntUserID
LEFT JOIN
#tblUserFavSport UFS ON U.ntUserID = UFS.ntUserID
LEFT JOIN
#tblUserContact UC ON U.ntUserID
= UC.ntUserID
WHERE U.ntUserID
= 10
DROP TABLE
#tblUser
DROP TABLE
#tblCity
DROP TABLE
#tblUserType
DROP TABLE
#tblUserPhoto
DROP TABLE
#tblUserFavSport
DROP TABLE
#tblUserContact
Total logical read: 815
+ 5 + 2 + 3118 + 3097 + 2 + 2 + 2 + 4 + 812 = 7859
No one will accept the decrease the table lock time
in the cost of increase of logical read from 10 to 7859. Splitting the join
query and using temporary table is skills. So, here our challenge is to split
the JOIN statement in such way that increases in number of logical read should
be minimal. Here key point is numbers of rows processed and stored should be
minimal. For example:
SELECT U.*
,C.vcCityName
,UT.vcUserTypeName
,UP.ntUserPhotoID
,UP.vcPhotoPath
INTO #tblTempUser
FROM tblUser U
INNER JOIN
tblCity C ON U.ntCityID
= C.ntCityID
INNER JOIN
tblUserType UT ON U.ntUserTypeID
= UT.ntUserTypeID
LEFT JOIN
tblUserPhoto UP ON U.ntUserID
= UP.ntUserID
WHERE U.ntUserID
= 10
SELECT *
FROM #tblTempUser U
LEFT JOIN
tblUserFavSport UFS ON U.ntUserID = UFS.ntUserID
LEFT JOIN
tblUserContact UC ON U.ntUserID
= UC.ntUserID
WHERE U.ntUserID
= 10
DROP TABLE
#tblTempUser
Total logical read: 12
Logical read increasing from 10 to 12 is acceptable
and both the SQL script is joining only three tables. So, in this way we have
decreased the joining from 6 tables to 3 tables. So this solution look like
have a proper balance of increase in logical read and decease in the number of
tables in the JOIN statement.
No comments:
Post a Comment