Dec 27, 2014

When to use derived tables, CTE and Temporary tables SQL server performance tuning

Common table expression (CTE) vs. derived table vs. temporary table vs. table variable best practice and query optimization    


We generally come across scenario where there is need to store the data at the intermediate level before perform final operation.  Here we are going to discuss about some interesting approaches which are followed by some developers. Before to discuss about this, I want say that storing the data in the intermediate table are costly operation and should be used only if it is really necessary.

a.  Intermediate update to fetch only data:

Let’s consider a business need to fetch user name and their point from the user table while their point should be 30% less than current point. The approach we are going to discuss with you, actually I’m not kidding with you but I found during one of my code review:

SELECT vcUserName
     ,ntUserPoint
INTO #TempUserPointTable
FROM tblUser

UPDATE #TempUserPointTable
SET ntUserPoint = ntUserPoint - (ntUserPoint * 0.3)

SELECT *
FROM #TempUserPointTable

DROP TABLE #TempUserPointTable

Total logical read: 367+ 237+ 237 = 847

Same business need, can also be full filled by below script:

SELECT vcUserName
     ,ntUserPoint - (ntUserPoint * 0.3) AS ntUserPoint
FROM tblUser


Total logical read: 367

So, here is use of intermediate temporary table was absolute unnecessary. If we will compare the execution cost using execution plan, approach using intermediate temporary table has cost   95% while without using intermediate temporary table has cost 5% compare to the batch.

b. Unnecessary use of temporary tables
Some of developers are very big fan of temporary table.  Let’s consider below script:

DECLARE @tblPostTemp TABLE (
     ntUserID BIGINT
     ,ntPostID BIGINT
     ,xlPostData XML
     )
DECLARE @tblUserPostCommentCount TABLE (
     ntPostID BIGINT
     ,ntCommentCount INTEGER
     )

INSERT INTO @tblPostTemp
SELECT ntUserID
     ,ntPostID
     ,xlPostData
FROM tblPost

INSERT INTO @tblUserPostCommentCount
SELECT ntPostID
     ,COUNT(*)
FROM tblComment
GROUP BY ntPostID

SELECT P.ntPostID
     ,P.ntUserID
     ,P.xlPostData
     ,UPCC.ntCommentCount
FROM @tblPostTemp P
INNER JOIN @tblUserPostCommentCount UPCC
ON P.ntPostID = UPCC.ntPostID

Total logical read: 18830 + 23550 + 18751 + 1300 = 62431

In this approach each output of first two queries are stored in the intermediate temporary table and then join operation is performed using intermediate table to get desire output. 
second approach:

SELECT P.ntPostID
     ,P.ntUserID
     ,P.xlPostData
     ,UPCC.ntCommentCount
FROM tblPost P
INNER JOIN (
     SELECT ntPostID
          ,COUNT(*) AS ntCommentCount
     FROM tblComment
     GROUP BY ntPostID
     ) UPCC ON P.ntPostID = UPCC.ntPostID


Total logical read: 23550 + 19157 = 42707

It is obvious second approach is much better than first approach.


Then same basic question come to our mind, when to use temporary objects and when not use?

A temporary objects (Temporary table and table variable) may perform better when:

a. Use temporary objects when there is more than one referenced of objects.

There is a probability that an intermediate storage of result into a temporary object may perform better when temporary object is used more than once. For example:
          
SELECT P.ntPostID
     ,P.ntUserID
     ,P.xlPostData
     ,UPCC.ntCommentCount
FROM tblPost P
INNER JOIN (
     SELECT dtCommentDate
          ,COUNT(*) AS ntCommentCount
     FROM tblComment
     GROUP BY dtCommentDate
     ) UPCC ON P.dtPostDate = UPCC.dtCommentDate

SELECT *
FROM tblUser U
INNER JOIN (
     SELECT dtCommentDate
          ,COUNT(*) AS ntCommentCount
     FROM tblComment
     GROUP BY dtCommentDate
     ) UPCC ON U.dtCreateDate = UPCC.dtCommentDate


Total logical read: 148831

In the above script same derived table UPCC has executed two times. In this case it is better to save the output of derived table UPCC into temporary table.

SELECT dtCommentDate
     ,COUNT(*) AS ntCommentCount
INTO #tblUserPostCommentCount
FROM tblComment
GROUP BY dtCommentDate

SELECT P.ntPostID
     ,P.ntUserID
     ,P.xlPostData
     ,UPCC.ntCommentCount
FROM tblPost P
INNER JOIN #tblUserPostCommentCount UPCC
ON P.dtPostDate = UPCC.dtCommentDate

SELECT *
FROM tblUser U
INNER JOIN #tblUserPostCommentCount UPCC
ON U.dtCreateDate = UPCC.dtCommentDate

Total logical read: 84378

So, logical read say in this case temporary table is performing much better. But it is not always true. There is possibility even if there is more than one reference of temporary object, derived table or CTE may perform better.  For example:
a.     When there is need to execute very complex queries

It is recommend using temporary tables or table variables when there is need to execute very costly query on a table. By using the temporary object it may degrade the query performance to execute a single query. Then why it is recommended to use temporary objects?

It is due to locking of a table. In real world a table is referenced by many queries. During the query execution a table is locked according to query, isolation level etc. So we are going to execute a very costly query and it is locking the table for significant amount that even if that queries is most efficient from number of logical read it is better to choose little least efficient queries to decrease the overall lock on a particular table if that table is highly used by many queries. In that case we should store the data in the temporary tables and costliest queries should be executed on temporary object instead of the actual table.

b.     When there is very big join statement 
There were many examples of real world hazards where whole database has locked due to a single join statement. It is highly recommend that a join statement should be very large. We should not join more than four big tables a query (These numbers depends upon size of table, server load, configuration etc.) As far as single query is concerned a single query joined with multiple tables will perform better. So here this recommendation is not to improve the performance of single query but to decrease lock time to increase the overall performances of all the queries which uses the table.  Solution is very simple uses temporary objects and simplify the JOIN statement. But it is not as simple as look like. I want to explain the complexity by a good example:

SELECT *
FROM tblUser U
INNER JOIN tblCity C ON U.ntCityID = C.ntCityID
INNER JOIN tblUserType UT ON U.ntUserTypeID = UT.ntUserTypeID
LEFT JOIN tblUserPhoto UP ON U.ntUserID = UP.ntUserID
LEFT JOIN tblUserFavSport UFS ON U.ntUserID = UFS.ntUserID
LEFT JOIN tblUserContact UC ON U.ntUserID = UC.ntUserID
WHERE U.ntUserID = 10

Total logical read: 10
Uses the temporary objects don’t mean populate data of each table into temporary table, something like this:

SELECT *
INTO #tblUser
FROM tblUser

SELECT *
INTO #tblCity
FROM tblCity

SELECT *
INTO #tblUserType
FROM tblUserType

SELECT *
INTO #tblUserPhoto
FROM tblUserPhoto

SELECT *
INTO #tblUserFavSport
FROM tblUserFavSport

SELECT *
INTO #tblUserContact
FROM tblUserContact

SELECT *
FROM #tblUser U
INNER JOIN #tblCity C ON U.ntCityID = C.ntCityID
INNER JOIN #tblUserType UT ON U.ntUserTypeID = UT.ntUserTypeID
LEFT JOIN #tblUserPhoto UP ON U.ntUserID = UP.ntUserID
LEFT JOIN #tblUserFavSport UFS ON U.ntUserID = UFS.ntUserID
LEFT JOIN #tblUserContact UC ON U.ntUserID = UC.ntUserID
WHERE U.ntUserID = 10

DROP TABLE #tblUser

DROP TABLE #tblCity

DROP TABLE #tblUserType

DROP TABLE #tblUserPhoto

DROP TABLE #tblUserFavSport

DROP TABLE #tblUserContact

Total logical read: 815 + 5 + 2 + 3118 + 3097 + 2 + 2 + 2 + 4 + 812 = 7859

No one will accept the decrease the table lock time in the cost of increase of logical read from 10 to 7859. Splitting the join query and using temporary table is skills. So, here our challenge is to split the JOIN statement in such way that increases in number of logical read should be minimal. Here key point is numbers of rows processed and stored should be minimal.  For example:

SELECT U.*
     ,C.vcCityName
     ,UT.vcUserTypeName
     ,UP.ntUserPhotoID
     ,UP.vcPhotoPath
INTO #tblTempUser
FROM tblUser U
INNER JOIN tblCity C ON U.ntCityID = C.ntCityID
INNER JOIN tblUserType UT ON U.ntUserTypeID = UT.ntUserTypeID
LEFT JOIN tblUserPhoto UP ON U.ntUserID = UP.ntUserID
WHERE U.ntUserID = 10

SELECT *
FROM #tblTempUser U
LEFT JOIN tblUserFavSport UFS ON U.ntUserID = UFS.ntUserID
LEFT JOIN tblUserContact UC ON U.ntUserID = UC.ntUserID
WHERE U.ntUserID = 10

DROP TABLE #tblTempUser

Total logical read: 12

Logical read increasing from 10 to 12 is acceptable and both the SQL script is joining only three tables. So, in this way we have decreased the joining from 6 tables to 3 tables. So this solution look like have a proper balance of increase in logical read and decease in the number of tables in the JOIN statement.    

No comments:

Post a Comment