Jul 8, 2016

Complex or multiple tables join best practices SQL server performance tuning

How to optimize table JOIN if there are many tables in a JOIN statement in the SQL Server? How to decrease numbers of tables in a JOIN?

 It is highly recommended we should not JOIN more than four or five tables in a single query. This number varies according to data volume of a table.  Why? 

I want to share a real world example. One senior developer joined a new company. There was a database which was facing a lot performance issues. To analyze the root cause, that developer writes a query which joins four tables.  Those tables contain a very large volume of data. Hence that query took time to fetch records and it locked four tables in significant period of time due to which other queries got blocked.  After some time it blocked whole database.  When client came to know that new employee got fired. A very simple mistake became cause this accident.

How to avoid numbers of tables in a JOIN?
Basic question is if there is business need to join more than five tables to fetch some data. What is alternative approach? How to meet this requirement without joining five tables? Answer is very simple. First join only three tables and load the intermediate data in a temporary table. Then join remaining tables with the temporary table. For example:

Business need:
SELECT T1.Id, T2.C, T3.C, T4.C, T5.C
FROM T1 INNER JOIN T2 ON T1.Id = T2.Id
INNER JOIN T3 ON T3.Id = T2.Id
INNER JOIN T4 ON T4.Id = T3.Id
INNER JOIN T5 ON T5.Id = T4.Id

Our approach should be:

--Step 1
SELECT T1.Id, T2.C, T3.C
INTO #T
FROM T1 INNER JOIN T2 ON T1.Id = T2.Id
INNER JOIN T3 ON T3.Id = T2.Id

--Step 2
SELECT #T.*, T4.C, T5.C
FROM #T INNER JOIN T4 ON #T.Id = T4.Id
INNER JOIN T5 ON T5.Id = T4.Id

But this solution is not as simple as looks. There are many performance considerations before splitting complex JOIN statement. A wrong splitting may decrease overall performance of the query and a correct splitting may improve overall query performance significantly. For example:

Data set up:
--Creating any five tables
CREATE TABLE TA(
AID INT PRIMARY KEY IDENTITY,
A1 INT,
A2 DATETIME DEFAULT(GETDATE())
)

CREATE TABLE TB(
BID BIGINT PRIMARY KEY,
B1 VARCHAR(10),
B2 UNIQUEIDENTIFIER
)

CREATE TABLE TC(
CID BIGINT PRIMARY KEY IDENTITY,
C1 UNIQUEIDENTIFIER DEFAULT(NEWID()),
C2 NVARCHAR(10)
)

CREATE TABLE TD(
DID BIGINT PRIMARY KEY IDENTITY,
D1 INT,
D2 NVARCHAR(10)
)

CREATE TABLE TE(
EID BIGINT PRIMARY KEY IDENTITY,
E1 VARCHAR(64),
E2 INT
)

GO

--Populating records
INSERT INTO TA(A1)
SELECT severity FROM sys.messages
GO 10

INSERT INTO TB(BID,B1)
SELECT AID,LEFT(NEWID(),10) FROM TA
GO

INSERT INTO TC(C1,C2)
SELECT NEWID(),B1 FROM TB
GO

INSERT INTO TD(D1,D2)
SELECT CID, V FROM TC ,(SELECT 'XXX' V UNION ALL SELECT 'YYYY') DT
UNION ALL
SELECT TOP(5)  CID ,'ZZZ' FROM TC 
GO

INSERT INTO TE(E1,E2)
SELECT  B1 + CAST(C1 AS VARCHAR(36)), 1
FROM TB INNER JOIN TC ON TB.BID = TC.CID

Let’s assume our business need is to join above all five tables and get output on the basis of those joined tables. Below is our actual query:

DECLARE @P1 AS INT = 16,@P2 VARCHAR(1)= 'Z',@P3 AS INT = 1
SELECT * FROM TA
INNER JOIN TB ON TA.AID = TB.BID
INNER JOIN TC ON TC.C2 = TB.B1
INNER JOIN TD ON TD.D1 = TC.CID
INNER JOIN TE ON TE.E1 = TB.B1 + CAST(TC.C1 AS VARCHAR(36))
WHERE TA.A1 = @P1
      AND LEFT(TD.D2,1) = @P2
      AND TE.E2 = @P3

As we discussed joining five tables is bad practice. Also sometimes it decreases performance.

Average execution time of above query in my PC: 1 minute and 52 seconds
Total logical read:
Workfile: 34 * 97936
TD: 19416
TE: 22226
TC: 17844
TB: 14732
TA: 7838
= 3411880

To simplify above script, we are going to split above join in two parts. First part will keep first three tables and second part remaining tables:

--First part
DECLARE @P1 AS INT = 16,@P2 VARCHAR(1)= 'Z',@P3 AS INT = 1
SELECT *
INTO #T
FROM TA
INNER JOIN TB ON TA.AID = TB.BID
INNER JOIN TC ON TC.C2 = TB.B1
WHERE TA.A1 = @P1
      AND TB.B2 IS NULL

--Second part
SELECT * FROM #T T
INNER JOIN TD ON TD.D1 = T.CID
INNER JOIN TE ON TE.E1 = T.B1 + CAST(T.C1 AS VARCHAR(36))
WHERE  LEFT(TD.D2,1) = @P2
      AND TE.E2 = @P3

Average execution time of above query in my PC: 2 minutes and 38 seconds
Total logical read:
Workfile: 10 * 21544 + 94 * 69336
TD: 19416
TE: 22226
TC: 17844
TB: 14732
TA: 7838
#T: 24992
= 6840072

After simplifying the query overall execution time has increased significantly. Logical read has become double. After simplifying the query performance has decreased by almost 50% than original query. Even that simplified query is better because large number of tables in a single query may keep tables locked for long period of time which is worst.

But this splitting of a complex query is wrong. We have to think a lot before simplifying a complex query. thumb rule is:

Split a complex query in such a way that it has to fetch minimum possible records from a table. In the other word logical read should be minimal.

Best way to split above query:

DECLARE @P1 AS INT = 16,@P2 VARCHAR(1)= 'Z',@P3 AS INT = 1

--First Part
SELECT * INTO #P
FROM  TD
INNER JOIN TC ON TD.D1 = TC.CID
WHERE   LEFT(TD.D2,1) = @P2

--Second part
SELECT *,TB.B1 + CAST(P.C1 AS VARCHAR(36)) ASINTO#Q
FROM #P P INNER JOIN TB ON P.C2 = TB.B1

--Third part
SELECT * FROM
#Q Q
INNER JOIN TA ON TA.AID = Q.BID
INNER JOIN TE ON TE.E1 = Q.N
WHERE TA.A1 = @P1
      AND TE.E2 = @P3

Average execution time of above query in my PC: 22 seconds
Total logical read:
Workfile: 0
TD: 19416
TE: 22226
TC: 17844
TB: 14732
TA: 15
#P: 1
#Q: 1
= 74235


We can notice that logical read has decreased almost 46 times than original query. Here we are gaining huge performance. Hence, a skilfully simplification of a complex query may achieve huge performance. Apart from this, an appropriate index on intermediate temporary table can get extra performance benefit. 

15 comments:

PMP Certification said...


Great post i must say and thanks for the information. Education is definitely a sticky subject. However, is still among the leading topics of our time. I appreciate your post and look forward to more.

PMP Certification

PMP Certification in Malaysia

tejaswini said...

I have been searching to find a comfort or effective procedure to complete this process and I think this is the most suitable way to do it effectively.
data science course in malaysia
data science certification
data science course
data science bootcamp malaysia

reshma said...

thanks for sharing nice information...
DevOps Training in Hyderabad

PMP Certification said...

Thanks for sharing this information. I really like your blog post very much. You have really shared a informative and interesting blog post with people..

PMP Certification
PMP Certification in Malaysia
360DigiTMG PMP Course in Malaysia

tejaswini said...

Really impressive post. I read it whole and going to share it with my social circules. I enjoyed your article and planning to rewrite it on my own blog.
data science course

nakshatra said...

I am impressed by the information that you have on this blog. It shows how well you understand this subject.
data analytics course
big data analytics malaysia
big data course

PMP Certification said...

Nice information, valuable and excellent design, as share good stuff with good ideas and concepts, lots of great information and inspiration, both of which I need, thanks to offer such a helpful information here.
360DigiTMG PMP Certification
360DigiTMG PMP Certification in Malaysia
360DigiTMG PMP Course
360DigiTMG PMP Course in Malaysia

dataanalyticscourse said...

I am impressed by the information that you have on this blog. It shows how well you understand this subject.
data analytics course
big data analytics malaysia
big data course

tejaswini said...

A good blog always comes-up with new and exciting information and while reading I have feel that this blog is really have all those quality that qualify a blog to be a one
data science course

PMP Certification said...

It's late finding this act. At least, it's a thing to be familiar with that there are such events exist. I agree with your Blog and I will be back to inspect it more in the future so please keep up your act.
PMP Certification
PMP Course
PMP Course in Malaysia
PMP training in Malaysia
PMP Training

dataanalyticscourse said...

Awesome blog. I enjoyed reading your articles. This is truly a great read for me. I have bookmarked it and I am looking forward to reading new articles. Keep up the good work!
data analytics course
big data analytics malaysia
big data course

tejaswini said...

I would like to thank you for the efforts you have made in writing this article. I am hoping the same best work from you in the future as well. In fact your creative writing abilities has inspired me to start my own Blog Engine blog now. Really the blogging is spreading its wings rapidly. Your write up is a fine example of it.
data science course

PMP Certification said...

You actually make it look so easy with your performance but I find this matter to be actually something which I think I would never comprehend. It seems too complicated and extremely broad for me. I'm looking forward for your next post, I’ll try to get the hang of it!
360DigiTMG PMP Certification
360DigiTMG PMP Course
360DigiTMG PMP Course in Malaysia
360DigiTMG PMP Training in Malaysia
360DigiTMG PMP Training

tejaswini said...

I’m excited to uncover this page. I need to to thank you for ones time for this particularly fantastic read !! I definitely really liked every part of it and i also have you saved to fav to look at new information in your site.data science course

dataanalyticscourse said...

Awesome blog. I enjoyed reading your articles. This is truly a great read for me. I have bookmarked it and I am looking forward to reading new articles. Keep up the good work!
data analytics course
big data analytics malaysia
big data course