Jul 5, 2016

Subquery vs inner join which one is better or faster performance in sql server

When we should use inner join?
When we should use a subquery?
Which one will faster in SQL server according to query performance?

These are common questions when we need to get data using more than one table. There is not a straightforward answer to these questions. Here we will discuss many aspects of this.

1. Use joins when we need to get data from both the tables in a SELECT statement.
2. Use subquery when we need to get data from only one table and another table is used only to check existence. For example:
Let's assume we have two tables: tblEmp and tblEmpDetail with the following schema:

CREATE TABLE tblEmp(
    ntEmpID BIGINT PRIMARY KEY IDENTITY,
    vcName VARCHAR(50),
    ntAge INT
)

CREATE TABLE tblEmpDetail(
    ntEmpDetailID BIGINT PRIMARY KEY IDENTITY,
    ntEmpID BIGINT REFERENCES tblEmp,
    moSalary MONEY,
    dtDOJ DATETIME
)

--Inserting some recodes into it
INSERT tblEmp
SELECT
    LEFT([Text],20),
    severity 
FROM Sys.messages

INSERT tblEmpDetail
SELECT
    ntEmpID,
    2000 * (ntEmpID % 10),
    GETDATE() - (ntEmpID % 30) 
FROM tblEmp

Inner join: By using inner join we can select fields of both the tables. For example:

SELECT
    E.ntEmpID,
    E.vcName,
    E.vcName,
    ED.moSalary,
    ED.dtDOJ
FROM tblEmp E INNER JOIN tblEmpDetail ED
ON E.ntEmpID = ED.ntEmpID

Subquery: we can get the data from the only table. For example:

SELECT
    E.ntEmpID,
    E.vcName,
    E.vcName
FROM tblEmp E
WHERE ntEmpID IN (SELECT ntEmpID
    FROM tblEmpDetail)

Here we cannot include fields of table tblEmpDetail in a select list. So if we need to get the data from fields of both tables join is the only solution. It is not possible by the subquery.

When we need to get the data from only one table by checking the existence in another table then it is possible by both joins as well as a subquery. In this case which one will perform better inner join or subquery?

Here we have written the same query in four different ways:

1.
--Using inner join
SELECT E.*
FROM tblEmp E INNER JOIN tblEmpDetail ED
ON E.ntEmpID = ED.ntEmpID

2.
--Using IN clause
SELECT *
FROM tblEmp
WHERE ntEmpID IN (SELECT ntEmpID
    FROM tblEmpDetail

3.
--Using EXISTS clause
SELECT *
FROM tblEmp E
WHERE EXISTS(SELECT *
    FROM tblEmpDetail
    WHERE ntEmpID = E.ntEmpID)

4. 
--Using correlated subquery
SELECT *
FROM tblEmp E
WHERE ntEmpID = (SELECT ntEmpID
    FROM tblEmpDetail
    WHERE ntEmpID = E.ntEmpID)

Let's compare the actual execution plan:



Query 1, 2 and 3 have same execution cost while query 4 is costliest. Also, query 2 and 3 have an exactly same execution plan. Let me explain it one by one.

Why query 4 is costliest compare to others?

If we will observe its execution plan we will find it has index spool operator which cost is 95% and has an assert operator. It is due to equality arithmetic operator that is:

ntEmpID = (SELECT ntEmpID
    FROM tblEmpDetail
    WHERE ntEmpID = E.ntEmpID)

SQL server doesn't know how many rows will be returned by this subquery. If it is more than one then it will throw an error: Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <=, >, >= or when the subquery is used as an expression.

So, to validate this, SQL server has followed following extra steps:

1. Index spool: It creates a copy of a table in tempdb which satisfy WHERE ntEmpID = E.ntEmpID and create an index on it.
2. Stream Aggregate: It counts the total records in tblEmpDetail for each ntEmpID something like this [Expr1006] = Count (*)
3. Assert: It checks if record count is one then it is fine otherwise throw error message that is CASE WHEN [Expr1006]>(1) THEN (0) ELSE NULL END

Note: Even top one clause will not help us. For example:

ntEmpID = (SELECT TOP(1) ntEmpID
    FROM tblEmpDetail
    WHERE ntEmpID = E.ntEmpID)

Conclusion: Never write subquery with equal to an operator for this scenario.

Solution: Create unique index on the field tblEmpDetail.ntEmpID. For example:

CREATE UNIQUE NONCLUSTERED INDEX NCI_ntEmpID
ON tblEmpDetail(ntEmpID)

Now if you will check the execution plan, you will find all four queries have equal execution cost (25%).


Suppose there is not any unique index on the field tblEmpDetail.ntEmpID so I am dropping it:

DROP INDEX NCI_ntEmpID ON tblEmpDetail

Now compare the execution plan of query 1, 2 and 3:

Query 1: It uses Hash Match (Inner Join)
Query 2 and 3:  It uses Hash Match (Right Semi Join)

Operator Inner Join can get data from both tables while operator Right Semi Join can get the data from an only right table.

Conclusion: Inner join has more flexibility than a subquery. It can select the data from both tables as well as only from one table with same query cost just like subquery. For example:

--Inner Join 1
SELECT *
FROM tblEmp E INNER JOIN tblEmpDetail ED
ON E.ntEmpID = ED.ntEmpID

--Inner Join 2
SELECT E.*
FROM tblEmp E INNER JOIN tblEmpDetail ED
ON E.ntEmpID = ED.ntEmpID

Actual execution plan:


Definitely Inner join 1 will more costly than inner Join 2 since it is selecting more numbers of columns.

Is there any other difference between inner join and subquery except inner join get data from both tables while not by subquery?

We are going to check an inserting observation. Comparing the execution plan of the following query:

--Using inner join
SELECT TOP(10) E.*
FROM tblEmp E INNER JOIN tblEmpDetail ED
ON E.ntEmpID = ED.ntEmpID

--Using IN clause
SELECT TOP(10) *
FROM tblEmp
WHERE ntEmpID IN (SELECT ntEmpID
    FROM tblEmpDetail)

Actual execution plan:


So, in this case, an inner join is performing better than a subqueryNow we are inserting few duplicates ntEmpID into the tblEmpDetail:

INSERT tblEmpDetail VALUES
    (1,5000,GETDATE()),
    (2,7000,GETDATE())

Comparing the execution plan of the same query:

--Using inner join
SELECT TOP(10) E.*
FROM tblEmp E INNER JOIN tblEmpDetail ED
ON E.ntEmpID = ED.ntEmpID

--Using IN clause
SELECT TOP(10) *
FROM tblEmp
WHERE ntEmpID IN (SELECT ntEmpID
    FROM tblEmpDetail)


Now in this case subquery is performing better than inner join.

Why there are differences?  Sometimes an inner join performs better and sometime subquery performs better. To understand this we are going to create two tables with fewer records and checking the output:

CREATE TABLE tblStu(
    ntStuID BIGINT,
    vcName VARCHAR(50),
    ntAge INT
)

CREATE TABLE tblStuDetail(
    ntStuDetailID BIGINT,
    ntStuID BIGINT,
    moSalary MONEY
)

Now we are inserting three records in both the tables:

INSERT tblStu VALUES
    (1,'Scott', 18),
    (2,'Greg', 21),
    (3,'Davis', 22)
   
INSERT tblStuDetail VALUES
    (1, 1, 5000),
    (2, 2, 8000),
    (3, 3, 6000)

Now, both the table has one to one relationship for ntStuID. That is one student has one detail and one details belong to one student.

Note: But SQL server doesn't know both tables has one to one relationship. To tell SQL server, we have to create a unique constraint in ntStuID of both tables. We can do it by making it unique key, Primary key or by creating a unique index.

Let's check output of following queries:

--Inner join
 SELECT S.* FROM tblStu S INNER JOIN tblStuDetail SD
 ON S.ntStuID = SD.ntStuID

 --Subquery
 SELECT * FROM tblStu S
 WHERE S.ntStuID IN(SELECT SD.ntStuID
 FROM tblStuDetail SD WHERE S.ntStuID = SD.ntStuID)

Inner Join:
ntStuID
vcName
ntAge
1
Scott
18
2
Greg
21
3
Davis
22

Sub query:
ntStuID
vcName
ntAge
1
Scott
18
2
Greg
21
3
Davis
22

We will get an exactly same output of both the query.

Now we are making the one to many or many to one relation between the tables. So we are going to insert few duplicate records into the table tblStuDetail:

INSERT tblStuDetail VALUES(4,2,5000),(5,2,8000)

Note: Again SQL server doesn't know both tables has one to many relationships or not. To tell SQL server, we have to create a unique constraint in ntStuID in only one table. We can do it by making it unique key, Primary key or by creating a unique index. 

Let's check output of same queries:

--Inner join
 SELECT S.* FROM tblStu S INNER JOIN tblStuDetail SD
 ON S.ntStuID = SD.ntStuID

 --Subquery
 SELECT * FROM tblStu S
 WHERE S.ntStuID IN(SELECT SD.ntStuID
 FROM tblStuDetail SD WHERE S.ntStuID = SD.ntStuID)

Inner Join:
1
Scott
18
2
Greg
21
3
Davis
22
2
Greg
21
2
Greg
21
1
Scott
18

Sub query:
ntStuID
vcName
ntAge
1
Scott
18
2
Greg
21
3
Davis
22

Now inner join is returning more records than a subquery. Both queries have different output!! How can we compare the performance if both queries return different output? They are two different queries.

Last, now are going to make it many to many relationships between them. For this we are also going to insert few duplicate records into the table tblStu:

INSERT tblStu VALUES
    (1,'Alan',19),
    (2,'Sam',21)

Note: If two tables have not any primary key, unique key or unique index on column SD.ntStuID, SQL server assume both tables have many to many relations. 

Let's again check output of same queries:

--Inner join
 SELECT S.* FROM tblStu S INNER JOIN tblStuDetail SD
 ON S.ntStuID = SD.ntStuID

 --Subquery
 SELECT * FROM tblStu S
 WHERE S.ntStuID IN(SELECT SD.ntStuID
 FROM tblStuDetail SD WHERE S.ntStuID = SD.ntStuID)

Inner Join:
ntStuID
vcName
ntAge
1
Scott
18
2
Greg
21
2
Greg
21
2
Greg
21
3
Davis
22
1
Alan
19
2
Sam
21
2
Sam
21
2
Sam
21

Sub query:
ntStuID
vcName
ntAge
1
Scott
18
2
Greg
21
3
Davis
22
1
Alan
19
2
Sam
21

Again, inner join returning more records than a subquery. Also subquery returning duplicate recodes. Both queries have different output. In this case, we cannot compare the performance between subquery and inner join since both queries have different output.

Question: If two tables have one to one relation for a key column and we want to get data from only one table in this case inner join will perform better or subquery?

Answer: As I know the performance of both the query will exactly same. Till now, I didn't find any differences if you please share with us.

Best practice:  Inner join vs. subquery

1. If we need the data from both the tables we must have to choose inner join. If we need data from only one table then we can choose either subquery or inner join.
2. If two tables have one to many, many to one or many to many, subquery and inner join may have different output. So choose according to your application requirement.
3. If two tables have one to one relationship then you can choose either subquery or inner join since query optimizer will follow same execution plan for both of them. (Thanks to SQL server query optimizer)
4. Following subqueries are equivalent from query performance point of view:

a.
SELECT * FROM tblStu S
WHERE S.ntStuID IN(SELECT SD.ntStuID
FROM tblStuDetail SD WHERE S.ntStuID = SD.ntStuID)

b.
SELECT * FROM tblStu S
WHERE S.ntStuID IN(SELECT SD.ntStuID
FROM tblStuDetail SD)

c.
SELECT * FROM tblStu S
WHERE EXISTS(SELECT *
FROM tblStuDetail SD WHERE S.ntStuID = SD.ntStuID)

An SQL server query optimizer is smart enough that it will choose same execution for all.

5. Don't use subquery with equality operator unless there is not one to one relation between two tables. For example:

SELECT * FROM tblStu S
WHERE S.ntStuID = (SELECT SD.ntStuID
FROM tblStuDetail SD
WHERE S.ntStuID = SD.ntStuID)

Otherwise, it will decrease the performance or cause of an error.

6. If there is an aggregate function in subquery then there is no direct equivalent query using an inner join. For example:

SELECT * FROM tblStu S
WHERE S.ntStuID = (SELECT MAX(SD.ntStuID)
FROM tblStuDetail SD)

So you must have to use a subquery.

7. If a query is either inner join or subquery, internally is handled by any of these operators:

a. Nested loop join
b. Merge join
c. Hash join

So, technically there is no difference between inner join and subquery.

8. Consider on two inner join syntax:

--New way
 SELECT S.* FROM tblStu S
 INNER JOIN tblStuDetail SD
 ON S.ntStuID = SD.ntStuID

--Old way
SELECT S.*
FROM tblStu S ,tblStuDetail SD
WHERE  S.ntStuID = SD.ntStuID

There are not any differences except syntax. I will prefer using ON clause to write join condition instead of WHERE clause. If we will miss the ON clause:

SELECT S.*
FROM tblStu S
INNER JOIN tblStuDetail SD

We will get an error message: Incorrect syntax near 'SD'.
And we can correct it. What will happen if we miss the WHERE clause:

SELECT S.*
FROM tblStu S ,tblStuDetail SD

We get wrong output without any warning. The worst thing is if one or both table has too many records this query will be too costly and locked the both table for a long period of time. So I suggest don't write join query in the old way.

Write SQL queries in set-based approach SQL server
SQL server best practices for stored procedure
What is Adhoc query in SQL server: Cached Query Plan
Indexes best practices in SQL server: Where to start creating indexes?
SQL server query optimization tips: Tuning best practices with examples

48 comments:

Anonymous said...

Great Explanation !!

Anonymous said...

wonderful! detail explanation on inner join vs correlated subquery with clear examples, many thanks
-srihari konakanchi

Unknown said...

best explanation i got online for inner join vs subquery

Unknown said...

thanks for your best explanation.

Anonymous said...

Excellent teaching!!!!!!!!!

Vani said...

I have carried out the example you have explained above.
When both the tables have 3 records only

SELECT S.* FROM tblStu S INNER JOIN tblStuDetail SD
ON S.ntStuID = SD.ntStuID

SELECT * FROM tblStu S
WHERE S.ntStuID IN(SELECT SD.ntStuID
FROM tblStuDetail SD WHERE S.ntStuID = SD.ntStuID)

My inner join takes 77% while IN takes 23% .Also I could see Inner join uses Hash Match while IN query uses nested Loop.
I am unable to understand this .Can you please explain.

Satish With U said...

Very nice explanation and thank you so much for your time to explain !

Unknown said...

Nice teacv thankx you

Unknown said...

Very well explained.. thank you...

Anu said...

Thanks for sharing this, I actually appreciate you taking the time to share with everybody.
Data Science Course In Hyderabad With Placements

EXCELR said...

Thank you for your post, I look for such article along time, today i find it finally. this post give me lots of advise it is very useful for me !data science training in Hyderabad

Excelr Tuhin said...

Thank you so much for shearing this type of post.
This is very much helpful for me. Keep up for this type of good post.
please visit us below
data science training in Hyderabad

data scientist course said...

This is my first time I visit here. I found so many engaging stuff in your blog, particularly its conversation. From the huge loads of remarks on your articles, I surmise I am by all account not the only one having all the recreation here! Keep doing awesome. I have been importance to compose something like this on my site and you have given me a thought.
data scientist training and placement

Pallavi reddy said...

i am glad to discover this page : i have to thank you for the time i spent on this especially great reading !! i really liked each part and also bookmarked you for new information on your site.
data science training in bangalore

Data Science Course in Bhilai - 360DigiTMG said...

Truly mind blowing blog went amazed with the subject they have developed the content. These kind of posts really helpful to gain the knowledge of unknown things which surely triggers to motivate and learn the new innovative contents. Hope you deliver the similar successive contents forthcoming as well.

data science in bangalore

Maneesha said...

I'm hoping you keep writing like this. I love how careful and in depth you go on this topic. Keep up the great work
data scientist course in hyderabad

princika said...

I want to leave a little comment to support and wish you the best of luck.we wish you the best of luck in all your blogging enedevors.
data science training in chennai

data scientist course said...

I think this is an informative post and it is very useful and knowledgeable. Therefore, I would like to thank you for the efforts you have made in writing this article.
data scientist training in hyderabad

Mallela said...

Thanks for posting the best information and the blog is very important.digital marketing institute in hyderabad

Priya Rathod said...

I see the best substance on your blog and I unbelievably love getting them.
DevOps Training in Hyderabad
DevOps Course in Hyderabad

Mallela said...

Thanks for posting the best information and the blog is very important.artificial intelligence course in hyderabad

data scientist course said...

I would also motivate just about every person to save this web page for any favorite assistance to assist posted the appearance.
data scientist training and placement

Mallela said...

Thanks for posting the best information and the blog is very important.data science institutes in hyderabad

Data Science Course in Bhilai - 360DigiTMG said...

Thanks for bringing such innovative content which truly attracts the readers towards you. Certainly, your blog competes with your co-bloggers to come up with the newly updated info. Finally, kudos to you.

Data Science Course in Varanasi

Edison hope said...

Great post I would like to thank you for the effort you put into writing this interesting and informative article, no matter what is the purpose of your trip to Turkey, you need to pay the visa cost Turkey before submitting the visa application .Depending on your desired option, the e Visa Turkey cost are calculated.

Mallela said...

Thanks for posting the best information and the blog is very good.Cloud Computing course in Bangalore

Emarketedu said...

Thanks for your marvelous posting! I really enjoyed reading it, you could be a great author. I will be sure to bookmark your blog and definitely will come back in the foreseeable future. I want to encourage you to continue your great posts, have a nice afternoon! Online Digital Marketing Course with Placement

Digital Chandu said...

First You got a great blog .I will be interested in more similar topics.I commend you for your excellent report on the knowledge that you have shared in this blog.

digital marketing training in hyderabad
free digital marketing course in hyderabad

Admissiongyan said...

Excellent goods from you, man. I have understand your stuff previous to and you are simply extremely fantastic. I actually like what you’ve obtained right here, really like what you are saying and the way in which by which you are saying it. You are making it enjoyable and you still care to stay sensible. I can’t wait to read much more from you. This is really a wonderful site. Electronics and Communication Engineering Scope

Allen smith said...

It's such meaningful content. So nice words you are saying . I feel so glad to read it. I suggest that people read it once. It's really helpful. And also check out how to apply for an emergency business visa India for the foreigners who want to visit urgent in India for the purpose of business meetings, startup, seminar, sale/purchase etc.

Ramesh Sampangi said...

Nice blog. Informative and knowledgeable content. Big thumbs up for this blog. I really enjoyed this blog. Thank you for sharing with us.
Data Science Course Training in Hyderabad
Data Science Course Training Institute in Hyderabad

Alexander Jacob said...

This is a wonderful inspiring article. I am practically satisfied with your great work. You have really put together extremely helpful data. Keep it up... foreign citizens can apply for their kenya e visa in a simple, secure, and convenient way.

Robert smith said...

I can't believe I've never been to this site before, but after going through some of the posts I realized it's new to me. Anyhow, I'm glad I found it and I'll be bookmarking and checking back often. If you wish to travel to India, you can apply for an India visa application via Online Evisa India portal.

anonymous said...

I want to leave a little comment to support and wish you the best of luck.we wish you the best of luck in all your blogging enedevors
data science training in trivandrum

EaglePie said...

Thanks for sharing this valuable information, we also provide instagram video download and,
really appreciate your hard work and good research that help us in for of your good valuable article. Keep updating us with your awesome content.

Unknown said...

I'm genuinely getting a charge out of scrutinizing your richly formed articles. Apparently you consume a huge load of energy and time on your blog. I have bookmarked it and I am expecting scrutinizing new articles. Continue to do amazing.data science course in ghaziabad

Career Program and Skill Development said...

Are you looking for a Data Science certification course to start your career in Data Science then 360DigiTMG is all you need. With experienced professional trainers and real-time projects, you can improve your skillset and move ahead in your profession. Why wait to enroll with us now.

Data Science Training in Jodhpur

Professional Academic Institute said...

Our Data Science course timings are properly applicable for both working professionals & as well as job seekers. With intense data & expertise in Data Science domain, you can apply for the excessive paying job roles like Business Analyst, Data Engineer/Data Architect, Machine Learning Engineer, Big Data Engineer

Data Science Course in Bangalore

Career Academic institute said...

Our Data Science course timings are properly applicable for both working professionals & as well as job seekers. With intense data & expertise in Data Science domain, you can apply for the excessive paying job roles like Business Analyst, Data Engineer/Data Architect, Machine Learning Engineer, Big Data Engineer


Data Science in Bangalore with Placement

Data Science said...

Good to become visiting your weblog again, it has been months for me. Nicely this article that i've been waited for so long.
I will need this post to total my assignment in the college, and it has exact same topic together with your write-up. Thanks, good share.
I am impressed by the information that you have on this blog. It shows how well you understand this subject.
data scientist training in hyderabad

Data Science said...

Its as if you had a great grasp on the subject matter, but you forgot to include your readers.
Perhaps you should think about this from more than one angle.
data science institutes in hyderabad

Data Science said...

You completed certain reliable points there.
I did a search on the subject and found nearly all persons will agree with your blog.

360DigiTMG data scientist course

Anonymous said...

You have amazing writing skills and you display them in every article. Keep it going!
Choose our Online Course Visit cyber security Training

Anonymous said...

Thanks for sharing informative content with us. Keep it going!

Cyber Security Training for Beginners

codeme hub said...

informative article with a nice topic

Data science course in Kerala

Shilpa said...

This blog was informative and worth reading. Keep up the good work.

Mobile app development company in Hyderabad

Preethi said...

An excellent topic that will be useful to others. I enjoyed reading.

ERP software companies in Hyderabad

360DigiTMG said...

Data science is the trendiest topic, and this article deals with all the information about how you can get started in this field without any confusion. At this site, you will get a huge collection of information about the right kind of courses from 360DigiTMG that will turn you from a fresher to an experienced and skilled data science professional. 360DigiTMG data science courses will provide you with hands on experience with visualization tools, algorithms, and other programming languages that will help you find hidden patterns from comprehensive data and derive meaningful conclusions for companies and businesses. This blog helped me understand why data science training should be the priority if I want to make a successful career. I hope more readers find the guidance that they need from this article and solve their doubts and confusion.
best data science course institute in hyderabad