Sometimes we need MANY TO
MANY join in SQL Server. Sometime we don’t but event that sometimes SQL Server
query optimizer chooses MANY TO MANY join. That was not our actual need. Let me
explain it by an example:
Let’s assume we have two
tables named UserPhoto and tblUser:
CREATE TABLE [dbo].[UserPhoto] (
[UserPhotoID]
[bigint] PRIMARY KEY
IDENTITY(1, 1) NOT NULL
,[PhotoID] [int] NULL
,[PhotoPath] [varchar](100) NULL
)
CREATE TABLE [dbo].[tblUser] (
[UserID]
[bigint] PRIMARY KEY
IDENTITY(1, 1) NOT NULL
,[UserName] [varchar](100) NULL
,[PhotoID] [int] NULL
)
Creating non-clustered
indexes to improve performance:
CREATE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[tblUser] ([PhotoID])
INCLUDE (
[UserID]
,[UserName]
)
CREATE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[UserPhoto] ([PhotoID])
INCLUDE ([PhotoPath])
Let’s assume a business
constrains is an user can have only one photo. So we are populating data in
both tables so that an cannot have more than one photoid:
INSERT INTO [UserPhoto] ([PhotoID],[PhotoPath])
SELECT CONCAT (language_id,message_id),CONCAT ('C:\Image\',message_id,'.png' )
FROM sys.messages
INSERT INTO [tblUser] (UserName,PhotoID)
SELECT CONCAT ('User',PhotoID) ,PhotoID
FROM [UserPhoto]
Now we are going to execute
below script which join above two tables on the basis of PhotoID:
SET STATISTICS IO ON
SELECT TOP (100000) U.UserID
,U.UserName
,UP.PhotoPath
FROM tblUser U
INNER JOIN UserPhoto UP ON U.PhotoID = UP.PhotoID
Let’s analyze its execution
plan:
If we will notice on merge
join we will get MANY TO MANY is true. While our business constraint says an
user cannot have more than one photo.
Then while SQL Server query optimizer chosen MANY TO MANY join which is
costlier?
Now we are going to check
total logical reads:
Table 'UserPhoto'. Scan count
5, logical reads 884, Total logical read = 5 * 884 = 4420
Table 'tblUser'. Scan count
5, logical reads 772, Total logical read = 5 * 772 = 3860
Reason behind this is SQL
Server query optimizer doesn’t know about our business constraint “An User can
have one and only one photo”. That is why it has generated bad execution plan
that is MANY TO MANY. Hence it is our task to tell SQL server query optimizer
about our business constraints. In this case we can say about our business
constraint (An User can have one and only one photo) by creating below two
UNIQUE NONCLUSTERED indexes:
CREATE UNIQUE NONCLUSTERED INDEX
NCI_PhotoID
ON [dbo].[tblUser] ([PhotoID])
INCLUDE ([UserID],[UserName])
WITH (DROP_EXISTING = ON)
CREATE UNIQUE NONCLUSTERED INDEX
NCI_PhotoID
ON [dbo].[UserPhoto] ([PhotoID])
INCLUDE ([PhotoPath])
WITH (DROP_EXISTING = ON)
Now if we will execute again
same joining query:
SET STATISTICS IO ON
SELECT TOP (100000)
U.UserID
,U.UserName
,UP.PhotoPath
FROM tblUser U
INNER JOIN UserPhoto UP ON U.PhotoID = UP.PhotoID
Let’s check execution plan:
Now MANY to MANY is false!!!
And total logical read:
Table 'UserPhoto'. Scan count 1, logical reads 487, Total Logical Read = 1 * 487 = 487
Table 'tblUser'. Scan count 1, logical reads 424, Total Logical Read = 1 * 424 = 424
Now this time total logical
read is much lesser than previous one for same query. It was example of
unwanted MANY TO MNAY join. This will unnecessarily degrade query the
performance.