Sep 17, 2015

Unwanted MANY TO MANY join execution plan SQL Server

Sometimes we need MANY TO MANY join in SQL Server. Sometime we don’t but event that sometimes SQL Server query optimizer chooses MANY TO MANY join. That was not our actual need. Let me explain it by an example:

Let’s assume we have two tables named UserPhoto and tblUser:

CREATE TABLE [dbo].[UserPhoto] (
       [UserPhotoID] [bigint] PRIMARY KEY IDENTITY(1, 1) NOT NULL
       ,[PhotoID] [int] NULL
       ,[PhotoPath] [varchar](100) NULL
)

CREATE TABLE [dbo].[tblUser] (
       [UserID] [bigint] PRIMARY KEY IDENTITY(1, 1) NOT NULL
       ,[UserName] [varchar](100) NULL
       ,[PhotoID] [int] NULL
)

Creating non-clustered indexes to improve performance:

CREATE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[tblUser] ([PhotoID])
INCLUDE (
       [UserID]
       ,[UserName]
)

CREATE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[UserPhoto] ([PhotoID])
INCLUDE ([PhotoPath])
  
Let’s assume a business constrains is an user can have only one photo. So we are populating data in both tables so that an cannot have more than one photoid:

INSERT INTO [UserPhoto] ([PhotoID],[PhotoPath])
SELECT CONCAT (language_id,message_id),CONCAT ('C:\Image\',message_id,'.png'              )
FROM sys.messages

INSERT INTO [tblUser] (UserName,PhotoID)
SELECT CONCAT ('User',PhotoID) ,PhotoID
FROM [UserPhoto]

Now we are going to execute below script which join above two tables on the basis of PhotoID:

SET STATISTICS IO ON

SELECT TOP (100000) U.UserID
       ,U.UserName
       ,UP.PhotoPath
FROM tblUser U
INNER JOIN UserPhoto UP ON U.PhotoID = UP.PhotoID

Let’s analyze its execution plan:


If we will notice on merge join we will get MANY TO MANY is true. While our business constraint says an user cannot have more than one photo.  Then while SQL Server query optimizer chosen MANY TO MANY join which is costlier?     

Now we are going to check total logical reads:

Table 'UserPhoto'. Scan count 5, logical reads 884, Total logical read = 5 * 884 = 4420

Table 'tblUser'. Scan count 5, logical reads 772, Total logical read = 5 * 772 = 3860

Reason behind this is SQL Server query optimizer doesn’t know about our business constraint “An User can have one and only one photo”. That is why it has generated bad execution plan that is MANY TO MANY. Hence it is our task to tell SQL server query optimizer about our business constraints. In this case we can say about our business constraint (An User can have one and only one photo) by creating below two UNIQUE NONCLUSTERED indexes:

CREATE UNIQUE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[tblUser] ([PhotoID])
INCLUDE ([UserID],[UserName])
WITH (DROP_EXISTING = ON)

CREATE UNIQUE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[UserPhoto] ([PhotoID])
INCLUDE ([PhotoPath])
WITH (DROP_EXISTING = ON)

Now if we will execute again same joining query:

SET STATISTICS IO ON

SELECT TOP (100000)
       U.UserID
       ,U.UserName
       ,UP.PhotoPath
FROM tblUser U
INNER JOIN UserPhoto UP ON U.PhotoID = UP.PhotoID

Let’s check execution plan:


Now MANY to MANY is false!!!

And total logical read:

Table 'UserPhoto'. Scan count 1, logical reads 487, Total Logical Read = 1 * 487 = 487
Table 'tblUser'. Scan count 1, logical reads 424, Total Logical Read = 1 * 424 = 424

Now this time total logical read is much lesser than previous one for same query. It was example of unwanted MANY TO MNAY join. This will unnecessarily degrade query the performance.

No comments:

Post a Comment