Sep 17, 2015

Unwanted MANY TO MANY merge join in SQL Server execution plans

Sometimes we genuinely need a MANY TO MANY merge join in SQL Server. Sometimes we don't, yet the query optimizer still chooses a MANY TO MANY join even though our data doesn't require it. Let me explain with an example:

Let’s assume we have two tables named UserPhoto and tblUser:

CREATE TABLE [dbo].[UserPhoto] (
       [UserPhotoID] [bigint] PRIMARY KEY IDENTITY(1, 1) NOT NULL
       ,[PhotoID] [int] NULL
       ,[PhotoPath] [varchar](100) NULL
)

CREATE TABLE [dbo].[tblUser] (
       [UserID] [bigint] PRIMARY KEY IDENTITY(1, 1) NOT NULL
       ,[UserName] [varchar](100) NULL
       ,[PhotoID] [int] NULL
)

Creating non-clustered indexes to improve performance:

CREATE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[tblUser] ([PhotoID])
INCLUDE (
       [UserID]
       ,[UserName]
)

CREATE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[UserPhoto] ([PhotoID])
INCLUDE ([PhotoPath])
  
Let’s assume a business constraint: a user can have only one photo. So we populate both tables in such a way that no PhotoID appears more than once:

INSERT INTO [UserPhoto] ([PhotoID],[PhotoPath])
SELECT CONCAT(language_id, message_id), CONCAT('C:\Image\', message_id, '.png')
FROM sys.messages

INSERT INTO [tblUser] (UserName,PhotoID)
SELECT CONCAT ('User',PhotoID) ,PhotoID
FROM [UserPhoto]

Now let’s execute the script below, which joins the two tables on PhotoID:

SET STATISTICS IO ON

SELECT TOP (100000) U.UserID
       ,U.UserName
       ,UP.PhotoPath
FROM tblUser U
INNER JOIN UserPhoto UP ON U.PhotoID = UP.PhotoID

Let’s analyze its execution plan:


If we look at the merge join operator, we will notice that MANY TO MANY is true, even though our business constraint says a user cannot have more than one photo. So why has the SQL Server query optimizer chosen the costlier MANY TO MANY join?
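As a side note, we can also hunt for such plans in the plan cache without opening each one in SSMS. A minimal sketch, assuming the plan is still cached (the LIKE filter on the plan XML is a blunt but workable test for the ManyToMany attribute of the Merge operator):

--Find cached plans containing a many-to-many merge join
SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE '%ManyToMany="True"%'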

Now let’s check the logical reads:

Table 'UserPhoto'. Scan count 5, logical reads 884

Table 'tblUser'. Scan count 5, logical reads 772

(Note that the logical reads reported by STATISTICS IO are already totals across all scans, so no multiplication by scan count is needed.)

The reason is that the SQL Server query optimizer doesn’t know about our business constraint, “a user can have one and only one photo.” That is why it generated a bad, MANY TO MANY execution plan. Hence it is our job to tell the optimizer about the constraint, which in this case we can do by recreating the two indexes as UNIQUE NONCLUSTERED indexes:

CREATE UNIQUE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[tblUser] ([PhotoID])
INCLUDE ([UserID],[UserName])
WITH (DROP_EXISTING = ON)

CREATE UNIQUE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[UserPhoto] ([PhotoID])
INCLUDE ([PhotoPath])
WITH (DROP_EXISTING = ON)
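As a design note, if you prefer declared constraints over indexes, the same uniqueness guarantee could be expressed with a UNIQUE constraint instead. A sketch (the constraint name is hypothetical; a UNIQUE constraint also builds a unique index behind the scenes, and allows at most one NULL PhotoID):

--Equivalent declarative alternative to the unique index on tblUser
ALTER TABLE [dbo].[tblUser]
ADD CONSTRAINT UQ_tblUser_PhotoID UNIQUE ([PhotoID])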

Now let’s execute the same join query again:

SET STATISTICS IO ON

SELECT TOP (100000)
       U.UserID
       ,U.UserName
       ,UP.PhotoPath
FROM tblUser U
INNER JOIN UserPhoto UP ON U.PhotoID = UP.PhotoID

Let’s check execution plan:


Now MANY TO MANY is false!

And the logical reads:

Table 'UserPhoto'. Scan count 1, logical reads 487
Table 'tblUser'. Scan count 1, logical reads 424

This time the total logical reads are much lower for the same query. This was an example of an unwanted MANY TO MANY join, which unnecessarily degrades query performance.

Sep 13, 2015

User-defined scalar-valued functions and query optimization in SQL Server

Is using a scalar function good or bad? A performance analysis

Sometimes we encounter slightly complex business logic for which we create user-defined scalar-valued functions, either for re-usability or to simplify the main SQL statement.

The only potential issue here is the execution plan of the query. The SQL Server query optimizer generates a separate plan for a user-defined scalar function, which is invoked by the main (base) execution plan. Let me explain with an example:

--Creating sample table
CREATE TABLE tblMessage(
     MessageID BIGINT PRIMARY KEY IDENTITY,
     LanguageID INT,
     MessageType TINYINT
)

--Populating records into it
INSERT INTO tblMessage
SELECT language_id,severity FROM sys.messages

GO

--Creating scalar function to get count
CREATE FUNCTION dbo.udfMsgTypeCount(
     @MessageType TINYINT
)
RETURNS INT
AS
BEGIN
     DECLARE @MsgTypeCount AS INT

     SELECT @MsgTypeCount = COUNT(*) FROM tblMessage WHERE MessageType = @MessageType

     RETURN @MsgTypeCount
END

Now we will execute two similar queries: one using the scalar user-defined function, and one using an equivalent subquery:

--Approach 1: Using scalar UDF:
SELECT TOP(1000)
     MessageID,
     LanguageID,
     dbo.udfMsgTypeCount(MessageType) AS Typecount
FROM tblMessage
ORDER BY MessageID

--Approach 2: Using subquery
SELECT TOP(1000)
     MessageID,
     LanguageID,
     (SELECT COUNT(*) FROM tblMessage WHERE MessageType = M.MessageType) AS Typecount
FROM tblMessage M
ORDER BY MessageID

Comparing execution plan:


According to the execution plan, the first approach (scalar-valued function) looks better than the second (subquery), since the first approach shows a relative cost of only 0% while the second shows 100%.

Comparing execution time:

Approach 1:
CPU time = 87313 ms, elapsed time = 87522 ms.

Approach 2:
CPU time = 312 ms, elapsed time = 588 ms.

According to execution time, however, the first approach is bad: it took around 1.4 minutes, while the second approach took only 0.6 seconds.
  
Question 1: Why is the execution plan wrong?

The execution plan here is not wrong; this is a limitation of execution plans. A SQL Server execution plan does not include the plan of any invoked user-defined scalar-valued function. If we observe the plan of the first approach, it has only a Compute Scalar physical operator (inside the green rectangle) for invoking the UDF.

There is a separate execution plan for the UDF, which we can retrieve with this script:

SELECT query_plan
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE objectid =  OBJECT_ID('dbo.udfMsgTypeCount')

This plan for the scalar UDF was not visible in the base query’s execution plan, so the total cost was not evaluated correctly. That is why the plan for the first approach looked better, while in reality the second approach is much better than the first.

Question 2: I accept the limitation that the execution plan cannot integrate the plan of an invoked scalar UDF. But why did the first approach take 1.4 minutes while the second took only 0.6 seconds? Why is there such a huge performance difference?

With a scalar UDF, the plans for the base query and for the UDF invocation are generated independently. Hence the overall plan may be less efficient than the combined plan generated for the whole script (the second approach). The current demonstration is a good example of this.

If we analyze the text-mode actual execution plan (SET STATISTICS PROFILE ON), the second approach contains a table spool that holds only 16 records; this spool is absent in the first approach, which is why the second approach executes faster:
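For reference, the text-mode actual plan can be captured by wrapping the query like this (a sketch using the subquery version from above; remember to turn the option off afterwards, as it adds overhead to every statement):

SET STATISTICS PROFILE ON

SELECT TOP(1000)
     MessageID,
     LanguageID,
     (SELECT COUNT(*) FROM tblMessage WHERE MessageType = M.MessageType) AS Typecount
FROM tblMessage M
ORDER BY MessageID

SET STATISTICS PROFILE OFF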

Using scalar UDF:


Using subquery:


Hence in this situation we should avoid using a scalar user-defined function. Apart from this, scalar-valued functions also prevent query parallelism.
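One common workaround, sketched below with a hypothetical function name, is to rewrite the scalar UDF as an inline table-valued function; the optimizer can fold its definition into the calling query’s plan, much like the subquery in approach 2:

--Inline table-valued function equivalent of dbo.udfMsgTypeCount
CREATE FUNCTION dbo.itvfMsgTypeCount(
     @MessageType TINYINT
)
RETURNS TABLE
AS
RETURN (
     SELECT COUNT(*) AS TypeCount
     FROM tblMessage
     WHERE MessageType = @MessageType
)
GO

--Invoked with CROSS APPLY instead of a scalar call
SELECT TOP(1000)
     M.MessageID,
     M.LanguageID,
     C.TypeCount
FROM tblMessage M
CROSS APPLY dbo.itvfMsgTypeCount(M.MessageType) C
ORDER BY M.MessageID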

Sep 6, 2015

Use of triggers and SQL Server query performance: good or bad

Avoid using DML triggers in inappropriate places

I am not going to claim that triggers are bad, but there are many situations where using a trigger degrades performance. For example, sometimes a trigger references the same table and performs a task that could have been done by the statement that fired it. In that case there is no need for a trigger. Let me explain with an example:

Approach 1: Using trigger

a. Creating a table:
CREATE TABLE tblOrder(
     OrderID BIGINT PRIMARY KEY IDENTITY,
     OrderType INT,
     Qty INT,
     InsertDate DATETIME
)

b. Creating a trigger to update the insertion date:

CREATE TRIGGER Trg_Insert_Date ON tblOrder
AFTER INSERT
AS
BEGIN
    
    UPDATE  O
    SET     InsertDate = GETDATE()
    FROM    tblOrder O
    INNER JOIN inserted I
    ON     O.OrderID = I.OrderID

END

c. Populating records in the order table and enabling statistics:

SET STATISTICS IO ON

INSERT INTO tblOrder(OrderType,Qty)
SELECT message_id, severity FROM Sys.messages

Total logical read:

Table 'tblOrder'. Scan count 0, logical reads 645023
Table 'Worktable'. Scan count 1, logical reads 585437
Table 'tblOrder'. Scan count 1, logical reads 1036

Approach 2: Using Default value

a. Dropping trigger and creating equivalent default constraint on InsertDate column:

DROP TRIGGER Trg_Insert_Date

ALTER TABLE tblOrder
ADD CONSTRAINT [def_insert_dt]
DEFAULT (GETDATE())
FOR InsertDate

b. Removing all records from table:

TRUNCATE TABLE tblOrder

c. Populating records once again:

SET STATISTICS IO ON

INSERT INTO tblOrder(OrderType,Qty)
SELECT message_id, severity FROM Sys.messages

Table 'tblOrder'. Scan count 0, logical reads 645023
Table 'Worktable'. Scan count 1, logical reads 585437

Comparing the logical reads, the trigger causes 1,036 extra logical reads from tblOrder. Hence, in this particular example, using a trigger is the bad approach.
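As a closing note, the default could equally have been declared when the table was created, avoiding the trigger from the start (a sketch of the same table definition with an inline default constraint):

CREATE TABLE tblOrder(
     OrderID BIGINT PRIMARY KEY IDENTITY,
     OrderType INT,
     Qty INT,
     InsertDate DATETIME CONSTRAINT def_insert_dt DEFAULT (GETDATE())
)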