Sep 13, 2015

User defined scalar value function and query optimization Sql Server

Use of scalar function is good or bad performance analysis

Sometimes we encounter a situation where there is a little complex business logic where we need to create user defined scalar value functions. It is due to re-usability or due to simplification of main SQL statement.

Here only potential issue is execution plan of the query. SQL server query optimizer generates separate plan for User defined scalar function which is revoked by main or base execution plan. Let me explain it by a very good example:

--Creating sample table
CREATE TABLE tblMessage(
     MessageID BIGINT PRIMARY KEY IDENTITY,
     LanguageID INT,
     MessageType TINYINT
)

--Populating records into it
INSERT INTO tblMessage
SELECT language_id,severity FROM sys.messages

GO

--Creating scalar function to get count
CREATE FUNCTION dbo.udfMsgTypeCount(
     @MessageType TINYINT
)
RETURNS INT
AS
BEGIN
     DECLARE @MsgTypeCount AS INT

     SELECT @MsgTypeCount = COUNT(*) FROM tblMessage WHERE MessageType = @MessageType

     RETURN @MsgTypeCount
END

Now we are going to execute two similar queries, one using scalar user defined function and another using equivalent sub query of scalar function:

--Approach 1: Using scalar UDF:
SELECT TOP(1000)
     MessageID,
     LanguageID,
     dbo.udfMsgTypeCount(MessageType) AS Typecount
FROM tblMessage
ORDER BY MessageID

--Approach 2: Using subquery
SELECT TOP(1000)
     MessageID,
     LanguageID,
     (SELECT COUNT(*) FROM tblMessage WHERE MessageType = M.MessageType) AS Typecount
FROM tblMessage M
ORDER BY MessageID

Comparing execution plan:


According to execution plan first approach (using scalar value function) is better than approach two (Sub query) since cost of first approach is only 0% while second approach is 100%.

Comparing execution time:

Approach 1:
CPU time = 87313 ms, elapsed time = 87522 ms.

Approach 2:
CPU time = 312 ms, elapsed time = 588 ms.

According to execution time first approach is bad since it took around 1.4 minutes while second approach took only 0.6 seconds.
  
Question 1: Why execution plan is wrong?

Here execution plan is not wrong. It is limitation of execution plan. SQL Server execution plan doesn’t include the execution plan of invoking user defined scalar value functions.  If we will observe the execution plan of first approach it has only compute scalar physical operator (Inside green rectangle) for invoking UDF.

There is separate execution plan of UDF: We can get it by script:

SELECT query_plan
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE objectid =  OBJECT_ID('dbo.udfMsgTypeCount')

Above plan of scalar UDF was not visible in base query execution plan and total cost has not been evaluated correctly. Hence it looks like execution plan of first approach was wrong. But in reality second approach is much better than first approach.

Questions:  I agree with the limitation that execution plan is unable to integrate the execution plan of invoking scalar UDF. But why first approach took 1.4 minutes while second approach took only 0.5 second? Why there is huge performance difference?

In case of scalar UDF execution plan of base query and invoking scalar UDF are generated independently. Hence sometimes overall execution plan may not efficient one compare to the combined execution plan which has been generated for whole script (second approach). Current demonstration is one of the good examples for this.

If we will analyze text actual execution plan (SET STATISTICS PROFILE ON ), in second approach there is a table spool which keeps only 16 records which is not in first approach that is why second approach executes faster:

Using scalar UDF:


Using subquery:


Hence in this situation we should avoid using scalar user defined function. Apart from this scalar value function is also not good for query parallelism.

No comments:

Post a Comment