Use of scalar function is good or bad performance analysis
Sometimes we encounter a situation where there is a little complex business logic where we need to create user defined scalar value functions. It is due to re-usability or due to simplification of main SQL statement.
Here only potential issue is execution plan of the query. SQL server query optimizer generates separate plan for User defined scalar function which is revoked by main or base execution plan. Let me explain it by a very good example:
--Creating sample table
CREATE TABLE tblMessage(
MessageID BIGINT PRIMARY KEY IDENTITY,
--Populating records into it
INSERT INTO tblMessage
SELECT language_id,severity FROM sys.messages
--Creating scalar function to get count
CREATE FUNCTION dbo.udfMsgTypeCount(
DECLARE @MsgTypeCount AS INT
SELECT @MsgTypeCount = COUNT(*) FROM tblMessage WHERE MessageType = @MessageType
Now we are going to execute two similar queries, one using scalar user defined function and another using equivalent sub query of scalar function:
--Approach 1: Using scalar UDF:
dbo.udfMsgTypeCount(MessageType) AS Typecount
ORDER BY MessageID
--Approach 2: Using subquery
(SELECT COUNT(*) FROM tblMessage WHERE MessageType = M.MessageType) AS Typecount
FROM tblMessage M
ORDER BY MessageID
Comparing execution plan:
According to execution plan first approach (using scalar value function) is better than approach two (Sub query) since cost of first approach is only 0% while second approach is 100%.
Comparing execution time:
CPU time = 87313 ms, elapsed time = 87522 ms.
CPU time = 312 ms, elapsed time = 588 ms.
According to execution time first approach is bad since it took around 1.4 minutes while second approach took only 0.6 seconds.
Question 1: Why execution plan is wrong?
Here execution plan is not wrong. It is limitation of execution plan. SQL Server execution plan doesn’t include the execution plan of invoking user defined scalar value functions. If we will observe the execution plan of first approach it has only compute scalar physical operator (Inside green rectangle) for invoking UDF.
There is separate execution plan of UDF: We can get it by script:
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE objectid = OBJECT_ID('dbo.udfMsgTypeCount')
Above plan of scalar UDF was not visible in base query execution plan and total cost has not been evaluated correctly. Hence it looks like execution plan of first approach was wrong. But in reality second approach is much better than first approach.
Questions: I agree with the limitation that execution plan is unable to integrate the execution plan of invoking scalar UDF. But why first approach took 1.4 minutes while second approach took only 0.5 second? Why there is huge performance difference?
In case of scalar UDF execution plan of base query and invoking scalar UDF are generated independently. Hence sometimes overall execution plan may not efficient one compare to the combined execution plan which has been generated for whole script (second approach). Current demonstration is one of the good examples for this.
If we will analyze text actual execution plan (SET STATISTICS PROFILE ON ), in second approach there is a table spool which keeps only 16 records which is not in first approach that is why second approach executes faster:
Using scalar UDF:
Hence in this situation we should avoid using scalar user defined function. Apart from this scalar value function is also not good for query parallelism.