Use of scalar function is good or bad performance analysis
Sometimes we encounter a situation
where there is a little complex business logic where we need to create user
defined scalar value functions. It is due to re-usability or due to simplification
of main SQL statement.
Here only potential issue is
execution plan of the query. SQL server query optimizer generates separate plan
for User defined scalar function which is revoked by main or base execution
plan. Let me explain it by a very good example:
--Creating
sample table
CREATE TABLE
tblMessage(
MessageID BIGINT PRIMARY KEY IDENTITY,
LanguageID INT,
MessageType TINYINT
)
--Populating
records into it
INSERT INTO
tblMessage
SELECT language_id,severity FROM sys.messages
GO
--Creating
scalar function to get count
CREATE FUNCTION
dbo.udfMsgTypeCount(
@MessageType TINYINT
)
RETURNS INT
AS
BEGIN
DECLARE @MsgTypeCount AS INT
SELECT @MsgTypeCount = COUNT(*) FROM tblMessage WHERE MessageType =
@MessageType
RETURN @MsgTypeCount
END
Now we are going to execute two
similar queries, one using scalar user defined function and another using equivalent sub
query of scalar function:
--Approach 1:
Using scalar UDF:
SELECT TOP(1000)
MessageID,
LanguageID,
dbo.udfMsgTypeCount(MessageType) AS Typecount
FROM tblMessage
ORDER BY
MessageID
--Approach 2:
Using subquery
SELECT TOP(1000)
MessageID,
LanguageID,
(SELECT COUNT(*) FROM tblMessage WHERE MessageType = M.MessageType) AS Typecount
FROM tblMessage M
ORDER BY
MessageID
Comparing execution plan:
According to execution plan first approach
(using scalar value function) is better than approach two (Sub query) since
cost of first approach is only 0% while second approach is 100%.
Comparing
execution time:
Approach 1:
CPU time = 87313 ms, elapsed
time = 87522 ms.
Approach 2:
CPU time = 312 ms, elapsed
time = 588 ms.
According to execution time first
approach is bad since it took around 1.4 minutes while second approach took
only 0.6 seconds.
Question 1: Why execution
plan is wrong?
Here execution plan is not
wrong. It is limitation of execution plan. SQL Server execution plan doesn’t include
the execution plan of invoking user defined scalar value functions. If we will observe the execution plan of first
approach it has only compute scalar physical operator (Inside green rectangle)
for invoking UDF.
There is separate execution
plan of UDF: We can get it by script:
SELECT query_plan
FROM sys.dm_exec_cached_plans
CROSS APPLY
sys.dm_exec_query_plan(plan_handle)
WHERE objectid = OBJECT_ID('dbo.udfMsgTypeCount')
Above plan of scalar UDF was
not visible in base query execution plan and total cost has not been evaluated
correctly. Hence it looks like execution plan of first approach was wrong. But in reality second approach is much better than first approach.
Questions: I agree with the limitation that execution
plan is unable to integrate the execution plan of invoking scalar UDF. But why first
approach took 1.4 minutes while second approach took only 0.5 second? Why there
is huge performance difference?
In case of scalar UDF execution
plan of base query and invoking scalar UDF are generated independently. Hence
sometimes overall execution plan may not efficient one compare to the combined execution
plan which has been generated for whole script (second approach). Current demonstration
is one of the good examples for this.
If we will analyze text actual execution plan
(SET STATISTICS
PROFILE ON ), in second approach there
is a table spool which keeps only 16 records which is not in first approach that
is why second approach executes faster:
Using scalar UDF:
Using subquery:
Hence in this situation we
should avoid using scalar user defined function. Apart from this scalar value
function is also not good for query parallelism.
No comments:
Post a Comment