Feb 5, 2013

Best ways to pass comma delimited varchar variables to IN clause sql server


There are many ways to pass the comma delimited string variable to IN clauses. Here we would like discuss about performance what the best way to do this is. I have done little research and compared the performance of stored procedure by using sql server actual execution plan. I am listing out is from worst performance to best performance. Before this I am creating a table, index and inserting some records into it:

--Creating table
CREATE TABLE tblMsg(
     ntID BIGINT PRIMARY KEY IDENTITY,
     ntServeiry INT,
     vcMsg VARCHAR(100),
)

--Inserting some records into it
INSERT tblMsg
SELECT severity,LEFT(Text,100)
FROM sys.messages

--Creating non-clustered index on it
CREATE NONCLUSTERED INDEX NCI_Serveiry
ON tblMsg(ntServeiry)
INCLUDE(vcMsg)

Solution five:  Here we are first converting the delimited data into XML form and getting data using XQuery.

CREATE PROC uspGetMsg5(
     @vcseverity AS VARCHAR(100)
)
AS
BEGIN

     DECLARE @xlData AS XML
    
     SET @xlData = '<Root><Data>'
          + REPLACE(@vcseverity,',','</Data><Data>')
          + '</Data></Root>'

     SELECT vcMsg
     FROM tblMsg
     WHERE ntServeiry IN (SELECT C.value('.','int')
          FROM @xlData.nodes('/Root/Data') AS T(C))
    
END

Solution four: Here we are first converting the delimited data into XML form and getting data using OPENXML.

CREATE PROC uspGetMsg4(
     @vcseverity AS VARCHAR(100)
)
AS
BEGIN
     DECLARE @xlData AS XML
     DECLARE @ntIdoc AS INT
    
     SET @xlData = '<Root><Data>'
          + REPLACE(@vcseverity,',','</Data><Data>')
          + '</Data></Root>'
    
     EXEC SP_XML_PREPAREDOCUMENT @ntIdoc OUTPUT, @xlData
    
     SELECT vcMsg
     FROM tblMsg
     WHERE ntServeiry IN (SELECT CustomerID
          FROM OPENXML (@ntIdoc, '/Root/Data',1)
          WITH(CustomerID  int   '.'))

     EXEC SP_XML_REMOVEDOCUMENT @ntIdoc
    
END

Comparison between solution five and four:

EXECUTE dbo.uspGetMsg5 '15, 20, 21'
EXECUTE dbo.uspGetMsg4 '15, 20, 21'

Actual execution plan:


Solution four is much better than solution five.

Solution three: Here we are using a table valued function named Split. You can this function from following link:

CREATE PROC uspGetMsg3(
     @vcseverity AS VARCHAR(100)
)
AS
BEGIN
    
     SELECT vcMsg
     FROM tblMsg
     WHERE ntServeiry IN (SELECT *
     FROM dbo.Split(@vcseverity,','))
    
END

Comparison between solution four and three

EXECUTE dbo.uspGetMsg4 '15, 20, 21'
EXECUTE dbo.uspGetMsg3 '15, 20, 21'

Actual execution plan:


Solution three is much better than solution four.

Solution two: Here we are using dynamic sql to insert delimited data into the table variable.

CREATE PROC uspGetMsg2(
     @vcseverity AS VARCHAR(100)
)
AS
BEGIN
    
     DECLARE @vcSqlQuery AS NVARCHAR(MAX)
     DECLARE @tblSeverity AS TABLE(ntSeverity INT)
    
     SET @vcSqlQuery = 'SELECT '
          + REPLACE(@vcseverity,',',' UNION ALL SELECT ')
         
     INSERT INTO @tblSeverity
     EXECUTE SP_EXECUTESQL @vcSqlQuery
    
     SELECT vcMsg
     FROM tblMsg
     WHERE ntServeiry IN (SELECT ntSeverity
          FROM @tblSeverity)
    
END

Comparison between solution three and two

EXECUTE dbo.uspGetMsg3 '15, 20, 21'
EXECUTE dbo.uspGetMsg2 '15, 20, 21'

Actual execution plan:


Solution two is better than solution three.

Solution one: Here we are using dynamic sql.

CREATE PROC uspGetMsg1(
     @vcseverity AS VARCHAR(100)
)
AS
BEGIN
    
     DECLARE @vcSqlQuery AS NVARCHAR(MAX)
    
     SET @vcSqlQuery = N'SELECT vcMsg
          FROM tblMsg
          WHERE ntServeiry IN (' + @vcseverity + ')'
    
     EXECUTE SP_EXECUTESQL
          @vcSqlQuery
    
END

Comparison between solution two and one

EXECUTE dbo.uspGetMsg2 '15, 20, 21'
EXECUTE dbo.uspGetMsg1 '15, 20, 21'

Actual execution plan:



Solution one is much better than solution two.

Wait, It is not conclusion. It is one aspect. Now we are going to compare according to query caching and reuse. Now we are going to execute dbo.uspGetMsg1, dbo.uspGetMsg2 and dbo.uspGetMsg3 with three different parameters. Also we would to set statistics time on. Also for better comparison we would to clean the cached plane:

--To clean the cached plan.
--Don't execute it on production server
DBCC FREEPROCCACHE

--To get the compilation and execution time
SET STATISTICS TIME ON

EXEC [uspGetMsg1] '15, 20, 21'
EXEC [uspGetMsg1] '15, 23, 21'
EXEC [uspGetMsg1] '15, 23,   21'

We will observe in all three executions stored procedure uspGetMsg1 has get compiled only one time while a query statement in this procedure has compiled three times. We can find out the query statements by following query:

SELECT objtype,[text]
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS st
WHERE text LIKE N'%tblMsg%'
     AND text NOT LIKE '%sys.dm_exec_cached_plans%'

Output:

objtype
text
Proc
CREATE PROC uspGetMsg1(   @vcseverity AS VARCHAR(100)  )  AS  BEGIN      DECLARE @vcSqlQuery AS NVARCHAR(MAX)      SET @vcSqlQuery = N'SELECT vcMsg FROM tblMsg WHERE ntServeiry IN (' + @vcseverity + ')'      EXECUTE SP_EXECUTESQL     @vcSqlQuery     END
Adhoc
SELECT vcMsg FROM tblMsg WHERE ntServeiry IN (15,23, 21)
Adhoc
SELECT vcMsg FROM tblMsg WHERE ntServeiry IN (15,23,21)
Adhoc
SELECT vcMsg FROM tblMsg WHERE ntServeiry IN (15,20,21)

So for each different parameter one adhoc query is executed by procedure uspGetMsg1. So we have to pay extra compilation time in execution apart from unnecessary query caching etc. So definitely it is not good idea to pass comma delimited varchar variables to IN clause sql server using solution 1.

Let's check the solution 2: 

--Don't execute it on production server
DBCC FREEPROCCACHE

--To get the compilation and execution time
SET STATISTICS TIME ON

EXEC [uspGetMsg2] '15, 20, 21'
EXEC [uspGetMsg2] '15, 23, 21'
EXEC [uspGetMsg2] '15, 23,   21'

We will observe in all three executions stored procedure uspGetMsg2 has get compiled only one time while a query statement in this procedure has compiled three times. We can find out the query statements by following query:

SELECT objtype,[text]
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS st
WHERE (text LIKE N'%UNION ALL%'
     OR text LIKE N'%tblMsg%')
     AND text NOT LIKE '%sys.dm_exec_cached_plans%'

Output:

objtype
text
Proc
CREATE PROC [dbo].[uspGetMsg2](   @vcseverity AS VARCHAR(100)  )  AS  BEGIN      DECLARE @vcSqlQuery AS NVARCHAR(MAX)   DECLARE @tblSeverity AS TABLE(ntSeverity INT)      SET @vcSqlQuery = 'SELECT ' + REPLACE(@vcseverity,',',' UNION ALL SELECT ')      INSERT INTO @tblSeverity   EXECUTE SP_EXECUTESQL @vcSqlQuery      SELECT vcMsg    FROM tblMsg    WHERE ntServeiry IN (SELECT ntSeverity FROM @tblSeverity)     END
Adhoc
SELECT 15 UNION ALL SELECT 23 UNION ALL SELECT  21
Adhoc
SELECT 15 UNION ALL SELECT 23 UNION ALL SELECT 21
Adhoc
SELECT 15 UNION ALL SELECT 20 UNION ALL SELECT 21

So for each different parameter one adhoc query is executed by procedure uspGetMsg2. So we have to pay extra compilation time in execution apart from unnecessary query caching etc. So definitely it is not good idea to pass comma delimited varchar variables to IN clause sql server using solution 2.

Let's check the solution 1: 

--Don't execute it on production server
DBCC FREEPROCCACHE

--To get the compilation and execution time
SET STATISTICS TIME ON

EXEC [uspGetMsg3] '15, 20, 21'
EXEC [uspGetMsg3] '15, 23, 21'
EXEC [uspGetMsg3] '15, 23,   21'

We will observe in all three executions stored procedure uspGetMsg3 and tabled value function has got compiled only one times. We can find out the by following query:

SELECT objtype,[text]
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS st
WHERE (text LIKE N'%tblMsg%'
     OR text LIKE N'%Split%')
     AND text NOT LIKE '%sys.dm_exec_cached_plans%'

Output:

objtype
text
Proc
  CREATE FUNCTION [dbo].[Split](@String varchar(8000), @Delimiter char(1))         returns @temptable TABLE (items varchar(8000))         as         begin             declare @idx int             declare @slice varchar(8000)                     select @idx = 1                 if len(@String)<1 or @String is null  return                     while @idx!= 0             begin                 set @idx = charindex(@Delimiter,@String)                 if @idx!=0                     set @slice = left(@String,@idx - 1)                 else                     set @slice = @String                             if(len(@slice)>0)                insert into @temptable(Items) values(@slice)                     set @String = right(@String,len(@String) - @idx)                 if len(@String) = 0 break             end     return         end   
Proc
CREATE PROC uspGetMsg3(   @vcseverity AS VARCHAR(100)  )  AS  BEGIN      SELECT vcMsg    FROM tblMsg    WHERE ntServeiry IN (SELECT * FROM dbo.Split(@vcseverity,','))     END

By using solution 3 there are not any adhoc queries. Using this approach we can save the unnecessary compilation time. So out of 5 solutions 3 is best solution to achieve this goal. If you have any better solutions please suggest us.

Write sql queries in set based approach sql server
Sql server best practices for stored procedure
What is Adhoc query in sql server: Cached Query Plan
Subquery vs inner join which one is better in sql server
Difference between actual execution plan and estimated execution plan in sql server

No comments:

Post a Comment