Jul 6, 2016
Efficient | fast way to populate a large volume of data in a table (C#, SQL Server)
Sometimes there is a need to populate | insert a large volume of data from application code (C#, Java, VB.NET) into a table in the database (SQL Server). If the data volume is small, performance is not a big concern. But if the data volume is large, we must consider the most efficient approach. Let us discuss the different approaches and their advantages and disadvantages:
Looping over an insert statement:
One of the easiest ways is to write an insert statement inside a loop, so that it populates records one by one in the database. For example:
foreach (DataRow row in table.Rows)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        SqlCommand cmd = new SqlCommand("INSERT INTO Users(Name, Phone, Age) VALUES (@Name, @Phone, @Age)");
        cmd.CommandType = CommandType.Text;
        cmd.Connection = connection;
        cmd.Parameters.AddWithValue("@Name", row[1]);
        cmd.Parameters.AddWithValue("@Phone", row[2]);
        cmd.Parameters.AddWithValue("@Age", row[3]);
        connection.Open();
        cmd.ExecuteNonQuery(); // one database round trip per row
    }
}
The problem with the above script is that there is a separate database call for each row. For a large volume of records, the above script will take a significant amount of time.
Multiple insert statements in a single query:
In this approach, all the insert statements are first concatenated into a single string variable, and then the whole batch is executed against the database in one call. For example:
string Query = @"INSERT INTO Users(Name, Phone, Age) VALUES ('Name_1', 'Phone_1', 'Age_1')
INSERT INTO Users(Name, Phone, Age) VALUES ('Name_2', 'Phone_2', 'Age_2')
.....
.....
INSERT INTO Users(Name, Phone, Age) VALUES ('Name_n', 'Phone_n', 'Age_n')";
using (SqlConnection connection = new SqlConnection(connectionString))
{
    SqlCommand cmd = new SqlCommand(Query);
    cmd.CommandType = CommandType.Text;
    cmd.Connection = connection;
    connection.Open();
    cmd.ExecuteNonQuery(); // the whole batch of INSERT statements runs in a single call
}
In the above approach the database is called only once, but inside the database the rows are still inserted one by one. So it is a row-based approach, and it is not efficient for a large volume of data.
Table value constructor:
In SQL Server, a table value constructor is a set-based approach, and it is very efficient. Instead of repeating the full insert statement for every row, we can build the query string in table value constructor format (a single INSERT with comma-separated row values). For example:
string Query = @"INSERT INTO Users(Name, Phone, Age)
VALUES ('Name_1', 'Phone_1', 'Age_1'),
       ('Name_2', 'Phone_2', 'Age_2'),
       .....
       .....
       ('Name_n', 'Phone_n', 'Age_n')";
Although this approach is very efficient, a single table value constructor is limited to 1000 rows. If there are more than 1000 rows, the rows have to be split into multiple batches, as in the sketch below.
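The following is only a rough sketch of such batching (not from the original post). It assumes the same DataTable "table" and connectionString used in the first example, with Name, Phone and Age at column indexes 1 to 3, and sends the rows in parameterized batches of 500, which also stays below SQL Server's 2100-parameter limit per command:
// Sketch only: requires System.Data, System.Data.SqlClient and System.Collections.Generic
const int batchSize = 500; // 500 rows * 3 parameters = 1500 parameters, below the 2100-parameter limit

using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();

    for (int offset = 0; offset < table.Rows.Count; offset += batchSize)
    {
        int rowsInBatch = Math.Min(batchSize, table.Rows.Count - offset);
        List<string> valueClauses = new List<string>();

        using (SqlCommand cmd = new SqlCommand())
        {
            cmd.Connection = connection;
            cmd.CommandType = CommandType.Text;

            for (int i = 0; i < rowsInBatch; i++)
            {
                DataRow row = table.Rows[offset + i];
                valueClauses.Add("(@Name" + i + ", @Phone" + i + ", @Age" + i + ")");
                cmd.Parameters.AddWithValue("@Name" + i, row[1]);
                cmd.Parameters.AddWithValue("@Phone" + i, row[2]);
                cmd.Parameters.AddWithValue("@Age" + i, row[3]);
            }

            // One table value constructor INSERT per batch of up to 500 rows
            cmd.CommandText = "INSERT INTO Users(Name, Phone, Age) VALUES "
                + string.Join(", ", valueClauses);
            cmd.ExecuteNonQuery();
        }
    }
}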
Passing a data table:
One of the best approaches is passing the data table as a table-valued parameter to a stored procedure. For example:
Create a table type in SQL server:
CREATE TYPE UserType AS TABLE(
Name VARCHAR(50) NOT NULL,
Phone VARCHAR(10),
Age INT
)
Create a stored procedure which accepts UserType as a parameter:
CREATE PROCEDURE AddUsers(
@UserTable UserType READONLY
)
AS
BEGIN
INSERT INTO Users
SELECT * FROM @UserTable
END
Using C# code, execute the stored procedure and pass the data table as a parameter:
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand("dbo.AddUsers", conn))
{
    // Build a DataTable whose columns match the UserType table type
    var dt = new DataTable();
    dt.Columns.Add("Name", typeof(string));
    dt.Columns.Add("Phone", typeof(string));
    dt.Columns.Add("Age", typeof(int));
    // ... fill dt with the rows to insert ...

    cmd.CommandType = CommandType.StoredProcedure;
    // pass dt as a table-valued parameter of type dbo.UserType
    SqlParameter tvp = cmd.Parameters.AddWithValue("@UserTable", dt);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.UserType";
    conn.Open();
    cmd.ExecuteNonQuery();
}
Using bulk insert:
SQL Server supports bulk inserts that load data from a file. For this, we first have to download or copy the data file to a known location using our programming language. Then the data can be imported into the table using the BULK INSERT command. For example:
BULK INSERT EXACT_HELP.dbo.Users
FROM 'C:\Data\Users.txt'
WITH
(
FIELDTERMINATOR = '|',
ROWTERMINATOR = '|\n'
);
We can also do this from C# by using the SqlBulkCopy class, as sketched below.
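Here is a minimal SqlBulkCopy sketch (not taken from the original post); it assumes the same DataTable "table" and connectionString as in the earlier examples, and that the DataTable columns are named Name, Phone and Age:
using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.Users";
        bulkCopy.BatchSize = 10000; // rows sent to the server per batch

        // Assumed column names; map explicitly so DataTable column order does not matter
        bulkCopy.ColumnMappings.Add("Name", "Name");
        bulkCopy.ColumnMappings.Add("Phone", "Phone");
        bulkCopy.ColumnMappings.Add("Age", "Age");

        bulkCopy.WriteToServer(table); // streams all rows using the bulk-load API
    }
}
Unlike the file-based BULK INSERT, this streams the rows directly from the application, so no data file has to be copied to a location the database server can read.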
Passing data in XML format to SQL Server:
In your programming language, first convert the whole data table into XML format, then pass that XML to a stored procedure to populate the table. For example:
Create a stored procedure:
CREATE PROCEDURE AddUsers(
@UserTable XML
)
AS
BEGIN
DECLARE @idoc INT

EXEC sp_xml_preparedocument @idoc OUTPUT, @UserTable;

INSERT INTO Users
SELECT *
FROM OPENXML (@idoc, '/ROOT/Users', 2) --2 = element-centric mapping (matches DataSet.GetXml output)
WITH (
Name VARCHAR(50),
Phone VARCHAR(10),
Age INT)

EXEC sp_xml_removedocument @idoc; --release the memory used by the parsed document
END
C# code to pass XML data:
// Assumes dt1 is the DataTable holding the rows to insert
DataSet ds = new DataSet("ROOT");   // root element name must match the OPENXML row pattern
dt1.TableName = "Users";
ds.Tables.Add(dt1);
string UserXml = ds.GetXml();
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand("dbo.AddUsers", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@UserTable", SqlDbType.Xml);
    cmd.Parameters["@UserTable"].Value = UserXml;
    conn.Open();
    cmd.ExecuteNonQuery();
}
Sep 17, 2015
Unwanted MANY TO MANY join execution plan SQL Server
Sometimes we need a MANY TO MANY join in SQL Server. Sometimes we don't, but even then the SQL Server query optimizer may choose a MANY TO MANY join, which is not what we actually need. Let me explain it with an example.
Let's assume we have two tables named UserPhoto and tblUser:
CREATE TABLE [dbo].[UserPhoto] (
[UserPhotoID] [bigint] PRIMARY KEY IDENTITY(1, 1) NOT NULL
,[PhotoID] [int] NULL
,[PhotoPath] [varchar](100) NULL
)

CREATE TABLE [dbo].[tblUser] (
[UserID] [bigint] PRIMARY KEY IDENTITY(1, 1) NOT NULL
,[UserName] [varchar](100) NULL
,[PhotoID] [int] NULL
)
Creating non-clustered indexes to improve performance:
CREATE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[tblUser] ([PhotoID])
INCLUDE (
[UserID]
,[UserName]
)
CREATE NONCLUSTERED INDEX NCI_PhotoID
ON [dbo].[UserPhoto] ([PhotoID])
INCLUDE ([PhotoPath])
Let's assume a business constraint is that a user can have only one photo. So we are populating data into both tables in such a way that a user cannot have more than one PhotoID:
INSERT INTO [UserPhoto] ([PhotoID],[PhotoPath])
SELECT CONCAT (language_id,message_id),CONCAT ('C:\Image\',message_id,'.png' )
FROM sys.messages
INSERT INTO [tblUser] (UserName,PhotoID)
SELECT CONCAT ('User',PhotoID) ,PhotoID
FROM [UserPhoto]
Now we are going to execute the script below, which joins the two tables on PhotoID:
SET STATISTICS IO ON
SELECT TOP (100000) U.UserID
,U.UserName
,UP.PhotoPath
FROM tblUser U
INNER JOIN UserPhoto UP ON U.PhotoID = UP.PhotoID
Let's analyze its execution plan:
If we look at the merge join, we will see that MANY TO MANY is true, while our business constraint says a user cannot have more than one photo. Then why has the SQL Server query optimizer chosen a MANY TO MANY join, which is costlier?
Now we are going to check the total logical reads:
Table 'UserPhoto'. Scan count 5, logical reads 884. Total logical reads = 5 * 884 = 4420
Table 'tblUser'. Scan count 5, logical reads 772. Total logical reads = 5 * 772 = 3860
The reason behind this is that the SQL Server query optimizer doesn't know about our business constraint "a user can have one and only one photo". That is why it has generated a bad execution plan, that is, a MANY TO MANY join. Hence it is our task to tell the SQL Server query optimizer about our business constraints. In this case we can express the constraint (a user can have one and only one photo) by creating the two UNIQUE NONCLUSTERED indexes below:
CREATE UNIQUE NONCLUSTERED INDEX
NCI_PhotoID
ON [dbo].[tblUser] ([PhotoID])
INCLUDE ([UserID],[UserName])
WITH (DROP_EXISTING = ON)
CREATE UNIQUE NONCLUSTERED INDEX
NCI_PhotoID
ON [dbo].[UserPhoto] ([PhotoID])
INCLUDE ([PhotoPath])
WITH (DROP_EXISTING = ON)
Now let's execute the same joining query again:
SET STATISTICS IO ON
SELECT TOP (100000)
U.UserID
,U.UserName
,UP.PhotoPath
FROM tblUser U
INNER JOIN UserPhoto UP ON U.PhotoID = UP.PhotoID
Let's check the execution plan:
Now MANY to MANY is false!!!
And the total logical reads:
Table 'UserPhoto'. Scan count 1, logical reads 487, Total Logical Read = 1 * 487 = 487
Table 'tblUser'. Scan count 1, logical reads 424, Total Logical Read = 1 * 424 = 424
This time the total logical reads are much lower than before for the same query. This was an example of an unwanted MANY TO MANY join, which unnecessarily degrades query performance.
Sep 13, 2015
User-defined scalar value function and query optimization SQL Server
Is the use of scalar functions good or bad? A performance analysis
Sometimes we encounter a situation where there is somewhat complex business logic for which we need to create user-defined scalar value functions, either for re-usability or to simplify the main SQL statement. The only potential issue here is the execution plan of the query. The SQL Server query optimizer generates a separate plan for the user-defined scalar function, which is invoked by the main (base) execution plan. Let me explain it with a good example:
--Creating sample table
CREATE TABLE tblMessage(
MessageID BIGINT PRIMARY KEY IDENTITY,
LanguageID INT,
MessageType TINYINT
)
--Populating records into it
INSERT INTO tblMessage
SELECT language_id,severity FROM sys.messages
GO
--Creating scalar function to get count
CREATE FUNCTION dbo.udfMsgTypeCount(
@MessageType TINYINT
)
RETURNS INT
AS
BEGIN
DECLARE @MsgTypeCount AS INT
SELECT @MsgTypeCount = COUNT(*) FROM tblMessage WHERE MessageType = @MessageType
RETURN @MsgTypeCount
END
Now we are going to execute two similar queries, one using the scalar user-defined function and another using the equivalent subquery:
--Approach 1: Using scalar UDF
SELECT TOP(1000)
MessageID,
LanguageID,
dbo.udfMsgTypeCount(MessageType) AS Typecount
FROM tblMessage
ORDER BY
MessageID
--Approach 2: Using subquery
SELECT TOP(1000)
MessageID,
LanguageID,
(SELECT COUNT(*) FROM tblMessage WHERE MessageType = M.MessageType) AS Typecount
FROM tblMessage M
ORDER BY
MessageID
Comparing execution plans:
According to the execution plan, the first approach (using the scalar value function) is better than the second approach (subquery), since the cost of the first approach is only 0% while that of the second approach is 100%.
Comparing execution time:
Approach 1: CPU time = 87313 ms, elapsed time = 87522 ms.
Approach 2: CPU time = 312 ms, elapsed time = 588 ms.
According to execution time, however, the first approach is bad: it took around 1.5 minutes, while the second approach took only about 0.6 seconds.
Question 1: Why is the execution plan wrong?
Here the execution plan is not wrong; it is a limitation of the execution plan. A SQL Server execution plan doesn't include the execution plan of the invoked user-defined scalar value function. If we observe the execution plan of the first approach, it has only a Compute Scalar physical operator (inside the green rectangle) for invoking the UDF.
There is a separate execution plan for the UDF. We can get it with this script:
SELECT query_plan
FROM sys.dm_exec_cached_plans
CROSS APPLY
sys.dm_exec_query_plan(plan_handle)
WHERE objectid = OBJECT_ID('dbo.udfMsgTypeCount')
The plan of the scalar UDF above is not visible in the base query's execution plan, and the total cost has not been evaluated correctly. Hence it looks as if the execution plan of the first approach were wrong, but in reality the second approach is much better than the first.
Question 2: I agree with the limitation that the execution plan is unable to integrate the execution plan of the invoked scalar UDF. But why did the first approach take around 1.5 minutes while the second approach took only about 0.6 seconds? Why is there such a huge performance difference?
In the case of a scalar UDF, the execution plans of the base query and of the invoked scalar UDF are generated independently. Hence the overall execution plan may sometimes be less efficient than the combined execution plan generated for the whole script (second approach). The current demonstration is a good example of this.
If we analyze the actual execution plan in text form (SET STATISTICS PROFILE ON), in the second approach there is a table spool which keeps only 16 records and which is not present in the first approach; that is why the second approach executes faster:
Using scalar UDF:
Using subquery:
Hence in this situation we should avoid using a scalar user-defined function. Apart from this, scalar value functions are also not good for query parallelism.
Sep 6, 2015
Use of trigger SQL Server Query Performance good or bad
Avoid using DML triggers in inappropriate places
I am not going to say that triggers are bad, but there are many situations where using triggers degrades performance. For example, sometimes there are triggers which reference the same table and perform a task which could equally be done by the script that fires them. In that case there is no need to create a trigger. Let me explain it with an example:
Approach 1: Using trigger
a. Creating a table:
CREATE TABLE tblOrder(
OrderID BIGINT PRIMARY KEY IDENTITY,
OrderType INT,
Qty INT,
InsertDate DATETIME
)
b. Creating a trigger to update the insertion date:
CREATE TRIGGER Trg_Insert_Date ON tblOrder
AFTER INSERT
AS
BEGIN
UPDATE tblOrder
SET InsertDate = GETDATE()
FROM tblOrder O
INNER JOIN inserted I ON O.OrderID = I.OrderID
END
c. Populating records in the order table and enabling statistics:
SET STATISTICS IO ON

INSERT INTO tblOrder(OrderType,Qty)
SELECT message_id, severity FROM Sys.messages
Total logical reads:
Table 'tblOrder'. Scan count 0, logical reads 645023
Table 'Worktable'. Scan count 1, logical reads 585437
Table 'tblOrder'. Scan count 1, logical reads 1036
Approach 2: Using Default value
a. Dropping the trigger and creating an equivalent default constraint on the InsertDate column:
DROP TRIGGER Trg_Insert_Date

ALTER TABLE tblOrder
ADD CONSTRAINT [def_insert_dt] DEFAULT (GETDATE()) FOR InsertDate
b. Removing all records from the table:
TRUNCATE TABLE tblOrder
c. Populating records once again:
SET STATISTICS IO ON

INSERT INTO tblOrder(OrderType,Qty)
SELECT message_id, severity FROM Sys.messages
Table 'tblOrder'. Scan count 0, logical reads 645023
Table 'Worktable'. Scan count 1, logical reads 585437
After comparing the logical reads, we can see that there are 1036 extra logical reads from tblOrder when we use the trigger. Hence, in this particular example, using a trigger is the worse approach.