Application program code performance tuning for SQL Server
Looping is the most commonly used programming construct for processing a collection of data, since it is easy to design and implement a complex algorithm by handling one row at a time. When that collection is a database object such as a table, however, this row-based approach can become a major performance bottleneck, because a table may have 1 million records or even more. If a task takes four to five hours with a row-based approach, it may be possible to accomplish the same task with a set-based approach within 30 to 60 seconds. Here we are going to discuss the most common faulty code patterns that force the use of a loop or recursion:
Partial Join:
Some programmers first fetch the base data from the database and store it in a data table, the application cache, etc. In the next step they iterate over that data table using a loop or recursion, and inside that loop child queries are executed, with parameters passed to them from the output of the base query. For example:
There is a business need to pull users as well as their post information. The C# script for this:
// Requires: using System.Data; using System.Data.SqlClient;
SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
Conn.Open();

// Fetching user information (base query)
SqlCommand Command = Conn.CreateCommand();
Command.CommandText = "SELECT * FROM tblUser";
SqlDataReader Reader = Command.ExecuteReader();

// Looping over the base query: one extra database round trip per user
while (Reader.Read())
{
    SqlConnection InnerConn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
    InnerConn.Open();
    SqlCommand InnerCommand = InnerConn.CreateCommand();

    // Fetching post information of the given user ID
    InnerCommand.CommandText = "SELECT * FROM tblPost WHERE ntUserID = @ntUserID";
    SqlParameter Param = new SqlParameter("@ntUserID", SqlDbType.Int, 0);
    InnerCommand.Parameters.Add(Param);
    Param.Value = Reader["ntUserID"];
    InnerCommand.ExecuteReader().Close();

    InnerConn.Close();
}
Reader.Close();
Execution time: 28 minutes.
The above script is one example of a partial join. It is better to join tblUser with tblPost inside the database in a single query instead of first fetching the users' information and then fetching the post information of those users one by one in a loop. The above script should be written something like this:
SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
Conn.Open();
SqlCommand Command = Conn.CreateCommand();
// A single set-based query: the join is done inside the database
Command.CommandText = "SELECT * FROM tblUser U LEFT JOIN tblPost P ON U.ntUserID = P.ntUserID";
SqlDataReader Reader = Command.ExecuteReader();
Execution time: 1 minute 28 seconds (with the same set of tables). Hence code that performs a partial join over a large volume of data can be a big performance bottleneck.
Data manipulation:
It is a best practice to implement the business logic on data on the database side. Business logic changes with time, and in this situation a stored procedure is the best approach, since modifying application code is a difficult task. Some programmers instead first pull all the data from the database and then perform the business logic on it in a loop. For example, in the script below a simple data manipulation of the user type is done inside a loop:
SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
SqlCommand Command = new SqlCommand("SELECT * FROM tblUser", Conn);
Conn.Open();
SqlDataReader Reader = Command.ExecuteReader();
while (Reader.Read())
{
    DataGridViewRow row = (DataGridViewRow)dataGridView1.Rows[0].Clone();
    row.Cells[0].Value = Reader["ntUserID"];
    row.Cells[1].Value = Reader["vcUserName"];
    row.Cells[2].Value = Reader["vcPassword"];
    row.Cells[3].Value = Reader["ntAge"];
    row.Cells[4].Value = Reader["ntCityID"];

    // Data manipulation: map the user type ID to a display name
    if (Int32.Parse(Reader["ntUserTypeID"].ToString()) == 1)
    {
        row.Cells[5].Value = "Admin";
    }
    else if (Int32.Parse(Reader["ntUserTypeID"].ToString()) == 2)
    {
        row.Cells[5].Value = "User";
    }
    else
    {
        row.Cells[5].Value = "Guest";
    }

    row.Cells[6].Value = Reader["dtCreateDate"];
    dataGridView1.Rows.Add(row);
}
Conn.Close();
Total execution time: 14 seconds
Instead of manipulating the user type in C# code, it should be done in Transact-SQL using a CASE expression. For example:
SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
SqlDataAdapter ObjAdapter = new SqlDataAdapter(
@"SELECT [ntUserID] AS [User ID]
,[vcUserName] AS [User Name]
,[vcPassword] AS [Password]
,[ntAge] AS [Age]
,[ntCityID] [City]
,CASE [ntUserTypeID]
WHEN 1
THEN 'Admin'
WHEN 2
THEN 'User'
ELSE 'Guest'
END [User Type]
,[dtCreateDate] [Date]
FROM [dbo].[tblUser]", Conn);
DataTable Dt = new DataTable();
Conn.Open();
ObjAdapter.Fill(Dt);
dataGridView1.DataSource = Dt;
Conn.Close();
Execution time: 1 second
This is because Transact-SQL uses a set-based approach instead of looping. Sometimes the looping pattern is due to a lack of database knowledge: some programmers think a database can only do SELECT, UPDATE, INSERT, etc., and are not aware of its programming capabilities such as IF...ELSE, CASE, looping, and SQL functions.
Pull minimum data from the database:
Some programmers fetch all the data from the database tables and, after some business logic calculation, apply the filter in the application layer. For example, they pull 1 million records from the database and then apply a filter condition in the application, after which the total is reduced to, say, 100 records. This causes unnecessary network traffic, even though the database may have appropriate indexes to perform that filter condition efficiently. It is advisable to implement the filter condition on the database side using the programming capabilities of the database, as in the sketch below.
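A minimal sketch of the set-based alternative, reusing the tblUser table from the earlier examples (the ntAge filter itself is hypothetical):
// Instead of pulling every row and filtering in C#, push the filter into
// the WHERE clause so only the matching rows cross the network.
SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
Conn.Open();
SqlCommand Command = Conn.CreateCommand();
Command.CommandText = "SELECT * FROM tblUser WHERE ntAge > @ntAge";
Command.Parameters.Add(new SqlParameter("@ntAge", SqlDbType.Int) { Value = 30 });
SqlDataReader Reader = Command.ExecuteReader(); // returns, say, 100 rows instead of 1 million
Reader.Close();
Conn.Close();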
Minimize the number of database calls:
During a code review, I found a very interesting piece of code. That code first pulled data from multiple tables; then, after some business logic, it populated the data into a single temporary table and updated the records on the database side. It then pulled the records from the temporary table back to the application layer, applied some different business logic in a loop, and finally populated the data back into the database.
It is highly recommended that a single event should not result in more than one database call. Business logic should be implemented either in the application code or in the database, not partially in the database and partially in the application code, as that degrades the overall performance of an application. In this example, a single stored procedure should be written to accomplish the whole business logic, as in the sketch below.
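As a minimal sketch, assuming a stored procedure named uspProcessUserData (the name and parameter are hypothetical) that performs the whole pull / manipulate / update sequence on the database side, the application makes exactly one call:
SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
Conn.Open();
// One round trip: all intermediate steps (temporary table, updates, the
// second pass of business logic) happen inside the procedure.
SqlCommand Command = new SqlCommand("uspProcessUserData", Conn);
Command.CommandType = CommandType.StoredProcedure;
Command.Parameters.Add(new SqlParameter("@ntUserID", SqlDbType.Int) { Value = 1 });
Command.ExecuteNonQuery();
Conn.Close();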
XML and JSON support:
SQL Server has strong support for the XML and JSON (SQL Server 2016 onwards) data formats, so there is no need to first fetch the data in tabular format and then convert it to XML or JSON in application code, which invariably uses some kind of loop. For a large volume of data this decreases performance significantly. For example:
--JSON
SELECT *
FROM Users
FOR JSON PATH
Sample output:
[
  { "id": 1, "info": { "name": "Alain", "surname": "Ray" }, "age": 30 },
  { "id": 2, "info": { "name": "Peter", "surname": "Scott" }, "age": 25 }
]
--XML
SELECT *
FROM [Friend]
FOR XML AUTO
Sample output:
<Friend FriendId="43" UserId="55" />
<Friend FriendId="66" UserId="76" />
<Friend FriendId="10730136" UserId="304" />