Dec 26, 2014

Application Code Best Practices: SQL Server Performance Tuning


Looping is the most commonly used programming construct for processing a collection of data, since it is easy to design and implement a complex algorithm by processing one row at a time. When that collection is a database object such as a table, this row-based approach can become a major performance bottleneck, because a table may have a million records or even more. If a task takes four to five hours with a row-based approach, it may be possible to accomplish the same task with a set-based approach in 30 to 60 seconds. Here we are going to discuss the most common faulty code patterns that force the use of a loop or recursion:

Partial Join:
Some programmers first fetch the base data from the database and store it in a data table or in the application cache. In the next step they iterate over that data using a loop or recursion, and inside that loop child queries are executed, with parameters passed to them from the output of the base query. For example:

There is a business need to pull users as well as their post information. The C# code for this:

using System.Data;
using System.Data.SqlClient;

SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
Conn.Open();

//Fetching user information (base query)
SqlCommand Command = Conn.CreateCommand();
Command.CommandText = "SELECT * FROM tblUser";
SqlDataReader Reader = Command.ExecuteReader();

//Looping over the base query: one extra connection and one extra query per user
while (Reader.Read())
{
    SqlConnection InnerConn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
    InnerConn.Open();

    SqlCommand InnerCommand = InnerConn.CreateCommand();

    //Fetching post information of the given user ID
    InnerCommand.CommandText = "SELECT * FROM tblPost WHERE ntUserID = @ntUserID";
    SqlParameter Param = new SqlParameter("@ntUserID", SqlDbType.Int);
    Param.Value = Reader["ntUserID"];
    InnerCommand.Parameters.Add(Param);

    InnerCommand.ExecuteReader();
    InnerConn.Close();
}

Reader.Close();
Conn.Close();

Execution time: 28 minutes.

The script above is one example of a partial join. It is better to join tblUser with tblPost in the database in a single query, instead of first fetching the users' information and then fetching the post information of those users one by one in a loop.

The above script should be written something like this:

SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
Conn.Open();

//A single set-based query joins users with their posts in the database
SqlCommand Command = Conn.CreateCommand();
Command.CommandText = "SELECT * FROM tblUser U LEFT JOIN tblPost P ON U.ntUserID = P.ntUserID";
SqlDataReader Reader = Command.ExecuteReader();

Execution time: 1 minute 28 seconds (with the same set of tables). Hence, code that performs a partial join over a large volume of data can be a big performance bottleneck.
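
One detail worth noting about the joined result: each user's columns are repeated for every one of that user's posts, so if the application still needs per-user grouping it can do that in the same single pass, provided the query orders by user ID. A rough sketch, assuming ORDER BY U.ntUserID is appended to the query above and that tblPost has a key column named ntPostID (a hypothetical name):

int lastUserID = -1;
while (Reader.Read())
{
    int userID = (int)Reader["ntUserID"];
    if (userID != lastUserID)
    {
        //First row for this user: handle the user-level columns once
        lastUserID = userID;
    }

    //Post-level columns are present on every row; they are NULL when
    //the user has no posts, because of the LEFT JOIN
    if (Reader["ntPostID"] != DBNull.Value)
    {
        //Process the post columns here
    }
}
Reader.Close();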

Data manipulation:
It is a best practice to implement data-centric business logic on the database side. Business rules on data change over time, and a stored procedure is the best place for them, since modifying application code is a more difficult task. Some programmers instead first pull all the data from the database and then apply the business logic in a loop. For example:

In the script below, a simple data manipulation of the user type is done inside the loop:

SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");

SqlCommand Command = new SqlCommand("SELECT * FROM tblUser", Conn);
Conn.Open();
SqlDataReader Reader = Command.ExecuteReader();

while (Reader.Read())
{
    DataGridViewRow row = (DataGridViewRow)dataGridView1.Rows[0].Clone();

    row.Cells[0].Value = Reader["ntUserID"];
    row.Cells[1].Value = Reader["vcUserName"];
    row.Cells[2].Value = Reader["vcPassword"];
    row.Cells[3].Value = Reader["ntAge"];
    row.Cells[4].Value = Reader["ntCityID"];

    //Data manipulation: mapping the user type ID to a display name
    int userTypeID = Int32.Parse(Reader["ntUserTypeID"].ToString());
    if (userTypeID == 1)
    {
        row.Cells[5].Value = "Admin";
    }
    else if (userTypeID == 2)
    {
        row.Cells[5].Value = "User";
    }
    else
    {
        row.Cells[5].Value = "Guest";
    }

    row.Cells[6].Value = Reader["dtCreateDate"];

    dataGridView1.Rows.Add(row);
}

Reader.Close();
Conn.Close();

Total execution time: 14 seconds

Instead of manipulating the user type in C# code, it should be done in Transact-SQL using a CASE expression. For example:

SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");

SqlDataAdapter ObjAdapter = new SqlDataAdapter(
    @"SELECT [ntUserID] AS [User ID]
            ,[vcUserName] AS [User Name]
            ,[vcPassword] AS [Password]
            ,[ntAge] AS [Age]
            ,[ntCityID] AS [City]
            ,CASE [ntUserTypeID]
                    WHEN 1 THEN 'Admin'
                    WHEN 2 THEN 'User'
                    ELSE 'Guest'
             END AS [User Type]
            ,[dtCreateDate] AS [Date]
      FROM [dbo].[tblUser]", Conn);

DataTable Dt = new DataTable();
Conn.Open();
ObjAdapter.Fill(Dt);

dataGridView1.DataSource = Dt;
Conn.Close();


Execution time: 1 second

This is because Transact-SQL uses a set-based approach instead of looping. Sometimes the row-by-row style simply comes from a lack of database knowledge: some programmers think a database can only SELECT, UPDATE, and INSERT, and are not aware of its programming capabilities such as IF/ELSE, CASE expressions, loops, and SQL functions.

Pull minimum data from the database:
Some programmers fetch all the data from the database tables, perform some business-logic calculation, and only then apply filters in the application layer. For example, they pull one million records from the database and apply a filter condition in the application, after which the total drops to, say, 100 records. This causes unnecessary network traffic, and the database may well have appropriate indexes that could perform that filter condition efficiently.
It is advisable to push the filter condition to the database side using the programming capability of the database.
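
A minimal sketch of the preferred shape, reusing the tblUser table from the earlier examples and assuming a hypothetical age filter:

SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
Conn.Open();

//The filter runs on the database side, so only the few matching rows
//cross the network, and an index on ntAge can be used if one exists
SqlCommand Command = Conn.CreateCommand();
Command.CommandText = "SELECT ntUserID, vcUserName FROM tblUser WHERE ntAge > @ntAge";
Command.Parameters.Add(new SqlParameter("@ntAge", SqlDbType.Int)).Value = 18;
SqlDataReader Reader = Command.ExecuteReader();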

Minimize the number of database calls:
During a code review, I found a very interesting piece of code. It first pulled data from multiple tables, then after some business logic populated the results into a single temporary table and updated records on the database side. It then pulled the records from the temporary table back to the application layer, applied some different business logic in a loop, and finally wrote the data back to the database.
It is highly recommended that a single event should not require more than one database call. Business logic should be implemented either in the application code or in the database, not partially in the database and partially in the application code. Splitting it this way degrades the overall performance of an application. In this example, a single stored procedure should have been written to accomplish the whole business logic.
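
A minimal sketch of the single-call shape, assuming a hypothetical stored procedure dbo.uspProcessUserData that contains the whole business logic (the temporary-table work, the updates, and the final writes all stay on the database side):

SqlConnection Conn = new SqlConnection(@"Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=ExactHelp;Data Source=RITESH\RITESH14");
Conn.Open();

//One round trip replaces the pull/push/pull/push sequence
SqlCommand Command = Conn.CreateCommand();
Command.CommandType = CommandType.StoredProcedure;
Command.CommandText = "dbo.uspProcessUserData";
Command.ExecuteNonQuery();

Conn.Close();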

XML and JSON support:
SQL Server has strong support for the XML data format, and for JSON as of SQL Server 2016. So there is no need to first fetch the data in tabular format and then convert it to XML or JSON in application code, which inevitably uses some kind of loop. For a large volume of data this decreases performance significantly. For example:

--JSON
SELECT id
      ,name AS [info.name]
      ,surname AS [info.surname]
      ,age
FROM Users
FOR JSON PATH

Sample output:
[
        { "id" : 1, "info": { "name": "Alain", "surname": "Ray" }, "age": 30 },
        { "id" : 2, "info": { "name": "Peter", "surname": "Scott" }, "age": 25 }
]

--XML
SELECT *
FROM [Friend]
FOR XML AUTO

Sample output:
<Friend FriendId="43" UserId="55" />
<Friend FriendId="66" UserId="76" />
<Friend FriendId="10730136" UserId="304" />
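
On the application side, the formatted document can be read back as a single string. SQL Server streams FOR XML / FOR JSON output to the client in chunks of roughly 2 KB per row, so the rows are concatenated; a minimal sketch, reusing the JSON query above:

//StringBuilder lives in System.Text
StringBuilder Json = new StringBuilder();

SqlCommand Command = Conn.CreateCommand();
Command.CommandText = "SELECT id, name AS [info.name], surname AS [info.surname], age FROM Users FOR JSON PATH";

//The result arrives as one string column, possibly split across rows
SqlDataReader Reader = Command.ExecuteReader();
while (Reader.Read())
{
    Json.Append(Reader.GetString(0));
}
Reader.Close();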
