Oct 14, 2013

Default sort order of a SELECT statement in Sql server


Order By clause in Sql server: Default sort order of a SELECT statement in Sql server

Suppose we are creating a table and inserting some records by using following sql queries:

--Deleting tblEmployee table if it already present.
IF OBJECT_ID('tblEmployee') IS NOT NULL
     DROP TABLE tblEmployee

--Creating tblEmployee.
CREATE TABLE tblEmployee(
     ntEmpID BIGINT PRIMARY KEY IDENTITY,
     vcName VARCHAR(Max),
     ntAge INTEGER
)

--Creating one non clustered index on it.
CREATE NONCLUSTERED INDEX NCI_Age ON tblEmployee(ntAge)  

--Inserting some records into it 
INSERT INTO tblEmployee
SELECT TOP(90000) REPLICATE(LEFT([Text],1),5000), severity FROM Sys.messages  

Now I am executing two SELECT statements:

--Statement one   
SELECT TOP(10)
     ntEmpID
FROM  tblEmployee  WHERE ntAge IN(15,16)  
    
--Statement two   
SELECT TOP(10)
     ntEmpID,
     LEFT(vcName,1) AS vcName_Initial
FROM      tblEmployee WHERE ntAge IN(15,16)

Both the select statements are exactly same except first select statement is only selecting  the ntEmpID while second select statement also selects the vcName_Initial. So both of the queries will return same result set except second select query will also return the vcName_Initial. Let's check the output:

Statement One:
ntEmpID
2
3
4
5
6
8
9
10
11
12

Statement Two:
ntEmpID
vcName
2
Q
3
I
4
T
5
O
6
U
7
T
8
T
9
T
10
T
11
T

I became shocked. Statement one and two are returning the different ntEmpID in its result set!! Result set of first select statement includes 12 but not in second select statement. Result set of second select statement includes 7 but not by the first select statement. Why ntEmpID is different? Is it due to the first query is selecting one column while the second query two columns? I don't think so.

Default Order:

AS I know when we insert any records into a table, default order will be chronological order of insertion. Let's check it:

I am creating a table and inserting some records into it by following sql script:

--Deleting tblKeyValue table if it already present.
IF OBJECT_ID('tblKeyValue') IS NOT NULL
     DROP TABLE tblKeyValue
    
--Creating tblKeyValue table
CREATE TABLE tblKeyValue(
     vcKey VARCHAR(900),
     vcValue INTEGER
)

--Inserting 11 records into the tblKeyValue
INSERT tblKeyValue VALUES(REPLICATE('E',900),2)
INSERT tblKeyValue VALUES(REPLICATE('X',900),4)
INSERT tblKeyValue VALUES(REPLICATE('A',900),6)
INSERT tblKeyValue VALUES(REPLICATE('C',900),8)
INSERT tblKeyValue VALUES(REPLICATE('T',900),1)
INSERT tblKeyValue VALUES(REPLICATE('H',900),5)
INSERT tblKeyValue VALUES(REPLICATE('E',900),3)
INSERT tblKeyValue VALUES(REPLICATE('L',900),7)
INSERT tblKeyValue VALUES(REPLICATE('P',900),9)
INSERT tblKeyValue VALUES(REPLICATE('A',900),11)
INSERT tblKeyValue VALUES(REPLICATE('B',900),10)

Now I am checking the default order of selection:

SELECT
     LEFT(vcKey,1) AS vcKey,
     vcValue
FROM tblKeyValue

vcKey
vcValue
E
2
X
4
A
6
C
8
T
1
H
5
E
3
L
7
P
9
A
11
B
10

So default order of select statement is chronological order of insertion.

Now I am inserting two more records into the table:

INSERT tblKeyValue VALUES('X',12)
INSERT tblKeyValue VALUES('Y',13)

Now I am again checking the default order of selection:

SELECT
     LEFT(vcKey,1) AS vcKey,
     vcValue
FROM tblKeyValue

vcKey
vcValue
E
2
X
4
A
6
C
8
T
1
H
5
E
3
L
7
X
12
Y
13
P
9
A
11
B
10

Now it is not according to chronological order of insertion. It should insert at the bottom. So it is wrong to say that default order of selection is chronological order of insertion.

Now I am explaining it.  Few important points about sql server:

1. When we insert any records into a table it is stored in data file.
2. Size of data file is 8 KB. But only 8060 bytes is used to store data. 96 bytes are used for header information.
3. A single row must be inserted into one data file. In other word it is not possible to store data of a single row in two data files.
4. If data file has less space than a single row then whole row will be moved to another data file.

In the last example,   
Size of a row is 900 + 4 bytes

First eight rows will be inserted into the first data file and will occupy 8 * 904 = 7232 bytes.
Remaining spaces in the first data file: 8060 - 7232 = 828 bytes.


If we will try to insert 9th row which size is 904 bytes but free spaces in the first data file is only 828 bytes. Hence it will not fit in the first data file so whole row will be moved to the new data file. Same is true for 10th and 11th rows.


Consider on 12th row which size is only (1 + 4) bytes. It can fit in first data file. Same is also true for 13th row.


And sql server will retrieve data from first data file then second data file and so on. I think you got the point why should not belief on default order.

Note: It is not necessary that sql server first full one data file then move to second data file even there are some rows which fits into the first data file.

How can we assure for default order of selection?

I am creating a clustered index on the column vcKey:

CREATE CLUSTERED INDEX CI_Key ON tblKeyValue(vcKey)  

Now, check the default order of selection:

SELECT
     LEFT(vcKey,1) AS vcKey,
     vcValue
FROM tblKeyValue

vcKey
vcValue
A
6
A
11
B
10
C
8
E
2
E
3
H
5
L
7
P
9
T
1
X
12
X
4
Y
13

Wow! Data has sorted according to key column of the clustered index. Finally I got it. Default order of selection in sql server is key columns of a clustered index.

Now I am creating a non-clustered index on the column vcValue:

CREATE NONCLUSTERED INDEX NCI_Value ON tblKeyValue(vcValue)

Now check the default order of selection:

SELECT
     LEFT(vcKey,1) AS vcKey,
     vcValue
FROM tblKeyValue

vcKey
vcValue
T
1
E
2
E
3
X
4
H
5
A
6
L
7
C
8
P
9
B
10
A
11
X
12
Y
13

Now it has sorted according to key column of non-clustered index. Its means that time I was wrong: "Default order of selection in sql server is key columns of a clustered index ".

A table can have more than one clustered index. It is very difficult to find out our query is using which non-clustered index?   

Note: Also there are some situations in which a query is using any index but default order of selection will not be according to the key column of that index.

This question is still open:

How can we assure for default order of selection?

Conclusion:

There is not any concept of default order of selection in sql server. We can only assure that data will be selected according specific order of the columns only by using ORDER BY clause. For example:

SELECT
     LEFT(vcKey,1) AS vcKey,
     vcValue
FROM tblKeyValue

3 comments:

  1. Anonymous1/20/2014

    good article, thank you.

    ReplyDelete
  2. Anonymous8/08/2014

    Nice Article

    ReplyDelete
  3. Anonymous9/30/2014

    Really Awesum article boss....Thanks ...for the detail explanation

    ReplyDelete