Jan 6, 2012

What is non clustered index in sql server with examples

What is a non clustered index in SQL server?


Consider on this query:

SELECT * FROM Student WHERE Age = 18

Our data has sorted on column RollNo not on Age :( How to search the students in the table tblSudent (clustered index) for a given age efficiently? We got the same problem as we had got for RollNo in the Heap. If a problem is same then a solution should be same that is store the data of the table tblSudent in the sorted order of column Age. But this time problem is a little complex. If we will try to store data in the sorted order of column Age then sorting on column RollNo will disturb.

We need data to be stored on the basis of column RollNo for query:

SELECT * FROM Student WHERE RollNo = 111

We also need data to be stored on the basis of column Age for query:

SELECT * FROM Student WHERE Age = 18

We want both!! How is it possible? Any object can have only one physical organization.  Then what is a solution? I'm telling what SQL server does. It creates a copy of the table and stored data in a sorted order of column Age in the copied table and it doesn't change anything in tblSudent(clustered index). This new copied table is called non-clustered index.  Let's assume the name of the copied table is tblStdentNCI_1 then copied or subset table will look like something:

tblStudent (Clustered index): Sorted on RollNo
RollNo
Name
Country
Age
101
Greg
UK
23
102
Rahul
India
27
103
Scott
USA
31
107
Miyabi
China
18
108
Marry
Russia
27
109
Akaram
Pakistan
22
110
Benazir
Banglades
17
111
Miyabi
Japan
24
112
Sachin
India
21

tblStdentNCI_1 (Non-clustered index): Sorted on Age

RollNo
Age
110
17
107
18
112
21
109
22
101
23
111
24
108
27
102
27
113
31


FAQ about no-clustered index:

Q. What is a non-clustered index?
Answer: It is a copy of clustered index table sorted on some columns. It is a subset of clustered since we can add some or all columns of the clustered index table in the non-clustered index tables.

Q. A table can have how many non-clustered indexes?
Answer: We can create any number of copies of a table. So a table (more accurately clustered index) can have any number of non-clustered indexes. But there is some max limitation. For example in SQL server 2012, 999 non-clustered indexes can be created per table.
   
Q. Does all non-clustered indexes occupy disk spaces?
Answer: Yes

Q. What will happen if data is modified in the base table (clustered index)?
Answer:
If data is inserted, deleted or updated in clustered index same data is also inserted, deleted or updated in all non-clustered indexes.  There is no need to worry about data synchronization between a clustered index and non-clustered indexes since SQL server takes this responsibility.

Q. Does non-clustered indexes are good or bad?

Answer: It may good for select statements or it may bad for insert, delete and update statement  

Non-clustered index in depth:

It is a logical organization of data in a table. A table can have 1023 non-clustered index. A non-clustered index can be of two types as:

1. Heap
2. Based on the clustered index.

If a table has clustered index then leaf node of non-clustered index keeps the key columns of the clustered index. If the table has no clustered index then leaf node of non-clustered index keeps RID which is unique to each row of a table. Such non-clustered index is called heap.

Non-clustered index based on the clustered index: 

Description of above image:

It is the B-tree representation of non-clustered index NCI_Student_Age where lower section of dotted line is also B-tree of clustered index CL_Student_Roll. The top node of non-clustered index is called root node and bottom nodes are called leaf node. All the nodes in between root node and leaf nodes are called intermediate nodes. A B-tree can have many intermediate nodes in case table has too many records. In this example, there is no intermediate node and we have assumed the size of each node is four i.e. each node can keep maximum five Age.

Important points about non-clustered index based on the clustered index:

1. All the nodes keep key columns of a non-clustered index and key column of the unique clustered index.
2. If a clustered index is not unique then instead of a key column of the clustered index it keeps a unique row identifier of the actual data rows of a table.
3. Apart from points above, a leaf node also keeps the included columns of the non-clustered index. (We will discuss included column later) 
Now if we will execute following SQL query:

SELECT * FROM Student WHERE RollNo = 111

Since Student table has clustered index, now it will go for index scan instead of table scan to search the RollNo = 111. A scan will start from the root node in the clustered index CL_Student_Roll.

SELECT Country 
FROM Student WITH (INDEX (NCI_Student_Age))
WHERE Age = 31

Note: Here we have added WITH clause only because SQL server uses the non-clustered index NCI_Student_Age,  otherwise it will not use a non-clustered index or will use if a table has too many records. We will discuss why is it so later.  

Execution plan of above query will look like this:


Meaning of above execution plan:

SQL server will first use non-clustered index NCI_Student_Age and find out the corresponding value of key column i.e. RollNumber of the clustered index (It is called key lookup) using a nested loop and find out the value of column Country.

Scanning steps:



Scan 1: 22 < 31 so it will go to next element of a root node.

Scan 2: 24 < 31 so it will go to the next element of a root node. Since there is not any other element in root node so, it will follow right most link or path.
Scan 3: In leaf node, 24 < 31 o it will go to next element of a leaf node.
Scan 4: In leaf node, 24 < 31 o it will go to next element of a leaf node.

Scan 5: In leaf node, 27 < 31 o it will go to next element of a leaf node.
Scan 6: In leaf node, 27 < 31 o it will go to next element of a leaf node.
Scan 7: In leaf node, 31 = 31. It will get value key column of non clustered index i.e. RollNo which is equal to 109.
Scan 9: From here it will use clustered index CL_Student_Roll to get the values of column Country name of RollNo = 31. In root node of clustered index 107 < 109 so, it will go to next element of a root node.
Scan 10: 110 > 109 so it will go through this link.
Scan 11: In leaf node, 108 < 109 so, it will go to next element of a leaf node.
Scan 12: In leaf node, 107 < 109 so, it will go to next element of a leaf node.
Scan 13: In leaf node, 109 = 109 so it will follow this path and get the actual data that is a value of column Country which is the USA.

Simple syntax of creating a non-clustered index

CREATE [UNIQUE] NONCLUSTERED INDEX <IndexName>
    ON <Table_or_view_name>(<ColumnName_Key>[ASC|DESC] [ ,...n ])
    [INCLUDE (<ColumnName_Include> [,...n])]

<IndexName>: It is any valid name of the index.
<Table_or_view_name>: It is the name of the table or view on which we want to create the non-clustered index.
<ColumnName_Key>: It is the name of the key columns for clustered index separated by a comma. Values of these columns are present in each node of a clustered index. It has following restrictions:

1. We can specify maximum 16 column names.
2. Sum of the size of the columns cannot be more than 900 bytes.
3. All columns must belong to the same table.
4. Data type of columns cannot be ntext, text, varchar (max), nvarchar (max), varbinary (max), xml, or image
5. It cannot be a nondeterministic computed column.

<ColumnName_Include>: Due to above restrictions on <ColumnName_Key> columns, we can include more columns of table or view in the INCLUDE section. Date of include columns is only present in the leaf node of a clustered index. It has also some restrictions:

1. The data type of columns cannot be text, ntext, and image.
2.  A column cannot be present in key columns as well as can not include column at the same time. Also, repetition of columns is not allowed.

Good practice: If possible we should try to add all columns of WHERE clause in key columns list and all columns of a SELECT clause in the INCLUDE columns of a non-clustered index.

Covering of queries:

If we include all the columns of a SELECT statement either as key columns or INCLUDE columns in a non-clustered index such queries are called covered query. For example, consider on a query:

SELECT Name,Country
FROM Student
WHERE RollNo = 108 AND Age = 27

Non-clustered index 1:

CREATE NONCLUSTERED INDEX NCI_1
ON Student (RollNo,Age)

This index doesn't cover the above SELECT statement.

Non-clustered index 2:

CREATE NONCLUSTERED INDEX NCI_2
ON Student (RollNo,Age,Name)

This index doesn't cover the above SELECT statement.

Non-clustered index 3:

CREATE NONCLUSTERED INDEX NCI_3
ON Student (RollNo,Age,Name,Country)

This index covers the above SELECT statement.

Non-clustered index 4:

CREATE NONCLUSTERED INDEX NCI_4
ON Student (RollNo,Age)
INCLUDE (Name,Country)

This index covers the above SELECT statement. Now consider on the last index we have created that is

CREATE NONCLUSTERED INDEX NCI_Student_Age
ON Student (Age)

This index doesn't cover the SQL query:

SELECT Country 
FROM Student
WHERE Age = 31

Due to which SQL server doesn't use the index NCI_Student_Age. Execution plan of this query will look like:


SQL server follows Index scan which is same as table scan and it is a slow process.
If we will use forcefully non-clustered index NCI_Student_Age as

SELECT Country 
FROM Student WITH (INDEX (NCI_Student_Age))
WHERE Age = 31

Its execution plan will look like:




Here the problem is it is partiality index seek a process that is, to get a value of column Country it has to go through (key lookup) the clustered index CL_Student_Roll since non-clustered index doesn't keep the value of column Country. We can optimize this query by creating a new index which covers the SELECT statement.

CREATE NONCLUSTERED INDEX NCI_Student_Country
ON Student (Age)
INCLUDE (Country)

After creating this index, execution plan will look like:


Now there is no key look up and it is fully index seek. Structures of index NCI_Student_Country will look like this:


We can observe key columns of clustered index keep all the nodes while INCLUDE columns are only kept by the leaf node. To perform this query:

SELECT Country 
FROM Student
WHERE Age = 31

SQL server will not use clustered index since data of Country columns are present in non-clustered index NCI_Student_Country. So, there is no need of key lookup.

Non-clustered index based on the heap:

A table without any clustered index is called heap. It is possible to create a non-clustered index on such tables which do not have any clustered index.  The main difference between non-clustered index on clustered index and non-clustered index on the heap is, it keeps the RID (i.e. Row ID) in the leaf node of the non-clustered index instead of a key column of a clustered index which is a pointer to the actual data rows. Suppose we have a Student table which does not have any clustered index. Now we want to create a non-clustered index:

CREATE INDEX NCI_Heap
ON Student (Age)

If we will check the execution plan of this query:

SELECT Country 
FROM Student WITH (INDEX (NCI_Heap)) 
WHERE Age = 31

It will look something like this:




From execution plan, we can observe that it is using RID lookup instead of key lookup.

Note: RID of each row of a table is generated by SQL server using file identifier, page number and number of the row on the page etc.

Organizations of a non-clustered index on the heap will look something like this:


We can remove the RID look up in the execution plan by covering the query in a non-clustered index on the heap that is:

CREATE INDEX NCI_Heap_Country
ON Student (Age)
INCLUDE (Country)

Now execution plan of query:


SELECT Country FROM Student WHERE Age = 31

10 comments:

  1. Anonymous6/06/2012

    it's very nice

    ReplyDelete
  2. Anonymous7/24/2013

    Many Thanks, It is indeed exact Help.

    ReplyDelete
  3. Anonymous9/18/2013

    Nice Post very useful

    ReplyDelete
  4. Anonymous9/20/2013

    Thanks a lot....its really helpful...thanks

    ReplyDelete
  5. Anonymous10/22/2013

    nice explanation

    ReplyDelete
  6. Exact & Expected Answers. Thanks. Please continue writing ..

    ReplyDelete
  7. really appreciated blog.
    many many thanks.....

    ReplyDelete
  8. No 1 blog ever seen...

    ReplyDelete
  9. Thank you. Its really helped me a lot

    ReplyDelete
  10. Hi...Excellent blog about how clustered & non clustered indexes behave and their relationships. It would be great if you could explain how included columns are used with non-clustered indexes and how they differ to covering index.
    Many Thanks..

    ReplyDelete