Feb 1, 2013

FIND_IN_SET not using index with comma delimited string in Mysql

In Mysql it is very difficult to optimize if filter key column is in a comma delimited string. We can filter the records by using FIND_IN_SET function but issue with this is, field which is used in this function are not used by the index. In this post we will discuss how to force Mysql to use index with csv data in a tricky way.  First of all I'm explaining the issue by an example:

Let's assume we have table say tblEmp.

CREATE TABLE tblEmployee(
     ntEmpID BIGINT PRIMARY KEY AUTO_INCREMENT,
     vcEmpName VARCHAR(100),
     ntAge INT,
     dtDateOfJoin DATETIME
);

Now I'm going to insert some records into it:

INSERT INTO tblEmployee(vcEmpName,ntAge,dtDateOfJoin)
SELECT
CONCAT('Emp_' ,A.ID),
40 - A.Sortlen,
NOW()
FROM INFORMATION_SCHEMA.COLLATIONS A,
     INFORMATION_SCHEMA.COLLATIONS B,
     INFORMATION_SCHEMA.COLLATIONS C
LIMIT 1000000;

Now I would like to filter the records from tblEmployee while ntEmpID are in a comma delimited string:

SELECT *
FROM tblEmployee
WHERE FIND_IN_SET(ntEmpID,'500000,500001,500002,500003,500004,500005,500006,500007,500008,500009,500010,500011,500012,500013,500014,500015,500016,500017,500018,500019,500020,500021,500022,500023,500024,500025,500026,500027,500028,500029,500030,500031,500032,500033,500034,500035,500036,500037,500038,500039,500040,500041,500042,500043,500044,500045,500046,500047,500048,500049,500050,500051,500052,500053,500054,500055,500056,500057,500058,500059,500060,500061,500062,500063,500064,500065,500066,500067,500068,500069,500070,500071,500072,500073,500074,500075,500076,500077,500078,500079,500080,500081,500082,500083,500084,500085,500086,500087,500088,500089,500090,500091,500092,500093,500094,500095,500096,500097,500098,500099,500100');

When I executed above query in my local system it took around 7 seconds.

While if same value is passed in the IN clause, for example:

SELECT *
FROM tblEmployee
WHERE ntEmpID IN (500000,500001,500002,500003,500004,500005,500006,500007,500008,500009,500010,500011,500012,500013,500014,500015,500016,500017,500018,500019,500020,500021,500022,500023,500024,500025,500026,500027,500028,500029,500030,500031,500032,500033,500034,500035,500036,500037,500038,500039,500040,500041,500042,500043,500044,500045,500046,500047,500048,500049,500050,500051,500052,500053,500054,500055,500056,500057,500058,500059,500060,500061,500062,500063,500064,500065,500066,500067,500068,500069,500070,500071,500072,500073,500074,500075,500076,500077,500078,500079,500080,500081,500082,500083,500084,500085,500086,500087,500088,500089,500090,500091,500092,500093,500094,500095,500096,500097,500098,500099,500100);

It is taking only 0.016 seconds in my local system.  Here there is significant difference in the execution time for same requirement. But problem is we cannot use IN clause since our data is in csv string format :(

Why there is so difference. I now we are comparing the query execution plan by using explain statement:

Explain of first query (Using FIND_IN_SET):
ID
Select_Type
Table
Type
Key
Key_Len
Ref
Rows
Extra
1
Simple
tblEmployee
All
997906
Using Where

Explain of second query (Using IN):
ID
Select_Type
Table
Type
Key
Key_Len
Ref
Rows
Extra
1
Simple
tblEmployee
Range
Primary
8
101
Using Where
    
We can observe the difference. First query has to check 997906 rows while second query has to check only 101 rows to fetch only 101 records. This difference is due to second query is using PRIMARY index that is clustered index(Its key is ntEmpID) while first query is not able to.

After some effort I found out a trick to use index while filter data is in a comma delimited string.

Step 1: Create a table. Say tblIndex:

CREATE TABLE tblIndex(
ntIndex INTEGER PRIMARY KEY AUTO_INCREMENT
);

Step 2: Insert some records into it according to max possible keys in the comma delimited string. In My case comma delimited keys string cannot be more than 400.  

INSERT INTO tblIndex(ntIndex)
SELECT NULL
FROM INFORMATION_SCHEMA.COLLATIONS A,
     INFORMATION_SCHEMA.COLLATIONS B  
LIMIT 400;

Step 3: Write same query in the following way:

SELECT E.*
FROM tblEmployee E
INNER JOIN tblIndex
ON ntEmpID = (SUBSTRING_INDEX(SUBSTRING_INDEX(@vcKeyValue, ',', ntIndex), ',', -1))
WHERE ntIndex <= + LENGTH(@vcKeyValue) - LENGTH(REPLACE(@vcKeyValue,',',''));

When I executed this query in my local system it took only 0.032 seconds :)

It has solved my problem.  Now check the explain statement of this query:

ID
Select_Type
Table
Type
Key
Key_Len
Ref
Rows
Extra
1
Simple
TblIndex
Range
Primary
4
NULL
102
Using Where;Using Index
1
Simple
tblEmployee
Eq_Ref
Primary
8
Func
1
Using Where

Now query is using Primary index!!!

14 comments:

  1. it is good tutorials

    ReplyDelete
  2. Awesome blog thanks for sharing While choosing your perfect ride for driving, Accord Cars comes with and the best packages for you to pick from. Car rentals for self drive in Chennai are done the easier. Just pick out your plan from hourly, daily, weekly and even monthly plans available.

    ReplyDelete
  3. Very interesting blog thanks for sharing At Pearl’s - The best Bridal Makeup Parlour in Chennai, we take personal responsibility in making sure that you look as flawless and beautiful and the marriage that you have been dreaming of. With around 16,000 successful brides in our books, you can be confident that we know our art intimately and deep.

    ReplyDelete
  4. Great blog thanks for sharing Looking for the best creative agency to fuel new brand ideas? Adhuntt Media is not just a digital marketing company in chennai. We specialize in revamping your brand identity to drive in best traffic that converts.

    ReplyDelete
  5. Nice blog thanks for sharing Is this a special day for you? Beautiful and fragrant flowers are sure to make it even more amazing of a day no doubt. This is why Karuna Nursery Gardens offers you the best rental plants in Chennai that too at drop dead prices.

    ReplyDelete
  6. Excellent blog thanks for sharing Run your salon business successfully by tying up with the best beauty shop in Chennai - The Pixies Beauty Shop. With tons of prestigious brands to choose from, and amazing offers we’ll have you amazed.

    ReplyDelete
  7. Awesome blog thankks for sharing 100% virgin Remy Hair Extension in USA, importing from India. Premium and original human hair without joints and bondings. Available in Wigs, Frontal, Wavy, Closure, Bundle, Curly, straight and customized color hairstyles Extensions.

    ReplyDelete
  8. Very useful blog thanks for sharing IndPac India the German technology Packaging and sealing machines in India is the leading manufacturer and exporter of Packing Machines in India.

    ReplyDelete