In MySQL it is very difficult to optimize a query when the filter key column is matched against a comma delimited string. We can filter the records using the FIND_IN_SET function, but the issue is that a column used inside this function cannot use an index. In this post we will discuss a trick to force MySQL to use an index with CSV data. First, let me explain the issue with an example:
Let's assume we have a table, say tblEmployee:
CREATE TABLE tblEmployee(
ntEmpID BIGINT PRIMARY KEY AUTO_INCREMENT,
vcEmpName VARCHAR(100),
ntAge INT,
dtDateOfJoin DATETIME
);
Now I'm going to insert some records into it:
-- Cross joining INFORMATION_SCHEMA.COLLATIONS with itself generates
-- enough source rows to insert one million sample employees.
INSERT INTO tblEmployee(vcEmpName, ntAge, dtDateOfJoin)
SELECT
    CONCAT('Emp_', A.ID),
    40 - A.Sortlen,
    NOW()
FROM INFORMATION_SCHEMA.COLLATIONS A,
     INFORMATION_SCHEMA.COLLATIONS B,
     INFORMATION_SCHEMA.COLLATIONS C
LIMIT 1000000;
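As a quick sanity check (the cross join of COLLATIONS with itself produces far more source rows than the LIMIT), the table should now contain one million rows:

SELECT COUNT(*) FROM tblEmployee;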
Now I would like to filter the records from tblEmployee where ntEmpID is in a comma delimited string:
SELECT *
FROM tblEmployee
WHERE FIND_IN_SET(ntEmpID,'500000,500001,500002,500003,500004,500005,500006,500007,500008,500009,500010,500011,500012,500013,500014,500015,500016,500017,500018,500019,500020,500021,500022,500023,500024,500025,500026,500027,500028,500029,500030,500031,500032,500033,500034,500035,500036,500037,500038,500039,500040,500041,500042,500043,500044,500045,500046,500047,500048,500049,500050,500051,500052,500053,500054,500055,500056,500057,500058,500059,500060,500061,500062,500063,500064,500065,500066,500067,500068,500069,500070,500071,500072,500073,500074,500075,500076,500077,500078,500079,500080,500081,500082,500083,500084,500085,500086,500087,500088,500089,500090,500091,500092,500093,500094,500095,500096,500097,500098,500099,500100');
When I executed the above query on my local system it took around 7 seconds.
Whereas if the same values are passed in an IN clause, for example:
SELECT *
FROM tblEmployee
WHERE ntEmpID IN (500000,500001,500002,500003,500004,500005,500006,500007,500008,500009,500010,500011,500012,500013,500014,500015,500016,500017,500018,500019,500020,500021,500022,500023,500024,500025,500026,500027,500028,500029,500030,500031,500032,500033,500034,500035,500036,500037,500038,500039,500040,500041,500042,500043,500044,500045,500046,500047,500048,500049,500050,500051,500052,500053,500054,500055,500056,500057,500058,500059,500060,500061,500062,500063,500064,500065,500066,500067,500068,500069,500070,500071,500072,500073,500074,500075,500076,500077,500078,500079,500080,500081,500082,500083,500084,500085,500086,500087,500088,500089,500090,500091,500092,500093,500094,500095,500096,500097,500098,500099,500100);
It takes only 0.016 seconds on my local system. There is a significant difference in execution time for the same requirement, but the problem is that we cannot use the IN clause since our data is in CSV string format :(
Why is there such a difference? Let's compare the query execution plans using the EXPLAIN statement:
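For example, the plans below were captured by prefixing each query with EXPLAIN (the ID lists are shortened here for readability; the row estimates in the output come from the full 101-value lists used above):

EXPLAIN
SELECT *
FROM tblEmployee
WHERE FIND_IN_SET(ntEmpID, '500000,500001,500002');

EXPLAIN
SELECT *
FROM tblEmployee
WHERE ntEmpID IN (500000, 500001, 500002);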
Explain of the first query (using FIND_IN_SET):

ID | Select_Type | Table       | Type | Key  | Key_Len | Ref  | Rows   | Extra
1  | Simple      | tblEmployee | All  | NULL | NULL    | NULL | 997906 | Using Where
Explain of the second query (using IN):

ID | Select_Type | Table       | Type  | Key     | Key_Len | Ref  | Rows | Extra
1  | Simple      | tblEmployee | Range | Primary | 8       | NULL | 101  | Using Where
We can observe the difference: the first query has to examine 997906 rows while the second query has to examine only 101 rows to fetch 101 records. This is because the second query uses the PRIMARY index, which is the clustered index (its key is ntEmpID), while the first query cannot, since ntEmpID is wrapped inside the FIND_IN_SET function.
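For reference, the indexes available on the table (including the PRIMARY key on ntEmpID mentioned above) can be listed with:

SHOW INDEX FROM tblEmployee;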
After some effort I found a trick to use the index while the filter data is in a comma delimited string.
Step 1: Create a table, say tblIndex:
CREATE TABLE tblIndex(
ntIndex INTEGER PRIMARY KEY AUTO_INCREMENT
);
Step 2: Insert records into it according to the maximum possible number of keys in the comma delimited string. In my case the comma delimited key string cannot contain more than 400 keys.
-- The cross join only supplies enough source rows;
-- the AUTO_INCREMENT column generates the values 1..400.
INSERT INTO tblIndex(ntIndex)
SELECT NULL
FROM INFORMATION_SCHEMA.COLLATIONS A,
     INFORMATION_SCHEMA.COLLATIONS B
LIMIT 400;
Step 3: Write the same query in the following way, where the user variable @vcKeyValue holds the comma delimited ID string used in the first query.
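For example (the list is shortened here; in practice the variable holds the full string passed to FIND_IN_SET above):

SET @vcKeyValue = '500000,500001,500002,500003,500004';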
SELECT E.*
FROM tblEmployee E
INNER JOIN tblIndex
ON ntEmpID = (SUBSTRING_INDEX(SUBSTRING_INDEX(@vcKeyValue, ',', ntIndex), ',', -1))
WHERE ntIndex <= 1 + LENGTH(@vcKeyValue) - LENGTH(REPLACE(@vcKeyValue,',',''));
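To see what the two expressions in this query do, here they are evaluated on a short sample string (the string below is only an illustration):

-- The nested SUBSTRING_INDEX call extracts the ntIndex-th item;
-- for ntIndex = 3 this returns '500002':
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('500000,500001,500002,500003', ',', 3), ',', -1);

-- The WHERE clause counts the items: number of commas + 1, i.e. 4 here:
SELECT 1 + LENGTH('500000,500001,500002,500003')
         - LENGTH(REPLACE('500000,500001,500002,500003', ',', ''));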
When I executed this query on my local system it took only 0.032 seconds :) It solved my problem. Now check the EXPLAIN output of this query:
ID | Select_Type | Table       | Type   | Key     | Key_Len | Ref  | Rows | Extra
1  | Simple      | tblIndex    | Range  | Primary | 4       | NULL | 102  | Using Where; Using Index
1  | Simple      | tblEmployee | Eq_Ref | Primary | 8       | Func | 1    | Using Where
Now the query is using the PRIMARY index!!!