This post explains how to perform
search data from table just like Google search engine in SQL Server which
includes at least:
1.
Correct spelling (typo mistakes)
2.
Rank result on the basis of nearest match
3.
Ignore noise words. For examples, special characters, articles (a, an, the)
etc.
4.
Expand search considering verb forms (go, went, gone...).
Step 1: Creating a table
on which we have to perform search
CREATE TABLE University (
UniversityID
BIGINT IDENTITY
,UniversityName VARCHAR(2000)
CONSTRAINT PK_UniversityID PRIMARY
KEY (UniversityID)
)
Step 2: Populating some records into it:
INSERT INTO University
VALUES ('Massachusetts Institute of Technology (MIT)')
,('University of Cambridge')
,('Imperial College London')
,('Harvard University')
,('UCL (University College
London)')
,('Stanford University')
,('California Institute of
Technology (Caltech)')
,('Princeton University')
,('Yale University')
Step 3: For this full text index
must be installed in SQL Server. To verify:
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled')
Output should be one.
Step 4: Creating full text catalog
and create full text index on search field of university table:
CREATE FULLTEXT CATALOG [Cat_University_Search]
WITH ACCENT_SENSITIVITY
= OFF AS DEFAULT
CREATE FULLTEXT INDEX ON [dbo].[University] ([UniversityName])
KEY INDEX PK_UniversityID ON ([Cat_University_Search])
WITH (
CHANGE_TRACKING = AUTO
,STOPLIST = SYSTEM
)
Step 5: Creating a utility
function which will be used to correct spelling latter:
CREATE FUNCTION [dbo].[Similar] (
@Text1 VARCHAR(50)
,@Text2 VARCHAR(50)
)
RETURNS SMALLINT
AS
BEGIN
DECLARE @Counter AS TINYINT = 1
DECLARE @Score AS SMALLINT = 0
DECLARE @Char AS CHAR(1) = SUBSTRING(@Text1, @Counter, 1)
WHILE @Char <> ''
BEGIN
IF @Char = SUBSTRING(@Text2, @Counter, 1)
IF @Counter = 1
SET @Score += 5
ELSE
SET @Score += 4
ELSE IF @Char = SUBSTRING(@Text2, @Counter - 1, 1)
OR @Char = SUBSTRING(@Text2, @Counter + 1, 1)
SET @Score += 2
ELSE
SET @Score -= 1
SET @Counter += 1
SET @Char = SUBSTRING(@Text1, @Counter, 1)
END
IF LEN(@Text2) = LEN(@Text1)
SET @Score += 2
ELSE
SET @Score = @Score - ABS((LEN(@Text1) - LEN(@Text2)))
RETURN @Score
END
Step 6: Creating final table
valued function to perform search like Google:
CREATE FUNCTION [dbo].[AdvanceSearch] (@SearchText AS VARCHAR(500))
RETURNS @Result TABLE (
UniversityID
BIGINT
,UniversityName VARCHAR(500)
,SortOrder INT
)
AS
BEGIN
DECLARE @Term AS VARCHAR(100)
DECLARE @Index AS INT
DECLARE @MaxPoint AS INT
DECLARE @CorrectSearchText AS
VARCHAR(500)
WHILE LEN(@SearchText) > 0
BEGIN
SET @Index = CASE
WHEN CHARINDEX(' ', @SearchText) = 0
THEN LEN(@SearchText)
ELSE CHARINDEX(' ', @SearchText)
END
SET @Term = LEFT(@SearchText,
@Index - 1)
SET @SearchText = STUFF(@SearchText, 1, @Index, '')
IF EXISTS (
SELECT *
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('University'))
WHERE display_term =
@Term
)
SET @CorrectSearchText =
ISNULL(@CorrectSearchText
+ ' ', '') + @Term
ELSE IF LEN(@Term) < 3
OR EXISTS (
SELECT *
FROM sys.fulltext_system_stopwords
WHERE language_id =
1033
AND stopword = @Term
)
CONTINUE
ELSE
BEGIN
SELECT TOP (3) @MaxPoint = CASE
WHEN @MaxPoint IS NULL
THEN dbo.[Similar](display_term, @Term)
ELSE @MaxPoint
END
,@CorrectSearchText =
ISNULL(@CorrectSearchText
+ ' ', '') + CASE
WHEN dbo.[Similar](display_term, @Term) > (ISNULL(@MaxPoint, 0) * 0.8)
THEN display_term
ELSE ''
END
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('University'))
WHERE LEN(display_term) BETWEEN LEN(@Term) - 2
AND LEN(@Term) + 2
AND DIFFERENCE(display_term, @Term) > 2
ORDER BY dbo.[Similar](display_term, @Term) DESC
END
END
INSERT INTO @Result
SELECT TOP (5) U.UniversityID
,U.UniversityName
,FTT.[Rank]
FROM University U
INNER JOIN FREETEXTTABLE(University, UniversityName,
@CorrectSearchText) FTT ON U.UniversityID = FTT.[Key]
RETURN
END
Now we have done it. To test it:
SELECT *
FROM dbo.[AdvanceSearch]('Londo Impeiral Collage')
ORDER BY SortOrder DESC
Output:
UniversityID
|
UniversityName
|
SortOrder
|
3
|
Imperial College London
|
98
|
5
|
UCL (University College London)
|
54
|