Nov 9, 2015

Google like advanced search query spelling mistake implement in SQL Server

This post explains how to perform search data from table just like Google search engine in SQL Server which includes at least:

1. Correct spelling (typo mistakes)
2. Rank result on the basis of nearest match
3. Ignore noise words. For examples, special characters, articles (a, an, the) etc.
4. Expand search considering verb forms (go, went, gone...).

Step 1:  Creating a table on which we have to perform search
   
CREATE TABLE University (
       UniversityID BIGINT IDENTITY
       ,UniversityName VARCHAR(2000)
       CONSTRAINT PK_UniversityID PRIMARY KEY (UniversityID)
)

Step 2: Populating some records into it:

INSERT INTO University
VALUES ('Massachusetts Institute of Technology (MIT)')
       ,('University of Cambridge')
       ,('Imperial College London')
       ,('Harvard University')
       ,('UCL (University College London)')
       ,('Stanford University')
       ,('California Institute of Technology (Caltech)')
       ,('Princeton University')
       ,('Yale University')

Step 3: For this full text index must be installed in SQL Server. To verify:

SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled')

Output should be one.
Step 4: Creating full text catalog and create full text index on search field of university table:

CREATE FULLTEXT CATALOG [Cat_University_Search]
       WITH ACCENT_SENSITIVITY = OFF AS DEFAULT

CREATE FULLTEXT INDEX ON [dbo].[University] ([UniversityName])
KEY INDEX PK_UniversityID ON ([Cat_University_Search])
WITH (
       CHANGE_TRACKING = AUTO
       ,STOPLIST = SYSTEM
)

Step 5: Creating a utility function which will be used to correct spelling latter:

CREATE FUNCTION [dbo].[Similar] (
       @Text1 VARCHAR(50)
       ,@Text2 VARCHAR(50)
       )
RETURNS SMALLINT
AS
BEGIN
       DECLARE @Counter AS TINYINT = 1
       DECLARE @Score AS SMALLINT = 0
       DECLARE @Char AS CHAR(1) = SUBSTRING(@Text1, @Counter, 1)

       WHILE @Char <> ''
       BEGIN
              IF @Char = SUBSTRING(@Text2, @Counter, 1)
                     IF @Counter = 1
                           SET @Score += 5
                     ELSE
                           SET @Score += 4
              ELSE IF @Char = SUBSTRING(@Text2, @Counter - 1, 1)
                     OR @Char = SUBSTRING(@Text2, @Counter + 1, 1)
                     SET @Score += 2
              ELSE
                     SET @Score -= 1

              SET @Counter += 1
              SET @Char = SUBSTRING(@Text1, @Counter, 1)
       END

       IF LEN(@Text2) = LEN(@Text1)
              SET @Score += 2
       ELSE
              SET @Score = @Score - ABS((LEN(@Text1) - LEN(@Text2)))

       RETURN @Score
END

Step 6: Creating final table valued function to perform search like Google:

CREATE FUNCTION [dbo].[AdvanceSearch] (@SearchText AS VARCHAR(500))
RETURNS @Result TABLE (
       UniversityID BIGINT
       ,UniversityName VARCHAR(500)
       ,SortOrder INT
       )
AS
BEGIN
       DECLARE @Term AS VARCHAR(100)
       DECLARE @Index AS INT
       DECLARE @MaxPoint AS INT
       DECLARE @CorrectSearchText AS VARCHAR(500)

       WHILE LEN(@SearchText) > 0
       BEGIN
              SET @Index = CASE
                           WHEN CHARINDEX(' ', @SearchText) = 0
                                  THEN LEN(@SearchText)
                           ELSE CHARINDEX(' ', @SearchText)
                           END
              SET @Term = LEFT(@SearchText, @Index - 1)
              SET @SearchText = STUFF(@SearchText, 1, @Index, '')

              IF EXISTS (
                           SELECT *
                           FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('University'))
                           WHERE display_term = @Term
                           )
                     SET @CorrectSearchText = ISNULL(@CorrectSearchText + ' ', '') + @Term
              ELSE IF LEN(@Term) < 3
                     OR EXISTS (
                           SELECT *
                           FROM sys.fulltext_system_stopwords
                           WHERE language_id = 1033
                                  AND stopword = @Term
                           )
                     CONTINUE
              ELSE
              BEGIN
                     SELECT TOP (3) @MaxPoint = CASE
                                  WHEN @MaxPoint IS NULL
                                         THEN dbo.[Similar](display_term, @Term)
                                  ELSE @MaxPoint
                                  END
                           ,@CorrectSearchText = ISNULL(@CorrectSearchText + ' ', '') + CASE
                                  WHEN dbo.[Similar](display_term, @Term) > (ISNULL(@MaxPoint, 0) * 0.8)
                                         THEN display_term
                                  ELSE ''
                                  END
                     FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('University'))
                     WHERE LEN(display_term) BETWEEN LEN(@Term) - 2
                                  AND LEN(@Term) + 2
                           AND DIFFERENCE(display_term, @Term) > 2
                     ORDER BY dbo.[Similar](display_term, @Term) DESC
              END
       END

       INSERT INTO @Result
       SELECT TOP (5) U.UniversityID
              ,U.UniversityName
              ,FTT.[Rank]
       FROM University U
       INNER JOIN FREETEXTTABLE(University, UniversityName, @CorrectSearchText) FTT ON U.UniversityID = FTT.[Key]

       RETURN
END

Now we have done it. To test it:

SELECT *
FROM dbo.[AdvanceSearch]('Londo Impeiral Collage')
ORDER BY SortOrder DESC

Output:

UniversityID
UniversityName
SortOrder
3
Imperial College London
98
5
UCL (University College London)
54

No comments:

Post a Comment