The original text is reproduced from: http://blog.csdn.net/yangyuankp/article/details/8085514
Foreword:
The basic characteristics of the algorithm were explained in detail in the previous posts. After continuous improvement and optimization, the algorithm is now final. This post concludes the series, and the algorithm will no longer be updated.
Since this is the final version, here is a brief summary of the algorithm's properties for the reader's convenience.
Purpose: multi-condition fuzzy matching.
Greedy feature: returns as many records as possible that satisfy the conditions. A record qualifies as soon as it matches one keyword.
Weight feature: keywords carry weights that represent their importance (the earlier a keyword appears, the higher its weight). Records with a higher total weight are returned first, without breaking the greedy feature.
Required-keyword feature: specified keywords can be made mandatory, so that every returned record contains them, without breaking the greedy and weight features.
Typical applications: question-and-answer systems, such as Baidu Zhidao or JD.com's product Q&A.
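The three features combine into one scoring rule, which both scripts below implement: each matching keyword adds 1 to COUNT(*) plus its wordPower, where the i-th of n keywords weighs n - i + 1. Here is a minimal in-memory sketch in Python. `Keyword` and `super_like` are hypothetical names that are not part of the SQL, and Python's case-sensitive `in` only approximates `LIKE '%word%'`:

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    text: str
    required: bool = False  # corresponds to [word] in the procedures' @words argument


def super_like(records: dict[int, str], keywords: list[Keyword], top: int) -> list[tuple[int, int]]:
    """In-memory model of the ranking: returns (key, finalPower) pairs, best first."""
    n = len(keywords)
    scores: dict[int, int] = {}
    for key, content in records.items():
        # Required-keyword feature: skip records that lack any required word.
        if any(k.required and k.text not in content for k in keywords):
            continue
        matched = 0  # COUNT(*) in the SQL
        power = 0    # SUM(wordPower) in the SQL
        for i, k in enumerate(keywords, start=1):
            if k.text in content:  # stands in for content LIKE '%word%' (case-sensitive here)
                matched += 1
                power += n - i + 1  # weight feature: earlier keywords weigh more
        # Greedy feature: a single matching keyword is enough to be returned.
        if matched:
            scores[key] = matched + power
    # sorted() is stable, so ties keep input order; SQL's ORDER BY leaves tie order unspecified.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


records = {1: "iron oak", 2: "oak cedar", 3: "iron", 4: "cedar"}
print(super_like(records, [Keyword("iron", True), Keyword("oak"), Keyword("cedar")], 20))
# [(1, 7), (3, 4)]
```

Here "iron oak" scores 2 + (3 + 2) = 7 and "iron" scores 1 + 3 = 4; the other two records lack the required word. Without the required flag they would also be returned, with "oak cedar" at 2 + (2 + 1) = 5 and "cedar" at 1 + 1 = 2, which is the greedy feature at work.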
The final solution comes in two versions. Both are packaged as stored procedures and functions that can be imported directly into the database.
Normal version:
Description: implemented with SQL LIKE statements. It is easy to use, but because it relies on LIKE '%keyword%' matching, it is not suitable for large data volumes. Specifying required words speeds up processing, because the table is first filtered down to the rows that contain every required word (a temporary table), and the keyword matching then runs only on those rows.
Scope of use: data volumes on the order of 10,000 records. Beyond that, queries become slow.
How to use: run the script directly in Query Analyzer to import it into the database.
Call example: execute proc_Common_SuperLike 'id','t_test','content','20','|','[i]|o|c'
Parameter description: id is the name of the table's primary key field. t_test is the table name. content is the name of the field to match against. 20 returns 20 records, ordered from best match to worst. | is the keyword delimiter. [i]|o|c contains three keywords, i, o and c, separated by |; the brackets mark i as a required word.
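Both scripts decode this keyword argument with the helper functions Get_StrArrayLength and Get_StrArrayStrOfIndex. The Python port below walks through the decoding that the normal version performs on `[i]|o|c`. The function names are hypothetical ports, 0-based `str.find` replaces the 1-based `charindex`, and `strip(" ")` stands in for `ltrim(rtrim(...))`:

```python
def get_str_array_length(s: str, split: str) -> int:
    """Port of Get_StrArrayLength: number of separator occurrences + 1."""
    s = s.strip(" ")  # ltrim(rtrim(@str)) removes spaces only
    length = 1
    location = s.find(split)
    while location != -1:
        location = s.find(split, location + 1)
        length += 1
    return length


def get_str_array_str_of_index(s: str, split: str, index: int) -> str:
    """Port of Get_StrArrayStrOfIndex: the element at 1-based position `index`.

    As in the SQL, an index past the end returns the last element.
    """
    s = s.strip(" ")
    start = 0
    nxt = 1
    location = s.find(split)
    while location != -1 and index > nxt:
        start = location + len(split)
        location = s.find(split, start)
        nxt += 1
    if location == -1:
        location = len(s)  # treat the end of the string as if a separator followed it
    return s[start:location]


def parse_keywords(words: str, split: str) -> tuple[list[str], list[tuple[str, int]]]:
    """Decode @words as the normal version does: (required words, [(keyword, wordPower)])."""
    required = []
    n = get_str_array_length(words, "[")
    while n > 1:  # walks the [ ] groups from last to first
        group = get_str_array_str_of_index(words, "[", n)
        required.append(get_str_array_str_of_index(group, "]", 1))
        n -= 1
    plain = words.replace("[", "").replace("]", "")
    count = get_str_array_length(plain, split)
    weighted = [
        (get_str_array_str_of_index(plain, split, i), count - i + 1)  # wordPower
        for i in range(1, count + 1)
    ]
    return required, weighted


print(parse_keywords("[i]|o|c", "|"))
# (['i'], [('i', 3), ('o', 2), ('c', 1)])
```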
GO
CREATE function Get_StrArrayLength
(
    @str varchar(1024),  -- string to split
    @split varchar(10)   -- separator character
)
returns int
as
begin
    declare @location int
    declare @start int
    declare @length int
    set @str=ltrim(rtrim(@str))
    set @location=charindex(@split,@str)
    set @length=1
    while @location<>0
    begin
        set @start=@location+1
        set @location=charindex(@split,@str,@start)
        set @length=@length+1
    end
    return @length
end
GO
CREATE function Get_StrArrayStrOfIndex
(
    @str varchar(1024),  -- string to split
    @split varchar(10),  -- separator
    @index int           -- 1-based position of the element to return
)
returns varchar(1024)
as
begin
    declare @location int
    declare @start int
    declare @next int
    declare @seed int
    set @str=ltrim(rtrim(@str))
    set @start=1
    set @next=1
    set @seed=len(@split)
    set @location=charindex(@split,@str)
    while @location<>0 and @index>@next
    begin
        set @start=@location+@seed
        set @location=charindex(@split,@str,@start)
        set @next=@next+1
    end
    --@location is 0 in two cases: 1. the string contains no separator; 2. the loop moved past the last separator.
    --In both cases, treat the end of the string as if a separator followed it.
    if @location=0 select @location=len(@str)+1
    return substring(@str,@start,@location-@start)
end
GO
CREATE PROCEDURE proc_Common_SuperLike
    --The name of the primary key field of the table to be queried
    @primaryKeyName varchar(999),
    --The name of the table to query
    @talbeName varchar(999),
    --The name of the field that holds the content to match
    @contentFieldName varchar(999),
    --The number of records to return (TOP n); records that match more keywords rank higher
    @selectNumber varchar(999),
    --The delimiter between keywords
    @splitString varchar(999),
    --The keyword string; wrap required words in [ ]
    @words varchar(999)
AS
--The dynamic SQL is held in varchar(8000): with varchar(999), a long keyword list is silently truncated
declare @sqlFirst varchar(8000)
declare @sqlCenter varchar(8000)
declare @sqlLast varchar(8000)
declare @next int
declare @arrayLength int
declare @newWords varchar(999)
declare @newTable varchar(8000)
BEGIN
    set @newTable=@talbeName
    set @newWords=@words
    set @next=dbo.Get_StrArrayLength(@words,'[')
    --Determine whether there are required words
    if @next>1
    begin
        set @newTable=''
        --Construct the WHERE clause that filters on the required words
        while @next>1
        begin
            set @newTable=@newTable+@contentFieldName+' like ''%'+dbo.Get_StrArrayStrOfIndex(dbo.Get_StrArrayStrOfIndex(@words,'[',@next),']',1)+'%'' AND '
            set @next=@next-1
        end
        --Remove the trailing ' AND'
        set @newTable=left(@newTable,(len(@newTable)-4))
        --Construct the temporary table
        --Note: ##tempTable is a global temporary table visible to all sessions, so concurrent calls that use required words collide
        set @newTable='SELECT * into ##tempTable FROM '+@talbeName+' WHERE '+@newTable
        execute (@newTable)
        --Query the temporary table from now on
        set @newTable='##tempTable'
        --Remove the required-word markers from the keyword string
        set @newWords=REPLACE(REPLACE(@words,'[',''),']','')
    end
    set @sqlCenter=''
    set @next=1
    set @arrayLength=dbo.Get_StrArrayLength(@newWords,@splitString)
    while @next<=@arrayLength
    begin
        --Construct the SQL query conditions (middle part); earlier keywords get a higher wordPower
        set @sqlCenter=@sqlCenter+'SELECT '+@primaryKeyName+','+CONVERT(varchar(999),@arrayLength-@next+1)+' AS wordPower FROM '+@newTable+' WHERE '+@contentFieldName+' like ''%'+dbo.Get_StrArrayStrOfIndex(@newWords,@splitString,@next)+'%'' UNION ALL '
        set @next=@next+1
    end
    --Remove the trailing ' UNION ALL' from the middle part
    set @sqlCenter=left(@sqlCenter,(len(@sqlCenter)-10))
    --Construct the beginning of the SQL statement
    set @sqlFirst='SELECT TOP '+@selectNumber+' '+@primaryKeyName+',COUNT(*)+SUM(wordPower) AS finalPower FROM ('
    --Construct the end of the SQL statement
    set @sqlLast=') AS t_Temp GROUP BY '+@primaryKeyName+' ORDER BY finalPower DESC'
    --Splice together the complete SQL statement and execute it
    Execute(@sqlFirst+@sqlCenter+@sqlLast)
    --If the temporary table exists, drop it. Be sure to drop it!
    if OBJECT_ID('tempDb..##tempTable') is not null
    begin
        drop table ##tempTable
    end
END
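Because the normal version works by string concatenation, it can help to see the statements it ends up executing. The sketch below is a hypothetical Python helper, `build_like_sql` (not part of the script), that assembles the same strings for well-formed input. The `SELECT ... into ##tempTable` step is emitted only when required words are present, and the ranking query then scans that temporary table instead of the original one:

```python
def build_like_sql(pk: str, table: str, field: str, top: str, split: str, words: str) -> list[str]:
    """Hypothetical helper: the statements the normal proc_Common_SuperLike executes, in order."""
    statements = []
    source = table
    plain = words
    bracket_parts = words.strip(" ").split("[")[1:]
    if bracket_parts:
        # Each "[" opens a required word that ends at the next "]"; the SQL loop runs last to first.
        required = [part.split("]")[0] for part in reversed(bracket_parts)]
        where = " AND ".join(f"{field} like '%{w}%'" for w in required)
        statements.append(f"SELECT * into ##tempTable FROM {table} WHERE {where}")
        source = "##tempTable"
        plain = words.replace("[", "").replace("]", "")
    keywords = plain.strip(" ").split(split)
    n = len(keywords)
    # One UNION ALL branch per keyword; the i-th keyword gets wordPower n - i + 1.
    center = " UNION ALL ".join(
        f"SELECT {pk},{n - i + 1} AS wordPower FROM {source} WHERE {field} like '%{w}%'"
        for i, w in enumerate(keywords, start=1)
    )
    statements.append(
        f"SELECT TOP {top} {pk},COUNT(*)+SUM(wordPower) AS finalPower FROM ({center}) "
        f"AS t_Temp GROUP BY {pk} ORDER BY finalPower DESC"
    )
    return statements


for statement in build_like_sql("id", "t_test", "content", "20", "|", "[i]|o|c"):
    print(statement)
```

For the call example, the first statement keeps only the rows containing i, and the second ranks them. The keywords are pasted into the SQL unescaped, so a keyword containing a single quote produces an invalid statement.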
Large data volume version:
Description: implemented with SQL Server full-text indexing. It is more involved to set up, but executes extremely fast and suits large data volumes. Specifying required words slows processing down, because each required word adds a CONTAINS condition to an extra IN subquery over the whole table.
Scope of use: tens of millions of records. On a notebook with an i3 processor, a query over 10 million records takes only 2 seconds.
How to use: run the script in Query Analyzer to import it into the database, then create a full-text index on the table to be queried, with the field to be searched as the indexed field.
Call example: execute proc_Common_SuperLike 'id','t_test','content','20','|','[i]|o|c'
Parameter description: id is the name of the table's primary key field. t_test is the table name. content is the name of the field to match against. 20 returns 20 records, ordered from best match to worst. | is the keyword delimiter. [i]|o|c contains three keywords, i, o and c, separated by |; the brackets mark i as a required word.
GO
CREATE function Get_StrArrayLength
(
    @str varchar(1024),  -- string to split
    @split varchar(10)   -- separator character
)
returns int
as
begin
    declare @location int
    declare @start int
    declare @length int
    set @str=ltrim(rtrim(@str))
    set @location=charindex(@split,@str)
    set @length=1
    while @location<>0
    begin
        set @start=@location+1
        set @location=charindex(@split,@str,@start)
        set @length=@length+1
    end
    return @length
end
GO
CREATE function Get_StrArrayStrOfIndex
(
    @str varchar(1024),  -- string to split
    @split varchar(10),  -- separator
    @index int           -- 1-based position of the element to return
)
returns varchar(1024)
as
begin
    declare @location int
    declare @start int
    declare @next int
    declare @seed int
    set @str=ltrim(rtrim(@str))
    set @start=1
    set @next=1
    set @seed=len(@split)
    set @location=charindex(@split,@str)
    while @location<>0 and @index>@next
    begin
        set @start=@location+@seed
        set @location=charindex(@split,@str,@start)
        set @next=@next+1
    end
    --@location is 0 in two cases: 1. the string contains no separator; 2. the loop moved past the last separator.
    --In both cases, treat the end of the string as if a separator followed it.
    if @location=0 select @location=len(@str)+1
    return substring(@str,@start,@location-@start)
end
GO
CREATE PROCEDURE proc_Common_SuperLike
    --The name of the primary key field of the table to be queried
    @primaryKeyName varchar(999),
    --The name of the table to query
    @talbeName varchar(999),
    --The name of the field that holds the content to match
    @contentFieldName varchar(999),
    --The number of records to return (TOP n); records that match more keywords rank higher
    @selectNumber varchar(999),
    --The delimiter between keywords
    @splitString varchar(999),
    --The keyword string; wrap required words in [ ]
    @words varchar(999)
AS
--The dynamic SQL is held in varchar(8000): with varchar(999), a long keyword list is silently truncated
declare @sqlFirst varchar(8000)
declare @sqlCenter varchar(8000)
declare @sqlLast varchar(8000)
declare @next int
declare @arrayLength int
declare @newTable varchar(8000)
BEGIN
    set @newTable=''
    set @sqlCenter=''
    set @next=1
    set @arrayLength=dbo.Get_StrArrayLength(@words,@splitString)
    while @next<=@arrayLength
    begin
        --Construct the SQL query conditions (middle part)
        --Determine whether this keyword is a required word
        if CHARINDEX('[',dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next))>0
        begin
            set @sqlCenter=@sqlCenter+'SELECT '+@primaryKeyName+','+CONVERT(varchar(999),@arrayLength-@next+1)+' AS wordPower FROM '+@talbeName+' WHERE CONTAINS('+@contentFieldName+',''"*'+REPLACE(REPLACE(dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next),'[',''),']','')+'*"'') UNION ALL '
            --Add the word to the required-word condition
            set @newTable=@newTable+'CONTAINS('+@contentFieldName+',''"*'+REPLACE(REPLACE(dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next),'[',''),']','')+'*"'') AND '
        end
        else
        begin
            set @sqlCenter=@sqlCenter+'SELECT '+@primaryKeyName+','+CONVERT(varchar(999),@arrayLength-@next+1)+' AS wordPower FROM '+@talbeName+' WHERE CONTAINS('+@contentFieldName+',''"*'+dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next)+'*"'') UNION ALL '
        end
        set @next=@next+1
    end
    --Determine whether there are required words
    if CHARINDEX('[',@words)>0
    begin
        --Process the required-word part and remove the trailing ' AND'
        set @newTable=left(@newTable,(len(@newTable)-4))
        set @newTable='AS t_Temp WHERE '+@primaryKeyName+' IN (SELECT '+@primaryKeyName+' FROM '+@talbeName+' WHERE '+@newTable+')'
    end
    else
    begin
        set @newTable='AS t_Temp'
    end
    --Remove the trailing ' UNION ALL' from the middle part
    set @sqlCenter=left(@sqlCenter,(len(@sqlCenter)-10))
    --Construct the beginning of the SQL statement
    set @sqlFirst='SELECT TOP '+@selectNumber+' '+@primaryKeyName+',COUNT(*)+SUM(wordPower) AS finalPower FROM ('
    --Construct the end of the SQL statement
    set @sqlLast=') '+@newTable+' GROUP BY '+@primaryKeyName+' ORDER BY finalPower DESC'
    --Splice together the complete SQL statement and execute it
    Execute(@sqlFirst+@sqlCenter+@sqlLast)
END
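The same kind of hypothetical Python helper, `build_fulltext_sql` (not part of the script), can reproduce the single statement that the large data volume version executes, assuming well-formed input. Each keyword becomes a `CONTAINS` branch over the full table, and the required words become an `IN` subquery:

```python
def fulltext_term(field: str, word: str) -> str:
    # Spelled exactly as the procedure builds it: CONTAINS(field,'"*word*"')
    return f"CONTAINS({field},'\"*{word}*\"')"


def build_fulltext_sql(pk: str, table: str, field: str, top: str, split: str, words: str) -> str:
    """Hypothetical helper: the statement the full-text proc_Common_SuperLike executes."""
    elements = words.strip(" ").split(split)
    n = len(elements)
    branches = []
    required = []
    for i, raw in enumerate(elements, start=1):
        if "[" in raw:  # required word: strip the markers and remember its condition
            word = raw.replace("[", "").replace("]", "")
            required.append(fulltext_term(field, word))
        else:
            word = raw
        branches.append(
            f"SELECT {pk},{n - i + 1} AS wordPower FROM {table} WHERE {fulltext_term(field, word)}"
        )
    if "[" in words:
        # Required words filter the grouped rows through a subquery on the full table.
        tail = f"AS t_Temp WHERE {pk} IN (SELECT {pk} FROM {table} WHERE {' AND '.join(required)})"
    else:
        tail = "AS t_Temp"
    return (
        f"SELECT TOP {top} {pk},COUNT(*)+SUM(wordPower) AS finalPower FROM ("
        + " UNION ALL ".join(branches)
        + f") {tail} GROUP BY {pk} ORDER BY finalPower DESC"
    )


print(build_fulltext_sql("id", "t_test", "content", "20", "|", "[i]|o|c"))
```

Unlike the normal version, required words here do not shrink the rows that the UNION ALL branches scan; they add a further subquery over the whole table, which fits the remark above that required words slow this version down.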
Attached: guide to creating a full-text index on a SQL Server table:
--Enable full-text indexing for the current database (on SQL Server 2008 and later, full-text is always enabled and this call has no effect)
EXEC sp_fulltext_database 'enable'
--Create a full-text catalog (a directory that holds the index files)
CREATE FULLTEXT CATALOG catalog_name --for example myFullText
--Create the full-text index on a field of a table, for example t_test(content)
--Note: KEY INDEX takes the name of the primary key index, not the primary key field name! For example PK__t_test__3213E83F0EA330E9
--The final ON names the full-text catalog that holds the index, for example myFullText
CREATE FULLTEXT INDEX ON table_name(field_name)
KEY INDEX primary_key_index_name ON catalog_name
Note: if the table already contains a large number of records when the full-text index is created, building the index takes time, so queries run immediately after the index is created may not find the data yet.