The original text is reproduced from: http://blog.csdn.net/yangyuankp/article/details/8069325
Let's get straight to the point.
The previous post implemented a simple "match as many keywords as possible" search for multiple keywords.
In practice, however, matching as many keywords as possible is not always the right criterion.
Take the sentence "how to register users on the CSDN website" as an example, and split it into three keywords: "CSDN", "registration" and "user". Suppose one record in the database matches "CSDN" and "registration", and another record matches "registration" and "user". Both records match two keywords, so under the algorithm from the previous post they rank equally. Obviously they are not equal: "CSDN" is the crucial word in this sentence, a prerequisite for the rest. The record matching "CSDN" and "registration" should therefore rank ahead of the record matching "registration" and "user".
In other words, keywords are not all equal. Each keyword should be assigned a weight, and records matching higher-weighted keywords should be given priority.
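To make the tie concrete, here is a small Python sketch of the two records in the example. The weights 3, 2 and 1 are assumed purely for illustration, and `count_score` and `weighted_score` are hypothetical helper names, not part of the stored procedure.

```python
# Assumed weights for illustration: more important keywords get larger values.
weights = {"CSDN": 3, "registration": 2, "user": 1}

record_a = {"CSDN", "registration"}   # keywords matched by the first record
record_b = {"registration", "user"}   # keywords matched by the second record

def count_score(matched):
    """Score used by plain greedy matching: the number of matched keywords."""
    return len(matched)

def weighted_score(matched):
    """Match count plus the sum of the weights of the matched keywords."""
    return len(matched) + sum(weights[word] for word in matched)

print(count_score(record_a), count_score(record_b))        # 2 2
print(weighted_score(record_a), weighted_score(record_b))  # 7 5
```

With a plain count the two records tie at 2. Once weights are added, the record containing "CSDN" comes out ahead.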
Weighted version of the multi-condition greedy SQL LIKE match:
GO
CREATE function Get_StrArrayLength
(
    @str varchar(1024),  -- string to split
    @split varchar(10)   -- separator
)
returns int
as
begin
    declare @location int
    declare @start int
    declare @length int
    set @str=ltrim(rtrim(@str))
    set @location=charindex(@split,@str)
    set @length=1
    -- every separator found means one more element
    while @location<>0
    begin
        set @start=@location+1
        set @location=charindex(@split,@str,@start)
        set @length=@length+1
    end
    return @length
end
GO
CREATE function Get_StrArrayStrOfIndex
(
    @str varchar(1024),  -- string to split
    @split varchar(10),  -- separator
    @index int           -- 1-based position of the element to return
)
returns varchar(1024)
as
begin
    declare @location int
    declare @start int
    declare @next int
    declare @seed int
    set @str=ltrim(rtrim(@str))
    set @start=1
    set @next=1
    set @seed=len(@split)
    set @location=charindex(@split,@str)
    while @location<>0 and @index>@next
    begin
        set @start=@location+@seed
        set @location=charindex(@split,@str,@start)
        set @next=@next+1
    end
    -- @location is 0 in two cases: the string contains no separator at all, or the
    -- loop has moved past the last separator. In both cases, behave as if a
    -- separator followed the end of the string.
    if @location=0 select @location=len(@str)+1
    return substring(@str,@start,@location-@start)
end
GO
CREATE PROCEDURE proc_Common_SuperLike
    -- name of the primary key field of the table to query
    @primaryKeyName varchar(999),
    -- name of the table to query
    @talbeName varchar(999),
    -- name of the field to search, i.e. the field that holds the content
    @contentFieldName varchar(999),
    -- number of records to return (TOP n); better matches rank higher
    @selectNumber varchar(999),
    -- separator between the keywords
    @splitString varchar(999),
    -- the keywords, joined by the separator
    @words varchar(999)
AS
declare @sqlFirst varchar(999)
-- varchar(max) so that a long keyword list does not truncate the generated SQL
declare @sqlCenter varchar(max)
declare @sqlLast varchar(999)
BEGIN
    set @sqlCenter=''
    declare @next int
    declare @arrayLength int
    set @next=1
    set @arrayLength=dbo.Get_StrArrayLength(@words,@splitString)
    while @next<=@arrayLength
    begin
        -- build the middle part: one SELECT per keyword, weighted by position
        set @sqlCenter = @sqlCenter+'SELECT '+@primaryKeyName+','+CONVERT(varchar(999),@arrayLength-@next+1)+' AS wordPower FROM '+@talbeName+' WHERE '+@contentFieldName+' like ''%'+dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next)+'%'' UNION ALL '
        set @next=@next+1
    end
    -- remove the trailing UNION ALL left over from the last iteration
    set @sqlCenter=left(@sqlCenter,(len(@sqlCenter)-10))
    -- build the beginning of the statement
    set @sqlFirst='SELECT TOP '+@selectNumber+' '+@primaryKeyName+',COUNT(*)+SUM(wordPower) AS finalPower FROM ('
    -- build the end of the statement
    set @sqlLast=') AS t_Temp GROUP BY '+@primaryKeyName+' ORDER BY finalPower DESC'
    -- concatenate the complete statement and execute it
    execute(@sqlFirst+@sqlCenter+@sqlLast)
END
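It can help to see the statement the procedure assembles before it is executed. The following Python sketch mirrors the string concatenation for sample arguments. `build_super_like_sql` is a hypothetical name introduced here, and the sketch only builds text; it does not talk to a database.

```python
def build_super_like_sql(primary_key, table, content_field, top_n, separator, words):
    """Build the same SQL text that the procedure concatenates (illustration only)."""
    keywords = words.strip().split(separator)
    total = len(keywords)
    branches = []
    for position, keyword in enumerate(keywords, start=1):
        weight = total - position + 1  # the first keyword gets the largest weight
        branches.append(
            f"SELECT {primary_key},{weight} AS wordPower FROM {table} "
            f"WHERE {content_field} like '%{keyword}%'"
        )
    # Joining with UNION ALL avoids having to trim a trailing UNION ALL afterwards.
    center = " UNION ALL ".join(branches)
    first = f"SELECT TOP {top_n} {primary_key},COUNT(*)+SUM(wordPower) AS finalPower FROM ("
    last = f") AS t_Temp GROUP BY {primary_key} ORDER BY finalPower DESC"
    return first + center + last

sql = build_super_like_sql("id", "t_test", "content", "20", "|", "i|o|c")
print(sql)
```

Each keyword contributes one SELECT branch, so a record appears once per matched keyword in the inner result. COUNT(*) then gives the number of matches and SUM(wordPower) the total weight. Because the keywords are concatenated directly into the statement, as in the procedure, a keyword containing a single quote would break the generated SQL.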
It is called the same way as the first version:
execute proc_Common_SuperLike 'id','t_test','content','20','|','i|o|c'
id: name of the table's primary key field.
t_test: name of the table.
content: name of the field whose content is matched.
20: return the top 20 records, ordered from the best match down to the worst.
|: separator between the keywords.
i|o|c: three keywords, i, o and c, separated by |.
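For readers without a SQL Server instance at hand, the same query shape can be tried with Python's built-in sqlite3 module. This is only a stand-in under stated assumptions: SQLite uses LIMIT instead of TOP, the keyword patterns are passed as bound parameters rather than concatenated, and the four rows in t_test are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_test (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO t_test (id, content) VALUES (?, ?)",
    [(1, "io"), (2, "oc"), (3, "c"), (4, "xyz")],  # made-up sample rows
)

keywords = "i|o|c".split("|")
branches, patterns = [], []
for position, keyword in enumerate(keywords, start=1):
    weight = len(keywords) - position + 1  # i -> 3, o -> 2, c -> 1
    branches.append(
        f"SELECT id, {weight} AS wordPower FROM t_test WHERE content LIKE ?"
    )
    patterns.append(f"%{keyword}%")

sql = (
    "SELECT id, COUNT(*) + SUM(wordPower) AS finalPower FROM ("
    + " UNION ALL ".join(branches)
    + ") AS t_Temp GROUP BY id ORDER BY finalPower DESC LIMIT 20"
)
rows = conn.execute(sql, patterns).fetchall()
print(rows)  # [(1, 7), (2, 5), (3, 2)]
```

Row 1 matches i and o (2 matches + weights 3 + 2 = 7), row 2 matches o and c (2 + 2 + 1 = 5), row 3 matches only c (1 + 1 = 2), and row 4 matches nothing, so it does not appear in the result.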
The difference from the first version is that keywords now carry a weight.
Rule: keywords are weighted in descending order of position. For the three keywords i|o|c, the weight of i is 3, the weight of o is 2, and the weight of c is 1.
In other words, put important keywords first and unimportant keywords last: the earlier a keyword appears, the higher its weight and the higher its priority.
Note:
This algorithm ranks records by the combined result of weight plus number of matches, rather than by the number of matches alone.
For example, take five keywords a|b|c|d|e. According to the rule, the weights are a = 5, b = 4, c = 3, d = 2, e = 1. Suppose one record matches the three keywords a, b and c, and another record matches the four keywords b, c, d and e.
According to the algorithm:
The final weight of the first record is 5 (weight of a) + 4 (weight of b) + 3 (weight of c) + 3 (number of matches) = 15.
The final weight of the second record is 4 (weight of b) + 3 (weight of c) + 2 (weight of d) + 1 (weight of e) + 4 (number of matches) = 14.
Therefore the first record is selected first. It matches fewer keywords than the second record, but its final weight is higher.
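The arithmetic above can be checked with a few lines of Python. `final_power` is a hypothetical helper that derives each weight from the keyword's position, following the rule described in this post.

```python
def final_power(keywords, matched):
    """Number of matches plus the sum of the position-based weights."""
    total = len(keywords)
    # The first keyword gets weight `total`, the last keyword gets weight 1.
    weights = {word: total - index for index, word in enumerate(keywords)}
    return len(matched) + sum(weights[word] for word in matched)

keywords = ["a", "b", "c", "d", "e"]
first = final_power(keywords, ["a", "b", "c"])
second = final_power(keywords, ["b", "c", "d", "e"])
print(first, second)  # 15 14
```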