Repost: SQL LIKE multi-condition greedy weighted matching algorithm (improved version)

The original text is reproduced from: http://blog.csdn.net/yangyuankp/article/details/8069325

 

Let's get straight to the point.

The previous blog post simply implemented "match as many keywords as possible" for multiple keywords.

In practice, however, matching as many keywords as possible is not always the right criterion.

Take the sentence "how to register users on the CSDN website" as an example, split into three keywords: "CSDN", "registration" and "user". Suppose one record in the database matches "CSDN" and "registration", and another matches "registration" and "user". Both records match two keywords, so the algorithm from the previous post ranks them equally. They are clearly not equal: "CSDN" is the crucial word in this sentence, the precondition for everything else. The record matching "CSDN" and "registration" should therefore rank above the record matching "registration" and "user".

This shows that keywords are not all equal: each should be assigned a weight, and records matching higher-weighted keywords should rank first.
The improved, weighted version of the SQL LIKE multi-condition greedy matching:

GO
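CREATE function Get_StrArrayLength
(
 @str varchar(1024), -- string to split
 @split varchar(10) -- separator
)
returns int
as
 begin
  -- Returns the number of elements in @str when it is split on @split
  declare @location int
  declare @start int
  declare @length int
  set @str=ltrim(rtrim(@str))
  set @location=charindex(@split,@str)
  set @length=1
   while @location<>0
     begin
      -- Skip the whole separator, which may be longer than one character.
      -- len() ignores trailing spaces, so append a character before measuring.
      set @start=@location+len(@split+'x')-1
      set @location=charindex(@split,@str,@start)
      set @length=@length+1
     end
   return @length
 end
GO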
CREATE function Get_StrArrayStrOfIndex
(
 @str varchar(1024), -- string to split
 @split varchar(10), -- separator
 @index int -- 1-based position of the element to return
)
returns varchar(1024)
as
begin
 declare @location int
 declare @start int
 declare @next int
 declare @seed int
 set @str=ltrim(rtrim(@str))
 set @start=1
 set @next=1
 -- len() ignores trailing spaces, so append a character before measuring the separator
 set @seed=len(@split+'x')-1
 set @location=charindex(@split,@str)
 while @location<>0 and @index>@next
   begin
    set @start=@location+@seed
    set @location=charindex(@split,@str,@start)
    set @next=@next+1
   end
 -- @location is 0 in two cases: the string contains no separator at all, or the loop
 -- ran past the last separator. In both cases, treat the end of the string as if it
 -- were followed by a separator.
 if @location=0 set @location=len(@str)+1
 return substring(@str,@start,@location-@start)
end
GO
CREATE PROCEDURE proc_Common_SuperLike
	-- Name of the primary key field of the table to query
	@primaryKeyName varchar(999),
	-- Name of the table to query
	@tableName varchar(999),
	-- Name of the field that holds the content to match
	@contentFieldName varchar(999),
	-- Number of records to return (TOP n); better matches rank higher
	@selectNumber varchar(999),
	-- Separator between keywords
	@splitString varchar(999),
	-- Keywords joined by the separator, most important first
	@words varchar(999)
AS
	-- varchar(max): one SELECT branch is added per keyword, which soon exceeds 999 characters
	declare @sqlFirst varchar(max)
	declare @sqlCenter varchar(max)
	declare @sqlLast varchar(max)
BEGIN
	-- Note: the table and field names are concatenated into dynamic SQL unescaped,
	-- so pass only trusted values for them.
	set @sqlCenter=''
	declare @next int
	declare @arrayLength int
	set @next=1
	set @arrayLength=dbo.Get_StrArrayLength(@words,@splitString)
	while @next<=@arrayLength
	begin
		-- Build one SELECT per keyword (middle part); the weight is @arrayLength-@next+1,
		-- and single quotes inside the keyword are doubled so they cannot end the LIKE literal
		set @sqlCenter = @sqlCenter+'SELECT '+@primaryKeyName+','+CONVERT(varchar(999),@arrayLength-@next+1)+' AS wordPower FROM '+@tableName+' WHERE '+@contentFieldName+' like ''%'+replace(dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next),'''','''''')+'%'' UNION ALL '
		set @next=@next+1
	end
	-- Remove the trailing ' UNION ALL ' (11 characters): len() ignores the final space,
	-- so subtracting 10 drops exactly that suffix
	set @sqlCenter=left(@sqlCenter,(len(@sqlCenter)-10))
	-- Build the beginning of the statement
	set @sqlFirst='SELECT TOP '+@selectNumber+' '+@primaryKeyName+',COUNT(*)+SUM(wordPower) AS finalPower FROM ('
	-- Build the end of the statement
	set @sqlLast=') AS t_Temp GROUP BY '+@primaryKeyName+' ORDER BY finalPower DESC'
	-- Concatenate the complete statement and execute it
	execute(@sqlFirst+@sqlCenter+@sqlLast)
END

 

The procedure is called the same way as in the first version:

execute proc_Common_SuperLike 'id','t_test','content','20','|','i|o|c'

id: name of the table's primary key field.
t_test: table name.
content: name of the field whose content is matched.
20: return the top 20 records (match quality decreases from top to bottom).
|: separator between keywords.
i|o|c: three keywords, i, o and c, separated by |.
 
The difference is that keywords now carry a weight.
Rule: keywords are weighted in descending order of position. For i|o|c, the weight of i is 3, the weight of o is 2, and the weight of c is 1.
In other words, put important keywords first and less important ones last: the earlier a keyword appears, the higher its weight.
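
To make the weighting visible, here is a minimal Python sketch that mirrors the string concatenation in proc_Common_SuperLike and prints the statement the procedure would assemble for the call above. It is an illustration only, not part of the original post: the function name build_super_like_sql is hypothetical, and nothing here runs against a database.

```python
def build_super_like_sql(primary_key, table, content_field, top_n, separator, words):
    """Mirror the concatenation done by proc_Common_SuperLike (illustration only)."""
    keywords = words.strip(" ").split(separator)
    count = len(keywords)
    branches = []
    for position, keyword in enumerate(keywords, start=1):
        # The first keyword gets the highest weight, the last one gets 1.
        weight = count - position + 1
        branches.append(
            f"SELECT {primary_key},{weight} AS wordPower FROM {table} "
            f"WHERE {content_field} like '%{keyword}%'"
        )
    center = " UNION ALL ".join(branches)
    return (
        f"SELECT TOP {top_n} {primary_key},COUNT(*)+SUM(wordPower) AS finalPower FROM ("
        f"{center}"
        f") AS t_Temp GROUP BY {primary_key} ORDER BY finalPower DESC"
    )


sql = build_super_like_sql("id", "t_test", "content", "20", "|", "i|o|c")
print(sql)
```

Each keyword contributes one SELECT branch carrying its weight as wordPower. A record that matches several keywords appears once per matching branch, which is why the outer query can add COUNT(*) (number of matches) to SUM(wordPower) (total weight).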

 

Note:

This algorithm ranks by the combined result of weight plus number of matches, rather than by blind greed.
For example, take five keywords a|b|c|d|e. By the rule, the weights are a-5, b-4, c-3, d-2, e-1. Suppose one record matches the three keywords a, b and c, and another record matches the four keywords b, c, d and e.

According to the algorithm:
The final weight of the first record is 5 (weight of a) + 4 (weight of b) + 3 (weight of c) + 3 (number of matches) = 15.
The final weight of the second record is 4 (weight of b) + 3 (weight of c) + 2 (weight of d) + 1 (weight of e) + 4 (number of matches) = 14.
The first record is therefore ranked higher: it matches fewer keywords than the second, but its final weight is greater.
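
The arithmetic above can be checked with a small Python sketch. It assumes the keywords are distinct and that the set of keywords a record matches is already known. The function name final_power is hypothetical, chosen to echo the finalPower column in the procedure.

```python
def final_power(keywords, matched):
    """Sum of positional weights of the matched keywords plus the number of matches."""
    count = len(keywords)
    # Position 0 gets weight `count`, the last position gets weight 1.
    weights = {word: count - index for index, word in enumerate(keywords)}
    hits = [word for word in keywords if word in matched]
    return sum(weights[word] for word in hits) + len(hits)


keywords = ["a", "b", "c", "d", "e"]
first = final_power(keywords, {"a", "b", "c"})
second = final_power(keywords, {"b", "c", "d", "e"})
print(first, second)  # 15 14
```

Applied to the earlier example with keywords "CSDN", "registration" and "user", the record matching "CSDN" and "registration" scores 7 and the record matching "registration" and "user" scores 5, so the record containing "CSDN" ranks first.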
