Repost: SQL multi-condition fuzzy query solution (similar to Baidu search)

The original text is reproduced from: http://blog.csdn.net/yangyuankp/article/details/8085514

 

Foreword :
The basic characteristics of the algorithm were explained in detail in earlier blog posts. After continuous improvement and optimization, it is time to wrap it up: the algorithm is finished and will no longer be updated.
As a final solution, a brief summary of the algorithm's properties is provided for the reader's convenience.

Purpose : Multi-condition fuzzy matching.
Greedy feature : Returns as many records that satisfy the conditions as possible.
Weight feature : Keywords are assigned weights that represent their importance. Records with a higher total weight are returned first, without breaking the greedy feature.
Required-keyword feature : Every returned record must contain the specified keywords, without breaking the greedy feature or the weight feature.
Typical applications : Question-and-answer systems, such as Baidu Questions and JD Commodity Consulting.
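
The three features can be modeled outside the database. The following is a minimal Python sketch, not the author's implementation: the function name `rank` and the sample records are hypothetical, and the scoring rule (each matched keyword contributes 1 plus a weight that decreases with its position) is taken from the `COUNT(*)+SUM(wordPower)` expression in the stored procedures further down.

```python
def rank(records, keywords, required=(), top=20):
    """Score records the way the feature list describes (illustrative only).

    records  -- dict mapping primary key -> content text
    keywords -- keywords in priority order (first = most important)
    required -- keywords that every returned record must contain
    """
    n = len(keywords)
    scores = {}
    for key, text in records.items():
        # Required-keyword feature: drop records missing any required keyword.
        if any(word not in text for word in required):
            continue
        # Weight feature: a matched keyword adds 1 (the match count) plus its
        # weight, where the first keyword has weight n and the last has weight 1.
        score = sum(1 + (n - i) for i, word in enumerate(keywords) if word in text)
        # Greedy feature: a record matching at least one keyword is kept.
        if score:
            scores[key] = score
    # Highest score first; ties are broken by key here (an assumption, since
    # the SQL version leaves the order of ties unspecified).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


records = {1: "i o c", 2: "i o", 3: "o c", 4: "c", 5: "x"}
print(rank(records, ["i", "o", "c"]))                  # no required keyword
print(rank(records, ["i", "o", "c"], required=["i"]))  # 'i' is required
```

Without a required keyword, every record that matches anything is returned, best match first. With `i` required, only records 1 and 2 remain, in the same relative order.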
 
The final solution provides two versions of the algorithm. Both are packaged as stored procedures and functions and can be imported directly into the database.

 

Normal version :

Description : Implemented with the SQL LIKE statement. Easy to use, but because of the limits of LIKE it is not suitable for large amounts of data. Specifying necessary words speeds up processing.
Scope of use : Tables of up to roughly 10,000 records. Beyond 10,000 records, queries become slow.
How to use : Run the script in the query analyzer to import it into the database.
Call example : execute proc_Common_SuperLike 'id','t_test','content','20','|','[i]|o|c'
Parameter description : id is the name of the table's primary key field. t_test is the table name. content is the name of the field to match against. 20 returns the top 20 records (ordered from best match to worst). | is the delimiter between keywords. [i]|o|c contains three keywords, i, o, and c, separated by |, where i is a necessary word.
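
To make the keyword format concrete, here is a small Python sketch of how a string such as `[i]|o|c` breaks down into keywords and necessary-word flags. The helper name `parse_keywords` is hypothetical, and it assumes a necessary word is wrapped in square brackets as a whole token; the stored procedure itself only looks for the `[` character.

```python
def parse_keywords(words, delimiter="|"):
    """Split a keyword string such as '[i]|o|c' into (keyword, is_required) pairs."""
    parsed = []
    for token in words.strip().split(delimiter):
        # Assumption: a necessary word is written as [word], brackets included.
        is_required = token.startswith("[") and token.endswith("]")
        keyword = token[1:-1] if is_required else token
        parsed.append((keyword, is_required))
    return parsed


print(parse_keywords("[i]|o|c"))
```

The position in the list matters as well: keywords that appear earlier receive a larger weight in the ranking.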

GO
CREATE function Get_StrArrayLength
(
 @str varchar(1024), -- string to split
 @split varchar(10) -- separator character
)
returns int
as
 begin
  declare @location int
  declare @start int
  declare @length int
  set @str=ltrim(rtrim(@str))
  set @location=charindex(@split,@str)
  set @length=1
   while @location<>0
     begin
      set @start=@location+1
      set @location=charindex(@split,@str,@start)
      set @length=@length+1
     end
   return @length
 end
 GO
 CREATE function Get_StrArrayStrOfIndex
(
 @str varchar(1024), -- string to split
 @split varchar(10), -- separator
 @index int -- 1-based index of the element to return
)
returns varchar(1024)
as
begin
 declare @location int
 declare @start int
 declare @next int
 declare @seed int
 set @str=ltrim(rtrim(@str))
 set @start=1
 set @next=1
 set @seed=len(@split)
 set @location=charindex(@split,@str)
 while @location<>0 and @index>@next
   begin
    set @start=@location+@seed
    set @location=charindex(@split,@str,@start)
    set @next=@next+1
   end
 if @location =0 select @location =len(@str)+1
 
-- @location is 0 here in two cases: 1. the string contains no separator; 2. the string contains separators, but the loop has moved past the last one. In both cases, treat the end of the string as if a separator followed it.
 return substring(@str,@start,@location-@start)
end
GO
CREATE PROCEDURE proc_Common_SuperLike
	--The name of the primary key field of the table to be queried
	@primaryKeyName varchar(999),
	--The name of the table to query
	@talbeName varchar (999),
	--The field name of the table to be queried, that is, the field where the content is located
	@contentFieldName varchar(999),
	--Number of records to return (TOP n); records with more matches rank higher
	@selectNumber varchar(999),
	--Delimiter that separates the keywords
	@splitString varchar(999),
	--Keyword string, for example [i]|o|c
	@words varchar(999)
	
AS
	declare @sqlFirst varchar(999)
	declare @sqlCenter varchar(999)
	declare @sqlLast varchar(999)
	declare @next int  
	declare @arrayLength int
	declare @newWords varchar(999)
	declare @newTable varchar(999)
BEGIN
	set @newTable=@talbeName
	set @newWords=@words
	set @next=dbo.Get_StrArrayLength(@words,'[')
	--Check whether any necessary words are specified
	if @next>1
	begin
		set @newTable=''
		--Build the WHERE conditions for the necessary words
		while @next>1
		begin
			set @newTable=@newTable+@contentFieldName+' like ''%'+dbo.Get_StrArrayStrOfIndex(dbo.Get_StrArrayStrOfIndex(@words,'[',@next),']',1)+'%'' AND '
			set @next=@next-1
		end
		set @newTable=left(@newTable,(len(@newTable)-4))
		--Construct temporary table
		set @newTable='SELECT * into ##tempTable FROM '+ @talbeName + ' WHERE ' + @newTable
		execute (@newTable)
		--specify temporary table
		set @newTable='##tempTable'
		--Remove the necessary word tags in the keyword group
		set @newWords=REPLACE(REPLACE(@words,'[',''),']','')
	end
	set @sqlCenter=''
	set @next=1
	set @arrayLength=dbo.Get_StrArrayLength(@newWords,@splitString)

	while @next<=@arrayLength
	begin
		--Construct sql query conditions (middle part)
		set @sqlCenter = @sqlCenter+'SELECT '+@primaryKeyName+','+CONVERT(varchar(999),@arrayLength-@next+1)+' AS wordPower FROM '+@newTable+' WHERE '+@contentFieldName+' like ''%'+dbo.Get_StrArrayStrOfIndex(@newWords,@splitString,@next)+'%'' UNION ALL '
		set @next=@next+1
	end
	--Trim the trailing UNION ALL from the middle part of the sql statement
	set @sqlCenter=left(@sqlCenter,(len(@sqlCenter)-10))
	--Construct the beginning of the sql statement
	set @sqlFirst='SELECT TOP '+@selectNumber+' '+@primaryKeyName+',COUNT(*)+SUM(wordPower) AS finalPower FROM ('
	--Construct the end part of the sql statement
	set @sqlLast=') AS t_Temp GROUP BY '+@primaryKeyName+' ORDER BY finalPower DESC'
	-- Splice out the complete sql statement and execute it
	Execute(@sqlFirst+@sqlCenter+@sqlLast)
	--Determine whether the temporary table exists, delete it if it exists, be sure to delete it!
	if OBJECT_ID('tempDb..##tempTable') is not null
	begin
		drop table ##tempTable
	end
END
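
To see what the procedure produces for the call example, the sketch below runs the same query shape against an in-memory SQLite database from Python. It is an illustration under assumptions, not the SQL Server script: SQLite uses `LIMIT` where the procedure uses `TOP`, the table `temp_required` stands in for `##tempTable`, the sample rows are made up, and SQLite's `LIKE` is case-insensitive for ASCII whereas SQL Server's behavior depends on the collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_test (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO t_test (id, content) VALUES (?, ?)",
    [(1, "i o c"), (2, "i o"), (3, "o c"), (4, "i"), (5, "x")],
)

# Step 1: keep only the rows that contain the necessary word 'i'
# (the procedure stores them in the global temporary table ##tempTable).
conn.execute(
    "CREATE TEMP TABLE temp_required AS "
    "SELECT * FROM t_test WHERE content LIKE '%i%'"
)

# Step 2: one SELECT per keyword with weights 3, 2, 1, combined with
# UNION ALL, then grouped so that finalPower = match count + sum of weights.
query = """
SELECT id, COUNT(*) + SUM(wordPower) AS finalPower FROM (
    SELECT id, 3 AS wordPower FROM temp_required WHERE content LIKE '%i%'
    UNION ALL
    SELECT id, 2 AS wordPower FROM temp_required WHERE content LIKE '%o%'
    UNION ALL
    SELECT id, 1 AS wordPower FROM temp_required WHERE content LIKE '%c%'
) AS t_Temp
GROUP BY id
ORDER BY finalPower DESC
LIMIT 20
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Rows 3 and 5 never appear because they lack the necessary word. Among the remaining rows, the one matching all three keywords ranks first.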

 

Large data volume version :

Description : Implemented with the SQL full-text index. More complex to set up, but extremely fast, and suitable for large data volumes. Specifying necessary words can slow down processing.
Scope of use : Tens of millions of records. On a laptop with an i3-class processor, querying 10 million records takes only 2 seconds.
How to use : Run the script in the query analyzer to import it into the database, then create a full-text index on the table to be queried, with the field to be queried as the indexed field.
Call example : execute proc_Common_SuperLike 'id','t_test','content','20','|','[i]|o|c'
Parameter description : id is the name of the table's primary key field. t_test is the table name. content is the name of the field to match against. 20 returns the top 20 records (ordered from best match to worst). | is the delimiter between keywords. [i]|o|c contains three keywords, i, o, and c, separated by |, where i is a necessary word.

GO
CREATE function Get_StrArrayLength
(
 @str varchar(1024), -- string to split
 @split varchar(10) -- separator character
)
returns int
as
 begin
  declare @location int
  declare @start int
  declare @length int
  set @str=ltrim(rtrim(@str))
  set @location=charindex(@split,@str)
  set @length=1
   while @location<>0
     begin
      set @start=@location+1
      set @location=charindex(@split,@str,@start)
      set @length=@length+1
     end
   return @length
 end
 GO
 CREATE function Get_StrArrayStrOfIndex
(
 @str varchar(1024), -- string to split
 @split varchar(10), -- separator
 @index int -- 1-based index of the element to return
)
returns varchar(1024)
as
begin
 declare @location int
 declare @start int
 declare @next int
 declare @seed int
 set @str=ltrim(rtrim(@str))
 set @start=1
 set @next=1
 set @seed=len(@split)
 set @location=charindex(@split,@str)
 while @location<>0 and @index>@next
   begin
    set @start=@location+@seed
    set @location=charindex(@split,@str,@start)
    set @next=@next+1
   end
 if @location =0 select @location =len(@str)+1
 
-- @location is 0 here in two cases: 1. the string contains no separator; 2. the string contains separators, but the loop has moved past the last one. In both cases, treat the end of the string as if a separator followed it.
 return substring(@str,@start,@location-@start)
end
GO
CREATE PROCEDURE proc_Common_SuperLike
	--The name of the primary key field of the table to be queried
	@primaryKeyName varchar(999),
	--The name of the table to query
	@talbeName varchar (999),
	--The field name of the table to be queried, that is, the field where the content is located
	@contentFieldName varchar(999),
	--Number of records to return (TOP n); records with more matches rank higher
	@selectNumber varchar(999),
	--Delimiter that separates the keywords
	@splitString varchar(999),
	--Keyword string, for example [i]|o|c
	@words varchar(999)
	
AS
	declare @sqlFirst varchar(999)
	declare @sqlCenter varchar(999)
	declare @sqlLast varchar(999)
	declare @next int  
	declare @arrayLength int
	declare @newTable varchar(999)
BEGIN
	set @newTable=''
	set @sqlCenter=''
	set @next=1
	set @arrayLength=dbo.Get_StrArrayLength(@words,@splitString)

	while @next<=@arrayLength
	begin
		--Construct sql query conditions (middle part)
		--Determine whether it is a necessary word
		if CHARINDEX('[',dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next))>0
		begin
			set @sqlCenter = @sqlCenter+'SELECT '+@primaryKeyName+','+CONVERT(varchar(999),@arrayLength-@next+1)+' AS wordPower FROM '+@talbeName+' WHERE CONTAINS(' + @contentFieldName + ',''"*'+REPLACE(REPLACE(dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next),'[',''),']','')+'*"'') UNION ALL '
			--Construct necessary words
			set @newTable=@newTable+'CONTAINS(' + @contentFieldName + ',''"*'+REPLACE(REPLACE(dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next),'[',''),']','')+'*"'') AND '
		end
		else
		begin
			set @sqlCenter = @sqlCenter+'SELECT '+@primaryKeyName+','+CONVERT(varchar(999),@arrayLength-@next+1)+' AS wordPower FROM '+@talbeName+' WHERE CONTAINS(' + @contentFieldName + ',''"*'+dbo.Get_StrArrayStrOfIndex(@words,@splitString,@next)+'*"'') UNION ALL '
		end
		
		set @next=@next+1
	end
	--Check whether any necessary words are specified
	if CHARINDEX('[',@words)>0
	begin
		--Trim the trailing AND from the necessary-word conditions
		set @newTable=left(@newTable,(len(@newTable)-4))
		set @newTable='AS t_Temp WHERE '+ @primaryKeyName +' IN (SELECT '+@primaryKeyName+' FROM ' + @talbeName+' WHERE ' + @newTable + ')'
	end
	else
	begin
		set @newTable='AS t_Temp'
	end

	--Trim the trailing UNION ALL from the middle part of the sql statement
	set @sqlCenter=left(@sqlCenter,(len(@sqlCenter)-10))
	--Construct the beginning of the sql statement
	set @sqlFirst='SELECT TOP '+@selectNumber+' '+@primaryKeyName+',COUNT(*)+SUM(wordPower) AS finalPower FROM ('
	--Construct the end part of the sql statement
	set @sqlLast=') ' + @newTable + ' GROUP BY '+@primaryKeyName+' ORDER BY finalPower DESC'
	-- Splice out the complete sql statement and execute it
	Execute(@sqlFirst+@sqlCenter+@sqlLast)
END
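
The statement that this version assembles can be reproduced with a short Python sketch that follows the same concatenation steps. The function name `build_fulltext_query` is hypothetical and the code only builds the text; it does not talk to SQL Server. One caveat: as far as the SQL Server documentation describes, `CONTAINS` supports prefix terms such as `"i*"`, so the leading `*` in `"*i*"` may not act as a wildcard.

```python
def build_fulltext_query(key, table, field, top, delimiter, words):
    """Assemble the statement text the way the procedure concatenates it."""
    tokens = words.strip().split(delimiter)
    n = len(tokens)
    selects, required = [], []
    for position, token in enumerate(tokens, start=1):
        # Strip the necessary-word markers before building the condition.
        word = token.replace("[", "").replace("]", "")
        condition = f"CONTAINS({field},'\"*{word}*\"')"
        # Earlier keywords receive a larger weight: n, n-1, ..., 1.
        selects.append(
            f"SELECT {key},{n - position + 1} AS wordPower "
            f"FROM {table} WHERE {condition}"
        )
        if "[" in token:  # necessary word
            required.append(condition)
    tail = "AS t_Temp"
    if required:
        # Necessary words restrict the result through an IN subquery.
        tail += (
            f" WHERE {key} IN (SELECT {key} FROM {table} "
            f"WHERE {' AND '.join(required)})"
        )
    return (
        f"SELECT TOP {top} {key},COUNT(*)+SUM(wordPower) AS finalPower FROM ("
        + " UNION ALL ".join(selects)
        + f") {tail} GROUP BY {key} ORDER BY finalPower DESC"
    )


sql = build_fulltext_query("id", "t_test", "content", "20", "|", "[i]|o|c")
print(sql)
```

Like the procedure, this concatenates identifiers and keywords straight into the statement, so it should only be given trusted input.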

 

Appendix: guide to creating a full-text index on a SQL database table :

-- Enable full-text indexing for the database
sp_fulltext_database enable

-- Create a full-text catalog (a directory that holds the index files)
CREATE FULLTEXT CATALOG index_catalog_name -- for example, myFullText

-- Create the full-text index
CREATE FULLTEXT INDEX ON table_name (field_name) -- the table and field to index, for example t_test(content)
KEY INDEX primary_key_index_name ON index_catalog_name -- note: this is the name of the primary key index, not the primary key field! For example, PK__t_test__3213E83F0EA330E9. The catalog name says where the index is stored, for example myFullText
 
Note : If the table already holds a large number of records when the full-text index is created, building the index takes time, so queries issued immediately after creating it may not find the data.
