Controlling the number of query hits in Solr

In Solr, how can we reasonably control the number of hits?

Everyday articles and other text contain high-frequency words, and when these words take part in a query they often produce a very large result set.
What does that mean? Suppose we are building a hotel search, and the index has a name field whose values mostly look like "xxx hotel". A search for "xxx hotel" is tokenized into:
xxx
hotel
Suppose xxx hits only 10 records while hotel hits 200,000. The combined result set can then exceed 200,000 records. Such a large hit count does surface plenty of information, but it can also confuse the user.
To analyze this, we need two important concepts from full-text search:

Precision

Recall

In Lucene, Solr and ElasticSearch, tokenized search generally balances these two measures to get the best overall effect, and the default relevance scoring rule is:

Results with the highest relevance score come first, which reflects precision.
Results with low relevance come last, which reflects recall.
Of course, this conclusion is not one hundred percent correct. Because of how the underlying Lucene scoring is designed, there can be odd cases where the most exact match is not ranked first. This happens in only about 10% of cases, and we can avoid it by indexing two fields, one tokenized and one not tokenized, and querying both fields together at query time.

Back to the hotel question. Suppose you now search for:
Beijing North Chedaogou small village Shili hotel. After tokenization, the terms are as follows:

Lane
ditch
north in
a small village
ten
incense
Hotel

Note that most of the records in the index contain the two words Beijing and hotel, so this query matches almost all of the indexed data. The results are still ranked, but the hit count is far too large: after the first four pages or so, almost everything is some "xxxx Hotel Beijing" that has nothing to do with what was searched for. We can adopt some strategies to avoid this situation.
By default, Solr combines the terms produced by tokenization with OR and returns the whole result set. If we change the operator to AND, every term must match, but then an incomplete phrase typed by the user finds nothing, which defeats the purpose of full-text search. So we need a combined strategy that protects both precision and recall. How can that be achieved?
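To make the OR versus AND trade-off concrete, here is a minimal sketch on a toy inverted index. The index contents and the function names search_or and search_and are made up for illustration; this is not how Solr stores or evaluates queries internally.

```python
# Toy inverted index: term -> set of document ids (all values are made up).
index = {
    "xxx": {1, 2, 3},
    "hotel": {2, 3, 4, 5, 6, 7, 8, 9},
}

def search_or(terms):
    """Return ids of documents that contain at least one of the terms."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits

def search_and(terms):
    """Return ids of documents that contain every one of the terms."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

or_hits = search_or(["xxx", "hotel"])    # dominated by the frequent term
and_hits = search_and(["xxx", "hotel"])  # only documents with both terms
print(len(or_hits), len(and_hits))
```

With OR, the frequent term decides the size of the result set. With AND, a single term that is missing from the index empties the result.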

Full-text search frameworks do not implement this directly. One good idea is to extract the core terms from the words or sentence being searched and require that this core part be hit. If a record does not hit the core terms, it is treated as having little relevance to the query, even if it matches the other words. The method is good, but how do you accurately extract those core terms from large-scale data? With machine learning or text mining? The answer is yes, it can be done, but it needs a separate design. This is the best way to solve the problem of too many hits.
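Once the core terms are known, the "must hit" requirement can be expressed in Lucene query syntax, where a leading + marks a clause as required. The sketch below only builds the query string. The build_query helper, the field name and the choice of core term are all hypothetical, and picking the core terms automatically is exactly the hard part that is not shown here.

```python
def build_query(field, core_terms, other_terms):
    """Build a Lucene-syntax query string.

    Clauses prefixed with '+' must match; the remaining clauses are
    optional and only influence the ranking.
    """
    required = [f"+{field}:{term}" for term in core_terms]
    optional = [f"{field}:{term}" for term in other_terms]
    return " ".join(required + optional)

# Hypothetical split: "shili" is treated as the core term, the rest as optional.
query = build_query("name", ["shili"], ["beijing", "hotel"])
print(query)  # +name:shili name:beijing name:hotel
```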

Another option, which treats the symptoms rather than the root cause but is relatively easy to implement, is to require a minimum number of matching terms after tokenization. Take the terms from the example above:

Lane
ditch
north in
a small village
ten
incense
Hotel
If a record hits three or more of these terms, we treat it as relevant. Alternatively, we can set a percentage threshold, so that a record is accepted only when it hits at least 80% of the terms. Solr's edismax query parser can do this, as follows:

Use edismax. In the q box, enter:
name: Hotel Beijing after xxxxx
In the Raw Query Parameters box, enter:
defType=edismax&mm=80%25
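As a rough illustration of what a minimum-match setting asks for, the sketch below turns a fixed count or a percentage into a required number of matching terms. It covers only those two simple forms and assumes the percentage is rounded down, as the Solr documentation describes for positive percentages. The function names are made up, and Solr's actual mm syntax supports more forms than this.

```python
def required_matches(num_terms, mm):
    """Translate an mm setting into a minimum number of matching terms.

    Handles only a plain positive integer ("3") or a positive
    percentage ("80%").
    """
    if mm.endswith("%"):
        # Integer division rounds the percentage result down.
        needed = num_terms * int(mm[:-1]) // 100
    else:
        needed = int(mm)
    # Never require more matches than there are terms.
    return min(needed, num_terms)

def is_match(doc_terms, query_terms, mm):
    """Check whether a document matches enough of the query terms."""
    matched = sum(1 for term in query_terms if term in doc_terms)
    return matched >= required_matches(len(query_terms), mm)

# Seven query terms at 80%: 5.6 rounded down, so five terms must match.
print(required_matches(7, "80%"))
```

With the seven terms in the hotel example, a record that contains only the one or two most common terms no longer qualifies.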

Then you can run the query. mm is the minimum number of terms that must match. It can be a fixed value or a percentage. Because I am running the query from the Solr admin query page, the % character has to be written as its URL-encoded form %25 so that the correct value is sent to the server. For details on Solr's edismax, see:

edismax function introduction
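When the request is built in code rather than typed into the admin page, a URL-encoding helper takes care of the % character. The sketch below uses Python's standard urllib.parse.urlencode. The host, port, core name "hotels" and the query value are placeholders, not values from this article.

```python
from urllib.parse import urlencode

# Placeholder endpoint; adjust the host, port and core name to your setup.
base_url = "http://localhost:8983/solr/hotels/select"

params = {
    "q": "name:hotel",     # placeholder query
    "defType": "edismax",
    "mm": "80%",           # written unencoded here
}

query_string = urlencode(params)
print(query_string)  # q=name%3Ahotel&defType=edismax&mm=80%25
print(base_url + "?" + query_string)
```

The encoder turns 80% into 80%25, which is the same value typed by hand in the Raw Query Parameters box.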

Reposted from http://qindongliang.iteye.com/blog/2226905

Origin blog.csdn.net/qq_36209121/article/details/78262606