Analysis of Solr query and filter execution order

I. Introduction

  A Solr search consists of two operations: finding the documents that match the request parameters, then sorting those documents and returning the most relevant matches. By default, documents are sorted by relevance. This means that, after finding the set of matching documents, Solr needs a further operation to calculate the relevance score of each matching document.

II. The fq and q parameters

  To find matching documents and calculate their relevance scores efficiently, Solr uses two parameters: fq and q. The fq parameter specifies a filter query, and the q parameter specifies the main query. At first glance the two may be hard to tell apart, because the same query syntax passed to either parameter returns the same number of documents. As a result, many search requests use only the q parameter. Understanding the difference between the two lets searches run more efficiently.

  Relevance impact

  fq: limits the results to documents that match the query.

  q: (1) limits the results to documents that match the query; (2) supplies the relevance algorithm with the list of terms to use in scoring.

  The q parameter can therefore be regarded as a special filter that also tells Solr which terms to consider in the relevance calculation. Given this difference, Solr users tend to put the keywords entered by the user in the q parameter and machine-generated filters in fq parameters.

  Caching and execution speed

  Separating filter queries from the main query serves two purposes. First, a filter query usually contains no user-entered keywords, so it can be reused across any number of keyword searches; Solr therefore caches filter query results in the filter cache. Second, relevance scoring must be computed for each term of the q query on every matching document, so moving part of the query into an fq parameter means that part needs no additional relevance calculation. Handled this way, the part of the query that runs as a filter skips score calculation, which saves a lot of time.

  Specify multiple queries and filters

  A Solr request may contain as many fq parameters as needed, but only one q parameter. For example, the two Solr queries q=keywords:solr&fq=category:technology&fq=year:2020 and q=keywords:solr&fq=category:technology AND year:2020 return the same documents in the same order. Apart from caching, where each fq parameter is cached independently, using multiple fq parameters is functionally equivalent to combining them into a single fq parameter. Weigh the relevance and caching effects when choosing which parameter to use, and in what form, for your situation.
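
  The two request forms above can be built programmatically. The following is a minimal sketch using only the Python standard library; the helper name build_select_query is hypothetical, and the sketch only produces the URL-encoded query string without contacting a Solr server.

```python
from urllib.parse import urlencode, parse_qs


def build_select_query(q, filters):
    """Return a URL-encoded query string with one q and one fq per filter."""
    params = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(params)


# Two separate fq parameters: each one can be cached independently.
separate = build_select_query(
    "keywords:solr", ["category:technology", "year:2020"]
)

# One combined fq parameter: cached as a single entry.
combined = build_select_query(
    "keywords:solr", ["category:technology AND year:2020"]
)

print(separate)
print(combined)
```

  Both strings describe the same result set; they differ only in how many filter cache entries the request can create or reuse.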

  Execution order

  From a technical point of view:

  1. Each fq parameter is looked up in the filter cache. If it is present, the cached DocSet is returned. A DocSet wraps an OpenBitSet in which each document in the index has one bit (0 or 1) indicating whether the document is included by the filter.

  2. If an fq parameter is not found in the filter cache but caching is enabled, the filter is run against the index to produce a new DocSet, which is then cached.

  3. The DocSets of all filters are intersected (an AND operation) to produce a single DocSet.

  4. The q parameter is passed, together with the filter DocSet, as a Lucene query. During execution, Lucene leapfrogs between the query and the unified filter object, advancing each to its next internal document ID (an integer). When the query and the filter are positioned on the same ID, that ID is collected, a process that includes calculating the relevance score of the matching document.

  5. If the request contains any post-filters, they run as part of the collection process and are applied only to documents that match both the query and the combined filters.

  So, when caching is enabled, the filters execute before the main query. The query and the filters are then processed together during collection, and post-filters, as a special kind of filter, are applied to documents that have already matched the query and the other filters. The query and filters combine as follows:
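
  The five steps can be modeled in a few lines of Python. This is a toy model under stated assumptions, not Solr's actual implementation: DocSets are plain Python sets, the filter cache is a dict, and scores are precomputed, whereas Solr computes them during collection. All names (execute, query_scores, and so on) are hypothetical.

```python
def execute(query_scores, filters, filter_cache, index, post_filters=()):
    """Toy model of the execution order described above.

    query_scores: dict doc_id -> score for documents matching q
    filters:      list of (fq_string, predicate) pairs
    filter_cache: dict fq_string -> set of doc ids (stands in for DocSets)
    index:        dict doc_id -> document
    post_filters: predicates checked only on documents that pass everything else
    """
    allowed = set(index)
    for fq, predicate in filters:
        # Steps 1 and 2: use the cached DocSet, or compute and cache it.
        if fq not in filter_cache:
            filter_cache[fq] = {d for d, doc in index.items() if predicate(doc)}
        # Step 3: intersect all filter DocSets.
        allowed &= filter_cache[fq]

    # Step 4: leapfrog over two sorted id lists, collecting common ids.
    q_ids, f_ids = sorted(query_scores), sorted(allowed)
    i = j = 0
    hits = []
    while i < len(q_ids) and j < len(f_ids):
        if q_ids[i] < f_ids[j]:
            i += 1
        elif q_ids[i] > f_ids[j]:
            j += 1
        else:
            doc_id = q_ids[i]
            # Step 5: post-filters only see documents that matched both sides.
            if all(check(index[doc_id]) for check in post_filters):
                hits.append((doc_id, query_scores[doc_id]))
            i += 1
            j += 1
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


index = {
    0: {"category": "technology", "year": 2020, "price": 10},
    1: {"category": "technology", "year": 2019, "price": 50},
    2: {"category": "sports", "year": 2020, "price": 5},
    3: {"category": "technology", "year": 2020, "price": 99},
}
query_scores = {0: 1.2, 1: 0.7, 3: 2.5}
filters = [
    ("category:technology", lambda doc: doc["category"] == "technology"),
    ("year:2020", lambda doc: doc["year"] == 2020),
]
filter_cache = {}

without_post = execute(query_scores, filters, filter_cache, index)
with_post = execute(
    query_scores, filters, filter_cache, index,
    post_filters=[lambda doc: doc["price"] < 50],
)
print(without_post)  # [(3, 2.5), (0, 1.2)]
print(with_post)     # [(0, 1.2)]
```

  The second call reuses both cached filter results, and the price check runs on only two documents rather than on the whole index.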

  

  It looks complicated, and it is. Solr hides this complexity well. Still, understanding the process helps when optimizing the performance of expensive filters. Solr provides finer-grained control: you can specify whether a filter should be cached, and the order in which filters execute, including before, after, or at the same time as the main query.

III. Handling expensive filters

  Filters and the filter cache bypass the relevance part of processing, which can save a great deal of time. However, not all filters are alike. If you restrict search results with a geographic radius filter around a latitude and longitude, the filter may be expensive to compute because it involves mathematical calculations. And if you need to generate different filters for tens of thousands of locations, that many filters may be hard to cache. Some applications need to generate many unique filters, for example a filter on a unique ID, which overloads the filter cache: commonly used filters are evicted, or warm-up takes too long. In such cases, you should control which filters Solr caches and determine the order in which filters execute.
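
  The eviction problem described above can be illustrated with a small least-recently-used cache. This is only a sketch: the class LruFilterCache is hypothetical, and the LRU policy and the tiny capacity are assumptions chosen for illustration, not a description of how Solr's filter cache is configured.

```python
from collections import OrderedDict


class LruFilterCache:
    """A tiny LRU cache keyed by fq string (illustrative only)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()

    def get(self, fq):
        if fq not in self.entries:
            return None
        self.entries.move_to_end(fq)  # mark as most recently used
        return self.entries[fq]

    def put(self, fq, doc_set):
        self.entries[fq] = doc_set
        self.entries.move_to_end(fq)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used


lru = LruFilterCache(max_size=3)
lru.put("category:technology", {1, 2, 3})  # a commonly used filter

# A burst of one-off unique ID filters fills the cache.
for doc_id in range(100, 105):
    lru.put(f"id:{doc_id}", {doc_id})

common_filter_survived = lru.get("category:technology") is not None
print(common_filter_survived)  # False
print(list(lru.entries))       # ['id:102', 'id:103', 'id:104']
```

  The reusable filter is pushed out by entries that will never be requested again, which is the situation that turning off caching for such filters avoids.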

  Turning off filter caching

  In some cases, many filters do not need to be cached. Because the filter cache holds a limited number of filters, a Solr instance performs best when the most commonly used filters always stay in the cache. To keep unimportant filters from crowding the cache, you can turn off caching for specific filters with the following syntax:

  fq={!cache=false}id:123&...

  Changing the execution order of filters

  If a search request includes multiple filters, their execution order can significantly affect query speed. As a general rule, the filters that reduce the result set the most should run first, because the fewer documents the later filters face, the faster they run. By the same token, filters that perform complex calculations should run later in the list: the fewer documents they process, the fewer computing resources they consume. For more expensive filters, you can assign a cost to the filter so that Solr runs it later. The cost of a filter is specified with the following syntax:

  fq={!cost=1}category:technology&...

  Costs do not have to be consecutive; only their order relative to one another matters. A cost greater than or equal to 100 makes the filter run as a post-filter.
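
  The cost rule can be sketched as a small planning function. This is a simplified model, not Solr's code: the helper names are hypothetical, the local-params parser handles only simple key=value pairs, a missing cost is assumed to be 0, and the field names field_a and field_b are made up. It also applies Solr's additional requirement that a post-filter must have cache=false; whether the query type actually supports post-filtering cannot be checked here.

```python
import re


def parse_local_params(fq):
    """Extract simple key=value pairs from a leading {!...} block."""
    match = re.match(r"\{!([^}]*)\}", fq)
    if not match:
        return {}
    tokens = match.group(1).split()
    return dict(token.split("=", 1) for token in tokens if "=" in token)


def plan_filters(fqs):
    """Order filters by cost and separate out the post-filters."""

    def cost_of(fq):
        return int(parse_local_params(fq).get("cost", 0))

    def is_post_filter(fq):
        params = parse_local_params(fq)
        return params.get("cache") == "false" and cost_of(fq) >= 100

    ordered = sorted(fqs, key=cost_of)  # only relative order matters
    regular = [fq for fq in ordered if not is_post_filter(fq)]
    post = [fq for fq in ordered if is_post_filter(fq)]
    return regular, post


requested = [
    "{!frange l=0 u=5 cache=false cost=150}sum(field_a,field_b)",
    "{!cost=50}year:2020",
    "{!cost=1}category:technology",
]
regular, post = plan_filters(requested)
print(regular)  # ['{!cost=1}category:technology', '{!cost=50}year:2020']
print(post)     # ['{!frange l=0 u=5 cache=false cost=150}sum(field_a,field_b)']
```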

  Post-filtering

  In some cases a filter is so expensive that it is best to run it only after all other queries and filters have executed. Solr provides a special type of filter for this, called a post-filter, which is applied after the query and the other filters have been intersected. Setting a filter's cost parameter is how a filter is turned into a post-filter: a filter with a cost >= 100 is treated as a post-filter and is executed through the PostFilter interface. In Solr, the filter must also have caching turned off (cache=false) to run as a post-filter.

  Post-filtering does not apply to every type of query and filter; it applies only to queries and filters that implement the PostFilter interface. The frange (function range) query is one query type that can act as a post-filter. You can also write a plugin that implements the interface, to run a custom filter after the main query and the other filters have executed.
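
  The idea behind a post-filter can be sketched as a collector that wraps another collector and forwards only the documents that pass an expensive check. The classes below are a Python analogy with hypothetical names, not Solr's Java PostFilter API; the even-ID check stands in for a costly computation.

```python
class ListCollector:
    """Final collector: records every document id it is given."""

    def __init__(self):
        self.collected = []

    def collect(self, doc_id):
        self.collected.append(doc_id)


class PostFilterCollector:
    """Forwards a document to the wrapped collector only if it passes."""

    def __init__(self, delegate, accepts):
        self.delegate = delegate
        self.accepts = accepts
        self.checked = 0  # how many documents the expensive check saw

    def collect(self, doc_id):
        self.checked += 1
        if self.accepts(doc_id):
            self.delegate.collect(doc_id)


def expensive_check(doc_id):
    # Placeholder for a costly per-document computation.
    return doc_id % 2 == 0


final = ListCollector()
chain = PostFilterCollector(final, expensive_check)

# Only documents that already matched the query and the other filters
# reach the post-filter.
for doc_id in [4, 7, 10]:
    chain.collect(doc_id)

print(final.collected)  # [4, 10]
print(chain.checked)    # 3
```

  Because the wrapper sits at the end of the pipeline, the expensive check runs once per surviving document rather than once per document in the index.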

Origin www.cnblogs.com/yszd/p/12294071.html