Solr search query parser usage analysis and overview

I. Introduction

  Most queries use the standard Solr syntax. This is the most common syntax in Solr and is the one handled by the default query parser. Solr's default query parser is the Lucene query parser [implemented by the class LuceneQParserPlugin]. It supports the full Lucene query syntax along with some Solr-specific extensions.

II. Lucene query parser syntax

  1. Field Search

    When searching for a value in a Solr index, you generally look in a particular field. The field search syntax is: field name + ':' + search term, for example:

    

    title:solr
    title:"apache solr"
    request_content_split:(process issues)

    Note: the parentheses denote a group. The elements are separated by spaces, which mean OR by default.

    Searching by keyword without explicitly naming a field is very common, but note that such keywords are searched against the default field. For example, if content is defined as the default field [df=content], the following two queries are equivalent:

    solr  or  content:solr

    Also note that the scope of the expression that follows the field name and colon must be clearly defined. The following two queries are equivalent [assuming df=content], although the user may have intended something else with the first one.

    title:apache solr  or  title:apache content:solr

    To search for multiple terms in the same field, use a grouped expression to specify which terms the field applies to:

    title:(apache solr)

    If you want to search for a phrase, use quotation marks instead of parentheses to define the scope of the phrase. Note that this requires all terms in the phrase to appear together.

    title:"apache solr"
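    The field-scoping rules above can be sketched in a few lines of Python. This is only an illustration: field_query is a hypothetical helper, not part of Solr or any client library, and the only Solr names used are the standard request parameters q and df.

```python
from urllib.parse import urlencode


def field_query(field, text, phrase=False):
    """Build a fielded clause: one term, a phrase, or a group of terms."""
    if phrase:
        # Quotation marks scope the whole phrase to the field.
        return f'{field}:"{text}"'
    terms = text.split()
    if len(terms) > 1:
        # Parentheses keep every term inside the field's scope.
        return f"{field}:({' '.join(terms)})"
    return f"{field}:{text}"


single = field_query("title", "solr")
grouped = field_query("title", "apache solr")
phrase = field_query("title", "apache solr", phrase=True)

# With df=content, the bare keyword "solr" is searched in the content field.
params = urlencode({"q": "solr", "df": "content"})

print(single, grouped, phrase, params, sep="\n")
```

    Without the parentheses or quotation marks, only the first term would be scoped to title and the rest would fall back to the default field.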

  2. Required terms [less commonly used]

    To require that one or more terms appear, put the unary + operator in front of each term. Documents that do not contain the specified term will not match. If matching documents must contain more than one term, join the terms with the binary operator AND or &&, or prefix each term with the unary + operator.

    

 

    If the default operator is AND, every term is required unless another operator is specified. Because each additional required term further restricts the resulting document set, adding required terms narrows the number of results and can speed up queries.
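    As a rough sketch of the three ways to require terms, the snippet below builds the + form, the AND form, and a request that relies on the default operator. The helper names are hypothetical; q.op is the standard Solr request parameter for the default operator, and the terms are illustrative.

```python
from urllib.parse import urlencode


def require_all(terms):
    """Prefix every term with the unary + operator."""
    return " ".join(f"+{t}" for t in terms)


def and_join(terms):
    """Join terms with the binary AND operator."""
    return " AND ".join(terms)


plus_form = require_all(["apache", "solr"])   # +apache +solr
and_form = and_join(["apache", "solr"])       # apache AND solr

# Alternatively, leave the query bare and make AND the default operator.
params = urlencode({"q": "apache solr", "q.op": "AND"})

print(plus_form, and_form, params, sep="\n")
```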

  3. Optional terms

    In contrast to required terms, which restrict the result set, optional terms suit cases where you want to expand the number of matching documents. Solr's default operator is OR, so unless specified otherwise, each expression is optional. Similarly, placing the binary operator OR or || between expressions means that a matching document must contain at least one of them.

    

    It is worth noting that the more optional terms a query has, the more varied the matching documents become, and that OR costs more to execute than other Boolean operations. For keyword search, if the amount of content is limited and you are willing to trade precision for returning more results [recall], consider using OR as the default operator. Because documents that match more of the optional terms usually get a higher relevance score, using OR as the default operator and ranking by relevance score can still put the most relevant results at the top. However, compared with requiring every keyword to match, the expanded query will also return more odd matches.
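    The precision and recall trade-off can be illustrated with a toy model in which each document is just a set of terms. This is not how Solr scores documents; the document ids, the terms, and the "count of matched terms" ranking are all assumptions made for the example.

```python
# Three toy documents, each reduced to the set of terms it contains.
docs = {
    1: {"apache", "solr", "search"},
    2: {"apache", "lucene"},
    3: {"solr", "cloud"},
}
query = {"apache", "solr"}

# AND: every query term must be present. OR: at least one must be present.
and_hits = [d for d, terms in docs.items() if query <= terms]
or_hits = [d for d, terms in docs.items() if query & terms]

# Rank OR hits by how many query terms matched (a stand-in for relevance).
ranked = sorted(or_hits, key=lambda d: len(query & docs[d]), reverse=True)

print("AND:", and_hits)
print("OR: ", or_hits)
print("OR ranked:", ranked)
```

    In this model OR returns all three documents while AND returns only one, yet the document that matches both terms still ranks first among the OR results.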

  4. Phrase search [less commonly used; must be used on a tokenized field]

    If you want to match multiple terms that appear next to each other, enclose them in quotation marks as a phrase. Such a query does not guarantee a match on exactly the same text, because the analyzer of the searched field may modify the terms in the phrase. A reasonably specific phrase search should, however, not match unrelated phrases. Phrase search is well suited to specific fields and to content that contains multi-word names.

  5. Grouped expressions [parentheses]

    To handle arbitrarily complex Boolean clauses, Solr uses parentheses to group query expressions.

    

    A grouping can also be applied within the scope of a field, for example to search for several terms in the same field. Groupings can be nested to any depth.
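    A minimal sketch of nested grouping, assuming a hypothetical helper named group and illustrative field names (title, content):

```python
def group(operator, *clauses):
    """Join clauses with a Boolean operator and wrap them in parentheses."""
    return "(" + f" {operator} ".join(clauses) + ")"


# A group nested inside another group.
inner = group("OR", "title:solr", "title:lucene")
query = group("AND", inner, "content:search")

# A group scoped to a single field.
fielded = "title:" + group("OR", "apache", "solr")

print(query)
print(fielded)
```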

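  6. Term proximity

    Phrase search is a simplified form of term proximity search. By appending a tilde and a positional distance to a phrase, you can search for terms that are close to each other but not necessarily adjacent.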

    

    Phrase search is a proximity search with a term distance of 0.

    

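    A term distance of 3 matches documents in which the two terms are at most 3 position moves apart. Swapping the two terms counts as moving two positions.

    With a large enough proximity value you can match the terms anywhere in the document, which has an effect similar to an AND query. Proximity queries also have a side effect: the closer the terms are in the document, the higher the relevance score the proximity query produces. Compared with a grouped Boolean query, a proximity search costs more when the term distance is large.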
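    The distance rule can be made concrete with a simplified model for a two-term phrase. It is a sketch for intuition only, not Solr's actual matching code: the function names are hypothetical, and positions are assumed to be zero-based token positions in the document.

```python
def required_slop(first_pos, second_pos):
    """Distance needed for the two-term phrase "first second" to match.

    In the query the second term sits one position after the first, so that
    offset is removed before measuring how far the terms must be moved.
    """
    return abs((second_pos - 1) - first_pos)


def matches(first_pos, second_pos, slop):
    """True if a proximity query with the given distance would match."""
    return required_slop(first_pos, second_pos) <= slop


print(required_slop(0, 1))  # adjacent, in order: 0 (plain phrase match)
print(required_slop(1, 0))  # swapped: 2
print(required_slop(0, 2))  # one word in between: 1
print(matches(0, 4, 3))     # three words in between, distance 3: True
```

    A query such as "apache solr"~3 (illustrative terms) would therefore match the swapped order as well, since a swap only needs a distance of 2.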

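  7. Character proximity

    Besides proximity search between terms, you can also run an edit-distance search over the characters within a term to find terms with similar spellings. The syntax is similar to term proximity search, but because it operates on a single term, no quotation marks are used.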

    

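    A value of 1 means a match may differ from the search term by at most one character, which covers three cases: one extra character, one missing character, or one different character.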
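    The three cases can be checked with a plain Levenshtein edit distance, sketched below. This is a textbook implementation for illustration; Solr's own fuzzy matching may differ in details such as how transposed characters are counted. The query solr~1 and the candidate terms are assumed examples.

```python
def edit_distance(a, b):
    """Levenshtein distance: inserts, deletes, and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Terms within distance 1 of "solr", as a query like solr~1 would allow.
candidates = ["solr", "solar", "sol", "sole", "lucene"]
close = [t for t in candidates if edit_distance("solr", t) <= 1]
print(close)
```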

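  8. Excluding terms

    Sometimes we need to explicitly exclude specific terms from a query. Use the unary operator - [minus sign] in front of an expression, or the Boolean operator NOT between expressions, to exclude terms.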

    

    or

    

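  9. Range search [square brackets for a closed interval, curly braces for an open interval]

    Sometimes we want a query expression to match a whole range of values rather than a single value. A range can be numeric, a date range, or a string range. Range search finds the specified set of values; its syntax is the field name, a colon, and then a bracketed range.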

    

    If the maximum or minimum of the range is not specified, use the wildcard * for the unbounded upper or lower limit.
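    A small sketch of how such range expressions could be assembled. range_query is a hypothetical helper and price is an assumed field name; the bracket, TO, and * syntax is the standard Lucene range syntax.

```python
def range_query(field, low=None, high=None, inclusive=True):
    """Build field:[low TO high]; a missing bound becomes the * wildcard."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{lo} TO {hi}{close_b}"


print(range_query("price", 10, 100))                   # closed interval
print(range_query("price", 10, 100, inclusive=False))  # open interval
print(range_query("price", low=10))                    # no upper limit
print(range_query("price", high=100))                  # no lower limit
```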

    

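  10. Wildcard search

    In some cases users need to match variants of a word or phrase in the Solr index. For most keywords that users type, techniques such as stemming make wildcard search less important, but wildcards are still useful for finding documents with terms that start with a given set of characters, or for substituting a single character.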
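    The two wildcard uses mentioned above, a prefix followed by * and a single-character ?, can be imitated with Python's fnmatch module. This is a stand-in to show which terms each pattern would select, not Solr's matching code, and the term list is made up for the example.

```python
from fnmatch import fnmatchcase

# Assumed indexed terms, for illustration only.
terms = ["solr", "solar", "sold", "search", "lucene"]


def wildcard_matches(pattern, candidates):
    """Return the candidate terms that match a * / ? wildcard pattern."""
    return [t for t in candidates if fnmatchcase(t, pattern)]


prefix_hits = wildcard_matches("sol*", terms)  # * stands for any run of characters
single_hits = wildcard_matches("sol?", terms)  # ? stands for exactly one character

print(prefix_hits)
print(single_hits)
```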

    

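  11. Boosting expressions

    Appending a caret [^] and a weight to any expression, whether a term, a phrase, or a grouped expression, adjusts its relevance weight.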
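    As a brief sketch, the snippet below appends boosts to a term, a phrase, and a grouped expression. The helper name, the field names, and the weights are assumptions chosen for illustration.

```python
def boost(expression, weight):
    """Append ^weight to a term, phrase, or grouped expression."""
    return f"{expression}^{weight}"


query = " ".join([
    boost("title:solr", 2.5),                         # single term
    boost('title:"apache solr"', 10),                 # phrase
    boost("(content:search OR content:index)", 0.5),  # grouped expression
])
print(query)
```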

    

 

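  12. Escaping special characters [except on tokenized fields]

    Some characters in Solr are reserved, meaning they are parsed as query syntax rather than as search terms. They include:

    + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /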

 

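    To search for a reserved character, you must either enclose it in quotation marks or escape it with a backslash. The recommended way to handle reserved characters in keywords is to strip those with no search value before the query is sent to Solr, or to escape each of them with a backslash.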

    

 

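    When the field being searched is a tokenized field, reserved characters are stripped out during tokenization, so a search that omits them still finds the document!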
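    Both escaping approaches can be sketched as follows. The function names are hypothetical, and the reserved set is deliberately conservative: it escapes & and | individually even though only the pairs && and || act as operators.

```python
# Characters the Lucene query syntax treats as special.
RESERVED = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(text):
    """Backslash-escape every character the query parser treats as syntax."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def quote_term(text):
    """Alternative: wrap the text in quotes, escaping quotes and backslashes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


print(escape_term("1+1:2"))  # 1\+1\:2
print(escape_term("(a)"))    # \(a\)
print(quote_term("a:b"))     # "a:b"
```

    Escaping on the client side before the keyword is placed into q keeps user input from being interpreted as operators.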

    

Origin www.cnblogs.com/yszd/p/12330744.html