ElasticSearch structured search and full-text search

https://segmentfault.com/a/1190000019753737?utm_source=tag-newest

 

1, structured search

1.1 Finding exact values

Filters matter because they are very fast. They do not compute relevance (the entire scoring phase is skipped) and their results are easy to cache. Use filters wherever you can.

A term query finds the exact value we specify. On its own, the term query is simple: it takes a field name and the value we want to find:

{
    "term" : {
        "price" : 20
    }
}

When looking for an exact value, we usually do not want the query to compute a score. We only want each document to be either included or excluded. So we use the constant_score query, which runs the term query in non-scoring (filter) mode and gives every matching document the same uniform score (1 by default). Because no scores need to be computed, constant_score is the faster way to run this search.
The final query is a constant_score query wrapping a term query:

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "price" : 20
                }
            }
        }
    }
}
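The Python sketch below makes the include/exclude behaviour concrete. It builds the same request body as a plain dict and imitates the yes/no decision a term filter makes for each document. The helper names (`constant_score_term`, `term_matches`) and the sample product documents are hypothetical. Real Elasticsearch compares against indexed terms, not against raw `_source` values as done here.

```python
import json


def constant_score_term(field, value):
    """Build the constant_score + term request body (hypothetical helper)."""
    return {"query": {"constant_score": {"filter": {"term": {field: value}}}}}


def term_matches(doc, field, value):
    """Simplified filter decision: include the doc only on an exact match, no scoring."""
    return doc.get(field) == value


# Hypothetical sample documents, not real index data.
products = [
    {"price": 10, "productID": "XHDK-A-1293-#fJ3"},
    {"price": 20, "productID": "KDKE-B-9947-#kL5"},
    {"price": 30, "productID": "JODL-X-1937-#pV7"},
]

body = constant_score_term("price", 20)
print(json.dumps(body))
hits = [p["productID"] for p in products if term_matches(p, "price", 20)]
print(hits)  # ['KDKE-B-9947-#kL5']
```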

1.2 Combining filters

1.2.1 Boolean filters

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : []
   }
}

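must
    All of these clauses must match. Equivalent to AND.
must_not
    None of these clauses may match. Equivalent to NOT.
should
    At least one of these clauses must match. Equivalent to OR.

For example, the following filter finds products priced at 20 or with productID "XHDK-A-1293-#fJ3", and excludes any product priced at 30:

GET /my_store/products/_search
{
   "query" : {
      "constant_score" : {
         "filter" : {
            "bool" : {
               "should" : [
                  { "term" : {"price" : 20}},
                  { "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
               ],
               "must_not" : {
                  "term" : {"price" : 30}
               }
            }
         }
      }
   }
}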
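The Python sketch below evaluates the bool filter above against in-memory documents. It assumes filter context, where a non-empty should list requires at least one matching clause. It supports only `term` and `bool` clauses, and the function names and sample products are hypothetical.

```python
def as_list(clauses):
    """The DSL accepts either a single clause or a list of clauses."""
    return clauses if isinstance(clauses, list) else [clauses]


def matches(doc, clause):
    """Evaluate a tiny subset of the filter DSL (term and bool) against one document."""
    if "term" in clause:
        (field, value), = clause["term"].items()
        return doc.get(field) == value
    if "bool" in clause:
        b = clause["bool"]
        if not all(matches(doc, c) for c in as_list(b.get("must", []))):
            return False
        if any(matches(doc, c) for c in as_list(b.get("must_not", []))):
            return False
        should = as_list(b.get("should", []))
        # Filter context: a non-empty should list needs at least one matching clause.
        return not should or any(matches(doc, c) for c in should)
    raise ValueError(f"unsupported clause: {clause}")


# Hypothetical sample documents, not real index data.
products = [
    {"price": 10, "productID": "XHDK-A-1293-#fJ3"},
    {"price": 20, "productID": "KDKE-B-9947-#kL5"},
    {"price": 30, "productID": "JODL-X-1937-#pV7"},
    {"price": 30, "productID": "QQPX-R-3956-#aD8"},
]

# The bool filter from the request above.
bool_filter = {
    "bool": {
        "should": [
            {"term": {"price": 20}},
            {"term": {"productID": "XHDK-A-1293-#fJ3"}},
        ],
        "must_not": {"term": {"price": 30}},
    }
}

print([p["productID"] for p in products if matches(p, bool_filter)])
# ['XHDK-A-1293-#fJ3', 'KDKE-B-9947-#kL5']
```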

1.2.2 Nested Boolean filter

SELECT document
FROM   products
WHERE  productID      = "KDKE-B-9947-#kL5"
  OR (     productID = "JODL-X-1937-#pV7"
       AND price     = 30 )

GET /my_store/products/_search
{
   "query" : {
      "constant_score" : {
         "filter" : {
            "bool" : {
               "should" : [
                  { "term" : {"productID" : "KDKE-B-9947-#kL5"}},
                  { "bool" : {
                     "must" : [
                        { "term" : {"productID" : "JODL-X-1937-#pV7"}},
                        { "term" : {"price" : 30}}
                     ]
                  }}
               ]
            }
         }
      }
   }
}
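The SQL-to-DSL mapping can be sketched in Python with small combinators: OR becomes `bool.should` and AND becomes `bool.must`. The helper names `term`, `any_of` and `all_of` are hypothetical. The sketch only assembles the same nested bool filter as a dict and wraps it in constant_score as in section 1.1.

```python
import json


def term(field, value):
    """A single exact-value clause."""
    return {"term": {field: value}}


def any_of(*clauses):
    """SQL OR -> bool.should (in filter context at least one clause must match)."""
    return {"bool": {"should": list(clauses)}}


def all_of(*clauses):
    """SQL AND -> bool.must (every clause must match)."""
    return {"bool": {"must": list(clauses)}}


# WHERE productID = "KDKE-B-9947-#kL5"
#    OR (productID = "JODL-X-1937-#pV7" AND price = 30)
nested = any_of(
    term("productID", "KDKE-B-9947-#kL5"),
    all_of(term("productID", "JODL-X-1937-#pV7"), term("price", 30)),
)
body = {"query": {"constant_score": {"filter": nested}}}
print(json.dumps(body, indent=2))
```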

1.3 Finding multiple exact values

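What if we want to find documents whose price is $20 or $30? Instead of several term queries, use a single terms query with an array of values:

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "terms" : {
                    "price" : [20, 30]
                }
            }
        }
    }
}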
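The following Python sketch imitates the terms filter on in-memory data. A document matches if its field value equals any listed value. For array fields, it matches if any element does, so the test is "contains" rather than "equals". The function name and sample documents are hypothetical, and indexing and analysis are ignored.

```python
def terms_matches(doc, field, values):
    """Simplified terms filter: match if the field value, or any element of an
    array field, equals one of the requested values."""
    value = doc.get(field)
    candidates = value if isinstance(value, list) else [value]
    return any(c in values for c in candidates)


# Hypothetical sample documents, not real index data.
products = [{"price": 10}, {"price": 20}, {"price": 30}, {"price": 30}]
print([p["price"] for p in products if terms_matches(p, "price", [20, 30])])
# [20, 30, 30]

# Array fields: the filter tests "contains", not "equals".
print(terms_matches({"tags": ["search", "open_source"]}, "tags", ["search"]))  # True
```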

1.4 range

gt: > greater than
lt: < less than
gte: >= greater than or equal to
lte: <= less than or equal to

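GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "price" : {
                        "gte" : 20,
                        "lt"  : 40
                    }
                }
            }
        }
    }
}

For an unbounded range (say, > 20), simply omit one of the limits:

"range" : {
    "price" : {
        "gt" : 20
    }
}

Date ranges work the same way:

"range" : {
    "timestamp" : {
        "gt" : "2014-01-01 00:00:00",
        "lt" : "2014-01-07 00:00:00"
    }
}

On date fields, the range query also supports date math. For example, to find all documents with a timestamp within the last hour:

"range" : {
    "timestamp" : {
        "gt" : "now-1h"
    }
}

Date math can also be applied to a fixed date by appending || to it. The following matches timestamps later than 2014-01-01 00:00:00 and earlier than one month after it:

"range" : {
    "timestamp" : {
        "gt" : "2014-01-01 00:00:00",
        "lt" : "2014-01-01 00:00:00||+1M"
    }
}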
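The Python sketch below maps the four range operators onto Python comparisons. It shows that an omitted bound simply leaves that side of the range open. The `in_range` helper is hypothetical. The `now-1h` part uses a fixed "now" so the output is deterministic, whereas Elasticsearch resolves `now` when the query runs.

```python
import operator
from datetime import datetime, timedelta

# The four range operators listed above, as Python comparisons.
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value, bounds):
    """True if value satisfies every given bound; an omitted bound leaves that side open."""
    return all(OPS[op](value, limit) for op, limit in bounds.items())


prices = [10, 20, 30, 40, 50]
print([p for p in prices if in_range(p, {"gte": 20, "lt": 40})])  # [20, 30]
print([p for p in prices if in_range(p, {"gt": 20})])             # [30, 40, 50]

# "now-1h" relative to a fixed, hypothetical "now".
now = datetime(2014, 1, 7, 12, 0, 0)
timestamps = [datetime(2014, 1, 7, 11, 30), datetime(2014, 1, 7, 10, 0)]
print([t for t in timestamps if in_range(t, {"gt": now - timedelta(hours=1)})])
```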

1.5 Dealing with null values

1.5.1 exists query

SELECT tags
FROM   posts
WHERE  tags IS NOT NULL

GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "tags" }
            }
        }
    }
}
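The Python sketch below approximates what `exists` treats as "present": at least one non-null value. So `null`, an absent field, and an array of only nulls count as missing, while `["search", null]` counts as present. The helper name and sample posts are hypothetical. Real Elasticsearch decides from indexed values, so settings such as `index: false` or `ignore_above` also affect the result.

```python
def field_exists(doc, field):
    """Approximate exists: the field holds at least one non-null value."""
    value = doc.get(field)
    if isinstance(value, list):
        return any(v is not None for v in value)
    return value is not None


# Hypothetical sample posts, numbered from 1.
posts = [
    {"tags": ["search"]},
    {"tags": ["search", "open_source"]},
    {"other_field": "some data"},
    {"tags": None},
    {"tags": ["search", None]},
]
print([i for i, p in enumerate(posts, start=1) if field_exists(p, "tags")])  # [1, 2, 5]
```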

1.5.2 missing query

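SELECT tags
FROM   posts
WHERE  tags IS NULL

GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter": {
                "missing" : { "field" : "tags" }
            }
        }
    }
}

The missing query was removed in Elasticsearch 5.0. In later versions, express the same condition by wrapping an exists query in the must_not clause of a bool query.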
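The Python sketch below treats IS NULL as the negation of exists. It also builds the request body in the bool / must_not / exists form. Both helper names and the sample posts are hypothetical, and the exists check is the same simplification used for the IS NOT NULL case.

```python
def field_exists(doc, field):
    """Approximate exists: the field holds at least one non-null value."""
    value = doc.get(field)
    if isinstance(value, list):
        return any(v is not None for v in value)
    return value is not None


def missing_body(field):
    """Request body for "field IS NULL" written as bool + must_not + exists."""
    return {"query": {"bool": {"must_not": {"exists": {"field": field}}}}}


# Hypothetical sample posts, numbered from 1.
posts = [
    {"tags": ["search"]},
    {"tags": ["search", "open_source"]},
    {"other_field": "some data"},
    {"tags": None},
    {"tags": ["search", None]},
]
print([i for i, p in enumerate(posts, start=1) if not field_exists(p, "tags")])  # [3, 4]
print(missing_body("tags"))
```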

2, full-text search

2.1 match query

The match query is the core query. Whatever you need to search for and whatever the field, match should be your first choice. It is a high-level full-text query, which means it knows how to handle both full-text fields and exact-value fields. Its main use case is still full-text search.

2.1.1 Single-word query

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "QUICK!"
        }
    }
}
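Match analyzes its query string before searching, which is why "QUICK!" finds documents containing "quick". The Python sketch below uses a deliberately crude stand-in analyzer: it lowercases the text and splits on non-alphanumerics. It is an assumption, not the real standard analyzer, which performs proper Unicode text segmentation. The sketch contrasts looking up analyzed terms with looking up the raw string, as a term query would.

```python
import re


def simple_analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


print(simple_analyze("QUICK!"))      # ['quick']
print(simple_analyze("BROWN DOG!"))  # ['brown', 'dog']

# Terms indexed for a hypothetical document title.
indexed = set(simple_analyze("The quick brown fox"))
print(all(t in indexed for t in simple_analyze("QUICK!")))  # True: match analyzes its input first
print("QUICK!" in indexed)                                   # False: a term query would not
```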

2.1.2 multiple-word search

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "BROWN DOG!"
        }
    }
}

Returning every document that contains any one of the query terms can produce a long tail of seemingly irrelevant results. This is a shotgun approach to search. We may want only documents that contain all of the terms: rather than matching brown OR dog, we want documents that match brown AND dog.

The match query also accepts an operator parameter, which defaults to or. Setting it to and requires all of the specified terms to match:

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": {
                "query":    "BROWN DOG!",
                "operator": "and"
            }
        }
    }
}
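The Python sketch below contrasts `operator: or` and `operator: and` on four hypothetical titles. With or, any analyzed query term is enough. With and, every term must be present. The crude `analyze` helper and the `match` function are simplifications that ignore scoring entirely.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# Hypothetical sample titles keyed by document id.
docs = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def match(query, operator="or"):
    """Return ids of docs whose title contains any (or) / all (and) query terms."""
    terms = analyze(query)
    combine = all if operator == "and" else any
    return [doc_id for doc_id, title in docs.items()
            if combine(t in set(analyze(title)) for t in terms)]


print(match("BROWN DOG!"))                  # [1, 2, 3, 4]
print(match("BROWN DOG!", operator="and"))  # [2, 3, 4]
```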

2.2 Combined Query

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
                  { "match": { "title": "brown" }},
                  { "match": { "title": "dog"   }}
      ]
    }
  }
}
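The Python sketch below models the combined query on four hypothetical titles. must and must_not decide which documents qualify. Because a must clause is present, the should clauses are optional and only raise the ranking. Counting matched should clauses is a deliberate simplification of how Elasticsearch actually computes `_score`, and the helper names are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# Hypothetical sample titles keyed by document id.
docs = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def bool_search(docs, must, must_not, should):
    """must/must_not filter the docs; matched should clauses only raise the rank."""
    results = []
    for doc_id, title in docs.items():
        tokens = set(analyze(title))
        if not all(t in tokens for t in must):
            continue
        if any(t in tokens for t in must_not):
            continue
        # Toy score: number of should terms present (not Elasticsearch's real scoring).
        results.append((doc_id, sum(t in tokens for t in should)))
    return sorted(results, key=lambda r: r[1], reverse=True)


print(bool_search(docs, must=["quick"], must_not=["lazy"], should=["brown", "dog"]))
# [(3, 2), (1, 1)]
```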

2.3 Controlling precision

Just as the operator parameter controls the precision of a match query, the minimum_should_match parameter controls how many should clauses must match. It accepts either an absolute number or a percentage:

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox"   }},
        { "match": { "title": "dog"   }}
      ],
      "minimum_should_match": 2
    }
  }
}

This query returns only documents whose title field contains at least two of the terms: "brown" AND "fox", "brown" AND "dog", or "fox" AND "dog". A document containing all three terms is considered more relevant than one containing just two.
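The Python sketch below resolves `minimum_should_match` to a clause count and applies it to hypothetical titles. Percentages are rounded down (truncated here), and a negative value counts how many clauses may be missing. This follows my reading of the documented rules and is not a copy of Elasticsearch's implementation. The helper names are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def required_matches(spec, optional_count):
    """Resolve minimum_should_match (an integer or a percentage string) to a clause count.

    Percentages are truncated toward zero; negative values count how many
    clauses may be missing instead of how many must match.
    """
    if isinstance(spec, str) and spec.endswith("%"):
        calc = int(optional_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    result = optional_count + calc if calc < 0 else calc
    return max(result, 0)


should = ["brown", "fox", "dog"]
# Hypothetical sample titles.
titles = {"a": "The quick brown fox", "b": "Brown fox brown dog", "c": "The lazy dog"}

need = required_matches(2, len(should))
counts = {k: sum(t in set(analyze(v)) for t in should) for k, v in titles.items()}
print({k: n for k, n in counts.items() if n >= need})  # {'a': 2, 'b': 3}
```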

2.4 Boosting query clauses

Suppose we want to find documents about "full text search", but give more weight to documents that also mention "Elasticsearch" or "Lucene". More weight here means that documents mentioning "Elasticsearch" or "Lucene" get a higher relevance score (_score) than those that don't, so they appear higher in the result list.

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "content": {
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [
                { "match": { "content": "Elasticsearch" }},
                { "match": { "content": "Lucene"        }}
            ]
        }
    }
}

The more should clauses a document matches, the more relevant it is. So far, so good. But what if we want documents mentioning Lucene to get extra weight, and documents mentioning Elasticsearch to get even more weight than those mentioning Lucene?

We can control the relative weight of any query clause with the boost parameter. Its default value is 1, and a value greater than 1 increases the clause's relative weight. So we rewrite the previous query as:

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "content": {
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [
                { "match": {
                    "content": {
                        "query": "Elasticsearch",
                        "boost": 3
                    }
                }},
                { "match": {
                    "content": {
                        "query": "Lucene",
                        "boost": 2
                    }
                }}
            ]
        }
    }
}
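The Python sketch below is a toy model of boosting. Every document that contains all of "full text search" qualifies, and each matching should clause adds its boost to a score used for ranking. It ignores term frequency, field length and the must clause's own contribution, all of which affect the real `_score`. The helper names and sample documents are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def boosted_score(tokens, should_clauses):
    """Toy relevance model: each matching should clause adds its boost (default 1)."""
    return sum(boost for term, boost in should_clauses if term in tokens)


must = analyze("full text search")               # operator "and": every term required
should = [("elasticsearch", 3), ("lucene", 2)]   # (term, boost) pairs

# Hypothetical sample documents.
docs = {
    "es": "Full text search with Elasticsearch",
    "lucene": "Full text search with Lucene",
    "plain": "Full text search basics",
}

candidates = [d for d, text in docs.items() if set(must) <= set(analyze(text))]
ranked = sorted(candidates, key=lambda d: boosted_score(set(analyze(docs[d])), should), reverse=True)
print(ranked)  # ['es', 'lucene', 'plain']
```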

3, outlook

While using ES in production I tried some third-party ES packages, but found no PHP wrapper class that was convenient to use. Wrapping a third-party extension once more inside the project also felt awkward. So I plan to write a custom ES service class, which is currently under development. Getting familiar with the documentation and summarizing it here is the first step, and I will publish an update when the class is done. The current idea is to wrap common query operations in methods that accept the query fields, sorting and paging as parameters, so that a search is as simple as calling a single method. The class should also support highlighting the search terms and allow complex native queries to be passed in directly.
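The planned class is PHP and not shown here. Purely as an illustration, the Python sketch below shows the kind of request body a single "search" method could assemble from query fields, sorting, paging and highlighting. The function name and its parameters are hypothetical. The body keys (`multi_match`, `from`, `size`, `sort`, `highlight`) are standard Elasticsearch search-request parameters.

```python
def build_search_body(text, fields, sort=None, page=1, per_page=10, highlight=True):
    """Hypothetical helper: one call covering query fields, sorting, paging and highlighting."""
    body = {
        "query": {"multi_match": {"query": text, "fields": list(fields)}},
        "from": (page - 1) * per_page,  # offset of the first hit for this page
        "size": per_page,
    }
    if sort:
        # sort is a list of (field, "asc" | "desc") pairs
        body["sort"] = [{field: {"order": order}} for field, order in sort]
    if highlight:
        # an empty object per field uses the default highlighter settings
        body["highlight"] = {"fields": {field: {} for field in fields}}
    return body


print(build_search_body("brown dog", ["title", "content"],
                        sort=[("price", "asc")], page=2, per_page=20))
```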

appendix

Reference documents address


Origin www.cnblogs.com/xiaohanlin/p/12342500.html