Elasticsearch series: common search methods and aggregation analysis

Overview

This article covers six common kinds of search and the syntax of aggregation analysis. It is mostly hands-on and draws comparisons with relational databases along the way. If you already know a relational database, you only need to learn the search and aggregation syntax rules presented here.

Search response message

Using the music index built in the previous article as the example, let's look at the properties of a search result.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "music",
        "_type": "children",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "gymbo",
          "content": "I hava a friend who loves smile, gymbo is his name",
          "length": "75"
        }
      }
    ]
  }
}

The main parameters are as follows:

  • took: time taken by the search, in milliseconds.
  • timed_out: whether the search timed out; true means it timed out, false means it did not.
  • _shards: the index is split into five shards, so the search request is sent to every primary shard or to one of its replica shards.
  • hits.total: number of documents matching the query, here one document.
  • hits.max_score: the highest relevance score among the matching documents.
  • hits.hits._score: relevance score of this document for the search conditions; the more relevant the match, the higher the score.
  • hits.hits: the detailed data of the documents that match the search conditions.
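
As a rough illustration, the fields listed above can be read out of a response as follows. This is a minimal Python sketch that parses a trimmed copy of the sample response as a JSON string; it does not contact a cluster, the helper name `summarize` is made up for this example, and it assumes the response shape shown in this article, where hits.total is a plain number.

```python
import json

# Trimmed copy of the sample response shown above.
RESPONSE = """
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {"_index": "music", "_type": "children", "_id": "1", "_score": 1,
       "_source": {"name": "gymbo", "length": "75"}}
    ]
  }
}
"""


def summarize(raw):
    """Return the headline fields of a search response as a dict."""
    body = json.loads(raw)
    return {
        "took_ms": body["took"],
        "timed_out": body["timed_out"],
        "failed_shards": body["_shards"]["failed"],
        "total": body["hits"]["total"],
        "max_score": body["hits"]["max_score"],
        # (_id, _score) for each returned document
        "scores": [(hit["_id"], hit["_score"]) for hit in body["hits"]["hits"]],
    }


summary = summarize(RESPONSE)
print(summary)
```

Checking timed_out and failed_shards before trusting total is a reasonable habit, since a timed-out or partially failed search can return incomplete hits.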

Search methods

Search all the data

GET /music/children/_search

Conditional search

GET /music/children/_search?q=name:gymbo&sort=length:asc

The distinguishing feature of this syntax is that all conditions and sort settings are carried in the query string of the HTTP request. It is generally used for simple demonstrations or on the curl command line. It is not suited to building complex query conditions and is rarely used in production.
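
To make the "everything rides in the query string" point concrete, here is a small Python sketch that assembles the request path used above. The function name `search_url` is hypothetical, and the sketch only builds the string; it does not send anything.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, query, sort=None):
    """Build the path and query string for a query-string search."""
    params = {"q": query}
    if sort:
        params["sort"] = sort
    # safe=":" keeps the field:value colons readable instead of %3A
    return "/{}/{}/_search?{}".format(index, doc_type, urlencode(params, safe=":"))


url = search_url("music", "children", "name:gymbo", sort="length:asc")
print(url)
```

Every extra condition has to be squeezed into the q parameter, which is why this style becomes hard to read once the query grows.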

Query DSL

DSL: Domain Specific Language.

HTTP request body: the query is sent in the request body as JSON, which makes it possible to build complex queries.

All data query

GET /music/children/_search
{
  "query":{
    "match_all": {}
  }
}

Conditional + Sort:

GET /music/children/_search
{
  "query":{
    "match": {
      "name": "gymbo"
    }
  },
  "sort":[{"length":"desc"}]
}

Paging query: from starts at 0 and size is the number of documents to return, so the following request fetches documents 10 through 19 (the 11th through the 20th)

GET /music/children/_search
{
  "query": {
    "match_all":{}
  },
  "from": 10,
  "size": 10
}
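
The from/size arithmetic is easy to get off by one, so here is a minimal Python sketch that converts a 1-based page number into a request body. The helper `page_body` is an illustrative name, not part of any client library.

```python
def page_body(page, page_size=10):
    """Build a match_all request body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # zero-based offset of the first hit
        "size": page_size,               # number of hits to return
    }


body = page_body(2)
print(body)
```

Page 2 with a page size of 10 gives from 10 and size 10, which is the request shown above.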

Specify the fields to return

GET /music/children/_search
{
  "query": {
    "match_all" : {}
  },
  "_source": ["name","content"]
}

Query filter

Combining several conditions: the song title is gymbo, and the length is between 65 and 80 seconds

GET /music/children/_search
{
  "query":{
    "bool":{
      "must": [
        {"match": {
          "name": "gymbo"
        }}
      ],
      "filter": {"range": {
        "length": {
          "gte": 65,
          "lte": 80
        }
      }}
    }
  }
}
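
When the name and the range come from user input, the same body can be assembled in code. The following Python sketch builds the request body shown above; `name_in_length_range` is a hypothetical helper name, and the body is only printed, not sent.

```python
import json


def name_in_length_range(name, min_length, max_length):
    """Build a bool query: match on name, filter on an inclusive length range."""
    return {
        "query": {
            "bool": {
                # must: full-text condition on the song title
                "must": [{"match": {"name": name}}],
                # filter: range condition on the length field
                "filter": {
                    "range": {"length": {"gte": min_length, "lte": max_length}}
                },
            }
        }
    }


bool_body = name_in_length_range("gymbo", 65, 80)
print(json.dumps(bool_body, indent=2))
```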

Full Text Search

GET /music/children/_search
{
  "query":{
    "match": {
      "content":"friend smile"
    }
  }
}

Results are sorted by relevance score on the content field. The search text is analyzed into terms, the terms are looked up in the inverted index that was built when the documents were indexed, and the best matches are returned first. This is the principle behind full-text search.

Phrase Searching

GET /music/children/_search
{
  "query":{
    "match_phrase": {
      "content":"friend smile"
    }
  }
}

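A full-text search analyzes the search text into separate terms (lowercased by the default analyzer), looks each term up in the inverted index, and returns documents that contain any of them. A phrase search analyzes the search text in the same way, but a document only matches if the terms appear next to each other and in the same order. With the sample document, "friend smile" therefore matches as a full-text search but not as a phrase, because other words sit between "friend" and "smile".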
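
The difference between the two can be sketched with a toy model in Python. This is not how Elasticsearch is implemented: `analyze` is only a rough stand-in for an analyzer (lowercase, split on non-word characters), and the two functions model term matching and adjacent in-order matching with no scoring at all.

```python
import re


def analyze(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def match(doc, query):
    """True if the document contains at least one of the query terms."""
    doc_terms = set(analyze(doc))
    return any(term in doc_terms for term in analyze(query))


def match_phrase(doc, query):
    """True if the query terms appear in the document adjacent and in order."""
    doc_terms, query_terms = analyze(doc), analyze(query)
    n = len(query_terms)
    if n == 0:
        return False
    return any(doc_terms[i:i + n] == query_terms
               for i in range(len(doc_terms) - n + 1))


DOC = "I hava a friend who loves smile, gymbo is his name"
print(match(DOC, "friend smile"))         # True: both terms occur somewhere
print(match_phrase(DOC, "friend smile"))  # False: the terms are not adjacent
print(match_phrase(DOC, "Loves Smile"))   # True: adjacent, case is normalized
```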

Highlighted search

GET /music/children/_search
{
  "query":{
    "match_phrase":{
      "content":"friend smile"
    }
  },
  "highlight": {
    "fields": {
      "content":{}
    }
  }
}

Matched keywords are highlighted: in the highlight section of each hit they are wrapped in tags (<em> by default) so that they stand out.

Aggregation analysis

Aggregation is similar to grouped statistics in a relational database, and many of the names used in the syntax resemble those in MySQL, so you will see plenty of familiar operations here.

Single-field group statistics

Requirement: count the number of songs in each language.

A size of 0 means the matching documents themselves are not returned and only the statistics are shown. If size is omitted, the default value is 10.

GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "group_by_lang": {
      "terms": {
        "field": "language"
      }
    }
  }
}

Response:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_lang": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "english",
          "doc_count": 1
        }
      ]
    }
  }
}
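
For readers coming from SQL, the buckets array plays the role of the rows returned by a GROUP BY with COUNT(*). A minimal Python sketch that turns a trimmed copy of the response above into a language-to-count mapping (the helper name `bucket_counts` is made up for this example):

```python
import json

# Trimmed copy of the aggregation response shown above.
AGG_RESPONSE = """
{
  "hits": {"total": 1, "max_score": 0, "hits": []},
  "aggregations": {
    "group_by_lang": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "english", "doc_count": 1}
      ]
    }
  }
}
"""


def bucket_counts(raw, agg_name):
    """Map each bucket key of a terms aggregation to its document count."""
    body = json.loads(raw)
    buckets = body["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = bucket_counts(AGG_RESPONSE, "group_by_lang")
print(counts)
```

The key under aggregations is whatever name the request gave the aggregation, here group_by_lang.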

If the aggregation query fails with the following error:

"root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [language] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ]

then the fielddata property of the field used for grouping must be set to true:

PUT /music/_mapping/children
{
  "properties": {
    "language": {
      "type": "text",
      "fielddata": true
    }
  }
}

Group statistics with a query condition

Requirement: among the songs whose lyrics contain "friend", count the number of songs in each language

GET /music/children/_search
{
  "size": 0,
  "query": {
    "match": {
      "content": "friend"
    }
  },
  "aggs": {
    "all_languages": {
      "terms": {
        "field": "language"
      }
    }
  }
}

Averaging

Requirement: calculate the average length of the songs in each language

GET /music/children/_search
{
    "size": 0,
    "aggs": {
        "group_by_languages": {
            "terms": {
                "field": "language"
            },
            "aggs": {
                "avg_length": {
                    "avg": {
                        "field": "length"
                    }
                }
            }
        }
    }
}

Sorting groups

Requirement: calculate the average length of the songs in each language, and sort the groups by average length in descending order

GET /music/children/_search
{
    "size": 0,
    "aggs": {
        "group_by_languages": {
            "terms": {
                "field": "language",
                "order": {
                  "avg_length": "desc"
                }
            },
            "aggs": {
                "avg_length": {
                    "avg": {
                        "field": "length"
                    }
                }
            }
        }
    }
}
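
To check your understanding of what the terms, avg and order parts compute, here is the same calculation done locally in Python. The song list is hypothetical data made up for this example, and the code only mirrors the grouping logic; it says nothing about how Elasticsearch executes the aggregation.

```python
from collections import defaultdict

# Hypothetical documents, made up for this example.
SONGS = [
    {"language": "english", "length": 75},
    {"language": "english", "length": 55},
    {"language": "french", "length": 120},
    {"language": "spanish", "length": 40},
]


def avg_length_by_language(songs):
    """Group by language, average the length, sort by average descending."""
    lengths = defaultdict(list)
    for song in songs:
        lengths[song["language"]].append(song["length"])
    averages = {lang: sum(vals) / len(vals) for lang, vals in lengths.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = avg_length_by_language(SONGS)
print(ranking)
```

In SQL terms this corresponds to a GROUP BY on language with AVG(length), ordered by that average in descending order.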

Nested aggregation: range grouping + language grouping + average

Requirement: first group by the specified length ranges, then group by language within each range, and finally calculate the average length within each language group

GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "group_by_length": {
      "range": {
        "field": "length",
        "ranges": [
          {
            "from": 0,
            "to": 60
          },
          {
            "from": 60,
            "to": 120
          },
          {
            "from": 120,
            "to": 180
          }
        ]
      },
      "aggs": {
        "group_by_languages": {
          "terms": {
            "field": "language"
          },
          "aggs": {
            "average_length": {
              "avg": {
                "field": "length"
              }
            }
          }
        }
      }
    }
  }
}
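
The three levels of nesting can be mirrored locally to see what each level contributes. This Python sketch uses hypothetical songs and labels each range as "from-to" for readability (the labels are this example's own choice). It follows the range aggregation rule that each bucket includes its from value and excludes its to value, so a song of length 60 falls in the second range.

```python
from collections import defaultdict

# Hypothetical documents, made up for this example.
SONGS = [
    {"language": "english", "length": 75},
    {"language": "english", "length": 55},
    {"language": "english", "length": 60},
    {"language": "french", "length": 120},
    {"language": "spanish", "length": 40},
]
RANGES = [(0, 60), (60, 120), (120, 180)]


def nested_average(songs, ranges):
    """Bucket by length range, then by language, then average the length."""
    result = {}
    for low, high in ranges:
        by_language = defaultdict(list)
        for song in songs:
            # A range bucket includes `from` and excludes `to`.
            if low <= song["length"] < high:
                by_language[song["language"]].append(song["length"])
        result["{}-{}".format(low, high)] = {
            lang: sum(vals) / len(vals) for lang, vals in by_language.items()
        }
    return result


report = nested_average(SONGS, RANGES)
print(report)
```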

Batch query

The requests above were all issued one at a time. Elasticsearch also has a syntax that combines several document lookups into a single batch request, which reduces the network overhead of sending each request separately. The most basic example is as follows:

GET /_mget
{
  "docs": [
    {
      "_index": "music",
      "_type": "children",
      "_id": 1
    },
    {
      "_index": "music",
      "_type": "children",
      "_id": 2
    }
  ]
}

The docs parameter of mget is an array. Each element of the array defines the _index, _type and _id metadata of one document. The _index values may be the same or different, and a _source field may be added to specify which fields to return.

Example response:

{
  "docs": [
    {
      "_index": "music",
      "_type": "children",
      "_id": "1",
      "_version": 4,
      "found": true,
      "_source": {
        "name": "gymbo",
        "content": "I hava a friend who loves smile, gymbo is his name",
        "language": "english",
        "length": "75",
        "likes": 0
      }
    },
    {
      "_index": "music",
      "_type": "children",
      "_id": "2",
      "_version": 13,
      "found": true,
      "_source": {
        "name": "wake me, shark me",
        "content": "don't let me sleep too late, gonna get up brightly early in the morning",
        "language": "english",
        "length": "55",
        "likes": 9
      }
    }
  ]
}

The response is also a docs array, with the same length as the array in the request. If a document does not exist, or cannot be retrieved for some other reason, the overall result is not affected: the HTTP response code of mget is still 200, and each document lookup is independent.
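
Because the HTTP status stays at 200 even when some documents are missing, the caller has to inspect each entry's found flag. A minimal Python sketch of that check follows; the response string is a shortened, partly hypothetical example (the entry with _id 3 is made up to show a miss), and `split_found` is an illustrative helper name.

```python
import json

# Shortened example response; the entry with _id "3" is hypothetical.
MGET_RESPONSE = """
{
  "docs": [
    {"_index": "music", "_type": "children", "_id": "1", "_version": 4,
     "found": true, "_source": {"name": "gymbo"}},
    {"_index": "music", "_type": "children", "_id": "3", "found": false}
  ]
}
"""


def split_found(raw):
    """Separate an mget response into found sources and missing ids."""
    docs = json.loads(raw)["docs"]
    found = {doc["_id"]: doc["_source"] for doc in docs if doc.get("found")}
    missing = [doc["_id"] for doc in docs if not doc.get("found")]
    return found, missing


found, missing = split_found(MGET_RESPONSE)
print(found)
print(missing)
```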

If the documents in the batch all belong to the same index, the _index metadata (and likewise the _type metadata) can be moved to the request line:

GET /music/children/_mget
{
  "docs": [
    {
       "_id" :    1
    },
    {
       "_id" :    2
    }
  ]
}

Or use the simpler ids array directly:

GET /music/children/_mget
{
  "ids":[1,2]
}

The query result is the same.

The importance of mget

mget matters a great deal in practice. If you need to fetch multiple documents at once, be sure to use the batch API so that the number of network round trips is kept to a minimum. Doing so may improve performance several times over, or more.

Summary

This article introduced the most commonly used search and batch query syntax, along with common aggregation scenarios: group statistics, averages, sorting, and range grouping. These are the basic patterns and they cover most everyday needs. If you are familiar with MySQL you will pick them up quickly, and once you are comfortable with the RESTful syntax you have the basics in hand.

Origin www.cnblogs.com/huangying2124/p/11914404.html