Overview
This part covers six common kinds of search and the syntax of aggregation analysis, with hands-on examples. Much of it can be compared with a relational database: if you already know relational databases, you only need to learn the search and aggregation syntax rules in this article.
Search response message
Using the music index created in the previous article as an example, let's look at the properties of a search result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "music",
"_type": "children",
"_id": "1",
"_score": 1,
"_source": {
"name": "gymbo",
"content": "I hava a friend who loves smile, gymbo is his name",
"length": "75"
}
}
]
}
}
The main parameters are as follows:
- took: the time the search took, in milliseconds.
- timed_out: whether the search timed out; true means it timed out, false means it did not.
- _shards: the data is split into five shards, so the search request is sent to every primary shard or to one of its replica shards.
- hits.total: the number of documents that match the query (1 in this example).
- hits.max_score: the highest relevance score among the matching documents.
- hits.hits._score: the relevance score of this document for the search conditions; the more relevant the match, the higher the score.
- hits.hits: the detailed data of the documents that match the search conditions.
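As a rough illustration of how these fields are read on the client side, the following Python sketch parses a trimmed copy of the sample response using only the standard library. The helper name summarize and the keys of the dict it returns are made up for this example and are not part of Elasticsearch.

```python
import json

# Trimmed copy of the sample response above (the shape used in this article,
# where hits.total is a plain number).
response_text = """
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {"_index": "music", "_type": "children", "_id": "1", "_score": 1,
       "_source": {"name": "gymbo", "length": "75"}}
    ]
  }
}
"""


def summarize(response):
    """Collect the fields described above into a flat dict (hypothetical helper)."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": hits["total"],
        "max_score": hits["max_score"],
        # Each entry of hits.hits carries the stored document under _source.
        "names": [hit["_source"]["name"] for hit in hits["hits"]],
    }


summary = summarize(json.loads(response_text))
print(summary)
```

Checking timed_out and _shards.failed before trusting hits.total is a cautious habit, since a timed-out or partially failed search may return incomplete results.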
Search mode
Query string search
Search all the data
GET /music/children/_search
Conditional search
GET /music/children/_search?q=name:gymbo&sort=length:asc
The characteristic of this search syntax is that all conditions and sort options are carried in the query string of the HTTP request. It is generally used for quick demonstrations or with curl on the command line. It is not suitable for building complex query conditions and is rarely used in production.
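To make the shape of such a request concrete, here is a minimal Python sketch that assembles the path and query string. The function search_url is a hypothetical helper written for this article, not an Elasticsearch client call; it only builds the string and sends nothing.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, q=None, sort=None):
    """Build the request path for a query string search (hypothetical helper)."""
    path = f"/{index}/{doc_type}/_search"
    params = {}
    if q is not None:
        params["q"] = q        # condition, e.g. "name:gymbo"
    if sort is not None:
        params["sort"] = sort  # sort option, e.g. "length:asc"
    # safe=":" keeps the field:value colons readable; other reserved
    # characters are still percent-encoded.
    query = urlencode(params, safe=":")
    return path + ("?" + query if query else "")


print(search_url("music", "children"))
print(search_url("music", "children", q="name:gymbo", sort="length:asc"))
```

Every extra condition lengthens the URL, which is one reason this style becomes awkward for complex queries.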
Query DSL
DSL: Domain Specific Language.
HTTP request body: the query is sent in the request body as JSON, which makes it possible to build complex queries.
Query all data:
GET /music/children/_search
{
"query":{
"match_all": {}
}
}
Conditional + Sort:
GET /music/children/_search
{
"query":{
"match": {
"name": "gymbo"
}
},
"sort":[{"length":"desc"}]
}
Paging query: from is zero-based, so the following request fetches the 10 documents at offsets 10 to 19 (the 11th to the 20th):
GET /music/children/_search
{
"query": {
"match_all":{}
},
"from": 10,
"size": 10
}
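The mapping from a page number to from and size is simple arithmetic. The sketch below assumes 1-based page numbers, and page_params is a hypothetical helper name chosen for this example.

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into the zero-based from/size pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


# Page 2 with 10 documents per page reproduces the request above.
body = {"query": {"match_all": {}}, **page_params(2)}
print(body)
```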
Specify which fields to return:
GET /music/children/_search
{
"query": {
"match_all" : {}
},
"_source": ["name","content"]
}
Query filter
Combine several conditions: the song name is gymbo and the length is between 65 and 80 seconds.
GET /music/children/_search
{
"query":{
"bool":{
"must": [
{"match": {
"name": "gymbo"
}}
],
"filter": {"range": {
"length": {
"gte": 65,
"lte": 80
}
}}
}
}
}
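When the values come from user input, the same body can be assembled in code. The Python sketch below only builds the JSON body; name_with_length_range is a hypothetical helper name. In a bool query the must clause contributes to the relevance score, while the filter clause only includes or excludes documents.

```python
import json


def name_with_length_range(name, min_length, max_length):
    """Build a bool query: match on name, filter on a length range."""
    return {
        "query": {
            "bool": {
                # must: scored full-text condition
                "must": [{"match": {"name": name}}],
                # filter: unscored yes/no condition, both bounds inclusive
                "filter": {
                    "range": {"length": {"gte": min_length, "lte": max_length}}
                },
            }
        }
    }


body = name_with_length_range("gymbo", 65, 80)
print(json.dumps(body, indent=2))
```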
Full Text Search
GET /music/children/_search
{
"query":{
"match": {
"content":"friend smile"
}
}
}
The search string is split into terms and matched against the inverted index that was built for the content field when the documents were indexed, and the results are sorted by relevance score from highest to lowest. This is the principle of full-text search.
Phrase Searching
GET /music/children/_search
{
"query":{
"match_phrase": {
"content":"friend smile"
}
}
}
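Full-text search splits the search string into terms (lowercased by the default standard analyzer) and looks up each term in the inverted index, so a document that contains any of the terms matches. Phrase search analyzes the search string in the same way, but a document matches only if it contains all the terms next to each other and in the same order.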
Highlight retrieval
GET /music/children/_search
{
"query":{
"match_phrase":{
"content":"friend smile"
}
},
"highlight": {
"fields": {
"content":{}
}
}
}
The matched keywords are highlighted in the response: they are wrapped in tags (<em> by default) to mark them.
Aggregated analysis
Aggregation analysis resembles grouping and statistics in a relational database, and many names in the syntax are similar to those in MySQL, so you will see many familiar operations here.
Grouping by a single field
Requirement: count the number of songs in each language.
A size of 0 means that the matching documents are not returned and only the statistics are shown; if size is omitted, it defaults to 10.
GET /music/children/_search
{
"size": 0,
"aggs": {
"group_by_lang": {
"terms": {
"field": "language"
}
}
}
}
The response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_lang": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "english",
"doc_count": 1
}
]
}
}
}
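Client code usually turns the buckets array into a simple mapping. The following Python sketch does that for a trimmed copy of the response above; bucket_counts is a hypothetical helper name.

```python
import json

# Trimmed copy of the aggregation response above.
response_text = """
{
  "hits": {"total": 1, "max_score": 0, "hits": []},
  "aggregations": {
    "group_by_lang": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [{"key": "english", "doc_count": 1}]
    }
  }
}
"""


def bucket_counts(response, agg_name):
    """Map each bucket key to its document count for a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = bucket_counts(json.loads(response_text), "group_by_lang")
print(counts)
```

The key under aggregations is whatever name the request gave the aggregation, here group_by_lang.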
The aggregation query may fail with the following error message:
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [language] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
]
In that case, the fielddata property of the field used for grouping must be set to true:
PUT /music/_mapping/children
{
"properties": {
"language": {
"type": "text",
"fielddata": true
}
}
}
Grouping with query conditions
Requirement: among the songs whose lyrics contain "friend", count the number of songs in each language.
GET /music/children/_search
{
"size": 0,
"query": {
"match": {
"content": "friend"
}
},
"aggs": {
"all_languages": {
"terms": {
"field": "language"
}
}
}
}
Averaging
Requirement: calculate the average length of the songs in each language.
GET /music/children/_search
{
"size": 0,
"aggs": {
"group_by_languages": {
"terms": {
"field": "language"
},
"aggs": {
"avg_length": {
"avg": {
"field": "length"
}
}
}
}
}
}
Sorting the groups
Requirement: calculate the average length of the songs in each language, and sort the groups by average length in descending order.
GET /music/children/_search
{
"size": 0,
"aggs": {
"group_by_languages": {
"terms": {
"field": "language",
"order": {
"avg_length": "desc"
}
},
"aggs": {
"avg_length": {
"avg": {
"field": "length"
}
}
}
}
}
}
Nested aggregation: range grouping + grouping by field + average
Requirement: group the songs by the specified length ranges, then group by language within each range, and finally calculate the average length of each group.
GET /music/children/_search
{
"size": 0,
"aggs": {
"group_by_length": {
"range": {
"field": "length",
"ranges": [
{
"from": 0,
"to": 60
},
{
"from": 60,
"to": 120
},
{
"from": 120,
"to": 180
}
]
},
"aggs": {
"group_by_languages": {
"terms": {
"field": "language"
},
"aggs": {
"average_length": {
"avg": {
"field": "length"
}
}
}
}
}
}
}
}
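In a range aggregation each range includes its from value and excludes its to value, so a song of exactly 60 seconds lands in the 60 to 120 bucket rather than the 0 to 60 one. The Python sketch below mimics that rule locally for the three ranges above; bucket_for is a hypothetical helper used only for illustration.

```python
# The three ranges from the request above, as (from, to) pairs.
RANGES = [(0, 60), (60, 120), (120, 180)]


def bucket_for(length, ranges=RANGES):
    """Return the (from, to) pair that contains length, or None if none does."""
    for lower, upper in ranges:
        # from is inclusive, to is exclusive
        if lower <= length < upper:
            return (lower, upper)
    return None


print(bucket_for(75))   # (60, 120)
print(bucket_for(60))   # (60, 120)
print(bucket_for(180))  # None
```

A song of 180 seconds or more falls outside every range listed, so it would not be counted in any bucket.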
Batch query
The requests in the examples above are issued one at a time. Elasticsearch also has a syntax that combines several lookups into one batch request, which avoids the network overhead of each individual request. The most basic example of the syntax is as follows:
GET /_mget
{
"docs": [
{
"_index" : "music",
"_type" : "children",
"_id" : 1
},
{
"_index" : "music",
"_type" : "children",
"_id" : 2
}
]
}
The docs parameter of mget is an array. Each element defines the _index, _type, and _id metadata of one document. The _index values may be the same or different, and a _source field can also be defined to specify which fields to return.
Example response:
{
"docs": [
{
"_index": "music",
"_type": "children",
"_id": "1",
"_version": 4,
"found": true,
"_source": {
"name": "gymbo",
"content": "I hava a friend who loves smile, gymbo is his name",
"language": "english",
"length": "75",
"likes": 0
}
},
{
"_index": "music",
"_type": "children",
"_id": "2",
"_version": 13,
"found": true,
"_source": {
"name": "wake me, shark me",
"content": "don't let me sleep too late, gonna get up brightly early in the morning",
"language": "english",
"length": "55",
"likes": 9
}
}
]
}
The response is also a docs array, with the same length as the array in the request. If a document does not exist or cannot be retrieved for some other reason, the overall result is not affected: the HTTP response code of mget is still 200, and each document is retrieved independently.
If the documents in the batch query are all in the same index, the _index metadata (and likewise the _type metadata) can be moved to the request line:
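Because the HTTP status does not reveal individual misses, the caller has to check the found flag of each entry. The Python sketch below separates found and missing documents. The response text is a trimmed, partly hypothetical example (the entry with _id 3 is invented here to show a miss), and split_mget is a hypothetical helper name.

```python
import json

# Trimmed mget response; the second entry (id 3) is a made-up miss.
response_text = """
{
  "docs": [
    {"_index": "music", "_type": "children", "_id": "1", "_version": 4,
     "found": true, "_source": {"name": "gymbo", "language": "english"}},
    {"_index": "music", "_type": "children", "_id": "3", "found": false}
  ]
}
"""


def split_mget(response):
    """Return (found, missing): found maps _id to _source, missing lists ids."""
    found, missing = {}, []
    for doc in response["docs"]:
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            # Entries that were not found, or that failed, carry no _source.
            missing.append(doc["_id"])
    return found, missing


found, missing = split_mget(json.loads(response_text))
print(found)
print(missing)
```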
GET /music/children/_mget
{
"docs": [
{
"_id" : 1
},
{
"_id" : 2
}
]
}
Or directly use the simpler array ids:
GET /music/children/_mget
{
"ids":[1,2]
}
The query result is the same.
The importance of mget
mget is very important. If you need to fetch several documents at once, use the batch API so that the number of network round trips is kept to a minimum; this may improve performance several times over.
Summary
This article introduced the most commonly used ways of writing search, aggregation, and batch queries, including group statistics, averages, sorting, and range grouping. These are the basic patterns and cover most common needs. If you are familiar with MySQL, you will master them quickly; once you are also comfortable with the RESTful syntax, the basics are covered.