Elasticsearch search in detail

Search is the core function of Elasticsearch (ES). This article introduces the ES Search API and how to use it.

To support the explanations, we first prepare some test data and save it to a file named website.json:

{"index":{"_index":"website","_id":"1"}}
{"address": "Beijing's Changping District, Nan Fung Road", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.150775,116.2841456", "title": "Phoenix bike", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"2"}}
{"address": "Beijing Changping District", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.109854,116.274349", "title": "Paige speed", "category": ["shopping", "other shopping"]}
{"index":{"_index":"website","_id":"3"}}
{"address": "Beijing Changping District", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.165716,116.270237", "title": "shared bicycle parking lot", "category": ["service life", "other life service"]}
{"index":{"_index":"website","_id":"4"}}
{"address": "Beijing Changping District X030", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.16806,116.32344", "title": "Changping public bicycle parking lot", "category": ["service life", "other life service"]}
{"index":{"_index":"website","_id":"5"}}
{"address": "Beijing's Changping District 100 Sand Road", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.149193,116.28929", "title": "shared bicycle parking lot", "category": ["service life", "other life service"]}
{"index":{"_index":"website","_id":"6"}}
{"address": "Changping District of Beijing Beiqi Jia town white village No. 200 Beijing", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.10261,116.38784", "title": "Green logistics Austrian-store sales", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"7"}}
{"address": "Beiqijia Changping District in the town of Baimiao Street 200 Beijing", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.1026,116.38751", "title": "The new electric vehicles (North-Road)", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"8"}}
{"address": "Street No. 202 50 m west white village area Beiqijia Changping Town", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.102558,116.387483", "title": "urban wind electric vehicles (North-Road)", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"9"}}
{"address": "Beijing Xiaotangshan near the town of Xiaotangshan Xiaotangshan center mafang Ma Fangcun primary school in Changping District", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.1547354,116.40153086", "title": "Yadi electric vehicles (Ma Fang shop)", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"10"}}
{"address": "Beijing's Changping District, near the Northeast Road", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.134651,116.433778", "title": "Harting electric car", "category": ["shopping", "bicycle monopoly"]}

Next, create the index with its settings and mappings. The index is named website, with 1 replica and 3 shards. The command is as follows:

PUT website
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "address": {
        "type": "text"
      },
      "category": {
        "type": "keyword"
      },
      "city": {
        "type": "keyword"
      },
      "district": {
        "type": "keyword"
      },
      "location": {
        "type": "geo_point"
      },
      "province": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}

Finally, run the bulk import command to load the documents into ES:

$ curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' --data-binary @website.json
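The bulk file alternates an action line and a source line, and the request body has to end with a newline. As a rough illustration, the following Java sketch assembles such a body from documents that are already serialized as single-line JSON. The class and method names are hypothetical, and the sketch assumes ids and index names need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles an NDJSON body for the _bulk endpoint.
final class BulkBodySketch {

    // Each entry maps a document id to its already-serialized, single-line JSON source.
    // Assumption: index name and ids contain no characters that need JSON escaping.
    static String build(String index, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : docsById.entrySet()) {
            // Action line: names the index and id that the next line belongs to.
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(doc.getKey()).append("\"}}\n");
            // Source line: the document itself, followed by a newline.
            body.append(doc.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"title\":\"Phoenix bike\",\"city\":\"Beijing\"}");
        System.out.print(build("website", docs));
    }
}
```

Passing the documents as strings keeps the sketch free of any JSON library; a real importer would serialize objects with one.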

1. Search mechanism

1.1 Querying all data

GET website/_search
{
  "query": {
    "match_all": {}
  }
}

It can also be written as:

GET website/_search

1.2 Specifying the returned fields

By default, the results contain all fields of each matching document. You can also ask for only certain fields to be returned.

GET website/_search
{
  "_source": ["title", "city"],
  "query": {
    "term": {
      "city": "Beijing"
    }
  }
}

The Java example is as follows:

SearchSourceBuilder builder = new SearchSourceBuilder();
// Fields to return
String[] includes = {"title", "city"};
// Fields to exclude from the result
String[] excludes = {};
builder.fetchSource(includes, excludes);
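To make the include/exclude semantics concrete, here is a small stand-alone Java sketch that applies the same two lists to an in-memory document. It is only an illustration under simplifying assumptions: the class name is hypothetical, field names are compared exactly, and the wildcard patterns and nested paths that ES also accepts are ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of include/exclude filtering on a document's fields.
final class SourceFilterSketch {

    static Map<String, Object> filter(Map<String, Object> source,
                                      String[] includes, String[] excludes) {
        Set<String> in = new HashSet<>(Arrays.asList(includes));
        Set<String> ex = new HashSet<>(Arrays.asList(excludes));
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : source.entrySet()) {
            // An empty include list means every field is a candidate.
            boolean included = in.isEmpty() || in.contains(field.getKey());
            // Excludes are applied after includes.
            if (included && !ex.contains(field.getKey())) {
                result.put(field.getKey(), field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Phoenix bike");
        doc.put("city", "Beijing");
        doc.put("district", "Changping District");
        System.out.println(filter(doc, new String[] {"title", "city"}, new String[] {}));
    }
}
```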

1.3 Returning the version number

GET website/_search
{
  "version": true,
  "query": {
    "term": {
      "city": "Beijing"
    }
  }
}

The Java example is as follows:

SearchSourceBuilder builder = new SearchSourceBuilder();
builder.version(true);

1.4 Filtering out low-scoring results

ES provides a minimum-score filter that removes documents whose relevance score is too low.

GET website/_search
{
  "min_score": 2,
  "query": {
    "match": {
      "title": "bicycle"
    }
  }
}

The Java example is as follows:

SearchSourceBuilder builder = new SearchSourceBuilder();
builder.minScore(2f);
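The effect of min_score can be pictured as a simple threshold on the hit list: hits whose score is below the minimum are dropped. The following Java sketch shows that idea on an in-memory map of scores. The class name and the scores are invented for illustration and are not real ES output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a minimum-score threshold.
final class MinScoreSketch {

    // Returns the ids of hits whose score is at least minScore, in input order.
    static List<String> filter(Map<String, Float> scoreById, float minScore) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Float> hit : scoreById.entrySet()) {
            if (hit.getValue() >= minScore) {
                kept.add(hit.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Float> hits = new LinkedHashMap<>();
        hits.put("3", 2.4f); // invented scores
        hits.put("4", 1.9f);
        hits.put("5", 2.0f);
        System.out.println(filter(hits, 2f));
    }
}
```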

1.5 Highlighting

GET website/_search
{
  "query": {
    "match": {
      "title": "bicycle"
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

The Java example is as follows:

SearchSourceBuilder builder = new SearchSourceBuilder();
MatchQueryBuilder query = QueryBuilders.matchQuery("title", "bicycle");
// Highlight matches in the title field
HighlightBuilder highlight = new HighlightBuilder();
highlight.field("title");
builder.query(query);
builder.highlighter(highlight);
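By default ES wraps each highlighted term in <em> tags. The Java sketch below imitates that on a plain string so the shape of a highlighted fragment is easy to see. It is a toy model with a hypothetical class name: it tokenizes on whitespace and lowercases, whereas the real highlighter works from the field's analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration: wraps words that match a query term in <em> tags.
final class HighlightSketch {

    static String highlight(String text, String query) {
        // Simplified analysis: lowercase and split on whitespace.
        Set<String> terms = new HashSet<>(
                Arrays.asList(query.trim().toLowerCase(Locale.ROOT).split("\\s+")));
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<em>").append(word).append("</em>");
            } else {
                out.append(word);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("shared bicycle parking lot", "bicycle"));
    }
}
```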

2. Full-text queries

Full-text queries are high-level queries that are usually used to run full-text searches on text fields. They understand how the queried field is indexed, and they apply each field's analyzer (or search analyzer) to the query string before executing.

2.1 match query

The match query analyzes the query string: it splits the string into terms, searches for those terms, and returns the matching documents.

GET website/_search
{
  "query": {
    "match": {
      "title": "bicycle"
    }
  }
}

The Java example is as follows:

MatchQueryBuilder query = QueryBuilders.matchQuery("title", "bicycle");

2.2 match_phrase query

The match_phrase query also analyzes the query string first, but a document must additionally satisfy both of the following conditions to match:

All terms produced by the analysis must appear in the field.

The terms must appear in the field in the same order as in the query.

GET website/_search
{
  "query": {
    "match_phrase": {
      "title": "shared bicycle"
    }
  }
}

The Java example is as follows:

MatchPhraseQueryBuilder query = QueryBuilders.matchPhraseQuery("title", "shared bicycle");
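The two phrase conditions can be checked by hand on analyzed terms. The Java sketch below does so for the default case where the terms must be adjacent. It uses a deliberately simplified analyzer (lowercase, split on whitespace) and a hypothetical class name, so it only approximates what ES does with a real analyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of phrase matching on analyzed terms.
final class PhraseMatchSketch {

    // Simplified stand-in for an analyzer: lowercase and split on whitespace.
    static List<String> analyze(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(normalized.split("\\s+"));
    }

    // True when all query terms occur in the field, adjacent and in query order.
    static boolean matchesPhrase(String fieldValue, String phrase) {
        List<String> field = analyze(fieldValue);
        List<String> terms = analyze(phrase);
        if (terms.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(field, terms) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrase("shared bicycle parking lot", "shared bicycle"));
        System.out.println(matchesPhrase("bicycle shared parking lot", "shared bicycle"));
    }
}
```

The second call has both terms present but in the wrong order, which is exactly the case a plain match query would accept and match_phrase rejects.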

2.3 match_phrase_prefix query

match_phrase_prefix is similar to match_phrase, except that it treats the last term as a prefix:

GET website/_search
{
  "query": {
    "match_phrase_prefix": {
      "title": "shared bic"
    }
  }
}

The Java example is as follows:

QueryBuilders.matchPhrasePrefixQuery("title", "shared bic");
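The difference from a plain phrase match is confined to the last term. A minimal Java sketch of that rule, again with a simplified whitespace analyzer and a hypothetical class name, might look like this:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration: phrase match where the last term is a prefix.
final class PhrasePrefixSketch {

    // Simplified stand-in for an analyzer: lowercase and split on whitespace.
    static List<String> analyze(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(normalized.split("\\s+"));
    }

    static boolean matchesPhrasePrefix(String fieldValue, String query) {
        List<String> field = analyze(fieldValue);
        List<String> terms = analyze(query);
        int n = terms.size();
        if (n == 0) {
            return false;
        }
        for (int start = 0; start + n <= field.size(); start++) {
            // All terms except the last must match exactly, in order.
            boolean leadingTermsMatch = true;
            for (int i = 0; i < n - 1; i++) {
                if (!field.get(start + i).equals(terms.get(i))) {
                    leadingTermsMatch = false;
                    break;
                }
            }
            // The last query term only needs to be a prefix of the field term.
            if (leadingTermsMatch && field.get(start + n - 1).startsWith(terms.get(n - 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrasePrefix("shared bicycle parking lot", "shared bic"));
    }
}
```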

2.4 multi_match query

multi_match is an extension of match that searches multiple fields.

GET website/_search
{
  "query": {
    "multi_match": {
      "query": "Beijing",
      "fields": ["title", "address"]
    }
  }
}

multi_match also supports wildcards in the field names to search. For example:

GET website/_search
{
  "query": {
    "multi_match": {
      "query": "Beijing",
      "fields": ["title", "*_address"]
    }
  }
}

Individual fields can also be given a boost with the caret (^) notation. The following command gives a keyword match in the title field three times the weight of a match in the address field:

GET website/_search
{
  "query": {
    "multi_match": {
      "query": "Beijing",
      "fields": ["title^3", "address"]
    }
  }
}

The Java example is as follows:

QueryBuilders.multiMatchQuery("Beijing", "title", "address");

2.5 common_terms query

The common_terms query is an alternative to stop words that improves search precision and recall without sacrificing performance.

2.5.1 The problem

Every term in a query has a cost. A search for "The brown fox" requires three term queries, one each for "the", "brown", and "fox", and all of them are executed against every document in the index. The query for "the" is likely to match a great many documents, so it has a much smaller impact on relevance than the other two terms.

Previously, the solution to this problem was to ignore high-frequency terms. By treating "the" as a stop word, we reduce the size of the index and the number of term queries that need to be executed.

The problem with this approach is that stop words still matter, even though their impact on relevance is small. If we remove them, we lose precision: we can no longer distinguish between "happy" and "not happy". Phrases such as "The The" or "To be or not to be" no longer exist in the index at all, so both precision and recall suffer.

2.5.2 The solution

The common_terms query offers a better solution. It divides the query terms into two groups: more important terms (low-frequency terms) and less important terms (high-frequency terms, which would previously have been stop words). It first searches for documents that match the more important terms; these terms appear in fewer documents and have a greater impact on the score. It then executes a second query for the less important, high-frequency terms, which have little impact on the score. Instead of scoring every matching document, however, the second query only scores the documents already matched by the first query. If the query consists only of high-frequency terms, a single query is executed with the terms joined by AND; in other words, all terms are required.

Whether a term counts as high-frequency or low-frequency is determined by the threshold set in cutoff_frequency.

Perhaps the most interesting property of this query is that it adapts to domain-specific stop words automatically. For example, on a video hosting site, common terms such as "clip" or "video" automatically behave as stop words without the need to maintain a list by hand.
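The grouping step can be sketched in a few lines of Java. This is an illustrative model only: the class name is hypothetical, the document frequencies are invented, and the cutoff is treated purely as a relative frequency, which simplifies what Lucene actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: split query terms into low- and high-frequency groups.
final class CommonTermsSketch {

    static Map<String, List<String>> split(List<String> terms, Map<String, Integer> docFreq,
                                           int totalDocs, double cutoffFrequency) {
        List<String> low = new ArrayList<>();
        List<String> high = new ArrayList<>();
        for (String term : terms) {
            // Share of documents that contain the term.
            double relativeFreq = docFreq.getOrDefault(term, 0) / (double) totalDocs;
            if (relativeFreq > cutoffFrequency) {
                high.add(term); // behaves like a stop word
            } else {
                low.add(term);  // drives the first, more selective query
            }
        }
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("low", low);
        groups.put("high", high);
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>(); // invented frequencies
        docFreq.put("nelly", 3);
        docFreq.put("the", 9000);
        docFreq.put("elephant", 7);
        docFreq.put("as", 5000);
        docFreq.put("a", 8000);
        docFreq.put("cartoon", 9);
        List<String> terms = Arrays.asList("nelly", "the", "elephant", "as", "a", "cartoon");
        System.out.println(split(terms, docFreq, 10000, 0.001));
    }
}
```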

2.5.3 Example

In this example, terms with a document frequency above 0.1% are treated as high-frequency terms. How the terms in each group are combined can be controlled with the low_freq_operator and high_freq_operator parameters. Setting low_freq_operator to "and" requires all low-frequency terms to match.

{
  "query": {
    "common": {
      "body": {
        "query": "nelly the elephant as a cartoon",
        "cutoff_frequency": 0.001,
        "low_freq_operator": "and"
      }
    }
  }
}

The query above is roughly equivalent to:

{
  "query": {
    "bool": {
      "must": [
        {"term": {"body": "nelly"}},
        {"term": {"body": "elephant"}},
        {"term": {"body": "cartoon"}}
      ],
      "should": [
        {"term": {"body": "the"}},
        {"term": {"body": "as"}},
        {"term": {"body": "a"}}
      ]
    }
  }
}

The Java example is as follows:

QueryBuilders.commonTermsQuery("body", "nelly the elephant as a cartoon").cutoffFrequency(0.001f).lowFreqOperator(Operator.AND);

2.6 query_string query

The query_string query is closely tied to the Lucene query syntax. It allows operators such as AND, OR, and NOT within a single query string, and it can search multiple fields. It is recommended for users who are familiar with the Lucene query syntax.

The Java example is as follows:

QueryBuilders.queryStringQuery("big data").field("title").defaultOperator(Operator.AND);

3. Term-level queries

Full-text queries analyze the query string before executing, whereas term-level queries operate on the exact terms stored in the inverted index. Term-level queries are usually used for structured data such as numbers, dates, and enumerated types.

3.1 term query

The term query finds documents that contain exactly the given term.

GET website/_search
{
  "query": {
    "term": {
      "city": "Beijing"
    }
  }
}

The Java example is as follows:

QueryBuilders.termQuery("city", "Beijing");

3.2 terms query

The terms query is an extension of the term query that finds documents containing any of several terms. For example, to find documents whose city field contains the keyword "Beijing" or "Tianjin":

GET website/_search
{
  "query": {
    "terms": {
      "city": ["Beijing", "Tianjin"]
    }
  }
}

The Java example is as follows:

QueryBuilders.termsQuery("city", "Beijing", "Tianjin");

3.3 range query

The range query matches documents whose value in a numeric, date, or string field falls within a given range. A range query can only target one field, not several. It supports the following parameters:

gt: greater than

gte: greater than or equal to

lt: less than

lte: less than or equal to

For example, to find documents with 20 < price <= 80:

{
  "query": {
    "range": {
      "price": {
        "gt": 20,
        "lte": 80
      }
    }
  }
}

To find documents whose date field falls between 2020-01-01 and 2020-01-08:

{
  "query": {
    "range": {
      "date": {
        "gte": "2020-01-01",
        "lte": "2020-01-08",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

The Java example is as follows:

QueryBuilders.rangeQuery("price").gt(20).lte(80);
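The four bounds differ only in whether the boundary value itself is included. The small Java sketch below spells this out for numeric values. The class name is hypothetical, and a null argument stands for a parameter that was left out of the query.

```java
// Hypothetical illustration of the gt / gte / lt / lte semantics.
final class RangeSketch {

    // A null bound means the corresponding parameter was not set.
    static boolean inRange(double value, Double gt, Double gte, Double lt, Double lte) {
        if (gt != null && value <= gt) return false;   // must be strictly greater
        if (gte != null && value < gte) return false;  // boundary included
        if (lt != null && value >= lt) return false;   // must be strictly less
        if (lte != null && value > lte) return false;  // boundary included
        return true;
    }

    public static void main(String[] args) {
        // Same bounds as the price example: gt 20, lte 80.
        System.out.println(inRange(20, 20.0, null, null, 80.0));
        System.out.println(inRange(80, 20.0, null, null, 80.0));
    }
}
```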

3.4 exists query

The exists query returns documents in which the field has at least one non-null value.

GET website/_search
{
  "query": {
    "exists": {
      "field": "city"
    }
  }
}

The Java example is as follows:

QueryBuilders.existsQuery("city");

3.5 prefix query

The prefix query finds documents in which a field begins with the given prefix.

GET website/_search
{
  "query": {
    "prefix": {
      "city": "Bei"
    }
  }
}

The Java example is as follows:

QueryBuilders.prefixQuery("city", "Bei");

3.6 wildcard query

The wildcard query supports the single-character wildcard (?, which matches any one character) and the multi-character wildcard (*, which matches zero or more characters).

GET website/_search
{
  "query": {
    "wildcard": {
      "city": "Bei?ing"
    }
  }
}

The Java example is as follows:

QueryBuilders.wildcardQuery("city", "Bei?ing");
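The two wildcard characters can be modelled by translating the pattern into a regular expression that must match the whole term. The Java sketch below does this with java.util.regex. It is an illustration with a hypothetical class name, not how ES implements the query, and it does not handle escaping of literal ? or * characters.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of ? and * wildcard semantics.
final class WildcardSketch {

    static boolean matches(String wildcard, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');          // exactly one character
            } else if (c == '*') {
                regex.append(".*");         // zero or more characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        // matches() requires the whole term to match, not just a part of it.
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("Bei?ing", "Beijing"));
        System.out.println(matches("Bei*", "Beijing"));
        System.out.println(matches("Bei?ing", "Beiing"));
    }
}
```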

3.7 regexp query

ES also supports regular expression queries: the regexp query finds documents whose specified field contains terms matching the given regular expression. A period (.) represents any single character, so "a.c.e" and "ab..." both match "abcde". The patterns a{3}b{3}, a{2,3}b{2,4}, and a{2,}b{2,} all match the string "aaabbb".

For example, to match postcodes that begin with W followed by a digit, the regexp query is structured as follows:

{
  "query": {
    "regexp": {
      "postcode": "W[0-9].+"
    }
  }
}

The Java example is as follows:

QueryBuilders.regexpQuery("postcode", "W[0-9].+");
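The patterns in this section can be tried out locally with java.util.regex. ES uses Lucene's regular expression syntax, which is not identical to Java's, but the constructs used here (the period, character classes, counted repetition, and +) behave the same way in both, and in both cases the pattern has to match the entire term. The class name below is hypothetical and the postcodes are made-up sample values.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: whole-term regular expression matching.
final class RegexpSketch {

    // Pattern.matches succeeds only if the entire term matches the pattern.
    static boolean matchesWholeTerm(String regex, String term) {
        return Pattern.matches(regex, term);
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeTerm("ab...", "abcde"));
        System.out.println(matchesWholeTerm("a{2,3}b{2,4}", "aaabbb"));
        // W, one digit, then at least one more character.
        System.out.println(matchesWholeTerm("W[0-9].+", "W1A"));
        System.out.println(matchesWholeTerm("W[0-9].+", "WC2H"));
    }
}
```

Note that the trailing .+ requires at least one character after the digit, so a two-character value such as "W1" would not match.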