Search is the core function of Elasticsearch. This article introduces the ES Search API and how to use it.
To make the examples concrete, first prepare some test data and save it to a file named website.json:
{"index":{"_index":"website","_id":"1"}}
{"address": "Beijing's Changping District, Nan Fung Road", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.150775,116.2841456", "title": "Phoenix bike", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"2"}}
{"address": "Beijing Changping District", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.109854,116.274349", "title": "Paige speed", "category": ["shopping", "other shopping"]}
{"index":{"_index":"website","_id":"3"}}
{"address": "Beijing Changping District", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.165716,116.270237", "title": "shared bicycle parking lot", "category": ["service life", "other life service"]}
{"index":{"_index":"website","_id":"4"}}
{"address": "Beijing Changping District X030", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.16806,116.32344", "title": "Changping public bicycle parking lot", "category": ["service life", "other life service"]}
{"index":{"_index":"website","_id":"5"}}
{"address": "Beijing's Changping District 100 Sand Road", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.149193,116.28929", "title": "shared bicycle parking lot", "category": ["service life", "other life service"]}
{"index":{"_index":"website","_id":"6"}}
{"address": "Changping District of Beijing Beiqi Jia town white village No. 200 Beijing", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.10261,116.38784", "title": "Green logistics Austrian-store sales", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"7"}}
{"address": "Beiqijia Changping District in the town of Baimiao Street 200 Beijing", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.1026,116.38751", "title": "The new electric vehicles (North-Road)", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"8"}}
{"address": "Street No. 202 50 m west white village area Beiqijia Changping Town", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.102558,116.387483", "title": "urban wind electric vehicles (North-Road)", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"9"}}
{"address": "Beijing Xiaotangshan near the town of Xiaotangshan Xiaotangshan center mafang Ma Fangcun primary school in Changping District", "province": "Beijing", "city": "Beijing", "district": "Changping area", "location": "40.1547354,116.40153086", "title": "Yadi electric vehicles (Ma Fang shop)", "category": ["shopping", "bicycle monopoly"]}
{"index":{"_index":"website","_id":"10"}}
{"address": "Beijing's Changping District, near the Northeast Road", "province": "Beijing", "city": "Beijing", "district": "Changping District", "location": "40.134651,116.433778", "title": "Harting electric car", "category": ["shopping", "bicycle monopoly"]}
Next, create the index with its settings and mapping. The index is named website, with 1 replica and 3 shards. The command is as follows:
PUT website
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "address": {
        "type": "text"
      },
      "category": {
        "type": "keyword"
      },
      "city": {
        "type": "keyword"
      },
      "district": {
        "type": "keyword"
      },
      "location": {
        "type": "geo_point"
      },
      "province": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}
Finally, run the bulk import command to load the documents into ES:
$ curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' --data-binary @website.json
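The layout of website.json follows the bulk format: each document is preceded by an action line, and every line ends with a newline, including the last one. As a minimal sketch of that layout, the following Java code assembles such a body from documents that are already serialized as JSON. The class and method names are hypothetical and are not part of any Elasticsearch client; the sequential ids are an assumption that mirrors the test data.

```java
import java.util.List;

// Hypothetical helper, not part of any Elasticsearch client library.
final class BulkBodySketch {

    // Emits two lines per document: an "index" action line, then the document source.
    // Assumes the index name and documents need no further JSON escaping.
    static String buildBulkBody(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        int id = 1; // assumption: sequential ids starting at 1, as in website.json
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(id++).append("\"}}\n");
            // The bulk format requires a trailing newline after every line, including the last.
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildBulkBody("website", List.of(
            "{\"title\":\"Phoenix bike\",\"city\":\"Beijing\"}",
            "{\"title\":\"Paige speed\",\"city\":\"Beijing\"}")));
    }
}
```

The resulting string could be written to a file and sent with the curl command above.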
1. Search mechanism
1.1 Query all data
GET website/_search
{
  "query": {
    "match_all": {}
  }
}
It can also be written as:
GET website/_search
1.2 Specify the returned fields
By default, the results include all fields of each document. You can also specify that only certain fields be returned.
GET website/_search
{
  "_source": ["title", "city"],
  "query": {
    "term": {
      "city": "Beijing"
    }
  }
}
The Java example is as follows:
import org.elasticsearch.search.builder.SearchSourceBuilder;
SearchSourceBuilder builder = new SearchSourceBuilder();
// fields to include in the response
String[] includes = {"title", "city"};
// fields to exclude from the response (none here)
String[] excludes = {};
builder.fetchSource(includes, excludes);
1.3 Return the version number
GET website/_search
{
  "version": true,
  "query": {
    "term": {
      "city": "Beijing"
    }
  }
}
The Java example is as follows:
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.version(true);
1.4 Filter out low-scoring results
ES provides a minimum-score filtering mechanism that you can use to filter out documents with relatively low scores.
GET website/_search
{
  "min_score": 2,
  "query": {
    "match": {
      "title": "bicycle"
    }
  }
}
The Java example is as follows:
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.minScore(2f);
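1.5 Highlight query
GET website/_search
{
  "query": {
    "match": {
      "title": "bicycle"
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}
The Java example is as follows:
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
SearchSourceBuilder builder = new SearchSourceBuilder();
MatchQueryBuilder query = QueryBuilders.matchQuery("title", "bicycle");
HighlightBuilder highlight = new HighlightBuilder();
highlight.field("title");
builder.query(query);
builder.highlighter(highlight);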
2. Full-text queries
High-level full-text queries are usually used to run full-text searches on text fields. They understand how the queried field is analyzed, and they apply each field's analyzer (or search analyzer) to the query string before executing.
2.1 match query
The match query analyzes the query string. It tokenizes the query string into terms, searches for each term, and returns the documents that match.
GET website/_search
{
  "query": {
    "match": {
      "title": "bicycle"
    }
  }
}
The Java example is as follows:
MatchQueryBuilder query = QueryBuilders.matchQuery("title", "bicycle");
2.2 match_phrase query
The match_phrase query first tokenizes the query text, but a document must also satisfy the following two conditions to match:
All of the terms produced by tokenization must appear in the field.
The terms must appear in the field in the same order as in the query.
GET website/_search
{
  "query": {
    "match_phrase": {
      "title": "shared bicycle"
    }
  }
}
The Java example is as follows:
MatchPhraseQueryBuilder query = QueryBuilders.matchPhraseQuery("title", "shared bicycle");
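The two conditions for match_phrase can be illustrated with a small, self-contained Java sketch. It is only an approximation under stated assumptions: the tokenizer is a stand-in that lowercases and splits on whitespace (a real analyzer does more), the terms are required to be adjacent (which corresponds to the default slop of 0), and the class name is hypothetical. It does not reflect how Elasticsearch implements the query internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; not how Elasticsearch implements match_phrase.
final class PhraseMatchSketch {

    // Stand-in for an analyzer: lowercases and splits on whitespace.
    static List<String> tokenize(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(normalized.split("\\s+"));
    }

    // True when every query term occurs in the field, consecutively and in query order.
    static boolean matchesPhrase(String fieldText, String query) {
        List<String> fieldTerms = tokenize(fieldText);
        List<String> queryTerms = tokenize(query);
        if (queryTerms.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(fieldTerms, queryTerms) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrase("shared bicycle parking lot", "shared bicycle")); // true
        System.out.println(matchesPhrase("bicycle shared parking lot", "shared bicycle")); // false
    }
}
```

The second call fails because both terms are present but in the wrong order, which is exactly the case that distinguishes match_phrase from match.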
2.3 match_phrase_prefix query
match_phrase_prefix is similar to match_phrase, except that it also supports prefix matching on the last term:
GET website/_search
{
  "query": {
    "match_phrase_prefix": {
      "title": "shared bic"
    }
  }
}
The Java example is as follows:
QueryBuilders.matchPhrasePrefixQuery("title", "shared bic");
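To make the difference from match_phrase concrete, here is a hedged Java sketch in which all terms except the last must match exactly, while the last one only needs to be a prefix of the term at that position. As before, the whitespace tokenizer and the class name are assumptions made for illustration, not the actual implementation.

```java
import java.util.Locale;

// Illustrative sketch only; not how Elasticsearch implements match_phrase_prefix.
final class PhrasePrefixSketch {

    // Stand-in for an analyzer: lowercases and splits on whitespace.
    static String[] tokenize(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        return normalized.isEmpty() ? new String[0] : normalized.split("\\s+");
    }

    // All terms but the last must match exactly; the last is treated as a prefix.
    static boolean matchesPhrasePrefix(String fieldText, String query) {
        String[] field = tokenize(fieldText);
        String[] phrase = tokenize(query);
        if (phrase.length == 0) {
            return false;
        }
        int last = phrase.length - 1;
        for (int start = 0; start + phrase.length <= field.length; start++) {
            boolean matched = true;
            for (int i = 0; i < phrase.length && matched; i++) {
                String term = field[start + i];
                matched = (i == last) ? term.startsWith(phrase[i]) : term.equals(phrase[i]);
            }
            if (matched) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrasePrefix("shared bicycle parking lot", "shared bic"));  // true
        System.out.println(matchesPhrasePrefix("shared bicycle parking lot", "shared park")); // false
    }
}
```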
2.4 multi_match query
multi_match is an extension of match that searches multiple fields.
GET website/_search
{
  "query": {
    "multi_match": {
      "query": "Beijing",
      "fields": ["title", "address"]
    }
  }
}
multi_match supports wildcards in field names. For example:
GET website/_search
{
  "query": {
    "multi_match": {
      "query": "Beijing",
      "fields": ["title", "*_address"]
    }
  }
}
It also supports boosting individual fields with the caret (^) notation. The following command gives a keyword match in the title field three times the weight of a match in the address field:
GET website/_search
{
  "query": {
    "multi_match": {
      "query": "Beijing",
      "fields": ["title^3", "address"]
    }
  }
}
The Java example is as follows:
QueryBuilders.multiMatchQuery("Beijing", "title", "address");
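The caret notation used in the fields array is simply a field name followed by a boost factor. A minimal Java sketch of how such a specification could be split into its two parts is shown below; the class and method names are hypothetical, and a field without a caret is assumed to have a boost of 1.

```java
import java.util.Map;

// Hypothetical parser for the "field^boost" notation; for illustration only.
final class FieldBoostSketch {

    // Splits "title^3" into ("title", 3.0); a spec without '^' gets a boost of 1.0.
    static Map.Entry<String, Float> parseField(String spec) {
        int caret = spec.lastIndexOf('^');
        if (caret < 0) {
            return Map.entry(spec, 1.0f);
        }
        String name = spec.substring(0, caret);
        float boost = Float.parseFloat(spec.substring(caret + 1));
        return Map.entry(name, boost);
    }

    public static void main(String[] args) {
        for (String spec : new String[] {"title^3", "address"}) {
            Map.Entry<String, Float> field = parseField(spec);
            System.out.println(field.getKey() + " -> boost " + field.getValue());
        }
    }
}
```

In the Java client the boost is normally passed as a separate argument rather than embedded in the name, for example with field("title", 3f) on the multi_match builder.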
2.5 common_terms query
The common_terms query is an alternative to stop words that improves the precision and recall of search results without sacrificing performance. Note that it was deprecated in Elasticsearch 7.3 and removed in 8.0.
2.5.1 The problem
Every term in a query has a cost. A search for "The brown fox" requires three term queries, one each for "the", "brown", and "fox", all of which are executed against every document in the index. The query for "the" is likely to match many documents, so it has a much smaller impact on relevance than the other two terms.
Previously, the solution to this problem was to ignore high-frequency terms. By treating "the" as a stop word, we reduce the size of the index and the number of term queries that need to be executed.
The problem with this approach is that, although stop words have little impact on relevance, they are still important. If we remove stop words, we lose precision (we can no longer distinguish between "happy" and "not happy") and we lose recall (text such as "The The" or "To be or not to be" no longer exists in the index).
2.5.2 The solution
The common_terms query divides the query terms into two groups: more important terms (low-frequency terms) and less important terms (high-frequency terms, which previously would have been stop words). It first searches for documents that match the more important terms, which appear in fewer documents and have a greater impact on relevance. It then executes a second query for the less important, high-frequency terms, which have little impact on relevance. Instead of scoring every matching document, however, the second query only scores documents already matched by the first query. If a query contains only high-frequency terms, a single query is executed as an AND (conjunction) query; in other words, all terms are required.
Whether a term counts as high-frequency or low-frequency is determined by the threshold set in cutoff_frequency.
Perhaps the most interesting property of this query is that it adapts automatically to domain-specific stop words. For example, on a video hosting site, common terms such as "clip" or "video" automatically behave as stop words without the need to maintain a list manually.
2.5.3 Example
In this example, terms with a document frequency above 0.1% are treated as high-frequency terms. How the terms in each group are combined can be controlled with the low_freq_operator and high_freq_operator parameters. Setting the low-frequency operator to "and" makes all low-frequency terms required.
{
  "query": {
    "common": {
      "body": {
        "query": "nelly the elephant as a cartoon",
        "cutoff_frequency": 0.001,
        "low_freq_operator": "and"
      }
    }
  }
}
The query above is roughly equivalent to:
{
  "query": {
    "bool": {
      "must": [
        {"term": {"body": "nelly"}},
        {"term": {"body": "elephant"}},
        {"term": {"body": "cartoon"}}
      ],
      "should": [
        {"term": {"body": "the"}},
        {"term": {"body": "as"}},
        {"term": {"body": "a"}}
      ]
    }
  }
}
The Java example is as follows:
QueryBuilders.commonTermsQuery("body", "nelly the elephant as a cartoon").cutoffFrequency(0.001f).lowFreqOperator(Operator.AND);
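The grouping step described in 2.5.2 can be sketched in a few lines of Java. This is only an illustration of the idea under stated assumptions: the document frequencies are supplied by the caller in a plain map (in ES they come from the index), the cutoff is treated as a relative frequency, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the low/high frequency split; not the actual ES implementation.
final class CommonTermsSplitSketch {

    // Returns two lists: index 0 holds low-frequency terms, index 1 holds high-frequency terms.
    // A term is high-frequency when docFreq / totalDocs exceeds the cutoff.
    static List<List<String>> split(List<String> terms, Map<String, Integer> docFreq,
                                    int totalDocs, double cutoff) {
        List<String> low = new ArrayList<>();
        List<String> high = new ArrayList<>();
        for (String term : terms) {
            double relativeFrequency = docFreq.getOrDefault(term, 0) / (double) totalDocs;
            if (relativeFrequency > cutoff) {
                high.add(term);
            } else {
                low.add(term);
            }
        }
        return List.of(low, high);
    }

    public static void main(String[] args) {
        // Made-up document frequencies for an index of 10,000 documents.
        Map<String, Integer> docFreq = Map.of(
            "nelly", 3, "the", 9000, "elephant", 7, "as", 5000, "a", 8000, "cartoon", 9);
        List<List<String>> groups = split(
            List.of("nelly", "the", "elephant", "as", "a", "cartoon"), docFreq, 10_000, 0.001);
        System.out.println("low frequency (must):    " + groups.get(0));
        System.out.println("high frequency (should): " + groups.get(1));
    }
}
```

With these made-up frequencies the low-frequency group corresponds to the must clause and the high-frequency group to the should clause of the equivalent bool query.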
2.6 query_string query
The query_string query is closely tied to the Lucene query syntax. It allows special operators (e.g. AND, OR, NOT) within a single query string and can search multiple fields. It is recommended for users who are familiar with the Lucene query syntax.
The Java example is as follows:
QueryBuilders.queryStringQuery("big data").field("title").defaultOperator(Operator.AND);
3. Term-level queries
Whereas full-text queries analyze the query string before executing, term-level queries operate on the exact terms stored in the inverted index. They are usually used for structured data such as numbers, dates, and enumerated types.
3.1 term query
The term query finds documents that contain the exact term specified.
GET website/_search
{
  "query": {
    "term": {
      "city": "Beijing"
    }
  }
}
The Java example is as follows:
QueryBuilders.termQuery("city", "Beijing");
3.2 terms query
The terms query is an extension of the term query that finds documents containing any of several terms. For example, to find documents whose city field contains the keyword "Beijing" or "Tianjin":
GET website/_search
{
  "query": {
    "terms": {
      "city": ["Beijing", "Tianjin"]
    }
  }
}
The Java example is as follows:
QueryBuilders.termsQuery("city", "Beijing", "Tianjin");
3.3 range query
The range query matches documents whose numeric, date, or string field falls within a given range. A range query can only target a single field, not multiple fields. It supports the following parameters:
gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
For example, to query data where 20 < price <= 80:
{
  "query": {
    "range": {
      "price": {
        "gt": 20,
        "lte": 80
      }
    }
  }
}
To query data with dates from 2020-01-01 to 2020-01-08:
{
  "query": {
    "range": {
      "Index": {
        "gte": "2020-01-01",
        "lte": "2020-01-08",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
The Java example is as follows:
QueryBuilders.rangeQuery("price").gt(20).lte(80);
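The difference between the exclusive bounds (gt, lt) and the inclusive bounds (gte, lte) is easy to get wrong at the edges. The following Java sketch spells out the comparison for numeric values; the class name is hypothetical and a null bound is assumed to mean that the parameter was not supplied.

```java
// Illustrative sketch of the four range parameters for numeric values.
final class RangeSketch {

    // A null bound means that parameter was not supplied.
    static boolean inRange(double value, Double gt, Double gte, Double lt, Double lte) {
        if (gt != null && value <= gt) return false;   // gt is exclusive
        if (gte != null && value < gte) return false;  // gte is inclusive
        if (lt != null && value >= lt) return false;   // lt is exclusive
        if (lte != null && value > lte) return false;  // lte is inclusive
        return true;
    }

    public static void main(String[] args) {
        // Same bounds as the price example: 20 < price <= 80.
        System.out.println(inRange(20, 20.0, null, null, 80.0)); // false, 20 is excluded by gt
        System.out.println(inRange(80, 20.0, null, null, 80.0)); // true, 80 is included by lte
    }
}
```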
3.4 exists query
The exists query returns documents in which the field has at least one non-null value.
GET website/_search
{
  "query": {
    "exists": {
      "field": "city"
    }
  }
}
The Java example is as follows:
QueryBuilders.existsQuery("city");
3.5 prefix query
The prefix query finds documents in which a field begins with the given prefix.
GET website/_search
{
  "query": {
    "prefix": {
      "city": "Bei"
    }
  }
}
The Java example is as follows:
QueryBuilders.prefixQuery("city", "Bei");
3.6 wildcard query
The wildcard query supports the single-character wildcard (?, which matches any one character) and the multi-character wildcard (*, which matches zero or more characters).
GET website/_search
{
  "query": {
    "wildcard": {
      "city": "Bei?ing"
    }
  }
}
The Java example is as follows:
QueryBuilders.wildcardQuery("city", "Bei?ing");
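One way to reason about the two wildcard characters is to translate the pattern into an ordinary regular expression that must match the whole term. The Java sketch below does that with java.util.regex; it is an approximation for illustration (the class name is hypothetical and backslash escaping of wildcard characters is not handled), not the mechanism ES uses internally.

```java
import java.util.regex.Pattern;

// Illustrative sketch: translate a wildcard pattern into a java.util.regex Pattern.
final class WildcardSketch {

    // '?' becomes '.', '*' becomes '.*', and every other character is matched literally.
    // Assumption: escaping of '?' and '*' with a backslash is not handled here.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        wildcard.codePoints().forEach(cp -> {
            if (cp == '?') {
                regex.append('.');
            } else if (cp == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(new String(Character.toChars(cp))));
            }
        });
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The pattern has to cover the entire term, not just a part of it.
    static boolean matches(String wildcard, String term) {
        return toPattern(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("Bei?ing", "Beijing")); // true
        System.out.println(matches("Bei*", "Beijing"));    // true
        System.out.println(matches("Bei?", "Beijing"));    // false, '?' matches exactly one character
    }
}
```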
3.7 regexp query
ES also supports regular expression queries. With the regexp query you can find documents in which the specified field matches a given regular expression. A period (.) represents any single character, so "a.c.e" and "ab..." both match "abcde". Likewise, a{3}b{3}, a{2,3}b{2,4}, and a{2,}b{2,} all match the string "aaabbb".
For example, to match postcodes that begin with W followed by a digit, the regexp query is structured as follows:
{
  "query": {
    "regexp": {
      "postcode": "W[0-9].+"
    }
  }
}
The Java example is as follows:
QueryBuilders.regexpQuery("postcode", "W[0-9].+");
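The regexp examples above can be checked with plain java.util.regex, as in the following sketch. This relies on two assumptions: for these simple patterns the Lucene regular expression syntax used by ES and the Java syntax coincide (they differ in other respects), and the expression has to match the whole term, which Pattern.matches also requires. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch: whole-term regular expression matching with java.util.regex.
final class RegexpSketch {

    // Pattern.matches succeeds only if the entire term matches the expression.
    static boolean matchesWholeTerm(String regex, String term) {
        return Pattern.matches(regex, term);
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeTerm("a.c.e", "abcde"));         // true
        System.out.println(matchesWholeTerm("ab...", "abcde"));         // true
        System.out.println(matchesWholeTerm("a{2,3}b{2,4}", "aaabbb")); // true
        System.out.println(matchesWholeTerm("W[0-9].+", "W1AB"));       // true
        System.out.println(matchesWholeTerm("W[0-9].+", "W1"));         // false, ".+" needs one more character
    }
}
```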
Article from: https://blog.csdn.net/dwjf321/article/details/103904001