1, prefix query
First, index some sample data:
PUT /my_index/address/1 { "postcode": "W1 3DG" }
PUT /my_index/address/2 { "postcode": "W2F 8HW" }
PUT /my_index/address/3 { "postcode": "W1F 7HW" }
PUT /my_index/address/4 { "postcode": "WC1N 1LZ" }
PUT /my_index/address/5 { "postcode": "SW5 0BE" }
To find all postcodes beginning with W1, you can use a simple prefix query, similar to this SQL: select * from table where xx like 'xx%';
GET /my_index/address/_search { "query": { "prefix": { "postcode": "W1" } } }
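To make the mechanics concrete, here is a minimal Python sketch of how a prefix lookup can walk a sorted list of terms. It assumes each postcode is stored as one whole, unanalyzed term. The `prefix_terms` helper and the in-memory `TERMS` list are hypothetical illustrations, not Elasticsearch internals.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for the sorted term dictionary of the
# postcode field (each value kept as one whole term).
TERMS = sorted(["W1 3DG", "W2F 8HW", "W1F 7HW", "WC1N 1LZ", "SW5 0BE"])


def prefix_terms(terms, prefix):
    """Return every term in the sorted list that starts with prefix."""
    matches = []
    # Jump to the first term that is >= prefix, then walk forward.
    start = bisect_left(terms, prefix)
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches


print(prefix_terms(TERMS, "W1"))  # ['W1 3DG', 'W1F 7HW']
```

The shorter the prefix, the more terms the loop has to visit, which is why very short prefixes tend to be expensive.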
2, phrase query (match_phrase)
When executing a match_phrase query, Elasticsearch first analyzes the query string and builds a phrase query from the analyzed terms. A document matches only if it contains every term of the phrase, with the terms in the same relative positions as in the query:
POST /_search { "from":1, "size":100, "fields":[ "eventname" ], "query":{ "match_phrase":{ "eventname":"Open Source" } } }
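The position requirement can be sketched in a few lines of Python. This is only an illustration: `analyze` is a hypothetical stand-in that lowercases and splits on whitespace, which is far simpler than a real analyzer, and `matches_phrase` is likewise a made-up helper.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches_phrase(document, phrase):
    """True if the analyzed phrase terms appear consecutively in the document."""
    doc_terms = analyze(document)
    phrase_terms = analyze(phrase)
    if not phrase_terms:
        return False
    width = len(phrase_terms)
    # Slide a window over the document terms and compare term by term.
    return any(
        doc_terms[i:i + width] == phrase_terms
        for i in range(len(doc_terms) - width + 1)
    )


print(matches_phrase("Open Source Hack Night", "Open Source"))  # True
print(matches_phrase("Source code is open", "Open Source"))     # False
```

The second document contains both words but not next to each other in the right order, so it is rejected.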
3, phrase prefix query (match_phrase_prefix)
match_phrase_prefix behaves essentially the same as match_phrase, except that the last word of the query text is treated as a prefix. The max_expansions parameter controls how many terms that final prefix may be expanded into; the default is 50. The more expansions are allowed, the more documents can be found; if the number is too small, matching documents may be missed. The query below finds documents whose eventname contains "Open Source Hack Night".
POST /_search { "from":1, "size":100, "fields":[ "eventname" ], "query":{ "match_phrase_prefix":{ "eventname":{ "query":"Open Source hac", "max_expansions":50 } } } }
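The effect of max_expansions can be illustrated with a small Python sketch. It assumes candidate terms are taken in sorted order until the limit is reached. The `expand_prefix` helper and the term list are hypothetical and do not reflect how Elasticsearch implements the expansion internally.

```python
def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return at most max_expansions terms that start with prefix."""
    expansions = []
    for term in sorted_terms:
        if len(expansions) >= max_expansions:
            break  # limit reached: remaining candidates are ignored
        if term.startswith(prefix):
            expansions.append(term)
    return expansions


# Hypothetical terms indexed for the eventname field.
terms = sorted(["hack", "hackathon", "hacker", "hacking", "night", "open", "source"])

print(expand_prefix(terms, "hac"))                    # ['hack', 'hackathon', 'hacker', 'hacking']
print(expand_prefix(terms, "hac", max_expansions=2))  # ['hack', 'hackathon']
```

With the limit set to 2, a document containing only "hacking" would be missed, which is the data-loss risk described above.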
An exact match query, by comparison, tends to perform very well. To look up W1, Elasticsearch scans the inverted index and can stop as soon as it reaches the term W1, because the two documents containing W1 have already been found; there is no need to go on and check any other term.
4, wildcard and regular expression queries
Like the prefix query, the wildcard query is a low-level, term-based query. The difference is that it lets you specify a pattern rather than just a prefix. It uses the standard shell wildcards: ? matches any single character, and * matches zero or more characters.
This query matches the documents containing W1F 7HW and W2F 8HW:
GET /my_index/address/_search { "query": { "wildcard": { "postcode": "W?F*HW" } } }
The ? matches the 1 and the 2, while the * matches the space and the 7 and 8.
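The same pattern can be tried out locally with Python's `fnmatch` module, which implements shell-style wildcards. This is only an approximation for illustration: `fnmatch` also understands `[seq]` character classes, which are not part of the wildcard syntax described here, and the list below simply stands in for the indexed terms.

```python
from fnmatch import fnmatchcase

postcodes = ["W1 3DG", "W2F 8HW", "W1F 7HW", "WC1N 1LZ", "SW5 0BE"]

# fnmatchcase applies shell-style wildcards case-sensitively to the whole string.
matches = [code for code in postcodes if fnmatchcase(code, "W?F*HW")]
print(matches)  # ['W2F 8HW', 'W1F 7HW']
```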
Now imagine that we want to match all postcodes in the W area only. A prefix match would also include the postcodes starting with WC, and a wildcard match runs into a similar problem. We want to match only postcodes that begin with W followed by a digit. The regexp query lets you write such more complex patterns:
GET /my_index/address/_search { "query": { "regexp": { "postcode": "W[0-9].+" } } }
The same query using the Java API:
QueryBuilders.regexpQuery("postcode", "W[0-9].+");
This regular expression requires the term to begin with W, followed by any digit from 0 to 9, followed by one or more other characters.
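As a quick local check of the pattern, the following Python sketch applies it to the sample postcodes. It uses `re.fullmatch` so that the pattern must cover the whole string, on the assumption that the regexp query is matched against a whole term. Python's `re` syntax is not identical to the regular expression syntax Elasticsearch uses, but this simple pattern behaves the same in both.

```python
import re

postcodes = ["W1 3DG", "W2F 8HW", "W1F 7HW", "WC1N 1LZ", "SW5 0BE"]

# fullmatch anchors the pattern to the whole string.
pattern = re.compile(r"W[0-9].+")
matches = [code for code in postcodes if pattern.fullmatch(code)]
print(matches)  # ['W1 3DG', 'W2F 8HW', 'W1F 7HW']
```

WC1N 1LZ is excluded because the character after W is not a digit, and SW5 0BE is excluded because it does not begin with W.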
The wildcard and regexp queries work on the same principle as the prefix query: they have to scan the terms in the index, so performance can be poor. Preparing the data at index time can make prefix matching more efficient, whereas wildcard and regular expression matching can only be done at query time. These queries have their uses, but they should be used with caution.