This article first appeared on the vivo Internet Technology WeChat official account: https://mp.weixin.qq.com/s/AAkVdzmkgdBisuQZldsnvg
Original English text: https://qbox.io/blog/elasticsearch-search-tuning-part-2
Author: Adam Vanderbush
Translator: Yangzhen Tao
Table of Contents
- Pre-index data
- Mapping
- Avoid scripts
- Force-merge read-only indices
The Definitive Guide to Elasticsearch Search Tuning is a series of articles that QBOX published on its blog. This article, the second in the series, covers pre-indexing data, mappings, avoiding scripts, force-merging index segments, and other tuning methods related to search performance.
This is the second of three articles in the Elasticsearch search tuning series, following the first part. The series discusses search tuning techniques, strategies, and recommendations for Elasticsearch 5.0 and later.
1. Pre-index data
Patterns in your queries should inform how data is indexed. For example, if every document has a price field and most queries run range aggregations over a fixed list of ranges, you can speed up the aggregation by pre-indexing the range into the index and using a terms aggregation instead.
For example, suppose documents look like this:
curl -XPUT 'ES_HOST:ES_PORT/index/type/1?pretty' -H 'Content-Type: application/json' -d '{
  "designation": "bowl",
  "price": 13
}'
And search requests look like this:
curl -XGET 'ES_HOST:ES_PORT/index/_search?pretty' -H 'Content-Type: application/json' -d '{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 10 },
          { "from": 10, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}'
The documents can then be enriched with a price_range field at index time. This field should be mapped as a keyword:
curl -XPUT 'ES_HOST:ES_PORT/index?pretty' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "type": {
      "properties": {
        "price_range": {
          "type": "keyword"
        }
      }
    }
  }
}'
curl -XPUT 'ES_HOST:ES_PORT/index/type/1?pretty' -H 'Content-Type: application/json' -d '{
  "designation": "bowl",
  "price": 13,
  "price_range": "10-100"
}'
Search requests can then aggregate on this new field instead of running a range aggregation on the price field:
curl -XGET 'ES_HOST:ES_PORT/index/_search?pretty' -H 'Content-Type: application/json' -d '{
  "aggs": {
    "price_ranges": {
      "terms": {
        "field": "price_range"
      }
    }
  }
}'
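The price_range value has to be computed by whichever client indexes the documents. The following is a minimal sketch of that step, assuming integer prices. The function names price_range and index_body and the open-ended labels "*-10" and "100-*" are hypothetical choices for illustration; only "10-100" appears in the example above. The buckets mirror the range aggregation, where "to" is exclusive and "from" is inclusive.

```shell
#!/bin/sh
# Hypothetical helper: map an integer price to the bucket label that is
# stored in the price_range keyword field at index time.
price_range() {
  if [ "$1" -lt 10 ]; then
    echo "*-10"          # corresponds to { "to": 10 }
  elif [ "$1" -lt 100 ]; then
    echo "10-100"        # corresponds to { "from": 10, "to": 100 }
  else
    echo "100-*"         # corresponds to { "from": 100 }
  fi
}

# Hypothetical helper: build the JSON document body, including the label.
# $1 = designation, $2 = integer price
index_body() {
  printf '{"designation": "%s", "price": %s, "price_range": "%s"}\n' \
    "$1" "$2" "$(price_range "$2")"
}

# Prints: {"designation": "bowl", "price": 13, "price_range": "10-100"}
index_body bowl 13
```

The printed body could be passed to the curl -XPUT indexing command with -d. Keeping the bucket boundaries in one helper makes it less likely that the stored labels drift from the ranges the queries expect.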
2. Mapping
The fact that some data is numeric does not mean it should always be mapped as a numeric field. Typically, fields that store identifiers, such as an ISBN or any number that identifies a record in another database, may benefit from being mapped as keyword rather than integer or long.
The keyword type is used to index structured content such as email addresses, hostnames, status codes, zip codes, or tags.
Keyword fields are typically used for filtering (for example, finding all blog posts whose status is published), sorting, and aggregations. A keyword field can only be searched by its exact value.
If you need to index full-text content such as email bodies or product descriptions, you should probably use a text field instead.
Here is an example of a keyword field mapping:
curl -XPUT 'ES_HOST:ES_PORT/my_index?pretty' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "my_type": {
      "properties": {
        "tags": {
          "type": "keyword"
        }
      }
    }
  }
}'
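To illustrate the identifier case, here is a minimal sketch that builds the request bodies for a keyword-mapped identifier and for an exact-value term query on it. The field name isbn, the sample value, and both helper names are hypothetical; my_type follows the mapping example above.

```shell
#!/bin/sh
# Hypothetical helper: mapping body that stores an identifier field as
# keyword rather than as a numeric type. $1 = field name
keyword_mapping_body() {
  printf '{"mappings": {"my_type": {"properties": {"%s": {"type": "keyword"}}}}}\n' "$1"
}

# Hypothetical helper: exact-value lookup on a keyword field.
# $1 = field name, $2 = value (sent as a JSON string, not a number)
term_query_body() {
  printf '{"query": {"term": {"%s": "%s"}}}\n' "$1" "$2"
}

keyword_mapping_body isbn
term_query_body isbn 9780132350884
```

The first body could be sent with curl -XPUT to ES_HOST:ES_PORT/my_index and the second with curl -XGET to ES_HOST:ES_PORT/my_index/_search, in the same way as the other examples in this article.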
Indices imported from 2.x do not support keyword. Instead, they attempt to downgrade keyword to string. This allows modern mappings to be merged with legacy mappings. Long-lived indices must be recreated before upgrading to 6.x, but the mapping downgrade gives you the opportunity to do the recreation on your own schedule.
3. Avoid scripts
In general, scripts should be avoided. If they are absolutely necessary, prefer the Painless and expression engines.
Painless is a simple, secure scripting language designed specifically for use with Elasticsearch. It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts. For more detail on Painless syntax and language features, see the Painless language specification.
For a deeper introduction to the Painless scripting language, see "Painless Scripting in Elasticsearch".
- Lucene expression language
Lucene expressions compile a JavaScript expression to bytecode. They are designed for high-performance custom ranking and sorting functions and are enabled for inline and stored scripting by default.
- Performance
Expressions were designed to perform competitively with custom Lucene code. This comes from a lower per-document overhead than other scripting engines: expressions do more work "up front."
This allows very fast execution, even faster than a native script you write yourself.
- Syntax
Expressions support a subset of JavaScript syntax: a single expression. See the expressions module documentation for the supported operators and functions.
The following variables are available in expression scripts:
- Document fields, e.g. doc['myfield'].value
- Variables and methods that the field supports, e.g. doc['myfield'].empty
- Parameters passed into the script, e.g. mymodifier
- The current document's score, _score (only available when used in a script_score)
Expression scripts can be used for script_score, script_fields, sort scripts, and numeric aggregation scripts; simply set the lang parameter to expression.
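As a rough illustration of the variables listed above, the sketch below builds a search body whose script_score uses an expression. The helper name expression_score_body and the numeric field popularity are hypothetical. The "inline" key is the script-source key in early 5.x releases; later releases use "source" instead, so adjust it for your version.

```shell
#!/bin/sh
# Hypothetical helper: search body that rescores hits with a Lucene
# expression. "popularity" is an assumed numeric field; $1 is the value
# passed to the script as the mymodifier parameter.
expression_score_body() {
  cat <<EOF
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": {
          "lang": "expression",
          "inline": "_score * doc['popularity'].value * mymodifier",
          "params": { "mymodifier": $1 }
        }
      }
    }
  }
}
EOF
}

expression_score_body 2
```

The body could be sent with curl -XGET to ES_HOST:ES_PORT/index/_search. Passing mymodifier as a parameter rather than hard-coding it in the expression keeps the script text identical across requests.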
4. Force-merge read-only indices
Read-only indices benefit from being merged down to a single segment. This is typically the case with time-based indices: only the index for the current time frame receives new documents, while older indices are read-only.
The force merge API forces a merge of one or more indices. The merge concerns the number of segments that the Lucene index holds within each shard. The force merge operation reduces the number of segments by merging them.
This call blocks until the merge is complete. If the HTTP connection is lost, the request continues in the background, and any new requests block until the previous force merge is complete.
curl -XPOST 'ES_HOST:ES_PORT/twitter/_forcemerge?pretty'
The force merge API accepts the following request parameters:
- max_num_segments - The number of segments to merge down to. To fully merge the index, set it to 1. By default, the API simply checks whether a merge needs to run and, if so, runs it.
- only_expunge_deletes - Whether the merge should only expunge segments that contain deletes. In Lucene, a document is not removed from a segment directly; it is only marked as deleted. When segments are merged, a new segment is created that does not contain those deleted documents. This flag restricts the merge to segments that have deletes. Defaults to false. Note that it does not override the index.merge.policy.expunge_deletes_allowed threshold.
- flush - Whether a flush should be performed after the forced merge. Defaults to true.
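The parameters are passed in the query string. Below is a minimal sketch that builds a force merge URL with max_num_segments set; the helper name forcemerge_url and the localhost:9200 defaults are assumptions for illustration, and the curl call is left commented out so the snippet can be run without a cluster.

```shell
#!/bin/sh
# Assumed defaults; the article's examples use ES_HOST and ES_PORT as placeholders.
ES_HOST="${ES_HOST:-localhost}"
ES_PORT="${ES_PORT:-9200}"

# Hypothetical helper: build a force merge URL for one index.
# $1 = index name, $2 = max_num_segments (1 merges down to a single segment)
forcemerge_url() {
  echo "${ES_HOST}:${ES_PORT}/$1/_forcemerge?max_num_segments=$2&pretty"
}

forcemerge_url twitter 1

# To run the merge against a live cluster:
#   curl -XPOST "$(forcemerge_url twitter 1)"
```

Because the call blocks until the merge finishes, it is usually better suited to a scheduled job against indices that no longer receive writes than to an interactive session.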
Note: To reprint this article, please contact WeChat ID labs2020.