The Definitive Guide to Elasticsearch Search Tuning (2/3)

This article first appeared on the vivo Internet Technology WeChat official account: https://mp.weixin.qq.com/s/AAkVdzmkgdBisuQZldsnvg
Original English text: https://qbox.io/blog/elasticsearch-search-tuning-part-2
Author: Adam Vanderbush
Translator: Yangzhen Tao

Table of Contents

  1. Pre-index data
  2. Mappings
  3. Avoid scripts
  4. Force-merge read-only indices

The Definitive Guide to Elasticsearch Search Tuning is a series of articles published on the Qbox blog. This article, the second in the series, covers search performance tuning methods such as pre-indexing data, choosing mappings, avoiding scripts, and force-merging index segments.

This is the second of three articles in the Elasticsearch search tuning series; see the first article for the preceding material. The series discusses search tuning techniques, strategies, and recommendations for Elasticsearch 5.0 and later.

1. Pre-index data

Take advantage of the patterns in your queries to optimize how data is indexed. For example, if every document has a price field and most queries run range aggregations over a fixed list of ranges, you can speed up the aggregation by pre-indexing the range into the index and using a terms aggregation instead.

For example, take the following document:

curl -XPUT 'ES_HOST:ES_PORT/index/type/1?pretty' -H 'Content-Type: application/json' -d '{
 "designation": "bowl",
 "price": 13
}'

And the following search request:

curl -XGET 'ES_HOST:ES_PORT/index/_search?pretty' -H 'Content-Type: application/json' -d '{
 "aggs": {
   "price_ranges": {
     "range": {
       "field": "price",
       "ranges": [
         { "to": 10 },
         { "from": 10, "to": 100 },
         { "from": 100 }
       ]
     }
   }
 }
}'

Documents can then be enriched with a price_range field at index time, and that field should be mapped as a keyword:

curl -XPUT 'ES_HOST:ES_PORT/index?pretty' -H 'Content-Type: application/json' -d '{
 "mappings": {
   "type": {
     "properties": {
       "price_range": {
         "type": "keyword"
       }
     }
   }
 }
}'

curl -XPUT 'ES_HOST:ES_PORT/index/type/1?pretty' -H 'Content-Type: application/json' -d '{
 "designation": "bowl",
 "price": 13,
 "price_range": "10-100"
}'
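
The price_range value has to be computed by the indexing client before the document is sent. A minimal sketch of that step is shown here, assuming the same boundaries as the range aggregation above (lower bound inclusive, upper bound exclusive). The label strings "*-10" and "100-*" and the helper names price_range_label and enrich are assumptions for illustration; only "10-100" appears in the article.

```python
import json

# Bucket boundaries mirror the range aggregation above: <10, 10-100, >=100.
# The label strings are an assumption; only "10-100" appears in the article.
PRICE_RANGES = [
    (None, 10, "*-10"),
    (10, 100, "10-100"),
    (100, None, "100-*"),
]


def price_range_label(price):
    """Return the label of the bucket containing price.

    The lower bound is inclusive and the upper bound exclusive,
    matching how a range aggregation treats "from" and "to".
    """
    for lower, upper, label in PRICE_RANGES:
        if (lower is None or price >= lower) and (upper is None or price < upper):
            return label
    raise ValueError(f"no bucket for price {price!r}")


def enrich(doc):
    """Return a copy of doc with a price_range field added (hypothetical helper)."""
    return {**doc, "price_range": price_range_label(doc["price"])}


doc = enrich({"designation": "bowl", "price": 13})
print(json.dumps(doc))
```

Keeping the boundaries in one table means the labels written at index time and the buckets expected at query time cannot drift apart silently.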

Search requests can then aggregate on this new field instead of running a range aggregation on the price field.

curl -XGET 'ES_HOST:ES_PORT/index/_search?pretty' -H 'Content-Type: application/json' -d '{
 "aggs": {
   "price_ranges": {
     "terms": {
       "field": "price_range"
     }
   }
 }
}'

2. Mappings

The fact that some data is numeric does not mean it should always be mapped as a numeric field. Fields that store identifiers, such as an ISBN or any number that identifies a record in another database, often benefit from being mapped as keyword rather than as integer or long.

The keyword type is used to index structured content such as email addresses, hostnames, status codes, zip codes, or tags.

Keyword fields are typically used for filtering (for example, finding all blog posts whose status is published), sorting, and aggregations. They are searchable only by their exact value.

If you need to index full-text content such as email bodies or product descriptions, you should probably use a text field instead.

Here is an example of a keyword field mapping:

curl -XPUT 'ES_HOST:ES_PORT/my_index?pretty' -H 'Content-Type: application/json' -d '{
 "mappings": {
   "my_type": {
     "properties": {
       "tags": {
         "type":  "keyword"
       }
     }
   }
 }
}'
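
As an illustration of the identifier case, the sketch below builds a keyword mapping and an exact-match term query for an ISBN. The type name book, the field name isbn, and the helper functions are hypothetical. The leading-zero check is a side observation about integers in general, not a claim made by the original article.

```python
import json


def keyword_mapping(doc_type, field):
    """Build an index-creation body that maps one field as keyword."""
    return {"mappings": {doc_type: {"properties": {field: {"type": "keyword"}}}}}


def term_query(field, value):
    """Build an exact-value lookup, the access pattern keyword fields are meant for."""
    return {"query": {"term": {field: value}}}


# Hypothetical identifier. Stored as a string it keeps its leading zero;
# round-tripping it through an integer would drop it.
isbn = "0306406152"
as_number = str(int(isbn))

mapping = keyword_mapping("book", "isbn")
query = term_query("isbn", isbn)

print(json.dumps(mapping))
print(json.dumps(query))
print(isbn, "vs", as_number)
```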

Indices imported from 2.x do not support keyword. Instead, they attempt to downgrade the keyword type to string. This makes it possible to merge new mappings with legacy mappings. Long-lived indices must be recreated before upgrading to 6.x, but the mapping downgrade gives you the opportunity to do that recreation on your own schedule.

3. Avoid scripts

In general, scripts should be avoided. If they are absolutely necessary, prefer the Painless and expression engines.

Painless is a simple, secure scripting language designed specifically for use with Elasticsearch. It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts. For details on Painless syntax and language features, see the Painless Language Specification.

For a deeper guide to the Painless scripting language, see "Painless Scripting in Elasticsearch".

  • Lucene expressions language

Lucene expressions compile a JavaScript expression to bytecode. They are designed for high-performance custom ranking and sorting functions and are enabled by default for inline and stored scripts.

  • Performance

Expressions were designed to perform competitively with custom Lucene code. This comes from their low per-document overhead compared with other scripting engines: expressions do more of their work "up front".

This allows for very fast execution, even faster than a native script you write yourself.

  • Syntax

Expressions support a subset of JavaScript syntax: a single expression. See the expressions module documentation for the supported operators and functions.

The variables available in expression scripts are:

  • Document fields, such as doc['myfield'].value
  • Variables and methods that the field supports, such as doc['myfield'].empty
  • Parameters passed into the script, such as mymodifier
  • The current document's score, _score (only available when used in a script_score)

Expression scripts can be used for script_score, script_fields, sort scripts, and numeric aggregation scripts; simply set the lang parameter to expression.
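
To make the variable list concrete, here is a minimal sketch of a function_score request body whose script_score uses the expression engine. The field name popularity, the parameter value, and the helper expression_score_query are assumptions. The script text is placed under the "inline" key used by Elasticsearch 5.x; later versions renamed that key to "source", so it is left configurable here.

```python
import json


def expression_score_query(expression, params, source_key="inline"):
    """Build a function_score body that scores documents with an expression script.

    source_key is "inline" on Elasticsearch 5.x; pass "source" for versions
    that renamed the key.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {
                    "script": {
                        "lang": "expression",
                        source_key: expression,
                        "params": params,
                    }
                },
            }
        }
    }


# The expression uses the three kinds of variables listed above:
# _score, a document field, and a script parameter.
body = expression_score_query(
    "_score * doc['popularity'].value * mymodifier",
    {"mymodifier": 2},
)
print(json.dumps(body, indent=2))
```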

4. Force-merge read-only indices

Read-only indices benefit from being merged down to a single segment. This is typically the case with time-based indices: only the index for the current time window receives new documents, while older indices are read-only.

The force merge API lets you force a merge of one or more indices. A merge concerns the number of segments that the Lucene index holds within each shard; the force merge operation reduces that number by merging segments together.

The call blocks until the merge is complete. If the HTTP connection is lost, the request continues in the background, and any new force merge requests block until the previous one has finished.
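
One way to pick force-merge candidates among time-based indices is to parse the date out of each index name and keep only those older than the current window. The sketch below assumes daily indices named with a logs- prefix and a yyyy.MM.dd suffix; that naming scheme, the sample names, and the helper read_only_indices are all hypothetical.

```python
from datetime import date, datetime


def read_only_indices(index_names, today, prefix="logs-", fmt="%Y.%m.%d"):
    """Return the daily indices dated strictly before today, sorted by name.

    Names that lack the prefix or whose suffix is not a valid date are skipped.
    """
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            continue  # suffix is not a date in the expected format
        if day < today:
            old.append(name)
    return sorted(old)


names = ["logs-2019.07.03", "logs-2019.07.01", "logs-2019.07.02", "twitter"]
candidates = read_only_indices(names, today=date(2019, 7, 3))
print(candidates)
```

The index for the current day is excluded because it is still receiving writes.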

curl -XPOST 'ES_HOST:ES_PORT/twitter/_forcemerge?pretty'

The force merge API accepts the following request parameters:

  • max_num_segments - the number of segments to merge down to. Set it to 1 to fully merge the index. By default, the API simply checks whether a merge needs to run and, if so, runs it.
  • only_expunge_deletes - whether the merge should only expunge segments that contain deletes. In Lucene, a document is not removed from a segment directly; it is just marked as deleted. When segments are merged, a new segment is created that does not contain those deleted documents. This flag restricts the merge to segments that contain deletes. Defaults to false. Note that it does not override the index.merge.policy.expunge_deletes_allowed threshold.
  • flush - whether a flush should be performed after the force merge. Defaults to true.
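
The parameters above are passed as query-string arguments. As a rough sketch, the helper below assembles a force merge URL for one or more indices and includes only the parameters that were explicitly set, so the server-side defaults apply to the rest. The base URL http://localhost:9200 and the helper name forcemerge_url are assumptions; no request is sent.

```python
from urllib.parse import urlencode


def forcemerge_url(base_url, indices, max_num_segments=None,
                   only_expunge_deletes=None, flush=None):
    """Build a _forcemerge URL; parameters left as None are omitted."""
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = str(max_num_segments)
    if only_expunge_deletes is not None:
        # Query-string booleans are written in lowercase.
        params["only_expunge_deletes"] = "true" if only_expunge_deletes else "false"
    if flush is not None:
        params["flush"] = "true" if flush else "false"
    # Several indices are addressed as a comma-separated list in the path.
    url = f"{base_url.rstrip('/')}/{','.join(indices)}/_forcemerge"
    return f"{url}?{urlencode(params)}" if params else url


full_merge = forcemerge_url("http://localhost:9200", ["twitter"], max_num_segments=1)
print(full_merge)
```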


Note: to reprint this article, please contact the WeChat ID labs2020.