Elasticsearch: common practical configuration and usage (2)

1. Common settings

{
  "settings": {
    "number_of_shards": "3",          // number of primary shards
    "number_of_replicas": "1",        // number of replicas per primary shard
    "refresh_interval": "5s"          // index-buffer refresh interval
  }
}

The most commonly used index settings in ES are the number of primary shards, the number of replicas, and the refresh interval. For details on how refresh works, see the earlier article on Elasticsearch inverted indexes and the document-indexing process.

parameter - description
index.number_of_replicas - number of replicas per primary shard; default 1
index.number_of_shards - number of primary shards; can only be set at index creation and cannot be changed afterwards
index.auto_expand_replicas - automatically adjust the number of replicas based on the number of available nodes; default false
index.refresh_interval - refresh frequency; default 1s; -1 disables refresh
index.max_result_window - maximum value of from + size in a search; default 10000
index.blocks.read_only - true makes the index and its metadata read-only; false allows writes and metadata changes
index.blocks.read - true disables read operations on the index
index.blocks.write - true disables write operations on the index
index.blocks.metadata - true disables reading and writing of index metadata

Setting refresh_interval to -1 disables refresh entirely. This is mainly useful when migrating data, where documents are added in large bulk batches; restore the interval once the load is finished.
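For illustration, the pause-then-restore pattern for a bulk load comes down to two settings bodies sent to the `_settings` endpoint. A minimal Python sketch of those bodies (no cluster calls are made; the 5s value matches the example above):

```python
import json

# Body for PUT <index>/_settings before the bulk load: pause refresh.
pause = {"index": {"refresh_interval": "-1"}}

# Body to restore a normal refresh cadence once the bulk load finishes.
restore = {"index": {"refresh_interval": "5s"}}

# Serialize them as they would appear on the wire.
print(json.dumps(pause))
print(json.dumps(restore))
```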

2. Translog related settings

{
  "settings": {
    "translog": {
      "flush_threshold_size": "2gb",  // flush when the translog reaches 2gb
      "sync_interval": "30s",         // fsync every 30s
      "durability": "async"           // fsync asynchronously
    }
  }
}

The translog settings mainly control how operations are persisted to disk. If you cannot afford to lose acknowledged writes, keep durability at its default of request (synchronous fsync on every request); async trades durability for throughput.

parameter - description
index.translog.flush_threshold_ops - flush after this many operations; default unlimited
index.translog.flush_threshold_size - flush when the translog reaches this size; default 512mb
index.translog.flush_threshold_period - flush at least once within this period; default 30m
index.translog.interval - how often the translog size is checked; default 5s
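The thresholds above combine as "flush when any limit is crossed". A simplified Python sketch of that trigger logic (illustrative only; Elasticsearch's internal bookkeeping differs):

```python
def should_flush(ops, size_bytes, seconds_since_flush,
                 threshold_ops=None,          # flush_threshold_ops: None = unlimited
                 threshold_size=512 * 2**20,  # flush_threshold_size: 512mb
                 threshold_period=30 * 60):   # flush_threshold_period: 30m
    """Return True when any configured threshold is crossed."""
    if threshold_ops is not None and ops >= threshold_ops:
        return True
    if size_bytes >= threshold_size:
        return True
    return seconds_since_flush >= threshold_period

# A 600mb translog exceeds the 512mb size threshold, so a flush is due.
print(should_flush(ops=10_000, size_bytes=600 * 2**20, seconds_since_flush=60))  # True
```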

3. Analysis related settings

{
  "settings": {
    "analysis": {
      "char_filter": {},   // character filters
      "tokenizer": {},     // tokenizers
      "filter": {},        // token filters
      "analyzer": {},      // analyzers
      "normalizer": {}     // normalizers
    }
  }
}

ES analysis configuration

When ES indexes a document, an important step is analysis. Analysis performs character filtering, tokenization, normalization, token filtering, and similar operations, and each of these components can be configured under analysis.

For a configuration example of analysis, see the complete example in the next section. For the configuration of each individual component, refer to the articles mentioned above.
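Conceptually, an analyzer runs its components in order: char filters rewrite the raw text, the tokenizer splits it into tokens, and token filters transform or drop tokens. A toy pure-Python sketch of that data flow (not Elasticsearch code; the regex tokenizer is a crude stand-in for the standard tokenizer):

```python
import re

def analyze(text):
    """Toy analyzer: char filter -> tokenizer -> token filters."""
    # Char filter: map "&" to "and" (like a mapping char_filter)
    text = text.replace("&", "and")
    # Tokenizer: split on non-word characters
    tokens = re.findall(r"\w+", text)
    # Token filters: lowercase, then drop English stopwords
    stopwords = {"the", "a"}
    return [t.lower() for t in tokens if t.lower() not in stopwords]

print(analyze("The Quick & Brown Fox"))  # ['quick', 'and', 'brown', 'fox']
```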

4. Complete configuration example

{
  "settings": {
    "number_of_shards": "3",
    "number_of_replicas": "1",
    "refresh_interval": "5s",
    "translog": {
      "flush_threshold_size": "256mb",
      "sync_interval": "30s",
      "durability": "async"
    },
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": ["html_strip", "&_to_and", "replace_dot"],
          "filter": ["lowercase", "filter_stop_one", "filter_stop_two"],
          "tokenizer": "my_tokenizer",
          "type": "custom"
        }
      },
      "char_filter": {
        "&_to_and": {
          "mappings": ["&=>and"],
          "type": "mapping"
        },
        "replace_dot": {
          "pattern": "\\.",
          "replacement": " ",
          "type": "pattern_replace"
        }
      },
      "filter": {
        "filter_stop_one": {
          "stopwords": "_spanish_",
          "type": "stop"
        },
        "filter_stop_two": {
          "stopwords": ["the", "a"],
          "type": "stop"
        }
      },
      "normalizer": {
        "my_normalizer": {
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"],
          "type": "custom"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "standard",
          "max_token_length": 5
        }
      }
    }
  }
}
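Before sending a body like this, it can help to check that every char_filter, filter, and tokenizer a custom analyzer references is actually defined. A small Python sketch of such a check, using a simplified allow-list for built-in component names (the empty component dicts stand in for the full definitions above):

```python
def check_analyzer_refs(analysis):
    """Return (analyzer, reference) pairs that resolve to nothing."""
    builtin = {"html_strip", "lowercase", "standard", "asciifolding", "stop"}
    missing = []
    for name, spec in analysis.get("analyzer", {}).items():
        # char_filter and filter entries must be defined or built-in.
        for section in ("char_filter", "filter"):
            for ref in spec.get(section, []):
                if ref not in analysis.get(section, {}) and ref not in builtin:
                    missing.append((name, ref))
        # The tokenizer must be defined or built-in.
        tok = spec.get("tokenizer")
        if tok and tok not in analysis.get("tokenizer", {}) and tok not in builtin:
            missing.append((name, tok))
    return missing

analysis = {
    "analyzer": {"my_analyzer": {
        "char_filter": ["html_strip", "&_to_and", "replace_dot"],
        "filter": ["lowercase", "filter_stop_one", "filter_stop_two"],
        "tokenizer": "my_tokenizer", "type": "custom"}},
    "char_filter": {"&_to_and": {}, "replace_dot": {}},
    "filter": {"filter_stop_one": {}, "filter_stop_two": {}},
    "tokenizer": {"my_tokenizer": {}},
}
print(check_analyzer_refs(analysis))  # [] -> every reference resolves
```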

5. Documentation

analyzed

analysis-tokenizers

Custom analyzer

Custom tokenizer

Character filter

Origin blog.csdn.net/trayvontang/article/details/103550755