Elasticsearch常见实用配置与说明(二)

一、常用settings

{
    
    
"settings": {
    
    
    "number_of_shards": "3",          // 主分片的个数
    "number_of_replicas": "1",        // 主分片的副本个数
    "refresh_interval": "5s"          // index-buffer刷新时间
  }
}

ES索引最常用的设置就是分片数量和副本的数量了，还有一个refresh的时间，关于refresh是怎样的操作，可以参考Elasticsearch倒排索引与文档添加原理

参数	说明
index.number_of_replicas	每个主分片的副本数，默认为1
index.number_of_shards	主分片数，只能在创建索引时设置，不能修改
index.auto_expand_replicas	基于可用节点的数量自动分配副本数量,默认为false
index.refresh_interval	refresh频率，默认为1s，-1以禁用刷新
index.max_result_window	搜索时from+size的最大值，默认为10000
index.blocks.read_only	true索引和索引元数据为只读，false允许写入和元数据更改
index.blocks.read	true禁用索引读取操作
index.blocks.write	true禁用索引写入操作
index.blocks.metadata	true禁用索引元数据读取和写入

refresh_interval设置为-1禁用刷新一般在迁移数据需要大批量的添加文档的时候有用。

二、translog相关settings

{
    
    
"settings": {
    
    
    "translog": {
    
    
      "flush_threshold_size": "2gb",//translog到达2gb刷新
      "sync_interval": "30s",//30s刷新一次
      "durability": "async"//异步刷新
    }
  }
}

ES的translog部分的设置主要影响的是日志落盘，要想保证数据的不丢失，就要使用同步的方式。

参数	说明
index.translog.flush_threshold_ops	多少次操作时执行一次flush，默认是unlimited
index.translog.flush_threshold_size	translog大小达到此值时flush，默认是512mb
index.translog.flush_threshold_period	在该时间内至少有一次flush，默认是30m
index.translog.interval	多少时间间隔内会检查一次translog大小，默认是5s

三、分析相关settings

{
    
    
"settings": {
    
    
    "analysis": {
    
    
         "char_filter": {
    
    },             // 字符过滤器
         "tokenizer":   {
    
    },             // 分词器
         "filter":  {
    
    },                 // 标记过滤器
         "analyzer": {
    
    },                // 分析器
         "normalizer":{
    
    }                // 规范化
    }
  }
}

ES分析配置

ES添加文档的时候一个重要的步骤就是analysis，analysis会执行字符过滤，分词、规范化、token过滤等操作，这些组件都可以在analysis中配置。

关于analysis的配置实例可以看后面一小节，关于analysis中配置每一个组件可以参考前面提到的文章。

四、完整配置示例

{
    
    
  "settings": {
    
    
    "number_of_shards": "3",
    "number_of_replicas": "1",
    "refresh_interval": "5s",
    "translog": {
    
    
      "flush_threshold_size": "256mb",
      "sync_interval": "30s",
      "durability": "async"
    },
    "analysis": {
    
    
      "analyzer": {
    
    
        "my_analyzer": {
    
    
          "char_filter": ["html_strip", "&_to_and", "replace_dot"],
          "filter": ["lowercase", "filter_stop_one", "filter_stop_two"],
          "tokenizer": "my_tokenizer",
          "type": "custom"
        }
      },
      "char_filter": {
    
    
        "&_to_and": {
    
    
          "mappings": ["&=>and"],
          "type": "mapping"
        },
        "replace_dot": {
    
    
          "pattern": "\\.",
          "replacement": " ",
          "type": "pattern_replace"
        }
      },
      "filter": {
    
    
        "filter_stop_one": {
    
    
          "stopwords": "_spanish_",
          "type": "stop"
        },
        "filter_stop_two": {
    
    
          "stopwords": ["the", "a"],
          "type": "stop"
        }
      },
      "normalizer": {
    
    
        "my_normalizer": {
    
    
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"],
          "type": "custom"
        }
      },
      "tokenizer": {
    
    
        "my_tokenizer": {
    
    
          "type": "standard",
          "max_token_length": 5
        }
      }
    }
  }
}