Elasticsearch Basics (Part 4): Studying the Elasticsearch 7.x Official Documentation (Set up Elasticsearch)

Set up Elasticsearch

1 Configuring Elasticsearch

1.1 Setting JVM Options

You should rarely need to change Java Virtual Machine (JVM) options.
If you do, the most likely change is setting the heap size.

The remainder of this section explains in detail how to set JVM options. You can set options either with jvm.options files or with the ES_JAVA_OPTS environment variable.

The preferred method of setting or overriding JVM options is via JVM options files. When installing from the tar or zip distributions, the root jvm.options configuration file is config/jvm.options, and custom JVM options files can be added to config/jvm.options.d/.

When installing from the Debian or RPM packages, the root jvm.options configuration file is /etc/elasticsearch/jvm.options, and custom JVM options files can be added to /etc/elasticsearch/jvm.options.d/.

When using the Docker distribution of Elasticsearch, you can bind mount custom JVM options files into /usr/share/elasticsearch/config/jvm.options.d/. You should never need to modify the root jvm.options file; use custom JVM options files instead. Custom JVM options files are processed in lexicographic order.
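Because the order is lexicographic rather than numeric, a numeric prefix in a file name can sort in an unexpected position. The Python sketch below, using hypothetical file names, shows the order in which the files would be read, assuming only names ending in .options are considered:

```python
def processing_order(file_names):
    """Return custom JVM options files in the order they would be read.

    A sketch: keeps only names ending in ".options" and sorts them
    lexicographically (plain string comparison), not numerically.
    """
    return sorted(name for name in file_names if name.endswith(".options"))


# Hypothetical contents of config/jvm.options.d/
files = ["2-gc.options", "10-heap.options", "notes.txt"]
print(processing_order(files))  # ['10-heap.options', '2-gc.options']
```

Zero-padding the prefixes (for example 02-gc.options) keeps numeric and lexicographic order aligned.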

JVM options files must have the suffix .options and contain a line-delimited list of JVM arguments following a special syntax:

  • lines consisting of whitespace only are ignored

  • lines beginning with # are treated as comments and are ignored

  • lines beginning with a - are treated as a JVM option that applies regardless of the JVM version (e.g. -Xmx2g)

  • lines beginning with a number followed by a : followed by a - are treated as a JVM option that applies only if the version of the JVM matches the number (e.g. 8:-Xmx2g)

  • lines beginning with a number followed by a - followed by a : are treated as a JVM option that applies only if the version of the JVM is greater than or equal to the number (e.g. 8-:-Xmx2g)

  • lines beginning with a number followed by a - followed by a number followed by a : are treated as a JVM option that applies only if the version of the JVM falls in the range of the two numbers (e.g. 8-9:-Xmx2g)

  • all other lines are rejected
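To make the rules concrete, here is a hypothetical helper that applies them to a single line for a given JVM major version. It is a sketch, not Elasticsearch's own parser, and it assumes that range bounds such as 8-9 are inclusive:

```python
import re
from typing import Optional

# <lower>[-[<upper>]]:<option>, or just <option>; an option must start with "-".
_LINE = re.compile(r"^(?:(\d+)(?:(-)(\d+)?)?:)?(-.*)$")


def applicable_option(line: str, jvm_major: int) -> Optional[str]:
    """Return the JVM option on `line` if it applies to `jvm_major`, else None.

    Raises ValueError for lines the syntax rules reject.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # whitespace-only lines and comments are ignored
    match = _LINE.match(stripped)
    if match is None:
        raise ValueError(f"rejected jvm.options line: {line!r}")
    lower, dash, upper, option = match.groups()
    if lower is None:
        return option  # "-Xmx2g": applies to every JVM version
    low = int(lower)
    if dash is None:
        applies = jvm_major == low  # "8:-Xmx2g": exactly version 8
    elif upper is None:
        applies = jvm_major >= low  # "8-:-Xmx2g": version 8 or later
    else:
        applies = low <= jvm_major <= int(upper)  # "8-9:-Xmx2g": 8 through 9 (assumed inclusive)
    return option if applies else None
```

For example, applicable_option("9-:-Xlog:gc*", 11) yields "-Xlog:gc*", while applicable_option("8:-Xmx2g", 11) yields None.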

An alternative mechanism for setting JVM options is the ES_JAVA_OPTS environment variable. For instance:

export ES_JAVA_OPTS="$ES_JAVA_OPTS -Djava.io.tmpdir=/path/to/temp/dir"
./bin/elasticsearch
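If Elasticsearch is launched from a Python script instead of an interactive shell, the same "append, don't overwrite" pattern can be reproduced when preparing the child process environment. The helper name below is hypothetical and the launch itself is left commented out:

```python
import os
from typing import Mapping, Optional


def env_with_es_java_opts(extra_opts: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with `extra_opts` appended to ES_JAVA_OPTS.

    Mirrors `export ES_JAVA_OPTS="$ES_JAVA_OPTS ..."`: existing options are kept.
    """
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("ES_JAVA_OPTS", "")
    env["ES_JAVA_OPTS"] = f"{current} {extra_opts}".strip()
    return env


# Usage (not executed here):
# import subprocess
# subprocess.run(["./bin/elasticsearch"],
#                env=env_with_es_java_opts("-Djava.io.tmpdir=/path/to/temp/dir"))
```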

When using the RPM or Debian packages, ES_JAVA_OPTS can be specified in the system configuration file.

The JVM has a built-in mechanism for observing the JAVA_TOOL_OPTIONS environment variable. We intentionally ignore this environment variable in our packaging scripts. The primary reason is that on some operating systems (e.g., Ubuntu), agents are installed by default via this environment variable, and we do not want them interfering with Elasticsearch.

Additionally, some other Java programs support the JAVA_OPTS environment variable. This is not a mechanism built into the JVM but a convention in the ecosystem. We do not support this environment variable; instead, set JVM options via the jvm.options files or the ES_JAVA_OPTS environment variable as described above.

1.2 Secure Settings

Introduction

In Elasticsearch, some settings are sensitive, and relying on file system permissions alone is not enough to protect their values. To address this, Elasticsearch provides a keystore along with the elasticsearch-keystore tool for managing these sensitive settings.

Using the Keystore

Only some settings are designed to be read from the keystore. The keystore does not validate unsupported settings: if you add an unsupported setting to the keystore, Elasticsearch may fail to start. To determine whether a setting can be configured in the keystore, look for the “Secure” qualifier in the setting's documentation.

Applying Changes

All modifications to the keystore take effect only after Elasticsearch is restarted. Like regular settings in the elasticsearch.yml file, these settings must be specified on each node in the cluster. Currently, all secure settings are node-specific and must have the same values on every node.

Reloadable Secure Settings

As with the settings in elasticsearch.yml, changes to the keystore contents are not automatically applied to a running Elasticsearch node; re-reading settings requires a node restart. However, certain secure settings are marked as reloadable, which means they can be re-read and applied on a running node.

Reloading Secure Settings

To apply changes to reloadable secure settings on a running Elasticsearch node, use the bin/elasticsearch-keystore add command to make the desired changes, then send a POST request to the _nodes/reload_secure_settings endpoint, specifying the keystore password (required if the keystore is password-protected).

Example:

POST _nodes/reload_secure_settings
{
  "secure_settings_password": "your_password_here"
}

This API decrypts and re-reads the entire keystore on every cluster node, but only the reloadable secure settings are applied. Changes to other settings do not take effect until the next restart. Once the API call returns, the reload has completed, meaning that all internal data structures dependent on these settings have been updated.
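For scripted use, the same request can be built with Python's standard library. This minimal sketch assumes a node reachable at http://localhost:9200 without TLS or authentication; the base URL and helper name are hypothetical, and a secured cluster would also need credentials. The request is only constructed here, not sent:

```python
import json
import urllib.request


def reload_secure_settings_request(password, base_url="http://localhost:9200"):
    """Build (but do not send) POST _nodes/reload_secure_settings."""
    body = json.dumps({"secure_settings_password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_nodes/reload_secure_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = reload_secure_settings_request("your_password_here")
# To actually send it: urllib.request.urlopen(req)
```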

Which Secure Settings Are Reloadable

Certain secure settings are marked as reloadable and can be re-read and applied on a running node. However, all secure settings, reloadable or not, must have identical values on all cluster nodes.

There are reloadable secure settings for:

  • The Azure repository plugin
  • The EC2 discovery plugin
  • The GCS repository plugin
  • The S3 repository plugin
  • Monitoring settings

1.3 Auditing Security Settings (rarely used)

You can use audit logging to record security-related events, such as authentication failures, refused connections, and data-access events.

If configured, auditing settings must be set on every node in the cluster. Static settings, such as xpack.security.audit.enabled, must be configured in elasticsearch.yml on each node. For dynamic auditing settings, use the cluster update settings API to ensure the setting is the same on all nodes.
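As a sketch of applying a dynamic audit setting cluster-wide through the cluster update settings API, the helper below builds (without sending) a PUT _cluster/settings request that sets xpack.security.audit.logfile.events.include as a persistent setting. The localhost URL, the helper name and the chosen event list are illustrative assumptions:

```python
import json
import urllib.request


def audit_events_include_request(events, base_url="http://localhost:9200"):
    """Build (but do not send) a cluster update settings request that sets
    xpack.security.audit.logfile.events.include on all nodes."""
    body = {"persistent": {"xpack.security.audit.logfile.events.include": list(events)}}
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = audit_events_include_request(["access_denied", "authentication_failed"])
```

Static settings such as xpack.security.audit.enabled cannot be changed this way; they still belong in each node's elasticsearch.yml.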

General Auditing Settings

  • xpack.security.audit.enabled
    (Static) Set to true to enable auditing on the node. The default value is false. This puts the auditing events in a dedicated file named <clustername>_audit.json on each node.

If enabled, this setting must be configured in elasticsearch.yml on all nodes in the cluster.

Audited Event Settings

The events and some other information about what gets logged can be controlled by using the following settings:

  • xpack.security.audit.logfile.events.include
    (Dynamic) Specifies which events to include in the auditing output. The default value is: access_denied, access_granted, anonymous_access_denied, authentication_failed, connection_denied, tampered_request, run_as_denied, run_as_granted.

  • xpack.security.audit.logfile.events.exclude
    (Dynamic) Excludes the specified events from the output. By default, no events are excluded.

  • xpack.security.audit.logfile.events.emit_request_body
    (Dynamic) Specifies whether to include the request body from REST requests on certain event types, such as authentication_failed. The default value is false.

No filtering is performed when auditing, so sensitive data may be audited in plain text when the request body is included in audit events.

Local Node Info Settings

  • xpack.security.audit.logfile.emit_node_name
    (Dynamic) Specifies whether to include the node name as a field in each audit event. The default value is false.

  • xpack.security.audit.logfile.emit_node_host_address
    (Dynamic) Specifies whether to include the node's IP address as a field in each audit event. The default value is false.

  • xpack.security.audit.logfile.emit_node_host_name
    (Dynamic) Specifies whether to include the node's host name as a field in each audit event. The default value is false.

  • xpack.security.audit.logfile.emit_node_id
    (Dynamic) Specifies whether to include the node ID as a field in each audit event. This is available for the new format only; that is, this information does not exist in the <clustername>_access.log file. Unlike the node name, whose value might change if the administrator changes the setting in the config file, the node ID persists across cluster restarts and the administrator cannot change it. The default value is true.

Audit Logfile Event Ignore Policies

These settings affect the ignore policies that enable fine-grained control over which audit events are printed to the log file. All of the settings with the same policy name combine to form a single policy. If an event matches all of the conditions for a specific policy, it is ignored and not printed.
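The matching rule above (all conditions of one policy must match; matching any single policy is enough to drop the event) can be sketched as follows. The condition fields (users, realms), the policy name, and the use of Python's shell-style wildcards are assumptions for illustration, not a description of Elasticsearch's internals:

```python
from fnmatch import fnmatchcase


def matches_policy(event, policy):
    """True if `event` satisfies every condition of one ignore policy.

    `policy` maps a field name to wildcard patterns; a condition holds when
    the event's value for that field matches at least one of the patterns.
    """
    for field, patterns in policy.items():
        value = event.get(field)
        if value is None or not any(fnmatchcase(value, p) for p in patterns):
            return False
    return True


def is_ignored(event, policies):
    """An event is not printed if it matches any single policy completely."""
    return any(matches_policy(event, policy) for policy in policies.values())


# Hypothetical policy named "kibana": ignore events from kibana* users in the reserved realm.
policies = {"kibana": {"users": ["kibana*"], "realms": ["reserved"]}}
```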

1.4 Circuit Breaker Settings (rarely used)

Elasticsearch contains multiple circuit breakers used to prevent operations from causing an OutOfMemoryError. Each breaker specifies a limit for how much memory it can use. Additionally, there is a parent-level breaker that specifies the total amount of memory that can be used across all breakers.

Except where noted otherwise, these settings can be dynamically updated on a live cluster with the cluster-update-settings API.

Parent Circuit Breaker

The parent-level breaker can be configured with the following settings:

  1. indices.breaker.total.use_real_memory (Static): Determines whether the parent breaker should take real memory usage into account (true) or only consider the amount that is reserved by child circuit breakers (false). Defaults to true.

  2. indices.breaker.total.limit (Dynamic): Starting limit for the overall parent breaker. Defaults to 70% of the JVM heap if indices.breaker.total.use_real_memory is false, or to 95% of the JVM heap if it is true.
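The arithmetic behind the default parent limit is simple, but it is easy to forget that it depends on use_real_memory. A small sketch for a given heap size; rounding down to whole bytes is an assumption of this example:

```python
def default_parent_limit_bytes(heap_bytes: int, use_real_memory: bool = True) -> int:
    """Default indices.breaker.total.limit, in bytes, for a given JVM heap size."""
    percent = 95 if use_real_memory else 70
    return heap_bytes * percent // 100


GIB = 1024 ** 3
print(default_parent_limit_bytes(4 * GIB))         # 4080218931
print(default_parent_limit_bytes(4 * GIB, False))  # 3006477107
```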

Field Data Circuit Breaker

The field data circuit breaker estimates the heap memory required to load a field into the field data cache. If loading the field would cause the cache to exceed a predefined memory limit, the circuit breaker stops the operation and returns an error.

  1. indices.breaker.fielddata.limit (Dynamic): Limit for the fielddata breaker. Defaults to 40% of the JVM heap.

  2. indices.breaker.fielddata.overhead (Dynamic): A constant that all field data estimations are multiplied with to determine a final estimation. Defaults to 1.03.
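To see why the overhead constant matters, the simplified check below scales the new running total by the overhead and trips if the result exceeds the limit. Applying the overhead to the running total (rather than only to the new estimate) is an assumption of this sketch, and the byte counts are made up:

```python
def fielddata_would_trip(used_bytes: int, new_bytes: int, limit_bytes: int,
                         overhead: float = 1.03) -> bool:
    """Simplified child-breaker check: overhead-adjusted total vs. limit."""
    estimate = (used_bytes + new_bytes) * overhead
    return estimate > limit_bytes


limit = (1024 ** 3) * 40 // 100  # default 40% of a 1 GiB heap: 429496729 bytes
print(fielddata_would_trip(400_000_000, 20_000_000, limit))                # True
print(fielddata_would_trip(400_000_000, 20_000_000, limit, overhead=1.0))  # False
```

The same 420 MB load passes without overhead but trips once the default 1.03 factor is applied.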

Request Circuit Breaker

The request circuit breaker allows Elasticsearch to prevent per-request data structures (for example, memory used for calculating aggregations during a request) from exceeding a certain amount of memory.

  1. indices.breaker.request.limit (Dynamic): Limit for the request breaker. Defaults to 60% of the JVM heap.

  2. indices.breaker.request.overhead (Dynamic): A constant that all request estimations are multiplied with to determine a final estimation. Defaults to 1.

In-flight Requests Circuit Breaker

The in-flight requests circuit breaker allows Elasticsearch to limit the memory usage of all currently active incoming requests on the transport or HTTP level so that it does not exceed a certain amount of memory on a node. The memory usage is based on the content length of the request itself. This circuit breaker also accounts for the fact that memory is needed not only to represent the raw request but also to hold it as a structured object, which is reflected by the default overhead.

  1. network.breaker.inflight_requests.limit (Dynamic): Limit for the in-flight requests breaker. Defaults to 100% of the JVM heap, which means it is bounded by the limit configured for the parent circuit breaker.

  2. network.breaker.inflight_requests.overhead (Dynamic): A constant that all in-flight request estimations are multiplied with to determine a final estimation. Defaults to 2.

Accounting Requests Circuit Breaker

The accounting circuit breaker allows Elasticsearch to limit the memory usage of things held in memory that are not released when a request is completed, such as Lucene segment memory.

  1. indices.breaker.accounting.limit (Dynamic): Limit for the accounting breaker. Defaults to 100% of the JVM heap, which means it is bounded by the limit configured for the parent circuit breaker.

  2. indices.breaker.accounting.overhead (Dynamic): A constant that all accounting estimations are multiplied with to determine a final estimation. Defaults to 1.

Script Compilation Circuit Breaker

Unlike the memory-based circuit breakers above, the script compilation circuit breaker limits the number of inline script compilations within a period of time.
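This excerpt does not name the setting that configures the rate, nor the algorithm Elasticsearch uses to enforce it. As a rough illustration of "at most N compilations per time window", here is a generic sliding-window limiter; the class is hypothetical and Elasticsearch's own accounting may differ:

```python
from collections import deque


class CompilationRateLimiter:
    """Hypothetical sliding-window limiter: at most `max_compilations`
    compilations within any `window_seconds`-long window."""

    def __init__(self, max_compilations: int, window_seconds: float):
        self.max_compilations = max_compilations
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def try_compile(self, now: float) -> bool:
        # Forget compilations that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_compilations:
            return False  # the breaker would reject this compilation
        self._timestamps.append(now)
        return True
```

Passing the current time in explicitly keeps the sketch deterministic and easy to test.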

1.5 Cluster-level shard allocation and routing

https://www.elastic.co/guide/en/elasticsearch/reference/7.9/modules-cluster.html

1.6 Discovery and cluster formation settings

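When configuring discovery and cluster formation in Elasticsearch, a few important settings need to be considered. They are described in detail below:

discovery.seed_hosts (Static): Provides a list of the addresses of the master-eligible nodes in the cluster. It can be either a single string containing the addresses separated by commas or a list of addresses. Each address has the format host:port or host. The host is either a host name to be resolved by DNS, an IPv4 address, or an IPv6 address. IPv6 addresses must be enclosed in square brackets. If a host name resolves via DNS to multiple addresses, Elasticsearch uses all of them. DNS lookups are subject to JVM DNS caching. If the port is not given, it is determined by checking the following settings in order:

  • transport.profiles.default.port
  • transport.port

If neither of these is set, the default port is 9300. The default value for discovery.seed_hosts is ["127.0.0.1", "[::1]"].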
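The address format and port fallback described above can be sketched in Python. This hypothetical helper does no DNS resolution and assumes the port settings hold single port numbers (not ranges):

```python
from typing import List, Mapping, Tuple, Union


def parse_seed_hosts(value: Union[str, List[str]],
                     settings: Mapping[str, str]) -> List[Tuple[str, int]]:
    """Split discovery.seed_hosts into (host, port) pairs, taking the default
    port from transport.profiles.default.port, then transport.port, then 9300."""
    default_port = int(settings.get("transport.profiles.default.port",
                                    settings.get("transport.port", 9300)))
    entries = value.split(",") if isinstance(value, str) else value
    result = []
    for entry in (e.strip() for e in entries):
        if entry.startswith("["):  # bracketed IPv6, e.g. "[::1]" or "[::1]:9301"
            host, _, rest = entry[1:].partition("]")
            port = int(rest[1:]) if rest.startswith(":") else default_port
        else:
            host, sep, port_text = entry.rpartition(":")
            if sep:
                port = int(port_text)
            else:
                host, port = entry, default_port
        result.append((host, port))
    return result
```

For example, the default value "127.0.0.1, [::1]" with no port settings yields [("127.0.0.1", 9300), ("::1", 9300)].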

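discovery.seed_providers (Static): Specifies which type of seed hosts provider to use to obtain the addresses of the seed nodes used to start the discovery process. By default, it is the settings-based seed hosts provider, which obtains the seed node addresses from the discovery.seed_hosts setting. This setting was previously known as discovery.zen.hosts_provider. Its old name is deprecated but continues to work in order to preserve backwards compatibility. Support for the old name will be removed in a future version.

discovery.type (Static): Specifies whether Elasticsearch should form a multiple-node cluster. By default, Elasticsearch discovers other nodes when forming a cluster and allows other nodes to join the cluster later. If discovery.type is set to single-node, Elasticsearch forms a single-node cluster and suppresses the timeouts set by cluster.publish.timeout and cluster.join.timeout. For more information about when you might use this setting, see Single-node discovery.

cluster.initial_master_nodes: Sets the initial set of master-eligible nodes in a brand-new cluster. By default this list is empty, meaning that this node expects to join a cluster that has already been bootstrapped.

Expert settings

Discovery and cluster formation are also affected by the following expert-level settings, although it is not recommended to change any of these from their default values. If you adjust these settings, your cluster may not form correctly or may become unstable or intolerant of certain failures.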

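discovery.cluster_formation_warning_timeout (Static): Sets how long a node will try to form a cluster before logging a warning that the cluster did not form. Defaults to 10s. If a cluster has not formed after discovery.cluster_formation_warning_timeout has elapsed, the node logs a warning message that starts with the phrase "master not discovered" and describes the current state of the discovery process.

discovery.find_peers_interval (Static): Sets how long a node will wait before attempting another discovery round. Defaults to 1s.

discovery.probe.connect_timeout (Static): Sets how long to wait when attempting to connect to each address. Defaults to 3s.

discovery.probe.handshake_timeout (Static): Sets how long to wait when attempting to identify the remote node via a handshake. Defaults to 1s.

discovery.request_peers_timeout (Static): Sets how long a node will wait after asking its peers again before considering the request to have failed. Defaults to 3s.

discovery.seed_resolver.max_concurrent_resolvers (Static): Specifies how many concurrent DNS lookups to perform when resolving the addresses of seed nodes. Defaults to 10. This setting was previously known as discovery.zen.ping.unicast.concurrent_connects. Its old name is deprecated but continues to work in order to preserve backwards compatibility. Support for the old name will be removed in a future version.

discovery.seed_resolver.timeout (Static): Specifies how long to wait for each DNS lookup performed when resolving the addresses of seed nodes. Defaults to 5s. This setting was previously known as discovery.zen.ping.unicast.hosts.resolve_timeout. Its old name is deprecated but continues to work in order to preserve backwards compatibility. Support for the old name will be removed in a future version.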

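cluster.auto_shrink_voting_configuration (Dynamic): Controls whether the voting configuration automatically sheds departed nodes, as long as it still contains at least 3 nodes. The default value is true. If set to false, the voting configuration never shrinks automatically, and you must remove departed nodes manually with the voting configuration exclusions API.

cluster.election.back_off_time (Static): Sets the amount by which the upper bound on the wait before an election increases after each election failure. Note that this is linear backoff. Defaults to 100ms. Changing this setting from the default may cause your cluster to fail to elect a master node.

cluster.election.duration (Static): Sets how long each election is allowed to take before a node considers it to have failed and schedules a retry. Defaults to 500ms. Changing this setting from the default may cause your cluster to fail to elect a master node.

cluster.election.initial_timeout (Static): Sets the upper bound on how long a node will wait initially, or after the elected master fails, before attempting its first election. Defaults to 100ms. Changing this setting from the default may cause your cluster to fail to elect a master node.

cluster.election.max_timeout (Static): Sets the maximum upper bound on how long a node will wait before attempting an election, so that a network partition that lasts for a long time does not result in excessively sparse elections. Defaults to 10s. Changing this setting from the default may cause your cluster to fail to elect a master node.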
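Taken together, the three election timing settings describe a linearly growing, capped upper bound on the wait before the next election attempt. A sketch using the default values (the function name is hypothetical; the actual wait is presumably chosen somewhere below this bound):

```python
def election_wait_upper_bound_ms(failed_elections: int,
                                 initial_timeout_ms: int = 100,
                                 back_off_time_ms: int = 100,
                                 max_timeout_ms: int = 10_000) -> int:
    """Upper bound on the wait before the next election attempt:
    linear back-off from initial_timeout, capped at max_timeout."""
    return min(initial_timeout_ms + failed_elections * back_off_time_ms, max_timeout_ms)


print([election_wait_upper_bound_ms(n) for n in (0, 1, 5, 200)])  # [100, 200, 600, 10000]
```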

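cluster.fault_detection.follower_check.interval (Static): Sets how long the elected master waits between follower checks to each other node in the cluster. Defaults to 1s. Changing this setting from the default may cause your cluster to become unstable.

cluster.fault_detection.follower_check.timeout (Static): Sets how long the elected master waits for a response to a follower check before considering it to have failed. Defaults to 10s. Changing this setting from the default may cause your cluster to become unstable.

cluster.fault_detection.follower_check.retry_count (Static): Sets how many consecutive follower check failures must occur to each node before the elected master considers that node to be faulty and removes it from the cluster. Defaults to 3. Changing this setting from the default may cause your cluster to become unstable.

cluster.fault_detection.leader_check.interval (Static): Sets how long each node waits between checks of the elected master. Defaults to 1s. Changing this setting from the default may cause your cluster to become unstable.

cluster.fault_detection.leader_check.timeout (Static): Sets how long each node waits for a response to a leader check from the elected master before considering it to have failed. Defaults to 10s. Changing this setting from the default may cause your cluster to become unstable.

cluster.fault_detection.leader_check.retry_count (Static): Sets how many consecutive leader check failures must occur before a node considers the elected master to be faulty and attempts to find or elect a new master. Defaults to 3. Changing this setting from the default may cause your cluster to become unstable.

cluster.follower_lag.timeout (Static): Sets how long the master node waits to receive acknowledgements for cluster state updates from lagging nodes. The default value is 90s. If a node does not successfully apply the cluster state update within this period of time, it is considered to have failed and is removed from the cluster. See Publishing the cluster state.

cluster.join.timeout (Static): Sets how long a node will wait after sending a request to join a cluster before it considers the request to have failed and retries, unless discovery.type is set to single-node. Defaults to 60s.

cluster.max_voting_config_exclusions (Dynamic): Sets a limit on the number of voting configuration exclusions at any one time. The default value is 10. See Adding and removing nodes.

cluster.publish.info_timeout (Static): Sets how long the master node waits for each cluster state update to be completely published to all nodes before logging a message indicating that some nodes are responding slowly. The default value is 10s.

cluster.publish.timeout (Static): Sets how long the master node waits for each cluster state update to be completely published to all nodes, unless discovery.type is set to single-node. The default value is 30s. See Publishing the cluster state.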

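cluster.no_master_block (Dynamic): Specifies which operations are rejected when there is no active master in a cluster. This setting has two valid values:

  • all: All operations on the node (both reads and writes) are rejected. This also applies to API cluster state read or write operations, like the get index settings, put mapping, and cluster state APIs.

  • write (default): Write operations are rejected. Read operations succeed, based on the last known cluster configuration. This may result in partial reads of stale data, as this node may be isolated from the rest of the cluster.

The cluster.no_master_block setting doesn't apply to nodes-based APIs (for example, the cluster stats, node info, and node stats APIs). Requests to these APIs are not blocked and can run on any available node. For the cluster to be fully operational, it must have an active master. This setting replaces the discovery.zen.no_master_block setting from earlier versions; the discovery.zen.no_master_block setting is now ignored.

monitor.fs.health.enabled (Dynamic): If true, the node runs periodic filesystem health checks. Defaults to true.

monitor.fs.health.refresh_interval (Static): Interval between successive filesystem health checks. Defaults to 2m.

monitor.fs.health.slow_path_logging_threshold (Dynamic): If a filesystem health check takes longer than this threshold, Elasticsearch logs a warning. Defaults to 5s.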

1.7 HTTP

The HTTP layer exposes Elasticsearch's REST APIs over HTTP.

The HTTP mechanism is completely asynchronous in nature, meaning that no thread blocks waiting for a response. The benefit of using asynchronous communication for HTTP is solving the C10k problem.

When possible, consider using HTTP keep-alive when connecting for better performance, and try to configure your client not to use HTTP chunking.

HTTP Settings

The following settings can be configured for HTTP. These settings also use the common network settings.

http.port
(Static) A bind port range. Defaults to 9200-9300.

http.publish_port
(Static) The port that HTTP clients should use when communicating with this node. Useful when a cluster node is behind a proxy or firewall and the http.port is not directly addressable from the outside. Defaults to the actual port assigned via http.port.

http.bind_host
(Static) The host address to bind the HTTP service to. Defaults to http.host (if set) or network.bind_host.

http.publish_host
(Static) The host address to publish for HTTP clients to connect to. Defaults to http.host (if set) or network.publish_host.

http.host
(Static) Used to set the http.bind_host and the http.publish_host.

http.max_content_length
(Static) Maximum length of an HTTP request body. Defaults to 100MB.

http.max_initial_line_length
(Static) The maximum length of an HTTP URL. Defaults to 4KB.

http.max_header_size
(Static) The maximum size of allowed headers. Defaults to 8KB.

http.compression
(Static) Support for compression when possible (with Accept-Encoding). If HTTPS is enabled, defaults to false. Otherwise, defaults to true.
Disabling compression for HTTPS mitigates potential security risks, such as a BREACH attack. To compress HTTPS traffic, you must explicitly set http.compression to true.

http.compression_level
(Static) Defines the compression level to use for HTTP responses. Valid values are in the range of 1 (minimum compression) to 9 (maximum compression). Defaults to 3.

http.cors.enabled
(Static) Enable or disable cross-origin resource sharing, which determines whether a browser on another origin can execute requests against Elasticsearch. Set to true to enable Elasticsearch to process pre-flight CORS requests. Elasticsearch will respond to those requests with the Access-Control-Allow-Origin header if the Origin sent in the request is permitted by the http.cors.allow-origin list. Set to false (the default) to make Elasticsearch ignore the Origin request header, effectively disabling CORS requests because Elasticsearch will never respond with the Access-Control-Allow-Origin response header.

If the client does not send a pre-flight request with an Origin header, or does not check the response headers from the server to validate the Access-Control-Allow-Origin response header, then cross-origin security is compromised. If CORS is not enabled on Elasticsearch, the only way for the client to know is to send a pre-flight request and notice that the required response headers are missing.

http.cors.allow-origin
(Static) Which origins to allow. If you prepend and append a forward slash (/) to the value, it is treated as a regular expression, allowing you to support both HTTP and HTTPS. For example, using /https?:\/\/localhost(:[0-9]+)?/ would return the request header appropriately in both cases. Defaults to no origins allowed.

A wildcard (*) is a valid value but is considered a security risk, as your Elasticsearch instance is then open to cross-origin requests from anywhere.
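The sketch below shows how the three kinds of values described above could be checked against a request's Origin header. Requiring the regular expression to match the whole origin, and comparing literal values by exact equality, are assumptions of this example rather than documented behaviour:

```python
import re


def origin_allowed(origin: str, allow_origin: str) -> bool:
    """Check a request Origin against an http.cors.allow-origin value.

    "/.../" is treated as a regular expression over the whole origin,
    "*" allows every origin, and any other value must match exactly.
    """
    if allow_origin == "*":
        return True
    if len(allow_origin) > 1 and allow_origin.startswith("/") and allow_origin.endswith("/"):
        return re.fullmatch(allow_origin[1:-1], origin) is not None
    return origin == allow_origin


setting = r"/https?:\/\/localhost(:[0-9]+)?/"
```

With this setting, http://localhost:9200 and https://localhost are both accepted, while any other host is refused.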

http.cors.max-age
(Static) Browsers send a "preflight" OPTIONS request to determine CORS settings. max-age defines how long the result should be cached for, in seconds. Defaults to 1728000 (20 days).

http.cors.allow-methods
(Static) Which methods to allow. Defaults to OPTIONS, HEAD, GET, POST, PUT, DELETE.

http.cors.allow-headers
(Static) Which headers to allow. Defaults to X-Requested-With, Content-Type, Content-Length.

http.cors.allow-credentials
(Static) Whether the Access-Control-Allow-Credentials header should be returned. Defaults to false.
This header is only returned when the setting is set to true.

http.detailed_errors.enabled
(Static) If true, enables the output of detailed error messages and stack traces in the response output. Defaults to true.
If false, use the error_trace parameter to enable stack traces and return detailed error messages. Otherwise, only a simple message will be returned.

http.pipelining.max_events
(Static) The maximum number of events to be queued up in memory before an HTTP connection is closed. Defaults to 10000.

http.max_warning_header_count
(Static) The maximum number of warning headers in client HTTP responses. Defaults to unbounded.

http.max_warning_header_size
(Static) The maximum total size of warning headers in client HTTP responses. Defaults to unbounded.

REST request tracer

The HTTP layer has a dedicated tracer logger which, when activated, logs incoming requests. The log can be dynamically activated by setting the level of the org.elasticsearch.http.HttpTracer logger to TRACE:

PUT _cluster/settings
{
   "transient" : {
      "logger.org.elasticsearch.http.HttpTracer" : "TRACE"
   }
}

You can also control which URIs will be traced, using a set of include and exclude wildcard patterns. By default, every request will be traced.

PUT _cluster/settings
{
   "transient" : {
      "http.tracer.include" : "*",
      "http.tracer.exclude" : ""
   }
}
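The include/exclude logic can be sketched as: a URI is traced if it matches an include pattern (or no include patterns are set) and matches no exclude pattern. Python's fnmatch is used here only as an approximation of Elasticsearch's wildcard matching, and the helper name is hypothetical:

```python
from fnmatch import fnmatchcase
from typing import Sequence


def is_traced(uri: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """True if `uri` would be logged by the tracer under these patterns."""
    included = not include or any(fnmatchcase(uri, p) for p in include)
    excluded = any(fnmatchcase(uri, p) for p in exclude)
    return included and not excluded
```

For instance, include ["*"] with exclude ["/_cluster*"] traces /_search but not /_cluster/health.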

1.8 Index lifecycle management settings in Elasticsearch

Cluster level settings

xpack.ilm.enabled

(Static, Boolean) [7.8.0] Deprecated in 7.8.0. Basic License features are always enabled.
This deprecated setting has no effect and will be removed in Elasticsearch 8.0.

indices.lifecycle.history_index_enabled

(Static, Boolean) Whether ILM's history index is enabled. If enabled, ILM will record the history of actions taken as part of ILM policies to the ilm-history-* indices. Defaults to true.

indices.lifecycle.poll_interval

(Dynamic, time unit value) How often index lifecycle management checks for indices that meet policy criteria. Defaults to 10m.

Index level settings

These index-level ILM settings are typically configured through index templates. For more information, see Create a lifecycle policy.

index.lifecycle.indexing_complete

(Dynamic, Boolean) Indicates whether or not the index has been rolled over. Automatically set to true when ILM completes the rollover action. You can explicitly set it to skip rollover. Defaults to false.

index.lifecycle.name

(Dynamic, string) The name of the policy to use to manage the index.

index.lifecycle.origination_date

(Dynamic, long) If specified, this is the timestamp used to calculate the index age for its phase transitions. Use this setting if you create a new index that contains old data and want to use the original creation date to calculate the index age. Specified as a Unix epoch value.

index.lifecycle.parse_origination_date

(Dynamic, Boolean) Set to true to parse the origination date from the index name. This origination date is used to calculate the index age for its phase transitions. The index name must match the pattern ^.*-{date_format}-\d+, where the date_format is yyyy.MM.dd and the trailing digits are optional. An index that was rolled over would normally match the full format, for example logs-2016.10.31-000002. If the index name doesn't match the pattern, index creation fails.
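The name pattern above can be illustrated with a small parser. This sketch assumes the date is interpreted in UTC and returned as epoch milliseconds; the helper name is hypothetical, and raising ValueError stands in for the failed index creation:

```python
import re
from datetime import datetime, timezone

# "<anything>-yyyy.MM.dd" optionally followed by "-<digits>"
_INDEX_DATE = re.compile(r"^.*-(\d{4}\.\d{2}\.\d{2})(?:-\d+)?$")


def parse_origination_date(index_name: str) -> int:
    """Return the date encoded in `index_name` as epoch milliseconds (UTC)."""
    match = _INDEX_DATE.match(index_name)
    if match is None:
        raise ValueError(f"index name does not match the pattern: {index_name}")
    day = datetime.strptime(match.group(1), "%Y.%m.%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp() * 1000)
```

Both logs-2016.10.31-000002 and logs-2016.10.31 resolve to the same origination date.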

index.lifecycle.rollover_alias

(Dynamic, string) The index alias to update when the index rolls over. Specify when using a policy that contains a rollover action. When the index rolls over, the alias is updated to reflect that the index is no longer the write index. For more information about rolling indices, see Rollover.

(动态,字符串)索引切换时要更新的索引别名。在使用包含切换操作的策略时指定。当索引切换时,别名将更新,以反映索引不再是写入索引。有关滚动索引的更多信息,请参阅Rollover。

1.9 Index management settings(索引管理设置)

action.auto_create_index

(Dynamic) Automatically create an index if it doesn’t already exist and apply any configured index templates. Defaults to true.

(动态)如果索引不存在,则自动创建索引并应用任何配置的索引模板。默认为true。

action.destructive_requires_name

(Dynamic) When set to true, you must specify the index name to delete an index. It is not possible to delete all indices with _all or use wildcards.

(动态)当设置为true时,必须指定索引名称才能删除索引。不可以使用_all或通配符删除所有索引。
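As a rough illustration of what action.destructive_requires_name rules out, the hypothetical helper below accepts a delete target only when every comma-separated entry is a concrete index name. It is a sketch of the rule, not Elasticsearch code, and it only treats * as a wildcard.

```python
def names_concrete_indices(target: str) -> bool:
    """Return True only if every comma-separated entry in a DELETE target is a
    concrete index name, i.e. none is empty, _all, or a * wildcard pattern."""
    for entry in (part.strip() for part in target.split(",")):
        if not entry or entry == "_all" or "*" in entry:
            return False
    return True


print(names_concrete_indices("logs-2021.01.01"))            # True
print(names_concrete_indices("logs-2021.01.01,metrics-1"))  # True
print(names_concrete_indices("logs-*"))                     # False
print(names_concrete_indices("_all"))                       # False
```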

cluster.indices.close.enable

(Dynamic) Enables closing of open indices in Elasticsearch. If false, you cannot close open indices. Defaults to true.
Closed indices still consume a significant amount of disk space.

(动态)允许关闭Elasticsearch中已打开的索引。如果设置为false,则无法关闭已打开的索引。默认为true。
已关闭的索引仍会占用大量磁盘空间。

reindex.remote.whitelist

(Static) Specifies the hosts that can be reindexed from remotely. Expects a YAML array of host:port strings. Consists of a comma-delimited list of host:port entries. Defaults to ["*.io:*", "*.com:*"].

(静态)指定可以从远程重新索引的主机。预期是host:port字符串的YAML数组。包括逗号分隔的host:port条目的列表。默认为["*.io:*", "*.com:*"]。
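The whitelist check can be pictured as wildcard matching of a remote host:port string against each pattern. The sketch below uses Python's fnmatchcase as a stand-in for Elasticsearch's own wildcard matching (an assumption: fnmatchcase also gives ? and [...] special meaning), and the function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Default from the description above; override with your own host:port patterns.
DEFAULT_WHITELIST = ("*.io:*", "*.com:*")


def remote_host_allowed(host_port: str, whitelist: Sequence[str] = DEFAULT_WHITELIST) -> bool:
    """Return True if host_port matches at least one whitelist pattern,
    where * stands for any run of characters."""
    return any(fnmatchcase(host_port, pattern) for pattern in whitelist)


print(remote_host_allowed("otherhost.io:9200"))                # True
print(remote_host_allowed("10.0.0.5:9200"))                    # False with the defaults
print(remote_host_allowed("localhost:9200", ["localhost:*"]))  # True
```

Because the defaults only cover *.io and *.com names, reindexing from an IP address or localhost needs an explicit entry in elasticsearch.yml.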

stack.templates.enabled

(Static) If true, enables built-in index and component templates. Elastic Agent uses these templates to create data streams. If false, Elasticsearch disables these index and component templates. Defaults to true.

(静态)如果为true,则启用内置的索引和组件模板。Elastic Agent使用这些模板来创建数据流。如果为false,则Elasticsearch将禁用这些索引和组件模板。默认为true。

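This setting affects the following built-in index templates:

此设置影响以下内置索引模板:

  • logs-*-*
  • metrics-*-*

This setting also affects the following built-in component templates:

此设置还影响以下内置组件模板: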

  • logs-mappings
  • logs-settings
  • metrics-mappings
  • metrics-settings
  • synthetics-mapping
  • synthetics-settings

1.10 Index recovery settings(索引恢复设置)

Peer recovery syncs data from a primary shard to a new or existing shard copy.

Peer recovery automatically occurs when Elasticsearch:

  • Recreates a shard lost during node failure
  • Relocates a shard to another node due to a cluster rebalance or changes to the shard allocation settings

You can view a list of in-progress and completed recoveries using the cat recovery API.

对等恢复是将主分片的数据同步到新的或现有的分片副本的过程。
在 Elasticsearch 中,对等恢复会在以下情况下自动发生:

  1. 重新创建在节点故障期间丢失的分片。
  2. 由于集群重新平衡或更改分片分配设置,将分片重定位到另一个节点。

您可以使用 cat recovery API 查看正在进行和已完成的恢复操作。

Recovery settings(恢复设置)

indices.recovery.max_bytes_per_sec

(Dynamic) Limits total inbound and outbound recovery traffic for each node. Applies to both peer recoveries and snapshot recoveries (that is, restores from a snapshot). Defaults to 40mb.
This limit applies to each node separately. If multiple nodes in a cluster perform recoveries at the same time, the cluster’s total recovery traffic may exceed this limit.
If this limit is too high, ongoing recoveries may consume an excess of bandwidth and other resources, which can destabilize the cluster.

This is a dynamic setting, which means you can set it in each node’s elasticsearch.yml config file and you can update it dynamically using the cluster update settings API. If you set it dynamically then the same limit applies on every node in the cluster. If you do not set it dynamically then you can set a different limit on each node, which is useful if some of your nodes have better bandwidth than others. For example, if you are using Index Lifecycle Management then you may be able to give your hot nodes a higher recovery bandwidth limit than your warm nodes.

(动态设置)限制每个节点的总入站和出站恢复流量。适用于对等恢复和快照恢复(即从快照还原)。默认值为 40mb。
此限制单独适用于每个节点。如果集群中的多个节点同时执行恢复操作,则集群的总恢复流量可能会超过此限制。
如果此限制设置得过高,正在进行的恢复操作可能会消耗过多的带宽和其他资源,可能会使集群不稳定。

这是一个动态设置,这意味着您可以在每个节点的 elasticsearch.yml 配置文件中设置它,还可以使用集群更新设置 API 动态更新它。如果动态设置了它,那么相同的限制将适用于集群中的每个节点。如果您不以动态方式设置它,则可以为每个节点设置不同的限制,这在某些节点带宽优于其他节点时非常有用。例如,如果您正在使用索引生命周期管理,则可以为热节点(hot nodes)设置比温节点(warm nodes)更高的恢复带宽限制。
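To get a feel for what a given limit means, the sketch below computes a lower bound on how long a node needs to move a given amount of recovery traffic. It is back-of-the-envelope arithmetic under stated assumptions (the helper name is hypothetical, and real recoveries also spend time on other work), using Elasticsearch's binary byte units, where 1mb is 1024 * 1024 bytes.

```python
MB = 1024 * 1024  # Elasticsearch byte units such as "40mb" are binary
GB = 1024 * MB


def min_recovery_seconds(bytes_to_copy: int, max_bytes_per_sec: int) -> float:
    """Lower bound on the time a node needs to move bytes_to_copy of recovery
    traffic when throttled to max_bytes_per_sec. All recoveries on the node
    share this budget, so pass the total bytes the node must move."""
    if max_bytes_per_sec <= 0:
        raise ValueError("max_bytes_per_sec must be positive")
    return bytes_to_copy / max_bytes_per_sec


print(min_recovery_seconds(50 * GB, 40 * MB))   # 1280.0 (default 40mb limit)
print(min_recovery_seconds(50 * GB, 100 * MB))  # 512.0
```

A 50 GB shard therefore needs over 21 minutes at the default limit, which is one reason to raise the limit on nodes with spare bandwidth.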

Expert peer recovery settings(专业对等恢复设置)

indices.recovery.max_concurrent_file_chunks

(Dynamic, Expert) Number of file chunk requests sent in parallel for each recovery. Defaults to 2.
You can increase the value of this setting when the recovery of a single shard is not reaching the traffic limit set by indices.recovery.max_bytes_per_sec.

(动态,专业设置)每个恢复操作并行发送的文件块请求数量。默认值为 2。

当单个分片的恢复未达到 indices.recovery.max_bytes_per_sec 设置的流量限制时,可以增加此设置的值。

indices.recovery.max_concurrent_operations

(Dynamic, Expert) Number of operations sent in parallel for each recovery. Defaults to 1.

(动态,专业设置)每个恢复操作并行发送的操作数量。默认值为 1。

Concurrently replaying operations during recovery can be very resource-intensive and may interfere with indexing, search, and other activities in your cluster. Do not increase this setting without carefully verifying that your cluster has the resources available to handle the extra load that will result.

在恢复过程中并行重放操作可能非常消耗资源,并可能干扰索引、搜索和集群中的其他活动。在仔细确认集群有足够的资源来处理由此产生的额外负载之前,请不要增加此设置。

1.11 Indexing buffer settings(索引缓冲区设置)

The indexing buffer is used to store newly indexed documents. When it fills up, the documents in the buffer are written to a segment on disk. It is divided between all shards on the node.

索引缓冲区用于存储新索引的文档。当它填满时,缓冲区中的文档将被写入磁盘上的一个段。它在节点上的所有分片之间进行划分。

The following settings are static and must be configured on every data node in the cluster:

以下设置是静态设置,必须在集群中的每个数据节点上进行配置:

indices.memory.index_buffer_size

(Static) Accepts either a percentage or a byte size value. It defaults to 10%, meaning that 10% of the total heap allocated to a node will be used as the indexing buffer size shared across all shards.

(静态)接受百分比或字节大小值。默认值为10%,意味着将分配给节点的总堆内存的10%将用作所有分片共享的索引缓冲区大小。

indices.memory.min_index_buffer_size

(Static) If the index_buffer_size is specified as a percentage, then this setting can be used to specify an absolute minimum. Defaults to 48mb.

(静态)如果index_buffer_size指定为百分比,则可以使用此设置指定绝对最小值。默认值为48MB。

indices.memory.max_index_buffer_size

(Static) If the index_buffer_size is specified as a percentage, then this setting can be used to specify an absolute maximum. Defaults to unbounded.

(静态)如果index_buffer_size指定为百分比,则可以使用此设置指定绝对最大值。默认情况下为无限制。
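The interaction of the three buffer settings can be sketched as simple arithmetic: take the percentage of the heap, then clamp it between the minimum and the optional maximum. The function name and the clamping order (minimum first, then maximum) are assumptions of this illustration.

```python
from typing import Optional

MB = 1024 * 1024


def index_buffer_bytes(heap_bytes: int,
                       percent: float = 10.0,
                       min_bytes: int = 48 * MB,
                       max_bytes: Optional[int] = None) -> int:
    """Node-wide indexing buffer when index_buffer_size is a percentage:
    take percent of the heap, then clamp to [min_bytes, max_bytes].
    max_bytes=None models the documented 'unbounded' default."""
    size = int(heap_bytes * percent / 100)
    size = max(size, min_bytes)
    if max_bytes is not None:
        size = min(size, max_bytes)
    return size


print(index_buffer_bytes(256 * MB))                       # 50331648 (raised to the 48mb floor)
print(index_buffer_bytes(4096 * MB))                      # 429496729 (about 409.6 MB)
print(index_buffer_bytes(4096 * MB, max_bytes=256 * MB))  # 268435456 (capped at 256 MB)
```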

1.12 License settings(许可证设置)

You can configure this licensing setting in the elasticsearch.yml file. For more information, see License management.

您可以在elasticsearch.yml文件中配置此许可证设置。有关更多信息,请参阅许可证管理。

xpack.license.self_generated.type

(Static) Set to basic (default) to enable basic X-Pack features.
If set to trial, the self-generated license gives access to all the features of X-Pack for 30 days only. You can later downgrade the cluster to a basic license if needed.

(静态)设置为basic(默认值)以启用基本的X-Pack功能。
如果设置为trial,自动生成的许可证将提供对X-Pack全部功能的访问权限,但仅限30天。以后如果需要,您可以将集群降级为基本许可证。

1.13 Local gateway settings(本地网关设置)

The local gateway stores the cluster state and shard data across full cluster restarts.

本地网关用于在完整的集群重启期间存储集群状态和分片数据。

The following static settings, which must be set on every master node, control how long a freshly elected master should wait before it tries to recover the cluster state and the cluster’s data.
These settings only take effect on a full cluster restart.

以下静态设置必须在每个主节点上设置,用于控制新选举的主节点在尝试恢复集群状态和集群数据之前应等待多长时间。
这些设置仅在进行完整的集群重启时生效。

gateway.expected_nodes

(Static) [7.7.0] Deprecated in 7.7.0. This setting will be removed in 8.0. Use gateway.expected_data_nodes instead. Number of data or master nodes expected in the cluster. Recovery of local shards begins when the expected number of nodes join the cluster. Defaults to 0.

(静态)[7.7.0] 在7.7.0中已弃用,将在8.0中删除。请改用gateway.expected_data_nodes。期望集群中的数据节点或主节点数量。当预期数量的节点加入集群时,本地分片的恢复过程将开始。默认为0。

gateway.expected_master_nodes

(Static) [7.7.0] Deprecated in 7.7.0. This setting will be removed in 8.0. Use gateway.expected_data_nodes instead. Number of master nodes expected in the cluster. Recovery of local shards begins when the expected number of master nodes join the cluster. Defaults to 0.

(静态)[7.7.0] 在7.7.0中已弃用,将在8.0中删除。请改用gateway.expected_data_nodes。期望集群中的主节点数量。当预期数量的主节点加入集群时,本地分片的恢复过程将开始。默认为0。

gateway.expected_data_nodes

(Static) Number of data nodes expected in the cluster. Recovery of local shards begins when the expected number of data nodes join the cluster. Defaults to 0.

(静态)期望集群中的数据节点数量。当预期数量的数据节点加入集群时,本地分片的恢复过程将开始。默认为0。

gateway.recover_after_time

(Static) If the expected number of nodes is not achieved, the recovery process waits for the configured amount of time before trying to recover. Defaults to 5m if one of the expected_nodes settings is configured.
Once the recover_after_time duration has timed out, recovery will start as long as the following conditions are met:

(静态)如果未达到预期的节点数量,则恢复过程将等待配置的时间量,然后尝试恢复。如果已配置了其中一个expected_nodes设置,则默认为5分钟。
一旦recover_after_time持续时间超时,只要满足以下条件,恢复将开始:

gateway.recover_after_nodes

(Static) [7.7.0] Deprecated in 7.7.0. This setting will be removed in 8.0. Use gateway.recover_after_data_nodes instead. Recover as long as this many data or master nodes have joined the cluster.

(静态)[7.7.0] 在7.7.0中已弃用,将在8.0中删除。请改用gateway.recover_after_data_nodes。只要有这么多数据节点或主节点加入了集群,就进行恢复。

gateway.recover_after_master_nodes

(Static) [7.7.0] Deprecated in 7.7.0. This setting will be removed in 8.0. Use gateway.recover_after_data_nodes instead. Recover as long as this many master nodes have joined the cluster.

(静态)[7.7.0] 在7.7.0中已弃用,将在8.0中删除。请改用gateway.recover_after_data_nodes。只要有这么多主节点加入了集群,就进行恢复。

gateway.recover_after_data_nodes

(Static) Recover as long as this many data nodes have joined the cluster.

(静态)只要有这么多数据节点加入了集群,就进行恢复。
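The way these settings combine can be summarized in a simplified model covering only the data-node variants. The class and function names are hypothetical, and treating "nothing configured" as a zero wait is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatewaySettings:
    expected_data_nodes: int = 0                  # gateway.expected_data_nodes (0 = not set)
    recover_after_data_nodes: int = 0             # gateway.recover_after_data_nodes
    recover_after_time_s: Optional[float] = None  # gateway.recover_after_time (None = not set)


def may_start_recovery(cfg: GatewaySettings, joined_data_nodes: int, waited_s: float) -> bool:
    """Simplified model of when a newly elected master starts recovering
    after a full cluster restart."""
    # recover_after_data_nodes acts as a floor that must always be met.
    if joined_data_nodes < cfg.recover_after_data_nodes:
        return False
    # Reaching the expected number of data nodes starts recovery immediately.
    if cfg.expected_data_nodes > 0 and joined_data_nodes >= cfg.expected_data_nodes:
        return True
    timeout = cfg.recover_after_time_s
    if timeout is None:
        # Documented default: 5m when an expected_* setting is configured.
        timeout = 300.0 if cfg.expected_data_nodes > 0 else 0.0
    return waited_s >= timeout


cfg = GatewaySettings(expected_data_nodes=3, recover_after_data_nodes=2)
print(may_start_recovery(cfg, 3, 0))     # True: all expected nodes joined
print(may_start_recovery(cfg, 2, 10))    # False: still inside the 5m window
print(may_start_recovery(cfg, 2, 300))   # True: timed out and the floor is met
print(may_start_recovery(cfg, 1, 1000))  # False: floor of 2 data nodes not met
```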

Dangling indices(悬挂索引)

When a node joins the cluster, if it finds any shards stored in its local data directory that do not already exist in the cluster, it will consider those shards to be “dangling”. Importing dangling indices into the cluster using gateway.auto_import_dangling_indices is not safe. Instead, use the Dangling indices API. Neither mechanism provides any guarantees as to whether the imported data truly represents the latest state of the data when the index was still part of the cluster.

当一个节点加入集群时,如果它在其本地数据目录中发现任何尚不存在于集群中的分片,它将视这些分片为“悬挂”的。使用gateway.auto_import_dangling_indices自动导入悬挂索引到集群状态是不安全的。相反,请使用悬挂索引API。无论哪种机制都不能保证导入的数据是否真正代表了索引仍然是集群的一部分时的最新状态。

gateway.auto_import_dangling_indices

[7.9.0] Deprecated in 7.9.0. This setting will be removed in 8.0. You should use the dedicated dangling indices API instead. Whether to automatically import dangling indices into the cluster state, provided no indices already exist with the same name. Defaults to false.

[7.9.0] 在7.9.0中已弃用,将在8.0中删除。您应该改用专用的悬挂索引API。是否自动将悬挂索引导入集群状态,前提是没有同名索引已经存在。默认为false。

The auto-import functionality was intended as a best effort to help users who lose all master nodes. For example, if a new master node were to be started which was unaware of the other indices in the cluster, adding the old nodes would cause the old indices to be imported, instead of being deleted. However, there are several issues with automatic importing, and its use is strongly discouraged in favour of the dedicated API.

自动导入功能旨在尽力帮助那些失去所有主节点的用户。例如,如果启动了一个不知道集群中其他索引的新主节点,那么添加旧节点将导致导入旧索引,而不是删除它们。然而,自动导入存在几个问题,强烈建议使用专用的API而不是自动导入。

Losing all master nodes is a situation that should be avoided at all costs, as it puts your cluster’s metadata and data at risk.

失去所有主节点是应该尽一切努力避免的情况,因为它会使集群的元数据和数据处于风险之中。

1.14 Logging(日志记录)

For Linux .tar.gz installations, Elasticsearch writes logs to $ES_HOME/logs.
Files in $ES_HOME risk deletion during an upgrade. In production, we strongly recommend you set path.logs to a location outside of $ES_HOME. See path.data and path.logs.
If you run Elasticsearch from the command line, Elasticsearch prints logs to the standard output (stdout).

对于Linux .tar.gz安装,Elasticsearch会将日志写入$ES_HOME/logs目录。
在升级期间,$ES_HOME中的文件有风险被删除。在生产环境中,强烈建议将path.logs设置为$ES_HOME之外的位置。请参阅path.data和path.logs。
如果您从命令行运行Elasticsearch,Elasticsearch会将日志打印到标准输出(stdout)。

Logging configuration(日志配置)

Elasticsearch uses Log4j 2 for logging. Log4j 2 can be configured using the log4j2.properties file. Elasticsearch exposes three properties, ${sys:es.logs.base_path}, ${sys:es.logs.cluster_name}, and ${sys:es.logs.node_name} that can be referenced in the configuration file to determine the location of the log files. The property ${sys:es.logs.base_path} will resolve to the log directory, ${sys:es.logs.cluster_name} will resolve to the cluster name (used as the prefix of log filenames in the default configuration), and ${sys:es.logs.node_name} will resolve to the node name (if the node name is explicitly set).

Elasticsearch使用Log4j 2进行日志记录。可以使用log4j2.properties文件配置Log4j 2。Elasticsearch公开了三个属性,${sys:es.logs.base_path}、${sys:es.logs.cluster_name}和${sys:es.logs.node_name},可以在配置文件中引用,以确定日志文件的位置。属性${sys:es.logs.base_path}将解析为日志目录,${sys:es.logs.cluster_name}将解析为集群名称(在默认配置中用作日志文件名前缀),${sys:es.logs.node_name}将解析为节点名称(如果明确设置了节点名称)。

For example, if your log directory (path.logs) is /var/log/elasticsearch and your cluster is named production then ${sys:es.logs.base_path} will resolve to /var/log/elasticsearch and ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log will resolve to /var/log/elasticsearch/production.log.

例如,如果您的日志目录(path.logs)为/var/log/elasticsearch,您的集群命名为production,那么${sys:es.logs.base_path}将解析为/var/log/elasticsearch,${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log将解析为/var/log/elasticsearch/production.log。
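The placeholder substitution in this example can be mimicked with a small regular-expression replacement. This only illustrates how the ${sys:...} references expand; Log4j's real lookup mechanism is richer (defaults, escaping), and the function name is hypothetical.

```python
import re
from typing import Mapping

_SYS_LOOKUP = re.compile(r"\$\{sys:([^}]+)\}")


def resolve_sys_lookups(template: str, props: Mapping[str, str]) -> str:
    """Replace each ${sys:name} in template with props[name]
    (raises KeyError for an unknown name)."""
    return _SYS_LOOKUP.sub(lambda m: props[m.group(1)], template)


props = {
    "es.logs.base_path": "/var/log/elasticsearch",
    "es.logs.cluster_name": "production",
    "file.separator": "/",
}
print(resolve_sys_lookups(
    "${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log", props))
# /var/log/elasticsearch/production.log
```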

# Configure the RollingFile appender
appender.rolling.type = RollingFile
appender.rolling.name = rolling
# Log to /var/log/elasticsearch/production_server.json
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_server.json
# Use the JSON layout
appender.rolling.layout.type = ESJsonLayout
# type_name is the type field in ESJsonLayout; it makes it easier to tell different kinds of logs apart when parsing them
appender.rolling.layout.type_name = server
# Roll logs to /var/log/elasticsearch/production-yyyy-MM-dd-i.json; logs are compressed on each roll and i is incremented
appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.json.gz
appender.rolling.policies.type = Policies
# Use a time-based roll policy
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
# Roll logs on a daily basis
appender.rolling.policies.time.interval = 1
# Align rolls on the day boundary (as opposed to rolling every twenty-four hours)
appender.rolling.policies.time.modulate = true
# Use a size-based roll policy
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
# Roll logs after 256 MB
appender.rolling.policies.size.size = 256MB
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.fileIndex = nomax
# Use a delete action when rolling logs
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
# Only delete logs matching a file pattern
appender.rolling.strategy.action.condition.type = IfFileName
# The pattern only matches the main logs
appender.rolling.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
# Only delete if too many compressed logs have accumulated
appender.rolling.strategy.action.condition.nested_condition.type = IfAccumulatedFileSize
# The size condition on the compressed logs is 2 GB
appender.rolling.strategy.action.condition.nested_condition.exceeds = 2GB
######## Server - old style pattern ###########
appender.rolling_old.type = RollingFile
appender.rolling_old.name = rolling_old
appender.rolling_old.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_server.log
appender.rolling_old.layout.type = PatternLayout
appender.rolling_old.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n
appender.rolling_old.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.old_log.gz

The configuration for old style pattern appenders. These logs will be saved in *.log files and if archived will be in *.log.gz files. Note that these should be considered deprecated and will be removed in the future.

旧样式模式附加程序的配置。这些日志将保存在*.log文件中,如果被归档,将保存在*.log.gz文件中。请注意,这些应被视为已弃用,并将在将来删除。

Log4j’s configuration parsing gets confused by any extraneous whitespace; if you copy and paste any Log4j settings on this page, or enter any Log4j configuration in general, be sure to trim any leading and trailing whitespace.

Log4j的配置解析会受到任何多余的空格的影响。如果在此页面上复制并粘贴任何Log4j设置,或者一般输入任何Log4j配置,请确保删除任何前导和尾随空格。

Note that you can replace .gz by .zip in appender.rolling.filePattern to compress the rolled logs using the zip format. If you remove the .gz extension, then logs will not be compressed as they are rolled.

请注意,您可以在appender.rolling.filePattern中将.gz替换为.zip,以使用zip格式压缩滚动日志。如果删除.gz扩展名,则日志在滚动时不会被压缩。

If you want to retain log files for a specified period of time, you can use a rollover strategy with a delete action.

如果您希望保留日志文件一定时间,可以使用带有删除操作的滚动策略。

# Configure the DefaultRolloverStrategy
appender.rolling.strategy.type = DefaultRolloverStrategy
# Configure the Delete action for handling rollovers
appender.rolling.strategy.action.type = Delete
# The base path to the Elasticsearch logs
appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
# The condition to apply when handling rollovers
appender.rolling.strategy.action.condition.type = IfFileName
# Delete files from the base path matching the glob ${sys:es.logs.cluster_name}-*; this is the glob that log files are rolled to, so only the rolled Elasticsearch logs are deleted, not the deprecation and slow logs
appender.rolling.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
# A nested condition to apply to files matching the glob
appender.rolling.strategy.action.condition.nested_condition.type = IfLastModified
# Retain logs for seven days
appender.rolling.strategy.action.condition.nested_condition.age = 7D
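The effect of this Delete action on a directory listing can be sketched in Python. The file names and the function name are hypothetical, and the "at least max_age old" comparison follows the Log4j description of IfLastModified as accepting files as old as or older than the given duration.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from typing import Dict, List


def rolled_logs_to_delete(files: Dict[str, datetime], cluster_name: str,
                          now: datetime, max_age: timedelta = timedelta(days=7)) -> List[str]:
    """Names the Delete action would remove: files matching the glob
    <cluster_name>-* whose last modification is at least max_age ago."""
    glob = f"{cluster_name}-*"
    return sorted(
        name for name, modified in files.items()
        if fnmatchcase(name, glob) and now - modified >= max_age
    )


now = datetime(2024, 1, 20)
files = {
    "production-2024-01-08-1.json.gz": now - timedelta(days=12),              # old rolled log
    "production-2024-01-18-1.json.gz": now - timedelta(days=2),               # recent rolled log
    "production_deprecation-2024-01-08-1.json.gz": now - timedelta(days=12),  # other glob
    "production_server.json": now - timedelta(days=12),                       # other glob
}
print(rolled_logs_to_delete(files, "production", now))
# ['production-2024-01-08-1.json.gz']
```

The underscore after the cluster name is what keeps deprecation and active logs out of the production-* glob.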

Multiple configuration files can be loaded (in which case they will get merged) as long as they are named log4j2.properties and have the Elasticsearch config directory as an ancestor; this is useful for plugins that expose additional loggers. The logger section contains the java packages and their corresponding log level. The appender section contains the destinations for the logs. Extensive information on how to customize logging and all the supported appenders can be found in the Log4j documentation.

可以加载多个配置文件(在这种情况下它们将被合并),只要它们被命名为log4j2.properties,并且具有Elasticsearch配置目录作为祖先;这对于公开附加日志记录器的插件非常有用。日志记录器部分包含Java包及其相应的日志级别。附加程序部分包含日志的目标。有关如何自定义日志记录和所有支持的附加程序的详细信息,可以在 Log4j文档中找到。

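Configuring logging levels(配置日志级别)

There are four ways to configure logging levels, each having situations in which it is the most appropriate.

有四种配置日志级别的方式,每种方式都适用于不同的情况。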

Via the command-line: -E <name of logging hierarchy>=<level> (e.g., -E logger.org.elasticsearch.discovery=debug). This is most appropriate when you are temporarily debugging a problem on a single node (for example, a problem with startup, or during development).

通过命令行:-E <日志层次结构名称>=<级别>(例如,-E logger.org.elasticsearch.discovery=debug)。当您在单个节点上临时调试问题时(例如启动问题,或在开发过程中),这是最合适的方式。

Via elasticsearch.yml: <name of logging hierarchy>: <level> (e.g., logger.org.elasticsearch.discovery: debug). This is most appropriate when you are temporarily debugging a problem but are not starting Elasticsearch via the command-line (e.g., via a service) or you want a logging level adjusted on a more permanent basis.

通过elasticsearch.yml:<日志层次结构名称>: <级别>(例如,logger.org.elasticsearch.discovery: debug)。当您需要暂时调试问题但不是通过命令行启动Elasticsearch(例如,通过服务启动)或者您希望更长期地调整日志级别时,这是最合适的方式。

Via cluster settings:

通过集群设置:

PUT /_cluster/settings
{
  "transient": {
    "<name of logging hierarchy>": "<level>"
  }
}

For example:

PUT /_cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.discovery": "DEBUG"
  }
}

This is most appropriate when you need to dynamically adjust a logging level on an actively-running cluster.

此方式最适用于您需要在正在运行的集群上动态调整日志级别的情况。
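When such changes are scripted, the request body can be built programmatically. The hypothetical helper below only produces the JSON for PUT /_cluster/settings (send it with any HTTP client); passing None emits null, which resets the transient setting to its default once debugging is done.

```python
import json
from typing import Optional


def logger_settings_body(logger: str, level: Optional[str]) -> str:
    """JSON body for PUT /_cluster/settings that sets a transient log level.
    Passing level=None emits null, which resets the setting to its default."""
    return json.dumps({"transient": {logger: level}})


print(logger_settings_body("logger.org.elasticsearch.discovery", "DEBUG"))
# {"transient": {"logger.org.elasticsearch.discovery": "DEBUG"}}
print(logger_settings_body("logger.org.elasticsearch.discovery", None))
# {"transient": {"logger.org.elasticsearch.discovery": null}}
```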

Via the log4j2.properties:

通过log4j2.properties:

logger.<unique_identifier>.name = <name of logging hierarchy>
logger.<unique_identifier>.level = <level>

For example:

logger.discovery.name = org.elasticsearch.discovery
logger.discovery.level = debug

This is most appropriate when you need fine-grained control over the logger (for example, you want to send the logger to another file, or manage the logger differently; this is a rare use-case).

这在需要对记录器进行精细控制时非常适用(例如,您希望将记录器发送到另一个文件,或以不同方式管理记录器;这是一个罕见的用例)。

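Deprecation logging(弃用日志记录)

In addition to regular logging, Elasticsearch allows you to enable logging of deprecated actions. For example, this lets you determine early whether you need to migrate certain functionality in the future. By default, deprecation logging is enabled at the WARN level, the level at which all deprecation log messages are emitted.

除了常规日志记录外,Elasticsearch还允许您启用弃用操作的日志记录。例如,这允许您提前确定是否需要在将来迁移某些功能。默认情况下,弃用日志记录在WARN级别下启用,这是发出所有弃用日志消息的级别。

logger.deprecation.level = warn

This creates a daily rolling deprecation log file in your log directory. Check this file regularly, especially when you intend to upgrade to a new major version.

这将在您的日志目录中创建一个每天滚动的弃用日志文件。请定期检查此文件,特别是当您计划升级到新的主要版本时。

The default logging configuration sets the roll policy for the deprecation logs to roll and compress after 1 GB, and to preserve a maximum of five log files (four rolled logs and the active log).

默认的日志配置已经为弃用日志设置了滚动策略,以在1 GB后滚动和压缩日志,并保留最多五个日志文件(四个滚动日志和活动日志)。

You can disable it in the config/log4j2.properties file by setting the deprecation log level to error, as shown here:

您可以在config/log4j2.properties文件中将deprecation日志级别设置为error以禁用它,如下所示:

logger.deprecation.name = org.elasticsearch.deprecation
logger.deprecation.level = error

If X-Opaque-Id was used as an HTTP header, you can find out what triggered the deprecated functionality. The user ID is included in the X-Opaque-ID field in deprecation JSON logs.

如果在HTTP头中使用了X-Opaque-Id,则可以识别是什么触发了弃用功能。用户ID包含在弃用JSON日志的X-Opaque-ID字段中。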

{
  "type": "deprecation",
  "timestamp": "2019-08-30T12:07:07,126+02:00",
  "level": "WARN",
  "component": "o.e.d.r.a.a.i.RestCreateIndexAction",
  "cluster.name": "distribution_run",
  "node.name": "node-0",
  "message": "[types removal] Using include_type_name in create index requests is deprecated. The parameter will be removed in the next major version.",
  "x-opaque-id": "MY_USER_ID",
  "cluster.uuid": "Aq-c-PAeQiK3tfBYtig9Bw",
  "node.id": "D7fUYfnfTLa2D7y-xw6tZg"
}
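Because each deprecation entry is a JSON document, tracing a deprecation back to a caller takes only a few lines. The sketch below is hypothetical tooling, not part of Elasticsearch, and it assumes the entry fits on a single line (entries that carry a stack trace span several lines).

```python
import json
from typing import Optional


def deprecation_opaque_id(line: str) -> Optional[str]:
    """Return the x-opaque-id of a deprecation log line, or None if the line is
    not a deprecation entry or carries no X-Opaque-Id."""
    entry = json.loads(line)
    if entry.get("type") != "deprecation":
        return None
    return entry.get("x-opaque-id")


sample = ('{"type": "deprecation", "level": "WARN", '
          '"component": "o.e.d.r.a.a.i.RestCreateIndexAction", '
          '"x-opaque-id": "MY_USER_ID"}')
print(deprecation_opaque_id(sample))  # MY_USER_ID
```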
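JSON log format(JSON日志格式)

To make parsing Elasticsearch logs easier, logs are now printed in a JSON format. This is configured by the Log4j layout property appender.rolling.layout.type = ESJsonLayout. This layout requires a type_name attribute to be set, which is used to distinguish log streams when parsing.

为了更容易解析Elasticsearch日志,现在以JSON格式打印日志。这是通过Log4J布局属性appender.rolling.layout.type = ESJsonLayout配置的。此布局要求设置一个type_name属性,用于在解析时区分日志流。

appender.rolling.layout.type = ESJsonLayout
appender.rolling.layout.type_name = server

Each line contains a single JSON document with the properties configured in ESJsonLayout. See the javadoc of this class for more details. However, if a JSON document contains an exception, it will be printed over multiple lines. The first line will contain the regular properties and subsequent lines will contain the stack trace formatted as a JSON array.

每行包含一个带有ESJsonLayout中配置的属性的JSON文档。有关更多详细信息,请参阅此类的javadoc。但是,如果JSON文档包含异常,它将以多行形式打印。第一行将包含常规属性,随后的行将以JSON数组格式包含堆栈跟踪。

You can still use your own custom layout. To do that, replace the appender.rolling.layout.type line with a different layout. See the sample below:

您仍然可以使用自定义布局。要做到这一点,请将appender.rolling.layout.type这一行替换为其他布局。请参阅下面的示例: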

appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_server.log
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %.-10000m%n
appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.log.gz

1.14 Machine learning settings(机器学习设置)

You do not need to configure any settings to use machine learning. It is enabled by default.
Machine learning uses SSE4.2 instructions, so it works only on machines whose CPUs support SSE4.2. If you run Elasticsearch on older hardware, you must disable machine learning (by setting xpack.ml.enabled to false).

您无需配置任何设置即可使用机器学习,它默认为启用状态。
机器学习使用SSE4.2指令,因此只能在支持SSE4.2的CPU的计算机上运行。如果在较旧的硬件上运行Elasticsearch,则必须禁用机器学习(通过将xpack.ml.enabled设置为false)。

General machine learning settings(一般的机器学习设置)

node.roles: [ ml ]

(Static) Set node.roles to contain ml to identify the node as a machine learning node that is capable of running jobs. Every node is a machine learning node by default.
If you use the node.roles setting, then all required roles must be explicitly set. Consult Node to learn more.
On dedicated coordinating nodes or dedicated master nodes, do not set the ml role.
The node.ml setting is deprecated in 7.9.0; use this setting instead.

(静态)将node.roles设置为包含ml,以将节点标识为能够运行作业的机器学习节点。默认情况下,每个节点都是一个机器学习节点。
如果使用node.roles设置,则必须明确设置所有必需的角色。请参阅Node以了解更多信息。
在专用的协调节点或专用的主节点上,不要设置ml角色。
在7.9.0版本中,node.ml设置已被弃用,请改用此设置。

xpack.ml.enabled

(Static) Set to true (default) to enable machine learning APIs on the node.

(静态)设置为true(默认值)以在节点上启用机器学习API。

If set to false, the machine learning APIs are disabled on the node. Therefore the node cannot open jobs, start datafeeds, or receive transport (internal) communication requests related to machine learning APIs. If the node is a coordinating node, machine learning requests from clients (including Kibana) also fail. For more information about disabling machine learning in specific Kibana instances, see Kibana machine learning settings.

如果设置为false,则节点上的机器学习API将被禁用。因此,节点无法打开作业、启动数据源或接收与机器学习API相关的传输(内部)通信请求。如果节点是协调节点,则来自客户端(包括Kibana)的机器学习请求也将失败。有关在特定Kibana实例中禁用机器学习的更多信息,请参阅Kibana机器学习设置。

If you want to use machine learning features in your cluster, it is recommended that you set xpack.ml.enabled to true on all nodes. This is the default behavior. At a minimum, it must be enabled on all master-eligible nodes. If you want to use machine learning features in clients or Kibana, it must also be enabled on all coordinating nodes.

如果要在集群中使用机器学习功能,建议在所有节点上将xpack.ml.enabled设置为true。这是默认行为。至少在所有具有主节点资格的节点上必须启用它。如果要在客户端或Kibana中使用机器学习功能,还必须在所有协调节点上启用它。

xpack.ml.inference_model.cache_size

(Static) The maximum inference cache size allowed. The inference cache exists in the JVM heap on each ingest node. The cache affords faster processing times for the inference processor. The value can be a static byte-sized value (for example, "2gb") or a percentage of total allocated heap. The default is "40%". See also Machine learning circuit breaker settings.

(静态)允许的最大推理缓存大小。推理缓存存在于每个摄取节点的JVM堆中。缓存可加快推理处理器的处理速度。该值可以是一个静态的字节大小值(例如"2gb")或总分配堆的百分比。默认值为"40%"。另请参阅机器学习断路器设置。
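Both accepted forms of the value can be turned into a byte count as sketched below. The parser is hypothetical and handles only a subset of Elasticsearch's byte-size syntax (integer values with b, kb, mb, gb, or tb, using binary units).

```python
import re

_UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def resolve_cache_size(value: str, heap_bytes: int) -> int:
    """Turn a cache_size value such as "2gb" or "40%" into bytes.
    Percentages are taken of heap_bytes (the node's allocated JVM heap)."""
    text = value.strip().lower()
    if text.endswith("%"):
        return int(heap_bytes * float(text[:-1]) / 100)
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb|tb)", text)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * _UNIT_BYTES[match.group(2)]


heap = 8 * 1024 ** 3
print(resolve_cache_size("2gb", heap))  # 2147483648
print(resolve_cache_size("40%", heap))  # 3435973836 (the 40% default)
```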

xpack.ml.inference_model.time_to_live

(Static) The time to live (TTL) for models in the inference model cache. The TTL is calculated from last access. The inference processor attempts to load the model from cache. If the inference processor does not receive any documents for the duration of the TTL, the referenced model is flagged for eviction from the cache. If a document is processed later, the model is again loaded into the cache. Defaults to 5m.

(静态)推理模型缓存中模型的生存时间(TTL)。TTL是从上次访问计算的。推理处理器尝试从缓存中加载模型。如果在TTL的持续时间内推理处理器没有接收到任何文档,则缓存中的模型将被标记为要从缓存中清除。如果以后处理文档,则模型将再次加载到缓存中。默认值为5分钟。
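The "TTL from last access" behavior can be illustrated with a tiny cache that records the last access time per model. This sketch is hypothetical: it evicts expired entries lazily on the next lookup instead of flagging them, and it takes an injectable clock so the timing is easy to follow.

```python
from typing import Any, Callable, Dict, Tuple


class TtlModelCache:
    """Cache whose entries expire ttl_s seconds after their last access."""

    def __init__(self, ttl_s: float, loader: Callable[[str], Any], clock: Callable[[], float]):
        self._ttl_s = ttl_s
        self._loader = loader  # loads a model on a cache miss
        self._clock = clock    # injectable time source, in seconds
        self._entries: Dict[str, Tuple[Any, float]] = {}  # model_id -> (model, last access)
        self.loads = 0

    def get(self, model_id: str) -> Any:
        now = self._clock()
        self._evict_expired(now)
        if model_id in self._entries:
            model = self._entries[model_id][0]
        else:
            model = self._loader(model_id)
            self.loads += 1
        self._entries[model_id] = (model, now)  # every access restarts the TTL
        return model

    def _evict_expired(self, now: float) -> None:
        expired = [k for k, (_, last) in self._entries.items() if now - last > self._ttl_s]
        for key in expired:
            del self._entries[key]


clock_value = [0.0]
cache = TtlModelCache(ttl_s=300, loader=lambda m: f"model:{m}", clock=lambda: clock_value[0])
cache.get("my-model")      # miss: loaded
clock_value[0] = 200
cache.get("my-model")      # hit: TTL restarts at t=200
clock_value[0] = 200 + 301
cache.get("my-model")      # idle for more than 5m: loaded again
print(cache.loads)         # 2
```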

xpack.ml.max_inference_processors

(Dynamic) The total number of inference type processors allowed across all ingest pipelines. Once the limit is reached, adding an inference processor to a pipeline is disallowed. Defaults to 50.

(动态)允许在所有摄取流水线中的所有推理类型处理器的总数。一旦达到限制,将禁止将推理处理器添加到流水线中。默认值为50。

xpack.ml.max_machine_memory_percent

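(Dynamic) The maximum percentage of the machine’s memory that machine learning may use for running analytics processes. (These processes are separate from the Elasticsearch JVM.) Defaults to 30 percent. The limit is based on the total memory of the machine, not current free memory. Jobs are not allocated to a node if doing so would cause the estimated memory use of machine learning jobs to exceed the limit.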

(动态)机器学习可能用于运行分析进程的机器内存的最大百分比。 (这些进程与Elasticsearch JVM分开。)默认为30%。限制基于计算机的总内存,而不是当前的空闲内存。如果分配作业到节点会导致机器学习作业的估算内存使用超出限制,则不会将作业分配给节点。

xpack.ml.max_model_memory_limit

(Dynamic) The maximum model_memory_limit property value that can be set for any job on this node. If you try to create a job with a model_memory_limit property value that is greater than this setting value, an error occurs. Existing jobs are not affected when you update this setting. For more information about the model_memory_limit property, see analysis_limits.

(动态)可为此节点上的任何作业设置的model_memory_limit属性的最大值。如果尝试创建model_memory_limit属性值大于此设置值的作业,将出现错误。更新此设置时,不会影响现有作业。有关model_memory_limit属性的更多信息,请参阅analysis_limits。

xpack.ml.max_open_jobs

(Dynamic) The maximum number of jobs that can run simultaneously on a node. Defaults to 20. In this context, jobs include both anomaly detection jobs and data frame analytics jobs. The maximum number of jobs is also constrained by memory usage. Thus if the estimated memory usage of the jobs would be higher than allowed, fewer jobs will run on a node. Prior to version 7.1, this setting was a per-node non-dynamic setting. It became a cluster-wide dynamic setting in version 7.1. As a result, changes to its value after node startup are used only after every node in the cluster is running version 7.1 or higher. The maximum permitted value is 512.

(动态)可以在节点上同时运行的作业的最大数量。默认值为20。在此上下文中,作业包括异常检测作业和数据帧分析作业。作业的最大数量也受内存使用的限制。因此,如果作业的估算内存使用超过了允许的限制,节点上将运行较少的作业。在7.1版本之前,此设置是每个节点的非动态设置。它在7.1版本中成为群集范围的动态设置。因此,在节点启动后更改其值仅在群集中的每个节点都运行7.1或更高版本时才会使用。允许的最大值为512。
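The two limits described above (xpack.ml.max_machine_memory_percent and xpack.ml.max_open_jobs) can be combined into a simplified node check. The function is hypothetical and deliberately ignores other overheads that real job allocation accounts for; all memory values are in bytes.

```python
from typing import Sequence

GB = 1024 ** 3


def can_assign_job(machine_memory: int, open_job_estimates: Sequence[int], new_job_estimate: int,
                   max_machine_memory_percent: float = 30.0, max_open_jobs: int = 20) -> bool:
    """Simplified node check combining the job-count and memory limits."""
    if len(open_job_estimates) + 1 > max_open_jobs:
        return False
    limit = machine_memory * max_machine_memory_percent / 100  # total memory, not free memory
    return sum(open_job_estimates) + new_job_estimate <= limit


machine = 16 * GB  # 30% of this is about 4.8 GB
print(can_assign_job(machine, [2 * GB, 2 * GB], GB))           # False: 5 GB > 4.8 GB
print(can_assign_job(machine, [2 * GB, 2 * GB], GB // 2))      # True: 4.5 GB fits
print(can_assign_job(machine, [GB, GB], GB, max_open_jobs=2))  # False: job count limit
```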

xpack.ml.node_concurrent_job_allocations

(Dynamic) The maximum number of jobs that can concurrently be in the opening state on each node. Typically, jobs spend a small amount of time in this state before they move to open state. Jobs that must restore large models when they are opening spend more time in the opening state. Defaults to 2.

(动态)每个节点上可以同时处于opening(正在打开)状态的作业的最大数量。通常,作业在进入open状态之前只会在此状态停留很短时间。打开时必须恢复大型模型的作业会在opening状态停留更长时间。默认值为2。

Advanced machine learning settings(高级机器学习设置)

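These settings are for advanced use cases; the default values are generally sufficient:

这些设置适用于高级用例;通常情况下,默认值已足够:

xpack.ml.enable_config_migration

(Dynamic) Reserved.

(动态)保留。

xpack.ml.max_anomaly_records

(Dynamic) The maximum number of records that are output per bucket. The default value is 500.

(动态)每个存储桶输出的最大记录数。默认值为500。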

xpack.ml.max_lazy_ml_nodes

(Dynamic) The number of lazily spun up machine learning nodes. Useful in situations where machine learning nodes are not desired until the first machine learning job opens. It defaults to 0 and has a maximum acceptable value of 3. If the current number of machine learning nodes is greater than or equal to this setting, it is assumed that there are no more lazy nodes available as the desired number of nodes have already been provisioned. If a job is opened and this setting has a value greater than zero and there are no nodes that can accept the job, the job stays in the OPENING state until a new machine learning node is added to the cluster and the job is assigned to run on that node.

(动态)惰性启动的机器学习节点数。在不需要机器学习节点直到第一个机器学习作业打开的情况下有用。默认为0,最大可接受值为3。如果当前的机器学习节点数大于或等于此设置值,则假定没有更多的懒惰节点可用,因为已经提供了所需数量的节点。如果打开了一个作业并且此设置的值大于零,并且没有节点可以接受作业,则作业将保持在OPENING状态,直到向群集添加新的机器学习节点,并将作业分配给该节点。

This setting assumes some external process is capable of adding machine learning nodes to the cluster. This setting is only useful when used in conjunction with such an external process.

此设置假定某个外部进程能够将机器学习节点添加到群集中。只有在与此类外部进程一起使用时,此设置才有用。

xpack.ml.process_connect_timeout

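(Dynamic) The connection timeout for machine learning processes that run separately from the Elasticsearch JVM. Some machine learning processing is done by processes that run separately from the Elasticsearch JVM. When such processes are started, they must connect to the Elasticsearch JVM. If such a process does not connect within the time period specified by this setting, then the process is assumed to have failed. Defaults to 10s. The minimum value for this setting is 5s.

(动态)与Elasticsearch JVM分开运行的机器学习进程的连接超时。某些机器学习处理由与Elasticsearch JVM分开运行的进程执行。启动此类进程时,它们必须连接到Elasticsearch JVM。如果此类进程在由此设置指定的时间段内未连接,则认为该进程已失败。默认为10秒。此设置的最小值为5秒。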

Machine learning circuit breaker settings(机器学习断路器设置)

breaker.model_inference.limit

(Dynamic) Limit for the model inference breaker, which defaults to 50% of the JVM heap. If the parent circuit breaker is less than 50% of the JVM heap, it is bound to that limit instead. See Circuit breaker settings.

(动态)模型推断断路器的限制,默认为JVM堆的50%。如果父断路器小于JVM堆的50%,则它将绑定到该限制。请参阅断路器设置。

breaker.model_inference.overhead

(Dynamic) A constant that all accounting estimations are multiplied by to determine a final estimation. Defaults to 1. See Circuit breaker settings.

(动态)一个常数,所有内存估算值都会乘以该常数以得到最终估算值。默认值为1。请参阅断路器设置。

breaker.model_inference.type

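(Static) The underlying type of the circuit breaker. There are two valid options: noop and memory. noop means the circuit breaker does nothing to prevent too much memory usage. memory means the circuit breaker tracks the memory used by inference models and can potentially break and prevent OutOfMemory errors. The default value is memory.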

(静态)断路器的基本类型。有两个有效选项:noop和memory。noop表示断路器不会采取任何措施来防止过多的内存使用。memory表示断路器跟踪推理模型使用的内存,可能会断开并防止内存不足错误。默认值为memory。
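The three breaker settings can be pictured with the simplified check below. It is a sketch, not Elasticsearch's breaker implementation: the function names are hypothetical, and applying the overhead to the running total (used plus new bytes) is an assumption about where the constant enters the estimate.

```python
def model_inference_limit(heap_bytes: int, parent_limit_bytes: int) -> int:
    """Effective limit: 50% of the JVM heap, but never above the parent breaker."""
    return min(heap_bytes // 2, parent_limit_bytes)


def would_trip(used_bytes: int, new_bytes: int, limit_bytes: int,
               overhead: float = 1.0, breaker_type: str = "memory") -> bool:
    """True if accounting new_bytes on top of used_bytes would break the circuit."""
    if breaker_type == "noop":
        return False  # a noop breaker never prevents memory usage
    estimate = (used_bytes + new_bytes) * overhead  # overhead scales the estimate
    return estimate > limit_bytes


limit = model_inference_limit(heap_bytes=1000, parent_limit_bytes=950)
print(limit)                                           # 500
print(would_trip(300, 200, limit))                     # False: exactly at the limit
print(would_trip(300, 100, limit, overhead=1.5))       # True: 600 > 500
print(would_trip(300, 900, limit, breaker_type="noop"))  # False
```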

1.15 Monitoring settings in Elasticsearch (ES监控设置)

https://www.elastic.co/guide/en/elasticsearch/reference/7.9/monitoring-settings.html

1.16 Node (节点)

Any time that you start an instance of Elasticsearch, you are starting a node. A collection of connected nodes is called a cluster. If you are running a single node of Elasticsearch, then you have a cluster of one node.

每次启动Elasticsearch实例时,都会启动一个节点。连接的节点集合称为集群。如果运行的是Elasticsearch的单个节点,则您将拥有一个节点的集群。

Every node in the cluster can handle HTTP and Transport traffic by default. The transport layer is used exclusively for communication between nodes; the HTTP layer is used by REST clients.

默认情况下,集群中的每个节点都可以处理HTTP和传输流量。传输层专门用于节点之间的通信;HTTP层由REST客户端使用。

All nodes know about all the other nodes in the cluster and can forward client requests to the appropriate node.

所有节点都知道集群中的所有其他节点,并可以将客户端请求转发到适当的节点。

By default, a node is all of the following types: master-eligible, data, ingest, and (if available) machine learning. All data nodes are also transform nodes.

默认情况下,节点具有以下所有类型:候选主节点、数据节点、摄取节点和(如果可用)机器学习节点。所有数据节点也都是转换节点。

As the cluster grows and in particular if you have large machine learning jobs or continuous transforms, consider separating dedicated master-eligible nodes from dedicated data nodes, machine learning nodes, and transform nodes.

随着集群的增长,特别是如果您有大型机器学习作业或持续转换(continuous transforms),请考虑将专用的候选主节点与专用的数据节点、机器学习节点和转换节点分开。

Node roles(节点角色)

You can define the roles of a node by setting node.roles. If you don’t configure this setting, then the node has the following roles by default:

您可以通过设置node.roles来定义节点的角色。如果您不配置此设置,那么节点将默认具有以下角色:

  • master(主节点)
  • data(数据节点)
  • ingest(摄取节点)
  • ml(机器学习节点)
  • remote_cluster_client(远程集群客户端)
  • transform(转换节点)

If you set node.roles, the node is assigned only the roles you specify.

如果设置了node.roles,节点将仅分配您指定的角色。
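The rule "unset means all default roles, set means exactly the listed roles" can be sketched as follows. The function name is hypothetical; the default set mirrors the list above (ml applies only where machine learning is available, as in the default distribution).

```python
from typing import FrozenSet, Optional, Sequence

# Roles a node gets when node.roles is not set.
DEFAULT_ROLES = frozenset({"master", "data", "ingest", "ml", "remote_cluster_client", "transform"})


def effective_roles(node_roles: Optional[Sequence[str]]) -> FrozenSet[str]:
    """None models an unset node.roles; any list, even an empty one, replaces the defaults."""
    if node_roles is None:
        return DEFAULT_ROLES
    return frozenset(node_roles)


print(sorted(effective_roles(None)))
print(sorted(effective_roles(["master"])))  # ['master']: a dedicated master-eligible node
print(sorted(effective_roles([])))          # []: no roles, i.e. a coordinating-only node
```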

  • Master-eligible node(主节点)

A node that has the master role (default), which makes it eligible to be elected as the master node, which controls the cluster.

具有master角色(默认)的节点,这使其有资格被选举为控制集群的主节点。

  • Data node(数据节点)

A node that has the data role (default). Data nodes hold data and perform data related operations such as CRUD, search, and aggregations.

数据节点具有数据角色(默认情况下)。数据节点保存数据并执行与数据相关的操作,如CRUD、搜索和聚合。

  • Ingest node(摄取节点)

A node that has the ingest role (default). Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing. With a heavy ingest load, it makes sense to use dedicated ingest nodes and to omit the ingest role from nodes that have the master or data roles.

具有ingest角色(默认)的节点。摄取节点能够对文档应用摄取管道,在索引之前对文档进行转换和丰富。在摄取负载较重时,合理的做法是使用专用摄取节点,并在具有master或data角色的节点上去掉ingest角色。

  • Remote-eligible node(远程集群客户端节点)

A node that has the remote_cluster_client role (default), which makes it eligible to act as a remote client. By default, any node in the cluster can act as a cross-cluster client and connect to remote clusters.

具有remote_cluster_client角色(默认)的节点,这使其有资格充当远程客户端。默认情况下,集群中的任何节点都可以充当跨集群客户端并连接到远程集群。

  • Machine learning node(机器学习节点)

A node that has xpack.ml.enabled and the ml role, which is the default behavior in the Elasticsearch default distribution. If you want to use machine learning features, there must be at least one machine learning node in your cluster. For more information about machine learning features, see Machine learning in the Elastic Stack.
If you use the OSS-only distribution, do not add the ml role. Otherwise, the node fails to start.

具有xpack.ml.enabled设置和ml角色的节点,这是Elasticsearch默认分发中的默认行为。如果要使用机器学习功能,集群中必须至少有一个机器学习节点。有关机器学习功能的更多信息,请参阅Elastic Stack中的机器学习(Machine learning in the Elastic Stack)。
如果使用的是OSS-only分发,请勿添加ml角色。否则,节点将无法启动。

  • Transform node(转换节点)

A node that has the transform role. If you want to use transforms, there must be at least one transform node in your cluster. For more information, see Transforms settings and Transforming data.

具有转换角色的节点。如果要使用转换功能,集群中必须至少有一个转换节点。有关更多信息,请参阅转换设置和数据转换。

Coordinating node(协调节点)

Requests like search requests or bulk-indexing requests may involve data held on different data nodes. A search request, for example, is executed in two phases which are coordinated by the node which receives the client request: the coordinating node.
In the scatter phase, the coordinating node forwards the request to the data nodes which hold the data. Each data node executes the request locally and returns its results to the coordinating node. In the gather phase, the coordinating node reduces each data node’s results into a single global result set.
Every node is implicitly a coordinating node, and this cannot be disabled. This means that a node that has an explicit empty list of roles via node.roles will only act as a coordinating node. As a result, such a node needs to have enough memory and CPU in order to deal with the gather phase.

像搜索请求或批量索引请求这样的请求可能涉及存储在不同数据节点上的数据。例如,搜索请求分两个阶段执行,由接收客户端请求的节点(即协调节点)负责协调。
在分散(scatter)阶段,协调节点将请求转发到保存数据的数据节点。每个数据节点在本地执行请求,并将结果返回给协调节点。在收集(gather)阶段,协调节点将各个数据节点的结果归并(reduce)为单个全局结果集。
每个节点都隐式地充当协调节点,且这一行为无法禁用。这意味着通过node.roles显式设置了空角色列表的节点将只充当协调节点。因此,此类节点需要有足够的内存和CPU来处理收集阶段。

Master-eligible node(候选主节点)

The master node is responsible for lightweight cluster-wide actions such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. It is important for cluster health to have a stable master node.

主节点负责轻量级的集群范围操作,例如创建或删除索引、跟踪哪些节点是集群的一部分,并决定将哪些分片分配给哪些节点。拥有稳定的主节点对集群健康至关重要。

Any master-eligible node that is not a voting-only node may be elected to become the master node by the master election process.

任何不是仅投票节点的候选主节点,都可能在主节点选举过程中被选为主节点。

Master nodes must have access to the data/ directory (just like data nodes) as this is where the cluster state is persisted between node restarts.

主节点必须能够访问data/目录(与数据节点一样),因为集群状态正是保存在这里,以便在节点重启之间持久化。

Dedicated master-eligible node(专用主节点)

It is important for the health of the cluster that the elected master node has the resources it needs to fulfill its responsibilities. If the elected master node is overloaded with other tasks then the cluster may not operate well. In particular, indexing and searching your data can be very resource-intensive, so in large or high-throughput clusters it is a good idea to avoid using the master-eligible nodes for tasks such as indexing and searching. You can do this by configuring three of your nodes to be dedicated master-eligible nodes. Dedicated master-eligible nodes only have the master role, allowing them to focus on managing the cluster. While master nodes can also behave as coordinating nodes and route search and indexing requests from clients to data nodes, it is better not to use dedicated master nodes for this purpose.

为了集群的健康,当选的主节点必须具备履行其职责所需的资源。如果当选的主节点被其他任务压得过载,集群可能无法正常运行。特别是,索引和搜索数据可能非常耗费资源,因此在大型或高吞吐量的集群中,最好避免让候选主节点承担索引和搜索等任务。您可以将三个节点配置为专用候选主节点来实现这一点。专用候选主节点只具有master角色,从而能够专注于管理集群。虽然主节点也可以充当协调节点,将客户端的搜索和索引请求路由到数据节点,但最好不要让专用主节点承担这一职责。

To create a dedicated master-eligible node, set:

要创建专用的主节点,请设置:

node.roles: [ master ]

Voting-only master-eligible node(仅投票的候选主节点)

A voting-only master-eligible node is a node that participates in master elections but which will not act as the cluster’s elected master node. In particular, a voting-only node can serve as a tiebreaker in elections.

仅投票的主节点是参与主选举但不会充当集群选举主节点的节点。特别地,仅投票节点可以在选举中充当决定胜负的节点。

It may seem confusing to use the term “master-eligible” to describe a voting-only node since such a node is not actually eligible to become the master at all. This terminology is an unfortunate consequence of history: master-eligible nodes are those nodes that participate in elections and perform certain tasks during cluster state publications, and voting-only nodes have the same responsibilities even if they can never become the elected master.

用“候选主节点”(master-eligible)一词来描述仅投票节点可能会令人困惑,因为这样的节点实际上根本没有资格成为主节点。这一术语是历史遗留的不幸结果:候选主节点是指参与选举并在集群状态发布期间执行某些任务的节点,而仅投票节点即使永远无法成为当选的主节点,也承担同样的职责。

To configure a master-eligible node as a voting-only node, include master and voting_only in the list of roles. For example, to create a voting-only data node:

要将候选主节点配置为仅投票节点,请在角色列表中同时包含master和voting_only。例如,要创建一个仅投票的数据节点:

node.roles: [ data, master, voting_only ]

The voting_only role requires the default distribution of Elasticsearch and is not supported in the OSS-only distribution. If you use the OSS-only distribution and add the voting_only role then the node will fail to start. Also note that only nodes with the master role can be marked as having the voting_only role.

voting_only角色需要Elasticsearch的默认分发,OSS-only分发不支持该角色。如果使用OSS-only分发并添加voting_only角色,节点将无法启动。还要注意,只有具有master角色的节点才能被标记为具有voting_only角色。

High availability (HA) clusters require at least three master-eligible nodes, at least two of which are not voting-only nodes. Such a cluster will be able to elect a master node even if one of the nodes fails.

高可用性(HA)集群至少需要三个候选主节点,其中至少有两个不是仅投票节点。即使其中一个节点发生故障,这样的集群仍然能够选出主节点。
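A minimal sketch of the smallest layout that meets this requirement, assuming the default distribution (voting_only is unavailable in the OSS-only distribution): three separate elasticsearch.yml files shown together as YAML documents. The node names are hypothetical, and discovery settings such as discovery.seed_hosts are omitted for brevity.

满足上述要求的最小布局示意(假设使用默认分发,节点名称为虚构,省略了discovery.seed_hosts等发现配置):

```yaml
# Three separate elasticsearch.yml files, one YAML document per node.
---
# node-1: full master-eligible node that also holds data
node.name: node-1
node.roles: [ master, data ]
---
# node-2: full master-eligible node that also holds data
node.name: node-2
node.roles: [ master, data ]
---
# node-3: dedicated voting-only master-eligible node acting as a tiebreaker;
# it votes in elections but can never become the elected master
node.name: node-3
node.roles: [ master, voting_only ]
```

If node-1 fails, node-2 and node-3 still form a majority of the three master-eligible nodes, and node-2 can be elected because it is not voting-only.

如果node-1故障,node-2和node-3仍构成三个候选主节点中的多数,且node-2不是仅投票节点,因此可以当选主节点。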

Since voting-only nodes never act as the cluster’s elected master, they may require less heap and a less powerful CPU than the true master nodes. However, all master-eligible nodes, including voting-only nodes, require reasonably fast persistent storage and a reliable and low-latency network connection to the rest of the cluster, since they are on the critical path for publishing cluster state updates.

由于仅投票节点从不充当集群当选的主节点,它们可能比真正的主节点需要更少的堆内存和性能更弱的CPU。然而,所有候选主节点(包括仅投票节点)都需要相当快速的持久化存储,以及与集群其余部分之间可靠且低延迟的网络连接,因为它们处于发布集群状态更新的关键路径上。

Voting-only master-eligible nodes may also fill other roles in your cluster. For instance, a node may be both a data node and a voting-only master-eligible node. A dedicated voting-only master-eligible node is a voting-only master-eligible node that fills no other roles in the cluster. To create a dedicated voting-only master-eligible node in the default distribution, set:

仅投票的候选主节点也可以在集群中承担其他角色。例如,一个节点可以既是数据节点,又是仅投票的候选主节点。专用的仅投票候选主节点是指不在集群中承担任何其他角色的仅投票候选主节点。要在默认分发中创建专用的仅投票候选主节点,请设置:

node.roles: [ master, voting_only ]

Data node(数据节点)

Data nodes hold the shards that contain the documents you have indexed. Data nodes handle data related operations like CRUD, search, and aggregations. These operations are I/O-, memory-, and CPU-intensive. It is important to monitor these resources and to add more data nodes if they are overloaded.

数据节点保存包含您已索引文档的分片。数据节点处理与数据相关的操作,如CRUD、搜索和聚合。这些操作是I/O密集、内存密集和CPU密集型的。监控这些资源,并在其过载时添加更多数据节点非常重要。

The main benefit of having dedicated data nodes is the separation of the master and data roles.

具有专用数据节点的主要好处是主节点和数据角色的分离。

To create a dedicated data node, set:

要创建专用数据节点,请设置:

node.roles: [ data ]

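Ingest node(摄取节点)

Ingest nodes can execute pre-processing pipelines, composed of one or more ingest processors. Depending on the type of operations performed by the ingest processors and the required resources, it may make sense to have dedicated ingest nodes that only perform this specific task.

摄取节点可以执行预处理管道,管道由一个或多个摄取处理器组成。根据摄取处理器执行的操作类型和所需资源,可能有必要使用只执行这一特定任务的专用摄取节点。

To create a dedicated ingest node, set:

要创建专用摄取节点,请设置: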

node.roles: [ ingest ]
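Following the advice in the ingest node description earlier in this section, a hedged example of the matching configuration on the other nodes of such a cluster: the master-eligible data nodes simply leave ingest out of node.roles. The role combination is an assumption for illustration.

根据前文关于摄取节点的建议,在设有专用摄取节点的集群中,其他节点可以把ingest从角色列表中去掉(以下角色组合仅为示例假设):

```yaml
# elasticsearch.yml on a master-eligible data node in a cluster that also has
# dedicated ingest nodes. "ingest" is deliberately left out, so documents that
# need an ingest pipeline are handled by the dedicated ingest nodes instead.
node.roles: [ master, data ]
```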

Coordinating only node(仅协调节点)

If you take away the ability to handle master duties, to hold data, and to pre-process documents, then you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing. Essentially, coordinating only nodes behave as smart load balancers.

如果去掉处理主节点职责、保存数据和预处理文档的能力,剩下的就是一个只能路由请求、处理搜索的归并(reduce)阶段以及分发批量索引的协调节点。本质上,仅协调节点的行为类似于智能负载均衡器。

Coordinating only nodes can benefit large clusters by offloading the coordinating node role from data and master-eligible nodes. They join the cluster and receive the full cluster state, like every other node, and they use the cluster state to route requests directly to the appropriate place(s).

仅协调节点可以将协调节点的职责从数据节点和候选主节点上卸载下来,从而使大型集群受益。它们像其他所有节点一样加入集群并接收完整的集群状态,然后利用集群状态将请求直接路由到合适的位置。

Adding too many coordinating only nodes to a cluster can increase the burden on the entire cluster because the elected master node must await acknowledgement of cluster state updates from every node! The benefit of coordinating only nodes should not be overstated: data nodes can happily serve the same purpose.

向集群中添加过多的仅协调节点可能会增加整个集群的负担,因为当选的主节点必须等待每个节点对集群状态更新的确认!不应高估仅协调节点的好处:数据节点同样可以很好地承担这一职责。

To create a dedicated coordinating node, set:

要创建专用协调节点,请设置:

node.roles: [ ]

Remote-eligible node(远程集群客户端节点)

By default, any node in a cluster can act as a cross-cluster client and connect to remote clusters. Once connected, you can search remote clusters using cross-cluster search. You can also sync data between clusters using cross-cluster replication.

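默认情况下,集群中的任何节点都可以充当跨集群客户端并连接到远程集群。连接后,您可以使用跨集群搜索(cross-cluster search)搜索远程集群,还可以使用跨集群复制(cross-cluster replication)在集群之间同步数据。

To create a dedicated remote-eligible node, set:

要创建专用的远程集群客户端节点,请设置: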

node.roles: [ remote_cluster_client ]
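Once a node has the remote_cluster_client role, it still needs to know which remote clusters to connect to. A hedged sketch using the 7.x cluster.remote.<alias>.seeds setting; the alias cluster_one and the seed address are placeholders.

节点具备remote_cluster_client角色后,还需要配置要连接的远程集群。以下示意使用7.x的cluster.remote.<alias>.seeds设置,别名和地址均为占位值:

```yaml
# Register a remote cluster under the hypothetical alias "cluster_one".
# The seed is the transport address (host:port) of a node in the remote
# cluster; 9300 is the default transport port.
cluster.remote.cluster_one.seeds: [ "10.0.0.5:9300" ]
```

Cross-cluster search requests can then address remote indices by prefixing them with the alias, for example cluster_one:my-index (the index name is hypothetical).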

Machine learning node(机器学习节点)

The machine learning features provide machine learning nodes, which run jobs and handle machine learning API requests. If xpack.ml.enabled is set to true and the node does not have the ml role, the node can service API requests but it cannot run jobs.

机器学习功能提供机器学习节点,用于运行作业并处理机器学习API请求。如果xpack.ml.enabled设置为true,但节点没有ml角色,则该节点可以处理API请求,但不能运行作业。

If you want to use machine learning features in your cluster, you must enable machine learning (set xpack.ml.enabled to true) on all master-eligible nodes. If you want to use machine learning features in clients (including Kibana), it must also be enabled on all coordinating nodes. If you have the OSS-only distribution, do not use these settings.

如果要在集群中使用机器学习功能,必须在所有候选主节点上启用机器学习(将xpack.ml.enabled设置为true)。如果要在客户端(包括Kibana)中使用机器学习功能,还必须在所有协调节点上启用它。如果使用的是OSS-only分发,请不要使用这些设置。
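To illustrate the two paragraphs above, a sketch (hypothetical setup, default distribution only) of a coordinating-only node that Kibana connects to: it has no roles, so it runs no machine learning jobs, but with xpack.ml.enabled it can still service machine learning API requests.

示意:Kibana所连接的仅协调节点(假设场景,仅适用于默认分发),没有任何角色因而不运行机器学习作业,但启用了xpack.ml.enabled,仍可处理机器学习API请求。

```yaml
# Coordinating-only node: no master, data, ingest, ml or other roles,
# so it cannot run machine learning jobs ...
node.roles: [ ]
# ... but with machine learning enabled it can still service ML API requests.
# true is already the default; it is written out here for clarity.
xpack.ml.enabled: true
```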

To create a dedicated machine learning node in the default distribution, set:

要在默认分发中创建专用机器学习节点,请设置:

node.roles: [ ml ]
xpack.ml.enabled: true

Note: the xpack.ml.enabled setting is enabled by default.

注意:xpack.ml.enabled设置默认是启用的。

Transform node(转换节点)

Transform nodes run transforms and handle transform API requests. If you have the OSS-only distribution, do not use these settings. For more information, see Transforms settings.
To create a dedicated transform node in the default distribution, set:

转换节点运行转换并处理转换API请求。如果您使用的是OSS-only分发,请不要使用这些设置。有关更多信息,请参阅转换设置。
要在默认分发中创建专用转换节点,请设置:

node.roles: [ transform ]

Changing the role of a node(更改节点的角色)

Each data node maintains the following data on disk:

  • the shard data for every shard allocated to that node,
  • the index metadata corresponding with every shard allocated to that node, and
  • the cluster-wide metadata, such as settings and index templates.

每个数据节点在磁盘上维护以下数据:

  • 分配给该节点的每个分片的分片数据,
  • 与分配给该节点的每个分片对应的索引元数据,
  • 集群范围的元数据,如设置和索引模板。

Similarly, each master-eligible node maintains the following data on disk:

  • the index metadata for every index in the cluster, and
  • the cluster-wide metadata, such as settings and index templates.

类似地,每个候选主节点在磁盘上维护以下数据:

  • 集群中每个索引的索引元数据,
  • 集群范围的元数据,如设置和索引模板。

Each node checks the contents of its data path at startup. If it discovers unexpected data then it will refuse to start. This is to avoid importing unwanted dangling indices which can lead to a red cluster health. To be more precise, nodes without the data role will refuse to start if they find any shard data on disk at startup, and nodes without both the master and data roles will refuse to start if they have any index metadata on disk at startup.

每个节点在启动时都会检查其数据路径的内容。如果发现意外的数据,它将拒绝启动。这是为了避免导入不需要的悬空索引(dangling indices),否则可能导致集群健康状态变为red。更准确地说,没有data角色的节点如果在启动时发现磁盘上有任何分片数据,就会拒绝启动;既没有master角色也没有data角色的节点如果在启动时发现磁盘上有任何索引元数据,也会拒绝启动。
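A hedged before/after sketch of the check described above, showing two versions of the same node's elasticsearch.yml: removing the data role while shard data is still in path.data makes the node refuse to start.

以下用同一节点elasticsearch.yml的前后两个版本示意上述检查:在path.data中仍有分片数据时去掉data角色,节点将拒绝启动。

```yaml
# Before: the node holds shards, so its data path contains shard data.
node.roles: [ master, data ]
---
# After: the data role has been removed. With shard data still on disk this
# node refuses to start; migrate the shards away first (or, if that is not
# possible, remove the excess data with the elasticsearch-node repurpose tool).
node.roles: [ master ]
```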

It is possible to change the roles of a node by adjusting its elasticsearch.yml file and restarting it. This is known as repurposing a node. In order to satisfy the checks for unexpected data described above, you must perform some extra steps to prepare a node for repurposing when starting the node without the data or master roles.

可以通过修改节点的elasticsearch.yml文件并重启节点来更改其角色,这称为节点的改用途(repurposing)。为了通过上述意外数据检查,当以不带data或master角色的方式启动节点时,您必须执行一些额外步骤来为改用途做准备。

  • If you want to repurpose a data node by removing the data role then you should first use an allocation filter to safely migrate all the shard data onto other nodes in the cluster.
  • If you want to repurpose a node to have neither the data nor master roles then it is simplest to start a brand-new node with an empty data path and the desired roles. You may find it safest to use an allocation filter to migrate the shard data elsewhere in the cluster first.

  • 如果要通过移除data角色来改用途一个数据节点,应先使用分配过滤器(allocation filter)将所有分片数据安全地迁移到集群中的其他节点上。
  • 如果要将节点改为既没有data角色也没有master角色,最简单的方法是使用空的数据路径和所需角色启动一个全新的节点。最稳妥的做法可能是先使用分配过滤器将分片数据迁移到集群中的其他位置。
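For the second case, a minimal sketch of the brand-new node's elasticsearch.yml, assuming a coordinating-only target and a hypothetical empty directory for path.data.

针对第二种情况,下面是全新节点elasticsearch.yml的最小示意(假设目标是仅协调节点,path.data为一个虚构的空目录):

```yaml
# Target roles: neither data nor master, i.e. a coordinating-only node.
node.roles: [ ]
# Hypothetical, empty directory. A node without the data and master roles
# refuses to start if it finds shard data or index metadata here.
path.data: /var/lib/elasticsearch-coordinating
```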

If it is not possible to follow these extra steps then you may be able to use the elasticsearch-node repurpose tool to delete any excess data that prevents a node from starting.

如果无法执行这些额外步骤,您或许可以使用elasticsearch-node repurpose工具删除阻止节点启动的多余数据。

Node data path settings(节点数据路径设置)

path.data

Every data and master-eligible node requires access to a data directory where shards and index and cluster metadata will be stored. The path.data defaults to $ES_HOME/data but can be configured in the elasticsearch.yml config file to an absolute path or a path relative to $ES_HOME as follows:

每个数据节点和候选主节点都需要访问一个数据目录,用于存储分片以及索引和集群元数据。path.data默认为$ES_HOME/data,但可以在elasticsearch.yml配置文件中将其配置为绝对路径或相对于$ES_HOME的路径,如下所示:

path.data:  /var/elasticsearch/data

Like all node settings, it can also be specified on the command line as:

与所有节点设置一样,它也可以在命令行上指定,如下所示:

./bin/elasticsearch -Epath.data=/var/elasticsearch/data

When using the .zip or .tar.gz distributions, the path.data setting should be configured to locate the data directory outside the Elasticsearch home directory, so that the home directory can be deleted without deleting your data! The RPM and Debian distributions do this for you already.

使用.zip或.tar.gz分发时,path.data设置应配置为在Elasticsearch主目录之外定位数据目录,以便可以删除主目录而不删除数据!RPM和Debian分发已经为您执行了此操作。
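A sketch of what this might look like for a .tar.gz or .zip installation; both directories are hypothetical, and path.logs is moved out on the same reasoning.

.tar.gz或.zip安装的示意配置(目录为假设值,path.logs出于同样的理由一并移出):

```yaml
# Keep data (and logs) outside $ES_HOME, so the installation directory can be
# deleted or replaced without losing indexed data.
path.data: /var/data/elasticsearch
path.logs: /var/log/elasticsearch
```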

node.max_local_storage_nodes

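The data path can be shared by multiple nodes, even by nodes from different clusters. It is recommended, however, to only run one node of Elasticsearch using the same data path. This setting is deprecated in 7.x and will be removed in version 8.0.

数据路径可以由多个节点共享,甚至可以由来自不同集群的节点共享。但建议只运行一个使用同一数据路径的Elasticsearch节点。此设置在7.x中已弃用,并将在8.0版本中删除。

By default, Elasticsearch is configured to prevent more than one node from sharing the same data path. To allow for more than one node (e.g., on your development machine), use the setting node.max_local_storage_nodes and set this to a positive integer larger than one.

默认情况下,Elasticsearch配置为防止多个节点共享同一数据路径。要允许多个节点共享(例如在开发机器上),请使用node.max_local_storage_nodes设置,并将其设置为大于1的正整数。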

Never run different node types (i.e. master, data) from the same data directory. This can lead to unexpected data loss.

永远不要从相同的数据目录运行不同类型的节点(即主节点、数据节点)。这可能导致意外数据丢失。
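A development-only sketch: each of the two nodes that share a data path would carry this fragment, and, in line with the warning above, both use the same roles. The role list is an assumption; the setting itself is deprecated in 7.x.

仅用于开发环境的示意:共享同一数据路径的两个节点都使用此片段,并按照上面的警告使用相同的角色(角色列表为假设;该设置在7.x中已弃用)。

```yaml
# Allow up to two nodes to share this data path (development machines only;
# deprecated in 7.x, removed in 8.0).
node.max_local_storage_nodes: 2
# Both nodes sharing the path use the same roles, never a mix of node types.
node.roles: [ master, data ]
```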

Other node settings(其他节点设置)

More node settings can be found in Configuring Elasticsearch and Important Elasticsearch configuration, including:

可以在Configuring Elasticsearch和Important Elasticsearch configuration中找到更多的节点设置,包括:

  • cluster.name
  • node.name
  • network settings
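A minimal sketch of these three settings in elasticsearch.yml; every value is a placeholder.

这三个设置在elasticsearch.yml中的最小示意(所有取值均为占位值):

```yaml
# Nodes only join a cluster whose cluster.name matches their own.
cluster.name: my-application
# Human-readable node name (defaults to the host name).
node.name: node-1
# Bind to and publish a specific address instead of the loopback default.
network.host: 192.168.0.1
```

Binding to a non-loopback address like this makes Elasticsearch treat the node as running in production mode, so the bootstrap checks are enforced at startup.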

Network settings

Node query cache settings

Search settings

Security settings

Shard request cache settings

Snapshot lifecycle management settings

Transforms settings

Transport

Thread pools

Watcher settings


转载自blog.csdn.net/qq_29864051/article/details/133523082