This article introduces ClickHouse's configuration files. ClickHouse configuration falls into two categories: server-side configuration, which is generally placed in config.xml, and user-side configuration, which is generally placed in users.xml. It is also possible to put both in config.xml, but by convention they are kept in two separate files. The features described below apply equally to config.xml and users.xml, so the rest of this article does not treat them separately.
Multiple configuration files
For greater flexibility, ClickHouse also supports splitting the configuration across multiple files: settings for different functions can live in separate files, and their contents are merged into the final configuration. The advantage is that settings can be grouped and managed by topic. For example, the ClickHouse cluster topology can go into its own file (usually named clusters.xml), and macro definitions can go into another (usually named macros.xml).
To use multiple configuration files, you need to understand how ClickHouse loads them. The default path of the main configuration file is /etc/clickhouse-server/config.xml (you can specify a different path when starting ClickHouse Server, for example --config-file=/etc/config/config.xml). If the folder containing config.xml also contains a directory named config.d, ClickHouse traverses all the files in that directory and merges their contents to produce the final configuration. These steps are performed every time ClickHouse is restarted.
How does ClickHouse handle the same setting being given different values? Let's use macros as an example. The experiment covers two cases: in the first, the macro is defined twice within config.xml; in the second, the macro is defined in two different configuration files.
Single file duplicate configuration
Add the following configuration to config.xml, then start ClickHouse Server and check the value of a.
<clickhouse>
    ......
    <macros>
        <a>1</a>
    </macros>
    <macros>
        <a>2</a>
    </macros>
</clickhouse>
Querying with the following statement shows that the value of a is 1.
SELECT * FROM system.macros
┌─macro─┬─substitution─┐
│ a     │ 1            │
└───────┴──────────────┘
This means that when the same parameter is given different values within a single configuration file, ClickHouse uses the value that appears first.
Multi-file duplicate configuration
Create two files, a.xml and b.xml, in the config.d directory, and define the macro in both.
a.xml:
<clickhouse>
    <macros>
        <a>1</a>
    </macros>
</clickhouse>
b.xml:
<clickhouse>
    <macros>
        <a>2</a>
    </macros>
</clickhouse>
This time, the same query shows that the value of a is 2.
SELECT * FROM system.macros
┌─macro─┬─substitution─┐
│ a     │ 2            │
└───────┴──────────────┘
Why does it take the value 2 instead of 1 this time? We can observe the logs of ClickHouse Server:
2023.01.07 11:26:51.092322 [ 25669095 ] {} <Debug> ConfigReloader: Loading config 'config.xml'
Processing configuration file 'config.xml'.
Merging configuration file 'config.d/a.xml'.
Merging configuration file 'config.d/b.xml'.
ClickHouse loads config.xml first, then traverses the config.d directory and loads all the configuration files in alphabetical order of file name. If different files set the same parameter, the value loaded later overwrites the earlier one. If you rename a.xml to c.xml, SELECT * FROM system.macros will return 1; you can try it yourself.
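The behaviour observed in the two experiments above can be modelled in a few lines of Python. This is only a simplified sketch, not ClickHouse's actual implementation: the helpers `flatten`, `read_file` and `merge_configs` are hypothetical names, only leaf values are tracked, and file names are assumed to be ordered by plain string sorting.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Yield (path, text) for every leaf element below `element`."""
    for child in element:
        path = f"{prefix}/{child.tag}"
        if len(child) == 0:
            yield path, (child.text or "").strip()
        else:
            yield from flatten(child, path)


def read_file(xml_text):
    """Within one file, keep the first value seen for a path (as observed above)."""
    values = {}
    for path, text in flatten(ET.fromstring(xml_text)):
        values.setdefault(path, text)
    return values


def merge_configs(main_xml, config_d):
    """config.xml first, then config.d files sorted by name; later files win.

    `config_d` maps a file name such as "a.xml" to its XML text.
    """
    merged = read_file(main_xml)
    for name in sorted(config_d):
        merged.update(read_file(config_d[name]))
    return merged


ONE = "<clickhouse><macros><a>1</a></macros></clickhouse>"
TWO = "<clickhouse><macros><a>2</a></macros></clickhouse>"
EMPTY = "<clickhouse></clickhouse>"
DUPLICATED = (
    "<clickhouse><macros><a>1</a></macros><macros><a>2</a></macros></clickhouse>"
)

print(merge_configs(DUPLICATED, {}))                       # {'/macros/a': '1'}
print(merge_configs(EMPTY, {"a.xml": ONE, "b.xml": TWO}))  # {'/macros/a': '2'}
print(merge_configs(EMPTY, {"c.xml": ONE, "b.xml": TWO}))  # {'/macros/a': '1'}
```

The third call mirrors the renaming experiment: once a.xml becomes c.xml it sorts after b.xml, so its value is the one that survives.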
Summary
Although we now understand how ClickHouse loads configuration files, it is still best to avoid defining the same setting more than once.
Configuration substitution
ClickHouse supports replacing configuration values with environment variables, XML sections, and ZooKeeper node values.
Use environment variable substitution
ClickHouse supports the from_env="xxx" attribute on an XML element, which replaces the element's value with the value of an environment variable. It is used as follows:
<clickhouse>
    <macros>
        <replica from_env="REPLICA" />
    </macros>
</clickhouse>
The environment variable can be set with export REPLICA=0, after which SELECT * FROM system.macros WHERE macro = 'replica' returns 0. This is equivalent to configuring <replica>0</replica>.
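To make the equivalence concrete, the substitution can be imitated with Python's standard XML library. This is a rough model under stated assumptions, not ClickHouse code: `apply_from_env` is a hypothetical helper, and raising an error for an unset variable is an assumption of the sketch.

```python
import os
import xml.etree.ElementTree as ET


def apply_from_env(root, environ=os.environ):
    """Set the text of every element that has a from_env attribute to the
    value of the named environment variable, then drop the attribute."""
    for element in root.iter():
        name = element.attrib.pop("from_env", None)
        if name is not None:
            # Assumption of this sketch: an unset variable is an error (KeyError).
            element.text = environ[name]
    return root


config = ET.fromstring(
    '<clickhouse><macros><replica from_env="REPLICA" /></macros></clickhouse>'
)
# A plain dict stands in for the process environment after `export REPLICA=0`.
apply_from_env(config, {"REPLICA": "0"})
print(ET.tostring(config, encoding="unicode"))
# <clickhouse><macros><replica>0</replica></macros></clickhouse>
```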
Replace with xml section
ClickHouse supports the incl="xxx" attribute on an XML element, which names another XML section whose content replaces the current one. It is used as follows:
<clickhouse>
    <zookeeper incl="zookeeper-servers" optional="true">
        <node>
            <host>host1</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>

<clickhouse>
    <zookeeper-servers>
        <node>
            <host>host2</host>
            <port>2182</port>
        </node>
    </zookeeper-servers>
</clickhouse>
Here the content of <zookeeper> is replaced by the content of <zookeeper-servers>. This is equivalent to the following configuration:
<clickhouse>
    <zookeeper>
        <node>
            <host>host2</host>
            <port>2182</port>
        </node>
    </zookeeper>
</clickhouse>
The optional="true" attribute prevents an error from being reported when the specified XML section does not exist. If it is true and <zookeeper-servers> does not exist, the <zookeeper> configuration with host1 and port 2181 is used. In general, setting optional="true" is not recommended: this is key configuration information, and a mistake should be reported up front rather than silently falling back to the wrong configuration.
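The interaction between incl and optional described above can be sketched as follows. This is an illustrative model only: `apply_incl` is a hypothetical helper, the error type is chosen for the sketch, and the real lookup rules in ClickHouse may differ in detail.

```python
import xml.etree.ElementTree as ET


def apply_incl(config_root, substitutions_root):
    """Replace the children of each element that has an incl attribute with
    the children of the same-named section in `substitutions_root`."""
    for element in list(config_root.iter()):
        name = element.attrib.pop("incl", None)
        if name is None:
            continue
        optional = element.attrib.pop("optional", "false") == "true"
        source = next((s for s in substitutions_root if s.tag == name), None)
        if source is None:
            if optional:
                continue  # keep the inline content as the fallback
            raise LookupError(f"include not found: {name}")
        element[:] = list(source)  # swap in the substituted children
    return config_root


CONFIG = (
    '<clickhouse><zookeeper incl="zookeeper-servers" optional="true">'
    "<node><host>host1</host><port>2181</port></node>"
    "</zookeeper></clickhouse>"
)
SUBSTITUTIONS = (
    "<clickhouse><zookeeper-servers>"
    "<node><host>host2</host><port>2182</port></node>"
    "</zookeeper-servers></clickhouse>"
)

with_section = apply_incl(ET.fromstring(CONFIG), ET.fromstring(SUBSTITUTIONS))
print(with_section.findtext("zookeeper/node/host"))  # host2

without_section = apply_incl(ET.fromstring(CONFIG), ET.fromstring("<clickhouse/>"))
print(without_section.findtext("zookeeper/node/host"))  # host1
```

With optional="true" removed from CONFIG, the second call would raise instead of quietly using host1, which is the early failure the paragraph above argues for.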
Replace with zookeeper node value
ClickHouse supports the from_zk="xxx" attribute on an XML element, which names a ZooKeeper node whose value (which must be in XML form) replaces the current XML section. It is used as follows:
<clickhouse>
    <remote_servers from_zk="/clickhouse/remote_servers">
        <default>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>host1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>host2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </default>
    </remote_servers>
</clickhouse>

<!-- Content of the ZooKeeper node /clickhouse/remote_servers: -->
<default>
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>host3</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>host4</host>
            <port>9000</port>
        </replica>
    </shard>
</default>
In this way, the cluster configuration named default is replaced by the value of the ZooKeeper node /clickhouse/remote_servers. Note that the indentation inside the ZooKeeper node value is carried into the configuration. If the layout matters to you, you can keep the indentation in the ZooKeeper node.
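The same idea can be modelled for from_zk, with a plain dictionary standing in for ZooKeeper. Everything here is an assumption made for illustration: `FAKE_ZK` and `apply_from_zk` are hypothetical, no real ZooKeeper client is involved, and the cluster is trimmed to a single replica. The sketch also shows why whitespace in the node value ends up in the result.

```python
import xml.etree.ElementTree as ET

# Stand-in for ZooKeeper: node path -> node value (an XML fragment).
FAKE_ZK = {
    "/clickhouse/remote_servers": (
        "<default>\n"
        "    <shard><replica><host>host3</host><port>9000</port></replica></shard>\n"
        "</default>"
    )
}


def apply_from_zk(config_root, zk_nodes):
    """Replace the content of each element that has a from_zk attribute with
    the parsed XML fragment stored under that path."""
    for element in list(config_root.iter()):
        path = element.attrib.pop("from_zk", None)
        if path is None:
            continue
        # The node value is a fragment, so wrap it to get a single root.
        fragment = ET.fromstring(f"<wrapper>{zk_nodes[path]}</wrapper>")
        element.text = fragment.text
        element[:] = list(fragment)
    return config_root


config = ET.fromstring(
    '<clickhouse><remote_servers from_zk="/clickhouse/remote_servers">'
    "<default><shard><replica><host>host1</host><port>9000</port></replica>"
    "</shard></default></remote_servers></clickhouse>"
)
apply_from_zk(config, FAKE_ZK)
print(config.findtext("remote_servers/default/shard/replica/host"))  # host3
# The newline and spaces from the node value survive inside <default>.
print(repr(config.find("remote_servers/default").text))  # '\n    '
```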
YAML configuration format
ClickHouse also supports configuration files in YAML format; for concrete examples, see config.yaml.example. YAML and XML files can be mixed, but a single file cannot use both formats. This section does not go into detail, because YAML is not very intuitive when expressing the attributes of an XML section (such as from_env="xxx"), and production environments generally still use XML as the default configuration format. It is enough to know that this option exists; using it is not recommended.
Feel free to add me on WeChat (xiedeyantu) to discuss technical issues.