InfluxDB Enterprise Cluster Installation Experiment 01: Introduction
Foreword
My company has recently been planning to purchase InfluxDB Enterprise (the clustered edition), so I took some time to build a cluster by hand and test its basic functionality.
Lab environment
Operating system: CentOS Linux release 7.3.1611 (Core)
InfluxDB version: 1.7.9
Machines:

| Machine name | IP |
|---|---|
| influxdb01 | 10.11.100.73 |
| influxdb02 | 10.11.100.74 |
| influxdb03 | 10.11.100.75 |
Preconditions
1. License configuration requirements
InfluxDB Enterprise supports two licensing methods: a license key and a license file. If you use a license key, every node must be able to reach the portal at portal.influxdata.com on port 80 or 443. If a node cannot reach the portal for more than 4 hours, license verification fails and the entire cluster becomes unavailable.
Note: for production I personally recommend the license file method, to avoid a cluster outage caused by external network instability.
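As a sketch of the two options, the relevant section of the node configuration file looks like the following (the key and file path values below are placeholders, not real credentials):

```toml
# /etc/influxdb/influxdb-meta.conf (the data node's influxdb.conf has the
# same [enterprise] section). Set exactly one of the two keys.
[enterprise]
  # Option 1: license key -- nodes must reach portal.influxdata.com on 80/443.
  # license-key = "<your-license-key>"

  # Option 2: license file -- no outbound connectivity required (recommended).
  license-path = "/etc/influxdb/influxdb-enterprise.license"
```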
2. Ensure connectivity between hosts
All machines in the cluster must be able to resolve each other's host names and IPs, and the network between them must be unobstructed.
Add the host names to the hosts file on all machines:
[root@influxdb01 opt]# cat /etc/hosts
10.11.100.73 influxdb01
10.11.100.74 influxdb02
10.11.100.75 influxdb03
With the default configuration, ports 8086, 8088, 8089, and 8091 must be open on all nodes.
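On CentOS 7 with firewalld, opening these ports might look like the following sketch (the firewall-cmd lines are commented out so the loop only prints what it would do; adapt it to your firewall setup):

```shell
# Default InfluxDB Enterprise ports:
#   8086 - HTTP API (data nodes)
#   8088 - data node internal RPC
#   8089 - meta node internal/raft
#   8091 - meta node HTTP
for port in 8086 8088 8089 8091; do
  echo "opening port ${port}/tcp"
  # firewall-cmd --permanent --add-port=${port}/tcp   # run as root on each node
done
# firewall-cmd --reload
```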
3. Time synchronization
InfluxDB Enterprise uses each host's local time, expressed in UTC, as the timestamp for distributing and coordinating data, so the hosts' clocks must be kept synchronized with NTP.
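On CentOS 7 this can be done with ntpd; a minimal sketch (the install and service lines are commented out and would be run as root on every node):

```shell
# yum install -y ntp            # install the NTP daemon
# systemctl enable --now ntpd   # start it now and enable it at boot
# ntpq -p                       # verify that upstream peers are being polled
# InfluxDB timestamps are in UTC; print this host's current UTC time:
date -u +"%Y-%m-%dT%H:%M:%SZ"
```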
4. Disk requirements
InfluxDB Enterprise needs disks that sustain 1000-2000 IOPS; otherwise the cluster will run into IOPS contention problems. SSDs are recommended.
Architecture
1. Architecture overview
An InfluxDB Enterprise cluster consists of three parts: data nodes, meta nodes, and the Enterprise web server.
It can be viewed as two independent clusters that communicate with each other: a meta-node cluster and a data-node cluster.
The number of meta nodes must be odd so that quorum can be reached. For high availability, 3 meta nodes are best: if one meta node fails, the cluster can keep running on the remaining 2 until the third is replaced. Adding more meta nodes sharply increases communication overhead, so large meta clusters are not recommended; the official recommendation is 3.
The minimum number of data nodes is 1. In general, the count should be set according to the replication factor: for example, with a replication factor of 2, you should run a multiple of 2 data nodes.
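The replication factor itself is set per retention policy. As an illustration (the database and policy names here are hypothetical), in InfluxQL via the influx CLI:

```sql
-- Hypothetical names; run through the influx CLI against any data node.
CREATE DATABASE mydb
-- Each shard of this policy is written to 2 data nodes:
CREATE RETENTION POLICY "rp_x2" ON "mydb" DURATION 30d REPLICATION 2
```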
The architecture diagram is as follows:
Meta nodes by default communicate with each other internally over port 8089 and are accessed externally over port 8091.
Data nodes by default communicate with each other internally over port 8088 and reach the meta nodes over port 8091.
Within a cluster, every meta node must be able to communicate with every other meta node, and every data node must be able to communicate with every other data node as well as with all meta nodes.
2. Meta nodes
Meta nodes hold the following metadata:
- Information about all nodes and their roles in the cluster
- All database information and retention policies in the cluster
- All shards and shard groups, as well as their nodes
- Cluster users and their permissions
- All continuous queries
Meta nodes persist this metadata in their raft store; the default path is /var/lib/influxdb/meta/raft.db.
3. Data nodes
Data nodes store all raw time series data and metadata, including:
- measurements (similar to tables)
- tag keys and values
- field keys and values
Data nodes lay data out on disk as <database>/<retention_policy>/<shard_id>; the default storage path is /var/lib/influxdb/data.
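For example, shard 1 of the default autogen retention policy in a hypothetical database named mydb would live at the path printed below:

```shell
# Illustrative values only; the layout is <database>/<retention_policy>/<shard_id>
db=mydb
rp=autogen
shard=1
echo "/var/lib/influxdb/data/${db}/${rp}/${shard}"
# → /var/lib/influxdb/data/mydb/autogen/1
```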