1 Compiling Doris
Apache Doris is available as a prebuilt release package that can be deployed directly: https://cloud.baidu.com/doc/PALO/s/Ikivhcwb5
You can also compile the source package yourself and deploy the result (recommended).
1.1 Compile with Docker development image (recommended)
This is the method recommended by the official documentation. It compiles the source code conveniently and reliably, so use it if you need to deploy quickly. Its advantage is that you do not have to configure environment variables or worry about toolchain version mismatches: once you are inside the development image, you can download the Doris source code and compile it directly.
First, install Docker. Installing Docker on Linux is straightforward, so it is not covered here.
After starting the Docker service (check it with systemctl status docker), pull the image and start compiling Doris.
Download the Doris image
Pull the official Docker build image provided by Doris. Available tags include build-env, build-env-1.1 and build-env-1.2; for Doris 0.15.0, pull build-env-for-0.15.0:
docker pull apache/doris:build-env-for-0.15.0
View the Docker image
docker images
Notice:
Each Doris version needs the matching image version. Starting with Apache Doris 0.15, image version numbers match the Doris version numbers.
Run the image
Mount a host directory as the container's Maven repository so that downloaded packages persist and are not downloaded again, and mount another host directory so that the compiled Doris files are available on the host for deployment:
docker run -it -v /u01/.m2:/root/.m2 -v /u01/incubator-doris-DORIS-0.15-release/:/root/incubator-doris-DORIS-0.15-release/ apache/doris:build-env-for-0.15.0
After this command runs, you are inside the container.
Download the Doris source package
Run the following inside the Docker container:
cd /opt
wget https://mirrors.tuna.tsinghua.edu.cn/apache/doris/0.15.0-incubating/apache-doris-0.15.0-incubating-src.tar.gz
Extract the archive
tar -zxvf apache-doris-0.15.0-incubating-src.tar.gz
Start compiling
cd apache-doris-0.15.0-incubating-src
sh build.sh
Note: The JDK version used for compilation must match the JDK version in the deployment environment; otherwise, exceptions will be reported.
Copy the build output to the host
The compiled files are in the output directory. Run the following on the host:
# docker cp <container>:<path in container> <local path>
docker cp mystifying_swanson:/opt/apache-doris-0.15.0-incubating-src/output /home/
2 Installation and deployment
As an open source OLAP database with an MPP architecture, Doris runs on most mainstream commercial servers. To take full advantage of the concurrency of the MPP architecture and the high-availability features of Doris, we recommend that Doris deployments meet the following requirements:
- Linux operating system version requirements
Linux system | Version
CentOS | 7.1 and above
Ubuntu | 16.04 and above
- Software Requirements
Software | Version
Java | 1.8 and above
GCC | 7.3 and above
- Development and testing environment
Module | CPU | Memory | Disk | Network | Number of instances
Frontend | 8 cores+ | 8GB+ | SSD or SATA, 10GB+* | Gigabit Ethernet | 1
Backend | 8 cores+ | 16GB+ | SSD or SATA, 50GB+* | Gigabit Ethernet | 1-3*
- Production Environment
Module | CPU | Memory | Disk | Network | Number of instances
Frontend | 16 cores+ | 64GB+ | SSD or RAID, 100GB+* | 10 Gigabit NIC | 1-5*
Backend | 16 cores+ | 64GB+ | SSD or SATA, 100GB+* | 10 Gigabit NIC | 10-100*
Notice:
- FE disk space is mainly used to store metadata, including logs and images, and usually ranges from a few hundred MB to several GB.
- BE disk space is mainly used to store user data. Estimate the total disk space as the total volume of user data × 3 (3 replicas), then reserve an additional 40% for background compaction and intermediate data.
- Multiple BE instances can be deployed on one machine, but at most one FE. If 3 replicas of the data are required, you need at least 3 machines with one BE instance each (rather than 3 BE instances on 1 machine). The clocks of the servers hosting the FEs must be synchronized (a skew of up to 5 seconds is allowed).
- A test environment can run with only one BE. In production, the number of BE instances directly affects overall query latency.
- Disable swap on all deployment nodes.
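The BE sizing rule in the notes above (user data × 3 replicas, plus 40% reserved) can be turned into a quick estimate. The following Bash function is a minimal sketch; its name, be_disk_estimate_gb, is hypothetical, and it works in whole GB with the result rounded up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate the total BE disk space (GB) needed for a
# given amount of raw user data (GB), following the rule in the notes:
#   data * replicas, then an extra 40% for compaction and intermediate data.
be_disk_estimate_gb() {
  local data_gb=$1
  local replicas=${2:-3}   # 3 replicas unless told otherwise
  # Integer arithmetic: data * replicas * 1.4, rounded up to a whole GB.
  echo $(( (data_gb * replicas * 140 + 99) / 100 ))
}
```

For example, be_disk_estimate_gb 500 prints 2100, that is 500 GB × 3 = 1500 GB plus 40% headroom. Dividing the result by the number of BE machines gives a rough per-node figure.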
Note: the number of FE nodes
- FE roles are divided into Follower and Observer (the Leader is elected from the Follower group and is counted as a Follower below; see the metadata design document for details).
- At least 1 FE node (1 Follower) is required. Deploying 1 Follower and 1 Observer provides read high availability; deploying 3 Followers provides read and write high availability (HA).
- The number of Followers must be odd; the number of Observers is arbitrary.
- Based on past experience, when availability requirements are high (for example, for online services), deploy 3 Followers and 1-3 Observers. For offline workloads, 1 Follower and 1-3 Observers are recommended.
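The Follower-count rules above follow from how a Paxos-style election group works: assuming the usual majority-quorum rule, a group of n Followers needs floor(n/2) + 1 of them alive. The Bash sketch below (check_follower_plan is a hypothetical name) rejects even counts and prints how many Follower failures a plan tolerates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a planned Follower count (Leader included)
# and report the majority needed under an assumed majority-quorum rule.
check_follower_plan() {
  local followers=$1
  if (( followers < 1 )); then
    echo "error: at least 1 Follower is required" >&2
    return 1
  fi
  if (( followers % 2 == 0 )); then
    echo "error: Follower count must be odd, got $followers" >&2
    return 1
  fi
  # Majority of the group: floor(n/2) + 1.
  local majority=$(( followers / 2 + 1 ))
  echo "followers=$followers majority=$majority tolerated_failures=$(( followers - majority ))"
}
```

With 3 Followers it reports a majority of 2 and one tolerated failure, which matches the read/write HA setup described above; with 1 Follower no failure is tolerated.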
We usually recommend about 10 to 100 machines to get the full performance of Doris (3 of them running FE for HA, the rest running BE).
Doris performance scales with the number of nodes and their configuration. Doris can still run smoothly on as few as 4 lower-spec machines (one FE and three BEs, with one BE co-located with an Observer FE to provide a metadata backup).
If FE and BE are co-located, watch out for resource contention and make sure the metadata directory and the data directory are on different disks.
- Broker deployment
Broker is a process used to access external data sources such as HDFS. Usually, one Broker instance per machine is sufficient.
- Network requirements
Doris instances communicate with each other directly over the network, so the ports each instance listens on must be reachable from the other nodes.
Notice:
- When deploying multiple FE instances, make sure http_port is configured identically on all FEs.
- Before deployment, make sure each port is accessible in the required direction.
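As a quick connectivity check, the following Bash sketch tests whether the default ports of each role are reachable from the current node. The function names are hypothetical. Some port numbers also appear later in this guide (8030, 9030, 9010, 9050, 8000); the others are the defaults we assume for Doris 0.15, so confirm all of them against your fe.conf, be.conf and Broker configuration.

```bash
#!/usr/bin/env bash
# Default ports assumed for Doris 0.15; confirm them in fe.conf, be.conf and
# the Broker configuration on your nodes.
#   FE:     http_port 8030, rpc_port 9020, query_port 9030, edit_log_port 9010
#   BE:     be_port 9060, webserver_port 8040, heartbeat_service_port 9050, brpc_port 8060
#   Broker: broker_ipc_port 8000
doris_default_ports() {
  case $1 in
    fe)     echo "8030 9020 9030 9010" ;;
    be)     echo "9060 8040 9050 8060" ;;
    broker) echo "8000" ;;
    *)      echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Succeeds if a TCP connection to host $1, port $2 opens within 2 seconds.
# Uses bash's built-in /dev/tcp redirection and coreutils `timeout`.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print "open" or "closed" for every default port of role $1 on host $2.
check_role_ports() {
  local role=$1 host=$2 port
  for port in $(doris_default_ports "$role"); do
    if port_open "$host" "$port"; then
      echo "$role $host:$port open"
    else
      echo "$role $host:$port closed"
    fi
  done
}
```

For example, running check_role_ports be 192.168.222.144 on node1 shows which BE ports on node2 are reachable. A port reported as closed may simply mean that the service has not been started yet.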
3 Resource Planning
node1 | node2 | node3
FE (Leader) | FE (Follower) | FE (Follower)
BE | BE | BE
BROKER | BROKER | BROKER
Note: Because test resources are limited, FE and BE are deployed on the same servers here; in production, deploying them on separate servers is recommended.
4 Start FE
4.1 Configure environment variables
(1) Copy the FE deployment file to the specified node (node1)
Copy the fe folder from the output directory produced by the source build to /opt/apache-doris-0.15.0 on the FE node (choose the path yourself)
cp -r fe /opt/apache-doris-0.15.0/
(2) Configure environment variables
vim /etc/profile
#DORIS_HOME
export DORIS_HOME=/opt/apache-doris-0.15.0
export PATH=$DORIS_HOME/fe/bin:$DORIS_HOME/be/bin:$PATH
Reload environment variables:
source /etc/profile
4.2 Create doris-meta
The configuration file is fe/conf/fe.conf. meta_dir is the metadata storage location; by default it is fe/doris-meta/.
The directory must be created manually:
mkdir -p /opt/apache-doris-0.15.0/fe/doris-meta
Configure the fe/conf/fe.conf configuration file
vim conf/fe.conf
meta_dir = /opt/apache-doris-0.15.0/fe/doris-meta
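If you prefer to script configuration edits instead of using vim, a helper like the one below can set a key in fe.conf or be.conf. It is a minimal sketch (set_conf is a hypothetical name) that assumes the simple key = value lines used in these files and GNU sed; values containing | or & would need escaping. In the shipped fe.conf the meta_dir line may be commented out, in which case the helper appends a new line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set "key = value" in a Doris .conf file.
# Replaces an existing uncommented line for the key, otherwise appends one.
# Limitation: value must not contain '|' or '&' (special in the sed command).
set_conf() {
  local key=$1 value=$2 file=$3
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    # '|' is used as the sed delimiter because values are often paths with '/'.
    sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    echo "${key} = ${value}" >> "$file"
  fi
}
```

For example: set_conf meta_dir /opt/apache-doris-0.15.0/fe/doris-meta /opt/apache-doris-0.15.0/fe/conf/fe.conf. The same call works for priority_networks in fe.conf or be.conf.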
4.3 Modify JAVA_OPTS in fe.conf
By default, JAVA_OPTS in fe.conf limits the Java maximum heap to 4GB; in production, raising it to 8GB or more is recommended.
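To change only the heap size without retyping the whole JAVA_OPTS line, the -Xmx flag can be rewritten in place. This Bash sketch (set_fe_heap is a hypothetical name) assumes JAVA_OPTS is a single line containing an -Xmx option and uses GNU sed. It deliberately leaves other lines, such as JAVA_OPTS_FOR_JDK_9 if your fe.conf has one, unchanged; edit that line as well if you run FE on JDK 9 or later.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the -Xmx value in the JAVA_OPTS line of fe.conf.
# Only the line starting with "JAVA_OPTS =" is touched; "JAVA_OPTS_FOR_JDK_9"
# does not match the address because '_' follows "JAVA_OPTS".
set_fe_heap() {
  local heap=$1 file=$2        # heap such as 8192m or 16g
  if ! grep -Eq '^JAVA_OPTS[[:space:]]*=.*-Xmx[0-9]+[kKmMgG]' "$file"; then
    echo "no -Xmx found in JAVA_OPTS of $file" >&2
    return 1
  fi
  sed -i -E "/^JAVA_OPTS[[:space:]]*=/ s/-Xmx[0-9]+[kKmMgG]/-Xmx${heap}/" "$file"
}
```

For example, set_fe_heap 8192m /opt/apache-doris-0.15.0/fe/conf/fe.conf, then restart FE so that the new heap size takes effect.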
4.4 Modify ip binding (optional)
If the machine has multiple IP addresses (for example, internal and external networks, or virtual interfaces such as Docker's), bind the IP so that the node is identified correctly when the cluster is configured.
Edit the FE configuration file (set the network according to your environment's actual IPs):
vim /opt/apache-doris-0.15.0/fe/conf/fe.conf
priority_networks = 192.168.222.0/24
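priority_networks is written in CIDR notation, and the node uses a local address that falls inside that network. Before starting FE, you can check which local addresses match with the Bash sketch below. The function names are hypothetical, only IPv4 is handled, and matching_local_ips assumes the ip command from iproute2 is installed.

```bash
#!/usr/bin/env bash
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 lies inside CIDR block $2 (e.g. 192.168.222.0/24).
ip_in_cidr() {
  local addr=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$addr") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Print every local IPv4 address that matches the given priority_networks value.
matching_local_ips() {
  local cidr=$1 addr
  for addr in $(ip -4 -o addr show | awk '{print $4}' | cut -d/ -f1); do
    ip_in_cidr "$addr" "$cidr" && echo "$addr"
  done
}
```

On node1, matching_local_ips 192.168.222.0/24 should print 192.168.222.143. If it prints nothing, the configured network does not match any local address.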
4.5 Distribute the installation directory to the other two nodes
scp -r /opt/apache-doris-0.15.0/ 192.168.222.144:/opt/
scp -r /opt/apache-doris-0.15.0/ 192.168.222.145:/opt/
4.6 Start FE
Start FE on each of the three machines:
sh /opt/apache-doris-0.15.0/fe/bin/start_fe.sh --daemon
The logs are stored in the fe/log/ directory by default
5 Configure BE
5.1 Configure be node
Copy the BE deployment file to the specified node (node1)
Copy the be folder from the output directory produced by the source build to /opt/apache-doris-0.15.0 on the BE node
cp -r be /opt/apache-doris-0.15.0/
5.2 Create storage_root_path, and configure be.conf
The configuration file is be/conf/be.conf. The main setting is storage_root_path, the data storage directory. It defaults to be/storage, and the directories must be created manually. Separate multiple paths with ; (do not put a ; after the last path)
mkdir -p /opt/apache-doris-0.15.0/be/storage1 /opt/apache-doris-0.15.0/be/storage2
Go to the be directory and edit be.conf:
vim conf/be.conf
storage_root_path = /opt/apache-doris-0.15.0/be/storage1,10;/opt/apache-doris-0.15.0/be/storage2
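In the be.conf example above, the first path carries a ",10" suffix; as far as we understand the 0.15 format, a number after the comma is an optional capacity limit for that path in GB. Because every directory listed must exist before BE starts, a Bash sketch like the following can create them straight from the configured value. The function name create_storage_dirs is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create every directory listed in a storage_root_path
# value. Entries are separated by ';'; anything after the first ',' in an
# entry (such as the ",10" capacity suffix) is ignored here.
create_storage_dirs() {
  local value=$1 entry dir
  local -a entries
  IFS=';' read -r -a entries <<< "$value"
  for entry in "${entries[@]}"; do
    dir=${entry%%,*}                       # strip ",10"-style suffixes
    dir=${dir#"${dir%%[![:space:]]*}"}     # trim leading whitespace
    dir=${dir%"${dir##*[![:space:]]}"}     # trim trailing whitespace
    [[ -z $dir ]] && continue              # skip empty entries
    mkdir -p -- "$dir" && echo "$dir"
  done
}
```

For example, create_storage_dirs "/opt/apache-doris-0.15.0/be/storage1,10;/opt/apache-doris-0.15.0/be/storage2" creates (and prints) the same two directories as the mkdir command above.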
6 Add BE
6.1 Connect using mysql
Remove the MySQL (MariaDB) library package that ships with the operating system (node1):
rpm -qa | grep mariadb
rpm -e --nodeps mariadb-libs-5.5.65-1.el7.x86_64
Install mysql-client
Download the mysql-client RPM packages and upload them to /opt/mysql-client on the server (you can also install it with yum).
Go to /opt/mysql-client and install:
rpm -ivh *
Connect to the FE on node1 with the MySQL client (default query port 9030, no password by default):
mysql -uroot -h 192.168.222.143 -P 9030
After logging in, you can change the root password with the following command
SET PASSWORD FOR 'root' = PASSWORD('123456');
You can also log in with the Navicat client.
6.2 Add BE
BE nodes must be added in FE before they can join the cluster (node1):
mysql -uroot -h 192.168.222.143 -P 9030 -p
Enter password: 123456
After logging in, add the BE nodes. The port is each BE's heartbeat_service_port, 9050 by default:
ALTER SYSTEM ADD BACKEND "192.168.222.143:9050";
ALTER SYSTEM ADD BACKEND "192.168.222.144:9050";
ALTER SYSTEM ADD BACKEND "192.168.222.145:9050";
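With more BE nodes, typing each statement by hand becomes error-prone. This Bash sketch (add_backend_sql is a hypothetical name) prints one ADD BACKEND statement per host, which you can review and then pipe into the MySQL client.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one ALTER SYSTEM ADD BACKEND statement per host.
# $1 is the BE heartbeat_service_port (9050 by default); the rest are hosts.
add_backend_sql() {
  local port=$1 host
  shift
  for host in "$@"; do
    printf 'ALTER SYSTEM ADD BACKEND "%s:%s";\n' "$host" "$port"
  done
}
```

For example: add_backend_sql 9050 192.168.222.143 192.168.222.144 192.168.222.145 | mysql -uroot -h 192.168.222.143 -P 9030 -p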
View BE status:
SHOW PROC '/backends';
Check the BE status. When everything is normal, the isAlive column is true; at this stage it is still false because BE has not been started yet.
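Rather than reading the table by eye, you can check the alive flags from a script. The Bash sketch below is hypothetical: it reads the tab-separated output that mysql --batch prints (header row first) and assumes the status column is labelled Alive in the SHOW PROC '/backends' output; pass a different column name if your version labels it otherwise.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read tab-separated query output with a header row on
# stdin and print the named column for every data row.
column_by_name() {
  awk -F'\t' -v col="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { print $idx }
  '
}

# Succeed only if the column exists and every row's value is exactly "true".
# The default column name "Alive" is an assumption about SHOW PROC output.
all_backends_alive() {
  local values
  values=$(column_by_name "${1:-Alive}")
  [[ -n $values ]] && ! grep -qvx 'true' <<< "$values"
}
```

For example: mysql -uroot -h 192.168.222.143 -P 9030 -p --batch -e "SHOW PROC '/backends'" | all_backends_alive && echo "all BEs are alive"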
6.3 Modify the number of open files
The command is as follows:
ulimit -n 65535
This setting only applies to the current shell session and is lost after a reboot.
To make it permanent, edit /etc/security/limits.conf and add:
* soft nofile 65535
* hard nofile 65535
* soft nproc 65535
* hard nproc 65535
This change applies to new login sessions, so log in again (or reboot the machine) before starting BE. Configure it on all BE nodes.
Otherwise, BE fails to start and the log reports an error.
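Because this file has to be edited on every BE node, a small idempotent helper avoids duplicate lines when it is run more than once. This Bash sketch (add_open_file_limits is a hypothetical name) appends exactly the four lines shown above, skipping any that are already present.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the nofile/nproc limits to a limits.conf-style
# file (default /etc/security/limits.conf), skipping lines already present.
add_open_file_limits() {
  local file=${1:-/etc/security/limits.conf} line
  local -a lines=(
    '* soft nofile 65535'
    '* hard nofile 65535'
    '* soft nproc 65535'
    '* hard nproc 65535'
  )
  for line in "${lines[@]}"; do
    # -x: whole-line match, -F: fixed string (the '*' is literal).
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
  done
}
```

Run it as root on each BE node (with no argument it edits /etc/security/limits.conf), log in again, and confirm with ulimit -n that the limit is 65535.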
6.4 Modify ip binding
If the machine has multiple IP addresses (for example, internal and external networks, or virtual interfaces such as Docker's), bind the IP so that the node is identified correctly when the cluster is configured.
Edit the BE configuration file (set the network according to your environment's actual IPs):
vim /opt/apache-doris-0.15.0/be/conf/be.conf
priority_networks = 192.168.222.0/24
6.5 Distribute the installation directory to the other two nodes
scp -r /opt/apache-doris-0.15.0/be 192.168.222.144:/opt/apache-doris-0.15.0
scp -r /opt/apache-doris-0.15.0/be 192.168.222.145:/opt/apache-doris-0.15.0
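The scp commands in this guide repeat the same pattern for every node, which makes it easy to mistype a host. A small loop avoids that. In this Bash sketch, distribute_dir and the DRY_RUN variable are hypothetical; with DRY_RUN=1 it only prints the commands so you can check them first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy directory $1 into parent path $2 on every host
# that follows. With DRY_RUN=1 the scp commands are printed, not executed.
distribute_dir() {
  local src=$1 dest_parent=$2 host
  shift 2
  for host in "$@"; do
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "scp -r $src $host:$dest_parent"
    else
      scp -r "$src" "$host:$dest_parent" || return 1   # stop at the first failure
    fi
  done
}
```

For example, DRY_RUN=1 distribute_dir /opt/apache-doris-0.15.0/be /opt/apache-doris-0.15.0 192.168.222.144 192.168.222.145 prints the two scp commands used above; run it again without DRY_RUN=1 to copy.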
6.6 Start BE
Start BE on each of the three machines:
sh /opt/apache-doris-0.15.0/be/bin/start_be.sh --daemon
Logs are stored in the be/log/ directory by default.
6.7 View FE and BE
- In the MySQL client:
show proc '/frontends';
show proc '/backends';
Check that BE is running. If all is well, the isAlive column is true.
- Through the FE web UI:
http://192.168.222.143:8030/login
Note: the password is the one set through the MySQL client.
http://192.168.222.143:8030/system?path=//frontends
- View BEs through the FE web UI:
http://192.168.222.143:8030/backend
http://192.168.222.143:8030/system?path=//backends
6.8 Add FS_BROKER (optional)
Broker is deployed as a plug-in, independently of Doris itself. We recommend deploying one Broker on each FE and BE node. Broker is a process used to access external data sources; the default is HDFS. Upload the compiled hdfs_broker.
6.8.1 Configuring broker nodes
Copy the corresponding Broker directory from the fs_brokers output of the source tree to every node that needs it. We recommend keeping it at the same level as the BE or FE directory.
Inside the Docker container used earlier, compile fs_broker:
sh /opt/apache-doris-0.15.0-incubating-src/fs_brokers/apache_hdfs_broker/build.sh
Copy the output directory to the host:
docker cp 9330fa7d63d6:/opt/apache-doris-0.15.0-incubating-src/fs_brokers/apache_hdfs_broker/output/apache_hdfs_broker /home/
6.8.2 Distribute the installation directory to the other two nodes
Enter the /opt/apache-doris-0.15.0 directory
scp -r apache_hdfs_broker/ 192.168.222.143:/opt/apache-doris-0.15.0/
scp -r apache_hdfs_broker/ 192.168.222.144:/opt/apache-doris-0.15.0/
scp -r apache_hdfs_broker/ 192.168.222.145:/opt/apache-doris-0.15.0/
6.8.3 Start Broker
Start the Broker on each of the three machines:
sh /opt/apache-doris-0.15.0/apache_hdfs_broker/bin/start_broker.sh --daemon
6.8.4 Add broker node
Connect to FE with the MySQL client and add the Broker nodes:
mysql -uroot -h 192.168.222.143 -P 9030 -p
Enter password: 123456
To let Doris FE and BE know which nodes the Brokers are on, add the Broker list with the following SQL command:
ALTER SYSTEM ADD BROKER broker_name "192.168.222.143:8000","192.168.222.144:8000","192.168.222.145:8000";
Here host is the IP of the node running the Broker, and port is broker_ipc_port in the Broker configuration file.
SHOW PROC "/brokers";
Note: In production, all instances should run under a process supervisor such as Supervisor, so that a process is restarted automatically after it exits. To run under a supervisor, in versions 0.9.0 and earlier you must edit each start_xx.sh script to remove the trailing & symbol; from version 0.10.0 on, simply call sh start_xx.sh directly.
6.9 Expansion and contraction
Doris can easily scale FE, BE and Broker instances out and in.
6.9.1 FE expansion and contraction
FE high availability can be achieved by scaling FE out to 3 or more nodes.
Scaling FE nodes out or in does not affect the running system.
Add FE nodes
FE has three roles: Leader, Follower and Observer. A cluster has only one Leader and can have multiple Followers and Observers. The Leader and Followers form a Paxos election group: if the Leader goes down, the remaining Followers automatically elect a new Leader, which keeps writes highly available. Observers replicate the Leader's data but do not take part in elections. If only one FE is deployed, it is the Leader.
The first FE started automatically becomes the Leader; Followers and Observers can then be added to it.
To add a Follower or Observer, connect to a running FE with mysql-client and execute:
ALTER SYSTEM ADD FOLLOWER "ip:port";
or
ALTER SYSTEM ADD OBSERVER "ip:port";
Here ip is the IP address of the node running the Follower or Observer, and port is edit_log_port in its fe.conf.
Configure and start the Follower or Observer. Its configuration is the same as the Leader's.
On the first start, run:
./bin/start_fe.sh --helper host:port --daemon
Here host is the IP of the node running the Leader, and port is edit_log_port in the Leader's fe.conf. The --helper parameter is only needed the first time a Follower or Observer is started.
Check the status of the Follower or Observer: connect to any running FE with mysql-client and execute SHOW PROC '/frontends'; to see which FEs have joined the cluster and their roles.
Notes on FE expansion:
- The number of Follower FEs (including the Leader) must be odd; deploying at most 3 is recommended to form a high-availability (HA) setup.
- When FE is deployed for high availability (1 Leader, 2 Followers), we recommend adding Observer FEs to scale FE read capacity. You can keep adding Follower FEs, but this is almost never necessary.
- One FE node can usually serve 10-20 BE nodes. Keep the total number of FE nodes below 10; 3 usually meet most needs.
- The helper must not point to the FE itself; it must point to one or more existing, running Master/Follower FEs.
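The --helper rule above is easy to break when commands are copied between nodes. This Bash sketch (fe_first_start_cmd is a hypothetical name) builds the first-start command for a new Follower or Observer and refuses a helper address that points at the node itself. It only prints the command; the default install path /opt/apache-doris-0.15.0 is the one used in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first-start command for a new Follower or
# Observer. $1 = this node's IP, $2 = helper "leader_ip:edit_log_port",
# $3 = install directory (optional).
fe_first_start_cmd() {
  local self_ip=$1 helper=$2 doris_home=${3:-/opt/apache-doris-0.15.0}
  if [[ ${helper%%:*} == "$self_ip" ]]; then
    echo "error: --helper must point to a running Leader/Follower, not to $self_ip" >&2
    return 1
  fi
  echo "sh $doris_home/fe/bin/start_fe.sh --helper $helper --daemon"
}
```

On node2, fe_first_start_cmd 192.168.222.144 192.168.222.143:9010 prints the same start command used in the walkthrough; pipe it to sh to run it.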
Delete FE node
Use the following command to delete the corresponding FE node:
ALTER SYSTEM DROP FOLLOWER "fe_host:edit_log_port";
ALTER SYSTEM DROP OBSERVER "fe_host:edit_log_port";
Notes on FE shrinkage:
- When dropping Follower FEs, make sure the number of remaining Followers (including the Leader) is odd.
Walkthrough
Connect to FE with the MySQL client and add the FE nodes:
mysql -uroot -h 192.168.222.143 -P 9030 -p
Enter password: 123456
Add node2 as a FOLLOWER:
ALTER SYSTEM ADD FOLLOWER "192.168.222.144:9010";
Add node3 as an OBSERVER:
ALTER SYSTEM ADD OBSERVER "192.168.222.145:9010";
Stop the FE service on all three nodes (one after another):
/opt/apache-doris-0.15.0/fe/bin/stop_fe.sh
Start FE on node1:
sh /opt/apache-doris-0.15.0/fe/bin/start_fe.sh --daemon
Start FE on node2 (pointing --helper at the Leader):
sh /opt/apache-doris-0.15.0/fe/bin/start_fe.sh --helper 192.168.222.143:9010 --daemon
Start FE on node3 (pointing --helper at the Leader):
sh /opt/apache-doris-0.15.0/fe/bin/start_fe.sh --helper 192.168.222.143:9010 --daemon
View fe node list
SHOW PROC '/frontends';
6.9.2 BE Expansion and Reduction
Log in to the Master FE with the MySQL client.
Scaling BE nodes out or in does not affect the running system or the tasks being executed, nor its performance. Data is rebalanced automatically; depending on the amount of existing data, the cluster returns to a balanced state within a few hours to a day. For cluster load details, see the Tablet Load Balancing documentation.
Add BE node
BE nodes are added the same way as in the BE deployment section, with the ALTER SYSTEM ADD BACKEND command.
Notes for BE expansion:
After BE scale-out, Doris automatically rebalances data according to load; the cluster remains usable during this period.
Delete BE node
There are two ways to delete BE nodes: DROP and DECOMMISSION
The DROP statement is as follows:
ALTER SYSTEM DROP BACKEND "be_host:be_heartbeat_service_port";
Precautions:
DROP BACKEND deletes the BE immediately, and the data on it can no longer be recovered! We therefore strongly advise against using DROP BACKEND to remove BE nodes. When you run this statement, Doris shows a prompt to guard against accidental misuse.
The DECOMMISSION statement is as follows:
ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";
DECOMMISSION command description:
This command removes a BE node safely. After it is issued, Doris migrates the data on the BE to other BE nodes, and once all data has been migrated, Doris deletes the node automatically.
The command is asynchronous. After running it, SHOW PROC '/backends'; shows the node's isDecommission status as true, indicating that the node is going offline.
The command may not complete. For example, if the remaining BEs do not have enough storage space for the data on the decommissioned BE, or the number of remaining machines is below the replica count, the command cannot finish and the BE stays with isDecommission set to true.
You can follow DECOMMISSION progress through the TabletNum column of SHOW PROC '/backends';. While it is in progress, TabletNum keeps decreasing.
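Progress can be followed by polling TabletNum. The Bash sketch below is hypothetical: tablet_num_of and watch_decommission are illustrative names, the column headers IP and TabletNum are assumed from the SHOW PROC '/backends' output (check the header row of your version), and the MySQL password is expected to come from the MYSQL_PWD environment variable or an option file rather than the command line. The loop keeps polling until the BE disappears from the list, which may never happen if the decommission stalls as described above; stop it with Ctrl+C in that case.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `mysql --batch` output of SHOW PROC '/backends'
# on stdin and print TabletNum for the BE whose IP column equals $1.
# Column names "IP" and "TabletNum" are assumptions; check your header row.
tablet_num_of() {
  awk -F'\t' -v ip="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) { if ($i == "IP") ipc = i; if ($i == "TabletNum") tc = i }; next }
    ipc && tc && $ipc == ip { print $tc }
  '
}

# Poll FE $1 every $3 seconds (default 60) and print the TabletNum of BE $2
# until that BE is no longer listed, i.e. Doris has removed it.
watch_decommission() {
  local fe_host=$1 ip=$2 interval=${3:-60} out n
  while :; do
    out=$(mysql -uroot -h "$fe_host" -P 9030 --batch -e "SHOW PROC '/backends'") || return 1
    n=$(tablet_num_of "$ip" <<< "$out")
    if [[ -z $n ]]; then
      echo "$ip is no longer listed; decommission finished"
      return 0
    fi
    echo "$(date '+%F %T') $ip TabletNum=$n"
    sleep "$interval"
  done
}
```

For example, watch_decommission 192.168.222.143 192.168.222.145 30 reports the remaining tablets on node3 every 30 seconds.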
This operation can be cancelled with:
CANCEL DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";
After cancellation, the data remaining on the BE stays as it is, and Doris subsequently rebalances the load.
6.9.3 Broker expansion and contraction
There is no hard requirement on the number of Broker instances. Usually, one per physical machine is sufficient. Adding and removing Brokers can be done with the following commands:
ALTER SYSTEM ADD BROKER broker_name "broker_host:broker_ipc_port";
ALTER SYSTEM DROP BROKER broker_name "broker_host:broker_ipc_port";
ALTER SYSTEM DROP ALL BROKER broker_name;
Broker is a stateless process and can be started and stopped at will. When a Broker is stopped, jobs running on it fail; simply retry them.