Doris (2): Doris compilation and deployment

1 Compiling Doris

Apache Doris provides prebuilt release packages that can be deployed directly: https://cloud.baidu.com/doc/PALO/s/Ikivhcwb5

You can also compile the package from source yourself (recommended).

1.1 Compile with Docker development image (recommended)

This is the method recommended by the official documentation, and it compiles the source code smoothly. Use it if you need to deploy quickly. Its advantage is that you do not need to configure environment variables or worry about dependency versions: once inside the development image, you can download the Doris source code and compile it directly.

First, you need to install Docker. It is relatively simple to install Docker under Linux, so I won’t introduce it here.

After starting the Docker service (you can check it with systemctl status docker), pull the image and start compiling Doris.

Download the Doris image

Pull the official Docker image provided by Doris, the currently available versions are: build-env, build-env-1.1, build-env-1.2

docker pull apache/doris:build-env-for-0.15.0

View the Docker image

docker images

Note:

Different Doris versions require the matching image version. Starting from Apache Doris 0.15, the image version numbers are aligned with the Doris version numbers.

Run the image

Mount a host directory over the container's Maven repository so that downloaded packages are kept on the host and not downloaded again, and mount a host directory for the Doris source tree so that the compiled files are kept on the host for easy deployment.

docker run -it -v /u01/.m2:/root/.m2 -v /u01/incubator-doris-DORIS-0.15-release/:/root/incubator-doris-DORIS-0.15-release/ apache/doris:build-env-for-0.15.0

After the command runs, you are inside the container.

Download the Doris source package

Inside the Docker container:

cd /opt

wget https://mirrors.tuna.tsinghua.edu.cn/apache/doris/0.15.0-incubating/apache-doris-0.15.0-incubating-src.tar.gz

Extract the archive

tar -zxvf apache-doris-0.15.0-incubating-src.tar.gz

Start compiling

cd apache-doris-0.15.0-incubating-src
sh build.sh

Note: The JDK version used for compilation must be the same as the JDK version of the deployment environment; otherwise an exception is reported.

Copy the compiled output to the server

The compiled files are in the output directory.

# docker cp <container>:<path in container> <local path>
docker cp mystifying_swanson:/opt/apache-doris-0.15.0-incubating-src/output   /home/

2 Installation and deployment

As an open source OLAP database with an MPP architecture, Doris can run on most mainstream commercial servers. To take full advantage of the concurrency of the MPP architecture and the high-availability features of Doris, we recommend that a deployment meet the following requirements:

  • Linux operating system version requirements

Linux system    Version
CentOS          7.1 and above
Ubuntu          16.04 and above

  • Software requirements

Software    Version
Java        1.8 and above
GCC         7.3 and above

  • Development and testing environment

Module      CPU         Memory    Disk                     Network             Number of instances
Frontend    8 cores+    8GB+      SSD or SATA, 10GB+ *     Gigabit Ethernet    1
Backend     8 cores+    16GB+     SSD or SATA, 50GB+ *     Gigabit Ethernet    1-3 *

  • Production environment

Module      CPU          Memory    Disk                      Network           Number of instances
Frontend    16 cores+    64GB+     SSD or RAID, 100GB+ *     10 Gigabit NIC    1-5 *
Backend     16 cores+    64GB+     SSD or SATA, 100GB+ *     10 Gigabit NIC    10-100 *

Notes:

  • FE disk space is mainly used to store metadata, including logs and images. It usually ranges from a few hundred MB to several GB.
  • BE disk space is mainly used to store user data. Calculate the total disk space as the total amount of user data * 3 (3 copies), and then reserve an additional 40% for background compaction and intermediate data.
  • Multiple BE instances can be deployed on one machine, but only one FE. If 3 copies of data are required, at least 3 machines are needed, each running one BE instance (rather than 1 machine running 3 BE instances). The clocks of the servers hosting the FEs must be consistent (a clock deviation of up to 5 seconds is allowed).
  • A test environment can run with only one BE. In production, the number of BE instances directly determines the overall query latency.
  • Disable swap on all deployment nodes.
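The BE disk rule above can be turned into a quick estimate. The sketch below is only an illustration: the function name be_disk_gb is hypothetical, and it assumes the 40% reserve is added on top of the replicated data (user data * copies * 1.4), which is one possible reading of the rule.

```shell
# be_disk_gb DATA_GB [COPIES]
# Estimate the total BE disk space (in GB) needed for DATA_GB of user data.
# Assumption: the 40% reserve is added on top of the replicated data size.
be_disk_gb() {
    data_gb=$1
    copies=${2:-3}    # the article assumes 3 copies
    # Integer arithmetic: multiply by 140 and divide by 100 to add 40%.
    echo $(( data_gb * copies * 140 / 100 ))
}

# Example: 500 GB of user data with 3 copies prints 2100.
be_disk_gb 500
```

Divide the result by the number of BE machines to get a rough per-node figure.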

Notes on the number of FE nodes:

  • FE roles are divided into Follower and Observer. (The Leader is elected from the Follower group; below, both are referred to as Follower. See the metadata design document for details.)
  • There must be at least 1 FE node (1 Follower). Deploying 1 Follower and 1 Observer gives read high availability. Deploying 3 Followers gives read and write high availability (HA).
  • The number of Followers must be odd; the number of Observers is arbitrary.
  • Based on experience, when cluster availability requirements are high (for example, when serving online traffic), deploy 3 Followers and 1-3 Observers. For offline workloads, 1 Follower and 1-3 Observers are recommended.

Usually we recommend about 10 to 100 machines to give full play to the performance of Doris (3 of them running FE for HA, and the rest running BE).

The performance of Doris is positively related to the number of nodes and their configuration. Doris can still run smoothly on as few as 4 machines with a lower configuration (one FE and three BEs, with one BE sharing its machine with an Observer FE that provides a metadata backup).

If FE and BE share a machine, watch out for resource competition, and make sure the metadata directory and the data directory are on different disks.
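The Follower and Observer rules above are easy to get wrong when planning a cluster, so a small check can help. This is a minimal sketch; check_fe_layout is a hypothetical helper name, and it encodes only the rules stated in this section (at least one Follower, an odd number of Followers, and the two HA levels).

```shell
# check_fe_layout FOLLOWERS OBSERVERS
# Print the availability level of an FE layout, or fail if it breaks the rules.
check_fe_layout() {
    followers=$1
    observers=$2
    if [ "$followers" -lt 1 ]; then
        echo "need at least 1 Follower" >&2
        return 1
    fi
    if [ $(( followers % 2 )) -eq 0 ]; then
        echo "Follower count must be odd, got $followers" >&2
        return 1
    fi
    if [ "$followers" -ge 3 ]; then
        echo "read and write HA"
    elif [ "$observers" -ge 1 ]; then
        echo "read HA only"
    else
        echo "no HA"
    fi
}

# The layout recommended for online services: 3 Followers and 1 Observer.
check_fe_layout 3 1
```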

  • Broker deployment

Broker is a process used to access external data sources such as HDFS. Usually, deploying one Broker instance on each machine is sufficient.

  • Network requirements

Doris instances communicate directly over the network, so the ports they listen on must be reachable. The ports used later in this article are the FE http_port (8030), query_port (9030) and edit_log_port (9010), the BE heartbeat_service_port (9050) and the Broker broker_ipc_port (8000).

Notes:

  • When deploying multiple FE instances, make sure that all FEs use the same http_port.
  • Before deployment, make sure that each port is reachable in the required direction.
3 Resource Planning

node1          node2            node3
FE (Leader)    FE (Follower)    FE (Follower)
BE             BE               BE
BROKER         BROKER           BROKER

Note: Because test resources are limited, FE and BE are deployed on the same servers here. In production, deploying them on separate servers is recommended.

4 Start FE

4.1 Configure environment variables

(1) Copy the FE deployment files to the target node (node1)

Copy the fe folder from the output directory produced by the build to /opt/apache-doris-0.15.0 on the FE node (you can choose a different path).

cp -r fe /opt/apache-doris-0.15.0/

(2) Configure environment variables

vim /etc/profile

#DORIS_HOME
export DORIS_HOME=/opt/apache-doris-0.15.0
export PATH=$DORIS_HOME/bin:$PATH

Reload environment variables:

source /etc/profile

4.2 Create doris-meta

The configuration file is fe/conf/fe.conf. The key setting is meta_dir, the metadata storage location. By default it is fe/doris-meta/.

The directory must be created manually.

mkdir -p /opt/apache-doris-0.15.0/fe/doris-meta

Set meta_dir in fe/conf/fe.conf (run the following from the fe directory)

vim conf/fe.conf

meta_dir = /opt/apache-doris-0.15.0/fe/doris-meta
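Because the metadata directory has to exist before FE starts, it can be worth checking the setting before launching. The following is a rough sketch under the assumption that fe.conf uses plain "key = value" lines; conf_get and check_meta_dir are hypothetical helper names, not part of Doris.

```shell
# conf_get FILE KEY
# Print the value of "KEY = value" from a .conf file.
# Lines starting with # are ignored; the last assignment wins.
conf_get() {
    sed -n "/^[[:space:]]*$2[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}" "$1" |
        tail -n 1
}

# check_meta_dir FE_CONF
# Succeed only if meta_dir is set in FE_CONF and the directory exists.
check_meta_dir() {
    dir=$(conf_get "$1" meta_dir)
    if [ -z "$dir" ]; then
        echo "meta_dir is not set in $1" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "meta_dir $dir does not exist; create it with mkdir -p" >&2
        return 1
    fi
    echo "meta_dir OK: $dir"
}

# Usage on node1 (path taken from this article):
# check_meta_dir /opt/apache-doris-0.15.0/fe/conf/fe.conf
```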

4.3 Modify JAVA_OPTS in fe.conf

JAVA_OPTS in fe.conf sets the maximum Java heap size to 4 GB by default. In production, increasing it to 8 GB or more is recommended.
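The heap change can be scripted instead of edited by hand. The sketch below assumes the heap limit appears as a -Xmx<number><unit> flag on the JAVA_OPTS lines of fe.conf; set_fe_heap is a hypothetical helper name, and you should inspect your own fe.conf before relying on it.

```shell
# set_fe_heap FE_CONF SIZE
# Rewrite the -Xmx value on every line that starts with JAVA_OPTS.
# SIZE uses the JVM syntax, for example 8192m or 8g.
# Assumption: the heap limit is written as -Xmx<number><unit> on those lines.
set_fe_heap() {
    conf=$1
    size=$2
    tmp="$conf.tmp.$$"
    # Write to a temporary file first, then replace the original.
    sed "/^[[:space:]]*JAVA_OPTS/s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx$size/g" "$conf" > "$tmp" &&
        mv "$tmp" "$conf"
}

# Usage on each FE node (path taken from this article):
# set_fe_heap /opt/apache-doris-0.15.0/fe/conf/fe.conf 8192m
```

Keep a copy of fe.conf before running it, and restart FE afterwards so the new heap size takes effect.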

4.4 Modify ip binding (optional)

If the machine has multiple IP addresses (for example, intranet and extranet addresses, or virtual machine and Docker interfaces), bind the IP so that the node is identified correctly when the cluster is configured.

Edit the FE configuration file (adjust the network to match your environment):

vim /opt/apache-doris-0.15.0/fe/conf/fe.conf

priority_networks = 192.168.222.0/24
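priority_networks takes a network in CIDR notation, and the node's own address has to fall inside it. To confirm that a given IP matches the network you configured, something like the following sketch can be used. The helper names ip_to_int and in_cidr are hypothetical, and the arithmetic assumes a shell with 64-bit integers, which is the case on common 64-bit Linux systems.

```shell
# ip_to_int A.B.C.D
# Print the IPv4 address as a single 32-bit number.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NETWORK/BITS
# Exit status 0 if IP lies inside the network, non-zero otherwise.
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# node1 from this article against the configured network.
if in_cidr 192.168.222.143 192.168.222.0/24; then
    echo "192.168.222.143 matches priority_networks"
fi
```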

4.5 Distribute the installation directory to the other two nodes

scp -r /opt/apache-doris-0.15.0/ 192.168.222.144:/opt/

scp -r /opt/apache-doris-0.15.0/ 192.168.222.145:/opt/

4.6 Start FE

Start FE on each of the three machines

sh /opt/apache-doris-0.15.0/fe/bin/start_fe.sh --daemon

The logs are stored in the fe/log/ directory by default

5 Configure BE

5.1 Configure be node

Copy the BE deployment files to the target node (node1)

Copy the be folder from the output directory produced by the build to /opt/apache-doris-0.15.0 on the BE node.

cp -r be /opt/apache-doris-0.15.0/

5.2 Create storage_root_path, and configure be.conf

The configuration file is be/conf/be.conf. The main setting is storage_root_path, the data storage directory. By default it is be/storage, and the directory must be created manually. Separate multiple paths with ; (do not add ; after the last directory).

mkdir -p /opt/apache-doris-0.15.0/be/storage1 /opt/apache-doris-0.15.0/be/storage2

Enter the be directory and edit be.conf

vim conf/be.conf

storage_root_path = /opt/apache-doris-0.15.0/be/storage1,10;/opt/apache-doris-0.15.0/be/storage2
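Since every directory listed in storage_root_path must exist before BE starts, the list can be derived from the setting itself rather than typed twice. This is a minimal sketch with hypothetical helper names (storage_paths, make_storage_dirs). It treats everything after the first comma in an entry as an option and drops it; as far as I know the ",10" suffix above is a capacity limit in GB for that directory, but treat that as an assumption and check the BE configuration reference.

```shell
# storage_paths VALUE
# Print one directory per line from a storage_root_path value.
# Entries are separated by ';'. Anything after the first ',' in an entry
# (such as the ",10" option) is dropped.
storage_paths() {
    printf '%s\n' "$1" | tr ';' '\n' |
        sed -e 's/,.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# make_storage_dirs VALUE
# Create every directory named in VALUE; fail if any mkdir fails.
make_storage_dirs() {
    storage_paths "$1" | while IFS= read -r dir; do
        mkdir -p "$dir" || exit 1
    done
}

# List the directories from the setting used in this article.
storage_paths "/opt/apache-doris-0.15.0/be/storage1,10;/opt/apache-doris-0.15.0/be/storage2"
```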

6 Add BE

6.1 Connect using mysql

Remove the MariaDB libraries that ship with the operating system (on node1)

rpm -qa | grep mariadb

rpm -e --nodeps mariadb-libs-5.5.65-1.el7.x86_64

Install mysql-client

Download the mysql-client rpm packages and upload them to /opt/mysql-client on the server. You can also install the client with yum instead.

Enter /opt/mysql-client and install

rpm -ivh *

Connect to the FE on node1 with the MySQL client (default port 9030, no password by default)

mysql -uroot -h 192.168.222.143 -P 9030

After logging in, you can change the root password with the following command

SET PASSWORD FOR 'root' = PASSWORD('123456');

You can also log in with the Navicat client.

6.2 Add be

BE nodes must be added through FE before they can join the cluster (run on node1)

mysql -uroot -h 192.168.222.143 -P 9030 -p

Enter password: 123456

After logging in, add the BE nodes. The port is the heartbeat_service_port of the BE, 9050 by default.

ALTER SYSTEM ADD BACKEND "192.168.222.143:9050";

ALTER SYSTEM ADD BACKEND "192.168.222.144:9050";

ALTER SYSTEM ADD BACKEND "192.168.222.145:9050";
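With more than a handful of BE nodes, typing one statement per host gets tedious. The statements can be generated from a host list, as in this sketch; add_backend_sql is a hypothetical helper name, and the hosts and port are the ones used in this article.

```shell
# add_backend_sql PORT HOST...
# Print one ALTER SYSTEM ADD BACKEND statement per host.
add_backend_sql() {
    port=$1
    shift
    for host in "$@"; do
        printf 'ALTER SYSTEM ADD BACKEND "%s:%s";\n' "$host" "$port"
    done
}

# Print the three statements shown above.
add_backend_sql 9050 192.168.222.143 192.168.222.144 192.168.222.145
```

The output can be reviewed first and then piped into the mysql client connected to FE.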

View the BE status

SHOW PROC '/backends';

If everything is normal, the isAlive column is true. At this stage it is still false, because the BEs have not been started yet.
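To check the alive state from a script instead of reading the table by eye, the vertical (\G) output of the statement can be counted. This is a sketch under assumptions: count_alive is a hypothetical helper, and it assumes the column is printed as "Alive: true" or "isAlive: true", so verify the exact column name on your version first.

```shell
# count_alive
# Read the vertical (\G) output of SHOW PROC '/backends' on stdin and
# print how many backends report themselves alive.
# Assumption: the column appears as "Alive: true" or "isAlive: true".
count_alive() {
    # grep -c exits non-zero when the count is 0, so neutralise that.
    grep -ciE '^[[:space:]]*(is)?alive:[[:space:]]*true' || true
}

# Usage against the FE from this article:
# mysql -uroot -h 192.168.222.143 -P 9030 -p \
#     -e "SHOW PROC '/backends'\G" | count_alive
```

Once all three BEs have been started, the count should equal the number of BE nodes.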

6.3 Modify the number of open files

The command is as follows:

ulimit -n 65535

This setting is lost after the system restarts.

Alternatively, add the following to /etc/security/limits.conf:

* soft nofile 65535 
* hard nofile 65535 
* soft nproc 65535 
* hard nproc 65535

This method requires restarting the machine to take effect, and it must be configured on all BE nodes.

Otherwise, BE fails to start and the log reports an error.
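Before starting BE, it is worth confirming that the limit is actually in effect in the shell that launches it. A small sketch follows; nofile_ok is a hypothetical helper name, and 65535 is the value used in this article.

```shell
# nofile_ok CURRENT REQUIRED
# Succeed if CURRENT (the output of `ulimit -n`) is at least REQUIRED.
nofile_ok() {
    case $1 in
        unlimited) return 0 ;;
    esac
    [ "$1" -ge "$2" ]
}

# Check the limit of the current shell.
if nofile_ok "$(ulimit -n)" 65535; then
    echo "open file limit is sufficient: $(ulimit -n)"
else
    echo "open file limit too low: $(ulimit -n), need at least 65535"
fi
```

Run it as the same user, and in the same kind of session, that will start start_be.sh.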

6.4 Modify ip binding

If the machine has multiple IP addresses (for example, intranet and extranet addresses, or virtual machine and Docker interfaces), bind the IP so that the node is identified correctly when the cluster is configured.

Edit the BE configuration file (adjust the network to match your environment):

vim /opt/apache-doris-0.15.0/be/conf/be.conf

priority_networks = 192.168.222.0/24

6.5 Distribute the installation directory to the other two nodes

scp -r /opt/apache-doris-0.15.0/be 192.168.222.144:/opt/apache-doris-0.15.0

scp -r /opt/apache-doris-0.15.0/be 192.168.222.145:/opt/apache-doris-0.15.0

6.6 Start BE

Start BE on each of the three machines

sh /opt/apache-doris-0.15.0/be/bin/start_be.sh --daemon

The logs are stored in the be/log/ directory by default

6.7 View FE and BE

  • In the MySQL client
show proc '/frontends';

show proc '/backends';

If all is well, the isAlive column is true for every BE.

  • Access FE through the web UI

http://192.168.222.143:8030/login

Note: The password is the same as the one set through the MySQL client.

http://192.168.222.143:8030/system?path=//frontends

  • Access BE information through the web UI:

http://192.168.222.143:8030/backend

http://192.168.222.143:8030/system?path=//backends

6.8 Add FS_BROKER (optional)

Broker is deployed as a plug-in, independently of Doris itself. It is recommended to deploy one Broker on each FE and BE node. Broker is a process used to access external data sources; the default implementation is for HDFS. Upload the compiled hdfs_broker to the nodes.

6.8.1 Configuring broker nodes

Copy the Broker directory from the fs_broker output directory of the source tree to all nodes that need it. Keeping it at the same level as the BE or FE directory is recommended.

Enter the Docker container used earlier and compile fs_broker

sh /opt/apache-doris-0.15.0-incubating-src/fs_brokers/apache_hdfs_broker/build.sh

Copy the output directory from the container to the host

docker cp 9330fa7d63d6:/opt/apache-doris-0.15.0-incubating-src/fs_brokers/apache_hdfs_broker/output/apache_hdfs_broker /home/

6.8.2 Distribute the installation directory to the other two nodes

Enter the /opt/apache-doris-0.15.0 directory

scp  -r apache_hdfs_broker/ 192.168.222.143:/opt/apache-doris-0.15.0/

scp  -r apache_hdfs_broker/ 192.168.222.144:/opt/apache-doris-0.15.0/

scp  -r apache_hdfs_broker/ 192.168.222.145:/opt/apache-doris-0.15.0/

6.8.3 Start Broker

Start Broker on each of the three machines

sh /opt/apache-doris-0.15.0/apache_hdfs_broker/bin/start_broker.sh --daemon

6.8.4 Add broker node

Use the MySQL client to connect to FE and add the Broker nodes

mysql -uroot -h 192.168.222.143 -P 9030 -p

Enter password: 123456

To let the FE and BE of Doris know which nodes the Brokers run on, add the Broker node list with the following SQL command

ALTER SYSTEM ADD BROKER broker_name "192.168.222.143:8000","192.168.222.144:8000","192.168.222.145:8000";

Each entry has the form host:port, where host is the IP of the node where Broker runs and port is broker_ipc_port in the Broker configuration file.

SHOW PROC "/brokers";

Note: In production, all instances should be started under a process supervisor such as Supervisor, so that a process is restarted automatically after it exits. To run under a supervisor in version 0.9.0 and earlier, modify each start_xx.sh script to remove the trailing & symbol. Starting from version 0.10.0, simply call sh start_xx.sh directly.

6.9 Scaling out and in

Doris makes it easy to scale FE, BE, and Broker instances out and in.

6.9.1 FE scaling out and in

High availability of FE can be achieved by scaling FE out to 3 or more nodes.

Scaling FE nodes out or in does not affect the running system.

Add FE nodes

FE has three roles: Leader, Follower and Observer. By default, a cluster can have only one Leader, and can have multiple Followers and Observers. The Leader and Followers form a Paxos election group. If the Leader goes down, the remaining Followers automatically elect a new Leader, which ensures write high availability. Observers synchronize data from the Leader but do not take part in elections. If only one FE is deployed, that FE is the Leader by default.

The first FE started automatically becomes the Leader. Followers and Observers can then be added to it.

Add a Follower or Observer. Use mysql-client to connect to a running FE and execute:

ALTER SYSTEM ADD FOLLOWER "host:port";
ALTER SYSTEM ADD OBSERVER "host:port";

Here host is the IP address of the node where the Follower or Observer runs, and port is the edit_log_port in its fe.conf.

Configure and start the Follower or Observer. Followers and Observers are configured the same way as the Leader.

When starting for the first time, execute the following command:

./bin/start_fe.sh --helper host:port --daemon

Where host is the node ip where the Leader is located, and port is the edit_log_port in the configuration file fe.conf of the Leader. The --helper parameter is only required the first time followers and observers are started.

View the running status of the Follower or Observer. Use mysql-client to connect to any running FE and execute SHOW PROC '/frontends'; to see the FEs that have joined the cluster and their roles.

Notes on FE expansion:

  • The number of Follower FEs (including the Leader) must be odd. Deploying 3 is recommended to form a high availability (HA) setup.
  • When FE is deployed for high availability (1 Leader, 2 Followers), we recommend adding Observer FEs to expand the read capacity of FE. You can also add more Follower FEs, but this is rarely necessary.
  • One FE node can usually handle 10-20 BE nodes. Keeping the total number of FE nodes below 10 is recommended; 3 is usually enough for most needs.
  • The helper cannot point to the FE itself. It must point to one or more existing, running Master/Follower FEs.

Delete FE node

Use the following command to delete the corresponding FE node:

ALTER SYSTEM DROP FOLLOWER[OBSERVER] "fe_host:edit_log_port";

Notes on FE scale-in:

  • When deleting a Follower FE, make sure that the number of remaining Followers (including the Leader) is odd.

Worked example

Use the MySQL client to connect to FE and add the FE nodes

mysql -uroot -h 192.168.222.143 -P 9030 -p

Enter password: 123456

Add node2 node as FOLLOWER

ALTER SYSTEM ADD FOLLOWER "192.168.222.144:9010";

Add node3 node as OBSERVER

ALTER SYSTEM ADD OBSERVER "192.168.222.145:9010";

Stop the FE service on all three nodes, one node after another

/opt/apache-doris-0.15.0/fe/bin/stop_fe.sh

Start the node1 node

sh /opt/apache-doris-0.15.0/fe/bin/start_fe.sh --daemon

Start the node2 node (specify the location of the leader node)

sh /opt/apache-doris-0.15.0/fe/bin/start_fe.sh --helper 192.168.222.143:9010 --daemon

Start the node3 node (specify the location of the leader node)

sh /opt/apache-doris-0.15.0/fe/bin/start_fe.sh --helper 192.168.222.143:9010 --daemon

View fe node list

SHOW PROC '/frontends';

6.9.2 BE scaling out and in

Users can log in to the Master FE through the MySQL client.

Scaling BE nodes out or in does not affect the running system or tasks in progress, and does not affect system performance. Data balancing is automatic. Depending on the amount of data in the cluster, the cluster returns to a load-balanced state within a few hours to a day. For details on cluster load, see the Tablet Load Balancing documentation.

Add BE node

BE nodes are added the same way as in the BE deployment section, with the ALTER SYSTEM ADD BACKEND command.

Notes on BE scale-out:

After BE scale-out, Doris automatically balances data according to the load, and the cluster remains usable during this period.

Delete BE node

There are two ways to delete BE nodes: DROP and DECOMMISSION

The DROP statement is as follows:

ALTER SYSTEM DROP BACKEND "be_host:be_heartbeat_service_port";

Note:

DROP BACKEND deletes the BE directly, and the data on it can no longer be recovered! We strongly recommend against using DROP BACKEND to delete BE nodes. When you use this statement, Doris shows a corresponding prompt to prevent misuse.

The DECOMMISSION statement is as follows:

ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";

About the DECOMMISSION command:

This command removes BE nodes safely. After it is issued, Doris tries to migrate the data on that BE to other BE nodes, and once all data has been migrated, Doris deletes the node automatically.

The command is asynchronous. After executing it, SHOW PROC '/backends'; shows isDecommission as true for that BE, which indicates that the node is going offline.

The command does not necessarily succeed. For example, if the remaining BEs do not have enough storage space for the data on the decommissioned BE, or the number of remaining machines is lower than the minimum number of copies, the command cannot complete and the BE stays in the isDecommission = true state.

The progress of DECOMMISSION can be followed through the TabletNum column of SHOW PROC '/backends';. While the operation is in progress, TabletNum keeps decreasing.
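Because DECOMMISSION is asynchronous, progress is usually followed by polling TabletNum until it reaches 0. The loop below is a generic sketch of that idea: wait_for_zero is a hypothetical helper, and the command that fetches TabletNum for your BE is left to you (here it is any command that prints a single number).

```shell
# wait_for_zero INTERVAL MAX_TRIES COMMAND [ARGS...]
# Run COMMAND repeatedly until it prints 0.
# Returns 0 when the value reaches 0, 1 if MAX_TRIES is exhausted,
# and 2 if COMMAND itself fails.
wait_for_zero() {
    interval=$1
    max_tries=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        value=$("$@") || return 2
        echo "TabletNum: $value"
        if [ "$value" -eq 0 ]; then
            return 0
        fi
        tries=$(( tries + 1 ))
        sleep "$interval"
    done
    return 1
}

# Usage, with get_tablet_num standing in for your own query wrapper:
# wait_for_zero 60 1440 get_tablet_num 192.168.222.145
```

A return value of 1 after many tries is a hint that the decommission may be stuck for one of the reasons listed above.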

The operation can be cancelled with:

CANCEL DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";

After cancellation, the data on the BE stays at its current remaining volume, and Doris then rebalances the load.

6.9.3 Broker scaling out and in

There is no hard requirement on the number of Broker instances. Usually, one per physical machine is sufficient. Adding and removing Brokers can be done with the following commands:

ALTER SYSTEM ADD BROKER broker_name "broker_host:broker_ipc_port"; 

ALTER SYSTEM DROP BROKER broker_name "broker_host:broker_ipc_port"; 

ALTER SYSTEM DROP ALL BROKER broker_name;

Broker is a stateless process that can be started and stopped at will. When it is stopped, jobs running on it fail; simply retry them.

Origin blog.csdn.net/u013938578/article/details/130071739