CentOS 7 environment: Introduction to DolphinScheduler 3.1.5 and pseudo-cluster mode installation and deployment

Introduction to DolphinScheduler

Apache DolphinScheduler is a distributed, easily scalable, visual DAG workflow task scheduling platform. It is designed to untangle the intricate dependencies that arise in data processing, so that the scheduling system can be used out of the box.
The purpose of pseudo-cluster deployment is to run the DolphinScheduler services on a single machine. In this mode, the master, worker, and API server all run on the same machine.

DolphinScheduler core architecture

The main roles in DolphinScheduler are as follows:
MasterServer uses a distributed, decentralized design. It is mainly responsible for splitting DAGs into tasks, submitting tasks, and monitoring tasks, and it also monitors the health of the other MasterServers and WorkerServers.
WorkerServer also uses a distributed, decentralized design. It is mainly responsible for executing tasks and providing log services.
ZooKeeper: the MasterServer and WorkerServer nodes all use ZooKeeper for cluster management and fault tolerance.
Alert service provides alerting-related services.
The API layer is mainly responsible for handling requests from the front-end UI layer.
UI is the system's front-end pages, which provide the visual interfaces for operating the system.

1.1 Cluster planning

In cluster mode, multiple Masters and multiple Workers can be configured, typically 2 to 3 Masters and several Workers. Because resources are limited here, one Master and one Worker are configured on a single host. The plan is as follows.
Host hadoop: master, worker

1.2 Preparatory work (the documents are on my blog and I have uploaded all the resources)

(1) JDK (1.8+) must be installed on every node, with the relevant environment variables configured. My blog post on this: http://t.csdn.cn/TFgeQ
(2) A database must be deployed: either MySQL (5.7+) or PostgreSQL (8.2.15+). If you use MySQL, JDBC Driver 8.0.16 is required.
My blog post on this: http://t.csdn.cn/9BVap
(3) ZooKeeper (3.4.6+) must be deployed. My blog post on this: http://t.csdn.cn/1can4
(4) If the HDFS file system is enabled, a Hadoop (2.6+) environment is required.
(5) The process tree analysis tool psmisc must be installed on every node.
To install psmisc offline on CentOS 7, follow these steps:
Upload the psmisc package to the machine, or download it directly with yum. (I have uploaded all the DolphinScheduler installation packages for free download.)
On the target CentOS 7 machine, install it with the following command:

rpm -ivh psmisc-22.20-16.el7.x86_64.rpm

After the installation completes, verify that psmisc was installed with the following command:

rpm -qa | grep psmisc

Alternatively, install it with yum:

sudo yum install -y psmisc

yum may fail with an error at this point.
Reason: resolv.conf is not configured (it has no nameserver entries).
Solution:
Edit /etc/resolv.conf and add nameserver entries, for example:
nameserver 8.8.8.8
nameserver 8.8.4.4
search localdomain
Save the file and run the yum command again.
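
Before re-running yum, it can help to confirm that the file really contains nameserver entries. The following is a minimal sketch; count_nameservers is a hypothetical helper name, and the optional file argument exists only so the check can be tried on a copy of the file.

```shell
# Hypothetical helper: count the "nameserver <address>" lines in a
# resolv.conf-style file (defaults to /etc/resolv.conf).
count_nameservers() {
  local file="${1:-/etc/resolv.conf}"
  # grep -c prints the number of matching lines (0 if there are none)
  grep -cE '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$file"
}
```

Usage: count_nameservers prints the number of entries in /etc/resolv.conf; a result of 0 means no nameserver is configured.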

2.1 Prepare DolphinScheduler startup environment

Configure the deployment user's permissions and passwordless sudo.
Create a deployment user and be sure to configure passwordless sudo for it. The example below creates a user named dolphinscheduler.

# Log in as root to create the user
useradd dolphinscheduler

# Set the password
echo "dolphinscheduler" | passwd --stdin dolphinscheduler
# Configure passwordless sudo
sed -i '$adolphinscheduler  ALL=(ALL)  NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers
# Change ownership so that the deployment user can operate on the apache-dolphinscheduler-*-bin directory extracted from the binary package
chown -R dolphinscheduler:dolphinscheduler apache-dolphinscheduler-*-bin

• The task execution service uses sudo -u {linux-user} to switch between Linux users so that jobs can run under multiple tenants. The deployment user therefore needs sudo permissions without a password prompt. Beginners who do not follow this point can ignore it for now.
• If the /etc/sudoers file contains the line "Defaults requiretty", comment it out as well.
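
To confirm that the second point has been dealt with, a small check along these lines could be used. It is only a sketch: requiretty_active is a hypothetical helper, and the file argument is there so the check can be run against a copy instead of the real /etc/sudoers.

```shell
# Hypothetical helper: succeeds (exit status 0) if an uncommented
# "Defaults requiretty" line is still present in the given file.
requiretty_active() {
  local file="${1:-/etc/sudoers}"
  grep -qE '^[[:space:]]*Defaults[[:space:]]+requiretty' "$file"
}
```

Usage (as root): requiretty_active && echo "requiretty is still enabled". After editing, visudo -c checks the sudoers file for syntax errors, and running sudo -n true as the deployment user shows whether sudo still asks for a password.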

2.2 Passwordless SSH login

See my earlier blog post: http://t.csdn.cn/IIe29

3.1 Unzip the DolphinScheduler installation package

(1) Upload the DolphinScheduler installation package to the /opt/software directory of the hadoop node
(2) Unzip the installation package to the current directory
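
Step (2) might look like the following sketch. The archive name apache-dolphinscheduler-3.1.5-bin.tar.gz is an assumption based on the apache-dolphinscheduler-*-bin directory name used earlier, and extract_package is a hypothetical helper, so adjust both to match your download.

```shell
# Hypothetical helper: extract a .tar.gz archive into a target directory,
# creating the directory first if it does not exist.
extract_package() {
  local archive="$1" dest="$2"
  mkdir -p "$dest" && tar -zxf "$archive" -C "$dest"
}

# Example call (archive name assumed, see the note above):
# extract_package /opt/software/apache-dolphinscheduler-3.1.5-bin.tar.gz /opt/software
```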

3.2 Create metadata database and users

DolphinScheduler metadata is stored in a relational database, so the corresponding database and user need to be created.

(1) Create the database
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
(2) Create the user
mysql> CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';
Note:
If the following error appears, the new user's password is too simple.
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
Either use a more complex password or run the following commands to lower MySQL's password strength requirements.
mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=4;
(3) Grant the user the required privileges
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';

mysql> flush privileges;

3.3 Modify related configurations

Once the basic environment is ready, modify the configuration files to match your machine. They are in the bin/env directory and are named install_env.sh and dolphinscheduler_env.sh.
Modify the install_env.sh file

ips="192.168.2.221"
sshPort="22"
masters="192.168.2.221"
workers="192.168.2.221:default"
alertServer="192.168.2.221"
apiServers="192.168.2.221"
installPath="/opt/module/dolphinscheduler-3.1.5"
deployUser="root"
zkRoot="/dolphinscheduler"

Modify the dolphinscheduler_env.sh file

export JAVA_HOME=/opt/module/jdk1.8.0_212
export DATABASE="mysql"
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://192.168.2.221:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
export SPRING_DATASOURCE_USERNAME="root"
export SPRING_DATASOURCE_PASSWORD="root"
export SPRING_CACHE_TYPE="none"
export SPRING_JACKSON_TIME_ZONE="Asia/Shanghai"
export MASTER_FETCH_COMMAND_NUM="10"
export REGISTRY_TYPE="zookeeper"
export REGISTRY_ZOOKEEPER_CONNECT_STRING="192.168.2.221:2181"
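
A typo in this file usually only shows up when a service fails to start, so a quick sanity check before installing can save time. The sketch below is not part of DolphinScheduler: missing_env_vars is a hypothetical helper, and the list of variables it looks for is my own selection from the file above.

```shell
# Hypothetical helper: print the name of each listed variable that has no
# "export NAME=value" line in the given file.
missing_env_vars() {
  local file="$1" var
  for var in JAVA_HOME DATABASE SPRING_DATASOURCE_URL \
             SPRING_DATASOURCE_USERNAME SPRING_DATASOURCE_PASSWORD \
             REGISTRY_TYPE REGISTRY_ZOOKEEPER_CONNECT_STRING; do
    grep -qE "^[[:space:]]*export[[:space:]]+${var}=.+" "$file" || echo "$var"
  done
}
```

Usage: missing_env_vars bin/env/dolphinscheduler_env.sh prints nothing when every listed variable is exported in the file.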

Copy the MySQL driver into api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs (all five locations are required).

[root@localhost software]# cp mysql-connector-java-8.0.16.jar /opt/module/dolphinscheduler-3.1.5/api-server/libs/
[root@localhost software]# cp mysql-connector-java-8.0.16.jar /opt/module/dolphinscheduler-3.1.5/alert-server/libs/
[root@localhost software]# cp mysql-connector-java-8.0.16.jar /opt/module/dolphinscheduler-3.1.5/master-server/libs/
[root@localhost software]# cp mysql-connector-java-8.0.16.jar /opt/module/dolphinscheduler-3.1.5/worker-server/libs/
[root@localhost software]# cp mysql-connector-java-8.0.16.jar /opt/module/dolphinscheduler-3.1.5/tools/libs/
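
The five copy commands above can also be written as a loop, which makes it harder to miss one of the directories. This is a sketch only; copy_driver is a hypothetical helper that takes the driver jar and the installation directory as arguments.

```shell
# Hypothetical helper: copy the JDBC driver jar into the libs directory of
# each of the five modules, stopping at the first copy that fails.
copy_driver() {
  local jar="$1" home="$2" module
  for module in api-server alert-server master-server worker-server tools; do
    cp "$jar" "$home/$module/libs/" || return 1
  done
}

# Example call with the paths used in this post:
# copy_driver mysql-connector-java-8.0.16.jar /opt/module/dolphinscheduler-3.1.5
```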

With the steps above complete, the new database for DolphinScheduler exists and can now be initialized with a shell script:

bash tools/bin/upgrade-schema.sh

4.1 Start DolphinScheduler

Use the deployment user created above to run the following command to complete the deployment. The running logs after deployment will be stored in the logs folder.

bash ./bin/install.sh

Note: During the first deployment, the message sh: bin/dolphinscheduler-daemon.sh: No such file or directory may appear five times. It is harmless and can be ignored.

4.2 Log in to DolphinScheduler

Open http://localhost:12345/dolphinscheduler/ui in a browser to log in to the system UI. The default username and password are admin/dolphinscheduler123.

4.3 Start and stop services

# Stop all services in the cluster with one command
bash ./bin/stop-all.sh
# Start all services in the cluster with one command
bash ./bin/start-all.sh
# Stop/start Master
bash ./bin/dolphinscheduler-daemon.sh stop master-server
bash ./bin/dolphinscheduler-daemon.sh start master-server
# Start/stop Worker
bash ./bin/dolphinscheduler-daemon.sh start worker-server
bash ./bin/dolphinscheduler-daemon.sh stop worker-server
# Start/stop Api
bash ./bin/dolphinscheduler-daemon.sh start api-server
bash ./bin/dolphinscheduler-daemon.sh stop api-server
# Start/stop Alert
bash ./bin/dolphinscheduler-daemon.sh start alert-server
bash ./bin/dolphinscheduler-daemon.sh stop alert-server

Note 1: Each service has its own dolphinscheduler_env.sh file at <service>/conf/dolphinscheduler_env.sh, which is a convenience for microservice-style deployments. It means you can start each service with different environment variables: configure <service>/conf/dolphinscheduler_env.sh for that service and start it with <service>/bin/start.sh. However, if you start a service with bin/dolphinscheduler-daemon.sh start <service>, the script overwrites <service>/conf/dolphinscheduler_env.sh with bin/env/dolphinscheduler_env.sh before starting the service. The aim is to reduce the effort of changing configuration.

Note 2: For the purpose of each service, see the "System Architecture Design" section of the official documentation. The Python gateway service starts together with api-server by default. If you do not want it to start, disable it by setting python-gateway.enabled : false in the api-server configuration file api-server/conf/application.yaml.
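
After starting the services, the JDK's jps command lists the running Java processes, which gives a quick way to see whether anything failed to come up. The sketch below assumes the processes appear in the jps output as MasterServer, WorkerServer, ApiApplicationServer, and AlertServer; check these names against your own output. missing_services is a hypothetical helper.

```shell
# Hypothetical helper: read "jps" output from standard input and print the
# name of every expected DolphinScheduler process that is not listed.
# The process names are assumptions; adjust them to what jps shows for you.
missing_services() {
  local out name
  out="$(cat)"
  for name in MasterServer WorkerServer ApiApplicationServer AlertServer; do
    printf '%s\n' "$out" | grep -qw "$name" || echo "$name"
  done
}
```

Usage: jps | missing_services prints nothing when all four processes are running.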

Official deployment manual: https://www.bookstack.cn/read/dolphinscheduler-3.1.0-zh/bf5533c107dc1904.md
Be sure to check the official documentation for the deployment environment requirements. If you run into problems, come back to these notes, but the official documentation is the final authority.

Origin blog.csdn.net/Liu__sir__/article/details/130243456