Linux (CentOS): deploying the big data workflow task scheduling system apache-dolphinscheduler-1.3.4

Table of Contents

Standalone deployment

1. Basic software installation (required items must be installed by yourself)

2. Download the binary tar.gz package

3. Create deployment users and assign directory operation permissions

4. Passwordless SSH configuration

5. Database initialization

6. Modify operating parameters

7. One-click deployment

8. Log in to the system

9. Start and stop services

10. Problems encountered


Standalone deployment

1. Basic software installation (required items must be installed by yourself)

  • PostgreSQL (8.2.15+) or MySQL (5.7+): choose one of the two; if you use MySQL, JDBC Driver 5.1.47+ is required
  • JDK (1.8+): required; configure the JAVA_HOME and PATH variables in /etc/profile after installation
  • ZooKeeper (3.4.6+): required
  • Hadoop (2.6+) or MinIO: optional. If you need the resource upload function, on a single machine you can use a local file directory as the upload folder (this does not require deploying Hadoop); of course, you can also choose to upload to a Hadoop or MinIO cluster
  • DataX: optional; provides data synchronization between heterogeneous data sources such as MySQL, Oracle, etc.
 Note: DolphinScheduler itself does not depend on Hadoop, Hive, or Spark; it only invokes their clients to run the corresponding tasks.
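Before deploying, it helps to confirm the required software is actually on PATH. The helper below is an illustrative sketch (the command names checked, such as java and mysql, depend on what you installed):

```shell
# need: print a warning when a required command is not on PATH.
# Run it for each prerequisite before deploying.
need() {
  command -v "$1" >/dev/null 2>&1 || echo "missing: $1"
}

need java
need mysql
```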

2. Download the binary tar.gz package

  • Download the latest version of the back-end installation package to the server deployment directory. For example, create /opt/dolphinscheduler as the installation/deployment directory, download address: download, upload the tar package to this directory after downloading, and extract it
# Create the deployment directory; do not create it under high-privilege directories such as /root or /home
mkdir -p /opt/dolphinscheduler;
cd /opt/dolphinscheduler;
# Extract
tar -zxvf apache-dolphinscheduler-incubating-1.3.4-dolphinscheduler-bin.tar.gz -C /opt/dolphinscheduler;
 
mv apache-dolphinscheduler-incubating-1.3.4-dolphinscheduler-bin  dolphinscheduler-bin

3. Create deployment users and assign directory operation permissions

  • Create a deployment user, and be sure to configure passwordless sudo for it. The following takes creating a dolphinscheduler user as an example
# Creating a user requires logging in as root
useradd dolphinscheduler;

# Set a password
echo "dolphinscheduler" | passwd --stdin dolphinscheduler

# Configure passwordless sudo
sed -i '$adolphinscheduler  ALL=(ALL)  NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

# Change directory ownership so the deployment user can operate on the dolphinscheduler-bin directory
chown -R dolphinscheduler:dolphinscheduler dolphinscheduler-bin
 Note:
 - Because the task execution service runs multi-tenant jobs by switching between Linux users via sudo -u {linux-user}, the deployment user needs sudo permission, and it must be passwordless. Beginners who don't understand this point can safely ignore it for now
 - If you find a "Defaults requiretty" line in the /etc/sudoers file, comment it out as well
 - If you use the resource upload function, the deployment user also needs permission to operate on the `local file system, HDFS, or MinIO`
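A mistake in /etc/sudoers can lock out sudo entirely, so it is worth rehearsing the two sed edits above harmlessly first. The sketch below runs them against a throwaway mock file standing in for the real /etc/sudoers:

```shell
# Build a mock sudoers file so the sed commands can be verified safely.
printf 'root    ALL=(ALL)       ALL\nDefaults    requiretty\n' > /tmp/sudoers.demo

# Append the passwordless-sudo rule for the deployment user.
sed -i '$adolphinscheduler  ALL=(ALL)  NOPASSWD: ALL' /tmp/sudoers.demo

# Comment out "Defaults requiretty" so sudo works from non-interactive sessions.
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /tmp/sudoers.demo

cat /tmp/sudoers.demo
```

Once the output looks right, apply the same two commands to /etc/sudoers as root (keeping a backup copy beforehand).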

4. Passwordless SSH configuration

  • Switch to the deployment user and configure passwordless SSH login to localhost
su dolphinscheduler;

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Note: after this is set up correctly, running `ssh localhost` as the dolphinscheduler user should not prompt for a password

5. Database initialization

  • Log in to the database. The default database is PostgreSQL; if you choose MySQL, you need to add the mysql-connector-java driver package to DolphinScheduler's lib directory
mysql -uroot -p
  • After entering the database command-line window, execute the database initialization commands and set the access account and password. Note: {user} and {password} need to be replaced with a specific database username and password

    mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
    mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%' IDENTIFIED BY '{password}';
    mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost' IDENTIFIED BY '{password}';
    mysql> flush privileges;
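The interactive statements above can also be staged in a file and fed to the client in one shot, which makes the {user}/{password} substitution easier to review. A sketch, where ds_user and ds_password are placeholder values to replace with your own:

```shell
# Write the initialization SQL to a file; review it, then run:
#   mysql -uroot -p < /tmp/ds-init.sql
DB_USER='ds_user'       # placeholder: your {user}
DB_PASS='ds_password'   # placeholder: your {password}
cat > /tmp/ds-init.sql <<SQL
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '${DB_USER}'@'%' IDENTIFIED BY '${DB_PASS}';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '${DB_USER}'@'localhost' IDENTIFIED BY '${DB_PASS}';
FLUSH PRIVILEGES;
SQL
```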
    
  • Create tables and import basic data

    • Modify the following configuration in datasource.properties in the conf directory

      • vi conf/datasource.properties
        
    • If you choose MySQL, comment out the PostgreSQL-related configuration (and vice versa), manually add the [ mysql-connector-java driver jar ] package to the lib directory (here mysql-connector-java-5.1.47.jar), and then configure the database connection information correctly

      # postgre
      #spring.datasource.driver-class-name=org.postgresql.Driver
      #spring.datasource.url=jdbc:postgresql://localhost:5432/dolphinscheduler
      # mysql
      spring.datasource.driver-class-name=com.mysql.jdbc.Driver
      spring.datasource.url=jdbc:mysql://xxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true     change the ip; localhost works for a single machine
      spring.datasource.username=xxx						change to the {user} value set above
      spring.datasource.password=xxx						change to the {password} value set above
    
    • After modifying and saving, execute the table-creation and basic-data-import script in the script directory
    sh script/create-dolphinscheduler.sh
    

Note: if running the above script reports a "/bin/java: No such file or directory" error, configure the JAVA_HOME and PATH variables in /etc/profile
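For reference, the /etc/profile additions that fix that error look like the following (the path /opt/soft/java is an example; point it at your own JDK, then run `source /etc/profile`):

```shell
# Append to /etc/profile (path is an example -- adjust to your JDK location):
export JAVA_HOME=/opt/soft/java
export PATH=$JAVA_HOME/bin:$PATH
```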

6. Modify operating parameters

  • Modify the dolphinscheduler_env.sh environment variable file in the conf/env directory (taking software installed under /opt/soft as an example)

    export HADOOP_HOME=/opt/soft/hadoop
    export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
    #export SPARK_HOME1=/opt/soft/spark1
    export SPARK_HOME2=/opt/soft/spark2
    export PYTHON_HOME=/opt/soft/python
    export JAVA_HOME=/opt/soft/java
    export HIVE_HOME=/opt/soft/hive
    export FLINK_HOME=/opt/soft/flink
    export DATAX_HOME=/opt/soft/datax/bin/datax.py
    export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
    

    Note: this step is very important. For example, JAVA_HOME and PATH must be configured; variables that are not used can be ignored or commented out. If you cannot find dolphinscheduler_env.sh, run ls -a

  • Symlink the JDK's java binary to /usr/bin/java (still taking JAVA_HOME=/opt/soft/java as an example)

    sudo ln -s /opt/soft/java/bin/java /usr/bin/java
    
  • Modify the parameters in the one-click deployment configuration file conf/config/install_config.conf, paying special attention to the following parameters

    # mysql or postgresql
    dbtype="mysql"
    
    # database connection address
    dbhost="localhost:3306"
    
    # database name
    dbname="dolphinscheduler"
    
    # database username; change to the specific {user} value set above
    username="xxx"    
    
    # database password; escape special characters with \; change to the specific {password} value set above
    password="xxx"
    
    # ZooKeeper address; for a single machine this is localhost:2181 -- remember to include port 2181
    zkQuorum="localhost:2181"
    
    # directory to install DS into, e.g. /opt/soft/dolphinscheduler; must differ from the current directory
    installPath="/opt/soft/dolphinscheduler"
    
    # user to deploy with; use the user created in section 3
    deployUser="dolphinscheduler"
    
    # mail configuration, taking QQ Mail as an example
    # mail protocol
    mailProtocol="SMTP"
    
    # mail server address
    mailServerHost="smtp.qq.com"
    
    # mail server port
    mailServerPort="25"
    
    # mailSender and mailUser can be configured identically
    # sender
    mailSender="[email protected]"
    
    # sending user
    mailUser="[email protected]"
    
    # mailbox password
    mailPassword="xxx"
    
    # set to true for mailboxes using the TLS protocol, otherwise false
    starttlsEnable="true"
    
    # set to true for mailboxes with SSL enabled, otherwise false. Note: starttlsEnable and sslEnable cannot both be true
    sslEnable="false"
    
    # mail server address; same as mailServerHost above
    sslTrust="smtp.qq.com"
    
    # where business resource files such as SQL scripts are uploaded; options: HDFS, S3, NONE. On a single machine, to use the local file system, set this to HDFS, because the HDFS option also supports the local file system; if you don't need the resource upload function, choose NONE. To emphasize: using the local file system does not require deploying Hadoop
    resourceStorageType="HDFS"
    
    # the following takes saving to the local file system as an example
    # Note: if you want to upload to HDFS and the NameNode has HA enabled, copy hadoop's core-site.xml and hdfs-site.xml into the conf directory (in this example /opt/dolphinscheduler/conf) and configure the namenode cluster name; if the NameNode is not HA, just use its specific ip or hostname
    defaultFS="file:///data/dolphinscheduler"    #hdfs://{specific ip/hostname}:8020
    
    # if Yarn is not used, keep the default value below; if the ResourceManager is HA, configure the primary and standby ips or hostnames of the ResourceManager nodes, e.g. "192.168.xx.xx,192.168.xx.xx"; for a single ResourceManager, set yarnHaIps=""
    yarnHaIps="192.168.xx.xx,192.168.xx.xx"
    
    # keep the default if the ResourceManager is HA or Yarn is not used; for a single ResourceManager, configure its real hostname or ip
    singleYarnIp="yarnIp1"
    
    # resource upload root path, supporting HDFS and S3; since the HDFS option supports the local file system, make sure the local folder exists and is readable/writable
    resourceUploadPath="/data/dolphinscheduler"
    
    # a user with permission to create resourceUploadPath
    hdfsRootUser="hdfs"
    
    # machines to deploy the DS services on; choose localhost for a single machine
    ips="localhost"
    
    # ssh port, default 22
    sshPort="22"
    
    # machine the master service is deployed on
    masters="localhost"
    
    # machine the worker service is deployed on, plus which worker group this worker belongs to; "default" in the example below is the group name
    workers="localhost:default"
    
    # machine the alert service is deployed on
    alertServer="localhost"
    
    # machine the back-end api service is deployed on
    apiServers="localhost"
    
    

    Note: if you plan to use the Resource Center function, execute the following commands:

    sudo mkdir /data/dolphinscheduler
    sudo chown -R dolphinscheduler:dolphinscheduler /data/dolphinscheduler
    

7. One-click deployment

  • Switch to the deployment user and execute the one-click deployment script

    sh install.sh

    Note:
    On first deployment, the following message may appear up to 5 times during step `3,stop server`; it can be ignored:
    sh: bin/dolphinscheduler-daemon.sh: No such file or directory
    
  • After the script completes, the following 5 services will be started. Use the jps command (shipped with the Java JDK) to check whether they are running

    MasterServer         ----- master service
    WorkerServer         ----- worker service
    LoggerServer         ----- logger service
    ApiApplicationServer ----- api service
    AlertServer          ----- alert service

If the above services start normally, the automatic deployment was successful
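The jps check can be wrapped in a small helper that reads jps output and reports each expected service (a sketch; after deployment run it as `jps | check_services`):

```shell
# check_services: read `jps`-style output on stdin and report whether each of
# the five expected DolphinScheduler services appears in it.
check_services() {
  input=$(cat)
  for svc in MasterServer WorkerServer LoggerServer ApiApplicationServer AlertServer; do
    if printf '%s\n' "$input" | grep -q "$svc"; then
      echo "$svc: running"
    else
      echo "$svc: NOT running"
    fi
  done
}
```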

After the deployment succeeds, you can view the logs, which are stored in the logs folder.

 logs/
    ├── dolphinscheduler-alert-server.log
    ├── dolphinscheduler-master-server.log
    ├── dolphinscheduler-worker-server.log
    ├── dolphinscheduler-api-server.log
    └── dolphinscheduler-logger-server.log

8. Log in to the system

  • After the services are up, open the web UI in a browser. For the 1.3.x series the default address is http://{api-server ip}:12345/dolphinscheduler, and the initial administrator account is admin with password dolphinscheduler123; change the password after first login

9. Start and stop services

  • Stop all services in the cluster with one click

    sh ./bin/stop-all.sh

  • Start all services of the cluster with one click

    sh ./bin/start-all.sh

  • Start and stop Master

sh ./bin/dolphinscheduler-daemon.sh start master-server
sh ./bin/dolphinscheduler-daemon.sh stop master-server
  • Start and stop Worker
sh ./bin/dolphinscheduler-daemon.sh start worker-server
sh ./bin/dolphinscheduler-daemon.sh stop worker-server
  • Start and stop Api
sh ./bin/dolphinscheduler-daemon.sh start api-server
sh ./bin/dolphinscheduler-daemon.sh stop api-server
  • Start and stop Logger
sh ./bin/dolphinscheduler-daemon.sh start logger-server
sh ./bin/dolphinscheduler-daemon.sh stop logger-server
  • Start and stop Alert
sh ./bin/dolphinscheduler-daemon.sh start alert-server
sh ./bin/dolphinscheduler-daemon.sh stop alert-server

10. Problems encountered

  • Basic software installation: DataX is required in practice. As of now, the official 1.3.4 documentation does not mention it, and an error is reported later when DataX is used to synchronize data. It has been listed among the basic software options above; watch out for similar issues.
  • Setting the password in step 3: instead of echo "dolphinscheduler" | passwd --stdin dolphinscheduler, it is recommended to run passwd dolphinscheduler interactively, which is safer and more reliable.
  • Step 5: if you use MySQL with the root user, you only need to execute the CREATE DATABASE statement; the GRANT and FLUSH PRIVILEGES statements are unnecessary.
  • With MySQL 8.0.22: after installing rpm -ivh mysql-connector-java-8.0.22-1.el7.noarch.rpm, the connector driver ends up in the /usr/share/java/ directory and needs to be copied into DolphinScheduler's lib directory.
  • Step 7: if jps shows that the api service did not start after deployment, manually execute sh ./bin/dolphinscheduler-daemon.sh start api-server to start it.

Origin blog.csdn.net/ct_666/article/details/113114907