Big data DataX-Web detailed installation tutorial

Table of contents

1. Introduction to DataX-Web

1.1 What is DataX-Web

1.2 DataX-Web architecture

2. DataX-Web installation and deployment 

2.1 Environmental requirements

2.2 Installation

2.3 Deployment

2.4 Database initialization

2.5 Configuration

2.6 Start the service

2.6.1 Start all services with one click

2.6.2 Stop all services with one click

2.7 View services (Attention! Attention!)

2.8 Access the Web UI

2.9 Operation log

3. DataX-Web task deployment

3.1 Create project

3.2 Executor management

3.3 Create data source

3.3.1 mysql data source

3.3.2 hive data source

3.4 Create task template

3.5 Task creation

3.5.1 Build reader

3.5.2 Build writer

3.5.3 Set field mapping

3.5.4 Build

4. DataX-Web task management


 

1. Introduction to DataX-Web

1.1 What is DataX-Web

Project address: https://github.com/WeiYe-Jing/datax-web

datax-web is a distributed data synchronization tool built on top of DataX. It provides a simple, easy-to-use web interface that lowers the cost of learning DataX, shortens task configuration time, and helps avoid configuration errors. Users create data synchronization tasks by selecting data sources on the page. Supported data sources include RDBMS, Hive, HBase, ClickHouse, MongoDB, and others. For RDBMS data sources, synchronization tasks can be created in batches. The tool supports real-time viewing of synchronization progress and logs, and can terminate a running synchronization. It integrates and extends xxl-job, and can synchronize data incrementally based on time or on an auto-incremented primary key.

The task executor supports cluster deployment, a selectable routing strategy across executor nodes, timeout control, retry on failure, failure alerts, task dependencies, and monitoring of executor CPU, memory, and load.

1.2 DataX-Web architecture

2. DataX-Web installation and deployment 

2.1 Environmental requirements

Environment: Requirement
Operating system: macOS, Windows, Linux
Java: Java 8; JDK 1.8.201 or later is recommended
Python: Python 2.x, required. To use Python 3, replace the three Python files under datax/bin with the replacement files under doc/datax-web/datax-python3. Python is mainly used to invoke the startup script of the underlying DataX. By default, DataX is executed as a Java subprocess; users can choose the Python mode if they want to customize it.
MySQL: MySQL 5.7+
Maven: Apache Maven 3.6.1+ (optional; only needed to compile the installation package yourself)
DataX: DataX 3

To install the underlying DataX environment, see the article "Big Data DataX Detailed Installation Tutorial" (CSDN Blog).
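Before installing, it can help to confirm that the required commands are on the PATH. The following is a minimal sketch, not part of datax-web; the helper name check_cmd is made up for illustration, and the command names (java, python, mysql, mvn) are assumed to match how the tools are installed on your server.

```shell
# check_cmd is a hypothetical helper: it reports whether a command is on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# java and python are required by the tutorial; mysql here is only the client
# command used for automatic database initialization, and mvn is optional.
for cmd in java python mysql mvn; do
    check_cmd "$cmd" || true
done
```

This only checks that the commands exist; the versions still need to be compared with the requirements above by hand (for example with java -version).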

2.2 Installation

Download the installation package directly (download address: https://pan.baidu.com/s/13yoqhGpD00I82K4lOYtQhg, extraction code: cpsk), then extract it to the target path:

(base) [root@hadoop03 ~]# ls
datax-web-2.1.2.tar.gz
(base) [root@hadoop03 ~]# tar -zxvf datax-web-2.1.2.tar.gz -C /usr/local/

2.3 Deployment

Run the one-click installation script: enter the extracted directory and locate install.sh under the bin directory. For an interactive installation, run it directly:

(base) [root@hadoop03 ~]# cd /usr/local/datax-web-2.1.2/
(base) [root@hadoop03 /usr/local/datax-web-2.1.2]# ./bin/install.sh

In interactive mode, the script asks for confirmation before extracting each module's package and before calling its configure script. Follow the prompts to check whether the installation succeeded; if it did not, run the script again. To skip the confirmation steps and install non-interactively, run:

./bin/install.sh --force

2.4 Database initialization

If the mysql command is available on the local server, the installation script shows the following prompts while it runs:

Scan out mysql command, so begin to initalize the database
Do you want to initalize database with sql: [{INSTALL_PATH}/bin/db/datax-web.sql]? (Y/N)y
Please input the db host(default: 127.0.0.1): 
Please input the db port(default: 3306): 
Please input the db username(default: root): 
Please input the db password(default: ): 
Please input the db name(default: exchangis)

Follow the prompts to enter the database host, port, user name, password, and database name. In most cases the initialization then completes quickly. If the mysql command is not installed on the local server (it is not installed on my server), run the bin/db/datax_web.sql script under the installation directory manually against your database. Afterwards, edit the database settings in the configuration file:

(base) [root@hadoop03 /usr/local/datax-web-2.1.2]# vim modules/datax-admin/conf/bootstrap.properties 
#Database
DB_HOST=192.168.170.136
DB_PORT=3306
DB_USERNAME=root
DB_PASSWORD=xxx
DB_DATABASE=dataxweb

Set each value to match your own environment.

2.5 Configuration

After the installation completes, set PYTHON_PATH (the path of DataX's Python startup script, datax.py) in modules/datax-executor/bin/env.properties under the project directory:

(base) [root@hadoop03 /usr/local/datax-web-2.1.2]# vim modules/datax-executor/bin/env.properties 
······
## Location of the Python script to execute
#PYTHON_PATH=/home/hadoop/install/datax/bin/datax.py
PYTHON_PATH=/usr/local/datax/bin/datax.py
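A quick way to catch a wrong path is to read the value back and test that the file exists. This is a sketch under assumptions: get_prop is a hypothetical helper, and it assumes the file uses plain KEY=value lines as in the excerpt above.

```shell
# get_prop is a hypothetical helper: print the value of the first
# uncommented KEY=value line in a properties file.
get_prop() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Run from the datax-web installation directory.
ENV_FILE=modules/datax-executor/bin/env.properties
if [ -f "$ENV_FILE" ]; then
    py=$(get_prop PYTHON_PATH "$ENV_FILE")
    if [ -f "$py" ]; then
        echo "OK: $py"
    else
        echo "not found: $py"
    fi
fi
```

Commented-out lines such as #PYTHON_PATH=... are ignored because the pattern is anchored at the start of the line.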

2.6 Start the service

2.6.1 Start all services with one click

./bin/start-all.sh

Some modules may fail to start or hang during startup; if that happens, exit and run the script again. To change the service port of a module, edit its env.properties:

vi ./modules/{module_name}/bin/env.properties

Find the SERVER_PORT configuration item and change its value. You can also start a single module on its own:
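The port change can also be scripted instead of done in an editor. The sketch below assumes the file already contains an uncommented SERVER_PORT= line; set_prop is a hypothetical helper and 9528 is only an example value.

```shell
# set_prop is a hypothetical helper: rewrite an existing KEY=... line in place.
# A backup of the original file is kept next to it with a .bak suffix.
# The value must not contain the | character, which is used as the sed delimiter.
set_prop() {
    key="$1"; value="$2"; file="$3"
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
}

# Example call, run from the installation directory:
# set_prop SERVER_PORT 9528 ./modules/datax-admin/bin/env.properties
```

If the key is missing or commented out, the file is left unchanged, so check the result before restarting the module.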

./bin/start.sh -m {module_name}

2.6.2 Stop all services with one click

./bin/stop-all.sh

You can also stop a single module on its own:

./bin/stop.sh -m {module_name}

2.7 View services (Attention! Attention!)

In a Linux environment, run the jps command and check whether the DataXAdminApplication and DataXExecutorApplication processes are listed. If both are present, the project started successfully.

If the project fails to start, check the startup logs: modules/datax-admin/bin/console.out or modules/datax-executor/bin/console.out.
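The jps check can be wrapped in a small function so that it is easy to repeat after each restart. This is only a sketch; check_datax_procs is a hypothetical helper, and it assumes jps from the JDK is on the PATH.

```shell
# check_datax_procs is a hypothetical helper: report which of the two
# datax-web processes appear in the jps output.
check_datax_procs() {
    out=$(jps)
    status=0
    for name in DataXAdminApplication DataXExecutorApplication; do
        if echo "$out" | grep -q "$name"; then
            echo "running: $name"
        else
            echo "not running: $name"
            status=1
        fi
    done
    return $status
}

# Example call:
# check_datax_procs || echo "check the console.out files listed above"
```

The function returns a non-zero status when either process is missing, so it can be used in a condition.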


Tip: The scripts rely on bash features. Calling them with sh may cause unexpected errors.

2.8 Access the Web UI

After deployment, open http://ip:port/index.html in a browser to reach the main interface (ip is the IP of the server where datax-admin is deployed, and port is the port that datax-admin runs on, 9527). Log in with the username admin and the password 123456.
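If no browser is at hand, the page can be probed from the command line. The sketch below assumes curl is installed; ui_status is a hypothetical helper, and the host and port in the example are placeholders for your own values.

```shell
# ui_status is a hypothetical helper: print the HTTP status code returned
# by the datax-admin index page (curl prints 000 if it cannot connect).
ui_status() {
    curl -s -o /dev/null -w '%{http_code}' "http://$1:$2/index.html"
}

# Example call:
# ui_status 192.168.170.136 9527
```

A 200 response suggests that datax-admin is serving the page; it does not verify that login works.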

If login fails with an incorrect account or password error, first check whether the dataxweb database exists. If it does not, import datax_web.sql manually: create the dataxweb database, switch to it, and then import the datax_web.sql file:

(base) [root@hadoop03 /usr/local/datax-web-2.1.2/bin/db]# pwd
/usr/local/datax-web-2.1.2/bin/db
(base) [root@hadoop03 /usr/local/datax-web-2.1.2/bin/db]# ls
datax_web.sql
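The manual import described above can be sketched with the mysql client as follows. This is an illustration under assumptions: init_dataxweb_db is a hypothetical helper, the connection values in the example are the ones from the configuration shown earlier, and the mysql client must be available on the machine where the commands run.

```shell
# init_dataxweb_db is a hypothetical helper: create the dataxweb database
# and import the SQL file into it. The -p option makes the mysql client
# prompt for the password on each call.
init_dataxweb_db() {
    host="$1"; port="$2"; user="$3"; sql_file="$4"
    # create the database if it does not exist yet
    mysql -h "$host" -P "$port" -u "$user" -p \
        -e "CREATE DATABASE IF NOT EXISTS dataxweb;" || return 1
    # import the tables and initial data into it
    mysql -h "$host" -P "$port" -u "$user" -p dataxweb < "$sql_file"
}

# Example call:
# init_dataxweb_db 192.168.170.136 3306 root /usr/local/datax-web-2.1.2/bin/db/datax_web.sql
```

The database name must match DB_DATABASE in modules/datax-admin/conf/bootstrap.properties.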

2.9 Operation log

After deployment, logs are written under modules/{module_name}/data/applogs (you can choose a different location by modifying the logpath setting in application.yml). Use these logs to track the actual startup status of each module.

If the executor starts before admin is ready, the executor fails to connect and its log reports a "connection refused" error.

The solution is to start admin first and then the executor. The executor reconnects after 30 seconds; once the reconnection succeeds, this exception can be ignored.
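To find which log files are currently being written, one option is to list recently modified files in the applogs directories. A sketch, assuming the default log location described above; recent_logs is a hypothetical helper and the 10-minute window is an arbitrary default.

```shell
# recent_logs is a hypothetical helper: list files under each module's
# data/applogs directory that were modified in the last N minutes (default 10).
recent_logs() {
    find "$1"/modules/*/data/applogs -type f -mmin -"${2:-10}" 2>/dev/null
}

# Example call:
# recent_logs /usr/local/datax-web-2.1.2
```

Any file it prints can then be followed with tail -f.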

(base) [root@hadoop03 /usr/local/datax-web-2.1.2/bin]# ./start.sh -m datax-admin

# start the executor 30 seconds later
(base) [root@hadoop03 /usr/local/datax-web-2.1.2/bin]# ./start.sh -m datax-executor
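Instead of waiting a fixed 30 seconds, the second start can be delayed until admin actually answers. The following is a sketch under assumptions: wait_for_admin is a hypothetical helper, curl is assumed to be installed, and the retry count and 2-second interval are arbitrary.

```shell
# wait_for_admin is a hypothetical helper: poll a URL until the server
# answers, giving up after a number of attempts (default 30).
wait_for_admin() {
    url="$1"; tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # curl exits with 0 once the server returns any HTTP response
        if curl -s -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    return 1
}

# Example, run from the bin directory:
# ./start.sh -m datax-admin
# wait_for_admin http://127.0.0.1:9527/index.html && ./start.sh -m datax-executor
```

The example uses the admin port 9527 mentioned earlier; adjust the URL if SERVER_PORT was changed.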

3. DataX-Web task deployment

3.1 Create project

3.2 Executor management

This page lists all online executors.

3.3 Create data source

3.3.1 mysql data source

3.3.2 hive data source

datax-web connects to Hive through the Thrift server, so Hive's hiveserver2 service must be running.
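One way to confirm that hiveserver2 accepts connections before adding the data source is Hive's beeline client. This is a sketch, assuming beeline is on the PATH; hive_ping is a hypothetical helper, the host name in the example is a placeholder, and 10000 is hiveserver2's default port.

```shell
# hive_ping is a hypothetical helper: open a JDBC connection to hiveserver2
# and run a trivial query. The port defaults to 10000.
hive_ping() {
    beeline -u "jdbc:hive2://$1:${2:-10000}" -e "show databases;"
}

# If hiveserver2 is not running yet, it can be started in the background:
# nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &
#
# Example call:
# hive_ping hadoop03
```

Depending on how Hive authentication is configured, beeline may also need a user name passed with -n.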

3.4 Create task template

3.5 Task creation

3.5.1 Build reader

3.5.2 Build writer

3.5.3 Set field mapping

3.5.4 Build

4. DataX-Web task management

Origin blog.csdn.net/weixin_46560589/article/details/134592916