Getting Started with the Data Integration Framework FlinkX (Chunjun)

01 What is FlinkX?

Official site: https://dtstack.github.io/chunjun/
GitHub: https://github.com/DTStack/chunjun

FlinkX has been renamed Chunjun. It is an open source data integration framework built on Flink that implements data synchronization and computation across multiple heterogeneous data sources, supporting both streaming and batch workloads.


FlinkX abstracts each database into reader/source plugins, writer/sink plugins, and lookup (dimension table) plugins, and has the following characteristics:

  • Built on the real-time computing engine Flink; supports configuring tasks via JSON templates and is compatible with Flink SQL syntax;
  • Supports distributed execution, with multiple submission modes such as flink-standalone, yarn-session, and yarn-per-job;
  • Supports one-click Docker deployment and running on Kubernetes;
  • Supports a wide range of heterogeneous data sources: synchronization and computation across more than 20 sources such as MySQL, Oracle, SQLServer, Hive, and Kudu;
  • Easy to extend and highly flexible: a newly added data source plugin can immediately interoperate with existing ones, and plugin developers do not need to care about the code logic of other plugins;
  • Supports not only full synchronization but also incremental synchronization and interval polling;
  • Unifies batch and streaming: supports offline synchronization and computation while also covering real-time scenarios;
  • Supports dirty-data storage and provides metrics monitoring;
  • Works with checkpoints to achieve resumable (breakpoint) transfers;
  • Supports synchronizing not only DML data but also schema changes.
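To make the "incremental synchronization and interval polling" point concrete, here is a minimal sketch of the idea (illustration only; the function and field names are hypothetical, not ChunJun's actual code): each poll fetches rows whose increment column is beyond the last recorded position, then advances that position.

```python
# Hypothetical sketch of interval-polling incremental sync, NOT ChunJun's
# real implementation. "last_location" plays the role of startLocation.
def poll_increment(rows, last_location):
    """Return rows past last_location and the updated position."""
    new_rows = [r for r in rows if r["id"] > last_location]
    if new_rows:
        last_location = max(r["id"] for r in new_rows)
    return new_rows, last_location

source = [{"id": i, "name": f"user{i}"} for i in range(1, 6)]
batch, pos = poll_increment(source, last_location=2)
print([r["id"] for r in batch], pos)  # [3, 4, 5] 5
```

In the real framework the "query" is a SQL statement against the source database and the position survives restarts via checkpoints, but the polling loop follows this shape.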

In fact, its JSON-based template configuration is quite similar to DataX. I previously wrote a DataX tutorial; interested readers can refer to the "DataX Column".

OK, next let's follow the tutorial and use Chunjun to synchronize MySQL to MySQL.

02 Use FlinkX to synchronize MySQL to MySQL

2.1 Source code compilation

First, clone the Chunjun source code. To speed up the download, you can clone the Gitee mirror directly: https://gitee.com/dtstack_dev_0/chunjun.git .

After cloning, Chunjun's directory structure looks like this (annotated):

- bin                                    # launch scripts
  ├── chunjun-docker.sh                  # Docker launch script
  ├── chunjun-kubernetes-application.sh  # Kubernetes application-mode launch script
  ├── chunjun-kubernetes-session.sh      # Kubernetes session-mode launch script
  ├── chunjun-local.sh                   # local launch script
  ├── chunjun-standalone.sh              # standalone-mode launch script
  ├── chunjun-yarn-perjob.sh             # YARN per-job-mode launch script
  ├── chunjun-yarn-session.sh            # YARN session-mode launch script
  ├── start-chunjun                      # generic launch script
  └── submit.sh                          # job submission script
- build                                  # build scripts
  └── build.sh                           # build script
- chunjun-assembly                       # assembly module
- chunjun-clients                        # client module
- chunjun-connectors                     # connector modules
  ├── (multiple subdirectories)          # submodules for the various data connectors
- chunjun-core                           # core module
- chunjun-ddl                            # DDL modules
  ├── chunjun-ddl-base                   # DDL base module
  ├── chunjun-ddl-mysql                  # MySQL DDL module
  ├── chunjun-ddl-oracle                 # Oracle DDL module
- chunjun-dev                            # development tools
  ├── (multiple subdirectories)          # assorted development tools and resources
- chunjun-dirty                          # dirty-data handling modules
  ├── (multiple subdirectories)          # dirty-data handling submodules
- chunjun-docker                         # Docker-related modules
  ├── (multiple subdirectories)          # Docker resources and configuration
- chunjun-e2e                            # end-to-end tests
- chunjun-examples                       # examples
  ├── json                               # JSON examples
  └── sql                                # SQL examples
- chunjun-local-test                     # local test module
- chunjun-metrics                        # metrics/monitoring modules
  ├── (multiple subdirectories)          # the various monitoring modules
- chunjun-restore                        # restore (resume) modules
  ├── chunjun-restore-common             # common restore module
  └── chunjun-restore-mysql              # MySQL restore module

The compilation prerequisites are JDK and Maven, which won't be covered in detail here.

Execute the compilation command:

mvn clean package -DskipTests 

After the build succeeds, all artifacts can be found in the output directory under the project root.

Let's unzip it and see what's in this directory:

├── bin
│   ├── chunjun-docker.sh
│   ├── chunjun-kubernetes-application.sh
│   ├── chunjun-kubernetes-session.sh
│   ├── chunjun-local.sh
│   ├── chunjun-standalone.sh
│   ├── chunjun-yarn-perjob.sh
│   ├── chunjun-yarn-session.sh
│   ├── start-chunjun
│   └── submit.sh
├── chunjun-dist
│   ├── chunjun-core.jar
│   ├── connector
│   ├── ddl
│   ├── dirty-data-collector
│   ├── docker-build
│   ├── metrics
│   └── restore-plugins
├── chunjun-examples
│   ├── json
│   └── sql
└── lib
    ├── chunjun-clients.jar
    ├── log4j-1.2-api-2.19.0.jar
    ├── log4j-api-2.19.0.jar
    ├── log4j-core-2.19.0.jar
    └── log4j-slf4j-impl-2.19.0.jar

OK, with the above in place you can submit a job. The supported submission modes are local (the default), standalone, yarn-session, yarn-per-job, kubernetes-session, and kubernetes-application; see the ClusterMode class for details. Here we will submit in local mode.
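The mapping from submission mode to launch script can be read off the bin/ listing above; as a quick reference, a sketch inferred from the script names (not an official table):

```python
# Submission mode -> launch script in bin/, inferred from the script
# names in the unpacked bin/ directory shown earlier.
MODE_TO_SCRIPT = {
    "local": "chunjun-local.sh",
    "standalone": "chunjun-standalone.sh",
    "yarn-session": "chunjun-yarn-session.sh",
    "yarn-per-job": "chunjun-yarn-perjob.sh",
    "kubernetes-session": "chunjun-kubernetes-session.sh",
    "kubernetes-application": "chunjun-kubernetes-application.sh",
}

print(MODE_TO_SCRIPT["local"])  # chunjun-local.sh
```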

2.2 Submit tasks

We can run the bundled example after modifying it directly (located at "unpacked-directory/chunjun-examples/json/mysql").

Directly modify the contents of "mysql_mysql_realtime.json":

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "column": [
              {
                "name": "id",
                "type": "int"
              },
              {
                "name": "name",
                "type": "string"
              },
              {
                "name": "age",
                "type": "int"
              }
            ],
            "customSql": "",
            "where": "id < 1000",
            "splitPk": "id",
            "startLocation": "2",
            "polling": true,
            "pollingInterval": 3000,
            "queryTimeOut": 1000,
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": [
                  "jdbc:mysql://127.0.0.1:32306/test?useSSL=false"
                ],
                "table": [
                  "t_user"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://127.0.0.1:32306/test?useSSL=false",
                "table": [
                  "t_user_copy"
                ]
              }
            ],
            "writeMode": "insert",
            "flushIntervalMills":"3000",
            "uniqueKey": ["id"],
            "column": [
              {
                "name": "id",
                "type": "int"
              },
              {
                "name": "name",
                "type": "string"
              },
              {
                "name": "age",
                "type": "int"
              }
            ]
          }
        }
      }
    ],
    "setting": {
      "restore": {
        "restoreColumnName": "id"
      },
      "speed": {
        "channel": 1,
        "bytes": 0
      }
    }
  }
}
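Before submitting, it can be handy to sanity-check the job file's structure. A minimal sketch (the inline template below is trimmed from the full file above for illustration):

```python
import json

# A trimmed-down copy of the job template above, inlined so the
# snippet is self-contained.
job_text = """
{"job": {"content": [{
  "reader": {"name": "mysqlreader",
             "parameter": {"polling": true, "pollingInterval": 3000,
                           "startLocation": "2", "splitPk": "id"}},
  "writer": {"name": "mysqlwriter",
             "parameter": {"writeMode": "insert", "uniqueKey": ["id"]}}
}]}}
"""

content = json.loads(job_text)["job"]["content"][0]
reader, writer = content["reader"], content["writer"]
print(reader["name"], "->", writer["name"])  # mysqlreader -> mysqlwriter
print("poll every", reader["parameter"]["pollingInterval"], "ms")
```

In practice you would `json.load()` the real file instead; a parse error caught here is much quicker to diagnose than a failed submission.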

Submit the job:

cd <installation-directory>/bin
sh chunjun-local.sh -job ../chunjun-examples/json/mysql/mysql_mysql_realtime.json

When it starts up, the job configuration is printed to the log, and once running the output looks very similar to DataX's.

You can see that the data in the t_user table has been synchronized to the t_user_copy table.

03 End of article

OK, this article has briefly covered basic usage of FlinkX; follow-up articles will dig into its principles and source code. I hope it helps, and thank you for reading. This article is complete.


Origin blog.csdn.net/qq_20042935/article/details/134163899