01 What is FlinkX?
Official website: https://dtstack.github.io/chunjun/
Github: https://github.com/DTStack/chunjun
FlinkX has been renamed ChunJun. It is an open source data integration framework built on Flink that implements data synchronization and computation across multiple heterogeneous data sources, supporting both streaming and batch processing.
FlinkX abstracts different databases into reader/source plug-ins, writer/sink plug-ins, and lookup (dimension table) plug-ins. It has the following characteristics:
- Based on the real-time computing engine Flink; supports configuring tasks via JSON templates and is compatible with Flink SQL syntax;
- Supports distributed execution with multiple submission modes, such as flink-standalone, yarn-session, and yarn-per-job;
- Supports one-click Docker deployment and running on Kubernetes (K8S);
- Supports a wide range of heterogeneous data sources: more than 20, including MySQL, Oracle, SQLServer, Hive, and Kudu, for both synchronization and computation;
- Easy to extend and highly flexible: a newly added data source plug-in can immediately interoperate with existing plug-ins, and plug-in developers do not need to care about the code logic of other plug-ins;
- Supports not only full synchronization but also incremental synchronization and interval polling;
- Unifies batch and stream processing: supports offline synchronization and computation while also being compatible with real-time scenarios;
- Supports dirty-data storage and provides metrics monitoring;
- Works with Flink checkpoints to implement resumable (breakpoint) synchronization;
- Supports synchronizing not only DML data but also schema changes.
In fact, its JSON-based template configuration is somewhat similar to DataX. I have previously written a DataX tutorial; interested readers can refer to the "DataX Column".
OK, next let's follow the tutorial and use ChunJun to synchronize MySQL to MySQL.
02 Using FlinkX to synchronize MySQL to MySQL
2.1 Source code compilation
First, clone the ChunJun source code. To speed up the download, you can clone the Gitee mirror directly:
git clone https://gitee.com/dtstack_dev_0/chunjun.git
After the clone completes, the directory structure of ChunJun looks like this (with comments):
- bin # execution scripts
├── chunjun-docker.sh # Docker launch script
├── chunjun-kubernetes-application.sh # Kubernetes application-mode launch script
├── chunjun-kubernetes-session.sh # Kubernetes session-mode launch script
├── chunjun-local.sh # local-mode launch script
├── chunjun-standalone.sh # standalone-mode launch script
├── chunjun-yarn-perjob.sh # YARN per-job-mode launch script
├── chunjun-yarn-session.sh # YARN session-mode launch script
├── start-chunjun # generic launch script
└── submit.sh # job submission script
- build # build scripts
└── build.sh # build script
- chunjun-assembly # assembly module
- chunjun-clients # client module
- chunjun-connectors # connector modules
├── (multiple subdirectories) # sub-modules for the different data connectors
- chunjun-core # core module
- chunjun-ddl # DDL (data definition language) modules
├── chunjun-ddl-base # base DDL module
├── chunjun-ddl-mysql # MySQL DDL module
├── chunjun-ddl-oracle # Oracle DDL module
- chunjun-dev # development tools
├── (multiple subdirectories) # various development tools and resources
- chunjun-dirty # dirty-data handling modules
├── (multiple subdirectories) # sub-modules for different dirty-data handlers
- chunjun-docker # Docker-related modules
├── (multiple subdirectories) # Docker resources and configuration
- chunjun-e2e # end-to-end test module
- chunjun-examples # examples
├── json # JSON examples
└── sql # SQL examples
- chunjun-local-test # local test module
- chunjun-metrics # metrics/monitoring modules
├── (multiple subdirectories) # the different monitoring modules
- chunjun-restore # data-restore modules
├── chunjun-restore-common # common restore module
└── chunjun-restore-mysql # MySQL restore module
The prerequisites for compilation are a JDK and Maven; installing them will not be covered here.
Execute the compilation command:
mvn clean package -DskipTests
After the compilation succeeds, all artifacts can be found in the output directory under the project root. Unzipping it, the directory contains the following:
├── bin
│ ├── chunjun-docker.sh
│ ├── chunjun-kubernetes-application.sh
│ ├── chunjun-kubernetes-session.sh
│ ├── chunjun-local.sh
│ ├── chunjun-standalone.sh
│ ├── chunjun-yarn-perjob.sh
│ ├── chunjun-yarn-session.sh
│ ├── start-chunjun
│ └── submit.sh
├── chunjun-dist
│ ├── chunjun-core.jar
│ ├── connector
│ ├── ddl
│ ├── dirty-data-collector
│ ├── docker-build
│ ├── metrics
│ └── restore-plugins
├── chunjun-examples
│ ├── json
│ └── sql
└── lib
├── chunjun-clients.jar
├── log4j-1.2-api-2.19.0.jar
├── log4j-api-2.19.0.jar
├── log4j-core-2.19.0.jar
└── log4j-slf4j-impl-2.19.0.jar
OK, with the artifacts above, we can submit a job. The supported submission modes are: local (the default), standalone, yarn-session, yarn-per-job, kubernetes-session, and kubernetes-application; refer to the ClusterMode class for details. Next, we will submit a job in local mode.
2.2 Submitting a job
We can run the bundled example directly by modifying its configuration (located at "unzip directory/chunjun-examples/json/mysql"). Edit the contents of "mysql_mysql_realtime.json":
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "column": [
              { "name": "id", "type": "int" },
              { "name": "name", "type": "string" },
              { "name": "age", "type": "int" }
            ],
            "customSql": "",
            "where": "id < 1000",
            "splitPk": "id",
            "startLocation": "2",
            "polling": true,
            "pollingInterval": 3000,
            "queryTimeOut": 1000,
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": [
                  "jdbc:mysql://127.0.0.1:32306/test?useSSL=false"
                ],
                "table": [
                  "t_user"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://127.0.0.1:32306/test?useSSL=false",
                "table": [
                  "t_user_copy"
                ]
              }
            ],
            "writeMode": "insert",
            "flushIntervalMills": "3000",
            "uniqueKey": ["id"],
            "column": [
              { "name": "id", "type": "int" },
              { "name": "name", "type": "string" },
              { "name": "age", "type": "int" }
            ]
          }
        }
      }
    ],
    "setting": {
      "restore": {
        "restoreColumnName": "id"
      },
      "speed": {
        "channel": 1,
        "bytes": 0
      }
    }
  }
}
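The job above reads from t_user and writes to t_user_copy, so both tables must already exist in the test database. The original tutorial does not show their schema; based on the column configuration (id int, name string, age int), a plausible DDL, which is an assumption and should be adjusted to your actual data, would be:

```sql
-- Hypothetical schema inferred from the job's column config (not from the
-- original tutorial); VARCHAR length and primary key are assumptions.
CREATE TABLE t_user (
  id   INT PRIMARY KEY,
  name VARCHAR(255),
  age  INT
);

-- The sink table mirrors the source table's structure.
CREATE TABLE t_user_copy LIKE t_user;
```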
Submit the job:
cd <install directory>/bin
sh chunjun-local.sh -job ../chunjun-examples/json/mysql/mysql_mysql_realtime.json
On startup, the job prints its logs; once it is running, the output feels very similar to DataX. You can then see that the data in the t_user table has been synchronized to the t_user_copy table.
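Because the reader sets "polling": true, the job does not stop after the initial read: every pollingInterval milliseconds it re-queries the table, using the incremental key (splitPk, starting from startLocation) to fetch only rows beyond the last position it saw. The following standalone Python sketch is purely conceptual (it is not ChunJun code; `poll_increment` and the in-memory list standing in for the MySQL table are illustrative inventions):

```python
# Conceptual sketch of ChunJun's interval polling: repeatedly query for rows
# whose incremental key exceeds the last seen location, then advance it.

def poll_increment(table, start_location, rounds):
    """Simulate `polling: true` with an incremental key column `id`."""
    location = start_location
    synced = []
    for _ in range(rounds):  # one iteration per pollingInterval tick
        # Equivalent to: SELECT * FROM t_user WHERE id > :location ORDER BY id
        new_rows = [row for row in table if row["id"] > location]
        if new_rows:
            synced.extend(new_rows)
            location = max(row["id"] for row in new_rows)  # advance position
    return synced, location

table = [{"id": 1, "name": "a", "age": 10},
         {"id": 2, "name": "b", "age": 20},
         {"id": 3, "name": "c", "age": 30}]

# startLocation "2" in the job config means: begin after id = 2
rows, loc = poll_increment(table, start_location=2, rounds=1)
print(rows)  # only the row with id = 3 is picked up

# Rows appearing later are captured by a subsequent polling round
table.append({"id": 4, "name": "d", "age": 40})
more, loc = poll_increment(table, start_location=loc, rounds=1)
print(more)  # the new row with id = 4
```

This is also why the job config sets splitPk and restoreColumnName to id: a monotonically increasing key lets the reader resume from the last synchronized position.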
03 Conclusion
OK, this article briefly walked through basic usage of FlinkX (ChunJun); later posts will dig into its principles and source code. I hope it helps. Thank you for reading — this article is complete.