Configure DataX

1. Configure DataX

1) Download the DataX installer package and upload it to /opt/software on hadoop102
Download link: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
2) Extract datax.tar.gz to /opt/module
[atguigu@hadoop102 software]$ tar -zxvf datax.tar.gz -C /opt/module/
3) Self-check: run the following command
[atguigu@hadoop102 ~]$ python /opt/module/datax/bin/datax.py /opt/module/datax/job/job.json
If output like the following appears, the installation succeeded:
……
2021-10-12 21:51:12.335 [job-0] INFO  JobContainer - 
Job start time                  : 2021-10-12 21:51:02
Job end time                    : 2021-10-12 21:51:12
Total job duration              :                 10s
Average job traffic             :          253.91KB/s
Record write speed              :          10000rec/s
Total records read              :              100000
Total read/write failures       :                   0



2. DataX Cases

Using DataX is straightforward: pick the Reader and Writer that match the source and destination of the data you want to synchronize, describe them in a JSON configuration file, and then submit the synchronization job with the following command.

[atguigu@hadoop102 datax]$ python bin/datax.py path/to/your/job.json
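To see which Readers and Writers a particular installation ships with, you can list DataX's plugin directories; a quick sketch, assuming the /opt/module/datax install path used above:

[atguigu@hadoop102 datax]$ ls /opt/module/datax/plugin/reader
[atguigu@hadoop102 datax]$ ls /opt/module/datax/plugin/writer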

2.1 TableMode of MySQLReader

In TableMode the reader is configured with explicit table, column, and where parameters, as in the job file created below.

[gpb@hadoop102 datax]$ cd job/
[gpb@hadoop102 job]$ vim base_province.json

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "id",
                            "name",
                            "region_id",
                            "area_code",
                            "iso_code",
                            "iso_3166_2"
                        ],
                        "where": "id>=3",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://hadoop102:3306/gmall"
                                ],
                                "table": [
                                    "base_province"
                                ]
                            }
                        ],
                        "password": "000000",
                        "splitPk": "",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "id",
                                "type": "bigint"
                            },
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "region_id",
                                "type": "string"
                            },
                            {
                                "name": "area_code",
                                "type": "string"
                            },
                            {
                                "name": "iso_code",
                                "type": "string"
                            },
                            {
                                "name": "iso_3166_2",
                                "type": "string"
                            }
                        ],
                        "compress": "gzip",
                        "defaultFS": "hdfs://hadoop102:8020",
                        "fieldDelimiter": "\t",
                        "fileName": "base_province",
                        "fileType": "text",
                        "path": "/base_province",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
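Note that HDFS Writer appends files into the target directory but generally expects that directory to exist already, so it is worth creating the target path once before the first run (the path below simply mirrors the config above):

[atguigu@hadoop102 datax]$ hadoop fs -mkdir -p /base_province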

Submit the job from the DataX root directory:
[atguigu@hadoop102 datax]$ python bin/datax.py job/base_province.json

2.2 QuerySQLMode of MySQLReader

In QuerySQLMode the reader is given a complete querySql statement instead of the column, table, and where parameters used in TableMode.
1) Write the configuration file
(1) Create the configuration file base_province_sql.json
[atguigu@hadoop102 ~]$ vim /opt/module/datax/job/base_province_sql.json
(2) The contents of the configuration file are as follows


{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://hadoop102:3306/gmall"
                                ],
                                "querySql": [
                                    "select id,name,region_id,area_code,iso_code,iso_3166_2 from base_province where id>=3"
                                ]
                            }
                        ],
                        "password": "000000",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "id",
                                "type": "bigint"
                            },
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "region_id",
                                "type": "string"
                            },
                            {
                                "name": "area_code",
                                "type": "string"
                            },
                            {
                                "name": "iso_code",
                                "type": "string"
                            },
                            {
                                "name": "iso_3166_2",
                                "type": "string"
                            }
                        ],
                        "compress": "gzip",
                        "defaultFS": "hdfs://hadoop102:8020",
                        "fieldDelimiter": "\t",
                        "fileName": "base_province",
                        "fileType": "text",
                        "path": "/base_province",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}


2) Submit the job
(1) Clear the historical data
[atguigu@hadoop102 datax]$ hadoop fs -rm -r -f /base_province/*
(2) Go to the DataX root directory
[atguigu@hadoop102 datax]$ cd /opt/module/datax 
(3) Run the following command
[atguigu@hadoop102 datax]$ python bin/datax.py job/base_province_sql.json
3) Check the results
(1) DataX log output
2021-10-13 11:13:14.930 [job-0] INFO  JobContainer - 
Job start time                  : 2021-10-13 11:13:03
Job end time                    : 2021-10-13 11:13:14
Total job duration              :                 11s
Average job traffic             :               66B/s
Record write speed              :              3rec/s
Total records read              :                  32
Total read/write failures       :                   0
(2) View the HDFS files
[atguigu@hadoop102 datax]$ hadoop fs -cat /base_province/* | zcat
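To cross-check the record count against the log above, counting the decompressed lines is a simple sanity check:

[atguigu@hadoop102 datax]$ hadoop fs -cat /base_province/* | zcat | wc -l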


2.3 Passing Parameters to DataX

An offline data synchronization job is normally re-run on a daily schedule, so the target path on HDFS usually contains a date level that separates each day's data. In other words, the target path is not fixed, and the value of the HDFS Writer path parameter in the DataX configuration file must therefore be dynamic. This is what DataX's parameter-passing feature is for.
Usage: reference a parameter inside the JSON configuration file as ${param}, and pass its value when submitting the job with -p"-Dparam=value". A concrete example follows.
1) Write the configuration file
(1) Modify the configuration file base_province.json
[atguigu@hadoop102 ~]$ vim /opt/module/datax/job/base_province.json
(2) The contents of the configuration file are as follows
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://hadoop102:3306/gmall"
                                ],
                                "querySql": [
                                    "select id,name,region_id,area_code,iso_code,iso_3166_2 from base_province where id>=3"
                                ]
                            }
                        ],
                        "password": "000000",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "id",
                                "type": "bigint"
                            },
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "region_id",
                                "type": "string"
                            },
                            {
                                "name": "area_code",
                                "type": "string"
                            },
                            {
                                "name": "iso_code",
                                "type": "string"
                            },
                            {
                                "name": "iso_3166_2",
                                "type": "string"
                            }
                        ],
                        "compress": "gzip",
                        "defaultFS": "hdfs://hadoop102:8020",
                        "fieldDelimiter": "\t",
                        "fileName": "base_province",
                        "fileType": "text",
                        "path": "/base_province/${dt}",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
2) Submit the job
(1) Create the target path
[atguigu@hadoop102 datax]$ hadoop fs -mkdir /base_province/2020-06-14
(2) Go to the DataX root directory
[atguigu@hadoop102 datax]$ cd /opt/module/datax 
(3) Run the following command
[atguigu@hadoop102 datax]$ python bin/datax.py -p"-Ddt=2020-06-14" job/base_province.json
3) Check the results
[atguigu@hadoop102 datax]$ hadoop fs -ls /base_province
Found 2 items
drwxr-xr-x   - atguigu supergroup          0 2021-10-15 21:41 /base_province/2020-06-14
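In production this job is typically launched by a scheduler that computes the date itself. The wrapper script below is only a minimal sketch of that idea and not part of the original tutorial: the script name and the default-to-yesterday convention are assumptions, while the paths and the -p"-Ddt=..." usage follow the steps above.

#!/bin/bash
# mysql_to_hdfs.sh -- illustrative daily wrapper (hypothetical name), using the DataX layout above.
# Usage: mysql_to_hdfs.sh [yyyy-mm-dd]; defaults to yesterday when no date is given.
DATAX_HOME=/opt/module/datax

# Use the argument if present, otherwise yesterday's date.
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=$(date -d '-1 day' +%F)
fi

# HDFS Writer expects the dated target directory to exist, so create it first.
hadoop fs -mkdir -p /base_province/$do_date

# Pass the date into the ${dt} placeholder of the job file.
python $DATAX_HOME/bin/datax.py -p"-Ddt=$do_date" $DATAX_HOME/job/base_province.json

For example, mysql_to_hdfs.sh 2020-06-14 would backfill the partition used in the steps above.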

3. Case: Synchronize HDFS Data to MySQL

Case requirement: synchronize the data in the /base_province directory on HDFS into the test_province table of the gmall database in MySQL.
Requirement analysis: implementing this requires HDFSReader and MySQLWriter.



1) Write the configuration file
(1) Create the configuration file test_province.json
[atguigu@hadoop102 ~]$ vim /opt/module/datax/job/test_province.json
(2) The contents of the configuration file are as follows
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "defaultFS": "hdfs://hadoop102:8020",
                        "path": "/base_province",
                        "column": [
                            "*"
                        ],
                        "fileType": "text",
                        "compress": "gzip",
                        "encoding": "UTF-8",
                        "nullFormat": "\\N",
                        "fieldDelimiter": "\t"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "username": "root",
                        "password": "000000",
                        "connection": [
                            {
                                "table": [
                                    "test_province"
                                ],
                                "jdbcUrl": "jdbc:mysql://hadoop102:3306/gmall?useUnicode=true&characterEncoding=utf-8"
                            }
                        ],
                        "column": [
                            "id",
                            "name",
                            "region_id",
                            "area_code",
                            "iso_code",
                            "iso_3166_2"
                        ],
                        "writeMode": "replace"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
2) Submit the job
(1) Create the gmall.test_province table in MySQL
DROP TABLE IF EXISTS `test_province`;
CREATE TABLE `test_province`  (
  `id` bigint(20) NOT NULL,
  `name` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `region_id` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `area_code` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `iso_code` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `iso_3166_2` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
(2) Go to the DataX root directory
[atguigu@hadoop102 datax]$ cd /opt/module/datax 
(3) Run the following command
[atguigu@hadoop102 datax]$ python bin/datax.py job/test_province.json
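To double-check the load on the MySQL side, a simple row count against the target table is enough; the mysql client call below is a sketch, with the credentials mirroring the writer config above:

[atguigu@hadoop102 datax]$ mysql -uroot -p000000 -e "SELECT COUNT(*) FROM gmall.test_province;"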

Source: blog.csdn.net/qq_45972323/article/details/132371187