sqoop: Importing MySQL data into Hive when the data contains characters such as \001 or \n

Scenario

When using sqoop to import data from MySQL into Hive, if the data contains the column delimiter specified for the Hive table, such as \001 or \t, the columns will be misaligned in Hive; if the data contains a newline character \n, a single source row will be split into two rows in Hive.
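For instance, take a hypothetical source row (table and values made up purely for illustration) whose remark column contains an embedded newline:

id | name  | remark
1  | alice | line one\nline two

After a plain import, Hive treats the embedded \n as a row terminator, so the single source row reads back as two rows, the second padded with NULLs:

1           alice       line one
line two    NULL        NULL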

Solution

When executing sqoop, use the following parameters:

--hive-drop-import-delims When importing into Hive, drop \n, \r, and \01 from string fields.
--hive-delims-replacement When importing into Hive, replace \n, \r, and \01 in string fields with a user-defined string.
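A minimal sketch of the drop variant, assuming a plain --hive-import run (the connection details, credentials, and table names below are placeholders):

# Drop \n, \r, and \01 from string fields instead of replacing them
sqoop-import \
--connect jdbc:mysql://ip:port/databasesName \
--username xxxx \
--password xxxx \
--table tableName \
--hive-import \
--hive-table tableName \
--hive-drop-import-delims \
-m 1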

A fuller example that uses --hive-delims-replacement instead:

sqoop-import \
--connect jdbc:mysql://ip:port/databasesName \
--username xxxx \
--password xxxx \
--table tableName \
--target-dir /data/tableDir \
--fields-terminated-by '\001' \
-m 1 \
--split-by stat_date \
--delete-target-dir \
--hive-delims-replacement ''

Where:

--target-dir: location of the Hive table in HDFS.
--fields-terminated-by: column delimiter to use when importing into Hive.
-m: number of map tasks to run in parallel.
--split-by: field used to split the data across map tasks. If -m were set to 4 and the table had 100 rows, sqoop would first fetch the minimum and maximum of the split field and divide the range into chunks with a stride of 100/4 = 25.
--delete-target-dir: delete the target directory in HDFS before importing, equivalent to an overwrite.
--hive-delims-replacement '': replace the special characters with an empty string.
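One quick sanity check after the import is to compare row counts on both sides; if embedded newlines slipped through, the Hive count comes out higher than the MySQL count. A sketch, assuming the same placeholder host, credentials, and names as above:

# Row count in the source MySQL table
mysql -h ip -P port -u xxxx -p -e "SELECT COUNT(*) FROM databasesName.tableName"

# Row count in the Hive table; a higher number here means rows were split
hive -e "SELECT COUNT(*) FROM tableName"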

Other Hive-related options for Sqoop imports

Argument                        Description
--hive-home <dir>               Override the location of $HIVE_HOME; if omitted, the $HIVE_HOME from the environment is used.
--hive-import                   Import the table into Hive (if no delimiter is set, Hive's default delimiter is used).
--hive-overwrite                When importing into Hive, overwrite the existing data in the Hive table.
--create-hive-table             If the Hive table does not exist, create it automatically; if it already exists, the job fails with an error.
--hive-table <table-name>       Set the target Hive table.
--hive-drop-import-delims       When importing into Hive, drop the \n, \r, and \01 characters contained in the original data.
--hive-delims-replacement       When importing into Hive, replace \n, \r, and \01 in the original data with a custom string.
--hive-partition-key            Specify the partition field of the Hive table.
--hive-partition-value <v>      Specify the value of the partition field for the imported Hive table.
--map-column-hive <map>         When importing into Hive, override the data type of a column; for example, to map the ID column to String: --map-column-hive ID=String
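A sketch combining the partition and type-mapping options above; the partition key, partition value, and column name are placeholders chosen for illustration:

# Import into a specific Hive partition and force the ID column to String
sqoop-import \
--connect jdbc:mysql://ip:port/databasesName \
--username xxxx \
--password xxxx \
--table tableName \
--hive-import \
--hive-table tableName \
--hive-partition-key stat_date \
--hive-partition-value '2020-09-10' \
--map-column-hive ID=String \
-m 1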

Source: blog.csdn.net/x950913/article/details/108516635