[Sqoop] Importing data into a specified partition of a Hive table stored in ORC format

This article covers importing a MySQL table into a Hive partitioned table stored in ORC format.

For importing a MySQL table into a Hive partitioned table stored in TEXTFILE format, see the companion article (linked in the original post).

Sqoop relies on the HCatalog libraries, so the $HCAT_HOME environment variable needs to be configured; the relevant hcatalog directory can normally be found inside the Hive installation. The required steps are listed below and consolidated into a single sketch after the list.

  • Copy hive-hcatalog-core-1.2.2.jar from hive/lib to sqoop/lib
  • cp $HIVE_HOME/lib/hive-shims* $SQOOP_HOME/lib/ 
  • Add the following line to the /etc/profile file:
export HCATALOG_HOME=${HIVE_HOME}/hcatalog
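
A minimal consolidated sketch of the preparation steps above, assuming $HIVE_HOME and $SQOOP_HOME are already set and the Hive version is 1.2.2 (adjust the jar name to your distribution; on some distributions the HCatalog jar lives under $HIVE_HOME/hcatalog/share/hcatalog instead of $HIVE_HOME/lib):

# copy the HCatalog core jar and the hive-shims jars into Sqoop's classpath
cp $HIVE_HOME/lib/hive-hcatalog-core-1.2.2.jar $SQOOP_HOME/lib/
cp $HIVE_HOME/lib/hive-shims* $SQOOP_HOME/lib/

# point the environment at the HCatalog installation that ships with Hive
echo 'export HCATALOG_HOME=${HIVE_HOME}/hcatalog' >> /etc/profile
source /etc/profile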

 Variables that need to be assigned in the script below (an example assignment block follows this list):

${IP} IP address of the server where MySQL is running

${USERNAME} MySQL user name

${PWD} MySQL password

${MYSQLDB} MySQL database name

${MYSQLTABLE} MySQL source table name

${date_field} the time column in the MySQL table

${partition_name} name of the partition column to be added in Hive

${partition_value} Hive partition value
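
For illustration only, a hypothetical assignment block at the top of the import script might look like the following (all values are placeholders; note that PWD is also a bash built-in variable holding the current directory, so a different name may be safer in practice):

# placeholder values -- replace with your own environment
IP=192.168.1.100            # server where MySQL runs
USERNAME=root               # MySQL user
PWD=123456                  # MySQL password (PWD shadows the bash built-in)
MYSQLDB=mydb                # MySQL database name
MYSQLTABLE=t_user           # MySQL source table
date_field=create_time      # time column in the MySQL table
partition_name=dt           # Hive partition column to add
partition_value=2020-10-30  # Hive partition value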

1. Sqoop creates the Hive ORC table and imports data into it

sqoop import \
--connect jdbc:mysql://$IP:3306/$MYSQLDB \
--username $USERNAME \
--password $PWD \
--table $MYSQLTABLE \
--driver com.mysql.jdbc.Driver \
--hcatalog-database intelligentCoal \
--create-hcatalog-table \
--hcatalog-table t_user_orc \
--where "date_format(${date_field},'%Y-%m-%d')='${partition_value}' and \$CONDITIONS" \
--hcatalog-partition-keys ${partition_name} \
--hcatalog-partition-values ${partition_value} \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m 1

 The --where option can be added or omitted as needed; omitting it has the same effect as --where "1=1 and \$CONDITIONS".
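
After the import finishes, the new table and its partition can be checked from the Hive CLI; a quick sketch using the database and table names from the command above:

# verify that the partition was created and spot-check a few rows
hive -e "use intelligentCoal; show partitions t_user_orc;"
hive -e "select * from intelligentCoal.t_user_orc limit 10;"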

2. Sqoop imports data into an existing Hive ORC table

sqoop import \
--connect jdbc:mysql://$IP:3306/$MYSQLDB \
--username $USERNAME \
--password $PWD \
--table $MYSQLTABLE \
--driver com.mysql.jdbc.Driver \
--hcatalog-database intelligentCoal \
--hcatalog-table t_user_orc \
--where "date_format(${date_field},'%Y-%m-%d')='${partition_value}' and \$CONDITIONS" \
--hcatalog-partition-keys ${partition_name} \
--hcatalog-partition-values ${partition_value} \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m 1
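
For case 2 the target table must already exist. A hypothetical example of creating a matching partitioned ORC table beforehand (the column list and the partition column dt are illustrative, not from the original post):

# create the partitioned ORC table before running the import (example schema)
hive -e "
create table if not exists intelligentCoal.t_user_orc (
  id bigint,
  name string,
  create_time string
)
partitioned by (dt string)
stored as orc
tblproperties ('orc.compress'='SNAPPY');
"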

If the column type is not specified, varchar data in MySQL is imported into Hive as the varchar type, and operating on varchar columns in Hive causes various problems:

  1. Long text and text containing special characters is extracted incompletely

  2. Operating on varchar columns of a Hive ORC table produces garbled characters

Solution: specify the column type when extracting the data (xxx is the column you want to map to the String type):

--map-column-hive xxx=String,xxxx=String
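
For example, the option can be appended to either import command above; a sketch with hypothetical column names (remarks, description) forced to String:

sqoop import \
--connect jdbc:mysql://$IP:3306/$MYSQLDB \
--username $USERNAME \
--password $PWD \
--table $MYSQLTABLE \
--driver com.mysql.jdbc.Driver \
--hcatalog-database intelligentCoal \
--hcatalog-table t_user_orc \
--hcatalog-partition-keys ${partition_name} \
--hcatalog-partition-values ${partition_value} \
--map-column-hive remarks=String,description=String \
-m 1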

 -------------------------------------------------------------------------------------------------------------------

--connect                    JDBC connection string
--username                    JDBC authentication user name
--password                    JDBC authentication password
--table                       name of the source table to import
--driver                      JDBC driver class to use
--create-hcatalog-table       create the target table; if omitted, the table is not created by default. Note that if the specified table already exists, the job fails with an error
--hcatalog-table              name of the target HCatalog/Hive table
--hcatalog-storage-stanza     storage-format clause spliced into the generated CREATE TABLE statement; default: stored as rcfile
--hcatalog-partition-keys     partition column(s); separate multiple columns with commas (enhanced version of --hive-partition-key)
--hcatalog-partition-values   partition value(s); separate multiple values with commas (enhanced version of --hive-partition-value)


Origin blog.csdn.net/qq_44065303/article/details/109379729