Hive table data migration (production cluster -> test cluster)

We needed to test some Hive table logic in the test environment, but the test cluster had neither the relevant tables nor their data,
so we took the simplest route: download the files from the production cluster through Hue, then upload them to the test cluster through Hue.
However, the Hive table is partitioned and each partition contains many small files, so I wrote a shell script:

#!/bin/bash

# Date partitions to migrate; adjust the array as needed
rssc_array=("201901" "201902" "201903" "201904" "201905")

# Brand vw: pull each date partition from HDFS, zip it up, push the archive back to HDFS
mkdir -p ./tmp/table
for i in "${rssc_array[@]}"
do
    hdfs dfs -get /user/hive/table/partition_brand=vw/partition_date=$i ./tmp/table/
done
zip -r twdwv1.zip ./tmp/table/
hdfs dfs -put twdwv1.zip /user/asmp/sql/
rm -rf ./tmp/table
rm -f twdwv1.zip
echo "File successfully deleted"

# Brand skd: same steps, its own archive (recreate the working dir removed above)
mkdir -p ./tmp/table
for i in "${rssc_array[@]}"
do
    hdfs dfs -get /user/hive/table/partition_brand=skd/partition_date=$i ./tmp/table/
done
zip -r twdskd1.zip ./tmp/table/
hdfs dfs -put twdskd1.zip /user/asmp/sql/
rm -rf ./tmp/table
rm -f twdskd1.zip
echo "File successfully deleted2"

(1) The data is split by the brand partition into two compressed files for download, each uploaded back to HDFS.
(2) A custom array controls which date partitions are downloaded.
Then download the compressed files locally through Hue, decompress them, and upload the contents to the test cluster; a sketch of that side follows.
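On the test cluster, the upload side looks roughly like this. This is a minimal sketch assuming /user/hive/table matches the test table's HDFS location and using the vw archive name from the script above:

# after downloading twdwv1.zip via Hue to a machine with test-cluster HDFS access
unzip twdwv1.zip                    # recreates tmp/table/partition_date=... locally
hdfs dfs -mkdir -p /user/hive/table/partition_brand=vw
hdfs dfs -put tmp/table/partition_date=* /user/hive/table/partition_brand=vw/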
Finally, do not forget to repair the table's partition metadata with the MSCK command:
hive> MSCK REPAIR TABLE table_name;
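Afterwards, a quick way to confirm the repaired partitions are visible (table_name is a placeholder, as above):
hive> SHOW PARTITIONS table_name;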
