671-6.2.0 - How to Migrate Hive Metadata from CDH5.12 to CDH6.2

1

Purpose of This Document

This article assumes the following scenario: you need to migrate from CDH5.12 to CDH6.2, where CDH5.12 and CDH6.2 are two distinct clusters. The work consists mainly of migrating the HDFS data and the various kinds of metadata from CDH5.12 to CDH6.2. This article does not discuss the migration of the HDFS data, nor of other metadata such as CM or Sentry; it is concerned only with migrating the Hive metadata. The problem here is that Hive in CDH5.12 is 1.1, while Hive in CDH6.2 is already 2.1.1; this major version jump completely changes the schema of the Hive metadata stored in MySQL, so after importing the data from CDH5.12's MySQL into CDH6.2's MySQL, we need to upgrade the Hive metadata schema. Fayson will first build two clusters, one CDH5.12 and one CDH6.2, and then simulate real Hive data by creating partitions, views and a UDF, which makes it easy to verify that everything still runs normally on CDH6.2 after the migration. Fayson describes the specific migration steps in detail below.

  • Test environment

1.Redhat7.4

2.CDH6.2.0

3.MySQL administrator account

2

Migration Preparation

1. Prepare two clusters, one running CDH5.12.0 and the other running CDH6.2.0.

2. Prepare the same test data in both clusters.

3. Create the test Hive tables in the CDH5.12.0 cluster:

Create two databases

Create two tables and prepare the test data to load

Load the data into the two tables

Create a partitioned table and load test data

Create two views for testing

A view formed from the first row of the web_returns table

A view that performs grouped statistics on the partitioned table

Add a UDF for testing
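The test objects above can be sketched in HiveQL roughly as follows. This is a minimal sketch, not the exact DDL used in the test: the database name `test_db`, the column definitions, the UDF class name and the jar path are all illustrative assumptions; only the object names `web_returns`, `test_partition`, `records`, `group_by_year_vw`, `test_udf` and `parse_date` come from the article.

```sql
-- Hypothetical sketch; database name, columns, class and paths are assumptions.
CREATE DATABASE test_db;

-- A plain table to hold loaded test data
CREATE TABLE test_db.web_returns (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/tmp/web_returns.csv' INTO TABLE test_db.web_returns;

-- A partitioned table
CREATE TABLE test_db.test_partition (id INT, name STRING)
PARTITIONED BY (year INT);
LOAD DATA INPATH '/tmp/test_2019.csv'
INTO TABLE test_db.test_partition PARTITION (year = 2019);

-- A view formed from the first row of web_returns
CREATE VIEW test_db.records AS
SELECT * FROM test_db.web_returns LIMIT 1;

-- A view performing grouped statistics on the partitioned table
CREATE VIEW test_db.group_by_year_vw AS
SELECT year, COUNT(*) AS cnt
FROM test_db.test_partition
GROUP BY year;

-- A table and a UDF for testing
CREATE TABLE test_db.test_udf (dates STRING);
CREATE FUNCTION parse_date AS 'com.example.hive.udf.ParseDate'
USING JAR 'hdfs:///udf/parse_date.jar';
```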

4. Export the Hive metadata from the CDH5.12.0 cluster
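The export can be sketched with mysqldump as below. The host name and the metastore database name `metastore` are assumptions; replace them with the values for your environment.

```shell
# Assumptions: MySQL host and metastore DB name; adjust to your environment.
# Dumps the Hive metastore database of the CDH5.12.0 cluster to a file.
mysqldump -h cdh512-master.example.com -u root -p \
  --databases metastore --single-transaction \
  > hive_metastore_cdh512.sql
```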

3

Migration Steps

1. Import the metadata exported in the previous step into the MySQL database on CDH6.2.0
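A minimal sketch of the import, assuming the dump file from the previous step and a hypothetical MySQL host on the CDH6.2.0 side:

```shell
# Assumption: MySQL host on CDH6.2.0; adjust to your environment.
# The dump was taken with --databases, so it recreates the metastore DB itself.
mysql -h cdh62-master.example.com -u root -p < hive_metastore_cdh512.sql
```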

2. Upgrade the Hive metadata in the CDH6.2.0 cluster

Check the Hive version used by CDH5.12.0 and the steps required to upgrade to the Hive version used by CDH6.2.0
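This check can be done with the schematool command quoted in the summary below; with -dryRun it only lists the SQL upgrade scripts without executing anything. The version after -upgradeSchemaFrom is the Hive version you are upgrading from (1.1.0 for CDH5.12, an assumption based on the Hive 1.1 version stated above), not a CDH version.

```shell
# Run on a CDH6.2.0 node; -dryRun only prints the upgrade scripts to execute.
# Assumption: 1.1.0 is the source Hive version (CDH5.12 ships Hive 1.1).
schematool -dbType mysql -upgradeSchemaFrom 1.1.0 -dryRun \
  -userName <db_user_name> -passWord <db_user_pswd>
```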

Execute the five SQL upgrade scripts listed above, in order

The Hive metadata upgrade is complete

3. Perform the Hive metadata update

4. After the update, the Hive service reports an error: the OWNER_TYPE column cannot be found in the TBLS table

Log in to the MySQL database hosting the Hive metastore, execute the following statements, and then restart the Hive service; the service starts normally.

alter table TBLS add column OWNER_TYPE varchar(10) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL after OWNER;
update TBLS set OWNER_TYPE = 'USER';

5. Simulate migrating the Hive table data on HDFS

Create the directories for the two databases

Create the directories for the test tables

Upload the data of all tables to their corresponding HDFS directories
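The steps above can be sketched with hdfs commands as follows. The warehouse layout and the database name `test_db` are assumptions; use the paths recorded in your own metastore.

```shell
# Sketch, assuming the default Hive warehouse path and a hypothetical DB name.
hdfs dfs -mkdir -p /user/hive/warehouse/test_db.db
hdfs dfs -mkdir -p /user/hive/warehouse/test_db.db/web_returns
# Upload the local table data files into the table's HDFS directory
hdfs dfs -put web_returns/* /user/hive/warehouse/test_db.db/web_returns/
```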

6. Run the same queries on both C5 and C6 and check whether the results are consistent

Check the web_returns and web_sales tables:

select count(*) from web_returns;

CDH5.12.0

CDH6.2.0

select count(*) from web_sales;

CDH5.12.0

CDH6.2.0

select * from web_sales limit 1;
select * from web_returns limit 1;

CDH5.12.0

CDH6.2.0

Check the partitioned table test_partition:

select count(*) from test_partition where year = 2019;

CDH5.12.0

CDH6.2.0

Check the views:

select * from group_by_year_vw;

CDH5.12.0

CDH6.2.0

select * from records;

CDH5.12.0

CDH6.2.0

Check the UDF:

select parse_date(dates,'yyyy-MM-dd HH:mm:ss') from test_udf;

CDH5.12.0

CDH6.2.0

For all of the operations above, the same queries return consistent results on the two clusters.

Document Reference:

https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_schema_tool.html

4

Summary

1. In the command schematool -dbType mysql -upgradeSchemaFrom <version> -dryRun -passWord <db_user_pswd> -userName <db_user_name>, the version that follows the -upgradeSchemaFrom parameter is the version you are upgrading from, and it is a Hive version, not a CDH version; for example, the Hive version corresponding to CDH6.2.0 is 2.1.1. With -dryRun, this command lists the SQL scripts that must be executed to upgrade to the target version; executing them manually one by one completes the upgrade and lets you see the entire upgrade process.

2. After updating and upgrading the Hive metadata, the Hive service reported an error. From the log shown earlier in this document, the TBLS table is missing a column, possibly because that table's structure was not upgraded successfully during the upgrade process; adding the missing OWNER_TYPE column to the TBLS table fixes the problem.

3. After successfully migrating the Hive metadata to CDH6.2, the location information stored in the Hive metadata tables still points to the HDFS paths of CDH5.12. So even though the Hive metadata was migrated successfully, the Hive tables in CDH6.2 still cannot access their data on HDFS, and you need to follow the steps described earlier in this article to update the Hive metadata through Cloudera Manager. This is the same Hive metadata update that is performed when enabling or disabling HDFS HA in a CDH cluster, as in our case.

4. This article only migrates the Hive metadata and simulates migrating the HDFS data into the same directories. With this migration pattern, the table data directories are identical before and after the migration, and verification shows that no table data was lost.



Origin blog.csdn.net/javastart/article/details/104521460