Common Hive operations: ways of importing and exporting data in Hive

As a data warehouse, Hive stores vast amounts of user data. In everyday use it is inevitable that you will need to import external data into Hive or export data out of it. Today we will look at several ways of importing and exporting Hive data.

I. Hive data import methods

This part covers four methods:

Importing data from the local file system into a Hive table;

Importing data from HDFS into a Hive table;

Querying data from another table and inserting the results into a Hive table;

Creating a table and populating it with records queried from another table (CTAS).

1. Importing data from the local file system into a Hive table

The basic syntax:

load data local inpath 'local file path' into table table_name;

First, create a table in Hive (this demo follows an example from the web), as follows:

hive> create table wyp

(id int, name string,

age int, tel string)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\t'

STORED AS TEXTFILE;

OK

Time taken: 2.832 seconds

This table is very simple, with only four fields whose meanings are self-explanatory. Suppose the local file system contains a file /home/wyp/wyp.txt with the following contents:

[wyp@master ~]$ cat wyp.txt

1 wyp 25 13188888888888

2 test 30 13888888888888

3 zs 34 899314121

The columns in wyp.txt are separated by \t. The data in this file can be loaded into the wyp table with the following statement:

hive> load data local inpath 'wyp.txt' into table wyp;

Copying data from file:/home/wyp/wyp.txt

Copying file: file:/home/wyp/wyp.txt

Loading data to table default.wyp

Table default.wyp stats:

[num_partitions: 0, num_files: 1, num_rows: 0, total_size: 67]

OK

Time taken: 5.967 seconds

This imports the contents of wyp.txt into the wyp table. You can verify this by listing the table's data directory with the following command:

hive> dfs -ls /user/hive/warehouse/wyp ;

Found 1 items

-rw-r--r--   3 wyp supergroup         67 2014-02-19 18:23 /user/hive/warehouse/wyp/wyp.txt

One thing to be aware of:

Unlike the relational databases we are familiar with, Hive (at the time of writing) does not support supplying a set of records as literal text inside an insert statement; that is, Hive does not support statements of the form INSERT INTO ... VALUES.
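For the record, this limitation applies to the Hive versions current when this was written; Hive 0.14 and later do support INSERT INTO ... VALUES. On older versions, a common workaround is to stage the records in a local file and load it. A minimal sketch, assuming Hive 0.14+ for the first statement and a hypothetical tab-delimited staging file /home/wyp/one_row.txt for the second:

hive> insert into table wyp values (4, 'lisi', 28, '13512345678');   -- Hive 0.14+ only

hive> load data local inpath '/home/wyp/one_row.txt' into table wyp; -- older versions: stage the row in a file, then load it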

2. Importing data from HDFS into a Hive table

The basic syntax:

load data inpath 'HDFS file path' into table table_name;

When data is imported from the local file system into a Hive table, it is first copied to a temporary directory on HDFS (typically the uploading user's HDFS home directory, such as /home/wyp/), and then moved (note: moved, not copied!) from that temporary directory into the Hive table's data directory. That being the case, Hive naturally also supports moving data directly from a directory on HDFS into the corresponding Hive table's data directory. Consider the file /home/wyp/add.txt; the specific operation is as follows:

[wyp@master /home/q/hadoop-2.2.0]$ bin/hadoop fs -cat /home/wyp/add.txt

5 wyp1 23 131212121212

6 wyp2 24 134535353535

7 wyp3 25 132453535353

8 wyp4 26 154243434355

The above is the data we need to insert. This file is located in the /home/wyp directory on HDFS (unlike the file in section 1, which was on the local file system). We can load its contents into the Hive table with the following command:

hive> load data inpath '/home/wyp/add.txt' into table wyp;

Loading data to table default.wyp

Table default.wyp stats:

[num_partitions: 0, num_files: 2, num_rows: 0, total_size: 215]

OK

Time taken: 0.47 seconds
hive> select * from wyp;

OK

5       wyp1    23      131212121212

6       wyp2    24      134535353535

7       wyp3    25      132453535353

8       wyp4    26      154243434355

1       wyp     25      13188888888888

2       test    30      13888888888888

3       zs      34      899314121

Time taken: 0.096 seconds, Fetched: 7 row(s)

We can see from the results above that the data was indeed imported into the wyp table! Note that load data inpath '/home/wyp/add.txt' into table wyp; does not contain the word local; that is the difference from the statement in section 1.
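Also note that this load moves the file on HDFS rather than copying it, so after the statement above, /home/wyp/add.txt no longer exists at its original location. And if you want to replace the table's existing contents instead of appending, add the overwrite keyword. A hedged sketch (the file name other.txt is purely illustrative):

hive> dfs -ls /home/wyp/ ;   -- add.txt is gone: the load moved it into the warehouse directory

hive> load data inpath '/home/wyp/other.txt' overwrite into table wyp;   -- replaces the table's existing data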

3. Querying data from another table and inserting the results into a Hive table

The basic syntax:

insert into table target_table_name

   [partition (partition_column=value)]

   select field_list

   from source_table_name;

Suppose Hive has a test table, created with the following statement:

hive> create table test(

id int, name string

,tel string)

partitioned by

(age int)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\t'

STORED AS TEXTFILE;

OK

Time taken: 0.261 seconds

The create statement is similar to that of the wyp table, except that the test table uses age as a partition column. A word of explanation about partitions:

Partition: in Hive, each partition of a table corresponds to a subdirectory under the table's directory, and all of a partition's data is stored in that directory. For example, if the wyp table had two partition columns, dt and city, then the directory corresponding to dt=20131218, city=BJ would be /user/hive/warehouse/wyp/dt=20131218/city=BJ, and all data belonging to that partition would be stored in this directory.
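Once the partitioned insert in the next step has run, you can see this layout for yourself. A small sketch, assuming the default warehouse location:

hive> show partitions test;   -- lists the table's partitions, e.g. age=25

hive> dfs -ls /user/hive/warehouse/test ;   -- each partition is a subdirectory such as age=25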

The following statement queries the wyp table and inserts the results into the test table:

hive> insert into table test

partition (age='25')

select id, name, tel

from wyp;

#########################################

A batch of MapReduce job information is printed here; omitted for brevity

#########################################

Total MapReduce CPU Time Spent: 1 seconds 310 msec

OK

Time taken: 19.125 seconds

hive> select * from test;

OK

5       wyp1    131212121212    25

6       wyp2    134535353535    25

7       wyp3    132453535353    25

8       wyp4    154243434355    25

1       wyp     13188888888888  25

2       test    13888888888888  25

3       zs      899314121       25

Time taken: 0.126 seconds, Fetched: 7 row(s)

One point of explanation: as noted earlier, the traditional insert into table ... values (field 1, field 2, ...) form of inserting data is not supported by Hive.
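The insert above writes every row into one fixed partition (age='25'). If instead you want Hive to route each row into a partition based on a column's value, you can use dynamic partitioning. A sketch, assuming your Hive version and configuration permit it (the two set commands enable it for the session):

hive> set hive.exec.dynamic.partition=true;

hive> set hive.exec.dynamic.partition.mode=nonstrict;

hive> insert into table test

partition (age)

select id, name, tel, age

from wyp;   -- the partition column must come last in the select list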

4. Creating a table and populating it with records queried from another table

The basic syntax:

create table new_table_name

  as

  select field_list (comma-separated)

  from source_table_name;

In practice, a query's output may be too large to display conveniently on the console. In such cases it is very convenient to write the query results directly into a new Hive table. This is known as CTAS (create table ... as select) and works as follows:

hive> create table test4

> as

> select id, name, tel

> from wyp;

hive> select * from test4;

OK

5       wyp1    131212121212

6       wyp2    134535353535

7       wyp3    132453535353

8       wyp4    154243434355

1       wyp     13188888888888

2       test    13888888888888

3       zs      899314121

Time taken: 0.089 seconds, Fetched: 7 row(s)

The data has been inserted into the test4 table. The CTAS operation is atomic, so if the select query fails for some reason, the new table is not created at all!
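Related to CTAS, Hive also offers create table ... like, which copies only the table definition and none of the data. A quick sketch (test5 is an illustrative name):

hive> create table test5 like wyp;   -- empty table with the same schema as wyp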

II. Hive data export methods

Depending on where the data is exported to, these fall into three types:

Export to the local file system;

Export to HDFS;

Export to another Hive table.

1. Exporting to the local file system

The basic syntax:

insert overwrite local directory 'local directory path'

select field_list from hive_table_name;

Export the data in the wyp table to the local file system:

hive> insert overwrite local directory '/home/wyp/wyp'

  > select * from wyp;

Executing this HQL requires a MapReduce job. After the statement finishes, it generates files under the local /home/wyp/wyp directory; these files are the output of the reduce stage.
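Be aware that the exported files separate columns with Hive's default delimiter \001 (Ctrl-A), not \t. Since Hive 0.11 you can specify a delimiter on the export itself; a sketch, assuming Hive 0.11 or later:

hive> insert overwrite local directory '/home/wyp/wyp'

row format delimited

fields terminated by '\t'

select * from wyp;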

2. Exporting to HDFS

The basic syntax:

insert overwrite directory 'HDFS directory path'

select field_list from hive_table_name;

Export the data in the wyp table to HDFS:

hive> insert overwrite directory '/home/wyp/hdfs'

  > select * from wyp;

The exported data will be stored under the /home/wyp/hdfs directory on HDFS. Note that this HQL differs from exporting to the local file system in one small way: the word local is omitted, and the data storage path is interpreted as an HDFS path rather than a local one.
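Besides insert overwrite directory, a common lightweight way to get query results into a local file is to run hive -e from the shell and redirect standard output (the target file name here is illustrative):

[wyp@master ~]$ hive -e "select * from wyp" >> /home/wyp/export.txt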

3. Exporting to another Hive table

This is the same operation as querying data from another table and inserting the results into a Hive table, described in section 3 of Part I above.
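In other words, it reuses the insert ... select statement from that section. For example, to copy rows from wyp into the test table's age='30' partition (the partition value is illustrative):

hive> insert into table test

partition (age='30')

select id, name, tel

from wyp;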


Origin: blog.51cto.com/13000661/2437828