6 Ways to Import Data into Hive


1.hadoop fs -put way

Syntax: hadoop fs -put <data file name> <HDFS directory>. This imports the data by brute force, copying the file straight into the table's directory. [Not recommended]

hadoop fs -put students.txt /home/hadoopUser/apps/hive/warehouse/myhive1029.db/stu1029/
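
To check that the copied file is actually visible to the table, something like the following could be run from the Hive CLI. This is only a sketch: the database name myhive1029 and table name stu1029 are inferred from the warehouse path above and are assumptions, and the query returns sensible rows only if the file's delimiter matches the table definition.

```sql
-- Run inside the Hive CLI. The database and table names are inferred from the
-- warehouse path used above and are assumptions.
dfs -ls /home/hadoopUser/apps/hive/warehouse/myhive1029.db/stu1029/;

-- Read a few rows back through the table that owns that directory.
select * from myhive1029.stu1029 limit 5;
```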

2.load way

See the official Hive documentation for the full syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

Description:
The LOAD operation is a plain copy or move: it places the data file in the directory that corresponds to the Hive table.
[LOCAL] looks for filepath on the local Linux filesystem.
[OVERWRITE] first clears all files in the Hive table's directory, then adds the file at filepath to it.

  1. To import data from a local directory into a Hive table, add the keyword [local].
    The file is copied from the local directory into the Hive table's directory.
    load data local inpath "/home/hadoopUser/filename" into table tablename;

  2. To import data from an HDFS directory into a Hive table, omit the keyword [local].
    The file is moved (cut) from its HDFS directory into the Hive table's directory.
    load data inpath "/HDFS directory/filename" into table tablename;
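
As a rough sketch of how the optional clauses combine, the statements below load a local file into one partition of a table and replace whatever that partition already holds. The table stu_part, its columns, the partition column dt, and the file path are all hypothetical names chosen for illustration; none of them come from the examples above.

```sql
-- Hypothetical partitioned table; names and columns are assumptions for illustration.
create table if not exists stu_part (id int, name string)
partitioned by (dt string)
row format delimited fields terminated by ",";

-- LOCAL: read the file from the local Linux filesystem (it is copied, not moved).
-- OVERWRITE: clear the target partition's directory before adding the file.
load data local inpath "/home/hadoopUser/students.txt"
overwrite into table stu_part partition (dt = "2019-12-01");

-- Check that the partition now exists.
show partitions stu_part;
```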

3.insert ... values way

The syntax is the same as MySQL's insert statement. It is rarely used, except perhaps for testing.
In real work the data usually lives in files and is batch-imported into the table with load.
If you want to use this method, make sure the Hadoop cluster is running, because the SQL statement is translated into a MapReduce program that runs on YARN.
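
A minimal sketch of this form, assuming a table with the columns (id int, name string, gender string, age int), the same shape as the t1 table created in way 4. The row values are made up for illustration.

```sql
-- Assumes a table shaped like t1(id int, name string, gender string, age int).
-- The row values are invented for illustration.
insert into table t1 values (1, "Tom", "male", 20), (2, "Lucy", "female", 19);

-- Read the rows back once the job has finished.
select * from t1;
```

Because every such statement is submitted as a job, inserting a handful of rows this way is far slower than a load of the same data from a file.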

4.insert ... select way

Suppose I have two tables: t1 (an empty table) and mystu (a table with data). I want to filter the rows of mystu whose gender is female and insert them into t1.
Syntax:
insert into table t1 select * from mystu where gender = "female";

  1. Create table t1:
    create table t1(id int,name string,gender string,age int) row format delimited fields terminated by "," lines terminated by "\n";

  2. Select the rows of mystu whose gender is female and insert them into t1:
    insert into table t1 select id, name, gender, age from mystu where gender = "female";
    t1 is created as an internal (managed) table; the files in t1's directory are the data files produced by selecting the rows whose gender is female and inserting them into t1.
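
One detail worth keeping in mind with this form is the difference between appending and replacing. The sketch below reuses the t1 and mystu tables from the steps above and contrasts insert into with Hive's insert overwrite; it is an illustration, not part of the original walkthrough.

```sql
-- insert into appends: running it twice leaves two copies of the rows in t1.
insert into table t1 select id, name, gender, age from mystu where gender = "female";

-- insert overwrite replaces the existing contents of t1 with the query result.
insert overwrite table t1 select id, name, gender, age from mystu where gender = "female";
```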

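5.CTAS way (simpler than way 4)

The data filtered from mystu is imported at the same moment table t2 is created.
What a select query returns is only a virtual table; the syntax below turns that virtual table into a real table, t2.
Syntax:
create [external] table tablename (columns optional) as select ...
Note that the Hive documentation lists a restriction that the target table of a CTAS cannot be an external table, so [external] may be rejected depending on the Hive version.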

Create table t2 and import the filtered mystu data (rows whose gender is "女", that is, female) into it at the same time.
create table t2 as select * from mystu where gender = "女";
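
CTAS can also narrow the columns and choose a storage format for the new table in one statement. The following is a sketch under assumptions: t2_orc is a hypothetical table name, and ORC is picked only as an example format.

```sql
-- t2_orc is a hypothetical table name; only two columns are kept.
create table t2_orc
stored as orc
as select id, name from mystu where gender = "女";

-- Inspect the schema that CTAS derived from the select list.
describe t2_orc;
```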

6.create view way (simpler than way 5)

When a view is created, execution of the SQL statement after as is deferred.
Syntax:
create view viewname as subquery

Create t3_id:
create view t3_id as select id,name from mystu;

Create t3_dp:
create view t3_dp as select id,department from mystu;

Creating these two views does not execute the select subqueries, but show tables; does list the views.
A view holds no data of its own, so nothing for it appears under the Hive warehouse directory. (mystu here was not created as an external table.)
When a task needs several subqueries, create view suits it better than CTAS, because the subquery is not executed when the view is created.
Later, when a query such as a join references the views, the subqueries run together with it and Hive can optimize them as a whole, which is more efficient.
Example:
select t3_id.*, t3_dp.* from t3_id join t3_dp on t3_id.id = t3_dp.id;
The views' subqueries are executed only at this point, as part of the query. In the same way, select * from a view returns its rows by running the subquery at query time.

Differences between create view and CTAS

  1. create view does not execute the subquery.
    CTAS has to execute the subquery's SQL statement.
  2. When a task needs several SQL statements, create view lets you save them up without executing them; they are finally executed together, optimized by Hive, which is more efficient.
    With CTAS, each SQL statement is executed as soon as it is issued.
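
To see the first difference for yourself, the statements below inspect one of the views created earlier. This is a sketch that assumes the views t3_id and t3_dp from way 6 exist.

```sql
-- The metadata reports Table Type: VIRTUAL_VIEW together with the stored query text.
describe formatted t3_id;

-- Print the create view statement that defines the view.
show create table t3_id;

-- Show the plan Hive would use for a query over both views, without running it.
explain
select t3_id.*, t3_dp.* from t3_id join t3_dp on t3_id.id = t3_dp.id;
```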

Origin blog.csdn.net/MicoOu/article/details/103390249