Hive: saving query results

reference:

https://blog.csdn.net/zhuce1986/article/details/39586189

 

First, saving results to the local filesystem

Method 1: use Hive's standard output and redirect the query results to a specified file

 

This is the most common method, and one I use frequently. The results of the SQL query are saved directly to /tmp/out.txt:

$ hive -e "select user, login_timestamp from user_login" > /tmp/out.txt
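The Hive CLI also accepts a silent flag, -S, which suppresses progress and log messages so that only the result rows reach standard output. A variant of the same export using it:

$ hive -S -e "select user, login_timestamp from user_login" > /tmp/out.txt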

 

Of course, we can also save the query in a file, e.g. test.sql, then execute it and save the results as follows:

$ hive -f test.sql > /tmp/out.txt

$ cat test.sql

select * from user_login

 

 

Method 2: use INSERT OVERWRITE LOCAL DIRECTORY to save results to a local directory

hive> insert overwrite local directory "/tmp/out/"                                        

    > select user, login_time from user_login;

The above command saves the results of select user, login_time from user_login to the local directory /tmp/out/.

$ find /tmp/out/ -type f

/tmp/out/.000000_0.crc

/tmp/out/000000_0

The two files hold different content: 000000_0 stores the query results, while the file with the .crc suffix stores that file's CRC32 checksum.

Open 000000_0 in vim to take a look:

$ vim /tmp/out/000000_0

 1 user_1^A20140701

 2 user_2^A20140701

 3 user_3^A20140701

As you can see, the exported results use ^A (Ctrl+A) as the delimiter between fields and \n as the delimiter between rows.

The default field separator can be inconvenient at times, but fortunately Hive provides a way to change it; we just specify the separator at export time:

hive> insert overwrite local directory "/tmp/out/"

    > row format delimited fields terminated by "\t" 

    > select user, login_time from user_login;

You can see that the field separator has become a tab (easier on the human eye ^-^).
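One thing to note: when the query runs with more than one reducer, the export directory can hold several result files (000000_0, 000001_0, and so on). A simple sketch of merging them into one local file, relying on the bash glob skipping the hidden .crc files:

$ cat /tmp/out/* > /tmp/result.txt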

 

Second, saving results to HDFS

Saving query results to HDFS is very simple; using INSERT OVERWRITE DIRECTORY completes the operation:

hive> insert overwrite directory "/tmp/out/"

    > row format delimited fields terminated by "\t" 

    > select user, login_time from user_login;

Note that, unlike saving to the local filesystem, the command does not need the LOCAL keyword when saving to HDFS.
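To verify the export, the usual HDFS shell commands work; for example:

$ hadoop fs -ls /tmp/out/
$ hadoop fs -cat /tmp/out/000000_0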

 

Third, saving results to a Hive table

Method 1: if the result table has already been created, INSERT OVERWRITE TABLE writes the results into it, overwriting the previous contents (a sketch of this form follows the example below). If the table does not exist yet, you can create it directly from the query with CREATE TABLE ... AS:

hive> create table query_result 

    > as

    > select user, login_time from user_login;

 

hive> select * from query_result;            

OK

user_1	20140701

user_2	20140701

user_3	20140701
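For the overwrite case itself, here is a minimal sketch, assuming query_result already exists with a schema matching the selected columns:

hive> insert overwrite table query_result
    > select user, login_time from user_login;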

 

Fourth, exporting a table directly via HDFS

Hive is built on top of HDFS, so we can use the hadoop dfs -get command to export a table directly.

First, find the directory where the table we want to export is stored:

hive> show create table user_login;

OK

CREATE  TABLE `user_login`(

  `user` string, 

  `login_time` bigint)

ROW FORMAT SERDE 

  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 

STORED AS INPUTFORMAT 

  'org.apache.hadoop.mapred.TextInputFormat' 

OUTPUTFORMAT 

  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

  'file:/user/hive/warehouse/test.db/user_login'

TBLPROPERTIES (

  'totalSize'='160', 

  'numRows'='10', 

  'rawDataSize'='150', 

  'COLUMN_STATS_ACCURATE'='true', 

  'numFiles'='1', 

  'transient_lastDdlTime'='1411544983')

Time taken: 0.174 seconds, Fetched: 18 row(s)

As you can see, the user_login table is stored at file:/user/hive/warehouse/test.db/user_login.

Next, export it to a local directory using hadoop dfs -get:

hadoop dfs -get file:/user/hive/warehouse/test.db/user_login  /tmp/out/
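On recent Hadoop versions, hadoop dfs is deprecated in favor of hdfs dfs (or hadoop fs), so the equivalent command would be:

$ hdfs dfs -get file:/user/hive/warehouse/test.db/user_login /tmp/out/

If the table directory holds several files, hadoop fs -getmerge can concatenate them into a single local file:

$ hadoop fs -getmerge file:/user/hive/warehouse/test.db/user_login /tmp/user_login.txt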


To summarize the approaches:

The first way: from bash, run the hive -e command and redirect the output stream with > to the designated file:

$ hive -e "select * from student where sex = '男'" > /tmp/output.txt

The second way: from bash, run the hive -f command to execute one or more SQL statements from a file, again redirecting the output stream with > to the designated file:

$ hive -f exer.sql > /tmp/output.txt

where the file contains, for example:

select * from student where sex = '男';
select count(*) from student;

The third way: inside hive, issue a hive-sql statement using INSERT OVERWRITE LOCAL DIRECTORY; the syntax for the local filesystem and for HDFS is the same, only the path differs:

hive> insert overwrite local directory "/tmp/out"
    > select cno, avg(grade) from sc group by(cno);

hive> insert overwrite directory 'hdfs://server71:9000/user/hive/warehouse/mystudent'
    > select * from student1;

The above are three ways of saving results, covering three ways of executing hive-sql. For saving results locally, the first two rely on redirection built into Linux bash; only the third is Hive's own mechanism for exporting data.

The fourth way is just basic SQL syntax: extract data from one table and insert it directly into another, following standard SQL:

hive> insert overwrite table student3
    > select sno, sname, sex, sage, sdept from student3 where year='1996';

(Reference: http://blog.csdn.net/zhuce1986/article/details/39586189)


 
