Big Data: How can Hive read .gz file data directly from Amazon S3?

Before reading on, make sure you have done the following:

  • Installed the JDK
  • Installed Hadoop (keep the jar packages you will need at hand)
  • Installed Hive (keep Hive's metastore database and the jar packages you will need ready)
  • Configured Hadoop to connect to Amazon S3
  • Configured Hive to connect to Amazon S3
  • Configured the connection between Hive and Hadoop
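For reference, connecting Hadoop (and therefore Hive) to S3 usually comes down to a few s3a properties in core-site.xml. The values below are illustrative placeholders, not settings taken from this article; substitute your own credentials and endpoint:

```xml
<!-- Illustrative S3A settings for core-site.xml (placeholder values). -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.amazonaws.com</value>
</property>
```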

How can Hive read .gz file data directly from Amazon S3?

Let's break this question down into three parts:

  1. Can Hive read file data stored on Amazon S3?
  2. Can Hive read data from .gz files?
  3. Can Hive read data from .gz files stored on Amazon S3?


1. Can Hive read file data stored on Amazon S3?

1.1 Prepare the data

Prepare the data file a.txt ahead of time and upload it to the s3://bucket/test/ directory:

$ s3cmd put a.txt s3://bucket/test/

a.txt reads as follows:

Edward,Lear,Baker Street 202
Stephen,Hawking,Baker Street 203
John,Dalton,Baker Street 204
Charles,Darwin,Baker Street 205
Sherlock,Holmes,Baker Street 221B

1.2 Create an external table

Enter the Hive CLI and create an external table whose location is the s3a://bucket/test/ directory:

CREATE EXTERNAL TABLE test
(FirstName STRING, LastName STRING, StreetAddress STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://bucket/test/';

The table automatically reads the files under the s3a://bucket/test/ directory (every file in this directory must be readable as table data, otherwise the next query will fail with a file-parsing error).

1.3 Query the data

Selecting from the external table reads the file data under s3a://bucket/test/ in real time:

hive>select * from test;
Edward    Lear     Baker Street 202
Stephen   Hawking  Baker Street 203
John      Dalton   Baker Street 204
Charles   Darwin   Baker Street 205
Sherlock  Holmes   Baker Street 221B

2. Can Hive read data from .gz files?

First try reading locally. Compress the a.txt file from above into a .gz file on the local machine:

$ gzip a.txt
$ ls
a.txt.gz		// compression succeeded

Now create a table over the local file. Note that LOCATION must point to a directory rather than a single file, so first move a.txt.gz into a directory of its own (for example, /home/user/gz_test/):

CREATE TABLE test_local
(FirstName STRING, LastName STRING, StreetAddress STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'file:///home/user/gz_test/';
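Hive's default text input format recognizes the .gz extension and decompresses it transparently, so no extra settings are needed for reading; a query against a table created over a directory containing a.txt.gz should return the same five rows as the uncompressed file (note that gzip files are not splittable, so each .gz file is processed by a single mapper):

```sql
-- Query the table created over the local .gz file; Hive decompresses
-- gzip transparently based on the file extension.
SELECT * FROM test_local;
```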

3. Can Hive read data from .gz files stored on Amazon S3?

Let's try
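A minimal sketch of that attempt, combining parts 1 and 2 (this assumes the same bucket layout as part 1, and an upload mirroring the s3cmd command used there):

```sql
-- First upload the compressed file to the same prefix as in part 1:
--   $ s3cmd put a.txt.gz s3://bucket/test/
-- The external table from part 1 already points at s3a://bucket/test/,
-- and Hive decompresses .gz files transparently, so the same query
-- should now cover the compressed data as well.
SELECT * FROM test;
```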

Origin blog.csdn.net/baidu_34122324/article/details/85166733