Load HDFS data into a Hive table

Confirm that the target location (the Hive warehouse directory) is empty:

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse

[cloudera@quickstart ~]$

Put the source data file into HDFS twice, once as /test/T1.csv and once as /test/T2.csv:

[cloudera@quickstart ~]$ hadoop fs -ls /test

[cloudera@quickstart ~]$ hadoop fs -put T1.csv /test

[cloudera@quickstart ~]$ hadoop fs -put T1.csv /test/T2.csv

[cloudera@quickstart ~]$ hadoop fs -ls /test

Found 2 items

-rw-r--r-- 1 cloudera supergroup 8 2020-03-26 09:31 /test/T1.csv

-rw-r--r-- 1 cloudera supergroup 8 2020-03-26 09:31 /test/T2.csv

[cloudera@quickstart ~]$

Start the Hive CLI:

[cloudera@quickstart ~]$ hive

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.

hive> show tables;

Time taken: 0.318 seconds

Create tables T1 and T2. T1 uses Hive's default row format; T2 declares a comma as the field delimiter:

hive> create table T1(a int,b int);

Time taken: 0.253 seconds

hive> create table T2(a int,b int) row format delimited fields terminated by ',' stored as textfile;

Time taken: 0.194 seconds

Load data into T1 and query it. Both columns come back as NULL, because T1 was created without a comma field delimiter:

hive> load data inpath '/test/T1.csv' into table T1;

Loading data to table default.t1

Table default.t1 stats: [numFiles=1, totalSize=8]

Time taken: 0.632 seconds

hive> select * from T1;

NULL NULL

NULL NULL

Time taken: 0.395 seconds, Fetched: 2 row(s)
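The NULL values come from the table definition, not from the data. With no ROW FORMAT clause, Hive's text format splits each line on its default field delimiter, Ctrl-A (\001), so a line such as 1,2 arrives as one field that cannot be read as an int. The sketch below imitates that splitting locally with awk. It does not call Hive, and the file name sample.csv is only an illustration.

```shell
# Recreate the two-line file used above (8 bytes: "1,2\n3,4\n").
printf '1,2\n3,4\n' > sample.csv

# Ctrl-A is the default field delimiter of Hive's text format.
ctrl_a=$(printf '\001')

# Count fields per line when splitting on Ctrl-A (as T1 does)
# versus splitting on a comma (as declared for T2).
fields_default=$(awk -F "$ctrl_a" '{ print NF }' sample.csv | sort -u)
fields_comma=$(awk -F ',' '{ print NF }' sample.csv | sort -u)

echo "fields per line with Ctrl-A delimiter: $fields_default"
echo "fields per line with comma delimiter:  $fields_comma"
```

Splitting on Ctrl-A gives one field per line, while splitting on a comma gives the two fields that columns a and b expect.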

Load data into T2. Because T2 declares ',' as the field delimiter, the rows are parsed correctly:

hive> load data inpath '/test/T2.csv' into table T2;

Loading data to table default.t2

Table default.t2 stats: [numFiles=1, totalSize=8]

Time taken: 0.259 seconds

hive> select * from T2;

1 2

3 4

Time taken: 0.057 seconds, Fetched: 2 row(s)

hive> exit;

WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.

WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.

[cloudera@quickstart ~]$
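T1 does not have to be dropped and recreated to become readable. A minimal sketch, assuming T1 uses the default text SerDe (as a table created without a ROW FORMAT clause normally does): change the SerDe's field delimiter and query the table again. The script name fix_t1.hql is arbitrary, and the statements were not part of the session shown above.

```shell
# Write the repair statements to a script file (the name is arbitrary).
cat > fix_t1.hql <<'EOF'
-- Tell the table's SerDe to split fields on a comma instead of Ctrl-A.
ALTER TABLE T1 SET SERDEPROPERTIES ('field.delim' = ',');
-- Re-read the rows that were already loaded.
SELECT * FROM T1;
EOF

# Run the script only where the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f fix_t1.hql
else
  echo "hive not found; statements saved in fix_t1.hql"
fi
```

Because Hive applies the delimiter when it reads the file, the data already in the table directory does not need to be reloaded.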

After loading, the source files have been moved (not copied) out of /test into each table's directory under the warehouse, keeping their original file names:

[cloudera@quickstart ~]$ hadoop fs -ls /test

[cloudera@quickstart ~]$ hadoop fs -ls -R /user/hive/warehouse/

drwxrwxrwx - cloudera supergroup 0 2020-03-26 09:35 /user/hive/warehouse/t1

-rwxrwxrwx 1 cloudera supergroup 8 2020-03-26 09:31 /user/hive/warehouse/t1/T1.csv

drwxrwxrwx - cloudera supergroup 0 2020-03-26 09:35 /user/hive/warehouse/t2

-rwxrwxrwx 1 cloudera supergroup 8 2020-03-26 09:31 /user/hive/warehouse/t2/T2.csv

[cloudera@quickstart ~]$ hadoop fs -cat /user/hive/warehouse/t1/T1.csv

1,2

3,4

[cloudera@quickstart ~]$ hadoop fs -cat /user/hive/warehouse/t2/T2.csv

1,2

3,4

[cloudera@quickstart ~]$
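Since LOAD DATA INPATH moves the file, one way to keep the original in place is to copy it to a staging path first and load the copy. The following is only a sketch for a fresh run: the staging directory /test/staging and the function name keep_source_load are hypothetical, and the commands run only where both hadoop and hive are on the PATH.

```shell
# Source file to preserve and a hypothetical staging copy of it.
SRC=/test/T2.csv
STAGE=/test/staging/T2.csv

keep_source_load() {
  # Copy first, so that LOAD moves the copy and leaves $SRC untouched.
  hadoop fs -mkdir -p "$(dirname "$STAGE")" &&
  hadoop fs -cp "$SRC" "$STAGE" &&
  hive -e "LOAD DATA INPATH '$STAGE' INTO TABLE T2;"
}

# Only attempt the load on a machine that has both CLIs installed.
if command -v hadoop >/dev/null 2>&1 && command -v hive >/dev/null 2>&1; then
  keep_source_load
else
  echo "hadoop or hive not found; skipping load"
fi
```

When the file is on the local filesystem rather than in HDFS, LOAD DATA LOCAL INPATH copies it into the table directory instead of moving it.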
