LZO + Hive 1.x test

(1) Environment:

Hadoop 2.8.1
Hive 1.2.2

 

core-site.xml configuration items

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,
           org.apache.hadoop.io.compress.DefaultCodec,
           org.apache.hadoop.io.compress.BZip2Codec,
           com.hadoop.compression.lzo.LzopCodec,
           com.hadoop.compression.lzo.LzoCodec
  </value>
</property>    

<!-- lzop -->
<property>
   <name>io.compression.codec.lzo.class</name>
   <value>com.hadoop.compression.lzo.LzopCodec</value>
</property>

  

mapred-site.xml configuration items

<!-- Compress the intermediate map output with lzop -->
<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
</property>

<property>
   <name>mapreduce.map.output.compress.codec</name>
   <value>com.hadoop.compression.lzo.LzopCodec</value>
</property>

<!-- Compress the final job output with lzop -->
<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
</property>

<!-- lzop -->
<property>
   <name>mapreduce.output.fileoutputformat.compress.codec</name>
   <value>com.hadoop.compression.lzo.LzopCodec</value>
</property>

 

 

(2) Test steps:

1. Hive CREATE TABLE statement

CREATE TABLE `lzo5`(
  `uuid` string)
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

 

 

2. Create a uuid.txt file containing one line of data

uuid1

 

 

3. Compress the file with lzop (this produces uuid.txt.lzo)

lzop uuid.txt

 

 

4. Load the data into the Hive table

load data inpath "/home/hadoop/uuid.txt.lzo" into table lzo5;

 

 

5. Query in Hive; the count is 1, as expected

select count(1) from lzo5;

 

 

6. Build an LZO index for the .lzo files under the warehouse directory of the lzo5 table

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar com.hadoop.compression.lzo.DistributedLzoIndexer hdfs://hd1:9000/user/hive/warehouse/lzo5

 

 

7. Check that the index was generated (a .index file should appear next to each .lzo file)

hdfs dfs -ls hdfs://hd1:9000/user/hive/warehouse/lzo5
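The check in step 7 can also be scripted. The following is a minimal sketch, assuming the file names from the directory listing have already been collected into a Python list; the function name `lzo_files_missing_index` and the sample listing are hypothetical and only illustrate the naming rule that the index for `x.lzo` is stored as `x.lzo.index`.

```python
def lzo_files_missing_index(names):
    """Return the .lzo files in `names` that have no matching .lzo.index file."""
    present = set(names)
    return sorted(
        name for name in present
        if name.endswith(".lzo") and name + ".index" not in present
    )


# Hypothetical listing of the table directory after the indexer has run.
listing = ["uuid.txt.lzo", "uuid.txt.lzo.index"]
print(lzo_files_missing_index(listing))  # prints []
```

An empty result means every .lzo file in the listing has an index next to it.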

 

 

 

8. Run the query again; the count is still 1, as expected

select count(1) from lzo5;

 

 

(3) How to tell whether the LZO index is taking effect

Create an .lzo file larger than the HDFS block size, run the query once without an index and once with an index, and compare the number of map tasks:

  • Without an index there is 1 map, because an unindexed .lzo file cannot be split.
  • With an index the number of maps is roughly the .lzo file size divided by the block size (rounded up), because an indexed .lzo file can be split.
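The rule of thumb above can be written down as a small calculation. This is only a rough sketch: it assumes the split size equals the HDFS block size, and it ignores the alignment of splits to LZO block boundaries and any split combining that Hive may apply. The function name `expected_map_count` is made up for illustration.

```python
import math


def expected_map_count(file_size_mb, split_size_mb, indexed):
    """Estimate the number of map tasks for a single .lzo file."""
    if not indexed:
        # Without an index the whole file goes to one mapper.
        return 1
    # With an index the file is cut into splits of about split_size_mb each.
    return math.ceil(file_size_mb / split_size_mb)


print(expected_map_count(370, 128, indexed=False))  # prints 1
print(expected_map_count(370, 128, indexed=True))   # prints 3
```

For a 370 MB file and 128 MB blocks, this gives 1 map without an index and 3 maps with one.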

(4) Comparison results:

The HDFS block size is 128 MB, and the generated .lzo file is 370 MB.
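The original test does not say how the large file was produced. One possible way, shown here purely as an assumption, is to write random UUIDs, one per line, until the text file reaches a chosen size and then compress it with lzop as in step 3. The helper `write_uuid_file` and the file name `uuid_big.txt` are hypothetical. The compression ratio depends on the data, so the raw size needed to reach a given .lzo size has to be found by trial.

```python
import os
import tempfile
import uuid


def write_uuid_file(path, min_bytes):
    """Write one random UUID per line until at least min_bytes are written."""
    written = 0
    # newline="\n" keeps the on-disk size equal to the character count.
    with open(path, "w", encoding="ascii", newline="\n") as out:
        while written < min_bytes:
            line = str(uuid.uuid4()) + "\n"
            out.write(line)
            written += len(line)
    return written


# Small demonstration in a temporary directory; a real test file would use
# a target several times the HDFS block size.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "uuid_big.txt")
    write_uuid_file(demo, 1024)
    print(os.path.getsize(demo) >= 1024)  # prints True
```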

Execution times without and with the index were as follows; the indexed query was slightly faster:

 

 

Without index: 1 map

With index: 3 maps (once indexed, the file can be split)


 
