CDH6.3.2: Installing Hadoop Lzo compression online

1 Check which compression codecs Hadoop currently supports

In the HDFS configuration console, search for: io.compression.codecs
The list of configured codecs does not include LzopCodec.
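
The same check can be made outside the console by reading the property from a core-site.xml file. The following is only a sketch: the function names and the sample XML are illustrative assumptions, not something taken from the cluster described here.

```python
import xml.etree.ElementTree as ET


def configured_codecs(core_site_xml):
    """Return the codec class names listed under io.compression.codecs."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "io.compression.codecs":
            value = prop.findtext("value", "") or ""
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


def has_lzop(codecs):
    """True if any configured codec class is named LzopCodec."""
    return any(c.endswith(".LzopCodec") for c in codecs)


# Hypothetical sample resembling a default configuration without Lzo.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>"""

codecs = configured_codecs(SAMPLE)
print(codecs)
print("LzopCodec configured:", has_lzop(codecs))
```

To run it against a real cluster, pass the contents of the deployed core-site.xml instead of SAMPLE.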

CDH does not support Lzo compression encoding by default. You need to download additional Parcel packages to enable Hadoop-related components such as HDFS, Hive, and Spark to support Lzo encoding.

2 The difference between LzoCodec and LzopCodec

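The two compression codecs, LzoCodec and LzopCodec, differ as follows:
    1. LzoCodec is faster than LzopCodec. LzopCodec adds information such as a bytes signature and a header for compatibility with the lzop program.
    2. With LzoCodec as the Reduce output, the result files have the extension ".lzo_deflate" and cannot be read by lzop; with LzopCodec as the Reduce output, the files have the extension ".lzo" and can be read by lzop.
    3. LzoCodec results (.lzo_deflate files) cannot be indexed by the lzo index job "DistributedLzoIndexer".
    4. ".lzo_deflate" files cannot be used as MapReduce input, whereas ".lzo" files support both.
        In summary, use LzoCodec for the intermediate map output and LzopCodec for the reduce output.
 Also: org.apache.hadoop.io.compress.LzoCodec and com.hadoop.compression.lzo.LzoCodec do the same thing; both come with the source package, and both produce lzo_deflate files.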
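
As a sketch of the recommendation above (LzoCodec for map output, LzopCodec for reduce output), the snippet below builds the corresponding MapReduce job properties and renders them as `-D key=value` generic options. The property names are the standard Hadoop MapReduce ones; the helper function names are made up for illustration, and the options only take effect for jobs that parse generic options (for example via ToolRunner) on a cluster where the Lzo codecs are installed.

```python
LZO_CODEC = "com.hadoop.compression.lzo.LzoCodec"
LZOP_CODEC = "com.hadoop.compression.lzo.LzopCodec"


def lzo_job_properties():
    """Job properties: LzoCodec for map output, LzopCodec for final output."""
    return {
        # Intermediate map output: the faster codec, no lzop header needed.
        "mapreduce.map.output.compress": "true",
        "mapreduce.map.output.compress.codec": LZO_CODEC,
        # Final reduce output: .lzo files that lzop can read.
        "mapreduce.output.fileoutputformat.compress": "true",
        "mapreduce.output.fileoutputformat.compress.codec": LZOP_CODEC,
    }


def as_generic_options(props):
    """Render properties as a flat list of -D key=value arguments."""
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


print(" ".join(as_generic_options(lzo_job_properties())))
```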

3 Install Lzo in Parcel online

3.1 Download link: replace 6.xx with your version

CDH6: https://archive.cloudera.com/gplextras6/6.xx/parcels/
My version is CDH6.3.1, so my download address is
https://archive.cloudera.com/gplextras6/6.3.1/parcels/
In the CDH Parcel configuration, under "Remote Parcel repository URL", click the "+" sign and add the address.
It may take a while before GPLEXTRAS appears when you return to the Parcel list, because connections to foreign websites can be slow from our network.
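
The version substitution in the URL can be written as a small helper. This is a minimal sketch: the function name and the version check are illustrative assumptions, and only the URL pattern comes from the text above.

```python
import re

# URL pattern from this section; {version} stands in for "6.xx".
BASE = "https://archive.cloudera.com/gplextras6/{version}/parcels/"


def gplextras_repo_url(cdh_version):
    """Build the GPLEXTRAS parcel repository URL for a CDH 6 version."""
    if not re.fullmatch(r"6\.\d+\.\d+", cdh_version):
        raise ValueError(f"expected a version like 6.3.1, got {cdh_version!r}")
    return BASE.format(version=cdh_version)


print(gplextras_repo_url("6.3.1"))
```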

3.2 Download

Click: Download
Then click: Distribute
Then click: Activate
When activation succeeds, the parcel is shown as activated.
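
The same three steps can in principle be driven through the Cloudera Manager REST API instead of the console. The sketch below only builds the command URLs in the order Download, Distribute, Activate and sends nothing. The host, cluster name, parcel version string and API version `v33` are all assumptions for illustration; check them against your own Cloudera Manager before issuing any POST requests.

```python
from urllib.parse import quote

# Parcel commands in the order used in the console steps above.
PARCEL_COMMANDS = ["startDownload", "startDistribution", "activate"]


def parcel_command_urls(cm_base, cluster, product, version, api_version="v33"):
    """Return the command URLs to POST to, in order (nothing is sent here)."""
    prefix = (
        f"{cm_base.rstrip('/')}/api/{api_version}"
        f"/clusters/{quote(cluster)}"
        f"/parcels/products/{quote(product)}"
        f"/versions/{quote(version)}/commands/"
    )
    return [prefix + command for command in PARCEL_COMMANDS]


# All argument values below are hypothetical placeholders.
for url in parcel_command_urls(
    "http://cm-host.example:7180", "Cluster 1", "GPLEXTRAS", "6.3.1-EXAMPLE"
):
    print("POST", url)
```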

3.3 Add a compression codec to HDFS

In the HDFS configuration console, search for: io.compression.codecs
Click "+" to add:

com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
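
If the codec list is maintained as a comma-separated string rather than through the console, adding the two classes amounts to the following. This is a sketch with an illustrative function name; it keeps the existing entries in order and avoids adding duplicates.

```python
LZO_CODECS = [
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
]


def add_lzo_codecs(current):
    """Append the two Lzo codec classes to a comma-separated codec list."""
    codecs = [c.strip() for c in current.split(",") if c.strip()]
    for codec in LZO_CODECS:
        if codec not in codecs:
            codecs.append(codec)
    return ",".join(codecs)


existing = "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec"
print(add_lzo_codecs(existing))
```

Running the function on its own output leaves the value unchanged, so it is safe to apply more than once.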


3.4 Configure YARN to load the packages under GPLEXTRAS automatically

① Find the GPLEXTRAS directory you just installed

/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib

② In the YARN configuration, search for: mapreduce.application.classpath

③ Add /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/*

④ Update the configuration and restart the service

The installation is then complete.
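
As a quick sanity check on a cluster node, the sketch below lists the jars in the GPLEXTRAS library directory whose file name contains "lzo". The directory is the one given in this section; the function name and the "lzo" file-name filter are assumptions, since the exact jar names depend on the parcel version.

```python
from pathlib import Path

# Directory from section 3.4 of this post.
GPLEXTRAS_LIB = "/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib"


def find_lzo_jars(lib_dir=GPLEXTRAS_LIB):
    """Return sorted names of jars in lib_dir whose name contains 'lzo'."""
    path = Path(lib_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.glob("*.jar") if "lzo" in p.name.lower())


jars = find_lzo_jars()
print(jars if jars else f"no Lzo jars found under {GPLEXTRAS_LIB}")
```

An empty result on a node suggests the parcel was not distributed or activated there, in which case the classpath entry added in YARN would match nothing.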

Origin blog.csdn.net/qq_32727095/article/details/113740035