CDH6.3.2 Install hadoop Lzo compression online
- 1 Check which compression codecs Hadoop supports
- 2 The difference between LzoCodec and LzopCodec
- 3 Install Lzo in Parcel online
1 Check which compression codecs Hadoop supports
In the HDFS configuration console, search for: io.compression.codecs
The list does not include LzopCodec.
CDH does not support Lzo compression encoding by default. You need to download additional Parcel packages to enable Hadoop-related components such as HDFS, Hive, and Spark to support Lzo encoding.
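The same check can be scripted against a client configuration file. The following is a minimal sketch, assuming the codec list is stored in a core-site.xml file under the property io.compression.codecs; the SAMPLE content and the function name configured_codecs are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from typing import List


def configured_codecs(core_site_xml: str) -> List[str]:
    """Return the class names listed under io.compression.codecs."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "io.compression.codecs":
            value = prop.findtext("value", default="")
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


# Hypothetical file content, standing in for a real core-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>"""

codecs = configured_codecs(SAMPLE)
has_lzop = any(c.endswith(".LzopCodec") for c in codecs)
print(has_lzop)  # False: no LzopCodec in the sample list
```

To run it against a real cluster, read the actual file content into a string and pass it to configured_codecs.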
2 The difference between LzoCodec and LzopCodec
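The two compression codecs, LzoCodec and LzopCodec, differ as follows:
1. LzoCodec is faster than LzopCodec; LzopCodec adds extra information such as a bytes signature and a header for compatibility with the lzop program.
2. With LzoCodec as the Reduce output codec, the result files have the extension ".lzo_deflate" and cannot be read by lzop; with LzopCodec as the Reduce output codec, the generated files have the extension ".lzo" and can be read by lzop.
3. LzoCodec results (.lzo_deflate files) cannot be indexed by the lzo index job "DistributedLzoIndexer".
4. ".lzo_deflate" files cannot be used as MapReduce input, whereas ".lzo" files can.
In summary, use LzoCodec for the intermediate map output and LzopCodec for the reduce output.
Also: org.apache.hadoop.io.compress.LzoCodec and com.hadoop.compression.lzo.LzoCodec do the same thing; both ship with the source package and both produce lzo_deflate files.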
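The recommendation above (LzoCodec for map output, LzopCodec for reduce output) could be expressed as job properties. This is a sketch only: the property names are the standard MapReduce compression settings, but the helper names lzo_job_properties and as_d_flags are hypothetical, and the original post does not show how the author configured their jobs.

```python
from typing import Dict, List

LZO = "com.hadoop.compression.lzo.LzoCodec"
LZOP = "com.hadoop.compression.lzo.LzopCodec"


def lzo_job_properties() -> Dict[str, str]:
    """Map output uses LzoCodec; final (reduce) output uses LzopCodec."""
    return {
        "mapreduce.map.output.compress": "true",
        "mapreduce.map.output.compress.codec": LZO,
        "mapreduce.output.fileoutputformat.compress": "true",
        "mapreduce.output.fileoutputformat.compress.codec": LZOP,
    }


def as_d_flags(props: Dict[str, str]) -> List[str]:
    """Turn properties into '-D key=value' argument pairs for a job command line."""
    flags: List[str] = []
    for key, value in props.items():
        flags.extend(["-D", f"{key}={value}"])
    return flags


print(" ".join(as_d_flags(lzo_job_properties())))
```

The printed flags are meant for jobs that accept Hadoop's generic options; whether a particular job does so is an assumption to verify.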
3 Install Lzo in Parcel online
3.1 Download link: replace 6.xx with your CDH version
CDH6: https://archive.cloudera.com/gplextras6/6.xx/parcels/
My version is CDH6.3.1, so my download address is
https://archive.cloudera.com/gplextras6/6.3.1/parcels/
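The substitution of 6.xx by the actual version can be sketched as a small helper. The function name gplextras_parcel_url is hypothetical, and the version check assumes a three-part 6.x.y version string, which is an assumption rather than something stated above.

```python
import re


def gplextras_parcel_url(cdh_version: str) -> str:
    """Build the GPLEXTRAS parcel repository URL for a CDH 6 version."""
    # Assumption: versions look like "6.3.1" (major 6, then minor and patch).
    if not re.fullmatch(r"6\.\d+\.\d+", cdh_version):
        raise ValueError(f"expected a CDH 6 version like 6.3.1, got {cdh_version!r}")
    return f"https://archive.cloudera.com/gplextras6/{cdh_version}/parcels/"


print(gplextras_parcel_url("6.3.1"))
# https://archive.cloudera.com/gplextras6/6.3.1/parcels/
```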
In the Parcel configuration of CDH, under "Remote Parcel Repository URLs", click the "+" sign and add the address.
It may take a while before GPLEXTRAS appears in the Parcel list, because connections to overseas websites can be slow from our network.
3.2 Download
Click: Download
Distribute
Activate
Activation successful
3.3 Add a compression codec to HDFS
In the HDFS configuration console, search for: io.compression.codecs
Click "+" to add:
com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
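If the codec list is managed as a comma-separated string rather than through the console, the two additions can be sketched like this. The helper add_codecs and the starting value are hypothetical; the point is only that existing entries are kept and duplicates are not added.

```python
from typing import Iterable

LZO_CODECS = [
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
]


def add_codecs(current: str, new_codecs: Iterable[str]) -> str:
    """Append codecs to a comma-separated list, keeping order and skipping duplicates."""
    codecs = [c.strip() for c in current.split(",") if c.strip()]
    for codec in new_codecs:
        if codec not in codecs:
            codecs.append(codec)
    return ",".join(codecs)


# Hypothetical starting value for io.compression.codecs.
before = "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec"
after = add_codecs(before, LZO_CODECS)
print(after)
```

Running add_codecs a second time on its own output leaves the value unchanged, so the step is safe to repeat.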
3.4 Configure YARN to automatically load the packages under GPLEXTRAS
① Find the GPLEXTRAS directory you just installed
/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib
② Configure YARN
Search for: mapreduce.application.classpath
③ Add /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/*
④ Update the configuration and restart the service
The installation is then complete.
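As a quick sanity check after the restart, one could list the jar files that the wildcard classpath entry would pick up on a node. This is a sketch under the assumption that the parcel is installed at the path given in step ①; the function name jars_in is hypothetical.

```python
from pathlib import Path
from typing import List

# Path taken from step ① above; adjust if your parcel directory differs.
GPLEXTRAS_LIB = "/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib"


def jars_in(lib_dir: str) -> List[str]:
    """Return the sorted names of .jar files directly inside lib_dir."""
    directory = Path(lib_dir)
    if not directory.is_dir():
        return []  # parcel not installed (or not activated) on this node
    return sorted(p.name for p in directory.glob("*.jar"))


for name in jars_in(GPLEXTRAS_LIB):
    print(name)
```

An empty result on a cluster node suggests the parcel was not distributed or activated there.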