Configuring LZO compression on CDH 5.13

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/haoxiaoyan/article/details/83343471
  1. Download the installation packages

wget http://archive.cloudera.com/gplextras5/parcels/5.13.3/GPLEXTRAS-5.13.3-1.cdh5.13.3.p0.2-el7.parcel

wget http://archive.cloudera.com/gplextras5/parcels/5.13.3/GPLEXTRAS-5.13.3-1.cdh5.13.3.p0.2-el7.parcel.sha1

wget http://archive.cloudera.com/gplextras5/parcels/5.13.3/manifest.json

mv GPLEXTRAS-5.13.3-1.cdh5.13.3.p0.2-el7.parcel.sha1 GPLEXTRAS-5.13.3-1.cdh5.13.3.p0.2-el7.parcel.sha
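
Before distributing the parcel, it can help to confirm that the download is intact. The following is a minimal sketch, assuming that the .sha file holds the SHA-1 hex digest as its first whitespace-separated field and that sha1sum is installed. verify_parcel is a hypothetical helper name, not part of any Cloudera tooling.

```shell
#!/bin/sh
# verify_parcel PARCEL SHA_FILE (hypothetical helper)
# Returns 0 when the SHA-1 digest of PARCEL matches the first field of SHA_FILE.
verify_parcel() {
  expected=$(awk 'NR==1 {print $1}' "$2")
  actual=$(sha1sum "$1" | awk '{print $1}')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example call, guarded so it only runs when both files are present.
parcel=GPLEXTRAS-5.13.3-1.cdh5.13.3.p0.2-el7.parcel
if [ -f "$parcel" ] && [ -f "$parcel.sha" ]; then
  if verify_parcel "$parcel" "$parcel.sha"; then
    echo "checksum OK"
  else
    echo "checksum mismatch" >&2
  fi
fi
```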

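In Cloudera Manager, click the Distribute button for the parcel.

Once distribution finishes, the parcel moves into the activation state.

At this point, the LZO installation is complete.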

Configure the cluster services to use LZO

Modify the HDFS configuration

Append the following values to the io.compression.codecs property:

com.hadoop.compression.lzo.LzoCodec                                                

com.hadoop.compression.lzo.LzopCodec   
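
Once the change has been deployed, the effective value can be checked from a node that has the client configuration. This is a minimal sketch, assuming the hdfs command is on the PATH (hdfs getconf -confKey prints the value of a configuration key). has_codec is a hypothetical helper that looks for an exact class name in the comma-separated list.

```shell
#!/bin/sh
# has_codec LIST CLASS (hypothetical helper)
# Returns 0 when CLASS appears as a whole entry in the comma-separated LIST.
has_codec() {
  list=$(printf '%s' "$1" | tr -d '[:space:]')
  case ",$list," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Guarded so the check only runs where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  codecs=$(hdfs getconf -confKey io.compression.codecs)
  for c in com.hadoop.compression.lzo.LzoCodec com.hadoop.compression.lzo.LzopCodec; do
    if has_codec "$codecs" "$c"; then
      echo "found:   $c"
    else
      echo "missing: $c"
    fi
  done
fi
```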

Modify the YARN configuration

Add one entry to the mapreduce.application.classpath property:

/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*

 


Modify the MR application environment

Add one entry to the mapreduce.admin.user.env property:

/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
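
Both settings point at directories inside a parcel, so a quick check on each node can catch a mistyped or missing path before the restart. This is a minimal sketch. dir_has_files is a hypothetical helper, and the file patterns (*.jar for the classpath directory, *.so* for the native directory) are assumptions about what those directories contain, not something stated above.

```shell
#!/bin/sh
# dir_has_files DIR PATTERN (hypothetical helper)
# Returns 0 when DIR exists and contains at least one entry matching PATTERN.
dir_has_files() {
  [ -d "$1" ] || return 1
  for f in "$1"/$2; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# report DIR PATTERN: print whether the directory looks usable on this host.
report() {
  if dir_has_files "$1" "$2"; then
    echo "ok:      $1"
  else
    echo "missing: $1 (no $2 found on this host)" >&2
  fi
}

# Paths exactly as used in the two property values.
report /opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib '*.jar'
report /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native '*.so*'
```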

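Restart the cluster for the changes to take effect.

The machines where Flume collects logs must also have the lzo package installed before they can write compressed output to HDFS: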

wget http://mirror.centos.org/centos/7/os/x86_64/Packages/lzo-devel-2.06-8.el7.x86_64.rpm

yum install -y lzo-devel

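The test results are as follows; compression saves roughly two thirds of the space:

Log size before compression:

Data size after compression: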
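
The saving can be worked out from the two sizes. This is a minimal sketch: savings_percent is a hypothetical helper, and the byte counts in the example are made-up illustration values, not the measurements from this test. On a real cluster the sizes could be read with hdfs dfs -du -s on the two paths.

```shell
#!/bin/sh
# savings_percent BEFORE AFTER (hypothetical helper)
# Prints the space saved as a whole-number percentage (integer division, rounds down).
savings_percent() {
  [ "$1" -gt 0 ] || return 1
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Illustrative values only: 3000 bytes before, 1000 bytes after.
savings_percent 3000 1000   # prints 66
```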

 
