Big Data Technology: Hadoop Data Compression Parameter Configuration

Compression parameter configuration

To enable compression in Hadoop, you can configure the following parameters:

Table 4-10 Configuration parameters

Parameter: io.compression.codecs (configured in core-site.xml)
Default: org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec
Stage: input compression
Suggestion: Hadoop uses the file extension to determine whether a given codec is supported.

Parameter: mapreduce.map.output.compress (configured in mapred-site.xml)
Default: false
Stage: mapper output
Suggestion: set this parameter to true to enable compression.

Parameter: mapreduce.map.output.compress.codec (configured in mapred-site.xml)
Default: org.apache.hadoop.io.compress.DefaultCodec
Stage: mapper output
Suggestion: companies mostly use the LZO or Snappy codec to compress data at this stage.

Parameter: mapreduce.output.fileoutputformat.compress (configured in mapred-site.xml)
Default: false
Stage: reducer output
Suggestion: set this parameter to true to enable compression.

Parameter: mapreduce.output.fileoutputformat.compress.codec (configured in mapred-site.xml)
Default: org.apache.hadoop.io.compress.DefaultCodec
Stage: reducer output
Suggestion: use standard tools or codecs such as gzip and bzip2.

Parameter: mapreduce.output.fileoutputformat.compress.type (configured in mapred-site.xml)
Default: RECORD
Stage: reducer output
Suggestion: compression type used for SequenceFile output; besides the default RECORD, the options are NONE and BLOCK.
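As a concrete illustration of the parameters above, the following mapred-site.xml fragment enables Snappy compression for mapper output and gzip compression for reducer output. This is a minimal sketch, not an authoritative configuration: the codec choices are examples, and Snappy must be available on your cluster for SnappyCodec to work.

```xml
<!-- Sketch: compression settings in mapred-site.xml (codec choices are examples) -->
<configuration>
  <!-- Compress intermediate mapper output -->
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <!-- Fast codec (Snappy) for the map stage; requires Snappy to be installed -->
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <!-- Compress final reducer output -->
  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>
  <!-- Standard codec (gzip) for the output files -->
  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
</configuration>
```

The same properties can also be set per job on the command line with -D, e.g. -Dmapreduce.map.output.compress=true, which overrides the site-wide defaults for that run.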


Origin blog.csdn.net/msjhw_com/article/details/109283518