Compression parameter configuration
To enable compression in Hadoop, you can configure the following parameters:
Table 4-10 Configuration parameters
Parameter | Default | Stage | Recommendation
io.compression.codecs (configured in core-site.xml) | org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec | Input compression | Hadoop uses the file extension to decide which of the listed codecs, if any, applies to an input file
mapreduce.map.output.compress (configured in mapred-site.xml) | false | mapper output | Set this parameter to true to enable compression |
mapreduce.map.output.compress.codec (configured in mapred-site.xml) | org.apache.hadoop.io.compress.DefaultCodec | mapper output | Fast codecs such as LZO or Snappy are commonly used to compress data at this stage
mapreduce.output.fileoutputformat.compress (configured in mapred-site.xml) | false | reducer output | Set this parameter to true to enable compression |
mapreduce.output.fileoutputformat.compress.codec (configured in mapred-site.xml) | org.apache.hadoop.io.compress.DefaultCodec | reducer output | Use a standard codec such as gzip or bzip2
mapreduce.output.fileoutputformat.compress.type (configured in mapred-site.xml) | RECORD | reducer output | Compression type used for SequenceFile output: NONE, RECORD, or BLOCK
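
As an illustration, the mapper and reducer output parameters in the table could be set in mapred-site.xml roughly as follows. This is a minimal sketch, not a recommended production configuration: the choice of SnappyCodec for mapper output, GzipCodec for reducer output, and BLOCK as the SequenceFile compression type are assumptions made for the example, and Snappy may require native library support on the cluster.

```xml
<configuration>
  <!-- Compress intermediate mapper output (assumed choice: Snappy) -->
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>

  <!-- Compress final reducer output (assumed choice: gzip) -->
  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>

  <!-- Only takes effect when the job writes SequenceFile output -->
  <property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>BLOCK</value>
  </property>
</configuration>
```

Values placed in mapred-site.xml act as cluster-wide defaults; an individual job can typically override them in its own configuration.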