Big Data Technology: Compression Formats Supported by MapReduce

Compression formats supported by MapReduce

Table 4-7

| Compression format | Built into Hadoop? | Algorithm | File extension | Splittable? | Changes needed after switching to this format |
|---|---|---|---|---|---|
| DEFLATE | Yes, usable directly | DEFLATE | .deflate | No | None; processed like plain text |
| gzip | Yes, usable directly | DEFLATE | .gz | No | None; processed like plain text |
| bzip2 | Yes, usable directly | bzip2 | .bz2 | Yes | None; processed like plain text |
| LZO | No, must be installed | LZO | .lzo | Yes | Must build an index and specify the input format |
| Snappy | No, must be installed | Snappy | .snappy | No | None; processed like plain text |
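
As the table shows, gzip uses the same underlying algorithm as DEFLATE; the .gz format is essentially DEFLATE output with extra framing and a checksum. A minimal sketch of a gzip compress/decompress round trip, using only the Java standard library (no Hadoop dependency), to illustrate the mechanics:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress a byte array with gzip (DEFLATE plus gzip framing).
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Decompress gzip bytes back to the original payload.
    static byte[] decompress(byte[] gzipped) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) > 0) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive input compresses well.
        String text = "hadoop mapreduce compression ".repeat(100);
        byte[] compressed = compress(text.getBytes(StandardCharsets.UTF_8));
        String restored = new String(decompress(compressed), StandardCharsets.UTF_8);
        System.out.println(compressed.length < text.length());
        System.out.println(restored.equals(text));
    }
}
```

Because gzip (like DEFLATE and Snappy) has no block-level sync markers that a reader can seek to, a large .gz file cannot be split across map tasks, which is why the table marks it as not splittable.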

To support multiple compression/decompression algorithms, Hadoop provides a codec (encoder/decoder) class for each format, as shown in the following table.

Table 4-8

| Compression format | Codec class |
|---|---|
| DEFLATE | org.apache.hadoop.io.compress.DefaultCodec |
| gzip | org.apache.hadoop.io.compress.GzipCodec |
| bzip2 | org.apache.hadoop.io.compress.BZip2Codec |
| LZO | com.hadoop.compression.lzo.LzopCodec |
| Snappy | org.apache.hadoop.io.compress.SnappyCodec |
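
At read time, Hadoop's org.apache.hadoop.io.compress.CompressionCodecFactory selects a codec by matching the input file's extension against the registered codecs. A Hadoop-free sketch of that lookup using the class names from Table 4-8 (resolveCodec is a hypothetical illustrative helper, not a Hadoop API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecLookup {
    // File extension -> codec class name, mirroring Table 4-8.
    static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS.put(".lzo", "com.hadoop.compression.lzo.LzopCodec");
        CODECS.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
    }

    // Return the codec class for a path's extension,
    // or null when the file is treated as uncompressed.
    static String resolveCodec(String path) {
        for (Map.Entry<String, String> e : CODECS.entrySet()) {
            if (path.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolveCodec("/data/logs/part-00000.gz"));
        System.out.println(resolveCodec("/data/logs/part-00000"));
    }
}
```

This extension-based dispatch is why no code changes are needed for the built-in formats: the input format detects and decompresses the file transparently before records reach the mapper.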

Comparison of compression performance

Table 4-9

| Compression algorithm | Original file size | Compressed file size | Compression speed | Decompression speed |
|---|---|---|---|---|
| gzip | 8.3 GB | 1.8 GB | 17.5 MB/s | 58 MB/s |
| bzip2 | 8.3 GB | 1.1 GB | 2.4 MB/s | 9.5 MB/s |
| LZO | 8.3 GB | 2.9 GB | 49.3 MB/s | 74.6 MB/s |

Snappy's figures come from its project page (http://google.github.io/snappy/): on a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/s or more and decompresses at about 500 MB/s or more.
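
These throughput figures translate into very different wall-clock costs. A quick back-of-the-envelope calculation for compressing the 8.3 GB file from Table 4-9 (Snappy's ~250 MB/s is the project page's claim, not a Table 4-9 measurement):

```java
public class CompressionTime {
    // Seconds to process sizeGB of data at speedMBs (1 GB = 1024 MB).
    static double seconds(double sizeGB, double speedMBs) {
        return sizeGB * 1024 / speedMBs;
    }

    public static void main(String[] args) {
        // Compression speeds from Table 4-9, plus Snappy's claimed figure.
        System.out.printf("gzip  : %.0f s%n", seconds(8.3, 17.5));
        System.out.printf("bzip2 : %.0f s%n", seconds(8.3, 2.4));
        System.out.printf("LZO   : %.0f s%n", seconds(8.3, 49.3));
        System.out.printf("Snappy: %.0f s%n", seconds(8.3, 250.0));
    }
}
```

The trade-off is the usual one: bzip2 compresses smallest but takes roughly an hour on this file, while LZO and Snappy finish in minutes at a lower compression ratio.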

Origin blog.csdn.net/msjhw_com/article/details/109175120