Data compression mechanism in the Shuffle stage

In the shuffle stage, data is copied many times: the output of the map stage is sent across the network to the reduce stage, which involves a large amount of network IO. If this data is compressed before it is transferred, the amount of data sent over the network becomes much smaller.
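As a minimal sketch of how this is usually switched on cluster-wide (assuming Hadoop 2.x/3.x property names in mapred-site.xml; the exact codec class is just an example and the corresponding native library must be available on the nodes):

```xml
<!-- mapred-site.xml: compress intermediate (map output / shuffle) data -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```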

Compression algorithms supported in Hadoop:

  • gzip
  • bzip2
  • LZO
  • LZ4
  • Snappy

These algorithms trade off compression ratio against compression and decompression speed. For shuffle data, speed matters more than ratio, and Google's Snappy offers very fast compression and decompression with a reasonable ratio, so Snappy is generally chosen.
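The same settings can also be applied per job through the Hadoop Java API. The following is a minimal sketch (the class name SnappyShuffleExample is only for illustration, and Snappy native libraries are assumed to be installed on the cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class SnappyShuffleExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress map output before it is shuffled to the reducers
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Use Snappy as the codec for the intermediate (shuffle) data
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "snappy-shuffle-example");
        // ... set mapper, reducer, input and output paths as usual ...
    }
}
```

Compressing only the intermediate data this way does not change the job's final output format; it just reduces the bytes moved during the shuffle.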


See you next time, bye!
