MapReduce In-Depth Study (Part 2), Lesson 14: MapReduce Data Compression with Snappy


2019-06-15 21:57:41


File compression has two advantages: it saves disk space, and it speeds up data transfer over the network and to and from disk.
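To see the space saving concretely, here is a minimal sketch. Snappy is not part of the JDK, so it uses GZIP from `java.util.zip` as a stand-in; the class name `CompressionDemo` and the sample record text are made up for illustration. It compresses a repetitive byte array, prints both sizes, and checks that decompression restores the original bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionDemo {
    // Compress bytes with GZIP (a stand-in for Snappy, which the JDK does not ship)
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the GZIP stream above wrote the trailer, so bos is complete
        return bos.toByteArray();
    }

    // Decompress GZIP bytes back into the original data
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Repetitive, record-like text (made-up sample) compresses well
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("sample-record\t100\t200\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + Arrays.equals(original, decompress(packed)));
    }
}
```

Snappy trades a lower compression ratio for much higher speed than GZIP, which is why it is a common choice for MapReduce intermediate data; the size-versus-round-trip idea shown here is the same.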

Method 1: set compression in code

Code:

FlowMain:

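import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

// main method of FlowMain, which implements org.apache.hadoop.util.Tool
public static void main(String[] args) throws Exception {
    Configuration configuration = new Configuration();

    // Compress the map-stage output (intermediate data shuffled to the reducers)
    configuration.set("mapreduce.map.output.compress", "true");
    configuration.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");

    // Compress the reduce-stage output (the job's final output files)
    configuration.set("mapreduce.output.fileoutputformat.compress", "true");
    // RECORD vs. BLOCK only takes effect when the job writes SequenceFiles
    configuration.set("mapreduce.output.fileoutputformat.compress.type", "RECORD");
    configuration.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");

    // ToolRunner hands this configuration to FlowMain and calls its run(args)
    int run = ToolRunner.run(configuration, new FlowMain(), args);
    System.exit(run);
}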

Method 2: configure compression globally for all MapReduce jobs

We can also edit the mapred-site.xml configuration file and restart the cluster, so that every MapReduce job compresses its data (this is usually not done).

Compress the map output:

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

Compress the reduce output:

<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>RECORD</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
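Methods 1 and 2 use exactly the same five keys; only the place where they are set differs. As a small sketch (the class and method names are hypothetical, not part of Hadoop), the snippet below keeps the properties in one map and renders them as the `<property>` entries listed above, which can help keep per-job code and mapred-site.xml in sync.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SnappySiteConfig {
    // The same five properties that FlowMain sets in code (Method 1)
    static Map<String, String> snappyProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.map.output.compress", "true");
        props.put("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");
        props.put("mapreduce.output.fileoutputformat.compress", "true");
        props.put("mapreduce.output.fileoutputformat.compress.type", "RECORD");
        props.put("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");
        return props;
    }

    // Render each entry as a <property> element for mapred-site.xml (Method 2).
    // No XML escaping is done: these keys and values contain no special characters.
    static String toSiteXml(Map<String, String> props) {
        StringBuilder xml = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            xml.append("<property>\n")
               .append("  <name>").append(e.getKey()).append("</name>\n")
               .append("  <value>").append(e.getValue()).append("</value>\n")
               .append("</property>\n");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(toSiteXml(snappyProperties()));
    }
}
```

The generated entries still have to be placed inside the `<configuration>` element of mapred-site.xml, and the cluster restarted, as described above.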

Result: the job generates compressed output files.

Note: these compressed files are inconvenient to open by hand, but the program automatically decompresses them based on their file extension and then passes the data on to the next step.
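In Hadoop, this extension-based lookup is done by `CompressionCodecFactory`, which maps a file suffix such as `.snappy` to its codec. The sketch below only imitates the idea in plain Java with a hand-written suffix table; the class `CodecBySuffix` is hypothetical, it is not Hadoop's implementation, and the table covers just a few codecs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class CodecBySuffix {
    // Hand-written suffix -> codec class table (a subset of Hadoop's codecs)
    private static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
        CODECS.put(".lz4", "org.apache.hadoop.io.compress.Lz4Codec");
    }

    // Return the codec for the first suffix the file name ends with,
    // or empty for an uncompressed file that should be read as-is
    static Optional<String> codecFor(String fileName) {
        for (Map.Entry<String, String> e : CODECS.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return Optional.of(e.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"part-r-00000.snappy", "part-r-00000"}) {
            System.out.println(name + " -> " + codecFor(name).orElse("(no codec, read as-is)"));
        }
    }
}
```

Because none of these suffixes ends with another one, first-match lookup is unambiguous here; a real table with overlapping suffixes would need a longest-match rule.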


Source: www.cnblogs.com/mediocreWorld/p/11028335.html
