Hadoop (2): Implementing a Hadoop program

1 Install Hadoop

1.1 Install on Linux (Hadoop is installed from a software package, in essentially the same way as MySQL)

See the Atguigu (尚硅谷) course.

1.2 Install on Windows (so that you can run Hadoop programs on Windows and test whether a Hadoop program written in Java works)

Refer to the following article to install it on your computer:
Hadoop-3.0.0 Windows installation

2 Building the project (basically the same as an ordinary project)

There are only three differences from building an ordinary project:
(1) You need to import the Hadoop JARs (dozens of JAR files)
(2) You override the map method and the reduce method
(3) The program must run in an environment where Hadoop is installed

2.1 Create an ordinary project

2.2 Import the Hadoop JARs

2.2.1 Manual import

Add the Hadoop JARs directly to the project as external libraries.

2.2.2 Maven import

1 See the Atguigu (尚硅谷) course
2 Use Maven to build a Hadoop development environment

2.3 Write three classes: Driver, Mapper, and Reducer

2.3.1 Driver: the class containing the main method
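A Driver's main method usually starts by checking its command-line arguments. As the commands in section 5 show, these are the input path and the output path. The Hadoop part of a Driver needs the Hadoop JARs, so the sketch below is plain Java and models only the argument check. The class names (DriverArgsSketch, WordCountDriver, WordCountMapper, WordCountReducer) are hypothetical. The Hadoop Job calls that a real Driver would make next are listed in a comment for orientation. With FileOutputFormat, the output directory must not exist yet, otherwise the job is rejected at submission.

```java
import java.util.Arrays;

/**
 * Stdlib-only sketch of the argument check a MapReduce Driver's main method
 * usually starts with. All class names here are hypothetical. After this check
 * a real Driver configures and submits the job with the Hadoop API, e.g.:
 *   Job job = Job.getInstance(new Configuration(), "word count");
 *   job.setJarByClass(WordCountDriver.class);
 *   job.setMapperClass(WordCountMapper.class);
 *   job.setReducerClass(WordCountReducer.class);
 *   job.setOutputKeyClass(Text.class);
 *   job.setOutputValueClass(IntWritable.class);
 *   FileInputFormat.addInputPath(job, new Path(args[0]));
 *   FileOutputFormat.setOutputPath(job, new Path(args[1]));
 *   System.exit(job.waitForCompletion(true) ? 0 : 1);
 */
class DriverArgsSketch {
    /** Input and output locations taken from the command line. */
    static final class JobPaths {
        final String input;
        final String output;

        JobPaths(String input, String output) {
            this.input = input;
            this.output = output;
        }
    }

    /** Expects exactly two arguments: the input path, then the output path. */
    static JobPaths parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: <driver> <input path> <output path>, got " + Arrays.toString(args));
        }
        return new JobPaths(args[0], args[1]);
    }

    public static void main(String[] args) {
        // Simulates the arguments of: hadoop jar study_demo.jar com.ncst.hadoop.MaxTemperature /input/sample.txt /output
        JobPaths paths = parse(new String[] {"/input/sample.txt", "/output"});
        System.out.println("input=" + paths.input + ", output=" + paths.output);
    }
}
```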

2.3.2 Override the map method
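The map method is called once per input record. For text input, each record is one line, and the key is that line's byte offset. The plain-Java sketch below assumes the classic WordCount task and reproduces only the logic: split the line into words and emit (word, 1) for each one. The class name is hypothetical. A real Mapper would extend org.apache.hadoop.mapreduce.Mapper and write each pair through context.write instead of returning a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

/**
 * Stdlib-only sketch of the logic inside a WordCount map() method (hypothetical class).
 * In a real Mapper<LongWritable, Text, Text, IntWritable> each pair would be emitted
 * with context.write(new Text(word), new IntWritable(1)) instead of being collected.
 */
class WordCountMapSketch {
    /** Splits one input line on whitespace and emits (word, 1) for every token. */
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The key (byte offset of the line) is not used, just as in WordCount.
        System.out.println(map(0L, "hello hadoop hello")); // prints [hello=1, hadoop=1, hello=1]
    }
}
```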

2.3.3 Override the reduce method
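After the shuffle, reduce is called once per key, with all of that key's values. Continuing the WordCount assumption, the hypothetical sketch below just sums the counts. A real Reducer would extend org.apache.hadoop.mapreduce.Reducer, receive the values as Iterable<IntWritable>, and emit the total with context.write.

```java
import java.util.Arrays;

/**
 * Stdlib-only sketch of the logic inside a WordCount reduce() method (hypothetical class).
 * In a real Reducer<Text, IntWritable, Text, IntWritable> the result would be emitted
 * with context.write(key, new IntWritable(sum)).
 */
class WordCountReduceSketch {
    /** Adds up all the counts that the shuffle grouped under one key. */
    static int reduce(String word, Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // After the shuffle, "hello" arrives with the values [1, 1].
        System.out.println("hello\t" + reduce("hello", Arrays.asList(1, 1)));
    }
}
```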

2.4 Test the code on Windows (with Hadoop installed) or on Linux

2.5 Package the project into a JAR

2.6 Deploy and run on Linux (you must use the hadoop command)

Two steps:
1 Switch to the yarn user: su yarn
2 Run the job: hadoop jar EXEMPLE_RUNNABLE.jar…

If the project is packaged as an ordinary JAR, you must name the main class at run time:
% hadoop jar EXEMPLE.jar MainClassName
The advantage is that you can choose any class in the JAR that has a main method.

If it is packaged as a runnable JAR, the main class is fixed at packaging time (recorded as Main-Class in the JAR manifest), so only the JAR is named:
% hadoop jar EXEMPLE_RUNNABLE.jar…
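A runnable JAR stores its entry point in the Main-Class attribute of META-INF/MANIFEST.MF. When hadoop jar (like java -jar) finds that attribute, it uses it as the main class. The sketch below uses only java.util.jar. It writes two JARs, one with Main-Class and one without, and reads the attribute back. The file names and com.example.WordCountDriver are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Shows where a runnable JAR records its main class (all names are hypothetical). */
class MainClassLookup {
    /** Returns the Main-Class attribute of the JAR, or null if it has none. */
    static String mainClassOf(Path jar) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            Manifest mf = jf.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** Writes a JAR containing only a manifest; mainClass may be null for an ordinary JAR. */
    static Path writeJar(Path jar, String mainClass) throws IOException {
        Manifest mf = new Manifest();
        // Manifest-Version must be set, otherwise no main attributes are written at all.
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), mf)) {
            // No class files are added; finishing writes a JAR that holds only the manifest.
            out.finish();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jars");
        Path runnable = writeJar(dir.resolve("EXEMPLE_RUNNABLE.jar"), "com.example.WordCountDriver");
        Path plain = writeJar(dir.resolve("EXEMPLE.jar"), null);
        System.out.println(mainClassOf(runnable)); // prints com.example.WordCountDriver
        System.out.println(mainClassOf(plain));    // prints null
    }
}
```

For the ordinary JAR, nothing in the manifest names a main class, which is why the class name has to be given on the command line.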

3 Other important things to know

3.1 Partition
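The partitioner decides which reduce task receives each map output pair. Hadoop's default, HashPartitioner, computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so every pair with the same key lands on the same reducer. The plain-Java sketch below applies that formula to String keys. Hadoop's Text class computes its hash differently from String, so the concrete partition numbers will not match a real job. Only the rule carries over. The class name is hypothetical.

```java
/**
 * Stdlib-only sketch of the rule used by Hadoop's default HashPartitioner:
 * (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
 * String.hashCode() is used here; Hadoop's Text keys hash differently.
 */
class HashPartitionSketch {
    /** Maps a key to a reducer index in [0, numReduceTasks). */
    static int getPartition(String key, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        // Clearing the sign bit keeps the result non-negative even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"hello", "hadoop", "mapreduce"}) {
            System.out.println(word + " -> reducer " + getPartition(word, 3));
        }
    }
}
```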

3.2 Shuffle
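From the reducer's point of view, the shuffle turns the map output into keys in sorted order, each with all of its values grouped together. Hadoop sorts by key, but it does not guarantee the order of values within one key. The in-memory sketch below models only that end result with a TreeMap. Spilling, merging, and copying partitions across the network are not modeled. The class name is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory sketch (hypothetical class) of what the shuffle hands to reduce():
 * map output pairs sorted by key, with all values for the same key grouped.
 */
class ShuffleSketch {
    static Map<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> mapOutput) {
        // TreeMap keeps its keys sorted, standing in for the sort phase.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Map output for the line "hello hadoop hello" in WordCount.
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String word : "hello hadoop hello".split(" ")) {
            mapOutput.add(new SimpleEntry<>(word, 1));
        }
        System.out.println(sortAndGroup(mapOutput)); // prints {hadoop=[1], hello=[1, 1]}
    }
}
```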

4 A few small examples

1 A very good example: running the Hadoop WordCount example, both with the JAR bundled with Hadoop and with a JAR exported from Eclipse.

2 Classic MapReduce practice cases

3 Detailed explanation of MapReduce parameters

5 Commands for running a JAR with hadoop and java

5.1 hadoop

5.1.1 Specifying the main class

# The main method takes two arguments: the input path and the output path
hadoop jar study_demo.jar com.ncst.hadoop.MaxTemperature /input/sample.txt /output

5.1.2 Default main class (taken from the JAR manifest)

# The main method takes two arguments: the input path and the output path
hadoop jar study_demo.jar /input/sample.txt /output

5.2 java command

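java -jar (runs the class named by Main-Class in the JAR manifest)
or java -cp (takes a classpath, and the main class must be named explicitly)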


Source: blog.csdn.net/xiaotiig/article/details/126895997