1. Preliminary work
The Hadoop environment has already been built.
For details on how to build a Hadoop environment, click here.
2. Count words on the minimal server system
2.1 Switch users
su - angel
2.2 View the processes
Purpose: make sure that the master-node and slave-node processes of the Hadoop cluster are running.
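A common way to check is the JDK's jps command, which lists the running Java processes on each node; in a typical Hadoop 2.x cluster, the master shows NameNode, SecondaryNameNode, and ResourceManager, while each slave shows DataNode and NodeManager:
jps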
2.3 Create the test files
The test files are placed under /home/angel:
vim.tiny sw1.txt
vim.tiny sw2.txt
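The contents of the two files are arbitrary; any short English text will do. For example, sw1.txt might simply contain:
hello world
hello hadoop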
2.4 Create a test folder in the cluster and upload the test files
List the files currently in the cluster:
hdfs dfs -ls
Create a folder named /test in the cluster to hold the test files:
hdfs dfs -mkdir /test
Upload the test files to the cluster:
hdfs dfs -put sw*.txt /test
Check whether the upload succeeded:
hdfs dfs -ls /test
View the contents of sw1.txt:
hdfs dfs -cat /test/sw1.txt
2.5 Run the word count example shipped with Hadoop to count the words in the test files
2.5.1 Locate the example package
cd /app/hadoop-2.8.5/
2.5.2 Run the word count class from the Hadoop examples package
Run the wordcount class in hadoop-mapreduce-examples-2.8.5.jar with the hadoop command, taking /test as input and writing the output to /out1:
hadoop jar hadoop-mapreduce-examples-2.8.5.jar wordcount /test /out1
2.5.3 View output files in the cluster
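Assuming the job wrote its result to /out1 as above, the output files can be listed and read as follows (part-r-00000 is the standard name of a reducer's output file):
hdfs dfs -ls /out1
hdfs dfs -cat /out1/part-r-00000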
At this point, the Hadoop cluster is running on the minimal server system, and MapReduce has successfully counted the words.
3. Run Eclipse and write a word count program
3.1 Preliminary work
This article mainly explains how to write applications on the MapReduce platform. Click here to view the preliminary work.
Note:
Eclipse can only be run on the desktop version of the system; it cannot be run in a command-prompt-only environment.
3.2 Open Eclipse
3.3 Create the project and configure the Hadoop installation directory
3.3.1 Set up the project
Select "File"-"New"-"Other..." in the menu bar and choose "Map/Reduce Project".
Enter the project name "WordCount" and select "Configure Hadoop install directory...".
3.3.2 Configure the Hadoop installation directory
Click "Next".
Select the Hadoop installation directory (here, /app/hadoop-2.8.5).
At this point, creating the project and configuring the Hadoop installation directory is complete. Click "Next".
3.4 Create a new Hadoop location
Enter "master" in the "Location name" text box; change the "Host" of "Map/Reduce(V2) Master" from the original "localhost" to "master"; change the "Port" of "DFS Master" from the original "50040" to "9000"; leave everything else unchanged.
Aside: why is the port 9000?
You can look at the configuration file to see:
cat /app/hadoop-2.8.5/etc/hadoop/core-site.xml
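In a typical Hadoop 2.x setup (assumed here, since the file itself is not reproduced in this article), the HDFS address is defined by the fs.defaultFS property in core-site.xml, for example:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>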
Back in Eclipse, check whether "master" now appears under "DFS Locations" on the right; if it does, the new Hadoop location was created successfully.
3.5 Create the word count class
Under "WordCount"-"src", create a new class named "WordCount".
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1) for each token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // All arguments except the last are input paths; the last is the output path.
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
        new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
3.6 Run the class (the program parameters and the main class must be set first)
Select "Run As"-"Run Configurations..." and double-click "Java Application".
3.6.1 Add program parameters
Double-click "Java Application" and enter "WordCount" in the "Name" text box.
Select the "Arguments" tab and enter two lines of parameters in the "Program arguments" text box: hdfs://master:9000/input and hdfs://master:9000/output.
3.6.2 Set the main class
On the "Main" tab of the run configuration, click "Search" and set "WordCount - (default package)" as the main class.
3.7 Clear the previous experiment's files to prevent interference
Create the test folder /input in the cluster and upload the test files into it, as sketched below.
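A possible command sequence for this step, assuming the /test and /out1 paths from section 2 and running from /home/angel where the sw*.txt files were created:
hdfs dfs -rm -r /test /out1
hdfs dfs -mkdir /input
hdfs dfs -put sw*.txt /input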
3.8 Reconnect to the cluster
3.9 Run Hadoop
Select the project and run it; the console reports no errors, only warnings.
3.10 View results
View the result from the command line; it is consistent with what is shown in the Eclipse window on the desktop.
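For example, assuming the output path hdfs://master:9000/output configured in section 3.6.1:
hdfs dfs -cat /output/part-r-00000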
At this point, writing a word count program on the MapReduce platform has succeeded!