A MapReduce word-count example run locally, with the input and output data stored on the local filesystem
Running in cluster mode: https://blog.csdn.net/weixin_43614067/article/details/108400938
Submitting from a local machine to the cluster: https://blog.csdn.net/weixin_43614067/article/details/108401227
The text to count is word.txt (located at C:\Users\Think\Desktop\input\word.txt):
Stray birds of summer
come to my window to sing and
fly away And yellow leaves of autumn
which have no songs flutter and fall there with a
sign O Troupe of little vagrants of the world
leave your footprints in my
words The world puts off
its mask of vastness to its
lover It becomes small as one song
as one kiss of the eternal It is the tears of
the earth that keep her smiles in bloom The mighty desert
is burning for the love of a blade of grass
who shakes her head and laughs and flies
away If you shed tears when you miss the
sun you also miss the stars The sands in
your way beg for your song and your movement
dancing water Will you carry the burden of their
lameless Her wishful face haunts my dreams like the
rain at night Once we dreamt that we were strangers
We wake up to find that
we were dear to each other
Mapper side
package com.bjsxt.wc;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the offset at which each line starts in the input text; value is the content of that line
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            // Emit (word, 1) for every token on the line
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
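The mapper splits each line on a single space, so consecutive spaces or leading whitespace produce empty tokens that are then counted as words. The following is a minimal, Hadoop-free sketch of that behavior next to one possible alternative. The class TokenizeSketch and its method names are hypothetical and not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project: compares two ways
// of tokenizing a line before (word, 1) pairs are emitted.
final class TokenizeSketch {

    // Same rule as WCMapper: a single space is the delimiter.
    static List<String> splitOnSingleSpace(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.split(" ")) {
            tokens.add(token);
        }
        return tokens;
    }

    // Alternative: split on runs of whitespace and skip empty tokens.
    static List<String> splitOnWhitespace(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "Stray  birds of summer"; // note the double space
        System.out.println(splitOnSingleSpace(line));
        System.out.println(splitOnWhitespace(line));
    }
}
```

With the double space in the sample line, the first method returns an extra empty string between "Stray" and "birds", while the second returns only the four words. Whether empty tokens matter depends on how clean the input file is.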
Reducer side
package com.bjsxt.wc;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the 1s emitted by the mappers for this word
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        // Emit (word, total count)
        context.write(key, new IntWritable(count));
    }
}
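Between the map and reduce phases, the framework groups all values emitted for the same key, which is why reduce receives an Iterable of counts per word. The grouping and summing can be imitated in plain Java as a rough sketch. ReduceSketch and its method names are hypothetical, and the TreeMap only loosely stands in for the framework's ordering of keys; this is an illustration, not a description of Hadoop internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original project.
final class ReduceSketch {

    // Imitates the shuffle: collect the 1s emitted for each word under its key.
    static Map<String, List<Integer>> group(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : words) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Imitates WCReducer.reduce: sum the values that belong to one key.
    static int sum(Iterable<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count;
    }

    // Runs the grouping step and then the summing step for every key.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : group(words).entrySet()) {
            counts.put(entry.getKey(), sum(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("the", "world", "the", "love")));
    }
}
```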
Runner side
package com.bjsxt.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {
    public static void main(String[] args) throws Exception {
        // Create the configuration object
        Configuration conf = new Configuration();
        // Create the Job object
        Job job = Job.getInstance(conf, "wordCount");
        // Set the mapper class
        job.setMapperClass(WCMapper.class);
        // Set the reducer class
        job.setReducerClass(WCReducer.class);
        // Set the class that runs the job
        job.setJarByClass(WCRunner.class);
        // Set the key and value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the key and value types of the reduce output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the input path and the output path
        /*FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));*/
        // Backslashes must be escaped inside Java string literals
        FileInputFormat.setInputPaths(job, new Path("C:\\Users\\Think\\Desktop\\input\\word.txt"));
        FileOutputFormat.setOutputPath(job, new Path("C:\\Users\\Think\\Desktop\\output"));
        long startTime = System.currentTimeMillis();
        try {
            // Submit the job and wait for it to finish
            boolean b = job.waitForCompletion(true);
            if (b) {
                System.out.println("Word count finished!");
            }
        } finally {
            // End time in milliseconds
            long endTime = System.currentTimeMillis();
            System.out.println("Job<" + job.getJobName() + "> succeeded: " + job.isSuccessful() + "; start time: " + startTime + "; end time: " + endTime + "; elapsed: " + (endTime - startTime) + "ms");
        }
    }
}
Note: if you use
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
instead, the input and output paths must be supplied as program arguments when the job is run.
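When the paths come from args[0] and args[1], a missing argument only surfaces as an ArrayIndexOutOfBoundsException. One possible way to fail earlier with a clearer message is sketched below. RunnerArgs, its fields, and the usage text are hypothetical and not part of the original project.

```java
// Hypothetical helper, not part of the original project: validates the two
// program arguments before they are handed to the job.
final class RunnerArgs {
    final String input;
    final String output;

    private RunnerArgs(String input, String output) {
        this.input = input;
        this.output = output;
    }

    static RunnerArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException("Usage: WCRunner <input path> <output path>");
        }
        // The job rejects an output directory that already exists, so a path
        // identical to the input can never work.
        if (args[0].equals(args[1])) {
            throw new IllegalArgumentException("Input and output paths must differ: " + args[0]);
        }
        return new RunnerArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        RunnerArgs parsed = parse(args);
        System.out.println("input=" + parsed.input + ", output=" + parsed.output);
    }
}
```

In WCRunner, the parsed values would then be passed to new Path(...) in place of args[0] and args[1].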
If the output directory is set to the same directory as the input, the following exception is reported:
2020-09-03 16:37:03,938 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1129)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2020-09-03 16:37:03,942 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/C:/Users/Think/Desktop/input already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:267)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:140)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
    at com.bjsxt.wc.WCRunner.main(WCRunner.java:77)
Process finished with exit code 1
Solution: make the output directory different from the directory that holds the input text. The output directory must not exist before the job starts; the job creates it itself.
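Because the job refuses to start when the output directory already exists, a rerun usually means removing the previous output first. For the local-filesystem case used here, one possible approach that needs only the JDK is sketched below. OutputDirCleaner and deleteIfExists are hypothetical names. The method deletes recursively, so it should only ever be pointed at the output directory. When the output lives on HDFS, Hadoop's FileSystem class offers exists(Path) and delete(Path, boolean) for the same purpose.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original project.
final class OutputDirCleaner {

    // Recursively deletes dir if it exists; returns true if something was removed.
    static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so that children come before their parents.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: the output directory is passed as the only argument.
        if (args.length == 1) {
            System.out.println(deleteIfExists(Paths.get(args[0])));
        }
    }
}
```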
A MapReduce word-count example run locally, with the input and output data stored on HDFS. Add or change the following configuration:
// Run locally, read the input data from HDFS, and write the output data to HDFS
conf.set("fs.defaultFS", "hdfs://node001:8020");
FileInputFormat.setInputPaths(job, new Path("hdfs://node001:8020/wordcount/input"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://node001:8020/wordcount/output"));
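The fully qualified paths above combine the default filesystem address with a path on that filesystem. The relationship between the two can be illustrated with java.net.URI, as a rough sketch only: DefaultFsSketch and qualify are hypothetical names, and the code mirrors the general idea of qualifying a path against fs.defaultFS rather than Hadoop's own resolution logic.

```java
import java.net.URI;

// Hypothetical helper, not part of the original project.
final class DefaultFsSketch {

    // Resolves a scheme-less absolute path against a default filesystem URI.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        URI full = qualify("hdfs://node001:8020", "/wordcount/input");
        // Print the parts that make up the qualified location.
        System.out.println(full.getScheme() + " " + full.getHost() + " "
                + full.getPort() + " " + full.getPath());
    }
}
```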