Running a MapReduce program locally: word count example

The MapReduce program runs the word count example locally, with the input and output data stored on the local file system.

Running in cluster mode: https://blog.csdn.net/weixin_43614067/article/details/108400938
Submitting from the local machine to the cluster: https://blog.csdn.net/weixin_43614067/article/details/108401227

Input text for the word count, word.txt (located at C:\Users\Think\Desktop\input\word.txt)

Stray birds of summer 
come to my window to sing and 
fly away  And yellow leaves of autumn
which have no songs flutter and fall there with a
sign  O Troupe of little vagrants of the world l
eave your footprints in my
words  The world puts off 
its mask of vastness to its 
lover  It becomes small as one song 
as one kiss of the eternal  It is the tears of 
the earth that keep her smiles in bloom  The mighty desert 
is burning for the love of a blade of grass 
who shakes her head and laughs and flies 
away  If you shed tears when you miss the 
sun you also miss the stars  The sands in 
your way beg for your song and your movement 
dancing water Will you carry the burden of their 
lameless  Her wishful face haunts my dreams like the 
rain at night  Once we dreamt that we were strangers 
 We wake up to find that 
 we were dear to each other

Mapper side

package com.bjsxt.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		// key is the byte offset at which the line starts in the input; value is the content of the line
		String line = value.toString();
		String[] words = line.split(" ");
		for (String word : words) {
			// emit (word, 1) for every token on the line
			context.write(new Text(word), new IntWritable(1));
		}
	}
}
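The mapper splits each line on a single space. Several lines of word.txt contain two consecutive spaces, and in that case `String.split(" ")` returns an empty string between them, which the job would then count as a word. The sketch below illustrates the difference in plain Java; `WordSplitDemo` and `tokenize` are hypothetical names used only for illustration and are not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the original WCMapper.
final class WordSplitDemo {

    // Splits on runs of whitespace and drops empty tokens.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A line from word.txt with two spaces between "away" and "And".
        String line = "fly away  And yellow leaves of autumn";
        // split(" ") keeps an empty string between the consecutive spaces.
        System.out.println(line.split(" ").length);   // prints 8
        System.out.println(tokenize(line).size());    // prints 7
    }
}
```

If empty tokens are not wanted in the output, the same idea could be applied inside `map` by skipping empty strings before calling `context.write`.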

Reducer side

package com.bjsxt.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

	@Override
	protected void reduce(Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {
		// sum all the 1s emitted by the mappers for this word
		int count = 0;
		for (IntWritable value : values) {
			count += value.get();
		}
		context.write(key, new IntWritable(count));
	}
}
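To make the data flow between the two classes concrete, the following is a minimal plain-Java sketch of what the job computes: the map step emits a 1 per word, the framework groups the values by key in sorted order, and the reduce step sums each group. It does not use Hadoop at all; `LocalWordCountSketch` and its methods are hypothetical names, and the `TreeMap` only stands in for the shuffle and sort that Hadoop performs between the phases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the word count data flow.
final class LocalWordCountSketch {

    // Map step plus shuffle: emit a 1 per word and group the values by word, sorted by key.
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // Reduce step: sum the grouped values for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int count = 0;
            for (int value : entry.getValue()) {
                count += value;
            }
            counts.put(entry.getKey(), count);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("as one song", "as one kiss of the eternal");
        // Prints each word with its count, ordered by word.
        System.out.println(reduce(mapAndShuffle(lines)));
    }
}
```

In the real job each reducer call receives one key together with an `Iterable` of its values, which corresponds to one entry of the grouped map in this sketch.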

Runner side

package com.bjsxt.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws Exception {
        // Create the configuration object
        Configuration conf = new Configuration();
        // Create the Job object
        Job job = Job.getInstance(conf, "wordCount");
        // Set the mapper class
        job.setMapperClass(WCMapper.class);
        // Set the reducer class
        job.setReducerClass(WCReducer.class);

        // Set the class used to locate the job jar
        job.setJarByClass(WCRunner.class);

        // Set the key and value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the key and value types of the reduce output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set the input and output paths
        /*FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));*/
        // Backslashes must be escaped inside Java string literals
        FileInputFormat.setInputPaths(job, new Path("C:\\Users\\Think\\Desktop\\input\\word.txt"));
        FileOutputFormat.setOutputPath(job, new Path("C:\\Users\\Think\\Desktop\\output"));

        long startTime = System.currentTimeMillis();
        try {
            // Submit the job and wait for it to finish
            boolean b = job.waitForCompletion(true);
            if (b) {
                System.out.println("Word count finished!");
            }
        } finally {
            // End time in milliseconds
            long endTime = System.currentTimeMillis();
            System.out.println("Job<" + job.getJobName() + "> succeeded: " + job.isSuccessful() + "; start time: " + startTime + "; end time: " + endTime + "; elapsed: " + (endTime - startTime) + "ms");
        }
    }
}

Note: to pass the paths as program arguments instead of hard-coding them, use

FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

and supply the input path and the output path, in that order, when launching the job.
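When the paths come from `args[0]` and `args[1]`, launching the program without arguments fails with an `ArrayIndexOutOfBoundsException` before the job is even configured. A small check at the top of `main` can give a clearer message. The sketch below is one possible way to do it; `RunnerArgs` and `parse` are hypothetical names and are not part of the original runner.

```java
// Hypothetical argument check for illustration; not part of the original WCRunner.
final class RunnerArgs {

    // Returns {inputPath, outputPath} or fails with a usage message.
    static String[] parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException("Usage: WCRunner <input path> <output path>");
        }
        return new String[] { args[0], args[1] };
    }

    public static void main(String[] args) {
        String[] paths = parse(args);
        System.out.println("input = " + paths[0] + ", output = " + paths[1]);
    }
}
```

In the runner, the two returned strings would then be wrapped in `new Path(...)` exactly as in the two lines shown in the note.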

If the input directory and the output directory are the same, the following exception is thrown:

2020-09-03 16:37:03,938 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1129)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2020-09-03 16:37:03,942 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/C:/Users/Think/Desktop/input already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:267)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:140)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
	at com.bjsxt.wc.WCRunner.main(WCRunner.java:77)

Process finished with exit code 1

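Solution: use an output directory that is different from the directory containing the input text. As the exception shows, the job refuses to start when the output directory already exists, so the output path must also point to a directory that has not been created yet.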
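For repeated local test runs it can be convenient to clear a stale output directory before submitting the job. The following is a minimal sketch for the local file system only, using `java.nio.file`; `OutputDirGuard`, `prepare`, and `deleteIfExists` are hypothetical names. Note that `java.nio.file.Path` is a different class from Hadoop's `org.apache.hadoop.fs.Path`. For data on HDFS, the analogous calls would be `FileSystem.exists` and `FileSystem.delete` from the Hadoop API. Deleting is destructive, so this should only be pointed at scratch directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for local runs; not part of the original project.
final class OutputDirGuard {

    // Deletes the directory tree if it exists; returns true if something was removed.
    static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.delete(path);
        }
        return true;
    }

    // Rejects an output directory that equals or contains the input, then clears it.
    static void prepare(Path input, Path output) throws IOException {
        Path in = input.toAbsolutePath().normalize();
        Path out = output.toAbsolutePath().normalize();
        if (in.startsWith(out)) {
            throw new IllegalArgumentException("Output directory must not contain the input: " + out);
        }
        deleteIfExists(out);
    }

    public static void main(String[] args) throws IOException {
        // Expects the same two arguments as the runner: input path, output path.
        prepare(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Output directory is clear: " + args[1]);
    }
}
```

The containment check matters because clearing an output directory that also holds word.txt would delete the input along with it.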

To run the same word count example locally while keeping the input and output data in HDFS, add or modify the following configuration:

 // Run locally, read the data from HDFS, and write the result back to HDFS
 conf.set("fs.defaultFS", "hdfs://node001:8020");
 FileInputFormat.setInputPaths(job, new Path("hdfs://node001:8020/wordcount/input"));
 FileOutputFormat.setOutputPath(job, new Path("hdfs://node001:8020/wordcount/output"));


Origin blog.csdn.net/weixin_43614067/article/details/108386389