MapReduce single-table join: outputting grandparents and grandchildren (with example)

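Requirement: given a file that lists each child together with one of their parents, output every (grandparent, grandchild) pair.
The input file and the expected result look like this: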

Input (single table):
child        parent
Tom          Lucy
Tom          Jack
Jone         Lucy
Jone         Jack
Lucy         Mary
Lucy         Ben
Jack         Alice
Jack         Jesse
Terry        Alice
Terry        Jesse
Philip       Terry
Philip       Alma
Mark         Terry
Mark         Alma

Result:
grand        child
Alice        Tom
Jesse        Tom
Alice        Jone
Jesse        Jone
Ben          Tom
Mary         Tom
Ben          Jone
Mary         Jone
Alice        Philip
Jesse        Philip
Alice        Mark
Jesse        Mark
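
The expected result can be cross-checked without Hadoop. The sketch below is not the MapReduce job itself but a naive nested-loop self-join over the same rows: a row (child, parent) is matched with every row whose child column equals that parent. The class name `NaiveSelfJoin` and the method `grandPairs` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSelfJoin {
    /** Each row is {child, parent}. Returns "grandparent<TAB>grandchild" strings. */
    public static List<String> grandPairs(String[][] rows) {
        List<String> out = new ArrayList<>();
        for (String[] a : rows) {       // a = {child, parent}
            for (String[] b : rows) {   // b = {child, parent}
                // Join condition: a's parent appears as the child in b,
                // so b's parent is a grandparent of a's child.
                if (a[1].equals(b[0])) {
                    out.add(b[1] + "\t" + a[0]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Tom", "Lucy"}, {"Tom", "Jack"}, {"Jone", "Lucy"}, {"Jone", "Jack"},
            {"Lucy", "Mary"}, {"Lucy", "Ben"}, {"Jack", "Alice"}, {"Jack", "Jesse"},
            {"Terry", "Alice"}, {"Terry", "Jesse"}, {"Philip", "Terry"}, {"Philip", "Alma"},
            {"Mark", "Terry"}, {"Mark", "Alma"}
        };
        for (String pair : grandPairs(rows)) {
            System.out.println(pair);
        }
    }
}
```

On the sample data this yields 12 pairs. The nested loop is quadratic in the number of rows, which is why the MapReduce version groups rows by the join column instead.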

1.Mapper.class

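import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SingleMapper extends Mapper<LongWritable, Text, Text, Text> {
	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		String line = value.toString();
		// Skip the header row ("child  parent").
		if (line.contains("child ") || line.contains("parent")) {
			return;
		}
		// line.split("\t") raised an array index out-of-bounds error on this data,
		// so tokenize on any whitespace instead.
		StringTokenizer tokens = new StringTokenizer(line);
		if (tokens.countTokens() < 2) {
			return; // blank or malformed line
		}
		String child = tokens.nextToken();
		String parent = tokens.nextToken();
		// Tag 1: keyed by the parent; the value is that person's child (a potential grandchild).
		context.write(new Text(parent), new Text("1," + child));
		// Tag 0: keyed by the child; the value is that person's parent (a potential grandparent).
		context.write(new Text(child), new Text("0," + parent));
	}
}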
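
To make the tagging concrete, here is a minimal sketch of what the mapper emits for a single input line, written in plain Java so it runs without Hadoop. The class name `TagEmitSketch` and the method `emit` are hypothetical; only the tag scheme ("1," for a child, "0," for a parent) comes from the mapper above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TagEmitSketch {
    /** Returns the {key, value} pairs produced for one "child parent" line. */
    public static List<String[]> emit(String line) {
        List<String[]> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        if (tokens.countTokens() < 2) {
            return out; // blank or malformed line: emit nothing
        }
        String child = tokens.nextToken();
        String parent = tokens.nextToken();
        out.add(new String[] {parent, "1," + child}); // key = parent, value = its child
        out.add(new String[] {child, "0," + parent}); // key = child, value = its parent
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : emit("Tom        Lucy")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
        // prints:
        // Lucy -> 1,Tom
        // Tom -> 0,Lucy
    }
}
```

Every record is emitted twice, once under each of its two names, so that a person who appears as a parent in one row and as a child in another ends up with both kinds of values under the same key.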

2.Reducer.class

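import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SingleReduce extends Reducer<Text, Text, Text, Text> {

	@Override
	protected void setup(Context context)
			throws IOException, InterruptedException {
		// Write the header row. setup() runs once per reduce task, so with
		// several reducers each output file gets its own header.
		context.write(new Text("grand"), new Text("child"));
	}

	@Override
	protected void reduce(Text key, Iterable<Text> values, Context context)
			throws IOException, InterruptedException {
		List<String> grandchildren = new ArrayList<String>(); // values tagged 1
		List<String> grandparents = new ArrayList<String>();  // values tagged 0
		for (Text v : values) {
			String[] parts = v.toString().split(",", 2);
			// Compare the tag itself; contains("1") would also match a name containing '1'.
			if ("1".equals(parts[0])) {
				grandchildren.add(parts[1]);
			} else {
				grandparents.add(parts[1]);
			}
		}
		// For one key (the middle generation), pair every grandparent with
		// every grandchild: a Cartesian product of the two lists.
		for (String grandchild : grandchildren) {
			for (String grandparent : grandparents) {
				context.write(new Text(grandparent), new Text(grandchild));
			}
		}
	}
}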
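
The reduce step for a single key can be tried in isolation. The following is a rough sketch in plain Java, assuming the values arrive as the tagged strings written by the mapper; `ReduceSideJoinSketch` and `join` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReduceSideJoinSketch {
    /** Takes the tagged values of one key and returns "grandparent<TAB>grandchild" pairs. */
    public static List<String> join(List<String> values) {
        List<String> grandchildren = new ArrayList<>();
        List<String> grandparents = new ArrayList<>();
        for (String v : values) {
            String[] parts = v.split(",", 2);
            if (parts.length < 2) {
                continue; // not a tagged value, ignore it
            }
            if ("1".equals(parts[0])) {
                grandchildren.add(parts[1]);
            } else {
                grandparents.add(parts[1]);
            }
        }
        // Cartesian product: every grandchild paired with every grandparent.
        List<String> out = new ArrayList<>();
        for (String grandchild : grandchildren) {
            for (String grandparent : grandparents) {
                out.add(grandparent + "\t" + grandchild);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values that would be grouped under the key "Lucy" in the sample data.
        List<String> values = Arrays.asList("1,Tom", "1,Jone", "0,Mary", "0,Ben");
        for (String pair : join(values)) {
            System.out.println(pair);
        }
    }
}
```

A key that has values on only one side (for example "Tom", who has parents but no children in the data) produces an empty product, so nothing is written for it.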

3.Driver.class

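import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleDriver {
	public static void main(String[] args)
			throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();

		// Delete the output directory if it already exists; otherwise the job fails at submission.
		Path outfile = new Path("file:///D:/输出结果/singleout");
		FileSystem fs = outfile.getFileSystem(conf);
		if (fs.exists(outfile)) {
			fs.delete(outfile, true);
		}
		Job job = Job.getInstance(conf);
		job.setJarByClass(SingleDriver.class);
		job.setJobName("Single Table Join");
		job.setMapperClass(SingleMapper.class);
		job.setReducerClass(SingleReduce.class);

		// The map output and the final output are both Text/Text.
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);

		FileInputFormat.addInputPath(job, new Path("file:///D:/测试数据/单表关联.txt/"));
		FileOutputFormat.setOutputPath(job, outfile);

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}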

 
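Summary: when solving a table-join query with MapReduce, the crucial step is to pin down the join field and use it as the key, with a tag on each value marking which side of the join it came from. The reducer then compares and filters the values grouped under each key and writes the matches out with context.write().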
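
Why the join field has to be the key becomes visible when the grouping done between map and reduce is simulated by hand. The sketch below is a plain-Java approximation that uses a `TreeMap` in place of the shuffle; `ShuffleSketch` and `group` are hypothetical names, and the order of values within a key is an artifact of this simulation, since Hadoop makes no such ordering promise by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    /** Groups the tagged values by key, roughly as the shuffle phase would. */
    public static Map<String, List<String>> group(String[][] rows) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] row : rows) {
            String child = row[0];
            String parent = row[1];
            // Same two emissions as the mapper: one keyed by parent, one keyed by child.
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add("1," + child);
            groups.computeIfAbsent(child, k -> new ArrayList<>()).add("0," + parent);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Tom", "Lucy"}, {"Jone", "Lucy"}, {"Lucy", "Mary"}, {"Lucy", "Ben"}
        };
        // Lucy is the middle generation: her children and her parents meet under one key.
        System.out.println(group(rows).get("Lucy"));
        // prints: [1,Tom, 1,Jone, 0,Mary, 0,Ben]
    }
}
```

Because both sides of the relationship land in the same group, the reducer can pair them without ever looking at another key.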






 


Reposted from blog.csdn.net/xiaozelulu/article/details/81186580