Errors encountered while learning Hadoop

 

0 Why does reduce also do grouping:

File 1 ---> map1 groups ---> one group for Zhang San, one group for Li Si
File 2 ---> map2 groups ---> one group for Zhang San, one group for Li Si

In the map phase, file 1 and file 2 are each grouped only inside their own map task; no grouping happens across map1 and map2. Only at the reduce stage can all the data be merged together and grouped globally, which is why reduce groups again.
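
A minimal sketch of the reduce side of this, assuming Text keys (the names above) and LongWritable counts; the class and variable names are illustrative, not from the original post:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Each reduce() call receives one key (e.g. "Zhang San") together with the values
// from *all* map tasks, already merged and grouped by the shuffle.
class GroupCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
	@Override
	protected void reduce(Text key, Iterable<LongWritable> values, Context context)
			throws IOException, InterruptedException {
		long count = 0;
		for (LongWritable v : values) {
			count += v.get(); // values from map1 and map2 arrive in the same call
		}
		context.write(key, new LongWritable(count));
	}
}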

0.1

map tasks ---> the number is determined by the number of input splits, which by default is the number of HDFS blocks of the input file
map function: called once for every line of the input file

reduce tasks ---> determined by the number of partitions; the partition logic can be implemented yourself, and by default there is a single partition

                            for details see "hadoop partition: introduction and custom partitioners"; a Partitioner sketch follows below
reduce function: called as many times as there are groups (distinct keys) produced from the map output
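
A minimal sketch, assuming Text keys and LongWritable values (the class name is illustrative), of a custom Partitioner together with the driver calls that set the number of reduce tasks; by default Hadoop uses HashPartitioner and a single reducer:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each key to a reduce task; the return value must lie in [0, numPartitions).
class MyPartitioner extends Partitioner<Text, LongWritable> {
	@Override
	public int getPartition(Text key, LongWritable value, int numPartitions) {
		return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
	}
}

// In the driver, where job is an org.apache.hadoop.mapreduce.Job:
//   job.setPartitionerClass(MyPartitioner.class);
//   job.setNumReduceTasks(2); // number of reduce tasks = number of partitions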

1 When writing a custom reducer in Eclipse,

either give the Context parameter its full generic type:

class MyReducer2 extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable>{

	protected void reduce(LongWritable k2, Iterable<LongWritable> v2s,
			org.apache.hadoop.mapreduce.Reducer<LongWritable,LongWritable,LongWritable,LongWritable>.Context context)
			throws IOException, InterruptedException {
		 System.out.println("reduce2");
	}

}

or leave off the generics, and the package path along with them:

class MyReducer1 extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable>{
	protected void reduce(LongWritable k2, java.lang.Iterable<LongWritable> v2s, Context context) throws java.io.IOException ,InterruptedException {
	   System.out.println("reduce");
	}
}

If you use the package path but not the generics (the raw type org.apache.hadoop.mapreduce.Reducer.Context), reduce is never entered: with the raw Context the method signature no longer matches Reducer's generic reduce method, so it does not override it, and the framework keeps calling the inherited default reduce instead. Eclipse flags this form with a yellow warning squiggle telling you to add the type parameters.

class MyReducer2 extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable>{

	protected void reduce(LongWritable k2, Iterable<LongWritable> v2s,
			org.apache.hadoop.mapreduce.Reducer.Context context)
			throws IOException, InterruptedException {
		 System.out.println("reduce2");
	}

}   
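
A minimal sketch (the class name MyReducer3 and the println are illustrative) of the safest form: keep Context unqualified or fully parameterized and add @Override, so the compiler rejects any method that does not actually override Reducer#reduce, including the raw Reducer.Context variant above:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

class MyReducer3 extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
	// With the raw org.apache.hadoop.mapreduce.Reducer.Context parameter this method
	// would not be an override, and @Override would turn that into a compile error
	// instead of a silently skipped reduce.
	@Override
	protected void reduce(LongWritable k2, Iterable<LongWritable> v2s, Context context)
			throws IOException, InterruptedException {
		System.out.println("reduce3");
	}
}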

2 When defining a custom key on the map side (usually an entity class), if the class has a String-typed field, the way that field is written to and read from the data stream differs from long and the other primitive types, as follows:

public static class MyUser implements Writable, DBWritable {
	int id;
	String name;

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(id);
		Text.writeString(out, name); // use org.apache.hadoop.io.Text to write the String field
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.id = in.readInt();
		this.name = Text.readString(in); // use org.apache.hadoop.io.Text to read the String field
	}

	// DBWritable methods (required by the interface; the column indices here are illustrative)
	@Override
	public void write(PreparedStatement statement) throws SQLException {
		statement.setInt(1, id);
		statement.setString(2, name);
	}

	@Override
	public void readFields(ResultSet resultSet) throws SQLException {
		this.id = resultSet.getInt(1);
		this.name = resultSet.getString(2);
	}
}

Otherwise you get an error whose stack trace includes:

 java.io.DataInputStream.readFully(Unknown Source)
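
A quick way to convince yourself that write() and readFields() stay symmetric is a round trip through a byte stream. This is a hypothetical test sketch, assuming the MyUser class above is visible from the test class:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class MyUserRoundTrip {
	public static void main(String[] args) throws IOException {
		MyUser original = new MyUser();
		original.id = 1;
		original.name = "Zhang San";

		// Serialize the same way Hadoop does when it ships the key between tasks.
		ByteArrayOutputStream bytes = new ByteArrayOutputStream();
		original.write(new DataOutputStream(bytes));

		// Deserialize into a fresh instance and check the fields survived.
		MyUser copy = new MyUser();
		copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
		System.out.println(copy.id + " " + copy.name); // expected: 1 Zhang San
	}
}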


Reposted from chengjianxiaoxue.iteye.com/blog/2165115