Source: https://blog.csdn.net/wwwzydcom/article/details/83905121
Methods of the Mapper class in the source code
/**
* The <code>Context</code> passed on to the {@link Mapper} implementations.
*/
public abstract class Context
implements MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
}
The Context is what the map task uses to write its output to the reduce phase (or whatever the next stage is).
/**
* Called once at the beginning of the task.
*/
protected void setup(Context context
) throws IOException, InterruptedException {
// NOTHING
}
Called once when the task starts.
/**
 * Called once for each key/value pair in the input split. Most applications
 * should override this, but the default is the identity function.
 * (Here key and value are the inputs.)
 */
@SuppressWarnings("unchecked")
protected void map(KEYIN key, VALUEIN value,
                   Context context) throws IOException, InterruptedException {
  // Write the output key-value pair; context acts as the coordinator between stages
  context.write((KEYOUT) key, (VALUEOUT) value);
}
This method carries the core business logic of the map phase.
/**
* Called once at the end of the task.
*/
protected void cleanup(Context context
) throws IOException, InterruptedException {
// NOTHING
}
Called once when the task finishes.
/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
  // Initialization (set up collections, load lookup tables, etc.)
  setup(context);
  try {
    while (context.nextKeyValue()) {
      // Core business logic
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
    // Final teardown: close streams, release resources
    cleanup(context);
  }
}
}
run() defines the exact order in which the map lifecycle methods execute: setup() once, map() for every record, cleanup() once.
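To make the template-method pattern above concrete, here is a minimal, Hadoop-free sketch of the same lifecycle. The class and method names (WordSplitMapper, the List standing in for Context) are illustrative assumptions, not Hadoop API; a real mapper would extend org.apache.hadoop.mapreduce.Mapper instead.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the Mapper lifecycle: setup() once, map() per record,
// cleanup() once in a finally block, exactly as Mapper.run() does.
public class WordSplitMapper {
    // Stands in for Context: collects the emitted (key, value) pairs.
    private final List<String> output = new ArrayList<>();

    protected void setup() {
        // One-time initialization, e.g. loading a lookup table.
    }

    // Emits one (word, 1) pair per token, like a classic word-count map().
    protected void map(long offset, String line) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                output.add(word + "\t1");
            }
        }
    }

    protected void cleanup() {
        // Release resources, close streams, etc.
    }

    // Mirrors Mapper.run(): drives the three lifecycle methods in order.
    public List<String> run(List<String> split) {
        setup();
        try {
            long offset = 0;
            for (String line : split) {
                map(offset, line);
                offset += line.length() + 1;
            }
        } finally {
            cleanup();
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> pairs = new WordSplitMapper()
            .run(List.of("hello world", "hello hadoop"));
        System.out.println(pairs);
    }
}
```

Overriding map() alone is enough for most jobs; setup() and cleanup() stay empty unless you need per-task state.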
The Reducer class
/**
* The <code>Context</code> passed on to the {@link Reducer} implementations.
*/
public abstract class Context
implements ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
}
The Context is responsible for writing out the Reducer's data.
/**
* Called once at the start of the task.
*/
protected void setup(Context context
) throws IOException, InterruptedException {
// NOTHING
}
Called once at the start; used for initialization.
/**
 * This method is called once for each key. Most applications will define
 * their reduce class by overriding this method. The default implementation
 * is an identity function.
 */
@SuppressWarnings("unchecked")
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                      ) throws IOException, InterruptedException {
  for (VALUEIN value : values) {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}
The concrete reduce business logic lives here.
/**
* Called once at the end of the task.
*/
protected void cleanup(Context context
) throws IOException, InterruptedException {
// NOTHING
}
Teardown work, such as closing streams.
/**
 * Advanced application writers can use the
 * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
 * control how the reduce task works.
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      // If a backup store is used, reset it
      Iterator<VALUEIN> iter = context.getValues().iterator();
      if (iter instanceof ReduceContext.ValueIterator) {
        ((ReduceContext.ValueIterator<VALUEIN>) iter).resetBackupStore();
      }
    }
  } finally {
    cleanup(context);
  }
}
run() chains all of the methods together: setup() once, reduce() for each distinct key, cleanup() once.
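As with the Mapper, the Reducer lifecycle can be sketched in plain Java. This is a simplified, Hadoop-free analogue (SumReducer and the Map standing in for Context are illustrative assumptions); a real reducer would extend org.apache.hadoop.mapreduce.Reducer and receive values already grouped by the shuffle.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the Reducer lifecycle: setup() once, reduce() per key,
// cleanup() once in a finally block, mirroring Reducer.run().
public class SumReducer {
    // Stands in for Context: collects the final (key, sum) pairs.
    private final Map<String, Integer> output = new LinkedHashMap<>();

    protected void setup() {
        // One-time initialization.
    }

    // Sums all values for one key, like a classic word-count reduce().
    protected void reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        output.put(key, sum);
    }

    protected void cleanup() {
        // Close resources.
    }

    // Mirrors Reducer.run(): the framework hands us values grouped by key.
    public Map<String, Integer> run(Map<String, List<Integer>> grouped) {
        setup();
        try {
            for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
                reduce(e.getKey(), e.getValue());
            }
        } finally {
            cleanup();
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new SumReducer()
            .run(Map.of("hello", List.of(1, 1), "hadoop", List.of(1)));
        System.out.println(counts);
    }
}
```

Note that the real run() also resets the backup-store iterator after each reduce() call, a detail this sketch omits because grouping is already materialized here.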