Hadoop Source Code Explained: The FileInputFormat Class

Copyright notice: to repost, please contact the author. https://blog.csdn.net/liu16659/article/details/85198956

1. Class description

A base class for file-based InputFormats.
In short: the common base class for InputFormats that read their input from files.

FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext). Implementations of FileInputFormat can also override the isSplitable(JobContext, Path) method to prevent input files from being split-up in certain situations. Implementations that may deal with non-splittable files must override this method, since the default implementation assumes splitting is always possible.
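Put differently: FileInputFormat is the base class of all file-based InputFormats, and it supplies a generic implementation of getSplits(JobContext). A subclass can override isSplitable(JobContext, Path) to keep input files from being split in certain situations. Any subclass that may encounter non-splittable files must override it, because the default implementation returns true, that is, it assumes every file can be split.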
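To make the relationship between getSplits and isSplitable concrete, here is a minimal toy model in plain Java. It is not Hadoop's code: the class names (ToyFileInputFormat, ToyTextInputFormat, ToyGzipAwareInputFormat), the String path, and the {offset, length} arrays are all hypothetical simplifications, and the real getSplits takes block sizes and other settings into account. The only point is the pattern: the base class does the generic splitting and asks an overridable hook whether a file may be cut.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a file-based input format; NOT Hadoop's actual class. */
abstract class ToyFileInputFormat {

    /** Default mirrors the documented behavior: assume every file can be split. */
    protected boolean isSplitable(String path) {
        return true;
    }

    /** Returns {offset, length} pairs that together cover the whole file. */
    public List<long[]> getSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        if (!isSplitable(path) || fileLength <= splitSize) {
            // Non-splittable (or small) file: a single split spans all of it.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }
}

/** Inherits the default hook, so every file is split. */
class ToyTextInputFormat extends ToyFileInputFormat { }

/** Overrides the hook: gzip streams cannot be read from an arbitrary offset. */
class ToyGzipAwareInputFormat extends ToyFileInputFormat {
    @Override
    protected boolean isSplitable(String path) {
        return !path.endsWith(".gz");
    }
}
```

In this sketch, a 250-byte file with a split size of 100 yields three splits under ToyTextInputFormat, while ToyGzipAwareInputFormat returns a single 250-byte split for a path ending in .gz. A subclass that forgot the override would cut the gzip file into pieces that cannot be decompressed independently.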

2. Class source

public abstract class FileInputFormat<K, V> extends InputFormat<K, V> {
...
}

3. Methods in detail

3.1 The setInputPaths() method

static void setInputPaths(Job job, String commaSeparatedPaths)
Sets the given comma separated paths as the list of inputs for the map-reduce job.

static void setInputPaths(Job job, Path... inputPaths)
Set the array of Paths as the list of inputs for the map-reduce job.

Note that the String overload names its parameter commaSeparatedPaths: the argument can be a comma-separated list of input files, whereas the Path... overload takes the paths one by one.
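As an illustration of what "comma separated" can mean in practice, the sketch below splits such a string into individual path strings in plain Java. CommaSeparatedPaths and its split method are hypothetical helpers, not part of Hadoop's API. The sketch also assumes that a comma inside a {...} glob group belongs to the path instead of acting as a separator; that is my understanding of how Hadoop reads these strings, so check it against the version you use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Hadoop's public API. */
final class CommaSeparatedPaths {

    private CommaSeparatedPaths() { }

    /** Splits on commas, except commas nested inside {...} glob groups (assumed rule). */
    static List<String> split(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        int curlyDepth = 0; // how many '{' are currently open
        int start = 0;      // index where the current path begins
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            char ch = commaSeparatedPaths.charAt(i);
            if (ch == '{') {
                curlyDepth++;
            } else if (ch == '}' && curlyDepth > 0) {
                curlyDepth--;
            } else if (ch == ',' && curlyDepth == 0) {
                // Top-level comma: close the current path and start the next one.
                paths.add(commaSeparatedPaths.substring(start, i));
                start = i + 1;
            }
        }
        paths.add(commaSeparatedPaths.substring(start));
        return paths;
    }
}
```

With this helper, "/in/a,/in/b" becomes two paths, while "/logs/{2018,2019}/part,/in/b" also becomes two, because the comma inside the braces stays with the first path. On the Hadoop side the corresponding calls would be FileInputFormat.setInputPaths(job, "/in/a,/in/b") for the String overload and FileInputFormat.setInputPaths(job, new Path("/in/a"), new Path("/in/b")) for the Path... overload, where the paths themselves are made-up examples.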
