Hadoop Source Code Explained: the `FileInputFormat` Class [updating…]

1. Class Description

A base class for file-based InputFormats.

FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext). Implementations of FileInputFormat can also override the isSplitable(JobContext, Path) method to prevent input files from being split-up in certain situations. Implementations that may deal with non-splittable files must override this method, since the default implementation assumes splitting is always possible.
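In short, subclasses get a general-purpose `getSplits(JobContext)` for free. The default `isSplitable(JobContext, Path)` returns `true`, so a subclass that may read files which cannot be split (for example, gzip-compressed text) must override `isSplitable` to return `false` for those files.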
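To make the interplay between `getSplits` and `isSplitable` concrete, here is a simplified, Hadoop-free model of how the default split logic turns one file into splits. It is a sketch under stated assumptions, not the real implementation: block locations, hosts and `FileSplit` objects are omitted, `isSplitable` is modeled as a plain boolean parameter, and the names `SplitSketch`, `Split` and `splitFile` are hypothetical. The split-size formula `max(minSize, min(maxSize, blockSize))` and the 1.1 slack factor (`SPLIT_SLOP`) follow the Hadoop sources as far as I know, but check them against the version you use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of FileInputFormat.getSplits() for a single file.
// Only (offset, length) pairs are produced; block locations are ignored.
final class SplitSketch {
    // The last split may be up to 10% larger than splitSize instead of
    // leaving a tiny tail split behind.
    static final double SPLIT_SLOP = 1.1;

    // Stand-in for a FileSplit: where the split starts and how long it is.
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + start + ", +" + length + "]";
        }
    }

    // Clamp the block size into [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // 'splitable' plays the role of isSplitable(JobContext, Path).
    static List<Split> splitFile(long length, long blockSize, long minSize,
                                 long maxSize, boolean splitable) {
        List<Split> splits = new ArrayList<>();
        if (length == 0 || !splitable) {
            // Empty or non-splittable files become exactly one split.
            splits.add(new Split(0, length));
            return splits;
        }
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long bytesRemaining = length;
        // Cut full-size splits while more than 1.1 x splitSize remains.
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(new Split(length - bytesRemaining, splitSize));
            bytesRemaining -= splitSize;
        }
        // The remainder (at most 1.1 x splitSize) becomes the last split.
        if (bytesRemaining != 0) {
            splits.add(new Split(length - bytesRemaining, bytesRemaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 300 MB file, 128 MB blocks, default-like min/max split sizes.
        System.out.println(splitFile(300 * mb, 128 * mb, 1, Long.MAX_VALUE, true));
        // The same file when isSplitable() would return false.
        System.out.println(splitFile(300 * mb, 128 * mb, 1, Long.MAX_VALUE, false));
    }
}
```

With these inputs a 300 MB splittable file yields three splits starting at 0, 128 and 256 MB (the last one 44 MB long), while a 140 MB file stays a single split because 140 / 128 ≈ 1.09 does not exceed the slop. When `splitable` is `false`, the whole file always becomes one split, which is exactly why non-splittable formats must override `isSplitable`.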

2. Class Source Code

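```java
package org.apache.hadoop.mapreduce.lib.input;

import org.apache.hadoop.mapreduce.InputFormat;

public abstract class FileInputFormat<K, V> extends InputFormat<K, V> {
    // ... (fields and methods omitted)
}
```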

3. Method Details

3.1 The `setInputPaths()` Method

`static void setInputPaths(Job job, String commaSeparatedPaths)`
Sets the given comma separated paths as the list of inputs for the map-reduce job.

`static void setInputPaths(Job job, Path... inputPaths)`
Set the array of Paths as the list of inputs for the map-reduce job.

Both overloads are `public static` and declare `throws IOException`.

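Note that when you call this method, one overload takes a parameter named `commaSeparatedPaths`: as the name says, it accepts a list of input paths separated by commas in a single string.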
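The string overload does not simply call `String.split(",")`. As far as I know, Hadoop routes it through a private helper (`getPathStrings`) that ignores commas inside `{...}` glob groups, so a pattern such as `/logs/{2019,2020}/*` stays one path. The standalone sketch below reproduces that splitting rule without Hadoop; the class `PathStrings` and its `split` method are hypothetical names, and the exact behaviour should be checked against the source of your Hadoop version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone re-creation of the comma-splitting rule applied by
// FileInputFormat.setInputPaths(Job, String): commas inside {...} globs are kept.
final class PathStrings {
    static String[] split(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        int curlyOpen = 0;       // current nesting depth of '{'
        boolean inGlob = false;  // true while inside a {...} group
        int pathStart = 0;       // index where the current path begins
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            char ch = commaSeparatedPaths.charAt(i);
            if (ch == '{') {
                curlyOpen++;
                inGlob = true;
            } else if (ch == '}') {
                curlyOpen--;
                if (curlyOpen == 0) {
                    inGlob = false; // outermost group closed
                }
            } else if (ch == ',' && !inGlob) {
                // A comma outside any glob group ends the current path.
                paths.add(commaSeparatedPaths.substring(pathStart, i));
                pathStart = i + 1;
            }
        }
        // Whatever follows the last top-level comma is the final path.
        paths.add(commaSeparatedPaths.substring(pathStart));
        return paths.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String p : split("/data/a.txt,/logs/{2019,2020}/*,/tmp/b")) {
            System.out.println(p);
        }
    }
}
```

Each resulting string is then turned into a `Path`. Keep in mind that `setInputPaths` replaces any input paths already configured on the job, whereas `addInputPath` and `addInputPaths` append to them.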
