The code commonly used to obtain a file system object in HDFS looks like this:
Configuration conf = new Configuration();
FileSystem fs;
fs = FileSystem.get(conf);
Which FileSystem subclass does fs end up pointing to, and how does it get there?
Let's step through the source code.
public static FileSystem get(Configuration conf) throws IOException {
  return get(getDefaultUri(conf), conf);
}
Stepping in one level deeper:
public static FileSystem get(URI uri, Configuration conf) throws IOException {
  String scheme = uri.getScheme();
  String authority = uri.getAuthority();
  // If both are null, fall back to the default file system from the configuration
  // (whatever getDefaultUri(conf) returns, which is not necessarily the local one)
  if (scheme == null && authority == null) { // use default FS
    return get(conf);
  }
  if (scheme != null && authority == null) { // no authority
    URI defaultUri = getDefaultUri(conf);
    if (scheme.equals(defaultUri.getScheme()) // if scheme matches default
        && defaultUri.getAuthority() != null) { // & default has authority
      return get(defaultUri, conf); // return default
    }
  }
  // If caching is disabled for this scheme, create a new FileSystem object and return it directly
  String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
  if (conf.getBoolean(disableCacheName, false)) {
    return createFileSystem(uri, conf);
  }
  // Otherwise, look the FileSystem object up in CACHE
  return CACHE.get(uri, conf);
}
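The URI checks at the top of get(URI, Configuration) can be tried out with plain java.net.URI. The following is a minimal sketch, not Hadoop code: the class EffectiveUri, the method resolve(), and the default URI hdfs://namenode:8020 are assumptions made only for illustration.

```java
import java.net.URI;

// Simplified model of the URI checks in FileSystem.get(URI, Configuration).
// EffectiveUri and resolve() are hypothetical names used only in this sketch.
class EffectiveUri {

    // Returns the URI the lookup would continue with, given the default URI.
    static URI resolve(URI uri, URI defaultUri) {
        String scheme = uri.getScheme();
        String authority = uri.getAuthority();
        if (scheme == null && authority == null) {
            return defaultUri;                 // e.g. "/user/data"
        }
        if (scheme != null && authority == null
                && scheme.equals(defaultUri.getScheme())
                && defaultUri.getAuthority() != null) {
            return defaultUri;                 // e.g. "hdfs:///user/data"
        }
        return uri;                            // fully qualified, or a different scheme
    }

    // Builds the per-scheme configuration key that disables the cache.
    static String disableCacheKey(URI uri) {
        return String.format("fs.%s.impl.disable.cache", uri.getScheme());
    }

    public static void main(String[] args) {
        URI def = URI.create("hdfs://namenode:8020"); // assumed default URI
        System.out.println(resolve(URI.create("/user/data"), def));
        System.out.println(resolve(URI.create("hdfs:///user/data"), def));
        System.out.println(resolve(URI.create("file:///tmp/x"), def));
        System.out.println(disableCacheKey(def));
    }
}
```

A path with no scheme and a same-scheme URI with no authority both resolve to the default URI, while a URI with a different scheme is kept as it is.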
Implementation of CACHE.get(uri, conf):
FileSystem get(URI uri, Configuration conf) throws IOException {
  Key key = new Key(uri, conf);
  return getInternal(uri, conf, key);
}
Implementation of getInternal():
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  // Create the file system (done outside the lock)
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty()
        && !ShutdownHookManager.get().isShutdownInProgress()) {
      ShutdownHookManager.get().addShutdownHook(clientFinalizer, SHUTDOWN_HOOK_PRIORITY);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    // Return the newly cached instance
    return fs;
  }
}
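The locking pattern in getInternal() (look up under the lock, create outside the lock, then check again under the lock) can be reduced to a small stand-alone model. This is only a sketch: InstanceCache and Handle are hypothetical names, and a plain String stands in for the real Key.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the lookup / create / re-check sequence in Cache.getInternal().
// InstanceCache and Handle are hypothetical; a String replaces the real Key.
class InstanceCache {

    static class Handle {
        final String key;
        boolean closed;
        Handle(String key) { this.key = key; }
        void close() { closed = true; }
    }

    private final Map<String, Handle> map = new HashMap<>();

    Handle get(String key) {
        Handle handle;
        synchronized (this) {
            handle = map.get(key);
        }
        if (handle != null) {
            return handle;
        }
        // Creation may be slow, so it happens without holding the lock.
        handle = new Handle(key);
        synchronized (this) {
            Handle old = map.get(key);
            if (old != null) {   // another thread inserted one in the meantime
                handle.close();  // discard the instance we just created
                return old;
            }
            map.put(key, handle);
            return handle;
        }
    }

    synchronized int size() {
        return map.size();
    }
}
```

The price of creating outside the lock is that two threads may both build an instance for the same key. The second check makes sure only one of them is kept and the other is closed.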
createFileSystem(uri, conf)
This creates the file system implementation that matches the scheme:
private static FileSystem createFileSystem(URI uri, Configuration conf)
    throws IOException {
  Class<?> clazz = getFileSystemClass(uri.getScheme(), conf);
  if (clazz == null) {
    throw new IOException("No FileSystem for scheme: " + uri.getScheme());
  }
  FileSystem fs = (FileSystem)ReflectionUtils.newInstance(clazz, conf);
  fs.initialize(uri, conf);
  return fs;
}
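The two steps in createFileSystem(), instantiating the class reflectively and then calling initialize(), can be imitated with plain java.lang.reflect. The sketch below does not use Hadoop's ReflectionUtils: ReflectiveFactory, Store, and MemoryStore are hypothetical stand-ins invented for this example.

```java
import java.lang.reflect.Constructor;

// Instantiate-then-initialize, modelled with plain reflection.
// Store plays the role of FileSystem; all names here are hypothetical.
class ReflectiveFactory {

    abstract static class Store {
        String uri;
        void initialize(String uri) { this.uri = uri; }
        abstract String scheme();
    }

    static class MemoryStore extends Store {
        MemoryStore() { }
        @Override
        String scheme() { return "mem"; }
    }

    static Store create(Class<? extends Store> clazz, String uri) {
        try {
            // Step 1: build the object through its no-argument constructor.
            Constructor<? extends Store> ctor = clazz.getDeclaredConstructor();
            ctor.setAccessible(true);
            Store store = ctor.newInstance();
            // Step 2: hand it the URI it should serve.
            store.initialize(uri);
            return store;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }
}
```

Because the object is built through a no-argument constructor, everything instance-specific has to arrive later through initialize(), which is why the two calls always appear together.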
Stepping into getFileSystemClass(uri.getScheme(), conf):
public static Class<? extends FileSystem> getFileSystemClass(String scheme,
    Configuration conf) throws IOException {
  if (!FILE_SYSTEMS_LOADED) {
    // This call loads every FileSystem subclass registered as a service provider
    loadFileSystems();
  }
  Class<? extends FileSystem> clazz = null;
  if (conf != null) {
    clazz = (Class<? extends FileSystem>) conf.getClass("fs." + scheme + ".impl", null);
  }
  if (clazz == null) {
    clazz = SERVICE_FILE_SYSTEMS.get(scheme);
  }
  if (clazz == null) {
    throw new IOException("No FileSystem for scheme: " + scheme);
  }
  return clazz;
}
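One point deserves special attention: in Hadoop 2.6, core-default.xml no longer contains a default entry for "fs.hdfs.impl".
So when the scheme is "hdfs", no implementation class is found through core-default.xml.
The fallback lookup SERVICE_FILE_SYSTEMS.get(scheme) solves this problem.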
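The lookup order (explicit "fs.<scheme>.impl" setting first, service registry second, error last) can be shown with two plain maps. This is a sketch under stated assumptions: SchemeLookup is a hypothetical class, the conf map stands in for Configuration, the services map stands in for SERVICE_FILE_SYSTEMS, and com.example.CustomFs is a made-up class name.

```java
import java.util.HashMap;
import java.util.Map;

// Two-level lookup modelled with plain maps; class names are kept as strings.
// SchemeLookup is hypothetical and not part of Hadoop.
class SchemeLookup {

    static String resolve(String scheme, Map<String, String> conf,
                          Map<String, String> services) {
        // 1. An explicit configuration entry wins.
        String impl = conf.get("fs." + scheme + ".impl");
        // 2. Otherwise fall back to the registry built from service files.
        if (impl == null) {
            impl = services.get(scheme);
        }
        if (impl == null) {
            throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Map<String, String> services = new HashMap<>();
        services.put("hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem");

        // No fs.hdfs.impl entry: the registry supplies the class.
        System.out.println(resolve("hdfs", conf, services));

        // An explicit entry overrides the registry (made-up class name).
        conf.put("fs.hdfs.impl", "com.example.CustomFs");
        System.out.println(resolve("hdfs", conf, services));
    }
}
```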
Implementation of loadFileSystems:
private static void loadFileSystems() {
  synchronized (FileSystem.class) {
    if (!FILE_SYSTEMS_LOADED) {
      ServiceLoader<FileSystem> serviceLoader = ServiceLoader.load(FileSystem.class);
      for (FileSystem fs : serviceLoader) {
        SERVICE_FILE_SYSTEMS.put(fs.getScheme(), fs.getClass());
      }
      FILE_SYSTEMS_LOADED = true;
    }
  }
}
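ServiceLoader.load relies on a provider configuration file named org.apache.hadoop.fs.FileSystem, found in the hadoop-hdfs\src\main\resources\META-INF\services directory of the source tree (and therefore under META-INF/services in the built jar).
Its content is: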
org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.HftpFileSystem
org.apache.hadoop.hdfs.web.HsftpFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
org.apache.hadoop.hdfs.web.SWebHdfsFileSystem
Each of these FileSystem subclasses implements the getScheme() method.
For example, DistributedFileSystem:
@Override
public String getScheme() {
  return HdfsConstants.HDFS_URI_SCHEME;
}
At this point the mapping between each scheme and its FileSystem subclass has been established.
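The JDK itself contains a close analogue that runs without Hadoop on the classpath: java.nio.file.spi.FileSystemProvider also exposes getScheme(), and installedProviders() loads providers through the same service-provider facility. The sketch below builds a scheme-to-class map in the same way as the loop in loadFileSystems. SchemeRegistry is a hypothetical name, and which providers show up depends on the JDK in use.

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.HashMap;
import java.util.Map;

// JDK analogue of the SERVICE_FILE_SYSTEMS map: scheme -> provider class.
// SchemeRegistry is a hypothetical name used only in this sketch.
class SchemeRegistry {

    static Map<String, Class<? extends FileSystemProvider>> build() {
        Map<String, Class<? extends FileSystemProvider>> registry = new HashMap<>();
        // installedProviders() returns the default provider plus those
        // discovered through the service-provider loading facility.
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            registry.put(provider.getScheme(), provider.getClass());
        }
        return registry;
    }

    public static void main(String[] args) {
        // The exact entries vary between JDK versions and classpaths.
        build().forEach((scheme, clazz) ->
                System.out.println(scheme + " -> " + clazz.getName()));
    }
}
```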
Note that calling close() on a FileSystem runs the following code:
@Override
public void close() throws IOException {
  // delete all files that were marked as delete-on-exit.
  processDeleteOnExit();
  CACHE.remove(this.key, this);
}
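In a multithreaded environment, use close() with caution: FileSystem.get() hands the same cached instance to every caller that resolves to the same cache key, so closing it in one thread also closes it for every other thread still using it.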
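The hazard can be reproduced with a toy shared cache. This is a sketch, not Hadoop code: SharedCacheDemo and Handle are hypothetical names, and the closed flag is an assumption standing in for whatever resources a concrete FileSystem subclass releases on close.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a cached, shared handle whose close() also evicts it.
// SharedCacheDemo and Handle are hypothetical names.
class SharedCacheDemo {

    static final Map<String, Handle> CACHE = new HashMap<>();

    static class Handle {
        final String key;
        private boolean closed;   // assumption: models released resources

        Handle(String key) { this.key = key; }

        String read() {
            if (closed) {
                throw new IllegalStateException("handle closed");
            }
            return "data from " + key;
        }

        // Same shape as close() above: clean up, then leave the cache.
        void close() {
            closed = true;
            synchronized (CACHE) {
                CACHE.remove(key, this);
            }
        }
    }

    // Every caller with the same key receives the same instance.
    static Handle get(String key) {
        synchronized (CACHE) {
            return CACHE.computeIfAbsent(key, Handle::new);
        }
    }
}
```

If one caller closes its handle, a second caller holding the same reference fails on its next read(), and a later get() builds a fresh instance because the old one was evicted. One way to avoid sharing altogether is the fs.%s.impl.disable.cache setting checked in get(URI, Configuration), at the cost of creating a new file system object on every call.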