Hadoop 2.6 source code walkthrough: the FileSystem.get(conf) implementation

A common way to obtain a file system handle in HDFS code looks like this:

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

So which FileSystem subclass does fs ultimately point to, and how is it chosen? Let's step into the code:

  public static FileSystem get(Configuration conf) throws IOException {
    return get(getDefaultUri(conf), conf);
  }

Stepping in again:

  public static FileSystem get(URI uri, Configuration conf) throws IOException {
    String scheme = uri.getScheme();
    String authority = uri.getAuthority();

    // if both are null, use the file system configured as the default (fs.defaultFS)
    if (scheme == null && authority == null) {     // use default FS
      return get(conf);
    }

    if (scheme != null && authority == null) {     // no authority
      URI defaultUri = getDefaultUri(conf);
      if (scheme.equals(defaultUri.getScheme())    // if scheme matches default
          && defaultUri.getAuthority() != null) {  // & default has authority
        return get(defaultUri, conf);              // return default
      }
    }

    // if caching is disabled for this scheme, create a new file system instance and return it
    String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
    if (conf.getBoolean(disableCacheName, false)) {
      return createFileSystem(uri, conf);
    }

    // otherwise look the file system up in the CACHE
    return CACHE.get(uri, conf);
  }
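The branching above turns entirely on which URI components are present. The following standalone sketch (plain java.net.URI, no Hadoop; the class and method names are mine, for illustration only) reproduces the three cases. It simplifies the middle branch slightly: real Hadoop additionally checks that the default URI itself carries an authority.

```java
import java.net.URI;

public class UriDispatchDemo {
    // Mirrors the branching in FileSystem.get(URI, Configuration):
    // which branch would this URI take, given the default FS scheme?
    static String branch(String uriString, String defaultScheme) {
        URI uri = URI.create(uriString);
        String scheme = uri.getScheme();
        String authority = uri.getAuthority();
        if (scheme == null && authority == null) {
            return "default";           // bare path such as "/tmp/data"
        }
        if (scheme != null && authority == null
                && scheme.equals(defaultScheme)) {
            // (Hadoop also requires the default URI to have an authority)
            return "default-authority"; // e.g. "hdfs:///tmp/data"
        }
        return "explicit";              // e.g. "hdfs://nn:8020/tmp/data"
    }

    public static void main(String[] args) {
        System.out.println(branch("/tmp/data", "hdfs"));          // default
        System.out.println(branch("hdfs:///tmp/data", "hdfs"));   // default-authority
        System.out.println(branch("hdfs://nn:8020/tmp", "hdfs")); // explicit
    }
}
```

Note that java.net.URI reports an empty authority (as in "hdfs:///tmp/data") as null, which is exactly why that form falls through to the default file system's authority.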

The implementation of CACHE.get(uri, conf):

   FileSystem get(URI uri, Configuration conf) throws IOException{
      Key key = new Key(uri, conf);
      return getInternal(uri, conf, key);
    }

The implementation of getInternal():

private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException{
      FileSystem fs;
      synchronized (this) {
        fs = map.get(key);
      }
      if (fs != null) {
        return fs;
      }

      // cache miss: create the file system outside the lock
      fs = createFileSystem(uri, conf);
      synchronized (this) { // refetch the lock again
        FileSystem oldfs = map.get(key);
        if (oldfs != null) { // a file system is created while lock is releasing
          fs.close(); // close the new file system
          return oldfs;  // return the old file system
        }

        // now insert the new file system into the map
        if (map.isEmpty()
                && !ShutdownHookManager.get().isShutdownInProgress()) {
          ShutdownHookManager.get().addShutdownHook(clientFinalizer, SHUTDOWN_HOOK_PRIORITY);
        }
        fs.key = key;
        map.put(key, fs);
        if (conf.getBoolean("fs.automatic.close", true)) {
          toAutoClose.add(key);
        }
        // return the newly created and cached instance
        return fs;
      }
    }
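The pattern in getInternal — check under the lock, create outside it, then re-check before inserting — avoids holding the lock during expensive construction. A stripped-down generic sketch of the same idiom (class and names are mine, not Hadoop's):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CheckCreateRecheckCache<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public V get(K key, Function<K, V> factory) {
        V value;
        synchronized (this) {           // first look-up under the lock
            value = map.get(key);
        }
        if (value != null) {
            return value;
        }
        V created = factory.apply(key); // expensive construction, lock released
        synchronized (this) {           // re-check: another thread may have won
            V old = map.get(key);
            if (old != null) {
                return old;             // drop ours (Hadoop closes the loser here)
            }
            map.put(key, created);
            return created;
        }
    }

    public static void main(String[] args) {
        CheckCreateRecheckCache<String, Integer> cache = new CheckCreateRecheckCache<>();
        System.out.println(cache.get("hdfs://nn:8020", k -> 42)); // built: 42
        System.out.println(cache.get("hdfs://nn:8020", k -> 99)); // cached: still 42
    }
}
```

Two racing threads can each build an instance for the same key, but only one lands in the map; that is exactly why getInternal closes the losing FileSystem instead of leaking it.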

createFileSystem(uri, conf) creates the appropriate file system for the scheme:

  private static FileSystem createFileSystem(URI uri, Configuration conf
      ) throws IOException {
    Class<?> clazz = getFileSystemClass(uri.getScheme(), conf);
    if (clazz == null) {
      throw new IOException("No FileSystem for scheme: " + uri.getScheme());
    }
    FileSystem fs = (FileSystem)ReflectionUtils.newInstance(clazz, conf);
    fs.initialize(uri, conf);
    return fs;
  }
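The heart of createFileSystem is reflective instantiation. A minimal pure-JDK sketch of the same idea, using an ordinary JDK class in place of a FileSystem implementation (names here are mine, for illustration):

```java
import java.util.List;

public class ReflectiveFactoryDemo {
    // The core of what ReflectionUtils.newInstance does: resolve a class
    // by name and call its no-arg constructor (Hadoop additionally hands
    // the Configuration to Configurable instances; omitted here).
    static Object newInstanceByName(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Object o = newInstanceByName("java.util.ArrayList");
        System.out.println(o instanceof List); // true: we got a real ArrayList
    }
}
```

In Hadoop the subsequent fs.initialize(uri, conf) call is what actually connects the fresh instance to the NameNode named in the URI; the reflective step only picks and constructs the class.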

Stepping into getFileSystemClass(uri.getScheme(), conf):

  public static Class<? extends FileSystem> getFileSystemClass(String scheme,
      Configuration conf) throws IOException {
    if (!FILE_SYSTEMS_LOADED) {
      // this call loads every FileSystem subclass registered as a service
      loadFileSystems();
    }
    Class<? extends FileSystem> clazz = null;
    if (conf != null) {
      clazz = (Class<? extends FileSystem>) conf.getClass("fs." + scheme + ".impl", null);
    }
    if (clazz == null) {
      clazz = SERVICE_FILE_SYSTEMS.get(scheme);
    }
    if (clazz == null) {
      throw new IOException("No FileSystem for scheme: " + scheme);
    }
    return clazz;
  }
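The lookup order matters: an explicit "fs.&lt;scheme&gt;.impl" configuration entry wins over the service registry. A small sketch of that two-level resolution, with plain maps standing in for Configuration and SERVICE_FILE_SYSTEMS (all names here are mine):

```java
import java.util.HashMap;
import java.util.Map;

public class SchemeResolverDemo {
    // Mirrors getFileSystemClass: explicit configuration ("fs.<scheme>.impl")
    // takes precedence over the ServiceLoader-built registry.
    static String resolve(Map<String, String> conf,
                          Map<String, String> serviceRegistry,
                          String scheme) {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            impl = serviceRegistry.get(scheme);
        }
        if (impl == null) {
            throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Map<String, String> registry = new HashMap<>();
        registry.put("hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem");
        // no config entry -> the registry answers
        System.out.println(resolve(conf, registry, "hdfs"));
    }
}
```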

Note in particular that in Hadoop 2.6, core-default.xml no longer ships a default value for "fs.hdfs.impl", so when the scheme is "hdfs" no implementation class is found in the configuration. The fallback to SERVICE_FILE_SYSTEMS.get(scheme) is what resolves it.
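If service-based resolution ever fails (a common symptom with shaded "fat" jars whose packaging drops the META-INF/services entry), the mapping can still be pinned explicitly, since the configuration lookup above runs first. A hedged example for core-site.xml, assuming the stock DistributedFileSystem implementation:

```xml
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
```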
The implementation of loadFileSystems():

  private static void loadFileSystems() {
    synchronized (FileSystem.class) {
      if (!FILE_SYSTEMS_LOADED) {
        ServiceLoader<FileSystem> serviceLoader = ServiceLoader.load(FileSystem.class);
        for (FileSystem fs : serviceLoader) {
          SERVICE_FILE_SYSTEMS.put(fs.getScheme(), fs.getClass());
        }
        FILE_SYSTEMS_LOADED = true;
      }
    }
  }

ServiceLoader.load relies on the service registration file named org.apache.hadoop.fs.FileSystem under the hadoop-hdfs\src\main\resources\META-INF\services directory, whose content is:


org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.HftpFileSystem
org.apache.hadoop.hdfs.web.HsftpFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
org.apache.hadoop.hdfs.web.SWebHdfsFileSystem
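The same registration mechanism can be exercised with plain JDK classes. The sketch below (all names are mine; nothing here is Hadoop API, and it assumes the class is compiled in the default package) writes a provider-configuration file into a temp directory and discovers it through a URLClassLoader, just as loadFileSystems() picks up the file above from the hadoop-hdfs jar:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    // Stand-in for DistributedFileSystem implementing FileSystem.
    public static class HelloTask implements Runnable {
        @Override public void run() { }
    }

    // Write META-INF/services/java.lang.Runnable naming HelloTask, then
    // let ServiceLoader discover it -- the mechanism behind loadFileSystems().
    static Class<?> firstProvider() {
        try {
            Path dir = Files.createTempDirectory("spi-demo");
            Path services = dir.resolve("META-INF/services");
            Files.createDirectories(services);
            Files.write(services.resolve("java.lang.Runnable"),
                    "ServiceLoaderDemo$HelloTask".getBytes(StandardCharsets.UTF_8));
            try (URLClassLoader cl = new URLClassLoader(
                    new URL[] { dir.toUri().toURL() })) {
                for (Runnable r : ServiceLoader.load(Runnable.class, cl)) {
                    return r.getClass(); // first provider listed in the file
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstProvider()); // class ServiceLoaderDemo$HelloTask
    }
}
```

In the real code, each discovered instance's getScheme() keys the SERVICE_FILE_SYSTEMS map, so shipping a jar with its own service file is enough to plug a new scheme into FileSystem.get.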

Every FileSystem subclass implements the getScheme() method. For example, DistributedFileSystem:

 @Override
  public String getScheme() {
    return HdfsConstants.HDFS_URI_SCHEME;
  }

With that, the mapping from each scheme to its FileSystem subclass is established.

Note that calling close() on a FileSystem executes the following code:

  @Override
  public void close() throws IOException {
    // delete all files that were marked as delete-on-exit.
    processDeleteOnExit();
    CACHE.remove(this.key, this);
  }

Because cached FileSystem instances are shared, close() also removes the instance from the CACHE, so in a multithreaded environment one thread's close() can break other threads still using the same cached object. Use close() with care, or obtain a private, uncached instance via FileSystem.newInstance(conf) when a thread needs a handle it can close on its own.


Reposted from blog.csdn.net/zhixingheyi_tian/article/details/80301088