Hadoop_11_HDFS Streaming API Operations

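Frameworks such as MapReduce need a lower-level API that can fetch just part of a given file rather than the whole file. Operating on HDFS files through streams provides this: it makes it possible to read the data within a specified offset range.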
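To make "a specified offset range" concrete, here is a minimal sketch of how a file could be divided into consecutive byte ranges, each of which a separate task might then read through a stream. The class name `ByteRanges`, the method `split`, and the sizes used are hypothetical and chosen only for illustration. This is not how MapReduce actually computes its input splits, which involves more than plain arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the names and sizes here are assumptions, not Hadoop API.
class ByteRanges {

    // Divides a file of fileLength bytes into consecutive {offset, length}
    // ranges of at most splitSize bytes each.
    static List<long[]> split(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last range may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - offset);
            ranges.add(new long[] {offset, length});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 300-byte file with 128-byte ranges yields three ranges.
        for (long[] r : split(300, 128)) {
            System.out.println("offset=" + r[0] + " length=" + r[1]);
        }
    }
}
```

Each resulting pair is exactly the kind of (offset, length) request that a stream-based read can serve without touching the rest of the file.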

1. Client test class code:

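package cn.bigdata.hdfs;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
// IOUtils.copy(InputStream, OutputStream), used by the test methods in the following steps,
// matches Apache Commons IO; Hadoop's own IOUtils offers copyBytes instead
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Before;
import org.junit.Test;

public class HdfsStreamAcess {
    // Client instance used to operate on HDFS
    private FileSystem fs  = null;
    Configuration conf = null;
    @Before
    public void init() throws IOException, InterruptedException, URISyntaxException{
        conf = new Configuration();
        // Obtain a file system client instance; the last argument is the user name
        fs = FileSystem.get(new URI("hdfs://shizhan2:9000"),conf,"root");
    }
}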

2. Uploading a file with streams:

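    // Upload a file with streams
    @Test
    public void testUploadWithStream() throws IllegalArgumentException, IOException{
        // true: overwrite the file if it already exists
        // try-with-resources closes both streams, so the HDFS file is flushed and completed
        try (FSDataOutputStream outputstream = fs.create(new Path("/angelababy.love"), true);
             FileInputStream input = new FileInputStream("c:/xxx.txt")) {
            // IOUtils: utility class; copies the local input stream to the HDFS output stream
            IOUtils.copy(input, outputstream);
        }
    }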

3. Downloading a file with streams:

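    // Download a file with streams
    @Test
    public void testDownloadWithStream() throws Exception{
        // try-with-resources closes both streams once the copy finishes
        try (FSDataInputStream in = fs.open(new Path("/angelababy.love"));
             FileOutputStream out = new FileOutputStream("d:/access_stream.log")) {
            // Copy the HDFS input stream to the local output stream
            IOUtils.copy(in, out);
        }
    }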

4. Reading a specified length of a file with streams:
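A minimal sketch of copying a byte range from a stream is shown below. It is written against plain java.io streams so that it runs without a cluster; `RangeCopy` and `copyRange` are hypothetical names, not part of the Hadoop API. With HDFS, the stream returned by fs.open(...) is an FSDataInputStream, which is an InputStream that also supports seek(long), so the skip loop could be replaced by a single seek to the start offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the Hadoop API.
class RangeCopy {

    // Copies up to length bytes from in to out, starting offset bytes past the
    // stream's current position. Returns the number of bytes actually copied.
    static long copyRange(InputStream in, OutputStream out, long offset, long length)
            throws IOException {
        // Move to the start offset. On an FSDataInputStream, in.seek(offset)
        // could be used instead of this loop.
        long skipped = 0;
        while (skipped < offset) {
            long n = in.skip(offset - skipped);
            if (n <= 0) {
                // skip() may return 0 before end of stream, so probe with read().
                if (in.read() < 0) {
                    return 0; // stream ended before reaching the offset
                }
                n = 1;
            }
            skipped += n;
        }
        // Copy at most length bytes, stopping early at end of stream.
        byte[] buf = new byte[4096];
        long copied = 0;
        while (copied < length) {
            int want = (int) Math.min(buf.length, length - copied);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                break;
            }
            out.write(buf, 0, n);
            copied += n;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copyRange(new ByteArrayInputStream(data), out, 4, 6);
        // prints "6 bytes: 456789"
        System.out.println(n + " bytes: " + new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

In a test method of the class above, the same helper could take the FSDataInputStream from fs.open(...) as its input and a local FileOutputStream as its output, with both wrapped in try-with-resources as in the upload and download steps.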
