Reading data files from an HDFS cluster:
Detailed analysis:
(1) In the Hadoop file system, a file is represented by a Hadoop Path object, not java.io.File, because the path must follow the HDFS URL scheme. For example: hdfs://ubuntu:9000/result.
(2) A FileSystem instance is not obtained with new; it is constructed through one of the static factory methods of FileSystem.
① When running as a user on the virtual machine itself:
FileSystem fs = FileSystem.get(conf);
② When running on the local machine, you need to tell the client how to reach the virtual machine:
conf.set("fs.defaultFS", "hdfs://ubuntu:9000");
FileSystem fs = FileSystem.get(new URI("hdfs://ubuntu:9000"), conf, "oycc");
(3) With a FileSystem instance, we can call its open() method to open a data stream; open() takes the location of the file, that is, a Path object.
(4) FSDataInputStream object: the FSDataInputStream class is a specialization of java.io.DataInputStream (a class, not an interface). It implements the Seekable interface, which supports random access, so the stream can be read from any position; it also implements the PositionedReadable interface, whose read methods take a position and length, so a given number of bytes can be read from a given offset.
(5) The seek() method can move to any absolute position in the stream, but it is a relatively expensive operation and should be used sparingly. Applications are better built around streaming access patterns, such as MapReduce.
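To make the seek()/positioned-read contract concrete without a live cluster, here is a minimal sketch using plain java.io.RandomAccessFile as a stand-in; the class name SeekDemo, the helper readAt, and the sample file contents are all illustrative, but FSDataInputStream's seek() and PositionedReadable's readFully() follow the same pattern of "jump to an offset, read, without losing your place".

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekDemo {
    // Read len bytes starting at pos without disturbing the current file
    // position -- the same contract PositionedReadable gives on HDFS streams.
    static byte[] readAt(RandomAccessFile raf, long pos, int len) throws IOException {
        long saved = raf.getFilePointer();
        byte[] buf = new byte[len];
        raf.seek(pos);      // analogous to FSDataInputStream.seek(pos)
        raf.readFully(buf);
        raf.seek(saved);    // restore the streaming position afterwards
        return buf;
    }

    // Writes a small temp file, does one positioned read, returns the bytes.
    static String demo() throws IOException {
        Path tmp = Files.createTempFile("seek-demo", ".txt");
        Files.write(tmp, "hello hdfs".getBytes("UTF-8"));
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r")) {
            return new String(readAt(raf, 6, 4), "UTF-8");
        } finally {
            Files.delete(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints "hdfs"
    }
}
```

On a local file such a jump is cheap; on HDFS every seek may mean contacting a different datanode, which is why point (5) advises streaming access instead.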
Code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import java.io.*;

public class GetFile {
    public static void main(String[] args) throws IOException {
        // The Configuration supplies the URI of the HDFS cluster and the
        // username, i.e. what permissions to use to access files on it.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Get the input stream from the file on HDFS.
        FSDataInputStream fis = fs.open(new Path("/result"));
        // Get the output stream to the local file.
        FileOutputStream fos = new FileOutputStream(new File("E://weather_result.txt"));
        // Copy with IOUtils; pass false so the streams stay open for the second copy.
        IOUtils.copyBytes(fis, fos, 1024, false);
        fis.seek(0); // go back to the beginning of the file
        IOUtils.copyBytes(fis, fos, 1024, false);
        fis.close();
        fos.close();
    }
}
Writing data files to HDFS:
Detailed analysis:
1. Most of this was already covered above when reading data files. The difference is that here we call the create(Path path) method, which creates a file on HDFS and returns an output stream to it, so that data can be written into that file.
Code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import java.io.*;

public class PutFile {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // build the FileSystem instance
        // Input stream from the local file.
        FileInputStream fis = new FileInputStream(new File("/home/oycc/file_example"));
        // Output stream to the file created on the HDFS cluster.
        FSDataOutputStream fos = fs.create(new Path("/example1"));
        IOUtils.copyBytes(fis, fos, 1024, true);
    }
}
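Both examples lean on IOUtils.copyBytes(in, out, bufferSize, close). Roughly, it copies the stream in fixed-size chunks and, if the last flag is true, closes both streams when done; this is a simplified stand-in in plain Java (the class name CopyBytes is illustrative, and the real IOUtils does more careful error handling):

```java
import java.io.*;

public class CopyBytes {
    // Simplified sketch of IOUtils.copyBytes(in, out, bufSize, close):
    // copy in fixed-size chunks, then optionally close both streams.
    static void copyBytes(InputStream in, OutputStream out, int bufSize, boolean close)
            throws IOException {
        try {
            byte[] buf = new byte[bufSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            if (close) {
                in.close();
                out.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream("example data".getBytes("UTF-8"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyBytes(in, out, 1024, true);
        System.out.println(out.toString("UTF-8")); // prints "example data"
    }
}
```

This also shows why the read example must pass close=false before calling seek(0): with close=true the input stream is already closed by the time the second copy starts.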
April 22, 2018
Learn the Definitive Guide to hadoop