HDFS Shell command operations
Enter /usr/local/hadoop:
cd /usr/local/hadoop
When using HDFS for the first time, you need to format the NameNode with the format command:
./bin/hdfs namenode -format
Use the start-dfs.sh script to start HDFS; you can then use the jps command to check whether it started successfully:
./sbin/start-dfs.sh
Create an HDFS directory (it exists only in the HDFS namespace, not on the local filesystem):
./bin/hdfs dfs -mkdir -p /user/hadoop/input
List the contents of the current user's HDFS home directory:
./bin/hdfs dfs -ls
Create a test text file on the desktop:
touch /home/hadoop/桌面/testinput.txt
Open testinput.txt on the desktop and enter some text, for example: hello world!
Use the put command of HDFS to upload the local file into the input directory of HDFS:
./bin/hdfs dfs -put /home/hadoop/桌面/testinput.txt input
Use -ls to view the files in the input directory:
./bin/hdfs dfs -ls input
Use the -cat command to view the contents of the testinput file in the input directory:
./bin/hdfs dfs -cat input/testinput.txt
Use the -get command to copy a file from the input directory to the local disk:
rm /home/hadoop/桌面/testinput.txt # first remove the existing testinput.txt from the desktop
./bin/hdfs dfs -get input/testinput.txt /home/hadoop/桌面 # perform the copy
Use the -rm command to delete a file in the input directory:
./bin/hdfs dfs -rm /user/hadoop/input/testinput.txt
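The shell workflow above (mkdir, put, ls, cat, rm) maps one-to-one onto ordinary filesystem operations. As a rough sketch of the same round trip against the local filesystem using only the JDK (the temporary paths and file contents here are illustrative, not the HDFS paths used above):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalRoundTrip {
    public static void main(String[] args) throws IOException {
        // Hypothetical local paths standing in for the HDFS paths above
        Path dir = Files.createTempDirectory("input");   // ~ hdfs dfs -mkdir -p input
        Path src = Files.createTempFile("testinput", ".txt");
        Files.writeString(src, "hello world!");          // ~ creating the desktop file
        Path dst = dir.resolve("testinput.txt");
        Files.copy(src, dst);                            // ~ hdfs dfs -put
        try (DirectoryStream<Path> ls = Files.newDirectoryStream(dir)) {
            for (Path p : ls) {
                System.out.println(p.getFileName());     // ~ hdfs dfs -ls input
            }
        }
        System.out.println(Files.readString(dst));       // ~ hdfs dfs -cat
        Files.delete(dst);                               // ~ hdfs dfs -rm
        Files.delete(src);
        Files.delete(dir);
    }
}
```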
HDFS-Java programming
First choose an IDE, such as Eclipse or IDEA.
The fish monster chose IDEA; please refer to this blog post for the specific steps to install IDEA under Ubuntu.
Create a new Java Project in IDEA and import the required Hadoop JAR packages; please refer to this blog post for the specific steps.
-
Determine whether the corresponding file exists in the HDFS directory
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileIfExist {
    public static void main(String[] args) {
        try {
            String fileName = "hdfs://localhost:9000/user/hadoop/input/testinput.txt";
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            // exists() checks the path in the HDFS namespace
            if (fs.exists(new Path(fileName))) {
                System.out.println("exists!");
            } else {
                System.out.println("not exists!");
            }
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
If the testinput.txt file exists, the program prints exists!; otherwise it prints not exists!
If the following situation occurs:
Solution:
Add hadoop-hdfs-client-3.2.1.jar from the hdfs directory to the project (the version number may differ).
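The same existence check can be sketched against the local filesystem using only the JDK, which makes the logic easy to try without a running cluster (the temporary file here is illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalFileIfExist {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("testinput", ".txt");
        // Mirrors fs.exists(new Path(fileName)) in the HDFS version
        System.out.println(Files.exists(p) ? "exists!" : "not exists!");
        Files.delete(p);
        System.out.println(Files.exists(p) ? "exists!" : "not exists!");
    }
}
```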
-
Read the contents of the testinput.txt file in the HDFS directory
import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class ReadHDFSFileContents {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        try {
            // Register the handler so java.net.URL understands the hdfs:// scheme
            URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory(conf));
            InputStream in = new URL("hdfs://localhost:9000/user/hadoop/input/testinput.txt").openStream();
            IOUtils.copyBytes(in, System.out, 4096, true); // true: close the stream when done
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output should be the contents of testinput.txt.
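The FsUrlStreamHandlerFactory call above is what teaches java.net.URL to understand the hdfs:// scheme. The built-in file:// scheme demonstrates the same openStream() pattern with no Hadoop dependency (the temporary file and its contents here are illustrative):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadLocalUrlContents {
    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("testinput", ".txt");
        Files.writeString(p, "hello world!");
        // file:// URLs are handled out of the box; hdfs:// needs FsUrlStreamHandlerFactory
        try (InputStream in = p.toUri().toURL().openStream()) {
            System.out.write(in.readAllBytes());
            System.out.flush();
        }
        Files.delete(p);
    }
}
```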
-
Read the BLOCK information of the corresponding file in HDFS
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadHDFSBlockContents {
    public static void main(String[] args) {
        String uri = "hdfs://localhost:9000/user/hadoop/input/testinput.txt";
        Configuration conf = new Configuration();
        try {
            FileSystem fs = FileSystem.get(new URI(uri), conf);
            Path path = new Path(uri);
            FileStatus fileStatus = fs.getFileStatus(path);
            // One BlockLocation per block of the file
            BlockLocation[] blkLocations = fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
            for (int i = 0; i < blkLocations.length; i++) {
                String[] hosts = blkLocations[i].getHosts(); // datanodes holding this block
                System.out.println("block_" + i + "_location: " + hosts[0]);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
-
Delete the corresponding file in the HDFS directory
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteHDFSFile {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        try {
            FileSystem fs = FileSystem.get(conf);
            // deleteOnExit only marks the path; it is actually removed when the
            // FileSystem is closed. Use fs.delete(path, false) to delete immediately.
            boolean deleteOnExit = fs.deleteOnExit(new Path("/user/hadoop/input/testinput.txt"));
            System.out.println(deleteOnExit);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The program prints true if the path was successfully marked for deletion, otherwise false.
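Note that FileSystem.deleteOnExit defers the actual removal until the FileSystem is closed; the JDK's java.io.File.deleteOnExit has the same deferred flavor, as this small sketch shows (the temporary file here is illustrative):

```java
import java.io.File;
import java.io.IOException;

public class DeleteOnExitDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("testinput", ".txt");
        f.deleteOnExit();                // only schedules the delete
        System.out.println(f.exists()); // still true: deletion is deferred to JVM exit
        // For an immediate delete, call f.delete()
        // (analogous to Hadoop's fs.delete(path, false))
    }
}
```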