One, Introduction to HDFS and its advantages and disadvantages
HDFS (Hadoop Distributed File System) is the storage component of the Hadoop ecosystem and one of its most fundamental parts: it is responsible for data storage, and computation models such as MapReduce rely on data stored in HDFS. HDFS is a distributed file system designed to store very large files with a streaming data-access pattern; it splits data into blocks and distributes them across the machines of a cluster built from commodity hardware.
Here we focus on several related concepts:
(1) Large files. A Hadoop cluster today can store hundreds of TB or even PB of data.
(2) Streaming data access. HDFS follows a write-once, read-many access pattern, and it optimizes for the total time to read an entire dataset rather than the latency of any single read.
(3) Commodity hardware. An HDFS cluster does not require expensive or special-purpose hardware; ordinary everyday machines are enough. As a consequence, the probability of a node failing is quite high, so HDFS must have mechanisms to handle node failures and guarantee data reliability (chiefly block replication).
(4) Low-latency data access is not supported. HDFS is designed for high data throughput and is not suitable for applications that require low-latency access to data.
(5) Single writer, append-only. HDFS data is read-mostly; only a single writer is supported at a time, writes are always appended at the end of a file, and modification at arbitrary positions is not supported.
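To make the block-storage idea above concrete, here is a minimal sketch of how a large file maps onto blocks. The 128 MB default block size is the Hadoop 2.x default (configurable via dfs.blocksize); the file size used below is just an illustration.

```java
public class BlockCountSketch {
    // Default HDFS block size in Hadoop 2.x (128 MB); configurable via dfs.blocksize.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks a file of the given size occupies (ceiling division).
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        // A 1 GiB file is split into 8 blocks of 128 MB,
        // and each block is then replicated across DataNodes.
        System.out.println(blockCount(oneGiB)); // prints 8
    }
}
```

Each of these blocks is stored on a different DataNode (with replicas), which is what lets HDFS spread one huge file over many ordinary machines.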
Two, HDFS operations in Shell
Basic syntax: bin/hadoop fs <specific command> or bin/hdfs dfs <specific command> (the two are interchangeable when operating on HDFS)
The following are a few commonly used commands
(0) Start the Hadoop cluster (to make the following tests easier)
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
(1) -help: print usage information for a command
$ hadoop fs -help rm
(2) -ls: list directory contents
$ hadoop fs -ls /
(3) -mkdir: create a directory on HDFS
$ hadoop fs -mkdir -p /sanguo/shuguo
(4) -moveFromLocal: move (cut and paste) a file from local to HDFS
$ touch kongming.txt
$ hadoop fs -moveFromLocal ./kongming.txt /sanguo/shuguo
(5) -appendToFile: append a local file to the end of a file that already exists on HDFS
$ touch liubei.txt
$ vi liubei.txt
Type the following:
san gu mao lu
then save and quit.
$ hadoop fs -appendToFile liubei.txt /sanguo/shuguo/kongming.txt
(6) -cat: display file contents
$ hadoop fs -cat /sanguo/shuguo/kongming.txt
(7) -chgrp, -chmod, -chown: same usage as in the Linux file system; modify the group, permissions, or owner of a file
$ hadoop fs -chmod 666 /sanguo/shuguo/kongming.txt
$ hadoop fs -chown atguigu:atguigu /sanguo/shuguo/kongming.txt
(8) -copyFromLocal: copy a file from the local file system to an HDFS path
$ hadoop fs -copyFromLocal README.txt /
(9) -copyToLocal: copy a file from HDFS to the local file system
$ hadoop fs -copyToLocal /sanguo/shuguo/kongming.txt ./
(10) -cp: copy from one HDFS path to another HDFS path
$ hadoop fs -cp /sanguo/shuguo/kongming.txt /zhuge.txt
(11) -mv: move files between directories within HDFS
$ hadoop fs -mv /zhuge.txt /sanguo/shuguo/
(12) -get: equivalent to copyToLocal; download a file from HDFS to the local machine
$ hadoop fs -get /sanguo/shuguo/kongming.txt ./
(13) -getmerge: merge and download multiple files; for example, the HDFS directory /user/atguigu/test contains multiple files: log.1, log.2, log.3, ...
$ hadoop fs -getmerge /user/atguigu/test/* ./zaiyiqi.txt
(14) -put: equivalent to copyFromLocal
$ hadoop fs -put ./zaiyiqi.txt /user/atguigu/test/
(15) -tail: display the end of a file
$ hadoop fs -tail /sanguo/shuguo/kongming.txt
(16) -rm: delete a file or folder
$ hadoop fs -rm /user/atguigu/test/jinlian2.txt
(17) -rmdir: delete an empty directory
$ hadoop fs -mkdir /test
$ hadoop fs -rmdir /test
(18) -du: show size statistics for files and directories
$ hadoop fs -du -s -h /user/atguigu/test
2.7 K /user/atguigu/test
$ hadoop fs -du -h /user/atguigu/test
1.3 K /user/atguigu/test/README.txt
15 /user/atguigu/test/jinlian.txt
1.4 K /user/atguigu/test/zaiyiqi.txt
(19) -setrep: set the replication factor of a file in HDFS
$ hadoop fs -setrep 10 /sanguo/shuguo/kongming.txt
The replication factor set here is only recorded in the NameNode's metadata; whether that many replicas actually exist depends on the number of DataNodes. Since there are currently only 3 machines, at most 3 replicas can exist; only when the number of nodes grows to 10 can the replication factor actually reach 10.
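The cap described above can be expressed in one line: the number of replicas that can actually be placed is the smaller of the requested factor and the number of live DataNodes. This is a simplification for illustration; HDFS keeps the requested factor in metadata and creates the missing replicas later as nodes join.

```java
public class ReplicationSketch {
    // Replicas that can actually be placed right now: each DataNode holds
    // at most one copy of a given block, so the live-node count is a hard cap.
    static int placeableReplicas(int requestedFactor, int liveDataNodes) {
        return Math.min(requestedFactor, liveDataNodes);
    }

    public static void main(String[] args) {
        // setrep 10 on a 3-node cluster: only 3 replicas exist for now.
        System.out.println(placeableReplicas(10, 3)); // prints 3
    }
}
```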
Three, HDFS client operations
Before doing this, the Hadoop environment variables must be configured on the local machine.
Create an empty Maven project in IDEA and add the following coordinates:
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>
Importing these coordinates takes a while; please be patient.
Create a new file named "log4j.properties" under the project's src/main/resources directory, and fill it with:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
Create a directory:
@Test
public void testMkdirs() throws IOException, InterruptedException, URISyntaxException {
    // 1. Get the file system
    Configuration configuration = new Configuration();
    // To run on the cluster instead:
    // configuration.set("fs.defaultFS", "hdfs://hadoop102:9000");
    // FileSystem fs = FileSystem.get(configuration);
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "root");

    // 2. Create the directory
    fs.mkdirs(new Path("/1108/daxian/banzhang"));

    // 3. Close resources
    fs.close();
}
Upload a file from the local machine:
@Test
public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
    // 1. Get the file system
    Configuration configuration = new Configuration();
    configuration.set("dfs.replication", "2");
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "root");

    // 2. Upload the file
    fs.copyFromLocalFile(new Path("D:/banzhang.txt"), new Path("/banzhang.txt"));

    // 3. Close resources
    fs.close();
    System.out.println("over");
}
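For symmetry with the upload, a download can be sketched the same way with FileSystem.copyToLocalFile. This is a sketch under the same assumptions as the examples above (host hadoop102, port 9000, user root; the paths are illustrative) and it needs a running cluster to execute.

```java
@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {
    // 1. Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "root");

    // 2. Download the file (delSrc=false keeps the HDFS copy;
    //    useRawLocalFileSystem=true skips writing a local .crc checksum file)
    fs.copyToLocalFile(false, new Path("/banzhang.txt"), new Path("D:/banzhang2.txt"), true);

    // 3. Close resources
    fs.close();
}
```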