Hadoop from Beginner to Master, Part 4: HDFS Overview and HDFS Shell Operations

Table of contents

1. Overview of HDFS

2. Advantages and disadvantages of HDFS

2.1 Advantages

2.2 Disadvantages

3. HDFS architecture

4. HDFS file block size

4.1 What is a block

4.2 Summary

5. HDFS shell operations

5.1 Basic syntax

5.2 Common commands


1. Overview of HDFS

HDFS (Hadoop Distributed File System) is a file system. It mainly solves the problem that a single operating system cannot store a very large amount of data: once the data is spread across many machines, a file system is needed to manage the files on all of them. HDFS is just one kind of distributed file management system.

Applicable scenarios: HDFS suits write-once, read-many workloads and does not support modifying files in place. It is a good fit for data analysis and a poor fit for network disk (cloud drive) applications.

2. Advantages and disadvantages of HDFS

2.1 Advantages

  1. High fault tolerance: data is automatically saved as multiple replicas, and when one replica is lost, a new one is automatically created on another machine
  2. Suitable for processing big data: GB, TB, and larger
  3. Can be built on cheap machines

2.2 Disadvantages

  1. Not suitable for low-latency data access: put plainly, it cannot store or fetch data quickly
  2. Inefficient for small files: every file and block takes up NameNode memory, so a large number of small files is handled poorly
  3. Concurrent writing is not supported: a file can have only one writer at a time, and multiple threads cannot write to it simultaneously
  4. Only appending data is supported; existing data cannot be modified
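The small-file weakness in item 2 can be made concrete with a rough estimate. The sketch below assumes the commonly quoted rule of thumb of about 150 bytes of NameNode memory per file and per block; that figure and the function name are illustrative assumptions, not measurements from this article.

```python
# Rule-of-thumb assumption (not from the article): each file entry and each
# block costs the NameNode roughly 150 bytes of memory.
BYTES_PER_OBJECT = 150
MB = 1024 ** 2
TB = 1024 ** 4


def namenode_bytes(total_bytes: int, file_size: int, block_size: int = 128 * MB) -> int:
    """Estimate NameNode memory needed to track total_bytes stored as files of file_size."""
    num_files = -(-total_bytes // file_size)               # ceiling division
    blocks_per_file = max(1, -(-file_size // block_size))  # at least one block per file
    objects = num_files * (1 + blocks_per_file)            # one file entry plus its blocks
    return objects * BYTES_PER_OBJECT


small = namenode_bytes(TB, 1 * MB)    # 1 TB stored as 1 MB files
large = namenode_bytes(TB, 128 * MB)  # 1 TB stored as 128 MB files
print(f"1 MB files:   {small / MB:.1f} MB of NameNode memory")
print(f"128 MB files: {large / MB:.1f} MB of NameNode memory")
print(f"ratio: {small // large}x")
```

Under these assumptions the same terabyte costs the NameNode far more memory when it is stored as many small files, even though the DataNodes hold the same amount of data.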

3. HDFS architecture

4. HDFS file block size

4.1 What is a block

HDFS files are physically stored in blocks, and the default block size is listed in hdfs-default.xml (the dfs.blocksize property in Hadoop 2.x and later).

So the block size is configurable; why, then, is the default 128 MB? Few people think about this, so let me explain it carefully. HDFS exists to store and read data, so the block size should be chosen to make reads as efficient as possible. Assume the addressing time, that is, the time needed to locate a block, is 10 ms. A common rule of thumb is that the system performs best when the addressing time is 1% of the transfer time, which gives a transfer time of 10 ms / 0.01 = 1000 ms = 1 s. Most disks on the market now transfer at about 100 MB/s, so the block size works out to 1 s × 100 MB/s = 100 MB. The nearest power of two is 128 MB, so the block size is set to 128 MB.
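The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and rounding to the nearest power of two is an assumption about how 100 MB becomes 128 MB.

```python
def ideal_block_size_mb(seek_ms: float, seek_ratio: float, disk_mb_per_s: float) -> float:
    """Block size (MB) at which seek time is seek_ratio of the transfer time."""
    transfer_s = seek_ms / seek_ratio / 1000  # 10 ms / 0.01 = 1000 ms = 1 s
    return transfer_s * disk_mb_per_s         # 1 s * 100 MB/s = 100 MB


def nearest_power_of_two(x: float) -> int:
    """Round x to the closest power of two (assumed rounding rule)."""
    lower = 1
    while lower * 2 <= x:
        lower *= 2
    upper = lower * 2
    return lower if x - lower < upper - x else upper


raw = ideal_block_size_mb(seek_ms=10, seek_ratio=0.01, disk_mb_per_s=100)
print(f"raw estimate: {raw:.0f} MB, rounded: {nearest_power_of_two(raw)} MB")
```

Changing disk_mb_per_s shows why a faster disk argues for a larger block: at 200 MB/s the same reasoning rounds to 256 MB.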

4.2 Summary

The block size often comes up as an interview question. To summarize: the block size is chosen mainly according to the disk transfer rate.
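To connect the block size back to actual files: a file is cut into fixed-size blocks, and the last block holds only whatever data is left over. A small sketch of that split, with a hypothetical helper name and sizes in MB:

```python
from typing import List


def split_into_blocks(file_mb: int, block_mb: int = 128) -> List[int]:
    """Return the size of each block (MB) for a file of file_mb."""
    full, rest = divmod(file_mb, block_mb)
    # Full blocks first, then one shorter block for the remainder, if any.
    return [block_mb] * full + ([rest] if rest else [])


blocks = split_into_blocks(300)
print(len(blocks), "blocks:", blocks)
```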

5. HDFS shell operations

5.1 Basic syntax

HDFS shell operations are the way we work with the distributed file system on our cluster. There are only two basic forms, and either one can be used:

  • bin/hadoop fs <specific command>
  • bin/hdfs dfs <specific command>
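Because the two forms differ only in their prefix, a script can build either one from the same pieces. The helper below is a hypothetical sketch (hdfs_command is not part of Hadoop); it only assembles the argument list and does not run anything.

```python
import shlex
from typing import List


def hdfs_command(action: str, *args: str, use_hdfs_dfs: bool = False) -> List[str]:
    """Build the argv for an HDFS shell command in either of the two forms."""
    prefix = ["bin/hdfs", "dfs"] if use_hdfs_dfs else ["bin/hadoop", "fs"]
    return prefix + ["-" + action, *args]


# Both lines describe the same operation on the file system.
print(shlex.join(hdfs_command("ls", "/")))
print(shlex.join(hdfs_command("ls", "/", use_hdfs_dfs=True)))
```

On a machine with Hadoop installed, the returned list could be passed to subprocess.run from the Hadoop installation directory, since the bin/ paths are relative.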

5.2 Common commands

  • Start the cluster (these are not HDFS shell commands): sbin/start-dfs.sh and sbin/start-yarn.sh
  • View the file system: bin/hadoop fs -ls / or bin/hdfs dfs -ls /

You can verify that the result is correct in the web UI. The later commands will not be verified with screenshots, since that is too much trouble and slows down reading.

  • Create a folder on HDFS: bin/hadoop fs -mkdir -p /wangleijia/wanglei or bin/hdfs dfs -mkdir -p /wangleijia1/wanglei1

  • Cut and paste from the local system (Linux) to HDFS: bin/hadoop fs -moveFromLocal ./wanglei.txt /user/wanglei or bin/hdfs dfs -moveFromLocal ./wanglei.txt /user/wanglei
  • Append a local file to the end of an existing HDFS file: bin/hadoop fs -appendToFile wanglei.txt /user/wanglei/wanglei.txt or bin/hdfs dfs -appendToFile wanglei.txt /user/wanglei/wanglei.txt
  • Display the contents of a file on HDFS: bin/hadoop fs -cat /wangleijia/wanglei.txt or bin/hdfs dfs -cat /wangleijia/wanglei.txt
  • Change a file's group, its read/write/execute permissions, and its owner: bin/hadoop fs -chgrp -R newgroup /wangleijia or bin/hdfs dfs -chgrp -R newgroup /wangleijia; bin/hadoop fs -chmod 777 /wangleijia/wanglei.txt or bin/hdfs dfs -chmod 777 /wangleijia/wanglei.txt; bin/hadoop fs -chown wanglei:wanglei /wangleijia/wanglei.txt or bin/hdfs dfs -chown wanglei:wanglei /wangleijia/wanglei.txt
  • Copy from the local operating system to HDFS: bin/hadoop fs -copyFromLocal ./haidai.txt /wangleijia or bin/hdfs dfs -copyFromLocal ./haidai.txt /wangleijia
  • Download from HDFS to local: bin/hadoop fs -copyToLocal /wangleijia/wanglei.txt ./ or bin/hdfs dfs -copyToLocal /wangleijia/wanglei.txt ./
  • Copy from one HDFS path to another: bin/hadoop fs -cp /wangleijia/NOTICE.txt /wangleijia1/ or bin/hdfs dfs -cp /wangleijia/NOTICE.txt /wangleijia1/
  • Move within HDFS: bin/hadoop fs -mv /wangleijia/NOTICE.txt /wangleijia1/ or bin/hdfs dfs -mv /wangleijia/NOTICE.txt /wangleijia1/
  • Download from HDFS to local (equivalent to copyToLocal): bin/hadoop fs -get /wangleijia/wanglei.txt ./ or bin/hdfs dfs -get /wangleijia/wanglei.txt ./
  • Upload from local to HDFS (equivalent to copyFromLocal): bin/hadoop fs -put ./haidai.txt /wangleijia or bin/hdfs dfs -put ./haidai.txt /wangleijia
  • Delete: bin/hadoop fs -rm -R /wangleijia or bin/hdfs dfs -rm -R /wangleijia1
  • Set the number of replicas: bin/hadoop fs -setrep 5 /wangleijia or bin/hdfs dfs -setrep 5 /wangleijia

The replica count set here is only recorded in the NameNode's metadata. Whether that many replicas really exist depends on the number of DataNodes. Because there are currently only 3 machines, there can be at most 3 replicas; a replication factor of 10, for example, would only be reached once the cluster has grown to 10 nodes.
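The relationship between the requested and the actual replica count can be written as a one-line rule. This is a simplified sketch with a hypothetical function name; it assumes every DataNode is healthy and has free space.

```python
def effective_replicas(requested: int, datanodes: int) -> int:
    """Replicas that can actually exist: at most one per DataNode for each block."""
    return min(requested, datanodes)


# -setrep 5 on the 3-node cluster used in this article
print(effective_replicas(requested=5, datanodes=3))
# the same setting once the cluster has grown to 10 nodes
print(effective_replicas(requested=5, datanodes=10))
```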

These are the commonly used HDFS shell operations, and anyone familiar with Linux will find them easy to understand. Note that some options are case-sensitive: -R, for example, must be capitalized for commands such as -chgrp, otherwise the command fails. That is not a big problem, because the error message is obvious.

Origin blog.csdn.net/Haidaiya/article/details/84932445