An in-depth look at the HDFS architecture of the big data framework Hadoop

The Hadoop Distributed File System (HDFS) is a distributed file system. It has much in common with existing distributed file systems, but the differences from other distributed file systems are what deserve our attention:

 

HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. (High fault tolerance)

HDFS provides high-throughput access to data and suits applications with large data sets. (High throughput)

HDFS relaxes some POSIX requirements to enable streaming access to file system data. (Streaming access)

HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now part of the Apache Hadoop Core project.

 

Objectives and assumptions


Hardware failure detection: Hardware failure is the norm rather than the exception. HDFS is usually deployed on low-cost hardware, and an instance may consist of hundreds of servers, each storing part of the file system's data. Because there are so many components and each has a non-trivial probability of failure, some component of HDFS is effectively always non-functional. Quick detection of faults and automatic recovery from them is therefore a core architectural goal of HDFS.

 

Streaming data access: Applications that run on HDFS need streaming access to their data sets. HDFS is designed more for batch processing than for interactive use, and the emphasis is on high throughput of data access rather than low latency of data access.

 

Large data sets: Applications that run on HDFS have large data sets; a typical HDFS file is gigabytes to terabytes in size. HDFS is therefore tuned to support large files. It should provide high aggregate data bandwidth, scale to hundreds of nodes in a single cluster, and support tens of millions of files in a single instance.

 

Simple consistency model: HDFS applications need a write-once-read-many access model for files. Once a file is created, written, and closed, it does not need to change except by appends and truncates. Appending content to the end of a file is supported, but a file cannot be updated at an arbitrary offset. This assumption simplifies data consistency issues and makes high-throughput data access possible. A MapReduce application or a web crawler application fits this model perfectly.

 

Moving computation is cheaper than moving data: A computation requested by an application is much more efficient if it executes near the data it operates on, especially when the data set is huge. This minimizes network congestion and increases the overall throughput of the system. It is therefore often better to migrate the computation closer to where the data is located than to move the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data resides.

 

Portability across platforms: Hadoop is written in Java, which gives it good cross-platform portability.

 

NameNode and DataNodes

 

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode and a number of DataNodes. The NameNode is the master server: it manages the file system namespace and regulates access to files by clients (the NameNode acts as the housekeeper of the Hadoop cluster). In addition, there is usually one DataNode per node in the cluster, and each DataNode manages the storage attached to the node it runs on.

 

HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks (Block), and these blocks are stored on a set of DataNodes. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes (which DataNode each block is placed on). The DataNodes serve read and write requests from the file system's clients, and they also create, delete, and replicate blocks upon instruction from the NameNode.
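To make these namespace operations concrete, here is a minimal client-side sketch using the Hadoop FileSystem Java API: creating, renaming, and deleting a directory are all namespace operations served by the NameNode, and no block data is written. The NameNode address and the paths are placeholders for illustration only, not values from this article.

// Minimal sketch of client-side namespace operations; each call is served by the
// NameNode's namespace, while block data (none is written here) would go to DataNodes.
// The NameNode URI and paths below are hypothetical placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOpsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");        // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path dir = new Path("/user/hadoop/demo");
            fs.mkdirs(dir);                                           // create a directory
            fs.rename(dir, new Path("/user/hadoop/demo-renamed"));    // rename it
            fs.delete(new Path("/user/hadoop/demo-renamed"), true);   // recursive delete
        }
    }
}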

 

[Figure: HDFS architecture diagram]

 

Having a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and manager of all HDFS metadata, and the system is designed so that user data never flows through the NameNode.

 

The file system namespace

 

HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside them. The file system namespace hierarchy is similar to most existing file systems: users can create, delete, move, or rename files. Currently, HDFS does not support user disk quotas or access control, nor does it support hard links or soft links. However, the HDFS architecture does not preclude implementing these features.

 

The NameNode is responsible for maintaining the file system namespace; any change to the namespace or its properties is recorded by the NameNode. An application can specify the number of copies of a file that HDFS should keep. The number of copies of a file is called the replication factor of that file, and this information is also stored by the NameNode.

 

Data Replication

 

HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; except for the last one, all blocks of a file are the same size. The blocks of a file are replicated for fault tolerance. The block size and the replication factor are configurable per file: an application can specify the number of replicas of a file, and the replication factor can be set at file creation time and changed later. Files in HDFS are write-once, and there is strictly one writer at any time.
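Because block size and replication factor are per-file settings, a client can pass them explicitly when creating a file, or change the replication factor afterwards. Here is a hedged sketch using the FileSystem API; the path, the 2-replica factor, and the 256 MB block size are illustrative values only, not recommendations from this article.

// Sketch: specifying the replication factor and block size when a file is created,
// then changing the replication factor later. Values and paths are illustrative only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/hadoop/data/events.log");        // placeholder path

        // Create the file with 2 replicas and a 256 MB block size (per-file settings).
        try (FSDataOutputStream out = fs.create(
                file, true, 4096, (short) 2, 256L * 1024 * 1024)) {
            out.writeBytes("hello hdfs\n");
        }

        // The replication factor can be changed after the file has been written.
        fs.setReplication(file, (short) 3);
        fs.close();
    }
}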

 

The NameNode makes all decisions regarding the replication of blocks. It periodically receives a heartbeat (Heartbeat) and a block status report (Blockreport) from each DataNode in the cluster.

 

Receipt of a heartbeat means that the DataNode is working properly.

 

A block status report contains a list of all the data blocks on that DataNode.

 

[Figure: DataNodes and block replication in HDFS]

 

Replica placement: the first steps

 

The placement of replicas is critical to the reliability and performance of HDFS. An optimized replica placement policy is an important characteristic that distinguishes HDFS from most other distributed file systems. This feature needs a lot of tuning and accumulated experience. HDFS uses a policy called rack awareness (rack-aware) to improve data reliability, availability, and network bandwidth utilization. The current implementation of the replica placement policy is only a first step in this direction. The short-term goal of this policy is to validate it in production environments, observe its behavior, and build a foundation for testing and researching more sophisticated policies.

 

Large HDFS instances typically run on a cluster of computers spread across many racks, and communication between two machines on different racks has to go through switches. In most cases, the bandwidth between two machines in the same rack is greater than the bandwidth between two machines in different racks.

 

Through a rack-awareness process, the NameNode can determine the rack id that each DataNode belongs to. A simple but non-optimal policy is to place replicas on distinct racks. This effectively prevents data loss when an entire rack fails and allows the bandwidth of multiple racks to be used when reading data. This policy distributes replicas evenly in the cluster, which helps balance the load in the event of component failure. However, because a write operation must transfer a block to multiple racks, this policy increases the cost of writes.

 

In most cases the replication factor is 3, and HDFS's placement policy is to put one replica on the local node, another replica on a different node in the same rack, and the last replica on a node in a different rack. This policy reduces inter-rack data transfer, which improves write efficiency. Rack failure is far less likely than node failure, so the policy does not compromise data reliability and availability. Meanwhile, because a block is placed on only two (not three) distinct racks, it reduces the total network bandwidth used when reading data. With this policy, the replicas are not evenly distributed across the racks: one third of the replicas are on one node, two thirds are on one rack, and the remaining third are distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.
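The following toy sketch only illustrates the placement rule described above (local node, a second node in the same rack, a third node on a different rack). It is plain Java with made-up node names and is not Hadoop's actual BlockPlacementPolicyDefault.

// Toy illustration of the 3-replica placement rule described above; NOT Hadoop's
// real placement code. Racks and node names are made up for the example.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PlacementSketch {
    /** Pick targets: writer's node, another node on the same rack, a node on another rack. */
    static List<String> chooseTargets(String writerNode, String writerRack,
                                      Map<String, List<String>> nodesByRack) {
        List<String> targets = new ArrayList<>();
        targets.add(writerNode);                                        // replica 1: local node
        for (String node : nodesByRack.get(writerRack)) {
            if (!node.equals(writerNode)) { targets.add(node); break; } // replica 2: same rack
        }
        for (Map.Entry<String, List<String>> rack : nodesByRack.entrySet()) {
            if (!rack.getKey().equals(writerRack)) {                    // replica 3: remote rack
                targets.add(rack.getValue().get(0));
                break;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cluster = Map.of(
                "/rack1", List.of("dn1", "dn2", "dn3"),
                "/rack2", List.of("dn4", "dn5"));
        System.out.println(chooseTargets("dn1", "/rack1", cluster));    // [dn1, dn2, dn4]
    }
}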

 

Replica selection

 

To reduce overall bandwidth consumption and read latency, HDFS tries to have the reader read the nearest replica. If a replica exists on the same rack as the reader, that replica is read. If an HDFS cluster spans multiple data centers, the client first reads a replica in the local data center. (The principle of proximity.)

 

Safe Mode

 

On startup, the NameNode enters a special state called safe mode. While the NameNode is in safe mode, it does not replicate data blocks. The NameNode receives heartbeats and block status reports from all the DataNodes; a block status report lists all the data blocks on a DataNode. Each data block has a specified minimum number of replicas. When the NameNode has confirmed that a block has reached this minimum, the block is considered safely replicated. After the NameNode has confirmed that a certain percentage of blocks (this parameter is configurable) are safely replicated, plus an additional 30 seconds of waiting time, the NameNode exits safe mode. It then determines which data blocks, if any, still have fewer than the specified number of replicas and copies those blocks to other DataNodes.
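The "configurable percentage", the "additional 30 seconds", and the minimum replica count correspond to NameNode configuration properties. The sketch below only illustrates the property names as they appear in hdfs-default.xml of recent Hadoop releases; this is an assumption to verify against your version, and in practice these are set in hdfs-site.xml rather than in client code.

// Illustrative only: safe-mode related properties as found in hdfs-default.xml of
// recent Hadoop releases (normally set in hdfs-site.xml, not in client code).
// Property names are an assumption here -- verify them against your Hadoop version.
import org.apache.hadoop.conf.Configuration;

public class SafeModeSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Fraction of blocks that must meet the minimum replication requirement
        // before the NameNode leaves safe mode (default 0.999f).
        conf.setFloat("dfs.namenode.safemode.threshold-pct", 0.999f);
        // Extra time the NameNode stays in safe mode after the threshold is reached,
        // in milliseconds (default 30000 -- the "additional 30 seconds" above).
        conf.setInt("dfs.namenode.safemode.extension", 30000);
        // Minimum number of replicas a block needs to be considered safely replicated.
        conf.setInt("dfs.namenode.replication.min", 1);
        System.out.println("threshold-pct = "
                + conf.getFloat("dfs.namenode.safemode.threshold-pct", 0.999f));
    }
}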

 

Persistent file system metadata

 

The HDFS namespace is stored by the NameNode. Any operation that modifies file system metadata is recorded by the NameNode in a transaction log called the EditLog. For example, creating a new file in HDFS causes the NameNode to insert a record into the EditLog; likewise, changing the replication factor of a file inserts another record into the EditLog. The NameNode stores this EditLog in a file in its local operating system's file system. The entire file system namespace, including the mapping of blocks to files and the attributes of files, is stored in a file called the FsImage, which also resides in the NameNode's local file system.

 

The NameNode keeps an image of the entire file system namespace and the block map (Blockmap) in memory. This critical metadata structure is designed to be compact, so a NameNode with 4 GB of memory is enough to support a huge number of files and directories. When the NameNode starts up, it reads the EditLog and FsImage from disk, applies all the transactions from the EditLog to the in-memory FsImage, saves this new version of the FsImage from memory to local disk, and then deletes the old EditLog, because its transactions have already been applied to the FsImage. This process is called a checkpoint. In the current implementation, a checkpoint only occurs when the NameNode starts up; support for periodic checkpoints will be added in the near future.

 

A DataNode stores HDFS data as files in its local file system; it has no knowledge of HDFS files. It stores each HDFS data block in a separate file in the local file system. The DataNode does not create all files in the same directory; instead, it uses a heuristic to determine the optimal number of files per directory and creates subdirectories at appropriate times. Creating all local files in the same directory is not the best choice, because the local file system may not efficiently support a huge number of files in a single directory. When a DataNode starts up, it scans its local file system, generates a list of all HDFS data blocks corresponding to these local files, and sends the list to the NameNode: this is the block status report.

 

Protocol

 

All HDFS communication protocols are built on top of the TCP/IP protocol. A client connects to a configurable TCP port on the NameNode and interacts with the NameNode using the ClientProtocol. The DataNodes interact with the NameNode using the DatanodeProtocol. A Remote Procedure Call (RPC) abstraction wraps both the ClientProtocol and the DatanodeProtocol. By design, the NameNode never initiates RPCs; it only responds to RPC requests from clients or DataNodes.

 

Robustness

 

The main objective of HDFS is to ensure reliable data storage even in the presence of failures. Three types of failures are common: NameNode failures, DataNode failures, and network partitions.

 

Disk data errors, heartbeat detection, and re-replication

 

Each DataNode periodically sends a heartbeat signal to the NameNode. A network partition can cause some DataNodes to lose contact with the NameNode. The NameNode detects this condition through the absence of heartbeats, marks the DataNodes without recent heartbeats as dead, and stops sending new IO requests to them. Any data stored on a dead DataNode is no longer available. The death of a DataNode may cause the replication factor of some blocks to fall below their specified value; the NameNode continuously tracks which blocks need to be replicated and starts replication whenever necessary. Re-replication may be needed in the following cases: a DataNode fails, a replica becomes corrupted, a disk on a DataNode fails, or the replication factor of a file is increased.

 

Cluster rebalancing

 

The HDFS architecture supports data rebalancing policies. If the free space on a DataNode falls below a certain threshold, a rebalancing scheme could automatically move data from that DataNode to other, less full DataNodes. When demand for a particular file suddenly increases, a scheme could also create additional replicas of the file and rebalance other data in the cluster. These rebalancing policies have not yet been implemented.

 

Data integrity

 

It is possible that a data block fetched from a DataNode arrives corrupted. Corruption can be caused by faults in the DataNode's storage device, network errors, or buggy software. The HDFS client software implements checksum (checksum) verification of the contents of HDFS files. When a client creates an HDFS file, it computes a checksum for each block of the file and stores the checksums in a separate hidden file in the same HDFS namespace. When the client retrieves the file's contents, it verifies that the data received from each DataNode matches the checksum stored in the corresponding checksum file; if they do not match, the client can choose to obtain a replica of that block from another DataNode.
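To make the checksum idea concrete, here is a minimal conceptual sketch of per-chunk checksumming. It is not the HDFS implementation (HDFS uses CRC32C over fixed-size chunks, 512 bytes by default); it only shows the principle of checksumming on write and re-verifying on read. The chunk size and block contents below are assumptions for illustration.

// Conceptual sketch of per-chunk checksumming, in the spirit of what the HDFS client
// does. This is NOT the HDFS implementation, only an illustration of the idea:
// checksum each chunk when a block is written, re-verify when it is read back.
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class ChunkChecksumSketch {
    static final int BYTES_PER_CHECKSUM = 512;    // assumed default chunk size

    static List<Long> checksum(byte[] data) {
        List<Long> sums = new ArrayList<>();
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums.add(crc.getValue());
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] block = new byte[2000];            // pretend this is one HDFS block
        List<Long> written = checksum(block);     // computed when the block is written
        List<Long> read = checksum(block);        // recomputed when the block is read back
        boolean ok = written.equals(read);        // a mismatch would trigger a read from another replica
        System.out.println("block verified: " + ok);
    }
}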

 

Metadata disk error

 

The FsImage and EditLog are core data structures of HDFS. If these files are damaged, the entire HDFS instance becomes non-functional. For this reason, the NameNode can be configured to maintain multiple copies of the FsImage and EditLog. Any modification to the FsImage or EditLog is synchronized to every copy. This synchronization may reduce the number of namespace transactions per second that the NameNode can process. However, the price is acceptable, because even though HDFS applications are data-intensive, they are not metadata-intensive. When the NameNode restarts, it selects the most recent complete FsImage and EditLog to use.
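As a hedged illustration of the redundant-copies point: the dfs.namenode.name.dir property can list several directories, and the NameNode keeps a full copy of its metadata in each one. It is normally set in hdfs-site.xml rather than in code, and the directory paths below are placeholders.

// Illustration only: dfs.namenode.name.dir may list multiple directories, and the
// NameNode keeps a redundant copy of the FsImage/EditLog in each one. This property
// is normally set in hdfs-site.xml; the paths here are placeholders.
import org.apache.hadoop.conf.Configuration;

public class NameDirSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("dfs.namenode.name.dir",
                 "/disk1/dfs/name,/disk2/dfs/name,/mnt/nfs/dfs/name");
        System.out.println(conf.get("dfs.namenode.name.dir"));
    }
}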

 

Another option for increasing resilience is to enable high availability, using multiple NameNodes with either shared storage on NFS or a distributed edit log (referred to as the Journal). The latter is the recommended approach.

 

Snapshot

 

Snapshots support storing a copy of the data as of a particular point in time. One use of the snapshot feature is to roll back a corrupted HDFS instance to a previously known good point in time.

 

Data Organization

 

Data blocks

 

HDFS is designed to support large files; the applications suited to HDFS are those that need to process large data sets. These applications write their data only once but read it one or more times, and they require reads to proceed at streaming speed. HDFS supports "write once, read many" semantics on files. A typical block size is 128 MB. An HDFS file is therefore chopped into 128 MB blocks, and, where possible, each block resides on a different DataNode.
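To see how a file is split into blocks and where the replicas live, a client can ask for the file's block locations through the FileSystem API. A minimal sketch follows; the path is a placeholder, and the block size and host names depend entirely on your cluster.

// Sketch: list the blocks of an HDFS file and the DataNodes holding each replica.
// The path is a placeholder; block sizes and host names depend on the cluster.
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLayoutSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/user/hadoop/big-input.dat"));
        System.out.println("block size: " + status.getBlockSize()
                + ", replication: " + status.getReplication());

        // One BlockLocation per block of the file, with the hosts storing its replicas.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset " + b.getOffset() + ", length " + b.getLength()
                    + ", hosts " + Arrays.toString(b.getHosts()));
        }
        fs.close();
    }
}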

 

Replication pipelining

 

When a client writes data to an HDFS file, it first writes to a local temporary file. Suppose the file's replication factor is 3. When the local temporary file accumulates a full block of data, the client obtains a list of DataNodes from the NameNode that will hold the replicas of that block. The client then starts transferring the data to the first DataNode in small portions (4 KB); the first DataNode writes each portion to its local repository and simultaneously forwards that portion to the second DataNode in the list. The second DataNode likewise receives the data portion by portion, writes it to its local repository, and passes it on to the third DataNode. Finally, the third DataNode receives the data and stores it locally. A DataNode can thus receive data from the previous node in the pipeline while forwarding it to the next node at the same time: the data is replicated from one DataNode to the next in a pipelined fashion.
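The following toy sketch illustrates the pipelining idea only: each stage writes a 4 KB portion locally and forwards it to the next stage. It is a conceptual illustration, not HDFS's actual DataNode write pipeline (the real pipeline overlaps receiving and forwarding, handles acknowledgements, and so on); node names and sizes are made up.

// Toy illustration of pipelined replication: each stage writes a 4 KB portion to its
// "local repository" and forwards it to the next stage. Conceptual sketch only, not
// HDFS's actual DataNode write pipeline.
import java.io.ByteArrayOutputStream;

public class PipelineSketch {
    static class Stage {
        final String name;
        final ByteArrayOutputStream local = new ByteArrayOutputStream(); // local repository
        final Stage next;                                                // next DataNode, or null
        Stage(String name, Stage next) { this.name = name; this.next = next; }
        void receive(byte[] portion) {
            local.write(portion, 0, portion.length);   // write the portion locally
            if (next != null) next.receive(portion);   // and pass it down the pipeline
        }
    }

    public static void main(String[] args) {
        Stage dn3 = new Stage("dn3", null);
        Stage dn2 = new Stage("dn2", dn3);
        Stage dn1 = new Stage("dn1", dn2);             // the client sends to the first DataNode only

        byte[] block = new byte[128 * 1024];           // pretend this is one block of data
        for (int off = 0; off < block.length; off += 4096) {
            byte[] portion = new byte[Math.min(4096, block.length - off)];
            dn1.receive(portion);                      // 4 KB portions flow through the pipeline
        }
        System.out.println("dn3 stored " + dn3.local.size() + " bytes");
    }
}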

 

Accessibility

 

HDFS can be accessed by applications in a number of different ways. Natively, HDFS provides a FileSystem Java API for applications. A C language wrapper for the Java API and a REST API are also available. In addition, an HTTP browser can be used to browse the files of an HDFS instance. Using the NFS gateway, HDFS can be mounted as part of a client's local file system.
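As a small sketch of the FileSystem Java API mentioned above, the following opens an HDFS file and copies it to standard output. The NameNode URI and the file path are placeholders.

// Minimal sketch of the native FileSystem Java API: open an HDFS file and copy it
// to standard output. The NameNode URI and the file path are placeholders.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode-host:8020"), new Configuration());
        try (FSDataInputStream in = fs.open(new Path("/user/hadoop/demo.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);   // stream the file to stdout
        }
        fs.close();
    }
}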

 

FS Shell

 

HDFS organizes user data in the form of files and directories. It provides a command line interface (FS Shell) that lets users interact with the data in HDFS. The syntax of these commands is similar to other shells (such as bash or csh) that users are familiar with. Here are some example actions and their commands:

 

[Table: example FS Shell actions and their commands]

The FS shell is targeted at applications that need a scripting language to interact with the stored data.

DFSAdmin

The DFSAdmin command set is used for administering an HDFS cluster; these commands are used only by an HDFS administrator.

Browser Interface

A typical HDFS installation configures a web server that exposes the HDFS namespace through a configurable TCP port. This allows users to navigate the HDFS namespace and view the contents of its files with a web browser.


Storage space recovery

Delete and restore files

If the trash feature is enabled, a file deleted by a user or an application is not immediately removed from HDFS. Instead, HDFS renames it and moves it to a trash directory (/user/username/.Trash). As long as the file remains in the trash, it can be restored quickly. How long a file is kept in the trash is configurable; once that time has elapsed, the NameNode deletes the file from the HDFS namespace. Deleting the file causes its data blocks to be released. Note that there can be a noticeable delay between the time a user deletes a file and the time the corresponding free space appears in HDFS.

 

The following example shows how the FS Shell deletes data from HDFS. We first create two directories, test1 and test2, under the directory delete:

 

$ hadoop fs -mkdir -p delete/test1  

$ hadoop fs -mkdir -p delete/test2  

$ hadoop fs -ls delete/  

Found 2 items  

drwxr-xr-x - hadoop hadoop 0 2015-05-08 12:39 delete/test1  

drwxr-xr-x - hadoop hadoop 0 2015-05-08 12:40 delete/test2 

Next we remove delete/test1. The output below shows that it has been moved to the trash directory:

 

$ hadoop fs -rm -r delete/test1  

Moved: hdfs://localhost:8020/user/hadoop/delete/test1 to trash at: hdfs://localhost:8020/user/hadoop/.Trash/Current

Now we delete test2 with the skipTrash option, which does not send it to the trash; it is completely removed from HDFS:

 

$ hadoop fs -rm -r -skipTrash delete/test2  

Deleted delete/test2 

We can now see that the trash directory contains only test1:

 

$ hadoop fs -ls .Trash/Current/user/hadoop/delete/  

Found 1 items  

drwxr-xr-x - hadoop hadoop 0 2015-05-08 12:39 .Trash/Current/user/hadoop/delete/test1 

So test1 went to the trash, while test2 was deleted permanently.

 

Reduce replication factor

 

When the replication factor of a file is reduced, the NameNode selects the excess replicas to delete. The next heartbeat passes this information to the DataNode, which then removes the corresponding data blocks, and the free space in the cluster increases. Similarly, there may be some delay between the completion of the setReplication API call and the appearance of the freed space in the cluster.
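As a sketch of the setReplication call mentioned above, lowering a file's replication factor is a single API call; the space is reclaimed asynchronously after the relevant DataNodes hear about it via heartbeats. The path and the new factor below are placeholders.

// Sketch: lower a file's replication factor via the setReplication API mentioned above.
// The path and the new factor are placeholders; space is freed asynchronously.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DecreaseReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        boolean ok = fs.setReplication(new Path("/user/hadoop/archive/old.log"), (short) 2);
        System.out.println("replication change requested: " + ok);
        fs.close();
    }
}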

 

