Introduction to Hadoop concepts and framework

1. Introduction to Hadoop

1) Hadoop is a distributed system infrastructure developed by the Apache Foundation.
Users can develop distributed programs without knowing the underlying details of distribution, making full use of the power of clusters for high-speed computing and storage.
Hadoop is an open-source, highly reliable, scalable distributed software framework.
2) Hadoop composition
(1) MapReduce ------ computation
(2) Yarn ------------ resource scheduling
(3) HDFS ------------ data storage
(4) Common ---------- auxiliary tools
HDFS architecture
1) NameNode (nn): stores the metadata of files, such as file name, directory structure, and file attributes (creation time, number of replicas, permissions), as well as the block list of each file and the DataNodes on which the blocks are stored (see the sketch after this list).
2) DataNode (dn): stores file block data in the local file system and verifies block data checksums.
3) Secondary NameNode (2nn): an auxiliary background program that monitors HDFS status and takes snapshots of HDFS metadata at regular intervals.
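To make the NameNode/DataNode split concrete, here is a minimal sketch that asks the NameNode for a file's metadata and block locations through the HDFS Java API. The fs.defaultFS address and the /user/demo/input.txt path are placeholder assumptions, not values from this post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMetadataDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; adjust to your cluster.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        try (FileSystem fs = FileSystem.get(conf)) {
            // The NameNode answers this metadata query; DataNodes hold the actual block data.
            FileStatus status = fs.getFileStatus(new Path("/user/demo/input.txt"));
            System.out.println("Length: " + status.getLen()
                    + ", replication: " + status.getReplication()
                    + ", permission: " + status.getPermission());
            // Block list and the hosts (DataNodes) where each block lives.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("Block at offset " + b.getOffset()
                        + " on hosts " + String.join(",", b.getHosts()));
            }
        }
    }
}

Note that the client only talks to the NameNode for metadata; the file contents themselves are read directly from the DataNodes.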
Yarn architecture
1) ResourceManager (RM): handles client requests, monitors NodeManagers, starts and monitors ApplicationMasters, and performs resource allocation and scheduling.
2) NodeManager (NM): manages the resources on a single node, processes commands from the ResourceManager, and processes commands from the ApplicationMaster.
3) ApplicationMaster (AM): responsible for data splitting, for applying for resources for the application and assigning them to its internal tasks, and for task monitoring and fault tolerance.
4) Container: the resource abstraction in Yarn; it encapsulates the multi-dimensional resources of a node, such as memory, CPU, disk, and network.
MapReduce architecture
1) The Map stage processes the input data in parallel.
2) The Reduce stage summarizes the Map results.
A minimal sketch of these two stages follows.
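To illustrate the Map and Reduce stages, below is a minimal word-count sketch written against the Hadoop MapReduce Java API; the class names and command-line paths are illustrative assumptions, not part of the original post.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map stage: each mapper processes its input split in parallel, emitting (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce stage: summarizes the Map output by summing the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // Input and output paths come from the command line; Yarn allocates
        // containers for the map and reduce tasks across the cluster.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In practice the class would be packaged into a jar and submitted with the hadoop jar command, with the input read from and the output written to HDFS.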

3) The core design of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.

Origin blog.csdn.net/weixin_42223850/article/details/97672235