Introduction to Hadoop: a brief introduction to the principles of the Hadoop 1.x system

1. Hadoop 1.x structure

  • HDFS: the Hadoop Distributed File System, which provides distributed storage.
  • MapReduce: a distributed computing framework that, in Hadoop 1.x, also handles resource management and task scheduling (these responsibilities moved to the separate YARN component in Hadoop 2.x).
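  • API: the entry point for users to interact with the system. Besides the native MapReduce API, there are higher-level tools such as Pig and Hive, which encapsulate and abstract MapReduce, and HBase, a distributed data store built on HDFS that MapReduce jobs can read from and write to.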
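
To make the MapReduce programming model concrete, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API (the native API is Java); the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and only imitate the map, group-by-key, and reduce steps in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, imitating what the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum all counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (single process, illustration only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

In a real cluster the map and reduce functions run as separate tasks on different nodes; tools such as Pig and Hive generate jobs of this shape from higher-level scripts or queries.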

2. The operating principle of Hadoop 1.x

  • 1: The client first sends a request to the JobTracker on the master node, and the JobTracker parses the request to determine which file to process.

  • 2–>3: The JobTracker sends a request to the NameNode in HDFS to obtain the name and location of the file and the information for all data blocks that make up the file.

  • 4: The JobTracker calculates the number of map tasks and reduce tasks required to process these data blocks, and adds these tasks to the task queue.

  • 5–>6: The JobTracker checks the status of the DataNodes that hold the file's data blocks and looks for a free map slot or reduce slot on each. If a slot is idle, the JobTracker asks the TaskTracker on that DataNode to run the task. The TaskTracker then assigns the processing resources of that slot to the map task or reduce task, and the data processing phase of the MapReduce job begins. The TaskTracker monitors the status of the task and reports it to the JobTracker.

  • 7: When the JobTracker learns from the TaskTrackers that all tasks are complete, it reports the completion of the job back to the client.
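
Steps 4 to 6 can be sketched as a small simulation. This is a simplified model under stated assumptions, not the JobTracker's actual scheduler: the classes `MapTask` and `TaskTracker` and the functions `build_map_tasks` and `schedule` are hypothetical, only map tasks are modelled, one map task is created per block, and a task is placed only on a node that stores its block (a real scheduler can also fall back to non-local nodes).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class MapTask:
    block_id: str
    hosts: List[str]  # nodes holding a replica of this block (as reported by the NameNode)


@dataclass
class TaskTracker:
    host: str
    map_slots: int  # total number of map slots on this node
    running: List[str] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.map_slots - len(self.running)


def build_map_tasks(block_locations: Dict[str, List[str]]) -> Deque[MapTask]:
    """Step 4 (simplified): create one map task per block and queue it."""
    return deque(MapTask(block, hosts) for block, hosts in block_locations.items())


def schedule(tasks: Deque[MapTask], trackers: Dict[str, TaskTracker]) -> Dict[str, str]:
    """Steps 5-6 (simplified): give each queued task to a node that stores
    its block and has a free slot; tasks without such a node stay queued."""
    assignments: Dict[str, str] = {}
    for _ in range(len(tasks)):
        task = tasks.popleft()
        candidates = [trackers[h] for h in task.hosts if h in trackers]
        target = next((t for t in candidates if t.free_slots() > 0), None)
        if target is None:
            tasks.append(task)  # no idle local slot yet, try again later
        else:
            target.running.append(task.block_id)
            assignments[task.block_id] = target.host
    return assignments


if __name__ == "__main__":
    queue = build_map_tasks({"blk_1": ["node1", "node2"], "blk_2": ["node1"], "blk_3": ["node1"]})
    cluster = {"node1": TaskTracker("node1", map_slots=2), "node2": TaskTracker("node2", map_slots=1)}
    print(schedule(queue, cluster))
    print([task.block_id for task in queue])  # tasks still waiting for a slot
```

Calling `schedule` again after a TaskTracker removes a finished block from `running` would place the waiting tasks, which mirrors how queued tasks are picked up as slots become idle.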

Tips:
1. A TaskTracker manages its node's resources as slots: it logically divides the local resources into slots, and each slot runs one task at a time.
2. Each DataNode periodically sends a block report and its running status to the NameNode, so the NameNode can track the state of the HDFS cluster in near real time.
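
The second tip can be illustrated with a toy model of the NameNode's view of the cluster. This is a sketch under assumptions, not HDFS code: the class `NameNodeView`, its methods, and the 30-second timeout are hypothetical values chosen for illustration, and a single report stands in for both the block report and the status message.

```python
import time
from typing import Dict, Iterable, List, Optional, Set


class NameNodeView:
    """Toy stand-in for the NameNode's in-memory map of nodes to blocks."""

    def __init__(self, timeout_s: float = 30.0) -> None:
        self.timeout_s = timeout_s  # arbitrary illustrative value
        self.blocks_by_node: Dict[str, Set[str]] = {}
        self.last_seen: Dict[str, float] = {}

    def receive_report(self, node: str, blocks: Iterable[str],
                       now: Optional[float] = None) -> None:
        """Record the blocks a DataNode reports and when it reported them."""
        self.blocks_by_node[node] = set(blocks)
        self.last_seen[node] = time.time() if now is None else now

    def live_nodes(self, now: Optional[float] = None) -> List[str]:
        """Nodes whose latest report is no older than the timeout."""
        current = time.time() if now is None else now
        return sorted(n for n, seen in self.last_seen.items()
                      if current - seen <= self.timeout_s)

    def locate(self, block: str, now: Optional[float] = None) -> List[str]:
        """Live nodes that hold a replica of the given block."""
        return [n for n in self.live_nodes(now) if block in self.blocks_by_node[n]]


if __name__ == "__main__":
    view = NameNodeView(timeout_s=30.0)
    view.receive_report("node1", ["blk_1", "blk_2"], now=0.0)
    view.receive_report("node2", ["blk_1"], now=20.0)
    print(view.locate("blk_1", now=25.0))  # both nodes reported recently
    print(view.locate("blk_1", now=40.0))  # node1's report is now stale
```

Because the view is refreshed only when reports arrive, it lags the true cluster state by at most one reporting interval, which is why the NameNode's knowledge is described as near real time rather than exact.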


Learning materials:
1. "Hadoop For Dummies"

