What Hadoop is, plus its core components and some commonly used related projects

1. Hadoop (born out of the rapid growth of data volume)
1. What is it?
Hadoop is a framework written in Java.
2. What does Hadoop do?
Function: distributed storage and distributed computing.
Distributed storage: data is stored across multiple machines (a cloud disk, for example, is distributed storage).
HDFS: the Hadoop Distributed File System.
Distributed computing (MapReduce): one program runs across multiple machines, with the work divided up sensibly among them.
Purpose of distributed computing: to save computing time and improve efficiency on TB- and PB-scale data.

Key point: not every program can be computed distributively — only a program whose work can be split into independent stages can be distributed.
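To make the key point concrete, here is a plain-Java sketch (not Hadoop API code) of a computation that *can* be distributed: a sum, whose input can be partitioned and whose partial results can be combined. The thread pool stands in for a cluster of machines.

```java
import java.util.*;
import java.util.concurrent.*;

// Conceptual sketch: a sum is distributable because the input splits into
// independent pieces and the partial results combine with simple addition.
public class PartitionedSum {
    static long distributedSum(long[] data, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        int chunk = (data.length + workers - 1) / workers;
        List<Future<Long>> parts = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int from = w * chunk;
            final int to = Math.min(data.length, from + chunk);
            if (from >= to) break;
            // Each "machine" computes a partial sum over its own split.
            Callable<Long> task = () -> Arrays.stream(data, from, to).sum();
            parts.add(pool.submit(task));
        }
        long total = 0;
        for (Future<Long> p : parts) total += p.get(); // combine partial results
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(distributedSum(data, 4)); // 1 + 2 + ... + 1000 = 500500
    }
}
```

A computation with no such split (for example, one where each step depends on the full result of the previous step) cannot be divided this way.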

3. Apache Hadoop
related projects:
Using these extension frameworks avoids writing a lot of low-level code.
Ambari: cluster management and monitoring platform
Avro: data serialization system (multi-language)
Cassandra: scalable database with no single point of failure (i.e., a distributed database)
Chukwa: data collection system (collects data from different nodes and sends it to different destinations)
HBase: database that supports structured storage of massive data
Hive: data warehouse that supports data aggregation and ad hoc queries
Mahout: machine learning and data mining library
Pig: high-level dataflow language
Spark: fast, general-purpose computing engine
Tez: general-purpose dataflow programming framework
ZooKeeper: high-performance coordination service for distributed applications

Computing models on YARN: MapReduce (the only one that belongs to Apache Hadoop itself), Storm, and Spark can all run on YARN, which manages scheduling when they compete for resources. If YARN is a road, then the computing models are the various vehicles that drive on it.

6. HDFS architecture
(1) Responsible for distributed data storage
(2) Master-slave structure: master node [NameNode], slave nodes [DataNode]
(3) The NameNode receives user requests and maintains the file system's directory structure, called the namespace
(4) The DataNodes store the file data

7. YARN architecture
(1) Resource scheduling and management platform
(2) Master-slave structure: master node [ResourceManager], slave nodes [NodeManager]
(3) The ResourceManager handles allocation and scheduling of cluster resources
(4) The NodeManager manages the resources of a single node

8. MapReduce architecture
(1) A batch computing model that relies on disk I/O
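The NameNode/DataNode split can be illustrated with a toy plain-Java model (not the real HDFS API): the "NameNode" keeps only metadata mapping each file name to a list of block IDs, while the "DataNode" storage holds the actual block bytes. The 2-byte block size here is an illustrative assumption; real HDFS blocks default to 128 MB.

```java
import java.io.ByteArrayOutputStream;
import java.util.*;

// Toy model of the HDFS design: metadata (namespace) lives on the master,
// file contents live as fixed-size blocks on the slaves.
public class MiniHdfs {
    static final int BLOCK_SIZE = 2;                        // illustrative; HDFS default is 128 MB
    Map<String, List<Integer>> namespace = new HashMap<>(); // "NameNode": file -> block IDs
    List<byte[]> dataNodes = new ArrayList<>();             // stand-in for DataNode block storage

    void put(String fileName, byte[] content) {
        List<Integer> blockIds = new ArrayList<>();
        for (int off = 0; off < content.length; off += BLOCK_SIZE) {
            byte[] block = Arrays.copyOfRange(content, off,
                    Math.min(content.length, off + BLOCK_SIZE));
            dataNodes.add(block);               // "DataNode" stores the block
            blockIds.add(dataNodes.size() - 1); // "NameNode" records its location
        }
        namespace.put(fileName, blockIds);
    }

    byte[] get(String fileName) {
        // Read path: ask the NameNode for block locations, then fetch each block.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int id : namespace.get(fileName)) {
            byte[] block = dataNodes.get(id);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }
}
```

Note how the master never touches file contents, which is why a single NameNode can manage a namespace spread over many DataNodes.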
(2) Master-slave structure: master node [JobTracker], slave nodes [TaskTracker]
(3) The JobTracker receives computing jobs submitted by clients, assigns tasks to TaskTrackers for execution (i.e., task scheduling), and monitors their execution
(4) The TaskTrackers execute the tasks assigned by the JobTracker
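The computing model itself can be sketched with the classic word-count example in plain Java (not the Hadoop MapReduce API): the map phase emits (word, 1) pairs, a shuffle groups the pairs by key, and the reduce phase sums each group. On a real cluster the JobTracker would schedule these phases as tasks across many TaskTrackers.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java sketch of the MapReduce word-count flow.
public class WordCount {
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                // map: each line -> a stream of words (conceptually, (word, 1) pairs)
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                // shuffle + reduce: group identical words and count each group
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(List.of("hello hadoop", "hello hdfs"));
        System.out.println(counts.get("hello")); // 2
    }
}
```

Because each map call depends only on its own line and each reduce call only on its own key group, both phases parallelize across machines — which is exactly why this model tolerates being "staged".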
