1. Hadoop (Hadoop came into being because of the rapid growth of data volume)
1. What is it?
It is a framework written in Java.
2. What does Hadoop do?
Function: distributed storage and distributed computing
Distributed storage: data is stored across multiple machines (cloud disk storage, for example, is distributed)
HDFS: Hadoop Distributed File System
Distributed computing (MapReduce): one program runs across multiple machines, with the work divided sensibly among them
Purpose of distributed computing: reduce computing time and improve computing efficiency on TB- and PB-level data
Note: not every program can be distributed; only programs whose work can be split into stages can be
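The idea of splitting work into stages can be sketched with a word count, the usual introductory MapReduce example. This is a minimal illustration in plain Python, not Hadoop's API: the function names (`split_into_chunks`, `map_chunk`, `merge_counts`, `word_count`) are made up for this sketch, and threads stand in for separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(lines, n_chunks):
    """Divide the input into roughly equal parts, one per worker."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """Map stage: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce stage: combine two partial results into one."""
    return left + right


def word_count(lines, n_workers=3):
    chunks = split_into_chunks(lines, n_workers)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


lines = [
    "hadoop stores data",
    "hadoop computes on data",
    "spark computes fast",
]
totals = word_count(lines)
print(totals["hadoop"], totals["data"], totals["computes"])  # prints: 2 2 2
```

The job can be distributed because each chunk is counted without looking at any other chunk, and the partial counts can be merged in any order. A program in which every step depends on the result of the previous one cannot be split this way.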
3. Apache Hadoop related projects
Using these extension frameworks avoids writing a lot of low-level code.
Ambari: management and monitoring platform
Avro: data serialization system (multi-language)
Cassandra: database with no single point of failure (a distributed database)
Chukwa: data collection system (collects data from different nodes and sends it to different destinations)
HBase: distributed database that supports storage of massive structured data
Hive: data warehouse that supports data aggregation and ad hoc query
Mahout: machine learning and data mining library
Pig: high-level data-flow language
Spark: fast general-purpose computing engine
Tez: general-purpose dataflow programming framework
ZooKeeper: a high-performance service framework that provides coordination services for distributed applications
Computing models that run on YARN: MapReduce (the only one that is part of Hadoop itself), Storm, Spark. These models compete for cluster resources, so scheduling has to be managed.
If YARN is a road, then the computing models are the various cars on it.
6. HDFS architecture
(1) Responsible for distributed data storage
(2) Master-slave structure: master node [namenode], slave node [datanode]
(3) namenode is responsible for: receiving user requests and maintaining the directory structure of the file system, called the namespace
(4) datanode is responsible for: storing the file data
7. YARN architecture
(1) Resource scheduling and management platform
(2) Master-slave structure: master node [ResourceManager], slave node [NodeManager]
(3) ResourceManager is responsible for: allocation and scheduling of cluster resources
(4) NodeManager is responsible for: managing the resources of a single node
8. MapReduce architecture
(1) Batch computing model that relies on disk I/O
(2) Master-slave structure: master node [JobTracker], slave node [TaskTracker]
(3) JobTracker is responsible for: receiving computing jobs submitted by clients, assigning the tasks to TaskTrackers for execution (task scheduling), and monitoring the TaskTrackers' execution
(4) TaskTracker is responsible for: executing the tasks assigned by the JobTracker
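The master-slave division described for MapReduce can be sketched as a toy model. This is only an illustration in plain Python, not Hadoop code: the classes borrow the names JobTracker and TaskTracker for readability, while the methods (`submit`, `execute`), the `status` table, and the round-robin assignment are assumptions made for this sketch. A real scheduler also weighs things such as node load and where the data lives.

```python
from collections import deque


class TaskTracker:
    """Slave node: executes the tasks it is given and records which ones it ran."""

    def __init__(self, name):
        self.name = name
        self.completed = []

    def execute(self, task_id, func, arg):
        result = func(arg)
        self.completed.append(task_id)
        return result


class JobTracker:
    """Master node: accepts a job, schedules its tasks, and tracks their status."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.status = {}  # task_id -> (tracker name, state)

    def submit(self, func, inputs):
        pending = deque(enumerate(inputs))
        results = [None] * len(inputs)
        turn = 0
        while pending:
            task_id, arg = pending.popleft()
            # Round-robin assignment (an assumption of this sketch).
            tracker = self.trackers[turn % len(self.trackers)]
            turn += 1
            results[task_id] = tracker.execute(task_id, func, arg)
            self.status[task_id] = (tracker.name, "done")
        return results


trackers = [TaskTracker("tt-1"), TaskTracker("tt-2")]
job_tracker = JobTracker(trackers)
lengths = job_tracker.submit(len, ["alpha", "be", "gamma"])
print(lengths)  # [5, 2, 5]
```

The same shape applies to the other two architectures: the namenode and the ResourceManager play the master role of keeping the overall picture and making decisions, while the datanodes and NodeManagers do the per-node work.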
The meaning of Hadoop, its core components, and some commonly used related projects