Hadoop project structure

 

Reference from: http://www.icourse163.org/course/XMU-1002335004?tid=1003024012

 

HDFS

Responsible for distributed file storage across the cluster

 

YARN

Responsible for scheduling compute resources such as memory, CPU, and bandwidth

 

MapReduce

Responsible for offline batch computation

Disk-based computation

 

Also

MapReduce jobs can be analyzed and optimized into a directed acyclic graph (DAG), which orders the processing steps and avoids duplicated work
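The map/reduce model described above can be sketched in pure Python (a conceptual toy, not the real Hadoop Java API): the map phase emits key/value pairs, a shuffle step groups them by key, and the reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop the shuffle also sorts keys and moves data between machines, writing intermediate results to disk; that disk I/O is exactly the cost Spark avoids by keeping data in memory.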

 

Spark

Follows the same programming logic as MapReduce

Memory-based computation, so performance is much higher than MapReduce

 

Hive

A data warehousing tool built on Hadoop

Provides SQL language support; SQL statements are converted into MapReduce jobs for execution

Hive runs on top of the MapReduce architecture
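The claim that Hive compiles SQL into MapReduce can be illustrated with a toy translation (pure Python; the table and column names are hypothetical): a GROUP BY/COUNT query becomes a map phase keyed on the grouping column and a reduce phase that sums the counts.

```python
from collections import defaultdict

# Hypothetical rows for: SELECT city, COUNT(*) FROM users GROUP BY city
rows = [{"city": "Boston"}, {"city": "Xiamen"}, {"city": "Boston"}]

def mapper(row):
    # The GROUP BY column becomes the map output key.
    yield (row["city"], 1)

def reducer(key, values):
    # COUNT(*) becomes a sum over each key's group.
    return (key, sum(values))

# Simulated shuffle: group mapper output by key.
groups = defaultdict(list)
for row in rows:
    for k, v in mapper(row):
        groups[k].append(v)

result = dict(reducer(k, vs) for k, vs in groups.items())
print(result)  # {'Boston': 2, 'Xiamen': 1}
```

Hive's actual compiler handles joins, filters, and multi-stage plans, but the principle is the same: each SQL operator is rewritten into one or more map/reduce stages.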

 

Pig

A Hadoop-based platform for large-scale data analysis

Provides an SQL-like query language called Pig Latin

 

Oozie

A workflow management tool for Hadoop jobs

 

Zookeeper

Responsible for coordinating distributed services (cluster management, distributed locks, consistency)

 

HBase

A distributed, column-oriented database for storing unstructured data

Supports random reads and writes and real-time applications
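HBase's data model (row key → column family → column qualifier → cell value) can be sketched with nested dictionaries; the table, family, and row names below are illustrative, not a real HBase API.

```python
# Toy model of an HBase table: row key -> column family -> qualifier -> value.
# (Real HBase also versions each cell by timestamp; omitted here for brevity.)
table = {}

def put(row_key, family, qualifier, value):
    """Random write: set a single cell, addressed by row key."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

def get(row_key, family, qualifier):
    """Random read: fetch a single cell, or None if absent."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)

put("user#1001", "info", "name", "alice")
put("user#1001", "stats", "logins", "42")

print(get("user#1001", "info", "name"))  # alice
```

This cell-level random access is what distinguishes HBase from HDFS itself, which is optimized for large sequential reads and writes rather than real-time lookups.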

 

Flume

A framework for collecting and aggregating log data

 

Sqoop

Used to transfer data between Hadoop and traditional relational databases

 

Ambari

A tool for rapid Hadoop deployment

Supports provisioning, management, and monitoring of Apache Hadoop clusters

 

Origin: www.cnblogs.com/0nzh0/p/11057483.html