Reference: http://www.icourse163.org/course/XMU-1002335004?tid=1003024012
HDFS
Responsible for distributed file storage across the whole cluster
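HDFS stores a file by cutting it into fixed-size blocks and replicating each block on several nodes. The sketch below only does the arithmetic, assuming the common defaults of a 128 MB block size and a replication factor of 3 (set by dfs.blocksize and dfs.replication); the helper names are hypothetical.

```python
# Assumed defaults: 128 MB blocks, 3 replicas (configurable in a real cluster)
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    if file_size == 0:
        return []
    full, rest = divmod(file_size, block_size)
    # The last block only occupies the bytes it actually holds
    return [block_size] * full + ([rest] if rest else [])


def raw_storage(file_size, replication=REPLICATION):
    """Bytes consumed across the cluster once every block is replicated."""
    return file_size * replication


if __name__ == "__main__":
    size = 300 * 1024 * 1024  # a hypothetical 300 MB file
    blocks = split_into_blocks(size)
    print(len(blocks), [b // (1024 * 1024) for b in blocks])  # 3 [128, 128, 44]
    print(raw_storage(size) // (1024 * 1024))  # 900
```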
YARN
Responsible for scheduling computing resources such as memory, CPU and bandwidth
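To make "scheduling computing resources" concrete, here is a hypothetical first-fit allocator that places container requests on nodes with enough free memory and CPU cores. It models only memory and vcores, and it is much simpler than YARN's real FIFO, Capacity or Fair schedulers; all class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mem_mb: int
    free_vcores: int


@dataclass
class Request:
    app: str
    mem_mb: int
    vcores: int


def allocate(nodes, requests):
    """First-fit: put each request on the first node with enough memory and vcores."""
    placements = []
    for req in requests:
        for node in nodes:
            if node.free_mem_mb >= req.mem_mb and node.free_vcores >= req.vcores:
                # Reserve the resources for this container
                node.free_mem_mb -= req.mem_mb
                node.free_vcores -= req.vcores
                placements.append((req.app, node.name))
                break
        else:
            # No node has capacity left: the request has to wait
            placements.append((req.app, None))
    return placements
```

With two nodes of 4096 MB/4 cores and 2048 MB/2 cores, three 2048 MB/2-core requests fill the cluster and a fourth request is left waiting.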
MapReduce
Responsible for offline batch computing
Disk-based computing
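The map, shuffle and reduce phases can be illustrated with the classic word count. This is a single-process sketch; in real Hadoop, map and reduce tasks run on many nodes and intermediate map output is written to local disk, which is why MapReduce is described as disk-based.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit (word, 1) for every word in an input line
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values collected for one key
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

For example, word_count(["hello hadoop", "hello spark"]) returns {"hadoop": 1, "hello": 2, "spark": 1}.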
Tez
Analyzes and optimizes MapReduce jobs by building a directed acyclic graph (DAG) of the processing steps, avoiding repeated work
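The DAG-based optimization described above can be sketched with Python's graphlib (3.9+), which stands in for the real execution engine here. In a hypothetical plan, two scans feed one join whose output feeds two aggregations; as a DAG, the join runs once and serves both consumers instead of being recomputed by separate chained jobs. The stage names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: each stage maps to the set of stages it depends on
plan = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "agg_by_city": {"join"},
    "agg_by_day": {"join"},
}


def run_dag(plan):
    """Return the stages in an order where every stage follows all of its inputs."""
    executed = []
    for stage in TopologicalSorter(plan).static_order():
        executed.append(stage)  # each stage runs exactly once
    return executed
```

The exact order among independent stages (for example the two scans) is not fixed; only the dependency constraints are guaranteed.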
Spark
Similar processing logic to MapReduce
In-memory computing, so performance is much higher than MapReduce
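The benefit of in-memory computing shows up most in iterative jobs. The plain-Python sketch below mimics the idea behind Spark's cache(): an uncached dataset is reloaded from storage on every pass, while a cached one is loaded once and reused. The Dataset class is hypothetical and is not the Spark API.

```python
class Dataset:
    """Hypothetical stand-in for a distributed dataset read from storage."""

    def __init__(self, loader):
        self._loader = loader
        self._cached = None
        self.loads = 0  # how many times the data was read from storage

    def _load(self):
        self.loads += 1
        return self._loader()

    def cache(self):
        # Load once and keep the result in memory for later passes
        self._cached = self._load()
        return self

    def collect(self):
        return self._cached if self._cached is not None else self._load()


def iterate(ds, rounds):
    # An iterative job that scans the same data several times
    total = 0
    for _ in range(rounds):
        total += sum(ds.collect())
    return total
```

Five passes over an uncached dataset perform five loads; after cache() they perform one.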
Hive
Hadoop-based data warehousing tool
Supports SQL and translates SQL statements into MapReduce jobs for execution
Runs on top of the MapReduce framework
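As a rough illustration of translating SQL into MapReduce, the sketch runs a GROUP BY query with sqlite3 (a stand-in for the SQL engine, not Hive itself) and computes the same result by hand as a map step keyed on the grouping column followed by a reduce step that counts. Hive's actual query compiler is far more elaborate; the table and rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

rows = [("alice", "beijing"), ("bob", "xiamen"), ("carol", "xiamen")]


def run_sql(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    result = dict(conn.execute("SELECT city, COUNT(*) FROM users GROUP BY city"))
    conn.close()
    return result


def run_as_mapreduce(rows):
    # Map + shuffle: the GROUP BY column becomes the key, each row contributes a 1
    groups = defaultdict(list)
    for _name, city in rows:
        groups[city].append(1)
    # Reduce: COUNT(*) becomes a sum over each key's values
    return {city: sum(ones) for city, ones in groups.items()}
```

Both functions return {"beijing": 1, "xiamen": 2} for the sample rows.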
Pig
Hadoop-based platform for large-scale data analysis
Provides a SQL-like query language, Pig Latin
Oozie
Workflow management tool
Zookeeper
Responsible for coordinating distributed services (cluster management, distributed locks, consistency)
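A distributed lock is typically built on ZooKeeper with sequential ephemeral znodes: each client creates a node under a lock directory, the client with the lowest sequence number holds the lock, and every other client watches only its immediate predecessor. The sketch simulates that recipe in memory; LockDir and its methods are hypothetical, not the ZooKeeper client API.

```python
import itertools


class LockDir:
    """In-memory stand-in for a ZooKeeper lock directory such as /lock."""

    def __init__(self):
        self._seq = itertools.count()
        self.children = {}  # znode name -> client id

    def create_sequential(self, client):
        # ZooKeeper appends a zero-padded 10-digit counter to sequential nodes
        name = f"lock-{next(self._seq):010d}"
        self.children[name] = client
        return name

    def delete(self, name):
        # Explicit release, or the ephemeral node vanishing when a session ends
        self.children.pop(name, None)

    def holder(self):
        # The client owning the lowest sequence number holds the lock
        return self.children[min(self.children)] if self.children else None

    def predecessor(self, name):
        # A waiting client watches only the node just before its own
        smaller = [n for n in self.children if n < name]
        return max(smaller) if smaller else None
```

Watching only the predecessor means that when the lock is released, just one waiting client is notified rather than all of them.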
HBase
Distributed, column-oriented database for storing unstructured data
Supports random reads and writes, suitable for real-time applications
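The HBase data model can be pictured as a sorted map: row key, then "family:qualifier" column, then timestamped versions of the value. Column families are fixed up front while qualifiers are free-form, which is what makes it suitable for loosely structured data. The MiniTable class below is a hypothetical in-memory model, not the HBase client API.

```python
from collections import defaultdict


class MiniTable:
    """Hypothetical model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family}")
        self.rows[row][column][ts] = value

    def get(self, row, column):
        # Random read by row key: return the newest version of the cell
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        # Row keys are kept sorted, so range scans over [start, stop) are natural
        return [r for r in sorted(self.rows) if start <= r < stop]
```

Writing "Beijing" at timestamp 1 and "Xiamen" at timestamp 2 to the same cell makes get return "Xiamen", while the older version is still stored.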
Flume
Framework for collecting, aggregating and moving log data
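A Flume agent moves events through three parts: a source that receives them, a channel that buffers them, and a sink that writes them to a destination such as HDFS. The single-threaded sketch below uses a queue as a memory channel and a list as the destination; the function names are hypothetical.

```python
from queue import Queue


def source(lines, channel):
    # Source: wrap each incoming log line as an event and put it into the channel
    for line in lines:
        channel.put({"headers": {}, "body": line})


def sink(channel, store):
    # Sink: drain events from the channel and write their bodies to the destination
    while not channel.empty():
        event = channel.get()
        store.append(event["body"])


def run_agent(lines):
    channel = Queue()  # memory channel buffering events between source and sink
    store = []         # stands in for HDFS or another destination
    source(lines, channel)
    sink(channel, store)
    return store
```

The channel decouples the source from the sink, so in a real agent they can run at different speeds.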
Sqoop
Transfers data between traditional relational databases and Hadoop
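When importing a table, Sqoop runs several map tasks in parallel (set with -m/--num-mappers, default 4) and divides the table by value ranges of a split column (--split-by, by default the primary key). The sketch shows an even range split and the per-mapper queries it implies; Sqoop's own splitter arithmetic differs in detail, and the table "orders" and column "id" are hypothetical.

```python
def split_ranges(lo, hi, num_mappers=4):
    """Divide the integer range [lo, hi] into contiguous ranges, one per map task."""
    size = hi - lo + 1
    base, extra = divmod(size, num_mappers)
    ranges, start = [], lo
    for i in range(num_mappers):
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # fewer rows than mappers: skip empty ranges
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def to_queries(table, column, ranges):
    # Each map task would import only the rows in its own range
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in ranges
    ]
```

For ids 1 to 10 and four mappers, split_ranges returns [(1, 3), (4, 6), (7, 8), (9, 10)].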
Ambari
Tool for rapid deployment of Hadoop
Supports provisioning, management and monitoring of Apache Hadoop clusters