A Detailed Explanation of the Hadoop Ecosystem

1. Hadoop Ecosystem Overview

Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It lets users develop distributed programs without understanding the low-level details of distribution, making full use of a cluster's power for high-speed computation and storage. It is reliable, efficient, and scalable.

The core of Hadoop consists of HDFS, YARN, and MapReduce. Official website: Apache Hadoop

Common module architecture:

2. HDFS (Hadoop Distributed File System)

HDFS derives from Google's GFS paper, published in October 2003; it is an open-source implementation of GFS. HDFS is the foundation of data storage and management in the Hadoop system. It is a highly fault-tolerant system capable of detecting and responding to hardware failures, and it is designed to run on low-cost commodity hardware.

HDFS simplifies the file consistency model and provides high-throughput data access through streaming reads, which makes it well suited to applications with large data sets.

It follows a write-once, read-many access model: files are split into blocks, and each block is replicated across different physical machines in the cluster.
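A minimal sketch of this model using the standard org.apache.hadoop.fs.FileSystem client API: it writes a file once, streams it back, and then asks the NameNode which hosts hold each block. The NameNode address (hdfs://namenode:9000) and the path /demo/data.txt are assumptions; substitute your own cluster's values.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteOnceDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/demo/data.txt"); // hypothetical demo path

        // Write once: create() opens a new file for streaming output.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read many: open() streams the same blocks back.
        try (FSDataInputStream in = fs.open(path)) {
            byte[] buf = new byte[(int) fs.getFileStatus(path).getLen()];
            in.readFully(buf);
            System.out.print(new String(buf, StandardCharsets.UTF_8));
        }

        // Each block is replicated across DataNodes; getFileBlockLocations()
        // reports which hosts hold each block of the file.
        FileStatus status = fs.getFileStatus(path);
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("block hosts: " + String.join(",", block.getHosts()));
        }

        fs.close();
    }
}
```

Note that HDFS files are not edited in place: to change the content you rewrite the file (or append), which is exactly the simplified consistency model described above.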

3. MapReduce
