What does zero-basics big data development need to learn? The Hadoop system

Having covered the basics that big data development requires, we continue the big data series by introducing one of the key technologies to learn: Hadoop.

Hadoop technology system

(1) Introduction:

Hadoop is an open-source distributed infrastructure framework from Apache that provides a distributed file system (HDFS), a distributed computing model (MapReduce), and a unified resource management framework (YARN). With it, users can develop distributed applications without knowing the underlying details of the distribution.

The core of the Hadoop framework consists of HDFS, MapReduce, and YARN: HDFS provides storage for massive amounts of data, MapReduce provides computation over that data, and YARN provides resource scheduling for computing programs.

When the project was founded in 2006, "Hadoop" stood for two components: HDFS and MapReduce. Today the word stands for both the "core" (i.e., the Core Hadoop project) and a growing ecosystem around it, covering data storage, execution engines, programming, and data access frameworks. It is similar to Linux in this respect: a core plus an ecosystem.

(2) Origin of the name:

Hadoop is pronounced [hædu:p].

The name comes from a toy belonging to the son of Doug Cutting, the creator of the Hadoop project. His son had been calling his yellow elephant toy "Hadoop", and the word happened to meet Cutting's requirements for a name: short, easy to spell and pronounce, meaningless, and not used anywhere else. And so Hadoop was born.

(3) Introducing the major components:

1) HDFS (Hadoop Distributed File System): a distributed file system that gives applications high-throughput, highly scalable, fault-tolerant access to data. It is the foundation of data storage and management in Hadoop.
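For a first taste of HDFS programming, here is a minimal sketch that writes a small file through the HDFS Java API and reads it back. The NameNode address hdfs://localhost:9000 and the path /tmp/hello.txt are assumptions for illustration; adjust them to your cluster.

```java
// Minimal HDFS read/write sketch. NameNode address and file path are
// assumptions for illustration - adjust them to your own cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsQuickStart {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/tmp/hello.txt");
            // Write a small file (overwrite if it already exists)
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            // Read the same file back and print its first line
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());
            }
        }
    }
}
```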

2) MapReduce: a distributed computing model, composed of Map and Reduce, for computation over large amounts of data. MapReduce is divided into two phases: a Map stage that performs mapping and a Reduce stage that performs reduction. The Map operation is applied to individual elements of the data set, producing intermediate key-value pairs; the Reduce operation then reduces all the "values" in the intermediate results that share the same "key" to obtain the final result. This composition suits parallel data processing in a distributed environment made up of a large number of computers.
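The canonical example of this model is word counting: the Map stage emits an intermediate (word, 1) pair for every word, and the Reduce stage sums all the values that share the same key. Below is a minimal sketch against the Hadoop MapReduce Java API; the input and output paths are taken from the command line.

```java
// The classic WordCount job: Map emits (word, 1), Reduce sums per word.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCount {

    // Map stage: apply the same operation to every input line, emitting
    // an intermediate (word, 1) key-value pair for each token.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce stage: all values sharing the same key arrive together;
    // summing them yields the final count for that word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            ctx.write(word, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // pre-aggregate on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```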

3) YARN (Yet Another Resource Negotiator): a distributed resource manager that separates resource scheduling from task scheduling. Its biggest feature is that scheduling is independent of the type of task being executed and run on Hadoop, so YARN can also run workloads other than MapReduce; its core is a distributed scheduler.
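To make that independence concrete, here is a small sketch using the YarnClient Java API to list every application the ResourceManager knows about, MapReduce jobs or otherwise; it assumes a yarn-site.xml on the classpath that points at your cluster.

```java
// Lists all applications known to the YARN ResourceManager, regardless
// of type. Assumes yarn-site.xml on the classpath locates the cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class YarnApps {
    public static void main(String[] args) throws Exception {
        YarnClient client = YarnClient.createYarnClient();
        client.init(new Configuration()); // picks up yarn-site.xml from the classpath
        client.start();
        try {
            for (ApplicationReport app : client.getApplications()) {
                System.out.println(app.getApplicationId() + "\t"
                        + app.getApplicationType() + "\t"
                        + app.getYarnApplicationState());
            }
        } finally {
            client.stop();
        }
    }
}
```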

(4) Introduction to other modules (a selection):

4) HBase: a highly reliable, high-performance, column-oriented, scalable distributed NoSQL database built on top of HDFS.
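A minimal sketch of the HBase Java client, writing one cell and reading it back; the table name user and column family info are hypothetical and must already exist (for example, created via the HBase shell).

```java
// Writes one cell to an HBase table and reads it back. The table "user"
// with column family "info" is a hypothetical, pre-existing table.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseQuickStart {
    public static void main(String[] args) throws Exception {
        // hbase-site.xml on the classpath tells the client where ZooKeeper is
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("user"))) {
            // Put: row key "row1", column family "info", qualifier "name"
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);
            // Get the same cell back
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```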

5) Hive: a data warehousing tool built on Hadoop. It can map structured data files to database tables, provides a simple SQL query capability, and can convert SQL statements into MapReduce tasks to run.
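Because Hive speaks SQL, one convenient way to use it from Java is over JDBC against HiveServer2. This is a minimal sketch; the connection URL and the words table are assumptions for illustration.

```java
// Runs a SQL query against HiveServer2 over JDBC; Hive compiles it into
// MapReduce tasks behind the scenes. URL and table name are assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");   // Hive JDBC driver
        String url = "jdbc:hive2://localhost:10000/default"; // assumed HiveServer2 address
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT word, COUNT(*) AS cnt FROM words GROUP BY word")) {
            while (rs.next()) {
                System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
            }
        }
    }
}
```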

6) ZooKeeper: a distributed, open-source coordination service for distributed applications. Its features include configuration maintenance, naming services, distributed synchronization, and group services.
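As a small taste of the configuration-maintenance use case, here is a minimal sketch with the ZooKeeper Java client that stores a value under a znode and reads it back; the connect string localhost:2181 and the znode path are assumptions.

```java
// Stores a configuration value under a znode and reads it back.
// Connect string and znode path are assumptions for illustration.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

public class ZkQuickStart {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await(); // wait until the session is established
        try {
            // Create a persistent znode holding one configuration value
            // (create fails if the node already exists)
            String path = zk.create("/demo-config", "v1".getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            byte[] data = zk.getData(path, false, null);
            System.out.println(path + " = " + new String(data, StandardCharsets.UTF_8));
        } finally {
            zk.close();
        }
    }
}
```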

7) Flume NG: a distributed, highly reliable data collection system that can efficiently collect, aggregate, and move massive amounts of log data from many different sources and finally store it in a centralized data storage system.

8) Sqoop: an open-source tool mainly used to transfer data between conventional relational databases (MySQL, PostgreSQL ...) and Hadoop (Hive): data from a relational database can be imported into Hadoop's HDFS, and data in HDFS can be exported into a relational database.

9) Pig: a platform that runs on Hadoop for analyzing and evaluating large data sets.

(5) How to learn Hadoop?

1) Basic prerequisites for learning Hadoop:

a, master Java SE; b, use the build tool Maven; c, use an IDE (Eclipse or IDEA); d, use the Linux operating system

2) Hadoop environment setup and introduction

a, introduction to Hadoop; b, installation and testing of Hadoop in standalone mode; c, Hadoop cluster configuration; d, Hadoop pseudo-distributed mode; e, building a fully distributed Hadoop environment ...

3) How HDFS works under the hood; HDFS programming

4) MapReduce principles and practice

5) YARN principles and practice

6) ZooKeeper principles and practice

7) HBase, Hive, Flume NG, and Sqoop principles and practice.
