Big Data introductory tutorial: learning Hadoop from scratch

1、Hadoop ecosystem overview

Hadoop is a distributed system infrastructure developed by the Apache Foundation. It lets users develop distributed programs without knowing the underlying details of distribution, making full use of the cluster's power for high-speed computation and storage. Its key properties are reliability, efficiency, and scalability.

The core of Hadoop consists of four modules: HDFS, YARN, MapReduce, and Hadoop Common.


2、HDFS

HDFS derives from Google's GFS paper, published in October 2003; HDFS is an open-source clone of GFS. It is the foundation of data storage management in Hadoop: a highly fault-tolerant system that can detect and respond to hardware failures.

HDFS simplifies the file consistency model and, through streaming data access, provides high-throughput access to application data, which suits applications with large data sets. It follows a write-once, read-many model: files are stored as blocks that are distributed across different physical machines in the cluster.
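
HDFS cuts each file into fixed-size blocks (128 MB by default in Hadoop 2.x, set by dfs.blocksize) and keeps several replicas of every block (3 by default, set by dfs.replication) on different DataNodes. The sketch below is only a local analogy, assuming bash with standard coreutils, not Hadoop code: it computes how many blocks a file needs and how much raw cluster storage it consumes, and uses split to cut a local file into fixed-size pieces the way HDFS cuts a file into blocks. The function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Local analogy of HDFS block storage (not Hadoop code).
# Hypothetical helpers; sizes are in MB to keep the arithmetic readable.

# Number of blocks for a file of SIZE_MB with BLOCK_MB blocks (ceiling division).
hdfs_block_count() {
  local size_mb=$1 block_mb=${2:-128}
  echo $(( (size_mb + block_mb - 1) / block_mb ))
}

# Raw cluster storage in MB. The last block is not padded to full size,
# so every byte of the file is simply stored REPLICATION times.
hdfs_raw_storage() {
  local size_mb=$1 replication=${2:-3}
  echo $(( size_mb * replication ))
}

# Cut a local file into fixed-size pieces and print how many were produced.
split_into_blocks() {
  local file=$1 block_bytes=$2 outdir
  outdir=$(mktemp -d)
  split -b "$block_bytes" "$file" "$outdir/blk_"
  echo $(( $(find "$outdir" -type f | wc -l) ))
  rm -rf "$outdir"
}

# Example: a 300 MB file with 128 MB blocks and replication 3.
echo "blocks: $(hdfs_block_count 300 128)"   # 3 blocks (128 + 128 + 44 MB)
echo "raw MB: $(hdfs_raw_storage 300 3)"     # 900 MB across the cluster
```

The same ceiling division explains why many small files are costly in HDFS: each one occupies at least one block entry in the NameNode, however small it is.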

3、MapReduce

Derived from Google's MapReduce paper, MapReduce is a framework for computation over large volumes of data. It hides the details of distributed computing and abstracts a computation into two parts: map and reduce.
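
Word counting is the classic illustration of the map/reduce split. The pipeline below is a single-machine analogy in bash, not Hadoop's API: tr plays the map step (emit one word per line), sort plays the shuffle (bring identical keys together), and uniq -c plays the reduce step (aggregate each key). In a real Hadoop job these phases run in parallel across the cluster. The function name word_count is hypothetical.

```shell
#!/usr/bin/env bash
# Single-machine analogy of MapReduce word count (not Hadoop code).
word_count() {
  tr -s '[:space:]' '\n' |          # map: emit one word per line
    tr '[:upper:]' '[:lower:]' |    # normalise case so "Hello" == "hello"
    grep -v '^$' |                  # drop empty lines left by leading spaces
    LC_ALL=C sort |                 # shuffle: group identical keys together
    uniq -c |                       # reduce: count each group of identical keys
    awk '{ print $2 "\t" $1 }'      # emit "word<TAB>count", like a reducer's output
}

echo "hello hadoop hello hdfs" | word_count
# hadoop  1
# hdfs    1
# hello   2
```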

4、HBase (distributed column-oriented database)

Derived from Google's Bigtable paper, HBase is built on top of HDFS. It is a scalable, highly reliable, high-performance, distributed, column-oriented database for structured data, with a dynamic schema.

5、ZooKeeper

ZooKeeper solves data management problems in a distributed environment: unified naming, state synchronization, cluster management, and configuration synchronization.

6、Hive

Hive originated at Facebook. It defines an SQL-like query language (HiveQL) and converts SQL queries into MapReduce jobs that run on Hadoop.

7、Flume

A distributed log collection tool.

8、YARN (distributed resource manager)

YARN is the next generation of MapReduce. It was proposed mainly to address the original Hadoop's poor scalability and its lack of support for multiple computing frameworks.

9、Spark

Spark provides a faster, more general-purpose data processing platform. Compared with Hadoop MapReduce, Spark lets your programs run in memory.

10、Kafka

A distributed message queue, used mainly for streaming data processing.

11、Hadoop pseudo-distributed deployment

At present there are three main free Hadoop distributions, all from foreign vendors:

1. Apache Hadoop, the original Apache release

2. CDH (Cloudera's Distribution including Apache Hadoop), chosen by the vast majority of domestic users

3. HDP (Hortonworks Data Platform)

Here we use the CDH release hadoop-2.6.0-cdh5.8.2.tar.gz on CentOS 7.1, with JDK 1.7.0_55.

First, create a dedicated hadoop user:

[root@hadoop1 ~]# useradd hadoop

Check the Java environment that comes with the system by default.
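
java -version reports the installed JDK on standard error. A minimal sketch, assuming bash; the helper java_major is hypothetical and extracts the major version from either the old "1.7.0_55" numbering or the Java 9+ "11.0.2" numbering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Java major version from a version string,
# handling both the old "1.7.0_55" scheme and the Java 9+ "11.0.2" scheme.
java_major() {
  local v=$1
  case $v in
    1.*) v=${v#1.} ;;        # "1.7.0_55" -> "7.0_55"
  esac
  echo "${v%%[._]*}"         # keep everything before the first "." or "_"
}

# `java -version` writes to stderr, e.g.: java version "1.7.0_55"
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
  echo "java $ver (major version $(java_major "$ver"))"
else
  echo "java not found on PATH" >&2
fi
```

For the JDK used in this walkthrough, java_major 1.7.0_55 prints 7.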

 

Next, add the Hadoop environment variables.
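
A minimal sketch of typical entries, assuming the JDK is installed under /usr/java/jdk1.7.0_55 and the CDH tarball was unpacked to /home/hadoop/hadoop-2.6.0-cdh5.8.2; both paths are hypothetical, so adjust them to your system. Lines like these usually go in /etc/profile or the hadoop user's ~/.bash_profile and take effect after the file is sourced.

```shell
# Hypothetical paths: point these at where the JDK and the CDH tarball
# were actually installed on your machine.
export JAVA_HOME=/usr/java/jdk1.7.0_55
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0-cdh5.8.2
# Hadoop 2.x keeps its configuration files under etc/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# bin/ holds hdfs, yarn and hadoop; sbin/ holds start-dfs.sh and start-yarn.sh
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```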

 

Then set up the required permissions.

 


The hadoop user is used to manage the various Hadoop services and to start Hadoop.
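
In Hadoop 2.x a pseudo-distributed node is typically started with the standard sbin scripts: start-dfs.sh brings up NameNode, DataNode and SecondaryNameNode, and start-yarn.sh brings up ResourceManager and NodeManager. A minimal sketch, assuming $HADOOP_HOME/bin and $HADOOP_HOME/sbin are on PATH and the site configuration files are already in place; start_pseudo_cluster is a hypothetical wrapper. It formats the NameNode only when called with format, because formatting erases existing HDFS metadata.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the standard Hadoop 2.x start scripts.
# Run it as the hadoop user, e.g. after: su - hadoop
start_pseudo_cluster() {
  if [ "$1" = "format" ]; then
    # First start only: initialise NameNode metadata (erases existing HDFS data).
    hdfs namenode -format || return 1
  fi
  start-dfs.sh  || return 1   # NameNode, DataNode, SecondaryNameNode
  start-yarn.sh || return 1   # ResourceManager, NodeManager
}

# Only attempt a start when the Hadoop scripts are actually on PATH.
if command -v start-dfs.sh >/dev/null 2>&1; then
  start_pseudo_cluster "$@"
else
  echo "start-dfs.sh not found; add \$HADOOP_HOME/sbin to PATH" >&2
fi
```

On the very first start the call would be start_pseudo_cluster format; afterwards, call it without arguments.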


Check whether the services have started:
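
The JDK's jps tool lists running Java processes as "<pid> <ClassName>". On a pseudo-distributed node with HDFS and YARN up, it normally shows NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. A small sketch, assuming bash; the helper missing_daemons is hypothetical and reads jps output on standard input, so it can also be checked against saved output.

```shell
#!/usr/bin/env bash
# Daemons expected on a pseudo-distributed HDFS + YARN node.
EXPECTED="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon that is not running, one per line (prints nothing if all are up).
missing_daemons() {
  local running d
  running=$(awk '{ print $2 }')
  for d in $EXPECTED; do
    # -x requires an exact line match, so NameNode is not matched by SecondaryNameNode
    printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
  done
}

# Run as the hadoop user after starting the services.
if command -v jps >/dev/null 2>&1; then
  missing=$(jps | missing_daemons)
  if [ -z "$missing" ]; then echo "all daemons running"; else echo "missing: $missing"; fi
fi
```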



Source: blog.51cto.com/14342636/2401259