Getting Started with Big Data: Hadoop-Based Learning

With artificial intelligence and big data now used so widely, front-end developers increasingly run into big-data development needs in their daily work. It is therefore worth learning some big-data fundamentals.

Basic concepts

The nature of big data

First, data storage: a distributed file system (distributed storage)

Second, data computation: distributing computation across nodes

Basics

Learning big data requires a foundation in both Java and Linux.

Learning Path

(1) Java and Linux fundamentals

(2) Hadoop study: architecture, theory, programming

Stage 1: HDFS, MapReduce, HBase (a NoSQL database)

Stage 2: Data analysis engines -> Hive, Pig

Data ingestion engines -> Sqoop, Flume

Stage 3: HUE: a web administration tool

ZooKeeper: implements HA (high availability) for Hadoop
Oozie: a workflow engine

(3) Spark learning

Stage 1: the Scala programming language

Stage 2: Spark Core -> in-memory data computation

Stage 3: Spark SQL -> SQL statements similar to MySQL's

Stage 4: Spark Streaming -> stream processing; think of a waterworks handling a continuous flow of water

(4) Apache Storm: stream processing similar to Spark Streaming

NoSQL: Redis, an in-memory database

HDFS

A distributed file system addresses the following issues:

• A single hard disk is not big enough: combine many disks across machines, giving theoretically unlimited capacity

• A single copy of the data is not safe enough: HDFS keeps redundant replicas (3 by default), using replication to improve reliability and read efficiency; data is transferred in blocks: 64 MB in Hadoop 1.x, 128 MB in Hadoop 2.x

• The manager (holds the metadata): NameNode; the disks (hold the data blocks): DataNode

![image.png](http://ata2-img.cn-hangzhou.img-pub.aliyun-inc.com/8ca9f78b244c7f991e73f71fd1e56421.png)

MapReduce

Basic programming model: split a big task into smaller tasks, then summarize the results

• An MR task: Job = Map + Reduce

The output of Map is the input of Reduce; the input and output of the MR job as a whole reside in HDFS

MapReduce data flow analysis:

• The output of Map is the input of Reduce; the input of Reduce is the collection of Map outputs
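The Map -> shuffle -> Reduce flow above can be mimicked with a plain Unix pipeline. This is only an analogy, not Hadoop itself: `tr` plays the Map (emit one word per line), `sort` plays the shuffle (bring identical keys together), and `uniq -c` plays the Reduce (sum the count for each key):

```shell
# Word count, MapReduce-style, with plain Unix tools:
#   map:     split each line into words (one key per line)
#   shuffle: sort groups identical keys together
#   reduce:  uniq -c counts each group
printf 'hello world\nhello hadoop\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c
# counts: hadoop 1, hello 2, world 1
```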

HBase

What is BigTable? Save all the data in a single table, tolerating redundancy ---> benefit: improved efficiency

• HBase is a NoSQL database built on the BigTable idea

• HBase is built on Hadoop's HDFS

• Structure of an HBase table

The core idea: trade space for efficiency

 

Building a Hadoop environment

Preparing the environment

A Linux environment, the JDK, and the Hadoop distribution: http://mirrors.shu.edu.cn/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0-src.tar.gz

Installation

1. Install the JDK and configure the environment variables

vim /etc/profile and append at the end: ![image.png](http://ata2-img.cn-hangzhou.img-pub.aliyun-inc.com/a9bf2e19410f9b3d38c8b0ca64b2f264.png)
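The screenshot above is the original's; the lines it shows are not recoverable here. As a sketch, the appended JDK variables typically look like this (the install path is an assumption, adjust to your own):

```shell
# Hypothetical JDK path -- point it at where your JDK actually lives
export JAVA_HOME=/usr/local/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
```

Run `source /etc/profile` afterwards so the current shell picks up the changes.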

2. Extract hadoop-3.0.0.tar.gz and configure the environment variables

tar -zxvf hadoop-3.0.0.tar.gz -C /usr/local/
mv /usr/local/hadoop-3.0.0/ /usr/local/hadoop

 


vim /etc/profile and append the Hadoop variables at the end
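Again the screenshot's content is not recoverable; a sketch of the usual Hadoop additions, assuming the `/usr/local/hadoop` path from the `mv` step above:

```shell
# Matches the mv /usr/local/hadoop step earlier
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```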

 


Configuration

Hadoop has three installation modes:

Local (standalone) mode:

• 1 host
• no HDFS; only useful for testing MapReduce programs locally

Pseudo-distributed mode:

• 1 host
• Hadoop simulates all the features of a distributed environment on a single machine
• (1) HDFS: master: NameNode; data nodes: DataNodes
• (2) YARN: the container in which MapReduce programs run
• master node: ResourceManager
• worker nodes: NodeManager

Fully distributed mode:

• at least 3 hosts

We take pseudo-distributed mode as the example configuration:

Modify hdfs-site.xml: set replication to 1 and disable permission checking
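The original's configuration screenshot is missing; a minimal sketch of an hdfs-site.xml matching the description above (the property names are standard Hadoop settings):

```xml
<configuration>
  <!-- Single machine, so one replica is enough -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Disable permission checking for a local learning environment -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
```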

 

 

Modify core-site.xml
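Again the screenshot is missing; a sketch of a typical pseudo-distributed core-site.xml, using the host IP that appears later in this article and a `hadoop.tmp.dir` consistent with the format-success message below:

```xml
<configuration>
  <!-- NameNode address; replace 192.168.56.102 with your own host's IP -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.56.102:9000</value>
  </property>
  <!-- Where HDFS keeps its data on the local disk -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
```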

 

 

Modify mapred-site.xml
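The screenshot is missing; the usual pseudo-distributed setting is simply to run MapReduce on YARN:

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN rather than locally -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```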

 

 

Modify yarn-site.xml
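The screenshot is missing; a sketch of a minimal yarn-site.xml (property names are standard YARN settings; the host IP is the one used elsewhere in this article):

```xml
<configuration>
  <!-- ResourceManager host; replace with your own host's IP -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.56.102</value>
  </property>
  <!-- Let NodeManagers run the MapReduce shuffle service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```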

 

 

Format the NameNode

hdfs namenode -format

Seeing `common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted` in the output indicates the format succeeded.

Start up

start-all.sh

(*) HDFS: stores the data

(*) YARN: schedules cluster resources and runs the MapReduce jobs

Access

 

 

HDFS: http://192.168.56.102:50070

Yarn: http://192.168.56.102:8088

 


Open the HDFS management interface and the YARN resource management UI:

 


Basic operations:

HDFS-related commands
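The command screenshots are missing from this copy; below is a sketch of commonly used HDFS shell commands (the paths are just examples, and the commands assume the cluster started above is running):

```shell
hdfs dfs -mkdir -p /input          # create a directory in HDFS
hdfs dfs -put data.txt /input      # upload a local file
hdfs dfs -ls /input                # list a directory
hdfs dfs -cat /input/data.txt      # print a file's contents
hdfs dfs -get /input/data.txt .    # download a file to the local disk
hdfs dfs -rm -r /input             # delete recursively
```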

 

 

 


MapReduce example
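The example screenshots are missing; a typical first MapReduce run uses the wordcount example bundled with Hadoop (the exact jar path is an assumption, check `share/hadoop/mapreduce/` in your installation, and the input file name is just an example):

```shell
# Put some input into HDFS, run the bundled wordcount job, read the result
hdfs dfs -mkdir -p /input
hdfs dfs -put data.txt /input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar \
  wordcount /input /output
hdfs dfs -cat /output/part-r-00000
```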

 


Result:

 


The simple example above shows a MapReduce job running successfully.

Thoughts

Hadoop is Java-based, while my daily front-end development work uses PHP, so when errors occur they can be quite hard to track down. Outside of work we still need to broaden our knowledge of other languages: the language we develop in is a tool for learning, and it should not become a bottleneck limiting our technical growth.


Origin: blog.csdn.net/yyu000001/article/details/90578113