Big Data Technology Principles and Applications: [Lecture] Hadoop big data processing architecture

2.1 Introduction to Hadoop

Founder: Doug Cutting

 

1. Introduction:

 

Free and open source;

Simple to operate, which greatly reduces the complexity of use;

Hadoop is developed in Java;

Applications developed on Hadoop can be written in multiple programming languages, not only Java;

 

Hadoop's two core components: HDFS + MapReduce

HDFS: massive data storage

MapReduce: massive data processing
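
As a rough illustration of the MapReduce processing model (not Hadoop's actual API), the following pure-Python sketch counts words in three steps: map, shuffle, and reduce. The function names and the sample input are assumptions made up for this example; on a real cluster the framework would run these steps in parallel over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

The point of the split is that each map call only sees one line and each reduce call only sees one key, which is what lets the work be spread across many machines.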

 

2. Origin:

It grew out of Apache Lucene, a text search library, and its subproject Nutch, a search engine modeled on Google's;

It incorporated ideas from two Google technologies: the distributed file system GFS and the distributed parallel programming framework MapReduce;

 

3. Rise to fame: outstanding results in large-scale data sorting

 

4. Characteristics:

1. High reliability

2. High efficiency

3. High scalability

4. High fault tolerance

5. Low cost

6. Runs on the Linux platform

7. Supports a variety of programming languages

 

5. Application Status:

For example: Facebook

 

 

 

2.2 Hadoop project structure

 HDFS: distributed file storage

MapReduce: data processing, disk-based

Spark (performance about an order of magnitude higher than MapReduce): data processing, memory-based

Hive: data warehouse; decision-support analysis; supports SQL statements (SQL statements are translated into MapReduce jobs for execution);

Pig: data-flow processing, lightweight; provides Pig Latin, an SQL-like query language;

Oozie: workflow scheduling system

ZooKeeper: distributed coordination service; distributed locks; cluster management;

HBase: column-family database, supports random access

Flume: log collection

Sqoop: data import and export; transfers data between relational databases and HDFS, HBase, and Hive

Ambari: rapid deployment tool
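
To make the HBase entry in the list above more concrete, here is a toy in-memory model of a column-family table, written in Python. It is only a sketch of the data layout (row key, then column family, then column qualifier) and is not the HBase client API; the class name, method names, and sample data are all hypothetical.

```python
from collections import defaultdict


class ToyColumnFamilyTable:
    """Toy in-memory model of a column-family table (not the HBase API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # Layout: row key -> column family -> qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        """Store one cell; qualifiers can be added freely within a family."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier):
        """Random access: look up one cell directly by its coordinates."""
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)


if __name__ == "__main__":
    table = ToyColumnFamilyTable(["info", "stats"])
    table.put("user1", "info", "name", "Ann")
    table.put("user1", "stats", "logins", 3)
    print(table.get("user1", "info", "name"))
```

A lookup goes straight to one cell by row key, family, and qualifier, which is the "random access" property the notes mention, in contrast to scanning whole files in HDFS.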

 

2.3 Installing and using Hadoop

1. Choosing Linux:

Linux distribution: Ubuntu

32-bit or 64-bit: depends on the computer. With more than 4 GB of memory, choose the 64-bit version

2. Install in a virtual machine or as a dual-boot system:

Depends on the computer's configuration

On a relatively new computer, install a virtual machine

3. Linux basics:

1. Shell: the command interpreter

2. sudo command: a privilege management mechanism; an administrator can authorize ordinary users to perform certain operations that require root privileges

3. Entering the password: the password is not displayed as you type it

4. Switching between Chinese and English input methods: "shift" key

5. Paste shortcut in the Ubuntu terminal: ctrl + shift + V

4. Installation:

Standalone mode, pseudo-distributed mode, distributed mode

 

5. Creating a virtual machine:

1. Materials and tools: virtual machine software and a system image file

2. Confirm the system version:

 

2.4 Deploying and using a Hadoop cluster

Consider HDFS and MapReduce

 

(To be added later)

 

MOOC link: https://www.icourse163.org/learn/XMU-1002335004?tid=1003965001#/learn/content

Origin www.cnblogs.com/musecho/p/10991177.html