Big Data Learning (33): Storm Real-Time Computing, Overview and Cluster Installation

One: Storm Overview

 

URL: http://storm.apache.org/

 

Apache Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

 

Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees that your data will be processed, and is easy to set up and operate.

 

Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes them in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. Read more in the tutorial.

 

What is offline computing?

 

Batch data acquisition, batch data transfer, batch data storage, periodic batch computation, data visualization

Flume collects batch data, Sqoop transfers it in bulk, HDFS / Hive / HBase store it, MapReduce / Hive compute over it, and BI tools visualize the results

 

 

What is real-time computing?

 

Real-time data generation, real-time data transmission, real-time computation, real-time display

Flume collects data in real time, Kafka stores it in real time, Storm / JStorm compute over it in real time, and the results are displayed in real time (DataV / Quick BI)

 

Two: Storm and Hadoop

  

 

Category | Hadoop | Storm
Role: master node | JobTracker | Nimbus
Role: worker node | TaskTracker | Supervisor
Role: task process | Child | Worker
Application name | Job | Topology
Programming interface | Mapper/Reducer | Spout/Bolt

 

Three: Storm programming model

Tuple

The basic unit of message transmission.

 

Spout (literally "faucet")

A core Storm abstraction: the source of the streams in a topology. A spout typically reads data from an external data source and converts it into tuples inside the topology.

 

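Main methods: nextTuple(): emits a new tuple into the topology.

      ack(): called when a tuple emitted by this spout has been fully processed.

      fail(): called when a tuple emitted by this spout failed to be fully processed (for example, it timed out).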

 

Bolt (literally "adapter")

A bolt is a processing node for streams. Bolts perform filtering, business logic, and join operations.

 

Topology

A real-time application.

It runs forever unless it is killed.

The connection from a spout to a bolt is a stream...
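To make the spout → bolt → topology wiring concrete without a Storm dependency, here is a rough analogy built from Unix pipes, using the classic word-count example. It is only an analogy, not the Storm API, and the function names are hypothetical. It mirrors the dataflow shape (a source emitting tuples, processing nodes transforming them, streams connecting them) but leaves out what makes Storm Storm: parallel execution across workers, ack/fail tracking, and a topology that runs until it is killed.

```shell
#!/usr/bin/env bash
# Analogy only: a Unix pipeline standing in for a Storm topology. This is
# NOT the Storm API; real spouts and bolts are Java (or multi-lang)
# components that Storm runs in parallel across workers, indefinitely.

# "Spout": reads from an external source (stdin here) and emits one
# tuple (one line) per record into the stream.
sentence_spout() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line"
  done
}

# "Bolt" 1: transformation - split each sentence into one word per line,
# dropping empty lines.
split_bolt() {
  tr -s '[:space:]' '\n' | grep -v '^$'
}

# "Bolt" 2: aggregation - emit "word count" pairs, sorted by word.
count_bolt() {
  LC_ALL=C sort | uniq -c | awk '{print $2, $1}'
}

# "Topology": wires spout -> bolt -> bolt; each pipe plays the role of a stream.
# Unlike a real topology, this pipeline ends when its input ends.
word_count_topology() {
  sentence_spout | split_bolt | count_bolt
}
```

For example, piping the two lines "the cat" and "the dog" into word_count_topology prints "cat 1", "dog 1" and "the 2" on separate lines.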

 

Storm: stream computing

Compatibility between Hadoop and Storm

 

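Side chat: ....

spark-core

spark-sql: offline computing

spark-streaming: stream computing

Developed by one team, so there are no compatibility problems

The Spark team: "I want to build a one-stack development platform!

Whatever big data computation comes up, I can handle it!"

Spark replaced MapReduce

Spark has no storage layer of its own

It depends on HDFS

hdfs/mr............

Round out the whole ecosystem!

MapReduce ideas and programming; sqoop->mr, hive->mr, hbase->mr

dfs/mapreduce/bigtable

java/scala...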

 

Four: Storm Cluster Installation and Deployment

1) Preparation

ZooKeeper cluster: zk01 zk02 zk03

Storm cluster: storm01 storm02 storm03

 

2) Download the installation package

http://storm.apache.org/downloads.html

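3) Upload it to the server

4) Extract it

5) Modify the configuration (storm.yaml is in the conf directory of the Storm installation)

Set the environment variables in ~/.bash_profile (add Storm's bin directory to PATH so the storm command can be run from anywhere)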

$ vi storm.yaml

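# Set the hostnames of the ZooKeeper servers
storm.zookeeper.servers:
    - "bigdata11"
    - "bigdata12"
    - "bigdata13"

# Set the hostname of the master (Nimbus) node
nimbus.seeds: ["bigdata11"]

# Set Storm's local data directory (create it yourself beforehand)
storm.local.dir: "/root/training/storm/data"

# Set the Worker ports (each port is one worker slot on this node)
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703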

Distribute the configured Storm installation to bigdata12 and bigdata13; ~/.bash_profile must be distributed as well.

 

6) Start Nimbus (on bigdata11, the host listed in nimbus.seeds)

$ storm nimbus &

 

7) Start the Supervisor (on each node that should run workers)

$ storm supervisor &

 

8) Start the UI (web interface on port 8080)

$ storm ui 
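The installation steps above can be collected into a small Bash sketch. It assumes passwordless SSH as root from bigdata11 to the other nodes, and that the storm command is on PATH once ~/.bash_profile is sourced. Every function name and the install path you pass in are hypothetical, not part of Storm. The status check relies on the UI's REST endpoint /api/v1/cluster/summary on port 8080.

```shell
#!/usr/bin/env bash
# Sketch of steps 5-8 as reusable functions. Assumptions (not from the
# original post): passwordless SSH as root between nodes, and the install
# directory is passed in by the caller. All function names are hypothetical.

# Step 5: print the lines to append to ~/.bash_profile.
storm_profile_lines() {
  local storm_home="$1"
  printf 'export STORM_HOME=%s\n' "$storm_home"
  printf '%s\n' 'export PATH=$STORM_HOME/bin:$PATH'
}

# Step 5: write a storm.yaml equivalent to the one shown above.
# Usage: write_storm_yaml FILE NIMBUS_HOST LOCAL_DIR ZK_HOST...
write_storm_yaml() {
  local file="$1" nimbus="$2" local_dir="$3" zk port
  shift 3
  {
    echo "storm.zookeeper.servers:"
    for zk in "$@"; do
      echo "    - \"$zk\""
    done
    echo "nimbus.seeds: [\"$nimbus\"]"
    echo "storm.local.dir: \"$local_dir\""
    echo "supervisor.slots.ports:"
    for port in 6700 6701 6702 6703; do
      echo "    - $port"
    done
  } > "$file"
}

# Step 5: copy the installation and ~/.bash_profile to the other nodes,
# and create storm.local.dir there (it must exist beforehand).
# Usage: distribute_storm STORM_HOME LOCAL_DIR HOST...
distribute_storm() {
  local storm_home="$1" local_dir="$2" host
  shift 2
  for host in "$@"; do
    scp -r "$storm_home" "root@$host:$(dirname "$storm_home")/" || return 1
    scp ~/.bash_profile "root@$host:" || return 1
    ssh "root@$host" "mkdir -p '$local_dir'" || return 1
  done
}

# Steps 6-8: start a daemon in the background, logging to storm-<name>.out.
# Here nimbus and ui go on bigdata11; supervisor on each worker node.
start_storm_daemon() {
  case "$1" in
    nimbus | supervisor | ui)
      nohup storm "$1" > "storm-$1.out" 2>&1 &
      ;;
    *)
      echo "unknown daemon: $1" >&2
      return 1
      ;;
  esac
}

# Step 8: query the UI's REST API on port 8080 to confirm the cluster is up.
check_storm_ui() {
  curl -fsS "http://$1:8080/api/v1/cluster/summary"
}
```

On bigdata11 you would create the local data directory, call write_storm_yaml with $STORM_HOME/conf/storm.yaml as the output file, run distribute_storm for bigdata12 and bigdata13, and then start the daemons on each node.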

 

Storm command-line operations

1) Show command help

storm help

 

2) Show the version

storm version

 

3) Run a Storm program

storm jar [/path/to/file.jar] [fully qualified class name] [topology name]

 

4) List the currently running topologies and their status

storm list

 

5) Kill a topology

storm kill [拓扑名称]

 

6) Activate a topology

storm activate [拓扑名称]

 

7) Deactivate a topology (its spouts stop emitting tuples)

storm deactivate [拓扑名称]
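These commands are often combined when redeploying a new version of a topology under the same name. Below is a hedged Bash sketch built only from storm list, storm kill (with its -w wait flag) and storm jar. The function names are hypothetical; it assumes the default table output of storm list (topology name in the first column of each row), and it assumes the main class reads the topology name from its first argument, as in the storm jar usage above.

```shell
#!/usr/bin/env bash
# Hypothetical helper built only from the commands listed above:
# storm list, storm kill (with -w) and storm jar.

# True if NAME appears as the first column of a row in `storm list`.
# Assumes the default table output (one topology per row, name first).
topology_running() {
  storm list 2>/dev/null | awk '{print $1}' | grep -qxF -- "$1"
}

# Usage: redeploy_topology JAR MAIN_CLASS TOPOLOGY_NAME [WAIT_SECS]
redeploy_topology() {
  local jar="$1" main_class="$2" name="$3" wait_secs="${4:-10}" tries=0
  if topology_running "$name"; then
    # -w sets how long Storm waits between deactivating the spouts
    # and shutting down the workers.
    storm kill "$name" -w "$wait_secs" || return 1
    # The name stays taken until the kill completes, so poll before resubmitting.
    while topology_running "$name"; do
      tries=$((tries + 1))
      if [ "$tries" -gt 60 ]; then
        echo "timed out waiting for $name to be removed" >&2
        return 1
      fi
      sleep 2
    done
  fi
  # Arguments after the class name are handed to its main(); this assumes
  # main() takes the topology name as its first argument.
  storm jar "$jar" "$main_class" "$name"
}
```

The polling loop is there because a killed topology keeps its name until Storm has finished shutting it down, so submitting again immediately can be rejected.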


Source: www.cnblogs.com/hidamowang/p/10981271.html