JStorm 2.1.1 Cluster Installation

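What is Storm?

Storm is a distributed, fault-tolerant, real-time stream computation system open-sourced by Twitter, used for the real-time computation workloads of big data systems. Storm is a computation framework comparable to Hadoop MapReduce. The biggest difference is that Storm runs as a service that never stops once started unless it is explicitly killed, whereas a MapReduce job finishes on its own. Storm does not store data itself. In typical internet business scenarios, Storm reads data from Kafka, performs the computation, and writes the results to a storage or cache system such as Redis, MySQL, or HBase.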

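What can Storm do?

Storm has many application areas, including real-time analytics, online machine learning, stream processing (for example, using Storm to process new data and update databases quickly), continuous computation (for example, running a continuous query with Storm and returning the results to clients, such as pushing trending Weibo topics to users), distributed RPC (remote procedure calls, requesting a service from a program on a remote computer over the network), and ETL (Extraction, Transformation, Loading).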

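Features of Storm:

(1) Simplifies the programming model and lowers the difficulty of development
(2) Supports programming in multiple languages
(3) High fault tolerance
(4) Horizontally scalable
(5) Has an ack mechanism that guarantees each message is fully processed at least once, reliably and quickly
(6) Supports a local mode for quick development and debugging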

Storm's architecture model

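What is JStorm?

Storm was originally written primarily in Clojure. While using it, the Alibaba team ran into quite a few problems, so they rewrote all of Storm in Java to make it more stable, faster, and more powerful while staying compatible with the original Storm interfaces; hence the name JStorm. According to Alibaba's documentation, a jar written for the original Storm can run efficiently and stably on a JStorm cluster without any code changes.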

How to install JStorm?

Operating system:
CentOS 7
Three nodes:
192.168.10.38 zk1   jdk8 nimbus+ui+tomcat
192.168.10.39 zk2   jdk8 supervisor
192.168.10.40 zk3   jdk8 supervisor
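
The hostnames zk1 to zk3 are only useful if every node can resolve them. The following is a minimal sketch, assuming name resolution is handled through /etc/hosts (the original setup does not say how it was done); print_cluster_hosts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print /etc/hosts entries for the three nodes listed above.
print_cluster_hosts() {
  cat <<'EOF'
192.168.10.38 zk1
192.168.10.39 zk2
192.168.10.40 zk3
EOF
}

# Preview the entries; nothing on the system is modified here.
print_cluster_hosts
```

To apply the entries, the output could be appended to /etc/hosts as root on each node, for example with print_cluster_hosts | sudo tee -a /etc/hosts.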

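(1) Install the JDK on all 3 nodes; the details are omitted here.
(2) Install ZooKeeper on all 3 nodes; the details are omitted here. If you are unsure how, see my earlier articles:
http://qindongliang.iteye.com/category/299318
Once ZooKeeper is installed, it must be started.
(3) Download the JStorm archive from GitHub:
https://github.com/alibaba/jstorm/releases

(4) Unzip it into the target directory and configure the environment variables:
unzip jstorm-2.1.1.zip
Open .bashrc with vi and add the following variables: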

export PATH
export PATH=.:$PATH
#jdk
export JAVA_HOME=/home/search/jdk1.8.0_102/
export CLASSPATH=.:$JAVA_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
#jstorm
export JSTORM_HOME=/home/search/jstorm-2.1.1
export PATH=$PATH:$JSTORM_HOME/bin
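
After saving .bashrc, the new variables only take effect in a fresh shell or after running source ~/.bashrc. A small sketch for confirming that the JStorm bin directory actually ended up on PATH; path_contains is a hypothetical helper, and the fallback path is the one used in this article.

```shell
#!/usr/bin/env bash
# Return 0 if directory $1 is one of the colon-separated entries of $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fall back to the install path used in this article if JSTORM_HOME is unset.
JSTORM_HOME="${JSTORM_HOME:-/home/search/jstorm-2.1.1}"
if path_contains "$JSTORM_HOME/bin"; then
  echo "jstorm bin directory is on PATH"
else
  echo "jstorm bin directory is NOT on PATH; try: source ~/.bashrc"
fi
```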

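(5) Download Tomcat and extract it
wget http://ftp.kddilabs.jp/infosystems/apache/tomcat/tomcat-8/v8.5.3/bin/apache-tomcat-8.5.3.tar.gz
tar -zxvf apache-tomcat-8.5.3.tar.gz
(6) Install the JStorm UI

# Copy the jstorm-ui war into Tomcat's webapps directory
cp /home/search/jstorm-2.1.1/jstorm-ui-2.1.1.war /home/search/apache-tomcat-8.5.3/webapps
cd /home/search/apache-tomcat-8.5.3/webapps
# Back up the old ROOT directory
mv ROOT ROOT.old
# Create a symlink. It will blink in the terminal listing at this point because its target
# does not exist yet; don't worry, it resolves once Tomcat starts and unpacks the war
ln -s jstorm-ui-2.1.1 ROOT
# Finally, go back to the Tomcat directory and start it
cd /home/search/apache-tomcat-8.5.3
bin/startup.sh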
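
The blinking ROOT entry is simply a symlink whose target does not exist yet. A minimal sketch of how that state can be checked; link_status is a hypothetical helper, and the demo runs in a temporary directory that stands in for webapps rather than touching a real Tomcat install.

```shell
#!/usr/bin/env bash
# Print "ok" if $1 is a symlink whose target exists,
# "dangling" if it is a symlink whose target is missing,
# and "not-a-link" otherwise.
link_status() {
  if [ ! -L "$1" ]; then
    echo "not-a-link"
  elif [ -e "$1" ]; then
    echo "ok"
  else
    echo "dangling"
  fi
}

# Demo in a throwaway directory that mimics webapps/.
demo_dir="$(mktemp -d)"
ln -s jstorm-ui-2.1.1 "$demo_dir/ROOT"
link_status "$demo_dir/ROOT"        # prints "dangling": the war is not unpacked yet
mkdir "$demo_dir/jstorm-ui-2.1.1"   # stands in for Tomcat unpacking the war
link_status "$demo_dir/ROOT"        # prints "ok"
rm -rf "$demo_dir"
```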

(7) Install and configure JStorm
Edit the jstorm-2.1.1/conf/storm.yaml file with vi:

########### These MUST be filled in for a storm configuration
 storm.zookeeper.servers:  
              - "192.168.10.38"
              - "192.168.10.39"
              - "192.168.10.40"

 storm.zookeeper.root: "/jstorm"

# cluster.name: "default"

 #nimbus.host/nimbus.host.start.supervisor is being used by $JSTORM_HOME/bin/start.sh
 #it only support IP, please don't set hostname
 # For example
 # nimbus.host: "10.132.168.10, 10.132.168.45"
 nimbus.host: "192.168.10.38"
 #nimbus.host.start.supervisor: false
 
# %JSTORM_HOME% is the jstorm home directory
 storm.local.dir: "%JSTORM_HOME%/data"
 # please set absolute path, default path is JSTORM_HOME/logs
# jstorm.log.dir: "absolute path"
 
# java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib"



# if supervisor.slots.ports is null, 
# the port list will be generated by cpu cores and system memory size 
# for example, 
# there are cpu_num = system_physical_cpu_num/supervisor.slots.port.cpu.weight
# there are mem_num = system_physical_memory_size/(worker.memory.size * supervisor.slots.port.mem.weight) 
# The final port number is min(cpu_num, mem_num)
# supervisor.slots.ports.base: 6800
# supervisor.slots.port.cpu.weight: 1.2
# supervisor.slots.port.mem.weight: 0.7
# supervisor.slots.ports: null
# supervisor.slots.ports:
#    - 6800
#    - 6801
#    - 6802
#    - 6803

# Default disable user-define classloader
# If there are jar conflict between jstorm and application, 
# please enable it 
# topology.enable.classloader: false

# enable supervisor use cgroup to make resource isolation
# Before enable it, you should make sure:
#       1. Linux version (>= 2.6.18)
#       2. Have installed cgroup (check the file's existence:/proc/cgroups)
#       3. You should start your supervisor on root
# You can get more about cgroup:
#   http://t.cn/8s7nexU
# supervisor.enable.cgroup: false


### Netty will send multiple messages in one batch  
### Setting true will improve throughput, but more latency
# storm.messaging.netty.transfer.async.batch: true

### if this setting  is true, it will use disruptor as internal queue, which size is limited
### otherwise, it will use LinkedBlockingDeque as internal queue , which size is unlimited
### generally when this setting is true, the topology will be more stable,
### but when there is a data loop flow, for example A -> B -> C -> A
### and the data flow occur blocking, please set this as false
# topology.buffer.size.limited: true
 
### default worker memory size, unit is byte
# worker.memory.size: 2147483648

# Metrics Monitor
# topology.performance.metrics: it is the switch flag for performance 
# purpose. When it is disabled, the data of timer and histogram metrics 
# will not be collected.
# topology.alimonitor.metrics.post: If it is disable, metrics data
# will only be printed to log. If it is enabled, the metrics data will be
# posted to alimonitor besides printing to log.
# topology.performance.metrics: true
# topology.alimonitor.metrics.post: false

# UI MultiCluster
# Following is an example of multicluster UI configuration
# ui.clusters:
#     - {
#         name: "jstorm",
#         zkRoot: "/jstorm",
#         zkServers:
#             [ "localhost"],
#         zkPort: 2181,
#       }
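
The comments in storm.yaml describe how the number of worker ports is derived when supervisor.slots.ports is left unset. The arithmetic can be sketched as follows. This only mirrors the formula in the comments; it is not JStorm's own code, estimate_slots is a hypothetical name, and truncating fractions to integers is an assumption.

```shell
#!/usr/bin/env bash
# Estimate the number of worker ports from the formula in the storm.yaml comments.
# Arguments: cpu cores, physical memory in bytes, worker.memory.size in bytes,
#            supervisor.slots.port.cpu.weight, supervisor.slots.port.mem.weight
# Assumption: fractional results are truncated to integers.
estimate_slots() {
  awk -v cpu="$1" -v mem="$2" -v worker="$3" -v cw="$4" -v mw="$5" 'BEGIN {
    cpu_num = int(cpu / cw)
    mem_num = int(mem / (worker * mw))
    print (cpu_num < mem_num ? cpu_num : mem_num)
  }'
}

# Hypothetical machine: 8 cores and 16 GiB of RAM, with the weights and
# worker memory size shown in the commented-out settings above.
estimate_slots 8 17179869184 2147483648 1.2 0.7   # prints 6
```

In this example the CPU side is the limit: 8 / 1.2 gives 6 after truncation, while the memory side gives 16 GiB / (2 GiB * 0.7), which truncates to 11.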

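(8) Distribute the configured JStorm package to the other nodes and start the cluster

A: On the nimbus node, run nohup jstorm nimbus & to start nimbus, then check $JSTORM_HOME/logs/nimbus.log for errors
B: On each supervisor node, run "nohup jstorm supervisor &", then check $JSTORM_HOME/logs/supervisor.log for errors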
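
Checking the two log files by eye gets tedious on several nodes. A minimal sketch for counting error lines, assuming the daemons write the literal level name "ERROR" into their logs (not verified against JStorm's log format); count_errors is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the number of lines in log file $1 that contain "ERROR".
# Assumption: errors are logged with the literal level name "ERROR".
count_errors() {
  grep -c 'ERROR' "$1"
}

# Fall back to the install path used in this article if JSTORM_HOME is unset.
JSTORM_HOME="${JSTORM_HOME:-/home/search/jstorm-2.1.1}"

# Report on whichever daemon logs exist on this node.
for log in "$JSTORM_HOME/logs/nimbus.log" "$JSTORM_HOME/logs/supervisor.log"; do
  if [ -f "$log" ]; then
    echo "$log: $(count_errors "$log") ERROR line(s)"
  fi
done
```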

(9) Open ip:8080 of the nimbus machine in a browser to view the JStorm UI.

The installation is now complete!

(10) Common commands

Submit a topology:
jstorm jar xxxx.jar ClassName arg1 arg2 argN
Kill a topology:
jstorm kill topologyName

References:
http://storm.apache.org/
https://github.com/alibaba/jstorm/wiki/JStorm-Chinese-Documentation

