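What is Storm?
Storm is a distributed, fault-tolerant, real-time stream computation system open-sourced by Twitter, used for the real-time computation workloads of big data systems. Storm is a computation framework similar to Hadoop MapReduce. The biggest difference is that Storm runs as a service that never stops once started, unless it is explicitly killed, whereas a MapReduce job finishes on its own. Storm does not store data itself. In a typical Internet business scenario, Storm reads data from Kafka and, once the computation is done, writes the results to a storage or cache system such as Redis, MySQL, or HBase.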
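What can Storm do?
Storm has many application areas, including real-time analytics, online machine learning, stream processing (for example, using Storm to process new data and update databases quickly), continuous computation (for example, running a continuous query in Storm and returning the results to clients, such as pushing trending Weibo topics to users), distributed RPC (remote procedure call: requesting a service from a program on a remote computer over the network), and ETL (extraction, transformation, and loading of data).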
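Storm's features:
(1) A simplified programming model that lowers the difficulty of development
(2) Multi-language programming support
(3) High fault tolerance
(4) Horizontal scalability
(5) An ack mechanism that guarantees each message is fully processed at least once, reliably and quickly
(6) A local mode that makes development and debugging fast and convenient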
Storm's architecture model
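What is JStorm?
Storm was originally developed in Clojure. While using it, the Alibaba team ran into quite a few problems, so they rewrote all of Storm in Java to make it more stable, faster, and more powerful while staying compatible with the original Storm interfaces, hence the name JStorm. The Alibaba documentation states that a jar written for the original Storm runs efficiently and stably on a JStorm cluster without any code changes.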
How do you install JStorm?
Operating system:
CentOS 7
Three nodes:
192.168.10.38 zk1 jdk8 nimbus+ui+tomcat
192.168.10.39 zk2 jdk8 supervisor
192.168.10.40 zk3 jdk8 supervisor
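(1) Install the JDK on all three nodes (not covered in detail here).
(2) Install ZooKeeper on all three nodes (also not covered in detail; if you are unsure how, see my earlier articles):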
http://qindongliang.iteye.com/category/299318
Once ZooKeeper is installed, it must be started.
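Before moving on, it can help to confirm that every ZooKeeper node really is up. The sketch below is one way to do that: it reads the text printed by ZooKeeper's zkServer.sh status command and extracts the "Mode:" line. The helper name zk_mode is hypothetical, and the sample text piped into it is made up for illustration.

```shell
#!/bin/sh
# zk_mode is a hypothetical helper (not part of ZooKeeper or JStorm).
# It reads the output of "zkServer.sh status" on stdin and prints the
# server mode (leader, follower or standalone). If no "Mode:" line is
# present it prints "not running" and returns 1.
zk_mode() {
    mode=$(sed -n 's/^Mode: *//p' | head -n 1)
    if [ -n "$mode" ]; then
        echo "$mode"
    else
        echo "not running"
        return 1
    fi
}

# Demo with canned sample text so the sketch runs anywhere.
# On a real node you would run:  zkServer.sh status 2>&1 | zk_mode
printf 'Using config: /path/to/zoo.cfg\nMode: follower\n' | zk_mode
```

In a healthy three-node ensemble, one node should report leader and the other two follower.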
(3) Download the JStorm release archive from GitHub:
https://github.com/alibaba/jstorm/releases
(4) Unzip it into the target directory and configure the environment variables:
unzip jstorm-2.1.1.zip
Run vi .bashrc and add the following variables:
export PATH
export PATH=.:$PATH
#jdk
export JAVA_HOME=/home/search/jdk1.8.0_102/
export CLASSPATH=.:$JAVA_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
#jstorm
export JSTORM_HOME=/home/search/jstorm-2.1.1
export PATH=$PATH:$JSTORM_HOME/bin
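After editing .bashrc and opening a new shell (or running source ~/.bashrc), a quick sanity check can catch a mistyped path early. The following is a minimal sketch; check_dir_var is a hypothetical helper written for this article, not something shipped with JStorm.

```shell
#!/bin/sh
# check_dir_var is a hypothetical helper: given the NAME of an environment
# variable, it reports whether the variable is set and points at an
# existing directory.
check_dir_var() {
    name="$1"
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name: not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name: $value is not a directory"
        return 1
    fi
    echo "$name: ok ($value)"
}

check_dir_var JAVA_HOME || true
check_dir_var JSTORM_HOME || true

# The jstorm launcher should be reachable through PATH.
if command -v jstorm >/dev/null 2>&1; then
    echo "jstorm: found on PATH"
else
    echo "jstorm: not on PATH"
fi
```

Running it on each of the three nodes should print an "ok" line for both variables before you continue.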
(5) Download Tomcat
wget http://ftp.kddilabs.jp/infosystems/apache/tomcat/tomcat-8/v8.5.3/bin/apache-tomcat-8.5.3.tar.gz
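(6) Install the JStorm UI
# Copy the JStorm UI WAR into Tomcat's webapps directory
# (adjust the Tomcat directory name to match the version you actually downloaded and extracted)
cp /home/search/jstorm-2.1.1/jstorm-ui-2.1.1.war /home/search/apache-tomcat-8.5.4/webapps
cd /home/search/apache-tomcat-8.5.4/webapps
# Back up the old ROOT directory
mv ROOT ROOT.old
# Create a symlink; ls will show it as a broken (blinking) link for now, which is fine: it resolves once Tomcat starts and unpacks the WAR
ln -s jstorm-ui-2.1.1 ROOT
# Finally, go back to the Tomcat directory and start it
cd ..
bin/startup.sh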
(7) Install and configure JStorm
Edit the file jstorm-2.1.1/conf/storm.yaml (vi jstorm-2.1.1/conf/storm.yaml):
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
    - "192.168.10.38"
    - "192.168.10.39"
    - "192.168.10.40"

storm.zookeeper.root: "/jstorm"

# cluster.name: "default"

#nimbus.host/nimbus.host.start.supervisor is being used by $JSTORM_HOME/bin/start.sh
#it only support IP, please don't set hostname
# For example
# nimbus.host: "10.132.168.10, 10.132.168.45"
nimbus.host: "192.168.10.38"
#nimbus.host.start.supervisor: false

# %JSTORM_HOME% is the jstorm home directory
storm.local.dir: "%JSTORM_HOME%/data"
# please set absolute path, default path is JSTORM_HOME/logs
# jstorm.log.dir: "absolute path"

# java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib"

# if supervisor.slots.ports is null,
# the port list will be generated by cpu cores and system memory size
# for example,
# there are cpu_num = system_physical_cpu_num/supervisor.slots.port.cpu.weight
# there are mem_num = system_physical_memory_size/(worker.memory.size * supervisor.slots.port.mem.weight)
# The final port number is min(cpu_num, mem_num)
# supervisor.slots.ports.base: 6800
# supervisor.slots.port.cpu.weight: 1.2
# supervisor.slots.port.mem.weight: 0.7
# supervisor.slots.ports: null
# supervisor.slots.ports:
#    - 6800
#    - 6801
#    - 6802
#    - 6803

# Default disable user-define classloader
# If there are jar conflict between jstorm and application,
# please enable it
# topology.enable.classloader: false

# enable supervisor use cgroup to make resource isolation
# Before enable it, you should make sure:
#   1. Linux version (>= 2.6.18)
#   2. Have installed cgroup (check the file's existence:/proc/cgroups)
#   3. You should start your supervisor on root
# You can get more about cgroup:
#   http://t.cn/8s7nexU
# supervisor.enable.cgroup: false

### Netty will send multiple messages in one batch
### Setting true will improve throughput, but more latency
# storm.messaging.netty.transfer.async.batch: true

### if this setting is true, it will use disruptor as internal queue, which size is limited
### otherwise, it will use LinkedBlockingDeque as internal queue , which size is unlimited
### generally when this setting is true, the topology will be more stable,
### but when there is a data loop flow, for example A -> B -> C -> A
### and the data flow occur blocking, please set this as false
# topology.buffer.size.limited: true

### default worker memory size, unit is byte
# worker.memory.size: 2147483648

# Metrics Monitor
# topology.performance.metrics: it is the switch flag for performance
# purpose. When it is disabled, the data of timer and histogram metrics
# will not be collected.
# topology.alimonitor.metrics.post: If it is disable, metrics data
# will only be printed to log. If it is enabled, the metrics data will be
# posted to alimonitor besides printing to log.
# topology.performance.metrics: true
# topology.alimonitor.metrics.post: false

# UI MultiCluster
# Following is an example of multicluster UI configuration
# ui.clusters:
#     - {
#         name: "jstorm",
#         zkRoot: "/jstorm",
#         zkServers:
#             [ "localhost"],
#         zkPort: 2181,
#       }
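The comments in storm.yaml explain that when supervisor.slots.ports is null, the number of worker ports is derived from the CPU core count and the physical memory size. A small worked example makes the formula easier to follow. This is only a sketch: slot_count is a hypothetical helper, the default weights and worker memory size are simply the commented-out values from the file, and rounding each quotient down to an integer is my assumption, since the comments do not say how fractions are handled.

```shell
#!/bin/sh
# slot_count is a hypothetical helper that mirrors the formula in the
# storm.yaml comments; it is not part of JStorm.
# Usage: slot_count <cpu_cores> <memory_bytes> [cpu_weight] [mem_weight] [worker_mem_bytes]
slot_count() {
    awk -v cores="$1" -v mem="$2" \
        -v cw="${3:-1.2}" -v mw="${4:-0.7}" -v wm="${5:-2147483648}" '
        BEGIN {
            # cpu_num = cores / supervisor.slots.port.cpu.weight
            cpu_num = int(cores / cw)        # rounding down is an assumption
            # mem_num = memory / (worker.memory.size * supervisor.slots.port.mem.weight)
            mem_num = int(mem / (wm * mw))   # rounding down is an assumption
            # The final port number is min(cpu_num, mem_num)
            n = (cpu_num < mem_num) ? cpu_num : mem_num
            printf "%d\n", n
        }'
}

# Example: a supervisor with 8 cores and 16 GiB (17179869184 bytes) of memory.
slot_count 8 17179869184
```

With these inputs, cpu_num is 8 / 1.2 rounded down to 6 and mem_num is 16 GiB / (2 GiB * 0.7) rounded down to 11, so the sketch prints 6. Under these assumptions the CPU is the limiting resource on such a machine.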
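(8) Distribute the configured JStorm package and start the cluster
A: On the nimbus node, run nohup jstorm nimbus & to start nimbus, then check $JSTORM_HOME/logs/nimbus.log for errors.
B: On each supervisor node, run nohup jstorm supervisor & to start the supervisor, then check $JSTORM_HOME/logs/supervisor.log for errors.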
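The distribution step can be scripted. The sketch below copies the configured JStorm directory from the nimbus node to the two supervisor nodes and starts a supervisor on each. It assumes passwordless SSH between the nodes, the same user and directory layout everywhere, and that JAVA_HOME is available to non-interactive SSH sessions on the remote side. DRY_RUN and the run helper are hypothetical conveniences of this sketch: by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Hosts and paths come from the node list and .bashrc settings in this article.
SUPERVISORS="192.168.10.39 192.168.10.40"
JSTORM_DIR="/home/search/jstorm-2.1.1"
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

distribute_and_start() {
    for host in $SUPERVISORS; do
        # Copy the whole configured directory next to where it lives locally.
        run scp -r "$JSTORM_DIR" "$host:$(dirname "$JSTORM_DIR")/"
        # Start the supervisor in the background on the remote node.
        run ssh "$host" "nohup $JSTORM_DIR/bin/jstorm supervisor >/dev/null 2>&1 &"
    done
}

distribute_and_start
```

Review the printed commands first, then rerun with DRY_RUN=0 once they look right. The log check from step B still applies on each node afterwards.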
(9) Open port 8080 on the nimbus machine (ip:8080) in a browser to view the JStorm UI.
The installation is now complete!
(10) Common commands
Submit a topology:
jstorm jar xxxx.jar ClassName arg1 arg2 argN
Kill a topology:
jstorm kill topologyName
References:
http://storm.apache.org/
https://github.com/alibaba/jstorm/wiki/JStorm-Chinese-Documentation