New Features in Storm 1.0

This release represents a major milestone in the evolution of Apache Storm, and includes an immense number of new features, usability and performance improvements, some of which are highlighted below.

Note: the latest release at the time of writing is 1.0.2, but method signatures and package names have changed considerably relative to earlier versions, so upgrading takes some care. For example, a method that previously took a byte[] parameter now takes a java.nio.ByteBuffer.

Improved Performance

One of the main highlights in this release is a dramatic performance improvement over previous versions. Apache Storm 1.0 is up to 16 times faster than previous versions, with latency reduced by up to 60%. Obviously topology performance varies widely by use case and external service dependencies, but for most use cases users can expect a 3x performance boost over earlier versions.


Pacemaker - Heartbeat Server

Pacemaker is an optional Storm daemon designed to process heartbeats from workers. As Storm is scaled up, ZooKeeper begins to become a bottleneck due to high volumes of writes from workers doing heartbeats. Lots of writes to disk and large amounts of network traffic are generated as ZooKeeper tries to maintain consistency.

Because heartbeats are of an ephemeral nature, they do not need to be persisted to disk or synced across nodes, and an in-memory store will do. This is the role of Pacemaker. Pacemaker functions as a simple in-memory key/value store with ZooKeeper-like, directory-style keys and byte array values.
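
A minimal storm.yaml sketch for opting in, assuming the configuration keys described in the Storm 1.0 Pacemaker documentation (the hostname below is a placeholder):

    # Where the Pacemaker daemon runs (placeholder host)
    pacemaker.host: "pacemaker.example.com"
    pacemaker.port: 6699

    # Route worker heartbeats to Pacemaker instead of ZooKeeper
    storm.cluster.state.store: "org.apache.storm.pacemaker.pacemaker_state_factory"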

Distributed Cache API

In the past it was common for developers to bundle resources required by a topology (such as lookup data, machine learning models, etc.) within a topology jar file. One problem with this approach is that updating that data required the repackaging and redeployment of the topology. Another problem is that at times that data can be very large (gigabytes or more), which negatively impacts topology startup time.

Storm version 1.0 introduces a distributed cache API that allows for the sharing of files (BLOBs) among topologies. Files in the distributed cache can be updated at any time from the command line, without the need to redeploy a topology. The distributed cache API allows for files from several KB in size to several GB, and also supports compression formats such as ZIP and GZIP.

Storm 1.0 comes with two implementations of the distributed cache API: One backed by the local file system on Supervisor nodes, and one backed by Apache Hadoop HDFS. Both implementations also support fine-grained access control through ACLs.
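
As an illustrative sketch, a topology can map blob keys to local file names through its configuration; the topology.blobstore.map structure follows the Storm 1.0 blobstore documentation, while the key "lookup-data" and local name "lookup.db" are hypothetical (the blob itself would be created beforehand with the storm blobstore command line):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.storm.Config;

    Map<String, Object> blobSettings = new HashMap<String, Object>();
    blobSettings.put("localname", "lookup.db"); // symlink name the topology will see
    blobSettings.put("uncompress", false);      // set true for ZIP/GZIP blobs

    Map<String, Object> blobstoreMap = new HashMap<String, Object>();
    blobstoreMap.put("lookup-data", blobSettings); // hypothetical blob key

    Config conf = new Config();
    conf.put("topology.blobstore.map", blobstoreMap);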

HA Nimbus

Experienced Storm users will recognize that the Storm Nimbus service is not a single point of failure in the strictest sense (i.e. loss of the Nimbus node will not affect running topologies). However, the loss of the Nimbus node does degrade functionality for deploying new topologies and reassigning work across a cluster.

In Storm 1.0 this “soft” point of failure has been eliminated by supporting an HA Nimbus. Multiple instances of the Nimbus service run in a cluster and perform leader election when a Nimbus node fails, and Nimbus hosts can join or leave the cluster at any time. HA Nimbus leverages the distributed cache API for replication to guarantee the availability of topology resources in the event of a Nimbus node failure.
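
A minimal storm.yaml sketch, assuming the nimbus.seeds key that Storm 1.0 uses to list candidate Nimbus hosts (the hostnames are placeholders):

    # All hosts that may run Nimbus and take part in leader election
    nimbus.seeds: ["nimbus1.example.com", "nimbus2.example.com"]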


Native Streaming Window API

Window based computations are common among use cases in stream processing, where the unbounded stream of data is split into finite sets based on some criteria (e.g. time) and a computation is applied on each group of events. One example would be to compute the top trending Twitter topic in the last hour.

Windowing is primarily used for aggregations, joins, pattern matching and more. Windows can be seen as an in-memory table where events are added and evicted based on some policies.

In past releases Storm relied on developers to build their own windowing logic. There were no recommended or high level abstractions that developers could use to define a Window in a standard way in a Topology.

Apache Storm 1.0 now includes a native windowing API. Windows can be specified with the following two parameters:

Window length - the length or duration of the window

Sliding interval - the interval at which the window slides

Storm has support for sliding and tumbling windows based on time duration and/or event count.
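
As a sketch, a windowed bolt extends BaseWindowedBolt and receives a whole TupleWindow per invocation; the class and the "value" field below are hypothetical:

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseWindowedBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.windowing.TupleWindow;

    // Hypothetical bolt that sums a numeric field over each window.
    public class WindowSumBolt extends BaseWindowedBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(TupleWindow window) {
            long sum = 0;
            for (Tuple tuple : window.get()) {        // all tuples currently in the window
                sum += tuple.getLongByField("value"); // "value" is a hypothetical field
            }
            collector.emit(new Values(sum));          // one aggregate per window slide
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("windowSum"));
        }
    }

Wiring it with a sliding window of the last 30 tuples that advances every 10 tuples might then look like:

    builder.setBolt("window-sum",
            new WindowSumBolt().withWindow(new BaseWindowedBolt.Count(30),
                                           new BaseWindowedBolt.Count(10)), 1)
           .shuffleGrouping("spout"); // "spout" is a hypothetical upstream component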

State Management - Stateful Bolts with Automatic Checkpointing

Storm 1.0 introduces a new Stateful Bolt API with automatic checkpointing. Stateful Bolts are easy to implement -- simply extend the BaseStatefulBolt class -- and can be combined with stateless bolts in a topology. Storm will automatically manage bolt state and recover that state in the event of a failure.

Storm 1.0 comes with state implementations backed by memory as well as Redis. Future point releases will include additional support for alternative state stores.
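
A minimal sketch of a stateful word-count bolt, following the BaseStatefulBolt pattern (the class name and the "word" field are hypothetical):

    import java.util.Map;
    import org.apache.storm.state.KeyValueState;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseStatefulBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Hypothetical stateful bolt: Storm checkpoints 'counts' automatically.
    public class StatefulWordCountBolt extends BaseStatefulBolt<KeyValueState<String, Long>> {
        private KeyValueState<String, Long> counts;
        private OutputCollector collector;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void initState(KeyValueState<String, Long> state) {
            this.counts = state; // restored from the last checkpoint after a failure
        }

        @Override
        public void execute(Tuple tuple) {
            String word = tuple.getStringByField("word"); // hypothetical field
            long count = counts.get(word, 0L) + 1;
            counts.put(word, count);
            collector.emit(tuple, new Values(word, count));
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }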

Automatic Backpressure

In previous Storm versions, the only way to throttle the input to a topology was to enable ACKing and set topology.max.spout.pending. For use cases that don't require at-least-once processing guarantees, this requirement imposed a significant performance penalty.

Storm 1.0 includes a new automatic backpressure mechanism based on configurable high/low watermarks expressed as a percentage of a task's buffer size. If the high water mark is reached, Storm will slow down the topology's spouts and stop throttling when the low water mark is reached.

Storm's backpressure mechanism is implemented independently of the Spout API, so all existing Spouts are supported.
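
A sketch of the relevant topology settings; the key names below are taken from the Storm 1.0 defaults, so treat them as an assumption and verify them against your version:

    import org.apache.storm.Config;

    Config conf = new Config();
    // Enable automatic backpressure for this topology.
    conf.put("topology.backpressure.enable", true);
    // Throttle spouts once a task's receive buffer is 90% full...
    conf.put("backpressure.disruptor.high.watermark", 0.9);
    // ...and stop throttling when it drains below 40%.
    conf.put("backpressure.disruptor.low.watermark", 0.4);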


Note (a production case): our production environment ran version 0.9.6, and we noticed the workers consuming a great deal of memory. We dumped the JVM heap with jmap and analyzed it with the MAT tool, which showed that in Storm 0.9.6 memory was dominated by DisruptorQueue; the DisruptorQueue is in turn backed by a ConcurrentLinkedQueue, so when messages arrive faster than they are processed, they pile up in the queue.

Resource Aware Scheduler

Based on Storm's pluggable topology scheduler API, Storm 1.0 adds a new scheduler implementation that takes into account both the memory (on-heap and off-heap) and CPU resources available in a cluster. The resource aware scheduler (AKA "RAS Scheduler") allows users to specify the memory and CPU requirements for individual topology components (Spouts/Bolts), and Storm will schedule topology tasks among workers to best meet those requirements.
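
For example, requirements can be declared per component when building the topology; the setMemoryLoad and setCPULoad hooks are part of the Storm 1.0 RAS API, while the spout and bolt classes named here are hypothetical:

    import org.apache.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("events", new EventSpout(), 2)  // EventSpout is hypothetical
           .setMemoryLoad(512.0, 256.0)              // 512 MB on-heap, 256 MB off-heap per task
           .setCPULoad(50.0);                        // 50% of one core per task
    builder.setBolt("enrich", new EnrichBolt(), 4)   // EnrichBolt is hypothetical
           .setMemoryLoad(1024.0)                    // on-heap only
           .setCPULoad(100.0)
           .shuffleGrouping("events");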

In the future, the Storm community plans to extend the RAS implementation to support network resources and rack awareness as well.

Dynamic Log Levels

Storm 1.0 now allows users and administrators to dynamically change the log level settings of a running topology, both from the Storm UI and from the command line. Users can also specify an optional timeout after which those changes will be automatically reverted. The resulting log files are also easily searchable from the Storm UI and logviewer service.
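
From the command line, a sketch using the set_log_level subcommand added in Storm 1.0 (the topology name and logger are placeholders):

    # Raise com.example to DEBUG for 30 seconds, then revert automatically
    storm set_log_level my_topology -l com.example=DEBUG:30

    # Clear the override explicitly, restoring the original level
    storm set_log_level my_topology -r com.example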

Tuple Sampling and Debugging

In the course of debugging a topology, many Storm users find themselves adding "debug" bolts or Trident functions to log information about the data flowing through the topology, only to remove or disable them for production deployment. Storm 1.0 eliminates this need through the new Topology Debug capability.

Storm UI now includes a function that allows you to sample a percentage of tuples flowing through a topology or an individual component directly from the Storm UI. The sampled events can then be viewed directly from the Storm UI and are also saved to disk.
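
Tuple sampling relies on the topology running at least one event logger task; a sketch, assuming the topology.eventlogger.executors key described in the Storm 1.0 event inspector documentation:

    import org.apache.storm.Config;

    Config conf = new Config();
    // Run one event-logger task; sampling can then be toggled per
    // component from the Storm UI.
    conf.put("topology.eventlogger.executors", 1);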

Distributed Log Search

Another improvement to Storm's UI is the addition of a distributed log search. This search capability allows users to search across all log files of a specific topology, including archived (ZIP'ed) logs. The search results will include matches from all Supervisor nodes.

Dynamic Worker Profiling

Last, but certainly not least, among the usability improvements in Storm 1.0 is dynamic worker profiling. This new feature allows users to request worker profile data directly from Storm UI, including:

Heap Dumps

JStack Output

JProfile Recordings

The generated files are then available for download for off-line analysis with various debugging tools. It is also now possible to restart workers from the Storm UI.


Reposted from gaojingsong.iteye.com/blog/2346822