Cache Topologies

Introduction

Ehcache is used in the following topologies:

Standalone – The cached data set is held in the application node. Other application nodes are independent, with no communication between them. If standalone caching is used where multiple application nodes run the same application, there is Weak Consistency between them: the nodes hold consistent values only for immutable data, or once the time to live on an Element has expired and the Element has been reloaded.

Distributed Ehcache – The data is held in a Terracotta Server Array, with a subset of recently used data held in each application cache node. The distributed topology supports a rich set of consistency modes.
- More information on configuring consistency
- More information on how consistency affects performance
- More in-depth information on how consistency works
Replicated – The cached data set is held in each application node, and data is copied or invalidated across the cluster without locking. Replication can be either asynchronous or synchronous; in the synchronous case, the writing thread blocks while propagation occurs. The only consistency mode available in this topology is Weak Consistency.

Many production applications are deployed in clusters of multiple instances for availability and scalability. However, without a distributed or replicated cache, application clusters exhibit a number of undesirable behaviors, such as:

Cache Drift -- If each application instance maintains its own cache, updates made to one cache do not appear in the other instances. The same applies to web session data. A distributed or replicated cache ensures that all of the cache instances are kept in sync with each other.
Database Bottlenecks -- In a single-instance application, a cache effectively shields a database from the overhead of redundant queries. However, in a distributed application environment, each instance must load its own cache and keep it fresh. The overhead of loading and refreshing multiple caches leads to database bottlenecks as more application instances are added. A distributed or replicated cache eliminates the per-instance overhead of loading and refreshing multiple caches from a database.
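
The database bottleneck can be illustrated with a small counting model. The following is a minimal sketch, not Ehcache code: the class LoadCountSketch and its method are hypothetical names, plain HashMaps stand in for caches, and every node is assumed to read every key exactly once.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counting model; HashMaps stand in for caches. Not an Ehcache class. */
final class LoadCountSketch {

    /** Counts database reads when each of `instances` nodes reads every key once. */
    static int databaseReads(int instances, int keys, boolean sharedCache) {
        int reads = 0;
        Map<Integer, String> shared = new HashMap<Integer, String>();
        for (int node = 0; node < instances; node++) {
            // With per-instance caches, every node starts cold.
            Map<Integer, String> cache = sharedCache ? shared : new HashMap<Integer, String>();
            for (int key = 0; key < keys; key++) {
                if (!cache.containsKey(key)) {
                    reads++;                        // cache miss: one database read
                    cache.put(key, "row-" + key);
                }
            }
        }
        return reads;
    }
}
```

Under these assumptions, database reads grow with the number of instances when each instance keeps its own cache, and stay at one read per key when the cache is shared.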

The following sections further explore distributed and replicated caching.

Distributed Caching (Distributed Ehcache)

Ehcache provides distributed caching using the Terracotta Server Array, enabling data sharing among multiple CacheManagers and their caches in multiple JVMs. By combining the power of the Terracotta Server Array with the ease of Ehcache application-data caching, you can:

linearly scale your application to grow with requirements;
rely on data that remains consistent across the cluster;
offload databases to reduce the associated overhead;
increase application performance with distributed in-memory data;
access even more powerful APIs to leverage these capabilities.

Using distributed caching is the recommended method of operating Ehcache in a clustered or scaled-out application environment. It provides the highest level of performance, availability, and scalability.

Adding distributed caching to Ehcache takes only two lines of configuration. To get started, see the tutorial for distributed caching with Terracotta.

Replicated Caching

In addition to the built-in distributed caching, Ehcache has a pluggable cache replication scheme which enables the addition of cache replication mechanisms. The following additional replicated caching mechanisms are available:

RMI
JGroups
JMS
Cache Server

Each of these is covered in its own chapter. One solution to the problems described above is to replicate data between the caches to keep them consistent, or coherent. Typical operations include:

put
update (put which overwrites an existing entry)
remove

Update supports updateViaCopy or updateViaInvalidate. The latter sends a remove message out to the cache cluster, so that other caches remove the Element, thus preserving coherency. It is typically a lower cost option than a copy.
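
The difference between the two update modes can be sketched with a toy peer model. This is an illustration under stated assumptions, not the Ehcache replication code: PeerCache is a hypothetical class, peers are plain in-process objects rather than remote JVMs, and a single flag selects the mode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for one cache peer; not an Ehcache class. */
final class PeerCache {
    private final Map<String, String> store = new HashMap<>();
    private final List<PeerCache> peers = new ArrayList<>();
    private final boolean updateViaCopy;

    PeerCache(boolean updateViaCopy) {
        this.updateViaCopy = updateViaCopy;
    }

    void addPeer(PeerCache peer) {
        peers.add(peer);
    }

    /** Stores the value locally, then notifies every peer. */
    void put(String key, String value) {
        store.put(key, value);
        for (PeerCache peer : peers) {
            if (updateViaCopy) {
                peer.store.put(key, value);   // copy: the message carries key and value
            } else {
                peer.store.remove(key);       // invalidate: the message carries the key only
            }
        }
    }

    /** Returns the locally held value, or null if this peer must reload it. */
    String get(String key) {
        return store.get(key);
    }
}
```

With invalidation, a peer that returns null from get is expected to reload the value from its canonical source, which is why only the key has to travel over the network.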

Using a Cache Server

Ehcache 1.5 supports the Ehcache Cache Server. To achieve shared data, all JVMs read from and write to a Cache Server, which runs in its own JVM. To achieve redundancy, the Ehcache inside the Cache Server can be set up in its own cluster. This technique will be expanded upon in Ehcache 1.6.

Notification Strategies

The best way to notify peers of a put or update depends on the nature of the cache. If the Element is not available anywhere else, then the Element itself should form the payload of the notification. An example is a cached web page. This notification strategy is called copy. Where the cached data is available in a database, there are two choices: copy as before, or invalidate the data. By invalidating the data, the application tied to the other cache instance is forced to refresh its cache from the database, preserving cache coherency. Only the Element key needs to be passed over the network. Ehcache supports notification through copy and invalidate, selectable per cache.

Potential Issues with Replicated Caching

Potential for Inconsistent Data

Timing scenarios, race conditions, delivery, reliability constraints and concurrent updates to the same cached data can cause inconsistency (and thus a lack of coherency) across the cache instances. This potential exists within the Ehcache implementation. These issues are the same as those seen when two completely separate systems share a database, a common scenario. Whether data inconsistency is a problem depends on the data and how it is used. For those times when it is important, Ehcache provides synchronous delivery, and puts and updates via invalidation. These are discussed below:

Synchronous Delivery

Delivery can be specified to be synchronous or asynchronous. Asynchronous delivery gives faster returns to operations on the local cache and is usually preferred. Synchronous delivery adds time to the local operation, but delivery of an update to all peers in the cluster happens before the cache operation returns.
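
The trade-off between the two delivery modes can be sketched as follows. This is a simplified model and not how Ehcache replicators are implemented: DeliverySketch is a hypothetical class, a second in-process map plays the role of the remote peer, and a single background thread stands in for the asynchronous sender.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical model of one local cache and one peer; not an Ehcache class. */
final class DeliverySketch {
    final Map<String, String> local = new ConcurrentHashMap<>();
    final Map<String, String> peer = new ConcurrentHashMap<>();

    // Daemon thread so that an idle sender never keeps the JVM alive.
    private final ExecutorService sender = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "replication-sender");
        t.setDaemon(true);
        return t;
    });

    /** Synchronous delivery: returns only after the peer holds the value. */
    void putSync(String key, String value) {
        local.put(key, value);
        peer.put(key, value);
    }

    /** Asynchronous delivery: returns at once; the future completes when the peer is updated. */
    CompletableFuture<Void> putAsync(String key, String value) {
        local.put(key, value);
        return CompletableFuture.runAsync(() -> peer.put(key, value), sender);
    }

    void shutdown() {
        sender.shutdown();
    }
}
```

After putSync returns, the peer is guaranteed to hold the value. After putAsync returns, only the local cache is guaranteed to hold it, and a reader on the peer may briefly see stale or missing data until the returned future completes.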

Put and Update via Invalidation

The default is to update other caches by copying the new value to them. If the replicatePutsViaCopy property is set to false in the replication configuration, puts are made by removing the element in any other cache peers. If the replicateUpdatesViaCopy property is set to false in the replication configuration, updates are made by removing the element in any other cache peers. This forces the applications using the cache peers to return to a canonical source for the data. A similar effect can be obtained by setting the element TTL to a low value such as a second. Note that these features impact cache performance and should not be used where the main purpose of a cache is performance boosting over coherency.

Use of Time To Idle

Time To Idle is inconsistent with replicated caching. Time To Idle makes some entries live longer on some nodes than on others because of cache usage patterns, and the cache entry "last touched" timestamp is not replicated across nodes. Do not use Time To Idle with replicated caching unless you do not care about inconsistent data across nodes.
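
The divergence can be reproduced with a small model in which each node tracks its own "last touched" time. This is a sketch under stated assumptions, not Ehcache's expiry code: IdleNode is a hypothetical class, and the clock is passed in explicitly so that the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node-local cache with time to idle; timestamps are never shared with peers. */
final class IdleNode {
    private final long timeToIdleMillis;
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> lastTouched = new HashMap<>();

    IdleNode(long timeToIdleMillis) {
        this.timeToIdleMillis = timeToIdleMillis;
    }

    void put(String key, String value, long now) {
        values.put(key, value);
        lastTouched.put(key, now);
    }

    /** Returns the value and refreshes the local timestamp, or null if the entry has idled out. */
    String get(String key, long now) {
        Long touched = lastTouched.get(key);
        if (touched == null || now - touched >= timeToIdleMillis) {
            values.remove(key);
            lastTouched.remove(key);
            return null;
        }
        lastTouched.put(key, now);
        return values.get(key);
    }
}
```

If the same entry is put on two such nodes at time 0 with a time to idle of 1000 ms, and only the first node reads it at 600 ms, then at 1200 ms the first node still returns the value while the second node has already expired it.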

Ehcache Study Notes 2: Cache Topologies (Chinese-English parallel text, beta 1.0)