A Few Things About Distributed Caching

The previous articles explained memcached from a practical perspective: applications, disaster recovery, monitoring, and so on. What was missing was theory and an analysis of the underlying principles. This article takes a theoretical perspective, so that readers gain a macro-level understanding of "distributed caching, NoSQL" and related technologies before studying and using them further. When building large-scale web applications, caching is practically a must, so the value of learning it is self-evident.

Overview of Distributed Cache

1.1 Features of Distributed Cache

A distributed cache has the following characteristics:
1) High performance: when a traditional database faces large-scale data access, disk I/O often becomes the performance bottleneck, leading to excessive response latency. A distributed cache uses high-speed memory as the storage medium and stores data in key/value form; ideally it can deliver DRAM-level read and write performance;
2) Dynamic scalability: it supports elastic scaling, dynamically adding or removing nodes to handle changing access load, providing predictable performance and scalability while maximizing resource utilization;
3) High availability: availability covers both data availability and service availability. High availability is built on a redundancy mechanism: no single point of failure, automatic failure detection, and transparent failover, so that a server failure causes neither a cache service interruption nor data loss. Data partitions are rebalanced automatically during dynamic scaling while the cache service remains continuously available;
4) Ease of use: a single view of data and management is provided; the API is simple and independent of the cluster topology; no manual configuration is needed for dynamic scaling or failure recovery; backup nodes are selected automatically; most cache systems also provide a graphical management console for unified maintenance;
5) Distributed code execution: task code is shipped to the data nodes and executed in parallel, and the client aggregates and returns the results, which effectively avoids moving and transmitting the cached data itself. The latest Java data grid specification, JSR-347, adds distributed code execution and a Map/Reduce API to this programming model, and major distributed caching products such as IBM WebSphere eXtreme Scale, VMware GemFire, GigaSpaces XAP, and Red Hat Infinispan also support it.
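To make distributed code execution more concrete, here is a minimal conceptual sketch in Java, written without reference to JSR-347 or any particular product. All names here (DistributedTask, CacheGrid) are hypothetical, and the "nodes" are simulated as in-process partitions; the point is that only the task and its small result cross node boundaries, never the cached entries.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical sketch of distributed code execution: the task travels to the
// data instead of the data travelling to the client.
interface DistributedTask<R> {
    // Runs on a data node, against only the entries that node owns locally.
    R runOn(Map<String, Object> localEntries);
}

class CacheGrid {
    private final List<Map<String, Object>> partitions; // one map per (simulated) node

    CacheGrid(List<Map<String, Object>> partitions) {
        this.partitions = partitions;
    }

    // Executes the task on every partition "in place" and folds the partial results.
    <R> R execute(DistributedTask<R> task, R identity, BinaryOperator<R> combiner) {
        return partitions.parallelStream()
                         .map(task::runOn)            // per-node execution
                         .reduce(identity, combiner); // client-side aggregation
    }
}
```

A caller could, for example, count matching cached entries by passing a lambda task together with `Integer::sum` as the combiner; only per-node integers travel back to the client, not the objects being counted.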
1.2 Typical application scenarios
Typical application scenarios of distributed caching fall into the following categories:
1) Page caching. Used to cache content fragments of web pages, including HTML, CSS, and images; widely used on social networking sites;
2) Application object caching. The cache serves as the second-level cache of an ORM framework, with the goal of reducing load on the database and speeding up application access (see the cache-aside sketch after this list);
3) State caching. This covers session state and state data produced while scaling an application horizontally. Such data is generally hard to reconstruct and demands high availability; it is mostly used in high-availability clusters;
4) Parallel processing. Usually involves large numbers of intermediate computation results that need to be shared;
5) Event processing. Distributed caches provide continuous query processing over event streams to meet real-time requirements;
6) Extreme transaction processing. Distributed caches provide high-throughput, low-latency solutions for transactional applications and support highly concurrent transaction requests; typical users include railways, financial services, and telecommunications.
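To make scenario 2 concrete, the following is a minimal cache-aside sketch in Java: the application checks the cache first, and only on a miss loads the object from the database and repopulates the cache with a time-to-live. CacheClient, OrderDao, and the 300-second TTL are hypothetical stand-ins for a real cache client (for example a Memcached client), a data-access layer, and an expiry policy.

```java
import java.util.Optional;

// Hypothetical client and DAO interfaces used only for this sketch.
interface CacheClient {
    Optional<Object> get(String key);
    void set(String key, Object value, int ttlSeconds);
}
interface OrderDao {
    Order loadById(long id);
}
record Order(long id, String status) {}

class OrderService {
    private final CacheClient cache;
    private final OrderDao dao;
    private static final int TTL_SECONDS = 300; // assumed expiry policy

    OrderService(CacheClient cache, OrderDao dao) {
        this.cache = cache;
        this.dao = dao;
    }

    Order findOrder(long id) {
        String key = "order:" + id;
        // 1) Try the cache first.
        Optional<Object> cached = cache.get(key);
        if (cached.isPresent()) {
            return (Order) cached.get();
        }
        // 2) On a miss, fall back to the database...
        Order order = dao.loadById(id);
        // 3) ...and repopulate the cache so later reads avoid the database.
        cache.set(key, order, TTL_SECONDS);
        return order;
    }
}
```

Writes would typically update or invalidate the same key so that the cache does not serve a stale object for longer than the TTL.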
1.3 Development of Distributed Cache
Distributed caching has gone through several stages of development: from the initial local cache, to the elastic caching platform, and on to the elastic application platform. The overall goal is to evolve toward building better distributed systems (as shown in the figure below).

1) Local cache: data is stored in the same memory space as the application code. The advantage is very fast data access; the disadvantages are that the data cannot be distributed or shared, and there is no fault tolerance. A typical example is Cache4j.


2) Distributed cache system: data is partitioned across a fixed number of cluster nodes. The advantage is that cache capacity can be expanded (static expansion); the disadvantages are that expansion requires a lot of reconfiguration and there is no fault-tolerance mechanism. A typical example is Memcached (see the key-to-node mapping sketch after this list).


3) Elastic caching platform: data is partitioned across cluster nodes, and high availability is achieved through a redundancy mechanism. The advantages are dynamic scaling and fault tolerance; the disadvantage is that replication and backup have some impact on system performance. A typical example is Windows AppFabric Caching.


4) Elastic application platform: this represents the future direction of distributed caching in cloud environments. Put simply, it combines an elastic cache with code execution, moving business logic to the nodes where the data resides, which greatly reduces data transfer overhead and improves system performance. A typical example is GigaSpaces XAP.
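To see why the static expansion of stage 2 needs so much reconfiguration, here is an illustrative Java sketch of the naive key-to-node routing a client might use against a fixed node list (the node names are made up). With simple modulo hashing, changing the node count remaps most keys; consistent hashing, which many Memcached clients adopt, was introduced precisely to limit this disruption.

```java
import java.util.List;

// Illustrative key-to-node routing with simple modulo hashing.
class ModuloRouter {
    private final List<String> nodes; // e.g. ["cache1:11211", "cache2:11211", ...]

    ModuloRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    String nodeFor(String key) {
        // Math.abs can overflow for Integer.MIN_VALUE, so mask the sign bit instead.
        int h = key.hashCode() & 0x7fffffff;
        return nodes.get(h % nodes.size());
    }
}

class Demo {
    public static void main(String[] args) {
        ModuloRouter three = new ModuloRouter(List.of("cache1", "cache2", "cache3"));
        ModuloRouter four  = new ModuloRouter(List.of("cache1", "cache2", "cache3", "cache4"));
        int moved = 0;
        for (int i = 0; i < 10_000; i++) {
            String key = "user:" + i;
            if (!three.nodeFor(key).equals(four.nodeFor(key))) {
                moved++; // this key would now be looked up on a different node
            }
        }
        // With modulo hashing, roughly 3/4 of the keys land on a different node
        // after adding one node, so most cached entries become cache misses.
        System.out.println("keys remapped after adding a node: " + moved + " / 10000");
    }
}
```

Running the demo shows roughly three quarters of the keys moving to a different node after a single node is added, which in practice means a wave of cache misses and careful, manual migration planning.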

1.4 Distributed cache and NoSQL
NoSQL, also known as "Not Only SQL", mainly refers to non-relational, distributed, horizontally scalable database designs. NoSQL abandons the strict transactional consistency and normal-form constraints of traditional relational databases and adopts a weak consistency model. Compared with NoSQL systems, traditional databases struggle to meet the data storage requirements of applications in cloud environments, specifically in the following three respects:
1) According to the CAP theorem, of the three properties consistency, availability, and partition tolerance, at most two can be satisfied at the same time; it is impossible to have all three. For the large number of web applications deployed on cloud platforms, availability and partition tolerance usually have higher priority, so consistency constraints are generally relaxed appropriately. The transactional consistency requirements of traditional databases constrain their ability to achieve horizontal scalability and high availability;
2) Traditional databases are poorly suited to new data storage and access patterns. Web 2.0 sites and cloud platforms hold large amounts of semi-structured data, such as user session data, time-sensitive transactional data, and data for computation-intensive tasks. Such state data is better stored in key/value form and does not need the complex query and management functions provided by an RDBMS;
3) NoSQL offers low-latency reads and writes and supports horizontal scaling, which is essential for cloud platforms handling massive volumes of access requests. Traditional relational databases cannot provide the same performance, and in-memory databases are limited in capacity and lack the ability to scale out.
As an important form of NoSQL, distributed caching can provide cloud platforms with highly available state storage and scalable application acceleration, and there is no sharp boundary between it and other NoSQL systems. Because access loads and system failures in the cloud are unpredictable, application software is commonly designed to be stateless: large amounts of state information are no longer managed by components, containers, or the platform itself, but are handed over directly to a distributed cache service or a NoSQL system.
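As a small illustration of the stateless design just described, the sketch below (hypothetical KeyValueStore and SessionStore types, not any product's API) keeps session attributes in an external key/value store rather than in the application server's memory, so any instance can handle any request and losing an instance loses no session state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical key/value interface standing in for a distributed cache or NoSQL store.
interface KeyValueStore {
    void put(String key, Map<String, String> value, int ttlSeconds);
    Optional<Map<String, String>> get(String key);
}

class SessionStore {
    private final KeyValueStore store;
    private static final int SESSION_TTL_SECONDS = 1800; // assumed 30-minute sessions

    SessionStore(KeyValueStore store) {
        this.store = store;
    }

    // Every app instance reads and writes the same external store, so the
    // instances themselves stay stateless and can be added or removed freely.
    void save(String sessionId, Map<String, String> attributes) {
        store.put("session:" + sessionId, new HashMap<>(attributes), SESSION_TTL_SECONDS);
    }

    Map<String, String> load(String sessionId) {
        return store.get("session:" + sessionId).orElseGet(HashMap::new);
    }
}
```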
1.5 Distributed cache and extreme transaction processing

With the further development of cloud computing and Web 2.0, many companies and organizations face unprecedented demands: millions of concurrent users, thousands of transactions processed per second, flexible elasticity and scalability, low latency, and 7×24×365 availability. Traditional transactional applications are confronted with concurrent transaction processing at an extreme scale, which has given rise to extreme transaction processing applications; a railway ticketing system is a typical example. Wikipedia describes extreme transaction processing as handling more than 500 transactions per second or more than 10,000 concurrent accesses. Gartner defines extreme transaction processing (XTP) as an application style that supports the development, deployment, management, and maintenance of transactional applications, characterized by extreme demands on performance, scalability, availability, and manageability. Gartner predicted in its report that the share of extreme transaction processing applications would grow from 10% in 2005 to 20% in 2010, and that XTP technology would be a hot technology for the following 5 to 10 years.
The rise of extreme transaction processing poses new challenges to the traditional three-tier web architecture: how to support high-volume, business-critical transactional applications on cheap, standardized hardware and software platforms. As a key XTP technology, distributed caching can provide high-throughput, low-latency solutions for transactional applications. Its write-behind mechanism provides shorter response times while greatly reducing the transaction-processing load on the database, and a staged event-driven architecture (SEDA) can support large-scale, highly concurrent transaction requests. In addition, a distributed cache manages transactions in memory and provides data consistency guarantees, and it uses data replication to achieve high availability, striking a good balance between scalability and performance.
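The write-behind mechanism mentioned above can be sketched roughly as follows; this is an illustrative toy in Java, not any product's actual implementation. A put updates the in-memory map and returns immediately, while a background task drains the queued writes and persists them to the database in batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative write-behind cache: the caller sees only the in-memory update,
// while persistence happens asynchronously in batches.
class WriteBehindCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<Map.Entry<String, String>> pending = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    WriteBehindCache() {
        // Flush queued writes to the database every 500 ms.
        flusher.scheduleAtFixedRate(this::flush, 500, 500, TimeUnit.MILLISECONDS);
    }

    // Fast path: update memory, enqueue the write, return immediately.
    void put(String key, String value) {
        cache.put(key, value);
        pending.offer(Map.entry(key, value));
    }

    String get(String key) {
        return cache.get(key);
    }

    private void flush() {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        pending.drainTo(batch, 1000);   // take up to 1000 queued writes
        if (!batch.isEmpty()) {
            writeBatchToDatabase(batch); // one bulk statement instead of many small ones
        }
    }

    private void writeBatchToDatabase(List<Map.Entry<String, String>> batch) {
        // Placeholder for e.g. a JDBC batch update.
        System.out.println("persisting " + batch.size() + " writes");
    }
}
```

A production-grade write-behind cache also has to keep queued-but-unflushed writes safe if a node fails, which is one reason products pair this mechanism with the data replication mentioned above.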



