Distributed architecture of the cache system

  Behind every mature large-scale distributed system there are usually many supporting components that make up the infrastructure of that system. This infrastructure layer includes coordination and configuration management of distributed components, distributed caching components, persistent storage components, distributed messaging systems, search engines, CDN systems, load-balancing systems and operations automation systems, as well as real-time computing systems, offline computing systems, distributed file systems, log collection systems, monitoring systems and data warehouses. This article focuses mainly on the caching component.

Cache component layer

Caching system benefits:

  1. Accelerated reads and writes. Caches usually live entirely in memory, for example Redis and Memcache, and reading and writing against memory performs far better than a traditional storage layer such as MySQL. Because the memory and carrying capacity of a single machine are limited, and heavy use of local caches stores multiple copies of the same data on different nodes, a great deal of memory is wasted, which is what gave rise to distributed caching.
  2. Reduced load on the back end. In a highly concurrent environment, a flood of read and write requests hits the database, and disk processing speed is simply not on the same order of magnitude as memory. To relieve pressure on the database and improve system responsiveness, a cache layer is usually added in front of the database.

Costs introduced by a caching system:

  1. Data inconsistency: in a distributed environment, data is read and written concurrently by multiple upstream applications, each deployed as several service instances (multiple copies must be deployed to guarantee availability). For the same piece of data, the database layer cannot guarantee a strict ordering of concurrent reads and writes, which means a read request issued later may complete first (a dirty read).
  2. Code maintenance cost: once a cache is added, developers must handle the processing logic for both the cache and the storage layer, which increases the cost of maintaining the code.
  3. Operation and maintenance cost: introducing a cache layer such as Redis means that, to guarantee high availability, you need master-slave replication, and high concurrency requires a cluster.

 

Cache update

Cached data usually has a lifetime: it expires after a period of time and must be reloaded the next time it is accessed. Cache invalidation exists both to keep cached data consistent with the real data at the source and to make efficient use of cache space. Common cache update strategies:

  1. LRU / LFU / FIFO algorithms: all three are eviction algorithms applied when the cache is full; they differ only in the rule used to pick the victim. LRU evicts the entry that has gone unaccessed the longest, LFU evicts the entry with the fewest accesses, and FIFO evicts in first-in, first-out order. They suit scenarios where memory space is limited, the data does not change over long periods, and data consistency is basically not a concern for the business, for example information that, once confirmed, is never allowed to change. Which data gets cleaned up is decided by the chosen algorithm, and the developer can only pick one of them, so consistency is relatively poor (a minimal LRU sketch follows this list).
  2. Timeout-based eviction: manually set an expiration time on cached data, for example with the Redis expire command. Once the time is up, the next access reloads the data from the source and writes it back to the cache. Development and maintenance cost is low, because many caches ship with expiration APIs (such as Redis expire), but real-time consistency cannot be guaranteed, so it suits business that can tolerate inconsistent data for a period of time.
  3. Active update: when the data source is updated, proactively update the cache as well. Consistency is better: as long as the update is performed correctly, consistency can be guaranteed. The drawback is that updating the data and updating the cache become coupled, and the business must handle both the success and the failure of the cache update; a message queue is generally used to decouple the two. To improve fault tolerance, this is usually combined with timeout-based eviction, so that when a cache update fails the stale entry does not live forever. It generally suits data with strict consistency requirements, such as a trading system or the total remaining number of coupons.
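For illustration, here is a minimal in-process sketch of LRU eviction (strategy 1); LFU and FIFO differ only in the rule used to pick the victim. The class name and capacity value are my own, not taken from any particular cache product:

```python
from collections import OrderedDict

class LRUCache:
    """When the cache is full, evict the entry that has gone unaccessed the longest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # touching "a" makes "b" the LRU entry
cache.put("c", 3)         # evicts "b"
print(cache.get("b"))     # None
```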

 

Common problems in caching systems and their solutions

In general, a cache is used in the following way: when the business system issues a query, it first checks whether the data exists in the cache; if it does, the cached data is returned directly; if it does not, the database is queried and the result is returned. A minimal sketch of this read path follows.
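The sketch below assumes a local Redis instance, the third-party redis-py client, and a hypothetical load_product_from_db query; a TTL is set so the entry also follows the timeout-based eviction strategy described earlier:

```python
import json

import redis  # assumes the third-party redis-py client is installed

r = redis.Redis(host="localhost", port=6379, db=0)

def load_product_from_db(product_id):
    # Stand-in for the real database query.
    return {"id": product_id, "name": "example"}

def get_product(product_id, ttl_seconds=300):
    """Cache-aside read: check the cache first, return on a hit,
    otherwise query the database and write the result back."""
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:                           # cache hit
        return json.loads(cached)
    product = load_product_from_db(product_id)       # cache miss: go to the DB
    r.setex(key, ttl_seconds, json.dumps(product))   # write back with an expiration
    return product
```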

  1. Cache penetration
    1. Description: the data the business system wants to query simply does not exist. When the business system issues the query, it follows the process above: it checks the cache first, and since the data is not in the cache, it then queries the database. Because the data does not exist at all, the database also returns empty. This is cache penetration. If a large amount of cache penetration happens in a highly concurrent scenario, requests go straight through to the database storage layer, and the back-end system can easily be crushed.
    2. Cause: malicious attacks that deliberately construct a large volume of queries for non-existent data; because the data is never in the cache, all of these requests fall on the database.
    3. Mitigations for cache penetration (see the sketch after this list):
      1. Caching null objects: cache penetration happens because the cache holds no entry for the non-existent key, so every such request falls through to the database. The fix is to also cache keys whose database result is empty; subsequent queries for those keys hit the cache and return null directly, without touching the database. This scheme still cannot cope with a large volume of distinct penetrating keys under high concurrency: even when requests stay within the valid business range and are not issued all at once, the first query for each key still reaches the DB. Another drawback is that every distinct penetrating key produces an empty cached object, so a large number of different keys leads to a large number of empty objects, and memory is wasted on entries that carry no real data. Two things help: first, filter traffic well, rejecting requests that fall outside the business's valid range before they reach the query path; second, give the empty objects a short expiration time so the wasted memory is reclaimed quickly.
      2. Bloom filter: a Bloom filter combines hash functions with a bitmap data structure; in essence it is a very long binary vector plus a set of mapping functions. It can be used to test whether an element is in a set; its advantage is that both query time and space efficiency far exceed ordinary algorithms, and its disadvantages are a certain false-positive rate and the difficulty of deleting elements. When a query arrives, first ask the Bloom filter whether the key exists; if not, the data cannot be in the database, so skip the cache and return null directly; if it might exist, continue with the normal flow: check the cache first, and query the database on a cache miss.
      3. In general, for scenarios with many distinct non-existent keys and a low probability of repeated requests for the same key, the second option is preferable; for a limited set of non-existent keys with a higher probability of repeated requests, the first option is preferable. Where appropriate, the two can also be combined: implement the Bloom filter as a local cache with a small memory footprint, and fall back to a remote cache such as redis/memcache for the null-object entries.
  2. Cache avalanche
    1. Description: when the cache server restarts, or a large number of cache entries are set to expire within the same short window, the moment they fail an enormous amount of pressure hits the back-end system (such as the DB); the back-end database buckles, and the failure cascades up into the application servers, an avalanche.
    2. Typical scenarios: the cache server goes down; local caches expire during peak traffic; hot keys miss the cache; and so on.
    3. Mitigations for cache avalanche (a jittered-TTL sketch appears after this list):
      1. Ensure the cache layer itself is highly available, for example one master with multiple slaves and the Redis Sentinel mechanism.
      2. Isolate dependent components and apply rate limiting and degradation in front of the back end, for example with Netflix Hystrix.
      3. Isolate resources per project, so that a bug in one project does not affect the overall system architecture and the problem stays confined to that project.
      4. Avoid cache entries expiring all at once: randomize the expiration time of cached data to prevent a large amount of data from expiring at the same moment.
      5. If possible, set hot data to never expire.
  3. Hot key failure (cache breakdown)
    1. Description: cache breakdown refers to a single extremely hot key that keeps absorbing heavy concurrent traffic, with a large number of requests concentrated on that one point. The instant the key expires, the sustained high concurrency punches through the cache and hits the database directly, and the database cannot carry the load and goes down.
    2. Mitigations for cache breakdown (see the mutex sketch after this list):
      1. In practice it is hard for this to generate enough pressure to crush the database server, so cache breakdown of this kind is infrequent. If it really does occur, the simplest approach is to make the hot data never expire.
      2. Use a mutex: when the data is missing from the cache, acquire a lock before fetching from the database, so that a burst of requests cannot all reach the database in a short time and bring it down.
  4. Cache bottomless-pit problem: this problem rarely occurs; it was first identified at Facebook and is not covered here. A well-written article on it: https://blog.csdn.net/yonggeit/article/details/72862134
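As referenced above, a combined sketch of the two cache-penetration mitigations: a hand-rolled Bloom filter plus null-object caching. A plain dict stands in for redis/memcache, the BloomFilter class and db_lookup callable are illustrative names of my own, and every key that really exists must be registered in the filter up front:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: k hash positions in an m-bit array.
    False positives are possible; false negatives are not."""

    def __init__(self, m=1 << 20, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

NULL = object()   # sentinel meaning "known to be absent"
cache = {}        # stand-in for a remote cache such as redis/memcache

def get_record(key, bloom, db_lookup):
    if not bloom.might_contain(key):      # definitely not in the database
        return None
    if key in cache:                      # cache hit (possibly a cached null)
        value = cache[key]
        return None if value is NULL else value
    value = db_lookup(key)                # may legitimately return None
    cache[key] = NULL if value is None else value   # cache empty results too
    return value

db = {"order-1": {"total": 99}}           # stand-in for the database
bloom = BloomFilter()
for existing_key in db:
    bloom.add(existing_key)               # real keys must be registered up front

print(get_record("order-1", bloom, db.get))      # first call reaches the "DB"
print(get_record("no-such-key", bloom, db.get))  # rejected by the Bloom filter
```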
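And the jittered-TTL plus mutex sketch referenced in the avalanche and breakdown items, again assuming a local Redis, the redis-py client, and a hypothetical load_from_db loader. The lock here is per-process; protecting the rebuild across processes would need a distributed lock (for example Redis SET with the NX option):

```python
import json
import random
import threading

import redis  # assumes the third-party redis-py client is installed

r = redis.Redis(host="localhost", port=6379, db=0)
reload_lock = threading.Lock()    # per-process mutex around the rebuild

def load_from_db(key):
    # Stand-in for the real database query.
    return {"key": key}

def get_with_protection(key, base_ttl=300):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # Breakdown protection: only one request rebuilds the expired hot key.
    with reload_lock:
        cached = r.get(key)       # re-check after acquiring the lock
        if cached is not None:
            return json.loads(cached)
        value = load_from_db(key)
        # Avalanche protection: jitter the TTL so entries do not expire together.
        ttl = base_ttl + random.randint(0, 60)
        r.setex(key, ttl, json.dumps(value))
        return value
```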

 

Implementing a distributed cache with memcache

  memcache is an open-source, high-performance distributed object caching system used by many large websites to reduce the application's access to the database, speed up the application, and lighten the load on the database. To provide fast in-memory data lookups, memcache stores and accesses data as key-value pairs and maintains a huge HashTable in memory, so the time complexity of a data query drops to O(1), guaranteeing high-performance access. Memory space is always limited; when there is no more room to store new data, memcache uses the LRU (Least Recently Used) algorithm to evict the data accessed least recently and make room for the new entries. The data formats memcache can store are also flexible: through a serialization mechanism, high-level objects are converted into binary data and stored on the cache server, and when the application needs them again the binary content is deserialized back into the original objects. The memcache client and server communicate using the memcache protocol built on top of TCP, which supports two kinds of payload: text lines and unstructured data. Text lines mainly carry commands from the client and responses from the server, while unstructured data carries the actual values exchanged between client and server. Because unstructured data is transmitted and stored as a byte stream between client and server, its use is very flexible: there is almost no restriction on what can be cached, and the server does not need to care about the stored content or its byte order.
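For a feel of the client-server interaction, a minimal usage sketch, assuming a memcached instance on the default port 11211 and the third-party pymemcache client (other clients work similarly):

```python
from pymemcache.client.base import Client

# Values travel over the TCP-based memcache protocol described above.
client = Client(("localhost", 11211))

client.set("user:42", "alice", expire=60)   # store with a 60-second TTL
print(client.get("user:42"))                # b'alice'
client.delete("user:42")
```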

  memcache itself is not a distributed caching system; its distribution is implemented by the clients that access it. A simple implementation hashes the cache key: with N cache servers at the back end, the client accesses server hash(key) % N, which maps front-end requests evenly onto the back-end cache servers. But this also causes a problem: once one back-end cache server goes down, or a cache server has to be added because cluster pressure grows too large, most keys are redistributed. For a highly concurrent system this can turn into a disaster: all requests flood madly onto the back-end database server, and once the database server becomes unavailable the whole application becomes unavailable, the so-called "avalanche effect".
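A quick back-of-the-envelope sketch of that remapping problem: under naive modulo sharding, compare which server each key lands on before and after growing from 4 to 5 nodes (the exact figure varies with the hash, but most keys move):

```python
def shard(key, n):
    return hash(key) % n   # the naive hash(key) % N sharding described above

keys = [f"key-{i}" for i in range(10000)]
moved = sum(shard(k, 4) != shard(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys map to a different server "
      f"after growing from 4 to 5 nodes")   # typically around 80%
```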

  Using a consistent hashing algorithm can improve on the problem above to some extent. The algorithm was proposed as early as 1997 in the paper Consistent hashing and random trees; it allows a cache server to be removed or added while changing as few existing key mappings as possible, avoiding a large-scale remapping of keys. The principle of consistent hashing is to organize the range of the hash function into a ring. Suppose the hash function's range is [0, 2^32 - 1], i.e. hash values are 32-bit unsigned ints, and the whole space is laid out clockwise. Each server node is hashed and mapped onto the ring; assuming there are four servers node1, node2, node3 and node4, their positions on the ring are as shown in the figure.

  Next, the same hash function is used to compute each key's hash value and thus its position on the ring. With consistent hashing, moving clockwise, keys falling between node1 and node2 have their access requests routed to node2, keys falling between node2 and node4 are routed to node4, and so on. Now suppose a new node, node5, is added and its hash places it between node2 and node4: only the keys between node2 and node5 are affected, and they are remapped to node5, while the mapping of all other keys does not change, thus avoiding a large-scale remapping of keys.

  Of course, the description above is the ideal case, in which the nodes are distributed very evenly around the ring. In practice, when there are few nodes, the distribution can be very uneven, skewing data access and mapping a large number of keys to the same server. To avoid this, a virtual-node mechanism can be introduced: compute several hash values for each server node, with each hash value corresponding to a position on the ring; these positions are called virtual nodes. Keys are mapped in the same way, with one extra step of mapping from the virtual node to the real node. In this way, as long as there are enough virtual nodes, keys can be distributed fairly evenly even across only a few actual nodes.
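A minimal, illustrative consistent-hash ring with virtual nodes (MD5 positions on a 2^32 ring; the class name, replica count and node names are assumptions for the sketch):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes, as described above."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []     # sorted virtual-node positions on the ring
        self._owner = {}    # virtual-node position -> real node
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % (1 << 32)

    def add_node(self, node):
        for i in range(self.replicas):          # place the virtual nodes
            pos = self._hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._ring, pos)

    def get_node(self, key):
        pos = self._hash(key)
        idx = bisect.bisect(self._ring, pos) % len(self._ring)  # clockwise walk
        return self._owner[self._ring[idx]]

ring = ConsistentHashRing(["node1", "node2", "node3", "node4"])
before = {k: ring.get_node(k) for k in (f"key-{i}" for i in range(10000))}
ring.add_node("node5")
moved = sum(before[k] != ring.get_node(k) for k in before)
print(f"{moved / len(before):.0%} of keys moved after adding node5")  # roughly 1/5
```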
