Ximalaya Redis and Pika Cache Usage Rules

Author: Dong Daoguang, Ximalaya

Declaration: A cache is not a panacea, nor is it a trash can!

As one of the most important basic components at Ximalaya, the cache serves a huge volume of business requests every day. If the cache fails, the impact on the business is severe, so keeping the cache service stable and efficient is always one of our key goals.

The following cache usage specifications were summarized after reviewing historical cache incidents at Ximalaya. We are sharing them in the hope that they help you avoid pitfalls when selecting and using a cache.

1. Cache selection

1.1 Introduction to cache types

Ximalaya runs four main types of online cache:

1. redis master-slave mode: the official original version

2. codis-redis: a redis cluster solution open-sourced by Wandoujia

3. Cloud database redis: redis-cluster deployed in containers

4. xcache: a self-developed massive KV storage solution based on codis, pika, and redis

1.2 Introduction to cache usage modes

There are two main usage modes:

1. Cache mode: data does not need to be persisted, instance recovery does not need to load data, and scaling out or in does not require data migration.

2. Store mode: data needs to be persisted, instance recovery needs to load data, and scaling out or in requires data migration.

The following is a simple comparison of various types of cache:

 

2. Cache usage rules

2.1 Cache type usage specifications

1) Cloud database redis is preferred for redis cluster mode, and xcache is preferred for massive KV storage.

Cloud database redis uses the official redis cluster mode with containerized deployment, supports automatic fault recovery and elastic scaling, and is currently the main redis cluster solution. However, it does not support data persistence. If data must be persisted and latency requirements are very strict, use codis-redis. If the data volume is very large and latency requirements are not particularly strict, choose xcache.

2) Do not use redis as a DB. If the data must be persisted, choose xcache.

When redis is used as a DB, data recovery after a failure is very slow and seriously affects the SLA. And if the master and the slaves all go down and the slave machine cannot be restored, the data is lost completely. xcache supports data persistence natively.

3) Do not use client-side sharding mode

Client-side sharding provides neither high availability nor elastic scaling. Use a real cluster mode instead, such as codis-redis, cloud database redis, or xcache.

4) Cluster mode does not support lua and redisson clients. If the business must use it, you can only choose redis master-slave mode.

5) Redis single node capacity should not exceed 10GB, xcache single node capacity should not exceed 200GB

When the single redis node capacity is too large, the instance restart will be slow, affecting the recovery time.

2.2 Key-value design specifications

1) Key should be kept as simple, readable and manageable as possible

While preserving semantics, keep the key short; prefix it with the business name (or database name) to prevent key conflicts, and separate the parts with colons, for example business name:table name:id; do not include special characters.
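The naming convention can be enforced in one place. The following is a minimal sketch, not part of the original rules: the helper name `build_key`, the 128-character limit, and the allowed character set are all assumptions chosen for illustration.

```python
import re

# Hypothetical limits for illustration; the rules give no exact numbers.
MAX_KEY_LENGTH = 128
_ALLOWED_PART = re.compile(r"[A-Za-z0-9_\-.]+")


def build_key(business: str, table: str, record_id) -> str:
    """Build a key of the form business:table:id."""
    parts = [business, table, str(record_id)]
    for part in parts:
        # Reject empty parts, spaces, colons, and other special characters.
        if not _ALLOWED_PART.fullmatch(part):
            raise ValueError(f"invalid key part: {part!r}")
    key = ":".join(parts)
    if len(key) > MAX_KEY_LENGTH:
        raise ValueError(f"key longer than {MAX_KEY_LENGTH} characters")
    return key
```

For example, `build_key("album", "track", 42)` returns `album:track:42`.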

2) Reject bigkey to prevent excessive network card traffic and slow queries

Keep string values within 10KB, and keep the number of elements in a hash, list, set, or zset at no more than 5000.
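A write path can check these limits before sending data to the cache. This is only a sketch: the function name `is_bigkey` is hypothetical, and it assumes 10KB means 10 * 1024 bytes and that collections are held as ordinary Python containers before being written.

```python
MAX_STRING_BYTES = 10 * 1024     # 10KB limit for string values
MAX_COLLECTION_ELEMENTS = 5000   # limit for hash, list, set, and zset


def is_bigkey(value) -> bool:
    """Return True if value would exceed the size limits above."""
    if isinstance(value, (bytes, bytearray)):
        return len(value) > MAX_STRING_BYTES
    if isinstance(value, str):
        # Measure the encoded size, since that is what goes over the wire.
        return len(value.encode("utf-8")) > MAX_STRING_BYTES
    if isinstance(value, (dict, list, tuple, set, frozenset)):
        return len(value) > MAX_COLLECTION_ELEMENTS
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```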

3) Avoid hot keys

Hot keys will cause data skew and excessive pressure on a single node. It is recommended that the business side disperse the hot keys.
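The rules do not say how to disperse a hot key. One common approach for read-mostly data, shown here as an assumption rather than as the author's method, is to write the same value under several suffixed copies and have each reader pick one copy at random. The helper names `shard_keys` and `pick_shard` are hypothetical.

```python
import random
from typing import List


def shard_keys(hot_key: str, shards: int) -> List[str]:
    """All copies of a hot key; the writer must update every one of them."""
    return [f"{hot_key}:{i}" for i in range(shards)]


def pick_shard(hot_key: str, shards: int, rng=random) -> str:
    """The copy a single reader should use, chosen uniformly at random."""
    return f"{hot_key}:{rng.randrange(shards)}"
```

Because the copies have different names, they hash to different nodes in a cluster, so reads spread out instead of landing on a single node. The trade-off is extra memory and a multi-key write on every update.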

4) Control key life cycle

A cache is not a trash can. Set a TTL on every key where possible, and stagger the TTLs so that keys do not all expire at the same time.
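A simple way to stagger expirations is to add a random offset to a base TTL. The sketch below assumes a hypothetical helper `jittered_ttl` and a default jitter of plus or minus 10 percent; neither value comes from the rules themselves.

```python
import random


def jittered_ttl(base_seconds: int, jitter_ratio: float = 0.1, rng=random) -> int:
    """Return base_seconds shifted by a random offset within jitter_ratio."""
    spread = int(base_seconds * jitter_ratio)
    # randint is inclusive on both ends.
    return base_seconds + rng.randint(-spread, spread)
```

With a base of 3600 seconds, keys written in the same batch expire anywhere between 3240 and 3960 seconds later instead of all at once.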

2.3 Command usage specifications

1) Use full operation commands with caution

Disable the `keys *` command and avoid hgetall, smembers, and similar commands where possible. To read many elements under one key, use the corresponding scan command and fetch a small number of elements per call over multiple calls. A single scan should return no more than 200 elements.
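As a sketch of replacing hgetall with incremental scanning, the generator below walks a hash with HSCAN. It assumes `client` exposes `hscan(name, cursor=..., count=...)` returning `(next_cursor, dict)`, as the redis-py client does; the function name `iter_hash_fields` is hypothetical.

```python
def iter_hash_fields(client, name, count=200):
    """Yield (field, value) pairs of a hash in small pages instead of one hgetall."""
    cursor = 0
    while True:
        # count is a hint to the server, not a hard cap on the page size.
        cursor, chunk = client.hscan(name, cursor=cursor, count=count)
        for field, value in chunk.items():
            yield field, value
        # The server returns cursor 0 when the iteration is complete.
        if cursor == 0:
            break
```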

2) Limit the number of elements per call for mset, mget, hmset, hmget, *scan, *range, and similar commands; no more than 200 is recommended

3) Limit the number of commands in a pipeline; no more than 100 is recommended
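Both limits come down to splitting a large request into bounded batches. The following sketch assumes `client` exposes `mget(keys)` accepting a list, as redis-py does; the helper names `chunked` and `mget_in_batches` are hypothetical. The same `chunked` helper can be used to cap a pipeline at 100 commands.

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 200  # recommended upper bound per multi-key command


def chunked(items: Sequence, size: int) -> Iterator[List]:
    """Split items into consecutive lists of at most size elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def mget_in_batches(client, keys: Sequence[str], batch_size: int = MAX_BATCH) -> List:
    """Fetch many keys with several bounded mget calls, preserving key order."""
    values: List = []
    for batch in chunked(keys, batch_size):
        values.extend(client.mget(batch))
    return values
```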

4) When deleting a key in redis, use the unlink command instead of the del command

Deleting a large key with del blocks redis directly. The unlink command deletes the key asynchronously, so it does not affect the main redis thread and therefore does not affect business traffic.

5) Merge the set and expire commands into a single setex command to reduce write pressure on the server

6) Use evalsha instead of eval

In redis-cluster, use evalsha instead of eval. This reduces network IO pressure on redis and improves performance.

2.4 Business cache architecture specifications

1) Do not use logical dbs in redis; use only the default db 0

Isolate data through instances instead: store the data of different businesses in different instances. (Only redis master-slave mode can select a logical db; cluster mode uses db0 by default.)

2) Avoid multiple services reusing the same cache resources

Different business data should use different clusters, S-level applications should not be mixed with B-level applications, and a resource shared by too many businesses should be split. A business should expose RPC interfaces for other businesses to call rather than letting them access its data source directly (for example, one business writes and another reads).

3) Prefer the string type in xcache

xcache supports six data types: string, hash, ehash, list, set, and zset. The ehash type extends the hash type and supports setting an expiration time on individual fields. The string type is the fastest in xcache; the other data types are implemented by combining and transforming strings. The six data types rank by performance as follows:

string > hash > set > ehash > list > zset

Suggestion: Try to use string type

4) Reduce the use of lua scripts

Cluster mode restricts Lua support: all keys that a Lua script operates on must be sharded to the same node. Reduce the use of Lua as much as possible.
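For the official redis cluster mode, whether two keys land on the same node can be checked from their hash slots: the slot is CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a non-empty `{...}` hash tag, only the tag is hashed. The sketch below illustrates that rule; the function names are hypothetical, and other cluster solutions such as codis use their own slot functions.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot of a key in redis cluster, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384
```

Keys such as `{user:1}:profile` and `{user:1}:orders` share the tag `user:1`, so they map to the same slot and can be used together in one Lua script.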

5) Do not run complex logic in Lua scripts

Put complex logic in business code instead of Lua scripts.

6) Use efficient serialization methods and compression methods

To save memory when values are large, compress the data (for example with snappy or gzip) before writing it to redis.
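A minimal sketch of compress-before-write using the standard library `gzip` module. The 1024-byte threshold, the one-byte marker scheme, and the function names are assumptions for illustration; small values are stored uncompressed because compression can make them larger.

```python
import gzip

COMPRESS_THRESHOLD = 1024  # hypothetical cutoff in bytes
_RAW = b"\x00"             # marker: payload stored as-is
_GZIP = b"\x01"            # marker: payload is gzip-compressed


def encode_value(payload: bytes) -> bytes:
    """Prefix a marker byte and compress only payloads above the threshold."""
    if len(payload) >= COMPRESS_THRESHOLD:
        return _GZIP + gzip.compress(payload)
    return _RAW + payload


def decode_value(stored: bytes) -> bytes:
    """Reverse encode_value using the marker byte."""
    marker, body = stored[:1], stored[1:]
    if marker == _GZIP:
        return gzip.decompress(body)
    if marker == _RAW:
        return body
    raise ValueError("unknown value marker")
```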

7) Prevent heavy traffic from batch tasks, scheduled tasks, and periodic tasks from affecting online business

Batch tasks, scheduled tasks, and periodic tasks must be rate-limited.
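One simple way to rate-limit a batch job is to pace its cache operations at a fixed interval. This is a sketch under stated assumptions: the function name `run_rate_limited` is hypothetical, each task is a zero-argument callable, and `sleep` and `clock` are injectable so the pacing can be tested without waiting.

```python
import time


def run_rate_limited(tasks, max_per_second, sleep=time.sleep, clock=time.monotonic):
    """Run each callable in tasks, spacing task starts at least 1/max_per_second apart."""
    interval = 1.0 / max_per_second
    next_start = clock()
    results = []
    for task in tasks:
        now = clock()
        if now < next_start:
            # Too early for the next task: wait out the rest of the interval.
            sleep(next_start - now)
        next_start = max(now, next_start) + interval
        results.append(task())
    return results
```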

8) Evaluate business changes and storage traffic model changes first

Before a business model change, a QPS or capacity increase, an increase in O(N) commands, and so on, first evaluate whether the current cache can withstand it, roll out gradually (grayscale release), and keep observing, especially during peak traffic periods.

9) Apply for recycling of unused resources as soon as possible

Dormant resource recycling can not only reduce the storage cost of the business, but also allocate resources to the businesses that really need it. It can be said to be a win-win situation.

 



Origin my.oschina.net/dubbogo/blog/10108965