Implementation of a distributed cache based on Redis

First: What is Redis?

Redis is an in-memory, high-performance key-value storage system that supports persistence through an append-only log and provides client APIs in many languages.

Second: Background

  • Data structure requirements keep growing, but memcache does not provide these structures, which hurts development efficiency.
  • Performance requirements, which must be addressed as the volume of read operations grows. The stages we went through were:
    database read/write separation (M/S) –> multiple slaves per database –> adding a cache (memcache) –> moving to Redis
  • Solving the write problem:
    horizontal splitting, that is, splitting a table so that some users are stored in one table and other users in another;
  • Reliability requirements
    The cache "avalanche" problem is a constant headache.
    The cache must be able to recover quickly.

  • Development cost requirements
    The cost of keeping the cache and the DB consistent keeps rising (clean up the DB first, then clean up the cache? No, that is too slow!)
    Development has to keep up with a steady influx of product requirements.
    The most expensive hardware is at the database tier: these machines typically cost several times as much as front-end machines, because the workload is IO-intensive and consumes a lot of hardware;

  • Maintenance complexity
    Consistency maintenance costs keep rising;
    BerkeleyDB uses B-trees and always writes new data without reorganizing its files internally, so the files grow larger and larger; the operation to deal with this has to be run regularly,
    which requires a certain amount of downtime;

Based on the above considerations, Redis was chosen.

Third: Application of Redis in Sina Weibo

Introduction to Redis

1. Support for 5 data structures

Redis supports strings, hashes, lists, sets, and sorted sets.
Strings are a good storage type for counters. Sets are great for building index libraries;

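2. K-V storage vs. K-V cache

Of Sina Weibo's current Redis usage, 98% is persistent storage and 2% is caching, running on 600+ servers.
In Redis, persistent and non-persistent usage do not differ much in performance:
non-persistent usage runs at about 80,000-90,000 tps, and persistent usage at about 70,000-80,000 tps;
When persistence is used, the balance between persistence and write performance has to be considered, that is, the ratio between the amount of memory Redis uses and the disk write rate;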

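3. Active community

Redis currently has a little over 30,000 lines of code. The code is concise, with many clever implementations; the author is a technical perfectionist.
The Redis community is very active, which is an important indicator of the quality of open source software. Early-stage open source software usually has no commercial technical support, so without an active community behind it there is nowhere to turn when problems occur;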

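Basic principles of Redis

Redis persistence (AOF, append only file):
Writes are appended to a log (the AOF), and once the log reaches a certain size it is merged with the in-memory data. Because the log is only ever appended to, disk writes are sequential and the performance impact is very small.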
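The append-then-merge idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Redis's actual AOF format: the class name `AppendOnlyStore`, the tab-separated record layout, and the `rewrite` method are all hypothetical, and keys or values containing tabs or newlines are not handled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of append-only persistence; NOT Redis's real AOF format.
class AppendOnlyStore {
    private final Path log;
    private final Map<String, String> data = new HashMap<>();

    AppendOnlyStore(Path log) {
        this.log = log;
        try {
            // Rebuild the in-memory state by replaying every logged write in order.
            if (Files.exists(log)) {
                for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
                    String[] parts = line.split("\t", 2);
                    if (parts.length == 2) {
                        data.put(parts[0], parts[1]);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Append the write to the end of the log (sequential I/O), then apply it in memory.
    void set(String key, String value) {
        String record = key + "\t" + value + "\n";
        try {
            Files.write(log, record.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        data.put(key, value);
    }

    String get(String key) {
        return data.get(key);
    }

    // "Merge with memory": replace the log with one record per live key.
    void rewrite() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : data.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        try {
            // With no options given, Files.write creates or truncates the file.
            Files.write(log, sb.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Overwriting a key twice leaves two records in the log until `rewrite` is called, which is why the log has to be compacted against the in-memory state from time to time.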

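1. Single instance, single process

Redis uses a single process, so one instance uses only one CPU;
To maximize CPU utilization, configure one Redis instance per CPU core and one port per instance (8-core CPU, 8 instances, 8 ports), which increases concurrency:
In a single-machine test with records of about 200 bytes each, the result was 80,000-90,000 tps;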

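2. Replication

Process: data is written to the master –> the master stores it into the slave's rdb –> the slave loads the rdb into memory.
Save point: when the network is interrupted, the transfer resumes from where it left off after reconnecting.
Under master-slave, the first synchronization is a full transfer, and subsequent ones are incremental;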
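The "full transfer first, incremental afterwards" behaviour can be modelled in memory. The sketch below is a simplified illustration, not the Redis replication protocol: `ReplMaster`, `ReplSlave`, the unbounded write backlog, and the integer offset are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of master-slave sync; not the real Redis protocol.
class ReplMaster {
    final Map<String, String> data = new HashMap<>();
    final List<String[]> backlog = new ArrayList<>(); // every write, in order

    void set(String key, String value) {
        data.put(key, value);
        backlog.add(new String[] {key, value});
    }
}

class ReplSlave {
    final Map<String, String> data = new HashMap<>();
    private int offset = -1; // -1 means "never synced"

    // Returns "full" or "incremental" to show which path was taken.
    String syncFrom(ReplMaster master) {
        if (offset < 0) {
            // First sync: load a complete snapshot of the master's data.
            data.clear();
            data.putAll(master.data);
            offset = master.backlog.size();
            return "full";
        }
        // Later syncs: replay only the writes made after the saved offset.
        for (int i = offset; i < master.backlog.size(); i++) {
            String[] write = master.backlog.get(i);
            data.put(write[0], write[1]);
        }
        offset = master.backlog.size();
        return "incremental";
    }
}
```

The saved offset plays the role of the save point: after an interruption the slave asks only for what it has not yet seen.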

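3. Data consistency

After running for a long time, data may become inconsistent across nodes;
Two tool programs were developed:
1. For large data sets, a periodic full check;
2. A real-time check of incremental data for consistency;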
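The source does not describe how these two tools work internally. As a rough sketch of the distinction only, assuming each node's contents can be read into a `Map`, a full check compares every key on either node, while an incremental check compares only keys that were written recently. `ConsistencyChecker` and its method names are hypothetical.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the original tools' design is not described in the source.
class ConsistencyChecker {
    // Full check: compare every key present on either node.
    static Set<String> fullCheck(Map<String, String> master, Map<String, String> slave) {
        Set<String> keys = new TreeSet<>(master.keySet());
        keys.addAll(slave.keySet());
        return check(keys, master, slave);
    }

    // Incremental check: compare only the keys that were written recently.
    static Set<String> incrementalCheck(Collection<String> recentKeys,
                                        Map<String, String> master,
                                        Map<String, String> slave) {
        return check(recentKeys, master, slave);
    }

    // Returns the keys whose values differ (a missing key counts as a difference).
    private static Set<String> check(Collection<String> keys,
                                     Map<String, String> master,
                                     Map<String, String> slave) {
        Set<String> mismatched = new TreeSet<>();
        for (String key : keys) {
            if (!Objects.equals(master.get(key), slave.get(key))) {
                mismatched.add(key);
            }
        }
        return mismatched;
    }
}
```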

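Inconsistency caused by the master not yet having synchronized to the slave is called the delay problem;
For scenarios where consistency requirements are less strict, it is enough to guarantee eventual consistency;
The delay problem has to be analyzed according to the characteristics of the business scenario and solved by adding strategies at the application layer;
For example:
1. For newly registered users, queries must go to the master first;
2. After a successful registration, the page waits 3 s before redirecting, and the data is synchronized in the background during that time.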
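One way to express the first strategy at the application layer is a small router that sends a user's reads to the master for a short window after that user's last write. This is a sketch under assumptions: `ReadRouter`, the grace window, and passing the clock value in as a parameter are illustrative choices, not the article's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-layer routing policy for the delay problem.
class ReadRouter {
    private final long graceMillis;
    private final Map<String, Long> lastWrite = new HashMap<>();

    ReadRouter(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Call this whenever the user's data is written (for example, on registration).
    void recordWrite(String userId, long nowMillis) {
        lastWrite.put(userId, nowMillis);
    }

    // Reads go to the master while the slave may still be lagging, otherwise to a slave.
    String routeRead(String userId, long nowMillis) {
        Long writtenAt = lastWrite.get(userId);
        boolean recentlyWritten = writtenAt != null && nowMillis - writtenAt < graceMillis;
        return recentlyWritten ? "master" : "slave";
    }
}
```

With a window of 3000 ms this mirrors the 3 s wait in the second strategy: after the window has passed, the slave is assumed to have caught up.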

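Fourth: Architecture design of the distributed cache

1. Architecture design

Because Redis is a single point, a project that needs to use it has to implement distribution itself. The basic architecture diagram is shown below.

2. Distributed implementation

Consistent hashing on the key determines which Redis node each key is assigned to.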

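Implementation of consistent hashing:

  • Hash calculation: both MD5 and MurmurHash are supported; MurmurHash is the default because it is an efficient hash function.

  • Consistency: Java's TreeMap is used to simulate the ring structure and achieve an even distribution.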
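A minimal sketch of a TreeMap-based ring follows. It uses MD5 from the JDK rather than MurmurHash, because MurmurHash is not in the standard library. The class name `ConsistentHashRing`, the use of virtual nodes to even out the distribution, and the `node#i` naming are assumptions for illustration, not the article's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hash ring backed by a TreeMap.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Place several virtual points per node on the ring to spread keys evenly.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            // Two-argument remove only deletes the point if it belongs to this node.
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // Walk clockwise: first point at or after the key's hash, wrapping to the start.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of the MD5 digest, packed into a long.
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the ring is that removing a node only remaps the keys that lived on that node; keys on the surviving nodes keep their assignment.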

3. Choice of client

The main change to jedis is in its partition module, which now supports partitioning by BufferKey. Different ShardInfo objects can be initialized from the information of different Redis nodes, and the underlying implementation of JedisPool was also modified so that the connection pool supports construction by key and value. Different jedis connection clients are created for different ShardInfos to achieve the partitioning effect, and the application layer can call them directly.

4. Description of the modules

  • Dirty data processing module, which handles cache operations that failed to execute.

  • Shield monitoring module, which monitors jedis operations for exceptions; when a node becomes abnormal, it can trigger removal of that Redis node and similar operations.

The distributed module as a whole uses hornetq to remove abnormal Redis nodes. New nodes can be added through the reload method. (Adding new nodes can also easily be implemented in this module.)
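The source does not say how a node is judged abnormal. As one possible sketch, a monitor can count consecutive failed operations per node and drop the node from the active set once a threshold is reached. `NodeMonitor`, the threshold, and the method names are hypothetical, and the hornetq messaging that the article uses to broadcast the removal is left out.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical failure-count monitor; the article's actual criteria are not given.
class NodeMonitor {
    private final int maxFailures;
    private final Set<String> active = new LinkedHashSet<>();
    private final Map<String, Integer> failures = new HashMap<>();

    NodeMonitor(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    // Also usable to bring a new or recovered node back in.
    void addNode(String node) {
        active.add(node);
        failures.put(node, 0);
    }

    // A successful operation resets the node's consecutive-failure count.
    void recordSuccess(String node) {
        if (active.contains(node)) {
            failures.put(node, 0);
        }
    }

    // Returns true if this failure caused the node to be removed.
    boolean recordFailure(String node) {
        if (!active.contains(node)) {
            return false;
        }
        int count = failures.merge(node, 1, Integer::sum);
        if (count >= maxFailures) {
            active.remove(node);
            return true;
        }
        return false;
    }

    Set<String> activeNodes() {
        return new LinkedHashSet<>(active);
    }
}
```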

The distributed architecture described above meets the needs of the project. In addition, dedicated Redis nodes with a specific priority can be set aside for the more important cache data. For the cache interface, basic interfaces and some special logic interfaces can be implemented as required. CAS-related operations and some transactional operations can be implemented through Redis's watch mechanism.
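The check-and-set semantics behind the watch mechanism can be modelled in memory: a write succeeds only if the key has not changed since it was watched. The sketch below is a model of the idea, not client code. `WatchedStore` and its per-key version counter are assumptions; against a real server the same pattern is expressed with the Redis WATCH, MULTI, and EXEC commands.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of optimistic check-and-set; not a Redis client.
class WatchedStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    String get(String key) {
        return data.get(key);
    }

    // Every write bumps the key's version.
    void set(String key, String value) {
        data.put(key, value);
        versions.merge(key, 1L, Long::sum);
    }

    // Plays the role of WATCH: remember the key's current version.
    long watch(String key) {
        return versions.getOrDefault(key, 0L);
    }

    // Plays the role of EXEC: apply the write only if nobody modified the key since watch.
    boolean setIfUnchanged(String key, long watchedVersion, String value) {
        if (watch(key) != watchedVersion) {
            return false; // aborted; the caller should re-read and retry
        }
        set(key, value);
        return true;
    }
}
```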

Source: http://minglisoft.cn/technology
