360's open-source Redis-like storage system: an introduction to Pika

What is Pika

Pika is a Redis-like storage system jointly developed by 360's DBA and infrastructure teams. It fully supports the Redis protocol, so services can be migrated to Pika without modifying any code, and DBAs who already maintain Redis can run Pika with no extra learning cost. Pika is a persistent, large-capacity Redis storage service, compatible with the vast majority of the string, hash, list, zset, and set interfaces (see the compatibility details), and it removes the capacity bottleneck Redis hits when the stored data no longer fits in memory. Like Redis, it supports master-slave replication through the slaveof command, with both full and partial synchronization. The DBA team also provides migration tools, so the migration is smooth and transparent to users.
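Because Pika speaks the Redis protocol, an off-the-shelf Redis client talks to it directly. The following is a minimal sketch using the Python redis-py client; the host and port are placeholders for your own deployment (Pika's sample configuration listens on 9221), and it only exercises a few of the compatible commands.

```python
# Minimal sketch: an ordinary Redis client works against Pika unchanged.
# Host and port are placeholders for your own Pika instance.
import redis

r = redis.Redis(host="127.0.0.1", port=9221)

# String, hash and zset commands behave as they do against Redis.
r.set("user:1:name", "alice")
r.hset("user:1:profile", mapping={"age": "30", "city": "Beijing"})
r.zadd("leaderboard", {"alice": 100, "bob": 85})

print(r.get("user:1:name"))                              # b'alice'
print(r.zrange("leaderboard", 0, -1, withscores=True))   # members with scores
```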

Comparison with redis

Compared with Redis, the biggest difference is that Pika is persistent storage: data lives on disk, whereas Redis keeps data in memory. This difference gives Pika both advantages and disadvantages relative to Redis.

Advantages:

  1. Large capacity: Pika is not limited by the amount of memory; the maximum usable space equals the size of the disk.
  2. Fast database loading: Pika writes data to disk as it goes, so if a node goes down, no RDB file or oplog is needed to recover. After a restart, the previous data is available without loading everything back into memory and without replaying any operations.
  3. Fast backup: backing up Pika is roughly as fast as running cp on the data files (there is a snapshot-recovery step after the copy, which takes some extra time), so backing up a 100 GB database is quick, which in turn makes full master-slave synchronization much less painful; see the sketch after this list.
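To make the backup point concrete, here is a hedged sketch: it triggers a background dump and then copies the dump directory, which is where the "roughly cp speed" claim comes from. The paths are placeholders (check dump-path in your pika.conf), and a real script would wait for the dump to finish before copying.

```python
# Hedged sketch: trigger Pika's background dump, then back it up with a
# plain directory copy. Paths and addresses are placeholders.
import shutil
import redis

PIKA_DUMP_DIR = "/data/pika/dump"       # placeholder; see dump-path in pika.conf
BACKUP_DIR = "/backup/pika-snapshot"    # placeholder destination

r = redis.Redis(host="127.0.0.1", port=9221)
r.bgsave()   # Pika dumps a snapshot in the background (see the bgsave thread below)

# In a real script, poll INFO until the dump has finished before copying.
shutil.copytree(PIKA_DUMP_DIR, BACKUP_DIR)
```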

Disadvantage:

Because Pika stores data in a combination of memory and disk files, its performance is necessarily lower than that of purely in-memory Redis. In practice we use SSDs to keep the performance as close to Redis as possible.

Applicable scenarios

From the comparison above: if your business data set is too large for Redis to handle comfortably, say more than 50 GB, or your data is important enough that it must not be lost on power failure, then Pika can solve your problem. In actual use, Pika's performance is roughly 50% of Redis's.

 


 

Pika mainly targets the problems that appear once a user's Redis memory footprint grows past 50 GB or 80 GB: long startup and recovery times, the high cost of running one master with several slaves, expensive hardware, and replication buffers that fill up easily. Pika is a solution for these scenarios.

 

Pika is a persistent, large-capacity Redis storage service, compatible with the vast majority of the string, hash, list, zset, and set interfaces as well as the management interfaces (see the compatibility details). It was built to remove the capacity bottleneck Redis hits when the stored data no longer fits in memory. Like Redis, it supports master-slave replication through the slaveof command, with both full and partial synchronization, and it can be deployed behind twemproxy or codis for a sharded, distributed setup (Pika already supports codis's dynamic slot migration, which is being merged into the master branch; thanks to left2right for submitting the pull request).
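Since replication is driven by the same slaveof command as in Redis, here is a hedged sketch of setting it up from a client, again with redis-py. The addresses are placeholders, and the exact fields Pika returns from INFO differ somewhat from Redis's.

```python
# Hedged sketch: point a Pika slave at a master with SLAVEOF, then check
# replication state. Hosts and ports are placeholders for your own nodes.
import redis

MASTER_HOST, MASTER_PORT = "10.0.0.1", 9221    # placeholder master address
slave = redis.Redis(host="10.0.0.2", port=9221)

# Start replicating from the master; Pika chooses full or partial
# synchronization based on the binlog offset the slave reports.
slave.slaveof(MASTER_HOST, MASTER_PORT)

# Inspect replication status the same way you would on Redis.
print(slave.info("replication"))

# To stop replicating and promote the node, detach it from the master:
# slave.slaveof()   # equivalent to SLAVEOF NO ONE
```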



 

 


Features

Large capacity: supports hundreds of gigabytes of data

Compatible with Redis: you can migrate from Redis to Pika smoothly, without modifying any code

Master-slave replication (slaveof)

A complete set of operation and maintenance commands

 

 

Pika uses a multi-threaded model: multiple worker threads handle reads and writes, and the underlying nemo engine guarantees thread safety. The threads fall into 11 types (a simplified sketch of the dispatcher/worker pattern follows this list):

PikaServer: the main thread

DispatchThread: 1 thread; listens on the service port and accepts user connection requests

ClientWorker: multiple threads (the number is user-configurable); each thread owns a set of client connections, receives and executes user commands, returns the results, and appends each write command it executes to the binlog

Trysync: 1 thread; attempts to establish the initial connection to the master and reconnects after a failure

ReplicaSender: multiple threads (created and destroyed dynamically, one per slave attached to the master); each thread starts from the binlog offset reported by its slave and streams synchronization commands to that slave in real time

ReplicaReceiver: 1 thread (created and destroyed dynamically; a slave can have only one master at a time); it sends the user-specified or current offset to the master, then receives and executes the synchronization commands the master streams in real time, appending them to its own binlog at exactly the same offsets as the master

SlavePing: used by a slave to send heartbeats to the master for liveness detection

bgsave: background dump thread

HeartBeat: used by the master to receive heartbeats from all slaves and reply, for liveness detection

scan: background keyspace-scanning thread

purge: background thread that deletes old binlogs
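To make the division of labor easier to picture, here is a deliberately simplified Python sketch of the dispatcher/worker pattern, not Pika's actual C++ code: one accept loop hands connections to a fixed pool of worker threads, roughly the split between DispatchThread and the ClientWorker threads. The port, the worker count, and the one-reply-per-connection handling are all illustrative.

```python
# Simplified, illustrative sketch of the dispatcher/worker split described
# above -- not Pika's actual C++ implementation. One thread accepts
# connections and hands each one to a pool of worker threads that parse
# commands, execute them, and (conceptually) append writes to a binlog.
import socket
import threading
import queue

WORKER_COUNT = 4                      # "user configuration" in Pika's terms
conn_queue: "queue.Queue[socket.socket]" = queue.Queue()

def client_worker(worker_id: int) -> None:
    """Plays the role of a ClientWorker thread."""
    while True:
        conn = conn_queue.get()
        with conn:
            data = conn.recv(4096)
            if not data:
                continue
            # Real Pika parses the Redis protocol, executes the command on
            # the storage engine, appends writes to the binlog, and replies.
            conn.sendall(b"+OK\r\n")

def dispatch_thread(host: str = "127.0.0.1", port: int = 9221) -> None:
    """Plays the role of the DispatchThread: accept and hand off."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    while True:
        conn, _addr = srv.accept()
        conn_queue.put(conn)          # hand the connection to a worker

if __name__ == "__main__":
    for i in range(WORKER_COUNT):
        threading.Thread(target=client_worker, args=(i,), daemon=True).start()
    dispatch_thread()
```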



Problems encountered with large-capacity Redis

1. Long recovery time

Our production Redis instances generally have both RDB and AOF enabled.

AOF records the user's write operations in real time, while an RDB file is a complete snapshot of the Redis data at a point in time, so recovery is normally done from RDB plus AOF. Based on our production experience, recovering a 50 GB Redis instance takes about 70 minutes.

2. One master with multiple slaves: master-slave switchover is expensive

When the Redis master goes down, a slave is promoted to be the new master. After the switchover, all remaining slaves must do a full synchronization with the new master, and a full sync of a large-capacity Redis instance is very expensive.

3. The replication buffer fills up

To keep the synchronization buffer from being overwritten, the DBAs configure Redis with a huge 2 GB replication backlog, which is very costly in memory. When a network failure between data centers pushes the master-slave replication lag past 2 GB, a full synchronization is triggered, and if several slaves trigger full synchronization at the same time, the master can easily be dragged down; see the sketch below.
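As a concrete illustration of the 2 GB backlog and how lag relates to full resynchronization, here is a hedged redis-py sketch run against a plain Redis master (the host, port, and INFO fields are standard Redis, not Pika-specific): it sets the backlog size and reports how far each slave is behind; once that gap exceeds the backlog, a slave falls back to full synchronization.

```python
# Hedged sketch against a plain Redis instance: set the 2 GB replication
# backlog described above and inspect how far each slave lags behind.
# Host/port are placeholders; field names follow standard Redis INFO output.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

# The "huge 2G synchronization buffer" from the text.
r.config_set("repl-backlog-size", 2 * 1024 * 1024 * 1024)

info = r.info("replication")
master_offset = info.get("master_repl_offset", 0)
for key, value in info.items():
    # Connected slaves show up as slave0, slave1, ... with their own offsets.
    if key.startswith("slave") and isinstance(value, dict):
        lag_bytes = master_offset - value.get("offset", 0)
        print(f"{key}: {lag_bytes} bytes behind the master")
```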

4. Memory is too expensive

The Redis machines we generally use in production have 64 GB or 96 GB of RAM, and we only use about 80% of that space.

If a single Redis instance is 50 GB, a machine can basically run only one instance, which wastes resources.

Summary

None of these issues matter much when a Redis instance is small, but as its capacity grows, many operations take longer and longer.
