Redis principle
Redis memory model
redisServer
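public class redisServer {
    int dbnum;                    // number of databases on this Redis node, 16 by default
    redisDb[] db;                 // array holding the state of each database
    redisClient clients;          // linked list of connected clients
    // Attributes maintained by the serverCron function
    Date unixtime;                // timestamp with second precision
    long mstime;                  // timestamp with millisecond precision
    Date lruclock;                // LRU clock, refreshed every 10 seconds
    long ops_sec_samples;         // number of commands the server executes per second
    long stat_peak_memory;        // peak memory usage recorded by the server
    int shutdown_asap;            // server state: 1 = shutting down, 0 = running
    int cronloops;                // number of times serverCron has run
    // Persistence
    int rdb_child_pid;            // PID of the child process running BGSAVE, -1 if none
    int aof_child_pid;            // PID of the child process running BGREWRITEAOF, -1 if none
    long dirty;                   // counter of modifications since the last save
    Date lastsave;                // time of the last BGSAVE
    sdshdr aof_buf;               // AOF buffer
    // Slow log
    long slowlog_entry_id;        // ID of the next slow log entry
    Object slowlog;               // linked list of slow log entries
    long slowlog_log_slower_than; // commands slower than this value (microseconds) are logged
    long slowlog_max_len;         // maximum number of slow log entries kept
}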
redisDb
public class redisDb {
    dict dict;         // key-value pairs
    dict expires;      // keys that have an expiration time, with that time
    dict watched_keys; // keys being monitored by WATCH
}
redisClient
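public class redisClient {
    redisDb db;        // database the client is currently using
    sdshdr querybuf;   // input buffer
    String[] argv;     // command name and its arguments
    int argc;          // length of argv
    sdshdr buf;        // output buffer
    int bufpos;        // number of bytes of buf in use
    int authenticated; // 0 = not authenticated, 1 = authenticated
}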
Redis data structures
Figure: Redis data structures represented as Java objects
Redis operating mechanism
Redis server initialization
- Instantiate the redisServer object
- Initialize the redisServer attributes and apply the parameters from the user-specified configuration file
- Initialize the remaining redisServer attributes
- Create shared constants: the string "OK" and the strings for the integers 0 through 9999
- Create the serverCron time event
- Load the persistence file (AOF or RDB)
- Start the event loop
Figure: flow chart for step 6, loading the persistence file
Redis client sends a request
- Encode the command in the RESP protocol
- Send it to the Redis server over the socket
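The encoding step can be sketched as follows. In RESP, a client request is an array of bulk strings: `*<count>` followed by `$<byte length>` and the payload for each part, each terminated by CRLF. The class and method names below (`RespEncoder`, `encode`) are made up for illustration and are not part of any Redis client library.

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch of a RESP request encoder (array of bulk strings).
final class RespEncoder {
    // Encodes a command and its arguments, for example encode("SET", "key", "value").
    static String encode(String... parts) {
        StringBuilder sb = new StringBuilder();
        // Array header: number of elements, command name included.
        sb.append('*').append(parts.length).append("\r\n");
        for (String part : parts) {
            // Bulk string length is counted in bytes, not characters.
            int byteLen = part.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(byteLen).append("\r\n");
            sb.append(part).append("\r\n");
        }
        return sb.toString();
    }
}
```

For example, `RespEncoder.encode("SET", "key", "value")` yields `*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n`, which is what the client writes to the socket.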
Redis server receives the request
- Receive the request over the socket and store it in redisClient.querybuf (the input buffer)
- Parse the request and store the result in redisClient.argv (the command name and its arguments) and redisClient.argc (the length of argv)
- Call the command executor
Redis server processes the request
- Look up the command name in the command table to find the matching command (redisCommand)
- Check redisCommand.arity against redisClient.argc to verify that the number of arguments is correct
- Check redisClient.authenticated to verify that the client has authenticated
- Call the command function
- Store the result in redisClient.buf (the output buffer)
- Do the follow-up work (slow log, increment the redisCommand call count, AOF, replication)
- Send the result to the Redis client
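The first three checks above can be sketched roughly as follows. This is an illustrative model, not the server's code: `CommandCheck`, `Result`, and the three-entry table are hypothetical. The arity convention is the one Redis uses: a positive value means exactly that many arguments, a negative value means at least that many, and the command name itself is counted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the checks made before a command function is called.
final class CommandCheck {
    enum Result { OK, UNKNOWN_COMMAND, WRONG_ARITY, NOT_AUTHENTICATED }

    // Command name -> arity (stands in for redisCommand.arity).
    static final Map<String, Integer> COMMAND_TABLE = new HashMap<>();
    static {
        COMMAND_TABLE.put("get", 2);   // GET key
        COMMAND_TABLE.put("set", -3);  // SET key value [options...]
        COMMAND_TABLE.put("ping", -1); // PING [message]
    }

    // argv plays the role of redisClient.argv, argv.length that of redisClient.argc.
    // The real server lets AUTH itself through the last check; omitted for brevity.
    static Result check(String[] argv, boolean authenticated) {
        if (argv.length == 0) return Result.UNKNOWN_COMMAND;
        Integer arity = COMMAND_TABLE.get(argv[0].toLowerCase(Locale.ROOT));
        if (arity == null) return Result.UNKNOWN_COMMAND;
        int argc = argv.length;
        boolean arityOk = arity > 0 ? argc == arity : argc >= -arity;
        if (!arityOk) return Result.WRONG_ARITY;
        if (!authenticated) return Result.NOT_AUTHENTICATED;
        return Result.OK;
    }
}
```

Only when `check` returns `OK` would the command function be called and its result written to the output buffer.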
Redis events
File events
Socket: the network socket
I/O multiplexer: Redis uses epoll underneath (on Linux)
File event dispatcher: hands each event to the handler that executes it
Event handlers: the connection accept handler, the command request handler, and the command reply handler
File event processing flow
- When a socket is ready to accept a connection, read, write, or close, a file event is generated
- The I/O multiplexer monitors multiple sockets and puts the sockets that produced events in a queue
- The I/O multiplexer passes the queued sockets to the file event dispatcher in order
- The file event dispatcher calls the event handler that matches the event type
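The flow above can be modelled with a queue and a switch. This is a simplified sketch with no real sockets or epoll: `FileEventLoopSketch`, `enqueue`, and the integer socket IDs are hypothetical stand-ins for the I/O multiplexer and its ready sockets.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Simplified model of the file event flow; no real sockets are involved.
final class FileEventLoopSketch {
    enum EventType { ACCEPT, READ, WRITE }

    // A ready socket together with the kind of event it produced.
    static final class FileEvent {
        final int socketId;
        final EventType type;
        FileEvent(int socketId, EventType type) {
            this.socketId = socketId;
            this.type = type;
        }
    }

    private final Queue<FileEvent> ready = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    // Stands in for the I/O multiplexer queueing a socket that produced an event.
    void enqueue(int socketId, EventType type) {
        ready.add(new FileEvent(socketId, type));
    }

    // Dispatcher: takes events one at a time, in order, and picks a handler by type.
    void dispatchAll() {
        FileEvent ev;
        while ((ev = ready.poll()) != null) {
            switch (ev.type) {
                case ACCEPT:
                    log.add("connection accept handler: socket " + ev.socketId);
                    break;
                case READ:
                    log.add("command request handler: socket " + ev.socketId);
                    break;
                case WRITE:
                    log.add("command reply handler: socket " + ev.socketId);
                    break;
            }
        }
    }
}
```

Because events are taken from the queue one at a time, a handler always finishes before the next socket is served, which is the property the single-threaded design relies on.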
Time event: the serverCron function
- Update the timestamps redisServer.unixtime and redisServer.mstime
- Update the LRU clock redisServer.lruclock
- Update the commands-per-second statistic redisServer.ops_sec_samples
- Update the recorded memory peak redisServer.stat_peak_memory
- Handle SIGTERM: receiving SIGTERM sets redisServer.shutdown_asap to 1, and serverCron shuts the server down when it sees the flag
- Check client resources
- Check database resources
- Check the state of running persistence operations
- If AOF persistence is enabled, write the contents of the AOF buffer to the AOF file
- Increment redisServer.cronloops
Redis persistence
RDB
Trigger conditions
Manual triggers: the save and bgsave commands
Automatic triggers: the save m n setting, master-slave replication, and the shutdown command
- The save command blocks the Redis server until the RDB file has been created (rarely used in practice)
- The bgsave command forks a child process that generates the RDB file; the parent process blocks only while the child is being created
- save m n means that bgsave is triggered automatically when n modifications occur within m seconds; the check uses redisServer.dirty and redisServer.lastsave and is performed by the serverCron time event
- In master-slave replication, when a slave performs a full resynchronization, the master runs bgsave and sends the RDB file to the slave
- The shutdown command generates an RDB file automatically and then ends the process
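The save m n check can be sketched as below, assuming `dirty` counts modifications since the last save and both timestamps are in Unix seconds. `SavePointCheck` and `SavePoint` are hypothetical names; the three rules in the usage note (`save 900 1`, `save 300 10`, `save 60 10000`) are the classic defaults from redis.conf.

```java
// Sketch of the periodic "save m n" check performed from serverCron.
final class SavePointCheck {
    // One "save m n" rule: at least `changes` modifications and more than
    // `seconds` seconds since the last save.
    static final class SavePoint {
        final long seconds;
        final long changes;
        SavePoint(long seconds, long changes) {
            this.seconds = seconds;
            this.changes = changes;
        }
    }

    // dirty stands in for redisServer.dirty, lastSave for redisServer.lastsave.
    static boolean shouldBgsave(SavePoint[] points, long dirty, long lastSave, long now) {
        for (SavePoint p : points) {
            if (dirty >= p.changes && now - lastSave > p.seconds) {
                return true; // any single rule is enough to trigger bgsave
            }
        }
        return false;
    }
}
```

With the rules `{900, 1}`, `{300, 10}`, and `{60, 10000}`, a single modification triggers a save only once more than 900 seconds have passed, while 10 modifications trigger one after 300 seconds.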
AOF
Enable AOF persistence: appendonly yes
AOF execution has three steps: command append, file write, and file rewrite
- Command append: each successfully executed write command is appended to redisServer.aof_buf
- File write: the buffered data is written to disk according to the configured policy
always: the buffered data is written and fsynced to disk immediately
no: Redis only calls write and leaves the sync to the operating system, usually about every 30 seconds
everysec: fsync is called once per second (default policy)
- File rewrite: convert the data currently in Redis into commands and write them to a new AOF file
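File rewrite triggers
Manual trigger: the bgrewriteaof command
Automatic trigger: the AOF file is larger than 64MB (auto-aof-rewrite-min-size) and has grown by the configured percentage over its size after the last rewrite (auto-aof-rewrite-percentage, 100 by default)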
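The automatic trigger can be sketched as a size check followed by a growth check. This is a minimal model under the assumption that `baseSize` is the AOF size right after the previous rewrite; `AofRewriteTrigger` and its parameter names are hypothetical, and the two thresholds correspond to the `auto-aof-rewrite-min-size` and `auto-aof-rewrite-percentage` settings.

```java
// Sketch of the automatic AOF rewrite condition.
final class AofRewriteTrigger {
    static boolean shouldRewrite(long currentSize, long baseSize, long minSize, int percentage) {
        // A percentage of 0 disables automatic rewrites; small files are never rewritten.
        if (percentage == 0 || currentSize <= minSize) {
            return false;
        }
        long base = baseSize > 0 ? baseSize : 1; // avoid dividing by zero
        long growth = currentSize * 100 / base - 100; // growth in percent
        return growth >= percentage;
    }
}
```

With a 64MB minimum and a percentage of 100, a file that was 64MB after the last rewrite is rewritten again once it reaches 128MB.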
Why AOF may lose up to two seconds of data
Redis records the time of the last successful fsync. If that was less than 2 seconds ago, the main thread returns without triggering another fsync; if it was more than 2 seconds ago, the main thread blocks until the fsync completes. A sudden crash can therefore lose up to 2 seconds of data written since the last successful fsync.
Redis 4.0 hybrid persistence
Enable hybrid persistence: aof-use-rdb-preamble yes
Redis 5.0 enables hybrid persistence by default
Hybrid persistence execution process
- The bgrewriteaof command is triggered manually or automatically
- The main process forks a child process; the main process is blocked during the fork
- The child process writes the full data set to the new AOF file in RDB format, while the main process keeps writing incoming commands to both the AOF buffer and the AOF rewrite buffer
- The child process notifies the main process when it finishes, and the main process appends the contents of the AOF rewrite buffer to the new AOF file in AOF format
- The main process replaces the old AOF file with the new one (the first half of the new file is RDB-format data, the second half is AOF-format commands)
Redis high availability
Master-slave replication mode
Advantages
Read/write separation: the master handles writes and the slaves handle reads
Failover: when the master goes down, a slave can be promoted to master
Disadvantages
Failover between master and slave requires manual intervention
Writes cannot be load balanced
Connection establishment phase
- The slave records the master's IP in masterhost and the master's port in masterport
- The slaveof command is issued to the slave, which replies OK immediately; replication then proceeds asynchronously
- The slave opens a socket connection to the master, through which it later receives the RDB file and propagated commands; the master stores the slave's socket as a client in redisServer.clients
- The slave sends the ping command to the master, and the master replies pong
- The slave sends the auth command to the master for authentication
- The slave sends its listening port number to the master, which records it for that slave
Data synchronization phase
- The slave sends the psync command to the master, and the master decides between full and partial resynchronization
Command propagation phase
- After the master executes a write command successfully, it sends the command to the slaves
- The master and the slaves monitor each other through the heartbeat mechanism and the REPLCONF ACK mechanism
repl-disable-tcp-nodelay yes: commands are merged into fewer packets, sent roughly every 40ms
repl-disable-tcp-nodelay no: each command is sent immediately
Heartbeat mechanism: the master sends the ping command to the slaves every 10 seconds
REPLCONF ACK mechanism: each slave sends the REPLCONF ACK command to the master once per second, reporting its replication offset
Sentinel mode
Advantages
Automatic failover of the master
Disadvantages
Recovering a failed slave still requires manual intervention
Writes cannot be load balanced
How it works
Each sentinel node runs three periodic tasks
- Send the info command to the master to obtain the latest master-slave topology
- Learn about the other sentinel nodes through publish/subscribe
- Send the ping command to the other nodes as a heartbeat check
If the master does not respond during heartbeat detection, the sentinel marks it as subjectively down and asks the other sentinels for their view of the master with the sentinel is-master-down-by-addr command. If the number of sentinels that consider the master subjectively down reaches the configured quorum, the master is marked objectively down and the sentinels proceed to a leader election.
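The step from subjectively down to objectively down is a simple count against the quorum. A minimal sketch, assuming each sentinel's opinion (including the asking sentinel's own) has already been collected into a list; `ObjectiveDownCheck` is a hypothetical name.

```java
import java.util.List;

// Sketch of the objective-down decision made by a sentinel.
final class ObjectiveDownCheck {
    // votes: one entry per sentinel, true if that sentinel considers the master down.
    static boolean isObjectivelyDown(List<Boolean> votes, int quorum) {
        int down = 0;
        for (boolean vote : votes) {
            if (vote) {
                down++;
            }
        }
        return down >= quorum;
    }
}
```

With three sentinels and a quorum of 2, the master is marked objectively down as soon as two of them report it as subjectively down.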
Sentinel leader election: uses the Raft algorithm; votes are granted on a first-come, first-served basis.
Algorithm for selecting the new master:
- Filter out unhealthy slaves
- Select the slave with the highest priority
- If priority does not decide, select the slave with the largest replication offset
- If the offset does not decide, select the slave with the smallest runid
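The selection order above can be sketched as a filter followed by a three-level comparison. The `Replica` class and its fields are hypothetical. The sketch assumes that a lower priority number means a higher priority and that a priority of 0 marks a slave that must never be promoted, as with Redis's replica-priority setting.

```java
import java.util.List;
import java.util.Optional;

// Sketch of how a sentinel leader might pick the slave to promote.
final class ReplicaSelection {
    static final class Replica {
        final String runId;
        final int priority;    // lower number = preferred; 0 = never promote
        final long offset;     // replication offset; larger = more up to date
        final boolean healthy;
        Replica(String runId, int priority, long offset, boolean healthy) {
            this.runId = runId;
            this.priority = priority;
            this.offset = offset;
            this.healthy = healthy;
        }
    }

    // Negative result means a is the better candidate.
    static int compare(Replica a, Replica b) {
        if (a.priority != b.priority) return Integer.compare(a.priority, b.priority);
        if (a.offset != b.offset) return Long.compare(b.offset, a.offset);
        return a.runId.compareTo(b.runId);
    }

    static Optional<Replica> pick(List<Replica> replicas) {
        Replica best = null;
        for (Replica r : replicas) {
            if (!r.healthy || r.priority == 0) continue; // filter step
            if (best == null || compare(r, best) < 0) best = r;
        }
        return Optional.ofNullable(best);
    }
}
```

An empty result means no slave is eligible, in which case the failover cannot proceed.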
Cluster Mode
Advantages
Solves the problem that writes cannot be load balanced
How it works
Cluster mode distributes 16,384 slots across the master nodes, and a data partitioning scheme maps each piece of data to a slot.
Data partitioning schemes
- Hash modulo partitioning
- Consistent hashing partitioning
- Consistent hashing partitioning with virtual nodes
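For the 16,384 slots, Redis Cluster computes the slot of a key as CRC16(key) mod 16384, using the CRC16-CCITT (XMODEM) variant. The sketch below shows that calculation; the `KeySlot` class is a hypothetical name, and hash tags (the `{...}` part of a key) are left out for brevity.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the key-to-slot mapping, without hash tag handling.
final class KeySlot {
    static final int SLOT_COUNT = 16384;

    // CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no bit reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x8000) != 0) {
                    crc = (crc << 1) ^ 0x1021;
                } else {
                    crc <<= 1;
                }
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }
}
```

The standard check input `"123456789"` has CRC16 `0x31C3`, so `KeySlot.slotFor("123456789")` returns 12739. The master that currently owns that slot serves the key.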
Cluster members
- Data node (master node and slave node)
- Sentinel node
Masters handle reads and writes; slaves are only responsible for backing up data
Each node maintains two ports
- Normal port: serves client requests
- Cluster port (normal port + 10000): used for communication between cluster nodes
Adding a node
- Start the node
- Node handshake
- Migrate slots
- Specify the master-slave relationship
Removing a node
- Migrate slots
- Take the node offline
Failover
When the other nodes mark a master as objectively down, the remaining masters vote to promote one of its slaves to master