Redis performance testing, bottleneck analysis, and tuning

1. Introduction

Redis (Remote Dictionary Server) is an open-source key-value database written in ANSI C. It is accessed over the network, keeps its data in memory, can optionally persist that data to disk, and provides APIs for many languages.

The differences between MySQL and Redis:

  • In terms of type, MySQL is a relational database, while Redis is an in-memory key-value database that is typically used as a cache;

  • In terms of function, MySQL stores data persistently on disk; it is powerful but comparatively slow. Redis keeps frequently used data in memory, so reads are fast;

  • MySQL and Redis serve different needs, so they are generally used together

2. Applications of Redis

  1. Why back ends use Redis in addition to MySQL/Oracle

  • Memory access latency is far lower than disk access latency

  • Getting high performance out of a MySQL database is expensive. On the same machine configuration, Redis is significantly faster than MySQL

  • Redis is used at nearly every internet company, and many projects rely on it today

  2. Caching in back-end architecture

Data storage: for persistence, one copy of the data is stored in a relational database; for performance, another copy is stored in Redis, an in-memory database

Cache application process:

  • Writing data: writes usually go to the relational database

  • Querying data: check the Redis cache first, and query the database only if the data is not found there

  • High-concurrency architectures: caching suits workloads with many reads and few writes
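
The read and write paths above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: plain dictionaries stand in for Redis and for the relational database, the class name CacheAsideStore is hypothetical, and dropping the cached copy on write is an assumption (the article only says that writes go to the database).

```python
from typing import Dict, Optional


class CacheAsideStore:
    """Look in the cache first, then fall back to the database on a miss."""

    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database          # stand-in for MySQL/Oracle
        self.cache: Dict[str, str] = {}   # stand-in for Redis
        self.db_queries = 0               # how often the database was queried

    def write(self, key: str, value: str) -> None:
        # Writes go to the relational database.
        self.database[key] = value
        # Assumption: drop the cached copy so a stale value is not served.
        self.cache.pop(key, None)

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:             # cache hit
            return self.cache[key]
        self.db_queries += 1              # cache miss: query the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value       # populate the cache for next time
        return value


store = CacheAsideStore({"user:1": "alice"})
first = store.read("user:1")    # miss, served by the database
second = store.read("user:1")   # hit, served by the cache
print(first, second, store.db_queries)
```

Only the first read reaches the database; repeated reads of the same key are served from the cache, which is why read-heavy workloads benefit most.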

3. Precautions for performance testing

1. Cache warm-up

The first time the program runs, its response time is noticeably longer because the data has not yet been loaded into the cache. In a performance test this shows up as follows: right after startup the program is unstable, and its performance is clearly lower than after it has been running for a while.

Scenarios to test:

1). Cold cache: how does the system perform with nothing in the cache, and how long does it take to return to normal performance? (Ask the developers to clear the data to simulate a clean environment.)

2). Warm cache: test after the cache has been loaded, the system has been running for a while, and the business scenarios have been executed for several rounds
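
The difference between the two scenarios can be made visible by tracking the cache hit ratio per round. The following is a simplified sketch under stated assumptions: dictionaries stand in for the cache and the database, the helper run_round is hypothetical, and every round requests the same 100 keys.

```python
from typing import Dict, List


def run_round(keys: List[str], cache: Dict[str, str], backend: Dict[str, str]) -> float:
    """Replay one round of lookups and return the cache hit ratio for that round."""
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
        else:
            cache[key] = backend[key]   # slow path: load from the database
    return hits / len(keys)


backend = {f"item:{i}": f"value-{i}" for i in range(100)}
workload = list(backend)        # every round requests the same 100 keys
cache: Dict[str, str] = {}      # an empty cache simulates a clean environment

ratios = [run_round(workload, cache, backend) for _ in range(3)]
print(ratios)
```

In this toy setup the first round is served entirely by the database and later rounds entirely by the cache. In a real test the transition is gradual, so response times should be recorded separately for the cold and warm phases.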

2. Cache Avalanche

Redis's current architecture cannot guarantee that data is never lost, so check whether the system can tolerate cache failures.

Simulate a Redis failure and check: 1). how many seconds the Redis cache service takes to recover; 2). whether errors occur when the cache failure sends highly concurrent requests straight to the database

3. Cache penetration

If a query targets data that does not exist in the system, the cache lookup always misses. The cache records a large number of misses, and the highly concurrent requests all fall through to the database, which may crash the system
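
A small simulation shows why such requests are never absorbed by the cache. It is a sketch only: dictionaries stand in for Redis and the database, and the key name user:does-not-exist is made up for illustration.

```python
from typing import Dict, Optional

cache: Dict[str, str] = {}                        # stand-in for Redis
database: Dict[str, str] = {"user:1": "alice"}    # stand-in for the database
counters = {"db_queries": 0}


def lookup(key: str) -> Optional[str]:
    if key in cache:
        return cache[key]
    counters["db_queries"] += 1    # every miss becomes a database query
    value = database.get(key)
    if value is not None:
        cache[key] = value         # only data that exists is ever cached
    return value


for _ in range(1000):
    lookup("user:does-not-exist")
print(counters["db_queries"])
```

Because nothing is cached for a key that has no data, each of the 1000 requests reaches the database. A load test for this scenario can therefore replay requests for nonexistent keys and watch the database rather than Redis.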

4. Redis bottleneck analysis

1. Server resources

1) The machine does not have enough resources to hold the data: consider building a Redis cluster

2) To check how much of the server's resources Redis itself is using, run the built-in info command and look at the following sections:

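  • server: general server information

Field | Value | Description
redis_version | 5.0.3 | Redis server version
redis_git_sha1 | 00000000 | Git SHA1
redis_git_dirty | 0 | Git dirty flag
redis_build_id | 89e87c8197752890 | Redis build ID
redis_mode | standalone | Run mode: standalone, sentinel, or cluster
os | Darwin 18.2.0 x86_64 | Operating system of the machine hosting Redis
arch_bits | 64 | Architecture (32-bit or 64-bit)
multiplexing_api | kqueue | Event-handling mechanism used by Redis
atomicvar_api | atomic-builtin | Atomicvar API
gcc_version | 4.2.1 | GCC version used to compile Redis
process_id | 40163 | Server process ID
run_id | c4f8bb49f8214f406725136e6f589b46502a0e00 | Random value identifying the Redis server
tcp_port | 6379 | Listening port
uptime_in_seconds | 496059 | Seconds since the Redis server started
uptime_in_days | 5 | Days since the Redis server started
hz | 10 | Number of times serverCron runs per second
configured_hz | 10 | Configured frequency setting of the server
lru_clock | 5452491 | Clock that increments every minute, used for LRU management
executable | /../redis-5.0.3/src/./redis-server | Path to the server executable
config_file | /../redis-5.0.3/redis.conf | Path to the configuration file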
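
  • clients: client information, such as the number of client connections

Field | Value | Description
connected_clients | 1 | Number of current client connections
client_recent_max_input_buffer | 2 | Largest input buffer among current client connections
client_recent_max_output_buffer | 0 | Longest output list among current client connections
blocked_clients | 0 | Number of clients waiting on a blocking command (BLPOP, BRPOP, BRPOPLPUSH)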
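
  • memory: server memory information, including current memory consumption and peak memory usage

Field | Value | Description
used_memory | 18813680 | Total bytes allocated by the Redis allocator (standard libc, jemalloc, or another allocator such as tcmalloc)
used_memory_human | 17.94M | used_memory in human-readable form
used_memory_rss | 1945600 | Bytes allocated to Redis as seen by the operating system (the resident set size); this matches the output of top, ps, and similar commands
used_memory_rss_human | 1.86M | used_memory_rss in human-readable form
used_memory_peak | 18900752 | Peak memory consumed by Redis, in bytes
used_memory_peak_human | 18.03M | used_memory_peak in human-readable form
used_memory_peak_perc | 99.54% | used_memory as a percentage of used_memory_peak
used_memory_overhead | 11135798 | Total bytes of overhead allocated to manage internal data structures
used_memory_startup | 988448 | Memory consumed at startup, in bytes
used_memory_dataset | 7677882 | Size of the dataset in bytes (used_memory - used_memory_overhead)
used_memory_dataset_perc | 43.07% | used_memory_dataset as a percentage of net memory usage (used_memory - used_memory_startup)
allocator_allocated | 18780336 | Memory allocated by the allocator
allocator_active | 1907712 | Active memory in the allocator
allocator_resident | 1907712 | Resident memory in the allocator
total_system_memory | 34359738368 | Total memory of the host
total_system_memory_human | 32.00G | total_system_memory in human-readable form
used_memory_lua | 37888 | Bytes used by the Lua engine
used_memory_lua_human | 37.00K | used_memory_lua in human-readable form
used_memory_scripts | 0 | Bytes used by cached Lua scripts
used_memory_scripts_human | 0B | used_memory_scripts in human-readable form
number_of_cached_scripts | 0 | Number of cached scripts
maxmemory | 0 | Configured maximum usable memory; the default 0 means no limit
maxmemory_human | 0B | maxmemory in human-readable form
maxmemory_policy | noeviction | Policy applied once memory usage exceeds maxmemory; with noeviction, every command that would allocate memory returns an error once the limit is reached
allocator_frag_ratio | 0.10 | Allocator fragmentation ratio
allocator_frag_bytes | 18446744073692678992 | Allocator fragmentation in bytes (in this sample, a negative value printed as an unsigned 64-bit integer)
allocator_rss_ratio | 1.00 | Allocator resident memory ratio
allocator_rss_bytes | 0 | Allocator resident memory in bytes
rss_overhead_ratio | 1.02 | Resident memory overhead ratio
rss_overhead_bytes | 37888 | Resident memory overhead in bytes
mem_fragmentation_ratio | 0.10 | Memory fragmentation ratio: the ratio between used_memory_rss and used_memory
mem_fragmentation_bytes | -16834736 | Memory fragmentation in bytes
mem_not_counted_for_evict | 112 | Bytes not counted toward eviction (such as the AOF buffer)
mem_replication_backlog | 0 | Memory used by the replication backlog
mem_clients_slaves | 0 | Memory used by replica clients
mem_clients_normal | 49694 | Memory used by normal clients
mem_aof_buffer | 112 | Buffer memory used for AOF
mem_allocator | libc | Memory allocator (chosen at compile time)
active_defrag_running | 0 | Whether defragmentation is currently active
lazyfree_pending_objects | 0 | Number of objects waiting to be freed (as a result of calling UNLINK, or FLUSHDB and FLUSHALL with the ASYNC option)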
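
Several of the memory fields are derived from others, which makes them easy to cross-check. The sketch below recomputes three of them from the sample values listed above, using the relationships given in the field descriptions; the function name fragmentation_ratio is just for illustration.

```python
def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """mem_fragmentation_ratio: used_memory_rss divided by used_memory."""
    return used_memory_rss / used_memory


# Sample values from the memory section above
used_memory = 18813680
used_memory_rss = 1945600
used_memory_overhead = 11135798
used_memory_startup = 988448

ratio = fragmentation_ratio(used_memory_rss, used_memory)
dataset_bytes = used_memory - used_memory_overhead
dataset_perc = dataset_bytes / (used_memory - used_memory_startup)

print(f"{ratio:.2f}", dataset_bytes, f"{dataset_perc:.2%}")
```

The results agree with the reported mem_fragmentation_ratio (0.10), used_memory_dataset (7677882), and used_memory_dataset_perc (43.07%).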
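
  • persistence: persistence-related information for the server

Field | Value | Description
loading | 0 | Whether the server is currently loading a persistence file
rdb_changes_since_last_save | 0 | Number of changes since the last successful save
rdb_bgsave_in_progress | 0 | Whether the server is currently creating an RDB file
rdb_last_save_time | 1582508875 | UNIX timestamp of the last successful RDB save
rdb_last_bgsave_status | ok | Whether the last RDB save succeeded or failed
rdb_last_bgsave_time_sec | 1 | Duration of the last RDB save, in seconds
rdb_current_bgsave_time_sec | -1 | If an RDB save is in progress, the seconds it has taken so far
rdb_last_cow_size | 0 | Copy-on-write size during the last RDB save, in bytes
aof_enabled | 1 | Whether AOF is enabled
aof_rewrite_in_progress | 0 | Whether an AOF rewrite is in progress
aof_rewrite_scheduled | 0 | Whether an AOF rewrite should run once the current RDB save finishes
aof_last_rewrite_time_sec | -1 | Duration of the last AOF rewrite, in seconds
aof_current_rewrite_time_sec | -1 | If an AOF rewrite is in progress, the seconds it has taken so far
aof_last_bgrewrite_status | ok | Whether the last AOF rewrite succeeded or failed
aof_last_write_status | ok | Whether the last write to the AOF succeeded or failed
aof_last_cow_size | 0 | Copy-on-write size during the last AOF rewrite, in bytes
aof_current_size | 4747340 | Current size of the AOF file
aof_base_size | 4746996 | AOF file size at the latest startup or rewrite
aof_pending_rewrite | 0 | Whether an AOF rewrite is waiting for an RDB save to finish
aof_buffer_length | 0 | Size of the AOF buffer
aof_rewrite_buffer_length | 0 | Size of the AOF rewrite buffer
aof_pending_bio_fsync | 0 | Number of fsync jobs pending in the background I/O queue
aof_delayed_fsync | 0 | Number of delayed fsync calls; if this value is large, consider setting no-appendfsync-on-rewrite yes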
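
  • stats: basic server statistics, such as the number of connections processed. The more cache hits there are relative to misses, the more useful the cache is. For example: keyspace_hits:9808, keyspace_misses:1

Field | Value | Description
total_connections_received | 11 | Total number of connections accepted by the server
total_commands_processed | 48 | Number of commands the server has executed
instantaneous_ops_per_sec | 0 | Number of commands executed per second
total_net_input_bytes | 1312 | Total bytes received since startup
total_net_output_bytes | 114800 | Total bytes sent since startup
instantaneous_input_kbps | 0.00 | Input rate (KB per second)
instantaneous_output_kbps | 0.00 | Output rate (KB per second)
rejected_connections | 0 | Connections rejected because of the maxclients limit
sync_full | 0 | Number of full resyncs with replicas
sync_partial_ok | 0 | Number of accepted partial resync (PSYNC) requests
sync_partial_err | 0 | Number of rejected partial resync (PSYNC) requests
expired_keys | 0 | Total number of key expiration events
expired_stale_perc | 0.00 | Percentage of keys that are probably expired
expired_time_cap_reached_count | 0 | Number of times an active expiry cycle stopped early
evicted_keys | 0 | Number of keys evicted because of the maxmemory limit
keyspace_hits | 6 | Number of key lookups that hit
keyspace_misses | 0 | Number of key lookups that missed
pubsub_channels | 0 | Number of pub/sub channels
pubsub_patterns | 0 | Number of pub/sub patterns
latest_fork_usec | 803 | Duration of the latest fork() operation, in microseconds
migrate_cached_sockets | 0 | Number of sockets open for MIGRATE
slave_expires_tracked_keys | 0 | Number of keys tracked for expiry (writable replicas only)
active_defrag_hits | 0 | Number of value reallocations performed by active defragmentation
active_defrag_misses | 0 | Number of value reallocations aborted by active defragmentation
active_defrag_key_hits | 0 | Number of keys actively defragmented
active_defrag_key_misses | 0 | Number of keys skipped by active defragmentation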
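
The hit and miss counters are easier to judge as a ratio. A minimal sketch, using the example figures quoted in the stats description (keyspace_hits 9808, keyspace_misses 1); the function name hit_ratio is hypothetical:

```python
def hit_ratio(keyspace_hits: int, keyspace_misses: int) -> float:
    """Fraction of key lookups served from the cache; 0.0 when there were no lookups."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0


print(f"{hit_ratio(9808, 1):.2%}")
```

Both counters are cumulative since startup, so during a test run it is more informative to sample them before and after the run and compute the ratio from the differences.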
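
  • replication: master/replica replication information for the server

Field | Value | Description
role | master | Role (master or slave); a replica can also be the master of another server
connected_slaves | 1 | Number of connected replicas
slave0 | ip=127.0.0.1,port=6380,state=online,offset=14,lag=1 | Information about the connected replica
master_replid | 899b9814f2e841ca194dcc2620c83aa5df0abc10 | Replication ID of the server
master_replid2 | 0000000000000000000000000000000000000000 | Secondary replication ID, used for PSYNC after a failover, when master and replica swap roles in a cluster or other high-availability setup
master_repl_offset | 14 | Replication offset
second_repl_offset | -1 | Replication offset for the secondary replication ID
repl_backlog_active | 1 | Whether the replication backlog is active
repl_backlog_size | 1048576 | Size of the replication backlog, in bytes
repl_backlog_first_byte_offset | 1 | Offset of the first byte in the replication backlog, marking the start of the currently available range
repl_backlog_histlen | 14 | Size of the data in the replication backlog, in bytes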
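
  • cpu: CPU usage information for the server

Field | Value | Description
used_cpu_sys | 2.559564 | System CPU consumed
used_cpu_user | 0.878593 | User CPU consumed
used_cpu_sys_children | 0.001414 | System CPU consumed by background processes
used_cpu_user_children | 0.000510 | User CPU consumed by background processes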

  • keyspace: the number of keys in each DB on the server

  • cluster: Get the cluster node information, only visible after the cluster is turned on
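
When the same fields have to be compared across test runs, it helps to turn the info output into a structure. The sketch below assumes the usual layout of that output (a "# Section" header line followed by field:value lines); the helper parse_info and the sample text are written for illustration.

```python
from typing import Dict


def parse_info(raw: str) -> Dict[str, Dict[str, str]]:
    """Group 'field:value' lines from info output under their section name."""
    sections: Dict[str, Dict[str, str]] = {}
    current = "default"
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):                      # section header, e.g. "# Server"
            current = line.lstrip("#").strip().lower()
            sections.setdefault(current, {})
            continue
        field, _, value = line.partition(":")         # split on the first colon only
        sections.setdefault(current, {})[field] = value
    return sections


sample = (
    "# Server\r\nredis_version:5.0.3\r\ntcp_port:6379\r\n\r\n"
    "# Clients\r\nconnected_clients:1\r\n\r\n"
    "# Replication\r\nslave0:ip=127.0.0.1,port=6380,state=online,offset=14,lag=1\r\n"
)
info = parse_info(sample)
print(info["server"]["redis_version"], info["clients"]["connected_clients"])
```

Values are kept as strings because the sections mix numbers, percentages, and composite values such as the slave0 entry.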

2. A large number of Redis connections

Check the number of connections with info (connected_clients in the clients section); too many Redis connections will affect performance

3. Response time

1) Run info commandstats to get per-command statistics and see which Redis commands are time-consuming

  • calls: number of times the command was called

  • usec: total time spent on the command, in microseconds

  • usec_per_call: average time per call, in microseconds
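
To find the expensive commands quickly, the commandstats lines can be parsed and ranked by usec_per_call. This is a sketch: the helper names are hypothetical and the sample lines are made-up values in the cmdstat_<command>:calls=...,usec=...,usec_per_call=... layout.

```python
from typing import Dict, List, Tuple


def parse_commandstats(raw: str) -> Dict[str, Dict[str, float]]:
    """Map each command name to its calls, usec, and usec_per_call values."""
    stats: Dict[str, Dict[str, float]] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue                                   # skip headers and blank lines
        name, _, fields = line.partition(":")
        stats[name[len("cmdstat_"):]] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in fields.split(","))
        }
    return stats


def slowest(stats: Dict[str, Dict[str, float]], top: int = 3) -> List[Tuple[str, float]]:
    """Return the commands with the highest average time per call."""
    ranked = sorted(stats.items(), key=lambda item: item[1]["usec_per_call"], reverse=True)
    return [(name, values["usec_per_call"]) for name, values in ranked[:top]]


# Made-up sample values for illustration
sample = """# Commandstats
cmdstat_get:calls=6,usec=12,usec_per_call=2.00
cmdstat_keys:calls=2,usec=900,usec_per_call=450.00
cmdstat_set:calls=40,usec=120,usec_per_call=3.00
"""
print(slowest(parse_commandstats(sample)))
```

A high usec_per_call on a rarely called command and a moderate one on a very frequent command can cost the same total time, so it is worth looking at usec as well.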

2) Redis slow query log

The following two parameters can be set in the configuration file:

  • slowlog-log-slower-than 10000: threshold in microseconds; commands whose execution time exceeds it are recorded in the slow query log

  • slowlog-max-len 128: the maximum number of slow query log entries kept on the server

View the slow query log: slowlog get
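
How the two parameters interact can be shown with a toy model. This is not how Redis implements the log; it is a sketch in which the class SlowLog, its method names, and the sample commands and timings are all made up for illustration.

```python
from collections import deque
from typing import Deque, Tuple


class SlowLog:
    """Toy model: keep at most max_len commands that ran longer than the threshold."""

    def __init__(self, slower_than_usec: int = 10000, max_len: int = 128) -> None:
        self.slower_than_usec = slower_than_usec
        self.entries: Deque[Tuple[str, int]] = deque(maxlen=max_len)

    def record(self, command: str, duration_usec: int) -> None:
        if duration_usec > self.slower_than_usec:
            # Newest entry first; when the log is full the oldest entry is dropped.
            self.entries.appendleft((command, duration_usec))


log = SlowLog(slower_than_usec=10000, max_len=2)
log.record("GET user:1", 150)            # below the threshold, not logged
log.record("KEYS *", 52000)
log.record("SMEMBERS big:set", 18000)
log.record("HGETALL big:hash", 25000)    # pushes the oldest entry out
print(list(log.entries))
```

The takeaway for testing is that a small slowlog-max-len can discard the entries of interest during a long run, so the log should be read (or the limit raised) before older entries are pushed out.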

Factors affecting response time:

  • Persistence: persistence has a direct impact on performance, because Redis has to work with the disk as well as memory, which directly affects command processing time

  • Large numbers of keys expiring: Redis automatically deletes expired data through its internal expiration mechanism, and this also consumes resources

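5. Redis monitoring system

1. Deploying the collection service

1) Download redis_exporter from its project page: https://github.com/oliver006/redis_exporter

2) Extract the downloaded archive

3) Start it (run the command from the extracted directory)

Start in the foreground: ./redis_exporter -redis.addr 127.0.0.1:6379

Start in the background: nohup ./redis_exporter -redis.addr 127.0.0.1:6379 > redis_exporter.log 2>&1 &

Once started, the exporter listens on port 9121 by default; the collected data can be viewed at the IP and port of the machine running the collection service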
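
A quick way to confirm that the exporter is collecting data is to parse the text it serves and check a few values. The sketch below works on a hand-written sample; the metric names in it are assumptions about what the exporter exposes, the helper parse_metrics is hypothetical, and lines carrying an optional trailing timestamp are not handled.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' lines of Prometheus text format, skipping comments."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                              # HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")     # the value follows the last space
        metrics[name] = float(value)
    return metrics


# Hand-written sample; the metric names are assumed for illustration
sample = """# HELP redis_up Information about the Redis instance
# TYPE redis_up gauge
redis_up 1
redis_connected_clients 1
redis_commands_total{cmd="get"} 6
"""
metrics = parse_metrics(sample)
print(metrics["redis_up"], metrics["redis_connected_clients"])
```

In practice the text would be fetched over HTTP from the exporter's address and port before being passed to a parser like this one.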

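Parameters:

  • -redis.addr: the address of one or more Redis nodes, separated by commas; the default is redis://localhost:6379

  • -redis.password: the password used to authenticate with Redis;

  • -redis.file: path to a file listing one or more Redis nodes, one node per line; this option is mutually exclusive with -redis.addr.

  • -web.listen-address: the address and port to listen on; the default is 0.0.0.0:9121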

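2. Prometheus configuration

Edit the configuration file prometheus.yml, add the following job under scrape_configs, and restart the Prometheus service afterward;

# Added content below
- job_name: "redis"
  static_configs:
    - targets: ['redis-machine-ip:9121']  # the address and port that redis_exporter listens on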

3. Grafana configuration

Import the grafana_prometheus_redis_dashboard.json dashboard template

https://download.csdn.net/download/qq_38571773/87387394

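6. Redis tuning recommendations

Redis itself performs extremely well; most problems come from developers not using Redis well

  1. Pipelining (batch processing): for example, a pipeline can be used to load a large amount of data into Redis

  2. Redis processes commands on a single thread, so a large number of Redis connections is not needed

  3. Redis use cases demand high speed, so choose appropriate data types to make operations faster

For example, with the string type, every query has to fetch the entire content from Redis, which increases network usage and processing cost; with a hash, user information can be split into multiple fields and stored separately, so queries are efficient

  4. Avoid letting developers use complex commands where possible

For example, keys. In Redis, commands marked O(1) are fast, while O(n) and O(log n) commands are slower. If the slow query log shows that most of the commands are not O(1), the developers need to optimize those commands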
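
To illustrate the pipelining recommendation in item 1: a pipeline sends many commands in one write instead of waiting for each reply in turn. The sketch below builds such a payload by encoding commands in the Redis protocol (RESP) as arrays of bulk strings. The function names and the key and value names are made up for illustration, and nothing is sent to a server here.

```python
from typing import Iterable, List


def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts: List[bytes] = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is sent as: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def build_pipeline(commands: Iterable[Iterable[str]]) -> bytes:
    """Concatenate encoded commands so they can be sent in a single write."""
    return b"".join(encode_command(*command) for command in commands)


payload = build_pipeline(("SET", f"key:{i}", f"value-{i}") for i in range(3))
print(len(payload), payload.count(b"\r\nSET\r\n"))
```

Client libraries generally offer pipelining directly, so hand-encoding like this is mainly useful for preparing bulk-load files or for understanding what a pipeline puts on the wire.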


Origin blog.csdn.net/qq_38571773/article/details/128678078