I. Recap of the Previous Lesson
Hello, I'm Ni Pengfei (倪朋飞).
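When accessing the product search API, we found that its responses were very slow. By analyzing the usage of system resources such as CPU, memory, and disk I/O, we found a disk I/O bottleneck, and it was caused by the case application itself.
Next, with pidstat, we found that the culprit was the mysqld process. We then used strace and lsof to find the files mysqld was reading. From the names and paths of those files, we identified the database and table mysqld was operating on. Putting this information together, we guessed that this was a slow query caused by a missing index.
To verify the guess, we went to the MySQL command-line client and used the database's analysis tools, which showed that the field queried by the case application indeed had no index. Since the guess was correct, adding the index solved the problem.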
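From this case you can see that MySQL's MyISAM engine relies mainly on the system cache to speed up disk I/O. But if other applications are running on the same system, MyISAM can hardly make full use of that cache. The cache may be taken up by other applications, or even dropped entirely.
So in general I don't recommend building an application's performance optimization entirely on the system cache. It is better to allocate memory inside the application and build a cache that the application fully controls, or to use a third-party caching service such as Memcached or Redis.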
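As a rough illustration of a cache that the application controls itself, here is a minimal sketch of an in-process LRU cache in Python. The class name LRUCache and its capacity parameter are illustrative assumptions of mine and are not taken from the case application.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-process cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the entry as most recently used
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Drop the oldest entry once the cache grows past its capacity
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes the most recently used entry
cache.put("c", 3)    # evicts "b", the least recently used entry
print(cache.get("a"), cache.get("b"), cache.get("c"))
```

Unlike the system page cache, the size and eviction policy here are decided by the application, so other processes on the machine cannot push its entries out.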
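Redis is one of the most commonly used key-value stores, often serving as a database, a cache, or a message queue broker. Redis stores data in memory. However, to make sure data is not lost when the server fails, we often have to configure persistence for it, and that can lead to disk I/O performance problems.
Today I'll walk you through a case that uses Redis as a cache. It is again a Python Flask application. It provides an endpoint for querying the cache, but the endpoint's response time is long and does not meet the requirements of a production system.
Many thanks to Dong Guoxing (董国星), senior backend engineer in Ctrip's System R&D Department, for helping to provide today's case.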
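II. Case Preparation
1. System environment
This case is again based on Ubuntu 18.04, and it applies equally to other Linux systems. The environment I use is as follows:
Machine configuration: 2 CPUs, 8 GB of memory
Pre-installed tools: docker, sysstat, git, make, and so on, for example apt install docker.io sysstat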
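2. Service environment
Today's case consists of two parts, a Python application and Redis. The Python application is based on Flask. It uses Redis to manage the application's cache and exposes three HTTP endpoints:
/: returns hello redis.
/init/: inserts the specified number of cache entries, 5000 by default if no number is given. Cache keys start with the prefix uuid:, and each cache value is one of good, bad, or normal.
/get_cache/<type_name>: queries the cache entries that have the specified value and returns the processing time. The type_name parameter supports only good, bad, and normal (in other words, it finds the list of keys that share the same value).
Because several applications are involved, I packaged them into two Docker images and pushed them to Github to make them easier to run. You only need a few commands to start them.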
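3. Architecture diagram
Today's case needs two virtual machines. One is the target machine for the analysis; it runs the Flask application and its IP address is 192.168.0.10. The other acts as the client and sends requests to the cache query endpoint. I drew a diagram to show how they relate.
Next, open two terminals, SSH into the two virtual machines, and install the tools listed above on the first one.
As before, all commands in this case are assumed to run as root. If you logged in as a regular user, run sudo su root to switch to the root user.
That completes the preparation. Now let's move on to the hands-on part.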
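III. Case Analysis
First, in the first terminal, run the commands below to start the target application for this case. Under normal conditions you should see output like the following: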
[root@luoahong ~]# docker run --name=redis -itd -p 10000:80 feisky/redis-server
Unable to find image 'feisky/redis-server:latest' locally
latest: Pulling from feisky/redis-server
cd784148e348: Pull complete
48d4c7155ddc: Pull complete
6d908603dbe8: Pull complete
0b981e82e1e2: Pull complete
7074f4a1fd03: Pull complete
447ac2b250dc: Pull complete
b6d44ce71e94: Pull complete
Digest: sha256:a69d39256eb970ab0d87a70d53fa2666d0c32cbf68fb316ef016efd513806146
Status: Downloaded newer image for feisky/redis-server:latest
7a2773579d069d6f30cba4c769c7813104c29ade3b70fdb341bf635900c07150
[root@luoahong ~]# docker run --name=app --network=container:redis -itd feisky/redis-app
Unable to find image 'feisky/redis-app:latest' locally
latest: Pulling from feisky/redis-app
54f7e8ac135a: Already exists
d6341e30912f: Already exists
087a57faf949: Already exists
5d71636fb824: Already exists
0c1db9598990: Already exists
2eeb5ce9b924: Already exists
d3029c597b32: Pull complete
265a9c957eba: Pull complete
3bb7ae9463c5: Pull complete
b3198935e7ab: Pull complete
ca3ab58d03d9: Pull complete
Digest: sha256:ac281cbb37e35eccb880a58b75f244ef19a9a8704a43ae274c12e6576ed6082b
Status: Downloaded newer image for feisky/redis-app:latest
bd81cb48d9e6af842fb6f049f7c639ff0345ffdeb888eb61cf6a0487e3c5751
Then run docker ps to confirm that both containers are in the running (Up) state:
[root@luoahong ~]# docker ps
CONTAINER ID   IMAGE                 COMMAND                  CREATED              STATUS              PORTS                             NAMES
bd81cb48d9e6   feisky/redis-app      "python /app.py"         About a minute ago   Up About a minute                                     app
7a2773579d06   feisky/redis-server   "docker-entrypoint.s…"   2 minutes ago        Up 2 minutes        6379/tcp, 0.0.0.0:10000->80/tcp   redis
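Today's application listens on port 10000, so you can reach the three endpoints mentioned earlier at http://192.168.0.10:10000.
For example, switch to the second terminal and access the application's home page with curl. If you see hello redis in the output, the application has started correctly. (The captured output below uses 192.168.118.115 as the target address; substitute the IP of your own target machine.)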
[root@luoahong ~]# curl http://192.168.118.115:10000/
hello redis
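Next, call the /init endpoint to populate the Redis cache. The three runs below insert 5000, 10000, and 20000 entries: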
[root@luoahong ~]# curl http://192.168.118.115:10000/init/5000
{"elapsed_seconds":6.553544521331787,"keys_initialized":5000}
[root@luoahong ~]# curl http://192.168.118.115:10000/init/10000
{"elapsed_seconds":11.853578329086304,"keys_initialized":10000}
[root@luoahong ~]# curl http://192.168.118.115:10000/init/20000
{"elapsed_seconds":24.469749689102173,"keys_initialized":20000}
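To make these timings easier to compare, the short Python sketch below divides each elapsed time by the number of keys initialized. The JSON strings are copied from the three responses above; the helper name ms_per_key is my own.

```python
import json

# Response bodies copied from the three /init calls above
responses = [
    '{"elapsed_seconds":6.553544521331787,"keys_initialized":5000}',
    '{"elapsed_seconds":11.853578329086304,"keys_initialized":10000}',
    '{"elapsed_seconds":24.469749689102173,"keys_initialized":20000}',
]


def ms_per_key(raw):
    """Average time spent per initialized key, in milliseconds."""
    body = json.loads(raw)
    return body["elapsed_seconds"] * 1000 / body["keys_initialized"]


costs = [round(ms_per_key(raw), 2) for raw in responses]
print(costs)
```

The per-key cost stays between roughly 1.2 and 1.3 ms in all three runs, so the total time grows about linearly with the number of keys inserted.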
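Then call the cache query endpoint /get_cache. In the output below, whose beginning is cut off, the query for type good took about 13.9 seconds: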
[root@luoahong ~]# curl http://192.168.118.115:10000/get_cache
c110002","14f7eeb8-838d-11e9-af71-0242ac110002","1ebceac0-838d-11e9-af71-0242ac110002","198488ba-838d-11e9-af71-0242ac110002"], "elapsed_seconds":13.855976581573486,"type":"good"}
Next, check the disk I/O of the target machine with iostat:
[root@luoahong ~]# iostat -d -x 1
Linux 5.1.0-1.el7.elrepo.x86_64 (luoahong) 05/31/2019 _x86_64_ (2 CPU)

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
sda 4.85 262.58 0.00 0.09 15.77 54.16 585.42 1642.26 3.04 0.52 0.51 2.81 0.00 0.00 0.00 0.00 0.00 0.00 0.16 41.97

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 964.00 2442.00 0.00 0.00 0.37 2.53 0.00 0.00 0.00 0.00 0.00 0.00 0.03 75.50

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 934.00 2363.00 0.00 0.00 0.38 2.53 0.00 0.00 0.00 0.00 0.00 0.00 0.03 73.40

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 861.00 2179.00 0.00 0.00 0.40 2.53 0.00 0.00 0.00 0.00 0.00 0.00 0.03 71.60
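One way to read these numbers is to relate the write rate to the write throughput. The Python sketch below divides wkB/s by w/s for the per-second samples in which sda is doing only writes; the pairs are copied from the iostat output, and the function name avg_write_kb is my own.

```python
def avg_write_kb(writes_per_sec, write_kb_per_sec):
    """Average size of one write request in kB (wkB/s divided by w/s)."""
    if writes_per_sec == 0:
        return 0.0
    return write_kb_per_sec / writes_per_sec


# (w/s, wkB/s) pairs copied from the write-only iostat samples above
samples = [(964.00, 2442.00), (934.00, 2363.00), (859.00, 2177.00), (861.00, 2179.00)]
sizes = [round(avg_write_kb(w, kb), 2) for w, kb in samples]
print(sizes)
```

Each result matches the wareq-sz column (2.53 kB), which suggests the disk is busy with a large number of very small writes rather than a few large ones.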
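After the first sample (the average since boot), the disk shows roughly 860 to 960 writes per second and about 70% utilization, with no reads at all. Use pidstat to find the process responsible: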
[root@luoahong ~]# pidstat -d 1
06:33:54 PM UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
06:33:55 PM 100 9844 0.00 1200.00 0.00 44 redis-server
06:33:55 PM 0 10080 0.00 0.00 0.00 1 kworker/1:0-events_power_efficient

06:34:14 PM UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
06:34:15 PM 100 9844 0.00 1928.00 0.00 21 redis-server
The only process writing to disk is redis-server (PID 9844). Trace its system calls with strace:
[root@luoahong ~]# strace -f -T -tt -p 9844
[pid 9844] 18:36:14.677753 read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:b9a3a104-"..., 16384) = 61 <0.000526>
[pid 9844] 18:36:14.678486 read(3, 0x7fff49f5bdc7, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000531>
[pid 9844] 18:36:14.679259 write(8, "$4\r\ngood\r\n", 10) = 10 <0.000500>
[pid 9844] 18:36:14.680151 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 61, NULL, 8) = 1 <0.000353>
[pid 9844] 18:36:14.680711 read(8, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\nb9a"..., 16384) = 67 <0.000553>
[pid 9844] 18:36:14.681469 read(3, 0x7fff49f5bdc7, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000523>
[pid 9844] 18:36:14.682197 write(7, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\nb9a"..., 67) = 67 <0.000393>
[pid 9844] 18:36:14.682837 fdatasync(7) = 0 <0.014043>
[pid 9844] 18:36:14.697125 write(8, ":1\r\n", 4) = 4 <0.000339>
[pid 9844] 18:36:14.697776 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 43, NULL, 8) = 1 <0.000373>
[pid 9844] 18:36:14.698545 read(8, "*2\r\n$3\r\nGET\r\n$41\r\nuuid:a498b6dc-"..., 16384) = 61 <0.000366>
[pid 9844] 18:36:14.699243 read(3, 0x7fff49f5bdc7, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000329>
[pid 9844] 18:36:14.699922 write(8, "$4\r\ngood\r\n", 10) = 10 <0.000355>
[pid 9844] 18:36:14.700554 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 40, NULL, 8) = 1 <0.000345>
[pid 9844] 18:36:14.701020 read(8, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\na49"..., 16384) = 67 <0.000351>
[pid 9844] 18:36:14.701765 read(3, 0x7fff49f5bdc7, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000377>
[pid 9844] 18:36:14.702537 write(7, "*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\na49"..., 67) = 67 <0.000353>
[pid 9844] 18:36:14.703006 fdatasync(7) = 0 <0.003890>
[pid 9844] 18:36:14.707026 write(8, ":1\r\n", 4) = 4 <0.000383>
[pid 9844] 18:36:14.707684 epoll_pwait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 33, NULL, 8) = 1 <0.000314>
[pid 9844] 18:36:14.708436 read(8, ^Cstrace: Process 9844 detached
 <detached ...>
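The quoted strings in the read and write calls are Redis protocol (RESP) messages: an array header such as *2, followed by length-prefixed bulk strings. The Python sketch below decodes such a client command into its parts. Because strace truncates the payloads, the UUID used here is a hypothetical placeholder that only reuses the visible prefix; the function name parse_resp_command is also my own.

```python
from typing import List


def parse_resp_command(payload: bytes) -> List[str]:
    """Decode one RESP array of bulk strings (a client command) into its parts."""
    pos = 0

    def read_line() -> bytes:
        nonlocal pos
        end = payload.index(b"\r\n", pos)
        line = payload[pos:end]
        pos = end + 2
        return line

    header = read_line()
    if not header.startswith(b"*"):
        raise ValueError("expected a RESP array")
    parts = []
    for _ in range(int(header[1:])):
        size_line = read_line()
        if not size_line.startswith(b"$"):
            raise ValueError("expected a bulk string")
        size = int(size_line[1:])
        parts.append(payload[pos:pos + size].decode())
        pos += size + 2  # skip the data and its trailing CRLF
    return parts


# Hypothetical UUID: strace only shows the first few characters of the real one
FAKE_UUID = "b9a3a104-0000-0000-0000-000000000000"
get_payload = f"*2\r\n$3\r\nGET\r\n$41\r\nuuid:{FAKE_UUID}\r\n".encode()
sadd_payload = f"*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$36\r\n{FAKE_UUID}\r\n".encode()

print(parse_resp_command(get_payload))
print(parse_resp_command(sadd_payload))
```

The two payloads are 61 and 67 bytes long, which matches the return values of the read(8, ...) calls in the trace. Read this way, the trace is a repeating pair: a GET on a uuid: key, then an SADD of that key's UUID into the set named good.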
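The trace shows GET requests followed by SADD commands, and every SADD is written to file descriptor 7 and then followed by an fdatasync. Use lsof to see what these descriptors refer to: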
[root@luoahong ~]# lsof -p 9844
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 100
redis-ser 9844 100 cwd DIR 8,2 44 39165095 /data
lsof: no pwd entry for UID 100
redis-ser 9844 100 rtd DIR 0,42 6 39165093 /
lsof: no pwd entry for UID 100
redis-ser 9844 100 txt REG 0,42 8191592 100988460 /usr/local/bin/redis-server
lsof: no pwd entry for UID 100
redis-ser 9844 100 mem REG 8,2 100988460 /usr/local/bin/redis-server (stat: No such file or directory)
lsof: no pwd entry for UID 100
redis-ser 9844 100 mem REG 8,2 42154694 /etc/localtime (path inode=100763704)
lsof: no pwd entry for UID 100
redis-ser 9844 100 mem REG 8,2 69138393 /lib/ld-musl-x86_64.so.1 (stat: No such file or directory)
lsof: no pwd entry for UID 100
redis-ser 9844 100 0u CHR 136,0 0t0 3 /dev/pts/0
lsof: no pwd entry for UID 100
redis-ser 9844 100 1u CHR 136,0 0t0 3 /dev/pts/0
lsof: no pwd entry for UID 100
redis-ser 9844 100 2u CHR 136,0 0t0 3 /dev/pts/0
lsof: no pwd entry for UID 100
redis-ser 9844 100 3r FIFO 0,12 0t0 51238 pipe
lsof: no pwd entry for UID 100
redis-ser 9844 100 4w FIFO 0,12 0t0 51238 pipe
lsof: no pwd entry for UID 100
redis-ser 9844 100 5u a_inode 0,13 0 9881 [eventpoll]
lsof: no pwd entry for UID 100
redis-ser 9844 100 6u sock 0,9 0t0 51240 protocol: TCP
lsof: no pwd entry for UID 100
redis-ser 9844 100 7w REG 8,2 43817347 39165098 /data/appendonly.aof
lsof: no pwd entry for UID 100
redis-ser 9844 100 8u sock 0,9 0t0 52244 protocol: TCP
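Descriptor 7 is /data/appendonly.aof, the Redis append-only persistence file, and descriptors 6 and 8 are TCP sockets. Tracing only fdatasync shows how frequently the file is being synced to disk: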
[root@luoahong ~]# strace -f -p 9844 -T -tt -e fdatasync
strace: Process 9844 attached with 4 threads
[pid 9844] 18:43:47.338221 fdatasync(7) = 0 <0.001130>
[pid 9844] 18:43:47.346831 fdatasync(7) = 0 <0.001628>
[pid 9844] 18:43:47.364249 fdatasync(7) = 0 <0.004883>
[pid 9844] 18:43:47.379078 fdatasync(7) = 0 <0.001171>
[pid 9844] 18:43:47.386578 fdatasync(7) = 0 <0.002402>
[pid 9844] 18:43:47.393262 fdatasync(7) = 0 <0.001068>
[pid 9844] 18:43:47.399585 fdatasync(7) = 0 <0.001528>
[pid 9844] 18:43:47.410446 fdatasync(7) = 0 <0.008607>
[pid 9844] 18:43:47.433699 fdatasync(7) = 0 <0.001223>
[pid 9844] 18:43:47.442298 fdatasync(7) = 0 <0.002179>
[pid 9844] 18:43:47.451013 fdatasync(7) = 0 <0.001091>
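To put a number on "frequent", the following Python sketch parses the strace lines above and computes the average gap between fdatasync calls and their average duration. The regular expression and the summarize function are my own illustrative helpers, and they assume all timestamps fall on the same day.

```python
import re
from datetime import datetime

# The eleven fdatasync lines copied from the strace output above
STRACE_LINES = """\
[pid 9844] 18:43:47.338221 fdatasync(7) = 0 <0.001130>
[pid 9844] 18:43:47.346831 fdatasync(7) = 0 <0.001628>
[pid 9844] 18:43:47.364249 fdatasync(7) = 0 <0.004883>
[pid 9844] 18:43:47.379078 fdatasync(7) = 0 <0.001171>
[pid 9844] 18:43:47.386578 fdatasync(7) = 0 <0.002402>
[pid 9844] 18:43:47.393262 fdatasync(7) = 0 <0.001068>
[pid 9844] 18:43:47.399585 fdatasync(7) = 0 <0.001528>
[pid 9844] 18:43:47.410446 fdatasync(7) = 0 <0.008607>
[pid 9844] 18:43:47.433699 fdatasync(7) = 0 <0.001223>
[pid 9844] 18:43:47.442298 fdatasync(7) = 0 <0.002179>
[pid 9844] 18:43:47.451013 fdatasync(7) = 0 <0.001091>
""".splitlines()

PATTERN = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{6}) fdatasync\(\d+\) = 0 <([\d.]+)>")


def summarize(lines):
    """Return call count, average gap between calls, and average call duration."""
    stamps, durations = [], []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
            durations.append(float(match.group(2)))
    span = (stamps[-1] - stamps[0]).total_seconds()
    return {
        "calls": len(stamps),
        "avg_interval_ms": span / (len(stamps) - 1) * 1000,
        "avg_duration_ms": sum(durations) / len(durations) * 1000,
    }


summary = summarize(STRACE_LINES)
print(summary)
```

For this sample the sketch reports 11 calls about 11.3 ms apart on average, each taking about 2.4 ms, which works out to somewhere near 90 syncs per second.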
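Descriptor 8 is a TCP socket. To find out what is on the other end of it, enter the network namespace of the app container with nsenter and list its connections with lsof -i: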
[root@luoahong ~]# PID=$(docker inspect --format {{.State.Pid}} app)
[root@luoahong ~]# nsenter --target $PID --net -- lsof -i
lsof: no pwd entry for UID 100
lsof: no pwd entry for UID 100
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 100
redis-ser 9844 100 6u IPv4 51240 0t0 TCP localhost:6379 (LISTEN)
lsof: no pwd entry for UID 100
redis-ser 9844 100 8u IPv4 52244 0t0 TCP localhost:6379->localhost:54074 (ESTABLISHED)
python 9953 root 3u IPv4 52217 0t0 TCP *:http (LISTEN)
python 9953 root 4u IPv4 194915 0t0 TCP luoahong:http->192.168.118.115:60266 (ESTABLISHED)
python 9953 root 5u IPv4 53526 0t0 TCP localhost:54074->localhost:6379 (ESTABLISHED)
If the nsenter command is not available on your system, you can install it with the following command:
[root@luoahong ~]# docker run --rm -v /usr/local/bin:/target jpetazzo/nsenter
Unable to find image 'jpetazzo/nsenter:latest' locally
latest: Pulling from jpetazzo/nsenter
ff4229790957: Pull complete
c6e9de17d69e: Pull complete
Digest: sha256:e1722e1503b24eb17daa5cb530766bac840c064c5b065861616eec825ef9953b
Status: Downloaded newer image for jpetazzo/nsenter:latest
Installing nsenter to /target
Installing docker-enter to /target
Installing importenv to /targe
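Change Redis's appendfsync setting to everysec, so that the append-only file is synced about once per second rather than after every write command as seen in the strace output: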
[root@luoahong ~]# docker exec -it redis redis-cli config set appendfsync everysec
OK
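To show why this setting matters, here is a toy Python model of an append-only log with two sync policies named after the Redis options. It is only a sketch for counting sync calls and is not how Redis implements persistence: the AppendLog class, the fake millisecond clock, and the use of os.fsync (instead of fdatasync, for portability) are all my own assumptions.

```python
import os
import tempfile


class AppendLog:
    """Toy append-only log that syncs on every write or at most once a second."""

    def __init__(self, path, policy, clock_ms):
        if policy not in ("always", "everysec"):
            raise ValueError("unknown policy: " + policy)
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.policy = policy
        self.clock_ms = clock_ms
        self.last_sync_ms = clock_ms()
        self.sync_count = 0

    def append(self, record):
        os.write(self.fd, record)
        now = self.clock_ms()
        # "always" syncs after each write; "everysec" waits for 1000 ms to pass
        if self.policy == "always" or now - self.last_sync_ms >= 1000:
            os.fsync(self.fd)
            self.sync_count += 1
            self.last_sync_ms = now

    def close(self):
        os.close(self.fd)


def make_fake_clock(step_ms):
    """Deterministic clock that advances by step_ms on every call."""
    now = 0

    def tick():
        nonlocal now
        now += step_ms
        return now

    return tick


def count_syncs(policy, writes=300, step_ms=10):
    """Append `writes` records spaced step_ms apart and count the sync calls."""
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendLog(os.path.join(tmp, "toy.aof"), policy, make_fake_clock(step_ms))
        try:
            for _ in range(writes):
                log.append(b"*3\r\n$4\r\nSADD\r\n$4\r\ngood\r\n$2\r\nk1\r\n")
        finally:
            log.close()
        return log.sync_count


results = {policy: count_syncs(policy) for policy in ("always", "everysec")}
print(results)
```

With 300 writes spread over three simulated seconds, the always policy syncs 300 times and the everysec policy only 3 times. The trade-off is that with everysec, writes from roughly the last second can be lost if the server crashes.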
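The handler for the /get_cache endpoint in the application's source code is shown below. Note that it issues an SADD, which is a write, for every matching key, even though the endpoint is only a query: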
def get_cache(type_name):
    '''handler for /get_cache'''
    # Walk every cached key and read its value from Redis
    for key in redis_client.scan_iter("uuid:*"):
        value = redis_client.get(key)
        if value == type_name:
            # Write: add the key (minus its "uuid:" prefix) to a temporary set
            redis_client.sadd(type_name, key[5:])
    # Read the temporary set back, then delete it
    data = list(redis_client.smembers(type_name))
    redis_client.delete(type_name)
    return jsonify({"type": type_name, 'count': len(data), 'data': data})
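One possible way to avoid those writes is to collect the matching keys in the application's own memory instead of in a temporary Redis set. The sketch below is my own illustration, not the application's code: get_cache_data is a hypothetical function name, FakeRedis is a hypothetical in-memory stand-in that offers only scan_iter() and get(), and values are assumed to come back as str, as the comparison in the original handler implies.

```python
import fnmatch


def get_cache_data(redis_client, type_name):
    """Collect matching keys in application memory; only reads reach Redis."""
    matched = []
    for key in redis_client.scan_iter("uuid:*"):
        if redis_client.get(key) == type_name:
            matched.append(key[5:])  # strip the "uuid:" prefix
    return {"type": type_name, "count": len(matched), "data": matched}


class FakeRedis:
    """Hypothetical stand-in for a Redis client, used only to run the sketch."""

    def __init__(self, data):
        self._data = dict(data)

    def scan_iter(self, pattern):
        # Yield the keys that match a glob-style pattern such as "uuid:*"
        return (k for k in self._data if fnmatch.fnmatchcase(k, pattern))

    def get(self, key):
        return self._data.get(key)


client = FakeRedis({
    "uuid:a1": "good",
    "uuid:b2": "bad",
    "uuid:c3": "good",
    "other": "good",   # ignored: it does not match the "uuid:*" pattern
})
result = get_cache_data(client, "good")
print(result)
```

Because this version never calls SADD or DELETE, a query would add nothing to the append-only file, so it should not trigger a disk sync regardless of the appendfsync setting.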