A news item
The news item reads as follows:
PHP engineer runs redis keys * and brings the database down
A company's technical department had two P0-level major incidents this year, costing the company 4 million in losses. The cause was as follows:
A PHP engineer operated directly on the production Redis and ran a command like
keys * wxdb(omitted here)cf8*
This locked up Redis and made CPU usage surge, which stalled every payment link. When the command finished ten-odd seconds later, all of the request traffic piled onto the RDS database, setting off an avalanche effect, and the database went down.
The company said that anyone who causes a similar incident again will be dismissed outright, and that it will gradually take back the operations department's permissions afterwards.
Main text
An iron law
In the industry, Redis development guidelines include the following iron law:
Never run KEYS pattern matching against a production Redis
Yet, as we all know, rules like this get forgotten, so the accidents keep happening.
This article explains why running a pattern-matching operation on a production instance causes a cache avalanche and eventually brings the database down.
Analyze the reasons
OK, a few basics first:
1. Redis executes commands on a single thread, so every command is atomic and concurrency cannot corrupt the data.
2. Time-consuming Redis commands are dangerous: they tie up the only thread for a long time, so every other request slows down. (For example, the KEYS command has O(N) time complexity and is strictly forbidden in production.)
With these two points as groundwork, the cause is obvious.
(1) An operations engineer runs keys *. The operation is time-consuming, and because Redis is single-threaded, Redis is blocked while it runs.
(2) QPS is fairly high at this moment, and tens of thousands more Redis read and write requests arrive. Because Redis is blocked, they all hang.
(3) With so many requests hanging, CPU usage soars and the Redis host server goes down.
(4) None of the threads can get any data from Redis, so they all go to the database at the same moment, and the database goes down.
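Steps (1) and (2) can be illustrated with a small simulation. This is only a sketch: it models a single worker that runs queued commands one after another, and the command costs (100 microseconds for a GET, ten seconds for the slow command) are made-up numbers chosen to resemble the incident, not measurements.

```python
from typing import List, Tuple


def simulate_single_thread(commands: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (name, latency_us) for commands that all arrive at t=0
    and are executed strictly one at a time by a single worker."""
    clock_us = 0
    latencies = []
    for name, cost_us in commands:
        clock_us += cost_us                  # the worker is busy for cost_us
        latencies.append((name, clock_us))   # latency = time waiting + own cost
    return latencies


# Hypothetical costs in microseconds (assumptions, not benchmarks).
fast_only = [("GET", 100)] * 1000
with_keys = [("KEYS *", 10_000_000)] + fast_only

worst_fast = max(latency for _, latency in simulate_single_thread(fast_only))
worst_blocked = max(latency for _, latency in simulate_single_thread(with_keys))

print("worst latency without the slow command (us):", worst_fast)
print("worst latency behind the slow command (us):", worst_blocked)
```

In this model every cheap request queued behind the slow command waits for the full ten seconds before it is served, which is the "everything hangs" behavior described in step (2).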
Note that keys * is not the only dangerous command. The following are dangerous too:
FLUSHDB deletes every key in the current database
FLUSHALL deletes all data on the Redis server (every key in every database)
CONFIG lets a connected client change the server configuration
Therefore, a competent Redis operations engineer or developer should know how to disable the commands above. That is why I have always thought that a situation like the one in the news generally comes down to the skill level of the staff.
So how do we disable these commands?
In redis.conf, under the SECURITY section, add the following directives:
rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command CONFIG ""
rename-command KEYS ""
In addition, for the FLUSHALL command you need to set appendonly no in the configuration file; otherwise the server will fail to start.
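To confirm that a configuration file really disables all four commands, a small audit script can help. The sketch below is an illustration, not an official tool: the function name disabled_commands and the sample text are made up, and it only recognizes the rename-command NAME "" form shown above.

```python
import shlex


def disabled_commands(conf_text: str) -> set:
    """Return the commands that are renamed to the empty string,
    i.e. disabled, in the given redis.conf text."""
    disabled = set()
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        try:
            parts = shlex.split(line)  # keeps "" as an empty token
        except ValueError:
            continue  # line with unbalanced quotes: ignore it in this sketch
        if len(parts) == 3 and parts[0].lower() == "rename-command" and parts[2] == "":
            disabled.add(parts[1].upper())
    return disabled


# Sample text mirroring the directives above (hypothetical file contents).
SAMPLE_CONF = '''
# SECURITY
rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command CONFIG ""
rename-command KEYS ""
appendonly no
'''

REQUIRED = {"FLUSHALL", "FLUSHDB", "CONFIG", "KEYS"}
missing = REQUIRED - disabled_commands(SAMPLE_CONF)
print("commands still enabled:", sorted(missing))
```

A check like this could run in a deployment pipeline so that a forgotten directive is caught before the instance goes live.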
Note that the list above may be incomplete; check the official documentation. Beyond security-sensitive commands such as FLUSHDB, these incidents show that any command whose cost grows with the size of the data, roughly O(N), must be used with care in production. Commands such as hgetall, lrange, smembers, zrange and sinter are not forbidden, but they all run in roughly O(N) time, so you need to know the value of N before using them; otherwise the cache can go down.
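One way to keep N under control is to read a large list in fixed-size pages instead of in a single call. The sketch below assumes a client object with llen(key) and lrange(key, start, stop) methods, as the redis-py client provides; the helper names and the page size are illustrative. It also assumes the list does not change while it is being read.

```python
from typing import Iterator, Tuple


def lrange_pages(length: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs covering a list of the given length.
    Both ends are inclusive, matching the LRANGE convention."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, length, page_size):
        stop = min(start + page_size, length) - 1
        yield (start, stop)


def read_list_in_pages(client, key: str, page_size: int = 100):
    """Yield every element of a list, fetching at most page_size items
    per call so that no single command has a large N."""
    length = client.llen(key)  # LLEN is O(1)
    for start, stop in lrange_pages(length, page_size):
        for item in client.lrange(key, start, stop):
            yield item


print(list(lrange_pages(10, 4)))
```

Each individual call now touches at most page_size elements, so other requests can be served between pages.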
Suggestions for improvement
The industry recommends using the SCAN family of commands (SCAN for keys, SSCAN for sets) in place of KEYS and SMEMBERS.
Redis 2.8 and later provide the SCAN command, which walks through the records in batches. This does increase the total time spent on the whole query, but it keeps the Redis service from stalling and affecting the applications that use it.
For usage details, see this document: http://doc.redisfans.com/key/scan.html
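As a rough sketch of the cursor loop that SCAN requires: the function below assumes a client whose scan(cursor, match, count) method returns a (next_cursor, keys) pair, which is how the redis-py client behaves. The InMemoryClient class is only a stand-in so the example runs without a server, and the key names are made up. Note that a real SCAN may return the same key more than once and may return empty batches before the cursor comes back to 0.

```python
import fnmatch


def scan_keys(client, pattern: str, count: int = 100):
    """Yield keys matching pattern, one small batch per SCAN call,
    instead of blocking the server with a single KEYS call."""
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        for key in keys:
            yield key
        if cursor == 0:  # the server signals completion with cursor 0
            break


class InMemoryClient:
    """Stand-in for a Redis client (an assumption for this example only)."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, match=None, count=10):
        batch = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0
        if match is not None:
            # fnmatch only approximates Redis glob-style matching
            batch = [k for k in batch if fnmatch.fnmatchcase(k, match)]
        return next_cursor, batch


client = InMemoryClient(["other:1"] + ["user:%d" % i for i in range(1, 6)])
found = list(scan_keys(client, "user:*", count=2))
print(found)
```

With redis-py the same loop is available as the built-in scan_iter method; the explicit version is shown here only to make the cursor handling visible.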