How to efficiently insert large amounts of data into Redis

https://www.cnblogs.com/ivictor/p/5446503.html



Recently, a friend in our group asked: he has a log file with one IP address per line, and wants to know how to import these IPs into Redis quickly.

My initial suggestion was a shell loop plus the Redis command-line client.

Today I checked the official Redis documentation and found that its home page (http://www.redis.io/documentation) has a dedicated topic, "Redis Mass Insertion", and realized that my suggestion was rather naive.

The official reasons are as follows:

Using a normal Redis client to perform mass insertion is not a good idea for a few reasons: the naive approach of sending one command after the other is slow because you have to pay for the round trip time for every command. It is possible to use pipelining, but for mass insertion of many records you need to write new commands while you read replies at the same time to make sure you are inserting as fast as possible.

Only a small percentage of clients support non-blocking I/O, and not all the clients are able to parse the replies in an efficient way in order to maximize throughput. For all this reasons the preferred way to mass import data into Redis is to generate a text file containing the Redis protocol, in raw format, in order to call the commands needed to insert the required data.

Roughly, this means:

1> Each command sent from a Redis client pays a network round-trip delay.

2> Only a small percentage of clients support non-blocking I/O.

My understanding: there is unavoidable latency between issuing a Redis command and getting its reply back, so even running several clients and inserting concurrently does little to raise throughput, because only a limited number of connections can realistically be driven with non-blocking I/O.
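To make the round-trip point concrete, here is a rough sketch of the "ordinary client" approach and client-side pipelining in Python. It assumes the third-party redis-py package and a local Redis on the default port; even the pipelined version buffers all commands in the client first, which is part of why the documentation recommends feeding the raw protocol instead.

import redis  # third-party redis-py client, assumed installed

r = redis.Redis(host='127.0.0.1', port=6379)

# Naive approach: one network round trip per command.
for i in range(1000):
    r.set('name%d' % i, 'helloworld')

# Client-side pipelining: commands are buffered and flushed in one batch,
# so the round-trip cost is amortized over many commands.
pipe = r.pipeline(transaction=False)
for i in range(1000):
    pipe.set('name%d' % i, 'helloworld')
pipe.execute()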

 

So how do we insert data efficiently?

Redis 2.6 introduced a new feature for exactly this, pipe mode, which feeds a text file written in the Redis protocol directly to the server through a pipe.

Without further ado, the concrete steps are as follows:

1. Create a new text file containing redis commands

SET Key0 Value0
SET Key1 Value1
...
SET KeyN ValueN

If you already have the raw data, building this file with shell or Python is straightforward.
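For the IP-log scenario that started this discussion, a few lines of Python are enough to turn the log into such a file. This is only a sketch: the input file name ip.log, the output file name, and the choice of storing the IPs in a set called "ips" are all assumptions.

#!/usr/bin/env python
# Sketch: read one IP per line from ip.log and write one Redis command per line.
# Here each IP is added to a set named "ips"; adjust the command and key to taste.
with open('ip.log') as src, open('redis_commands.txt', 'w') as dst:
    for line in src:
        ip = line.strip()
        if ip:
            dst.write('SADD ips %s\n' % ip)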

2. Convert these commands into Redis Protocol.

This is because pipe mode expects input in the raw Redis protocol (RESP), not plain Redis commands.

For how to convert them, see the script below.

3. Insertion with pipe

cat data.txt | redis-cli --pipe

 

Shell VS Redis pipe

The test below compares the efficiency of bulk import via a shell script against Redis pipe.

Test approach: insert the same 100,000 records into Redis using a shell script and using Redis pipe, and measure how long each takes.

 

Shell

The script is as follows:

#!/bin/bash
for ((i=0;i<100000;i++))
do
    echo -en "helloworld" | redis-cli -x set name$i >> redis.log
done

Every insert writes the same value, helloworld, but with a different key: name0, name1, ..., name99999. Note that each iteration forks a new redis-cli process and opens a new connection, on top of paying the per-command round trip.

 

Redis pipe

Redis pipe takes a little more work.

1> First, build the text file of Redis commands

Here I chose Python:

#!/usr/bin/python
# Python 2: print one SET command per line
for i in range(100000):
    print 'set name'+str(i),'helloworld'

# python 1.py > redis_commands.txt

# head -2 redis_commands.txt 

set name0 helloworld
set name1 helloworld

2> Convert these commands into the Redis protocol

Here I used a shell script from GitHub:

#!/bin/bash

while read CMD; do
  # each command begins with *{number arguments in command}\r\n
  XS=($CMD); printf "*${#XS[@]}\r\n"
  # for each argument, we append ${length}\r\n{argument}\r\n
  for X in $CMD; do printf "\$${#X}\r\n$X\r\n"; done
done < redis_commands.txt

# sh 20.sh > redis_data.txt

# head -7 redis_data.txt 

*3
$3
set
$5
name0
$10
helloworld

At this point, the data file is fully constructed.

 

Test results

The results: the elapsed times of the two approaches are not even on the same order of magnitude.

 

Finally, let's look at how pipe mode is implemented. From the official documentation:

  • redis-cli --pipe tries to send data as fast as possible to the server.
  • At the same time it reads data when available, trying to parse it.
  • Once there is no more data to read from stdin, it sends a special ECHO command with a random 20 bytes string: we are sure this is the latest command sent, and we are sure we can match the reply checking if we receive the same 20 bytes as a bulk reply.
  • Once this special final command is sent, the code receiving replies starts to match replies with this 20 bytes. When the matching reply is reached it can exit with success.

In other words, redis-cli sends the data to the Redis server as fast as it can, while reading and parsing replies as they become available. Once everything has been read from stdin, it sends an ECHO command carrying a random 20-byte string; when the server echoes those same 20 bytes back, the client knows all of the data has been processed.
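The snippet below is not the redis-cli source, just a sketch of the same idea at the protocol level, written for Python 3 against a local Redis on the default port: stream the raw protocol, append an ECHO with a random 20-byte token, and read until that token comes back.

import os
import socket

def to_resp(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = b'*%d\r\n' % len(args)
    for a in args:
        a = a.encode() if isinstance(a, str) else a
        out += b'$%d\r\n%s\r\n' % (len(a), a)
    return out

token = os.urandom(10).hex().encode()  # a random 20-byte ASCII string, as redis-cli uses

with socket.create_connection(('127.0.0.1', 6379)) as sock:
    payload = b''.join(to_resp('SET', 'name%d' % i, 'helloworld') for i in range(1000))
    sock.sendall(payload + to_resp('ECHO', token))
    buf = b''
    while token not in buf:  # crude: just scan the reply stream for the echoed token
        buf += sock.recv(4096)

print('all commands acknowledged')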

 

Summary:

Some readers later asked how long it takes to build the command file and to convert it into the protocol, so here are those timings as well:

[root@mysql-server1 ~]# time python 1.py > redis_commands.txt

real    0m0.110s
user    0m0.070s
sys    0m0.040s
[root@mysql-server1 ~]# time sh 20.sh > redis_data.txt

real    0m7.112s
user    0m5.861s
sys    0m1.255s
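As the numbers show, almost all of the preparation time goes into the shell conversion step. One way around that, which is also what the official mass-insert document suggests, is to emit the protocol directly instead of generating commands and converting them afterwards. A minimal sketch (Python 3 this time, unlike the Python 2 script above; the keys and values match the test data):

#!/usr/bin/env python3
# Sketch: write the raw Redis protocol (RESP) directly, so no conversion step is needed.
import sys

def gen_resp(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = ['*%d\r\n' % len(args)]
    for arg in args:
        parts.append('$%d\r\n%s\r\n' % (len(arg), arg))
    return ''.join(parts)

for i in range(100000):
    sys.stdout.write(gen_resp('SET', 'name%d' % i, 'helloworld'))

Its output can be piped straight into redis-cli --pipe. Note that the $ length in RESP is a byte count, so for non-ASCII values the string should be encoded first and its byte length used.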

 

References:

1. http://www.redis.io/topics/mass-insert

2. https://gist.github.com/abtrout/432ce44fa77a9620c739

3. http://blog.chinaunix.net/uid-26284395-id-3124337.html

