Quickly Importing Millions of Rows of MySQL Data into Redis

Foreword

As a system runs, its data volume keeps growing, and keeping everything in MySQL alone can no longer satisfy the query requirements. At that point we introduce Redis as a query-cache layer: hot business data is saved in Redis, extending the service capability of the traditional relational database. Applications fetch common data directly from Redis, or use Redis to hold active user sessions in interactive applications, which greatly reduces the load on the back-end relational database and improves the user experience.

Shortcomings of conventional commands

Using conventional redis client commands has the following disadvantages in large-scale import scenarios:

Redis executes commands on a single thread. Although this avoids the cost of thread switching and each individual command runs quickly, in a bulk-import scenario the time spent sending each command (t1) and receiving its response (t2) is amplified.

To import 1,000,000 records, command execution alone takes 1,000,000 * (t1 + t2).

(figure: per-command round trip - send time t1 plus response time t2)


Of course, Redis's design anticipates the problem of sending commands one by one, so there is pipelining mode.

(figure: pipelining - a batch of commands sent together, responses returned as a batch)

But pipelining is not a command-line feature: we have to write new code to receive the batched responses, and only a few clients support this well - the php-redis extension, for example, does not support asynchronous operation.

Pipelining essentially reduces round-trip time on the TCP connection: a batch of commands is sent, and once the batch finishes, the results come back in one go.

It relies on the connection's FIFO (First In, First Out) ordering to guarantee that replies come back in the same order as the commands.

Only a small portion of clients support non-blocking I/O, and not all clients can parse replies efficiently enough to maximize throughput.

For these reasons, the preferred way to import vast amounts of data is to generate a file in Redis protocol format and transmit the whole batch at once.

Warm-up: importing data into Redis

Importing data with the nc command

nc is short for netcat. Its main uses are:

(1) listening on an arbitrary TCP/UDP port: with the -l flag, nc acts as a TCP or UDP server on the specified port

(2) port scanning: nc can act as a client and initiate TCP or UDP connections

(3) transferring files between machines

(4) measuring network speed between machines

(figure: importing data into Redis through nc)
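As a sketch of that idea, the snippet below builds a small file of RESP-encoded commands that could then be fed to Redis through nc. The host, port, file name, and the key/value are my assumptions; the nc line needs a running Redis and is therefore commented out.

```shell
# Build one RESP-encoded SET command ("book" -> "hello" are made-up values).
printf '*3\r\n$3\r\nSET\r\n$4\r\nbook\r\n$5\r\nhello\r\n' > redis_commands.txt

# Against a live server the file could be sent like this:
# cat redis_commands.txt | nc -q 1 127.0.0.1 6379

# The file is exactly 34 bytes: 3 arguments, each prefixed with its length.
wc -c < redis_commands.txt
```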

Importing data with pipe mode

However, nc is not a very reliable way to perform a large-scale import, because netcat does not really know when all the data has been transferred and cannot check for errors. In Redis 2.6 and later, redis-cli supports a new mode called pipe mode, designed precisely for large-scale insertion.

Pipe mode is used as follows:

(figure: output of a pipe-mode import run)

As the output above shows, pipe mode returns a summary: replies is the number of responses, one per command line in the txt file, and errors is the number of commands that failed during execution.
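A minimal sketch of that flow, assuming redis-cli is on the PATH. The key and field names are invented; the actual pipe invocation needs a live server, so it is shown as a comment.

```shell
# Generate RESP for three HSET commands into data.txt.
: > data.txt
for i in 1 2 3; do
  f="book$i"; v="desc$i"
  printf '*4\r\n$4\r\nHSET\r\n$2\r\nid\r\n$%d\r\n%s\r\n$%d\r\n%s\r\n' \
    "${#f}" "$f" "${#v}" "$v" >> data.txt
done

# With a running Redis, pipe mode would print a summary such as
# "errors: 0, replies: 3":
# cat data.txt | redis-cli --pipe

# One HSET line per command was generated:
grep -c HSET data.txt
```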

Learning the Redis protocol

The protocol format is as follows:

*<number of arguments>\r\n
$<number of bytes of argument 1>\r\n
<argument 1 data>\r\n
...
$<number of bytes of argument N>\r\n
<argument N data>\r\n

For example, to insert one hash-type entry:

HSET  id  book1  book_description1

According to the protocol, this command has four parts in total, so it begins with *4. The rest is explained below:

content              length   protocol
HSET                 4        $4
id                   2        $2
book1                5        $5
book_description1    17       $17

Note: the HSET command itself is also sent as one of the protocol's arguments.

The constructed protocol string:

*4\r\n$4\r\nHSET\r\n$2\r\nid\r\n$5\r\nbook1\r\n$17\r\nbook_description1\r\n

Formatted for readability:

*4\r\n
$4\r\n
HSET\r\n
$2\r\n
id\r\n
$5\r\n
book1\r\n
$17\r\n
book_description1\r\n
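The construction above can be generalized. Below is a small shell function (the name resp_encode is mine) that RESP-encodes any command; note that ${#arg} counts characters, which equals bytes only for single-byte text.

```shell
# RESP-encode a command: one *<argc> header, then $<len>\r\n<arg>\r\n per argument.
resp_encode() {
  printf '*%d\r\n' "$#"
  for arg in "$@"; do
    printf '$%d\r\n%s\r\n' "${#arg}" "$arg"
  done
}

# Reproduces the HSET example above byte for byte:
resp_encode HSET id book1 book_description1 > cmd.bin
```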

The RESP protocol and bulk insertion

Redis clients communicate with the Redis server using a protocol called RESP (REdis Serialization Protocol).

redis-cli's pipe mode needs to be as fast as the nc approach, while solving nc's problem of not knowing when the commands end.

While sending the data, it simultaneously reads the responses and tries to parse them.

Once no more data can be read from the input stream, it sends a special ECHO command carrying a random 20-byte payload, marking that the last command has been sent. If that same payload is then matched in the response stream, the batch transfer succeeded.

With this trick, we do not need to parse the protocol we send to the server to know how many commands went out; we only need to parse the replies.

While parsing the replies, redis-cli counts them, so at the end it can tell the user how many commands the mass-insert session transferred to the server. That is exactly the summary we saw in the pipe-mode run above.
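The sentinel redis-cli sends is an ordinary ECHO command. A sketch of what that final command looks like on the wire; the payload here is fixed for illustration, whereas the real one is 20 random bytes:

```shell
# Encode ECHO <20-byte payload> in RESP. redis-cli --pipe appends a command
# of this shape after the input stream is exhausted, then scans the replies
# for the same payload.
payload="abcdefghij0123456789"   # exactly 20 bytes (illustrative, not random)
printf '*2\r\n$4\r\nECHO\r\n$20\r\n%s\r\n' "$payload" > sentinel.bin
wc -c < sentinel.bin
```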

Switching the input source to MySQL

In the example above, we used a txt file as the input source and imported it with pipe mode.

Based on our understanding of the protocol, we just need to format the data in MySQL according to it and import it into Redis through pipe mode.

A real case: importing a million rows from MySQL into Redis

Generating the data first

Due to environment constraints, real data is not used here; instead, a stored procedure generates one million rows. The stored procedure is as follows:

DELIMITER $$
USE `cb_mon`$$

DROP PROCEDURE IF EXISTS `test_insert`$$
CREATE DEFINER=`root`@`%` PROCEDURE `test_insert`()
BEGIN

        DECLARE i INT DEFAULT 1;
        WHILE i<= 1000000
            DO
            INSERT INTO t_book(id,number,NAME,descrition)
            VALUES (i, CONCAT("00000",i) , CONCAT('book',i)
            , CONCAT('book_description',i));    
            SET i=i+1;
        END WHILE ;
        COMMIT;
    END$$

DELIMITER ;

Call the stored procedure:

 CALL test_insert();

Check the table data:

Constructing the query according to the protocol

Following the Redis protocol above, we use the following SQL to construct the protocol data:

SELECT
  CONCAT(
    "*4\r\n",
    "$",
    LENGTH(redis_cmd),
    "\r\n",
    redis_cmd,
    "\r\n",
    "$",
    LENGTH(redis_key),
    "\r\n",
    redis_key,
    "\r\n",
    "$",
    LENGTH(hkey),
    "\r\n",
    hkey,
    "\r\n",
    "$",
    LENGTH(hval),
    "\r\n",
    hval,
    "\r"
  )
FROM
  (SELECT
    "HSET" AS redis_cmd,
    id AS redis_key,
    NAME AS hkey,
    descrition AS hval
  FROM
    cb_mon.t_book
  ) AS t limit 1000000

Save the output to a file named redis.sql. (Note the trailing "\r" in the CONCAT: mysql appends a newline "\n" to each output row, which completes the final "\r\n".)

Writing a script to import into Redis using pipe mode

Write a shell script. Since my Redis and MySQL instances run in docker on the host, the following script is for reference:


#!/bin/bash

starttime=`date +'%Y-%m-%d %H:%M:%S'`

docker exec -i 899fe01d4dbc mysql --default-character-set=utf8 \
  --skip-column-names --raw < ./redis.sql \
  | docker exec -i 4c90ef506acd redis-cli --pipe

endtime=`date +'%Y-%m-%d %H:%M:%S'`
start_seconds=$(date --date="$starttime" +%s)
end_seconds=$(date --date="$endtime" +%s)

echo "script took: $((end_seconds-start_seconds))s"

Execution screenshot:

(figure: script output)

As you can see, importing a million records into Redis took only 7 seconds - very efficient.

Caveats

If the MySQL table is particularly large, consider importing in batches or splitting the table; otherwise the import may fail with:

lost connection to mysql server during query

Because of the max_allowed_packet limit and timeouts, the connection can drop while the query streams its data, so when the table holds a particularly large amount of data, you need to paginate the query or split the table for the import.
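One way to batch, sketched below: append a "LIMIT offset, size" clause to the export query and loop over offsets. The chunk size, the idea of a parameterized redis_chunk.sql, and the container-free command form are my assumptions; the mysql/redis-cli pipeline needs live servers and is commented out.

```shell
# Import 1,000,000 rows in chunks of 100,000 to stay under
# max_allowed_packet / timeout limits.
CHUNK=100000
TOTAL=1000000
offset=0
batches=0
while [ "$offset" -lt "$TOTAL" ]; do
  # Each iteration would run the export SQL with "LIMIT $offset, $CHUNK"
  # appended, piping straight into Redis:
  # mysql --skip-column-names --raw cb_mon < redis_chunk.sql | redis-cli --pipe
  batches=$((batches + 1))
  offset=$((offset + CHUNK))
done
echo "$batches batches"
```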

Summary

This article focused on how to efficiently migrate MySQL data on the order of a million rows into Redis. Working toward that goal step by step, it covered the following points:

  1. Redis runs commands on a single thread, which avoids the cost of thread switching, but when importing data in bulk, the latency of sending commands and receiving responses cannot be ignored.

  2. The networking use cases of the nc command, and its shortcomings for importing data.

  3. Understanding and applying the Redis RESP protocol.

  4. A practical case of quickly importing a million rows of MySQL data into Redis.




Origin blog.51cto.com/9740301/2470534