MySQL Bulk Inserts: Analysis and Notes

Table of Contents

  1. Background

  2. Comparing the two approaches

    2.1 Inserting one row at a time

    2.2 Inserting multiple rows at once

  3. Going further

  4. Other notes

1. Background

  At work we regularly run into the need to bulk-insert data into a database, and we have to choose a strategy that fits the situation.

  If you know SQL, you know there are at least two commands for inserting rows into a table, insert and replace; which one to use depends on your business logic.

  This article focuses on bulk inserting with insert, and shares a few considerations based on my own experience.

2. Comparing the two approaches

  Even with just the insert command, there are several ways to insert data. We won't dig into how insert works under the hood here; that is beyond the scope of my knowledge, haha.

  But we can roughly outline the steps MySQL goes through when executing a command:

  1. First, a connection is established (a socket connection);

  2. The client sends the SQL command to be executed to the server over the TCP connection;

  The client here can be understood as the program we write in whatever language (the client side);

  The server is the database server, responsible for actually executing the command.

  3. After the server receives the SQL, it parses it and processes it;

  4. The result of the processing is returned to the client.

With the above process in mind, let's look at the difference between the two insert approaches. The following table is used for testing:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `name` varchar(40) NOT NULL COMMENT 'name',
  `gender` tinyint(1) DEFAULT '0' COMMENT 'gender: 1 - male; 2 - female',
  `addr` varchar(40) NOT NULL COMMENT 'address',
  `status` tinyint(1) DEFAULT '1' COMMENT 'status: 1 - valid; 2 - invalid',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

2.1 Inserting one row at a time

  When first learning databases, we learn that insert can be used to add data, for example inserting a row into the user table:

mysql> insert into user (id, name, gender, addr, status) values (1, 'aaa', 1, 'beijing', 1);
Query OK, 1 row affected (0.00 sec)

mysql> select * from user;
+----+------+--------+---------+--------+
| id | name | gender | addr    | status |
+----+------+--------+---------+--------+
|  1 | aaa  |      1 | beijing |      1 |
+----+------+--------+---------+--------+
1 row in set (0.00 sec)

  This is the simplest way; here it is done on the command line, so the client is the command-line client.

  If the client is our program code, for example Java using JDBC to execute SQL, the insert command is likewise sent to the MySQL Server for execution.

  The insert command above certainly inserts data, but every insert executed means one command sent over the network to the MySQL Server to be parsed and run. If tens of millions of rows need to be inserted, doesn't that mean tens of millions of transmissions? A connection pool lets you avoid re-establishing connections, but the number of transmissions cannot be escaped.

  Inserting one row at a time with insert is how the vast majority of code works, and it is fine for inserting small amounts of data!!!

  If you store a large amount of data this way, haha, you will have plenty of time for a cup of coffee.
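The cost of the per-row approach is easy to estimate: every statement pays one full network round trip. A back-of-the-envelope sketch in Python (the latency figure is a made-up illustration, not a measurement):

```python
def total_time_ms(rows, round_trip_ms, rows_per_statement):
    """Rough cost model: one round trip per statement, transfer time ignored."""
    statements = -(-rows // rows_per_statement)  # ceiling division
    return statements * round_trip_ms

one_by_one = total_time_ms(10_000, 1.0, 1)    # 10,000 statements, 10,000 round trips
batched = total_time_ms(10_000, 1.0, 500)     # only 20 statements
print(one_by_one, batched)
```

Even this crude model shows why the transmission count, not the connection setup, dominates once a pool is in place.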

2.2 Inserting multiple rows at once

  As already said, the main drawback of inserting one row at a time is the need to establish N connections and perform N transmissions. With a connection pool the connection cost can be ignored, but the cost of N transmissions should not be, so this is where we can optimize.

  For example, a worker is in charge of moving 100 bricks from point A to point B. Carrying one brick per trip costs one unit of time, so moving all 100 bricks takes 100 units of time (not counting the return trips).

  If the worker carries 5 bricks per trip, it takes only 20 units of time. A lot faster, isn't it?

  Similarly, we can also use the insert bulk insert data:

insert into user 
	(id, name, gender, addr, status) 
values 
	(2, 'bbb', 0, 'shanghai', 1),
	(3, 'ccc', 1, 'hangzhou', 0),
	(4, 'ddd', 0, 'chongqing', 0);

  This inserts three rows in one go.

  On the client side you only need to concatenate the SQL statement, then send the concatenated SQL to the MySQL Server in one shot.

  Note: the SQL is to be concatenated, not prepared!!!

  The purpose of prepared statements is to avoid repeatedly compiling SQL and to prevent SQL injection. If you use a prepared statement for a bulk insert, binding the placeholders on each loop iteration, every execute is still a separate insert command. The following example actually performs three separate single-row inserts:

<?php
    $pdo = new PDO("mysql:host=localhost;dbname=test","root","root");
    $sql = "insert into user (id, name, gender, addr, status) values (?,?,?,?,?)";
    $stmt = $pdo->prepare($sql);

    $stmt->execute(array("5", "eee", "1", "PEK", 1));
    $stmt->execute(array("6", "fff", "0", "SHA", 0));
    $stmt->execute(array("7", "ggg", "1", "LNL", 1));
 ?>

  Right way:

<?php
    $pdo = new PDO("mysql:host=localhost;dbname=test","root","root");
    $sql = 'insert into user (id, name, gender, addr, status) values ';

    // In practice you would build this part in a loop
    $sql .= '("5", "eee", "1", "PEK", 1),';
    $sql .= '("6", "fff", "0", "SHA", 0),';
    $sql .= '("7", "ggg", "1", "LNL", 1)'; 

    $pdo->exec($sql);
 ?>

  If you use plain JDBC in Java, the concatenation works the same way as above, so I won't repeat the code.

  If you use MyBatis in Java, you can use the <foreach> tag:

<insert id="batchInsert" parameterType="list">
    insert ignore into user (id, name, gender, addr, status) values
    <foreach collection="list" item="item" separator=",">
        (
	        #{item.id,jdbcType=INTEGER}, 
	        #{item.name,jdbcType=VARCHAR}, 
	        #{item.gender,jdbcType=BIT},
	        #{item.addr,jdbcType=VARCHAR}, 
	        #{item.status,jdbcType=BIT}
        )
    </foreach>
</insert>
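Whatever the language, the idea behind all three examples above is the same: build one multi-row statement instead of executing N single-row ones. A minimal sketch in Python; the helper name and the simplistic quoting are illustrative only, since a real client should let its driver handle escaping:

```python
def build_bulk_insert(table, columns, rows):
    """Build a single multi-row INSERT statement from a list of row tuples."""
    def quote(value):
        # Deliberately simplistic: numbers pass through, strings get
        # single-quoted with embedded quotes doubled.
        if isinstance(value, (int, float)):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    values = ",\n\t".join(
        "(" + ", ".join(quote(v) for v in row) + ")" for row in rows
    )
    return "insert into {} ({}) values\n\t{};".format(
        table, ", ".join(columns), values
    )

sql = build_bulk_insert(
    "user",
    ["id", "name", "gender", "addr", "status"],
    [(2, "bbb", 0, "shanghai", 1),
     (3, "ccc", 1, "hangzhou", 0),
     (4, "ddd", 0, "chongqing", 0)],
)
print(sql)
```

The output is one statement carrying all three rows, so the server is contacted once, not three times.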

3. Going further

  For bulk inserts, how many rows per batch is best?

  Back to the brick-moving worker: if carrying 5 bricks per trip takes 20 units of time, does that mean carrying all 100 bricks at once takes only 1 unit of time? That is the logical conclusion, but it isn't enough; it depends on the actual situation!!!

  What actual situation? Hard to say. A fairly strong worker might carry 100 bricks in one trip without difficulty; a weaker worker might collapse under 100 bricks and fail to move a single one, in which case it can take more than 100 units of time.

  Besides that, can point B even receive 100 bricks at once? That is also a question.

  Translating the analogy above back to bulk inserting with insert, pay attention to the following:

  1. The more rows in a batch, the larger the payload, the longer the network transmission takes, and the more can go wrong; set the batch size according to your situation.

  2. Besides the network, machine configuration matters: if the MySQL Server is poorly configured, even well-written SQL will not run very efficiently.

  3. "Batch" refers to inserting multiple rows, but besides the number of rows, pay attention to the size of each row. For example, if one row holds 1 MB of data, then 10 rows are 10 MB, and a batch of 100 rows is 100 MB of data; go ahead and try that!! So how many rows to insert per batch must be decided after several rounds of testing. For someone else, 100 per batch may be best; for you, 10 may be best. There is no absolute optimum (a bulk insert is not always more efficient than single-row inserts).

  4. The database has a parameter, max_allowed_packet, the maximum size of each packet (SQL command). The default is 1 MB, so a SQL statement longer than 1 MB raises an error. You might say: why not just set this parameter to 10 MB or 100 MB? Right, nothing wrong with that, but are you the DBA? Do you have the access? And even if you can raise it, be aware that its impact is not limited to your one table but covers the entire DB Server: many databases, many tables.

  5. Faster is not always better for bulk inserts. We naturally hope for speed, to save time, and that is normal. But remember that databases have read/write splitting and clusters, which means replication!!! In a sharded and partitioned setup, inserting too much data in a short time makes replication more likely to fall behind, and inconsistency between reads and writes becomes inevitable. A bulk insert into one table may affect replication for the entire DB service group, and concurrency issues must be considered as well, hahaha.
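Points 3 and 4 can both be handled mechanically on the client side: chunk the rows into batches of a size you have tested, and refuse to send any statement larger than the packet limit. A minimal Python sketch; the 1 MB limit mirrors the default cited above, and a real client would read the live value with SHOW VARIABLES LIKE 'max_allowed_packet' instead:

```python
MAX_ALLOWED_PACKET = 1 * 1024 * 1024  # 1 MB, the default mentioned above

def chunked(rows, batch_size):
    """Yield successive batches of at most batch_size rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def fits_in_packet(sql, limit=MAX_ALLOWED_PACKET):
    """True if the statement's encoded byte size is within the packet limit."""
    return len(sql.encode("utf8")) <= limit

rows = list(range(10))                    # stand-in for 10 records
batches = list(chunked(rows, 3))          # batch_size=3 is a tuning knob, not advice
ok = fits_in_packet("insert into user (id, name) values (1, 'aaa');")
too_big = fits_in_packet("x" * (2 * 1024 * 1024))  # 2 MB exceeds the 1 MB limit
print(len(batches), ok, too_big)
```

Timing each candidate batch size against your own data and hardware is what turns point 3's "several tests" into a number.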

4. Other notes

  You may notice that the insert statements I wrote above all list each column explicitly, like this:

insert into user (id, name, gender, addr, status) values (1, 'aaa', 1, 'beijing', 1);

  In fact, if you know the column order of the table, the column names can be omitted, like this:

insert into user values (1, 'aaa', 1, 'beijing', 1);

  I won't go into the relative efficiency of these two forms here, but the first has advantages in certain scenarios. For example, suppose the user table gains create_time and update_time columns:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `name` varchar(40) NOT NULL COMMENT 'name',
  `gender` tinyint(1) DEFAULT '0' COMMENT 'gender: 1 - male; 2 - female',
  `addr` varchar(40) NOT NULL COMMENT 'address',
  `status` tinyint(1) DEFAULT '1' COMMENT 'status: 1 - valid; 2 - invalid',
  `create_time` timestamp DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

  If create_time and update_time are not required to come from the client, the default values can be used, and the columns can be omitted from the insert statement:

-- create_time and update_time forced to use values passed in by the client
insert into user 
	(id, name, gender, addr, status, create_time, update_time) 
values 
	(2, 'bbb', 0, 'shanghai', 1, '2019-11-09 18:00:00', '2019-11-09 18:00:00'),
	(3, 'ccc', 1, 'hangzhou', 0, '2019-11-09 18:00:00', '2019-11-09 18:00:00'),
	(4, 'ddd', 0, 'chongqing', 0, '2019-11-09 18:00:00', '2019-11-09 18:00:00');

-- create_time and update_time not required from the client; the default values are used
insert into user 
	(id, name, gender, addr, status) 
values 
	(2, 'bbb', 0, 'shanghai', 1),
	(3, 'ccc', 1, 'hangzhou', 0),
	(4, 'ddd', 0, 'chongqing', 0);

  Similarly, when some columns have default values, omitting those columns in a bulk insert and relying on the defaults means a little less SQL to concatenate and a little less data to transmit per row, which can help a bit; again, it depends on the actual situation.
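As a concrete sketch of "omit columns that have defaults": keep the table's column order, but drop any defaulted column the client did not supply. The column lists below are illustrative for the user table above, not a real API:

```python
# Column order of the user table, and which columns carry a DEFAULT clause.
ALL_COLUMNS = ["id", "name", "gender", "addr", "status",
               "create_time", "update_time"]
DEFAULTED = {"gender", "status", "create_time", "update_time"}

def insert_columns(supplied):
    """Columns to name in the INSERT: everything the client supplied,
    plus any column that has no default and therefore must appear."""
    return [c for c in ALL_COLUMNS if c in supplied or c not in DEFAULTED]

cols = insert_columns({"id", "name", "addr", "status"})
print(cols)  # create_time/update_time are left to the server's defaults
```

The shorter column list shrinks every concatenated statement, which is exactly the saving the paragraph above describes.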


Origin www.cnblogs.com/xyy2019/p/11826680.html