MySQL ON DUPLICATE KEY UPDATE: usage, advantages, and disadvantages

In practice you often need to import data with upsert semantics: insert a row when it does not exist yet, and update it when it already does.

  When I first ran into this, my instinct was to implement it in two steps: check whether the row exists, then either insert or update. Later I found that MySQL has ON DUPLICATE KEY UPDATE, which does it in a single statement (it is MySQL-specific syntax).

ON DUPLICATE KEY UPDATE: single-row and batch upsert SQL

In MySQL, when an INSERT statement carries an ON DUPLICATE KEY UPDATE clause and the row to be inserted would duplicate an existing value in a UNIQUE index or the PRIMARY KEY, the existing row is updated instead; if the row duplicates no unique index or primary key value in the table, a normal insert is performed.

In plain terms: if the record already exists in the database, the statement updates it; if it does not exist, the statement inserts it.

Important points:

  Because this is an INSERT statement, a WHERE clause cannot be added.

  For an insert, the affected-rows value is 1; for an update, it is 2; and if the updated values are identical to the existing ones (nothing actually changes), the affected-rows value is 0.
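Those three affected-rows values can be captured in a minimal in-memory Python sketch (a toy model of the behavior, not a MySQL client; the table is just a dict keyed by the unique column):

```python
def upsert(table, key_col, row):
    """Toy model of the affected-rows value MySQL reports for
    INSERT ... ON DUPLICATE KEY UPDATE:
    1 = new row inserted, 2 = existing row changed, 0 = duplicate but identical."""
    key = row[key_col]
    existing = table.get(key)
    if existing is None:
        table[key] = row
        return 1          # plain insert
    if existing == row:
        return 0          # duplicate key, but every value is already the same
    table[key] = row
    return 2              # duplicate key, row actually updated

t = {}
assert upsert(t, "a", {"a": 1, "b": 2, "c": 3}) == 1  # inserted
assert upsert(t, "a", {"a": 1, "b": 2, "c": 9}) == 2  # updated
assert upsert(t, "a", {"a": 1, "b": 2, "c": 9}) == 0  # no change
```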

The statement relies on a unique index or primary key. Suppose column a has a unique index and the table already contains a row with a = 1.

The following two statements have the same effect:

INSERT INTO table (a,b,c) VALUES (1,2,3)  
  ON DUPLICATE KEY UPDATE c=c+1;  
  
UPDATE table SET c=c+1 WHERE a=1;

Multiple assignments can follow ON DUPLICATE KEY UPDATE, separated by commas.

A multi-row example:

INSERT INTO table (a,b,c) VALUES (1,2,3),(4,5,6)  
      ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b); 

Two rows in the table end up added or modified: for each colliding row, c is set to VALUES(a) + VALUES(b), where VALUES() refers to the value that row would have inserted; non-colliding rows are inserted as-is.
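The same effect can be reproduced with SQLite's equivalent UPSERT syntax (ON CONFLICT ... DO UPDATE, available in SQLite 3.24+; its `excluded` pseudo-table plays the role of MySQL's VALUES()). A minimal sketch with a throwaway table t:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 0, 0)")  # pre-existing row with a = 1

# SQLite analogue of: INSERT ... ON DUPLICATE KEY UPDATE c = VALUES(a) + VALUES(b)
conn.execute(
    "INSERT INTO t (a, b, c) VALUES (1, 2, 3), (4, 5, 6) "
    "ON CONFLICT(a) DO UPDATE SET c = excluded.a + excluded.b"
)

rows = conn.execute("SELECT a, b, c FROM t ORDER BY a").fetchall()
# the a=1 row keeps b=0 but gets c = 1 + 2 = 3; (4, 5, 6) is inserted as-is
```

Note one difference: SQLite requires you to name the conflict target (here `ON CONFLICT(a)`), while MySQL considers every unique key of the table.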

The MyBatis mapper for a single-row upsert looks like this:

<insert id="insertOrUpdateCameraInfoByOne" parameterType="com.pojo.AreaInfo">
    insert into camera_info( cameraId,zone1Id,zone1Name,zone2Id,zone2Name,zone3Id,zone3Name,zone4Id,zone4Name)
    VALUES(
        #{cameraId},#{zone1Id},#{zone1Name}, #{zone2Id},
        #{zone2Name}, #{zone3Id}, #{zone3Name},
        #{zone4Id}, #{zone4Name})
    ON DUPLICATE KEY UPDATE 
    cameraId = VALUES(cameraId),
    zone1Id = VALUES(zone1Id),zone1Name = VALUES(zone1Name),
    zone2Id = VALUES(zone2Id),zone2Name = VALUES(zone2Name),
    zone3Id = VALUES(zone3Id),zone3Name = VALUES(zone3Name),
    zone4Id = VALUES(zone4Id),zone4Name = VALUES(zone4Name)
</insert>

The MyBatis SQL for a batch upsert is:

<insert id="insertOrUpdateCameraInfoByBatch" parameterType="java.util.List">
      insert into camera_info(
          zone1Id,zone1Name,zone2Id,zone2Name,zone3Id,zone3Name,zone4Id,zone4Name,
          cameraId
          )VALUES
           <foreach collection ="list" item="cameraInfo" index= "index" separator =",">
             (
                #{cameraInfo.zone1Id}, #{cameraInfo.zone1Name}, #{cameraInfo.zone2Id},
                #{cameraInfo.zone2Name}, #{cameraInfo.zone3Id}, #{cameraInfo.zone3Name},
                #{cameraInfo.zone4Id}, #{cameraInfo.zone4Name}, 
                #{cameraInfo.cameraId}
             )
           </foreach>
           ON DUPLICATE KEY UPDATE
               zone1Id = VALUES(zone1Id),zone1Name = VALUES(zone1Name),zone2Id = VALUES(zone2Id),
               zone2Name = VALUES(zone2Name),zone3Id = VALUES(zone3Id),zone3Name = VALUES(zone3Name),
               zone4Id = VALUES(zone4Id),zone4Name = VALUES(zone4Name),
               cameraId = VALUES(cameraId)
    </insert>

Working with data in a project can sometimes be a headache. Consider this requirement:

Data needs to be synchronized from table a in database A to table b in database B (the two tables have the same structure but are not in a master-slave relationship; the data is simply copied over).

The first synchronization is easy: table b is empty, so everything is inserted.

But once some rows in table a have been updated and new rows have been added, keeping the two tables in sync gets trickier. (Truncating table b and re-synchronizing everything takes too long when the data volume is large, so that is not a good solution.)

The idea is to synchronize by time window: within each window, insert the rows that are new and update the rows that changed. My approach:

Steps:

  1. Fetch the rows from table a that changed within a given time window (segmented updates).

  2. Write those rows into table b: judge by primary key whether table b already has each record. If not, insert it; if it does, compare the data and update it when it differs.
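The two steps above can be sketched in Python against SQLite (table and column names such as a_table, b_table, and updated_at are made up for illustration; SQLite's ON CONFLICT ... DO UPDATE stands in for MySQL's ON DUPLICATE KEY UPDATE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_table (id INTEGER PRIMARY KEY, val TEXT, updated_at INTEGER);
    CREATE TABLE b_table (id INTEGER PRIMARY KEY, val TEXT, updated_at INTEGER);
    INSERT INTO a_table VALUES (1, 'old', 10), (2, 'new', 95), (3, 'newer', 99);
    INSERT INTO b_table VALUES (1, 'old', 10), (2, 'stale', 50);
""")

def sync_window(conn, since):
    """Step 1: fetch rows from a_table touched after `since`.
    Step 2: upsert them into b_table keyed on the primary key."""
    changed = conn.execute(
        "SELECT id, val, updated_at FROM a_table WHERE updated_at > ?", (since,)
    ).fetchall()
    conn.executemany(
        "INSERT INTO b_table (id, val, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET val = excluded.val, "
        "updated_at = excluded.updated_at",
        changed,
    )

sync_window(conn, since=40)
rows = conn.execute("SELECT id, val FROM b_table ORDER BY id").fetchall()
# id=1 untouched, id=2 updated from 'stale' to 'new', id=3 newly inserted
```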

This statement fits the requirement, but several issues deserve attention:

  • The updated content should involve exactly one unique key or primary key; otherwise correct execution cannot be guaranteed. If any unique key collides, the existing row is updated (the update also succeeds without error when the new values duplicate other rows in the table); a new row is inserted only when no unique key collides at all. Avoid using this statement on a table with multiple unique keys, to prevent possible data confusion.

  • Avoid this statement when concurrent transactions may execute the same insert, as it can produce deadlocks.

  • This statement is not recommended when the table has an auto-increment id: the id sequence will not stay continuous, and after many updates the next newly inserted row's id jumps forward accordingly.

  • This is MySQL-specific syntax; use it with caution if the code may need to run against other databases.
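The first caveat, the multiple-unique-key trap, can be seen in a toy Python model (hypothetical id and email columns; this mimics MySQL's rule that a collision on ANY unique key counts as a duplicate, it does not reproduce MySQL itself):

```python
def upsert_cnt(rows, new_row):
    """Toy model of ON DUPLICATE KEY UPDATE cnt = cnt + 1 on a table
    where BOTH 'id' and 'email' are unique keys (hypothetical columns)."""
    for r in rows:
        # a collision on EITHER unique key triggers the update branch
        if r["id"] == new_row["id"] or r["email"] == new_row["email"]:
            r["cnt"] += 1      # an existing row is updated ...
            return r           # ... and it may not be the row you had in mind
    rows.append(dict(new_row))
    return new_row

rows = [
    {"id": 1, "email": "a@x.com", "cnt": 0},
    {"id": 2, "email": "b@x.com", "cnt": 0},
]
# id=3 looks like a brand-new row, but its email collides with id=1,
# so row id=1 is modified instead of a new row being inserted:
hit = upsert_cnt(rows, {"id": 3, "email": "a@x.com", "cnt": 0})
```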


Fixing the discontinuous auto-increment primary key

Source: https://www.linuxidc.com/Linux/2018-01/150427.htm

A recent project needed this feature: track how long each person uses each piece of software. The client sends a message; if a record for that user's software already exists, add the reported duration to it, otherwise insert a new record. The code:

    <!-- Batch-save the software usage duration table -->
    <update id="saveApp" parameterType="java.util.List">
      <foreach collection="appList" item="item" index="index" separator=";">
        insert into app_table(userName,app,duration)
        values(#{userName},#{item.app},#{item.duration})
        on duplicate key update duration=duration+#{item.duration}
      </foreach>
    </update>

For efficiency, ON DUPLICATE KEY UPDATE was used so MySQL decides automatically between update and insert. After a while we noticed that the table's primary key id (configured as auto-increment) was not continuous: it kept jumping forward, grew far too fast, and was approaching its maximum value. Some research showed that ON DUPLICATE KEY UPDATE reserves an auto-increment value on every statement, even when it ends up updating. For example, if the current maximum id is 5 and an update is performed, the next insert gets id 7 instead of 6.

There are two ways to fix this: the first is to change the innodb_autoinc_lock_mode setting, the second is to split the statement into separate update and insert operations.

The first way: innodb_autoinc_lock_mode has three modes, 0, 1, and 2; the default in MySQL 5 is 1.

0 ("traditional"): a table-level AUTO-INC lock is taken for every auto-increment id allocation.

1 ("consecutive"): the table lock is used only for bulk inserts; simple inserts take a lightweight mutex instead, which gives better concurrency than mode 0.

2 ("interleaved"): no AUTO-INC table lock at all, so concurrency is highest, but auto-increment values from concurrent statements can interleave, which is not safe with statement-based replication.

With the default of 1, the phenomenon above occurs: every INSERT ... ON DUPLICATE KEY UPDATE consumes an auto-increment id as a simple insert would, regardless of whether an insert or an update actually happens.
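For reference, innodb_autoinc_lock_mode cannot be changed at runtime (you can read it with `SELECT @@innodb_autoinc_lock_mode;`); it has to be set in the server configuration and takes effect after a restart, along these lines:

```ini
# my.cnf (my.ini on Windows): choose the auto-increment locking mode at startup
[mysqld]
innodb_autoinc_lock_mode = 0
```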

Because the code handles a large volume of data with many updates and inserts happening together, mode 0 was not an option, so the database code was split into two steps: update, then insert. The first step updates the usage duration by user name and software name:

  <update id="updateApp" parameterType="App">
   update app_table
   set duration=duration+#{duration}
   where userName=#{userName} and appName=#{appName}
  </update>

Then check the return value: if it is greater than 0, the update succeeded and no insert is needed; if it is 0, the row does not exist yet and must be inserted:

<insert id="saveApp" keyProperty = "id" useGeneratedKeys = "true"  parameterType="App">
   insert into app_table(userName,appName,duration)
   values(#{userName},#{appName},#{duration})
  </insert>

How the deadlock arises

When INSERT ... ON DUPLICATE KEY UPDATE executes, the InnoDB engine first checks whether the inserted row would cause a duplicate-key error. If it would, InnoDB takes a shared (S) lock on the existing row and returns it to the server layer, which then performs the update part: it acquires an exclusive (X) lock on the record and writes the change.

If two transactions execute the same statement concurrently, a deadlock can occur: each takes the S lock on the conflicting row, and then both block waiting to upgrade to an X lock that the other's S lock prevents. The reference articles below walk through a concrete example:

Reference article:

https://www.cnblogs.com/zjdxr-up/p/8319982.html

"INSERT ... ON DUPLICATE KEY UPDATE deadlock principle", Pan Minlan's blog, CSDN


Origin: blog.csdn.net/wys0127/article/details/132078949