Using a snowflake id or UUID as a MySQL primary key, I got scolded by my wife

Foreword: When designing tables in MySQL, the official recommendation is not to use a UUID or a snowflake id (a long, unique, but non-sequential value) as the primary key, but to use a continuous, auto-incrementing primary key (auto_increment). So why is a UUID not recommended, and what are its drawbacks? In this post, we analyze the problem and explore the underlying reasons.

1. MySQL and program examples

1.1 To illustrate this problem, we first create three tables

They are user_auto_key, user_uuid, and user_random_key, representing an auto-incrementing primary key, a UUID primary key, and a random-key primary key, respectively. Following the controlled-variable method, only the primary-key generation strategy differs between the tables; all other fields are exactly the same. We then test each table's insertion and query speed:

Note: the random key here actually refers to the non-sequential, non-repeating, irregular id computed by the snowflake algorithm: an 18-digit long value.
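For reference, the core of the snowflake algorithm can be sketched as follows. This is a simplified illustration, not the project's actual generator; the class name and custom epoch are assumptions. The resulting ids are unique and roughly time-ordered, but not consecutive:

```java
// Simplified snowflake id sketch: 41-bit timestamp | 10-bit worker id | 12-bit sequence.
class Snowflake {
    private static final long EPOCH = 1288834974657L; // custom epoch (assumed here)
    private final long workerId;                      // 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    Snowflake(long workerId) {
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;   // 12-bit sequence within the same millisecond
            if (sequence == 0) {                 // sequence exhausted: spin until the next ms
                while (ts <= lastTimestamp) {
                    ts = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = ts;
        // Pack the three parts into one long value
        return ((ts - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

Because the high bits come from the timestamp, ids from one generator always grow over time, yet consecutive calls can jump by millions when the millisecond changes — which is exactly the "non-consecutive" property discussed here.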

The auto-increment id table:

The user_uuid table:

The random primary key table:
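Based on the insert statements used in the test program below, the three tables look roughly like this (a sketch; the column types and lengths are assumptions — only the primary key definitions differ between the tables):

```sql
-- Auto-increment primary key
CREATE TABLE user_key_auto (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id BIGINT, user_name VARCHAR(64), sex INT,
    address VARCHAR(255), city VARCHAR(64), email VARCHAR(64), state INT,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- UUID primary key (stored as a 36-character string)
CREATE TABLE user_uuid (
    id VARCHAR(36) NOT NULL,
    user_id BIGINT, user_name VARCHAR(64), sex INT,
    address VARCHAR(255), city VARCHAR(64), email VARCHAR(64), state INT,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Random (snowflake) primary key: an 18-digit long value
CREATE TABLE user_random_key (
    id BIGINT UNSIGNED NOT NULL,
    user_id BIGINT, user_name VARCHAR(64), sex INT,
    address VARCHAR(255), city VARCHAR(64), email VARCHAR(64), state INT,
    PRIMARY KEY (id)
) ENGINE = InnoDB;
```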

1.2 Theory alone is not enough, so let's go straight to the program and use Spring's JdbcTemplate to implement an insert-and-query test:

Technical framework:

springboot+jdbcTemplate+junit+hutool

The idea of the program is to connect to a local test database, write the same amount of data under the same environment, and then compare the insertion times to measure efficiency. To make the test as realistic as possible, all data, such as name, email, and address, is randomly generated.
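A minimal sketch of such random data generation, using the plain JDK only (the original project uses hutool; the class and helper names here are hypothetical):

```java
import java.util.Random;
import java.util.UUID;

// Sketch of the random test-data generation described above.
class TestDataFactory {
    private static final Random RND = new Random();
    private static final String[] CITIES = {"Beijing", "Shanghai", "Shenzhen", "Hangzhou"};

    // Random user name, e.g. "user_381772"
    static String randomName() {
        return "user_" + RND.nextInt(1_000_000);
    }

    // Random email derived from the name
    static String randomEmail() {
        return randomName() + "@example.com";
    }

    // Random address built from a city and a street number
    static String randomAddress() {
        return CITIES[RND.nextInt(CITIES.length)] + " street " + RND.nextInt(1000);
    }

    // 36-character string key for the user_uuid table
    static String uuidKey() {
        return UUID.randomUUID().toString();
    }
}
```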

package com.wyq.mysqldemo;
import cn.hutool.core.collection.CollectionUtil;
import com.wyq.mysqldemo.databaseobject.UserKeyAuto;
import com.wyq.mysqldemo.databaseobject.UserKeyRandom;
import com.wyq.mysqldemo.databaseobject.UserKeyUUID;
import com.wyq.mysqldemo.diffkeytest.AutoKeyTableService;
import com.wyq.mysqldemo.diffkeytest.RandomKeyTableService;
import com.wyq.mysqldemo.diffkeytest.UUIDKeyTableService;
import com.wyq.mysqldemo.util.JdbcTemplateService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.util.StopWatch;
import java.util.List;

@SpringBootTest
class MysqlDemoApplicationTests {

    @Autowired
    private JdbcTemplateService jdbcTemplateService;

    @Autowired
    private AutoKeyTableService autoKeyTableService;

    @Autowired
    private UUIDKeyTableService uuidKeyTableService;

    @Autowired
    private RandomKeyTableService randomKeyTableService;

    @Test
    void testDBTime() {
        StopWatch stopwatch = new StopWatch("SQL execution time");

        /**
         * Task 1: auto_increment key
         */
        final String insertSql = "INSERT INTO user_key_auto(user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?)";
        List<UserKeyAuto> insertData = autoKeyTableService.getInsertData();
        stopwatch.start("auto-generated key table task");
        long start1 = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData)) {
            boolean insertResult = jdbcTemplateService.insert(insertSql, insertData, false);
            System.out.println(insertResult);
        }
        long end1 = System.currentTimeMillis();
        System.out.println("auto key time consumed: " + (end1 - start1));
        stopwatch.stop();

        /**
         * Task 2: UUID key
         */
        final String insertSql2 = "INSERT INTO user_uuid(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?)";
        List<UserKeyUUID> insertData2 = uuidKeyTableService.getInsertData();
        stopwatch.start("UUID key table task");
        long begin = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData2)) {
            boolean insertResult = jdbcTemplateService.insert(insertSql2, insertData2, true);
            System.out.println(insertResult);
        }
        long over = System.currentTimeMillis();
        System.out.println("UUID key time consumed: " + (over - begin));
        stopwatch.stop();

        /**
         * Task 3: random long key (snowflake id)
         */
        final String insertSql3 = "INSERT INTO user_random_key(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?)";
        List<UserKeyRandom> insertData3 = randomKeyTableService.getInsertData();
        stopwatch.start("random long key table task");
        long start = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData3)) {
            boolean insertResult = jdbcTemplateService.insert(insertSql3, insertData3, true);
            System.out.println(insertResult);
        }
        long end = System.currentTimeMillis();
        System.out.println("random key time consumed: " + (end - start));
        stopwatch.stop();

        String result = stopwatch.prettyPrint();
        System.out.println(result);
    }
}

1.3 Program write results

user_key_auto write result:

user_random_key write result:

user_uuid write result:

1.4 Efficiency test results

With 1.3 million rows of existing data, we insert another 100,000 rows and see what happens:

As you can see, at a data volume of around 1 million rows, the insertion efficiency of UUID is the worst, and after the data grows to 1.3 million rows, the UUID insertion time plummets further.

The overall efficiency ranking is: auto_key > random_key > uuid. UUID is the least efficient, and with a large data volume its efficiency drops sharply.

So why does this happen? Let's explore the question:

2. Comparing the index structures of UUID and auto-increment id

2.1 Internal structure with an auto-increment id

Auto-increment primary key values are sequential, so InnoDB stores each record immediately after the previous one. When a page reaches its maximum fill factor (InnoDB's default is 15/16 of the page size, leaving 1/16 free for future modifications):

①. The next record is written to a new page. When data is loaded in this order, index pages are filled with records almost to the maximum fill rate, and no page space is wasted.

②. A newly inserted row is always the row after the current maximum. MySQL locates the insert position very quickly and incurs no extra cost computing where the new row goes.

③. Page splits and fragmentation are reduced.

2.2 Internal structure of index using uuid

Because the self-incrementing id of the relative order of uuid is irregular, the value of the new row is not necessarily larger than the value of the previous primary key, so innodb cannot always insert the new row at the end of the index. Instead, you need to find a new suitable location for the new line to allocate new space.

This process requires a lot of additional operations, and the randomness of the data will lead to scattered data distribution, which will lead to the following problems:

①. The target page written is likely to have been flushed to disk and removed from the cache, or has not been loaded into the cache, innodb has to find and read the target page from disk into memory before inserting, which will cause a lot of random IO

②.  Because the writes are out of order, innodb has to frequently do page splitting operations to allocate space for new rows. Page splitting causes a large amount of data to be moved, and at least three pages need to be modified for one insertion.

③.  Due to frequent page splits, pages will become sparse and filled irregularly, which will eventually lead to fragmentation of data

After loading the random values ​​(uuid and snowflake id) into the clustered index (the default index type of innodb), it is sometimes necessary to do an OPTIMEIZE TABLE to rebuild the table and optimize page filling, which will take a certain amount of time. .

Conclusion: Using innodb, you should insert as much as possible in the auto-incrementing order of the primary key, and use the monotonically increasing value of the cluster key to insert new rows as much as possible
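The cost difference can be felt with a rough in-memory analogy. This is not InnoDB itself, just an illustration: keeping a sorted sequence in order is cheap when keys arrive in increasing order (always append at the end), but random keys force elements to be shifted to make room, much like the page splits and data movement described above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Analogy: a clustered index must keep rows in key order.
// Sequential keys append at the end; random keys insert mid-structure and shift data.
class InsertOrderDemo {
    static long timeInserts(boolean sequential, int n) {
        Random rnd = new Random(42);
        List<Long> keys = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            long key = sequential ? i : rnd.nextLong();
            // find the sorted position for the new key, as an ordered index must
            int pos = Collections.binarySearch(keys, key);
            if (pos < 0) pos = -pos - 1;
            keys.add(pos, key); // random keys shift many elements; sequential keys just append
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000;
        System.out.println("sequential inserts: " + timeInserts(true, n) / 1_000_000 + " ms");
        System.out.println("random inserts:     " + timeInserts(false, n) / 1_000_000 + " ms");
    }
}
```

On a typical machine the random case is dramatically slower, even though both insert the same number of keys — the extra cost is purely from keeping order with out-of-order input.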

2.3 Disadvantages of auto-increment ids

So is an auto-increment id completely harmless? No, it has problems of its own:

①. If someone crawls your database, they can infer your business growth from the auto-increment ids and easily analyze your business situation.

②. Under a high-concurrency write load, InnoDB shows noticeable lock contention when inserting by primary key: the upper bound of the index becomes a hot spot because all inserts happen there, and concurrent inserts contend for gap locks.

③. The AUTO_INCREMENT lock mechanism causes contention for the auto-increment lock, with a certain performance cost.

Note:

To mitigate auto-increment lock contention, you need to tune the innodb_autoinc_lock_mode configuration.
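For reference, innodb_autoinc_lock_mode can be inspected like this; note that it is not a dynamic variable, so it must be set at server startup:

```sql
-- Check the current auto-increment lock mode:
--   0 = traditional, 1 = consecutive (default up to MySQL 5.7),
--   2 = interleaved (default since MySQL 8.0; safe with row-based binlog replication)
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';

-- The variable is read-only at runtime; set it at startup, e.g. in my.cnf:
--   [mysqld]
--   innodb_autoinc_lock_mode = 2
```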

3. Summary

This post started from a question, built three tables, and used JdbcTemplate to test the insertion performance of different id-generation strategies at large data volumes. It then analyzed how the different id mechanisms behave in MySQL's index structure, along with their advantages and disadvantages, to explain in depth why UUIDs and random unique ids suffer a performance loss on insertion.

In actual development, following MySQL's official recommendation, an auto-increment id is the better choice. MySQL is broad and deep, and there are many more optimization points worth learning.


Origin blog.csdn.net/m0_63437643/article/details/123794003