Generation Scheme of Primary Key ID in Database


A database table usually has a primary key id that uniquely identifies each row, so the primary key must be unique. MySQL's auto-increment primary key can serve as this ID, and a globally unique UUID is another option. Which scheme should you choose, and what are the advantages and disadvantages of each?

Based on UUID

A UUID is a 128-bit value written as 32 hexadecimal digits, divided into five groups by hyphens in the form 8-4-4-4-12 (36 characters including the four hyphens), for example: 550e8400-e29b-41d4-a716-446655440000. The theoretical total number of UUIDs is therefore 16^32 = 2^128, which is approximately 3.4 x 10^38. In other words, even if 1 trillion UUIDs were generated every nanosecond, it would take about 10 billion years to use them all up.
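As a quick check of the format described above, here is a minimal sketch that uses the JDK's java.util.UUID. The class name UuidFormatDemo and its two helper methods are made up for illustration.

```java
import java.math.BigInteger;
import java.util.UUID;

public class UuidFormatDemo {

    /** Counts the hexadecimal digits in a UUID string, ignoring the hyphens. */
    public static int hexDigitCount(String uuid) {
        return uuid.replace("-", "").length();
    }

    /** Number of distinct 128-bit values: 2^128. */
    public static BigInteger totalValues() {
        return BigInteger.ONE.shiftLeft(128);
    }

    public static void main(String[] args) {
        String example = "550e8400-e29b-41d4-a716-446655440000";
        System.out.println("characters : " + example.length());
        System.out.println("hex digits : " + hexDigitCount(example));
        System.out.println("version    : " + UUID.fromString(example).version());
        System.out.println("2^128      : " + totalValues());
        // A freshly generated random UUID has the same 8-4-4-4-12 shape.
        System.out.println("random     : " + UUID.randomUUID());
    }
}
```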

A time-based (version 1) UUID is a combination of the following parts. Other versions are built differently; a version 4 UUID, for example, is almost entirely random.

  • The current date and time. The first part of the UUID is derived from the time, so if you generate a second UUID a few seconds after the first, the first part differs while the rest stays the same.
  • The clock sequence, which guards against duplicates when the clock is set back.
  • The globally unique IEEE machine identification number. If the machine has a network card, it is taken from the card's MAC address; if not, it is obtained in some other way.
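The parts listed above can be read back from a version 1 UUID with java.util.UUID. The sketch below is only an illustration: the class name UuidV1Fields is made up, and the sample value is the DNS namespace UUID from RFC 4122, chosen because it happens to be a version 1 UUID.

```java
import java.util.UUID;

public class UuidV1Fields {

    /** The DNS namespace UUID from RFC 4122, which is a version 1 UUID. */
    public static final String SAMPLE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

    public static void main(String[] args) {
        UUID id = UUID.fromString(SAMPLE);
        System.out.println("version        = " + id.version());
        // timestamp(), clockSequence() and node() are only defined for
        // version 1 UUIDs; on other versions they throw
        // UnsupportedOperationException.
        System.out.println("timestamp      = " + id.timestamp()); // 100 ns units since 1582-10-15
        System.out.println("clock sequence = " + id.clockSequence());
        System.out.println("node           = " + Long.toHexString(id.node()));
    }
}
```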

Advantages:

  • IDs are generated locally with no network round trip, and the JDK ships with a generator (java.util.UUID.randomUUID(), which returns a random version 4 UUID).
  • Because the IDs are globally unique, they do not collide when data is migrated between databases.

Disadvantages:

  • The generated IDs are unordered. InnoDB stores rows in a B+ tree ordered by primary key, so random keys land on arbitrary pages: inserts frequently cause page splits, and the target pages are often not in the cache, which causes a lot of random I/O.
  • A hexadecimal digit occupies 4 bits, which is half a byte, so a UUID takes 16 bytes in binary form and 36 bytes when stored as text with hyphens. Every leaf entry of a secondary index also carries the primary key, so a long key wastes a lot of space and hurts database performance.
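To make the size difference concrete, the following sketch packs a UUID into its 16-byte binary form and compares it with the text form. The class UuidStorageSize and its methods are illustrative names, not part of any library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class UuidStorageSize {

    /** Packs a UUID into 16 bytes, most significant bits first. */
    public static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    /** Rebuilds the UUID from the 16-byte form produced by toBytes. */
    public static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        int binarySize = toBytes(id).length;
        int textSize = id.toString().getBytes(StandardCharsets.US_ASCII).length;
        System.out.println("binary form: " + binarySize + " bytes");
        System.out.println("text form  : " + textSize + " bytes");
    }
}
```

Storing the binary form halves the key size compared with text, but it is still twice as large as an 8-byte integer key.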

Applicable scenarios:

  • Generating tokens and similar values, which should be hard to recognize, unordered, and sufficiently long.
  • Scenarios that do not require a purely numeric ID, an increasing order, or readability.

Auto-increment based on database primary key

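Most tables define an auto-increment primary key. Even if no primary key is defined, InnoDB does not leave the table without one: when there is no primary key and no suitable unique NOT NULL index, it automatically generates a hidden row ID and uses it as the clustered index key. An auto-increment primary key lets rows be inserted into the primary key index in increasing order as far as possible, which avoids page splits and keeps the index more compact.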

Different engines have different storage strategies for auto-increment.

  • The MyISAM engine stores the auto-increment value in the data file.
  • The InnoDB engine keeps the auto-increment value in memory. Only since MySQL 8.0 is the value persisted, meaning that after a restart the table's auto-increment value is restored to what it was before the restart. Specifically:
  • In MySQL 5.7 and earlier, the auto-increment value is kept in memory only and is not persisted. After each restart, the first time the table is opened MySQL finds the maximum id, max(id), and uses max(id)+1 as the table's current auto-increment value. For example, if the largest id in a table is 10, then AUTO_INCREMENT=11. If we now delete the row with id=10, AUTO_INCREMENT is still 11. But if the instance is restarted right away, the AUTO_INCREMENT of this table becomes 10 after the restart. In other words, a MySQL restart may change a table's AUTO_INCREMENT value.
  • In MySQL 8.0, changes to the auto-increment value are recorded in the redo log, and on restart the value from before the restart is recovered from the redo log.
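The restart behavior described above can be imitated with a toy model. This is not MySQL code: the class AutoIncrementRestartDemo and everything in it are hypothetical, and "persistent" simply means the counter survives a restart, as it does in 8.0.

```java
import java.util.TreeSet;

public class AutoIncrementRestartDemo {

    /** Toy model of one table: the stored ids plus the in-memory counter. */
    static class Table {
        final TreeSet<Long> ids = new TreeSet<>();
        final boolean persistent; // true models 8.0, false models 5.7 and earlier
        long autoIncrement = 1;

        Table(boolean persistent) {
            this.persistent = persistent;
        }

        long insert() {
            long id = autoIncrement++;
            ids.add(id);
            return id;
        }

        void delete(long id) {
            ids.remove(id);
        }

        void restart() {
            if (!persistent) {
                // 5.7 and earlier: the counter is rebuilt as max(id) + 1.
                autoIncrement = ids.isEmpty() ? 1 : ids.last() + 1;
            }
            // 8.0: the counter is recovered as it was, so nothing changes here.
        }
    }

    /** Inserts ids 1..10, deletes id 10, restarts, and returns the counter. */
    public static long counterAfterRestart(boolean persistent) {
        Table table = new Table(persistent);
        for (int i = 0; i < 10; i++) {
            table.insert();
        }
        table.delete(10);
        table.restart();
        return table.autoIncrement;
    }

    public static void main(String[] args) {
        System.out.println("5.7 model: " + counterAfterRestart(false)); // 10
        System.out.println("8.0 model: " + counterAfterRestart(true));  // 11
    }
}
```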

In MySQL, if the id column is defined as AUTO_INCREMENT, then when a row is inserted: if id is given as 0 or NULL, or is not specified at all, the table's current AUTO_INCREMENT value is filled into the auto-increment column (0 is treated this way unless the NO_AUTO_VALUE_ON_ZERO SQL mode is enabled); if id is given a specific value, the value in the statement is used directly.

How the auto-increment value changes depends on how the inserted value compares with the current auto-increment value. Suppose the value to be inserted is X and the current auto-increment value is Y. If X<Y, the table's auto-increment value stays unchanged; if X≥Y, the current auto-increment value must be advanced to a new value.

The new auto-increment value is found by starting from auto_increment_offset (the initial value) and repeatedly adding auto_increment_increment (the step size) until the first value greater than X is reached.

When auto_increment_offset and auto_increment_increment are both 1, which is the default, the new auto-increment value is X+1 if X≥Y; otherwise the auto-increment value stays unchanged.
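The rule above can be written down directly. The sketch below mirrors the description with a simple loop; it is not MySQL's actual implementation, and the class and method names are made up.

```java
public class NextAutoIncrement {

    /**
     * Returns the auto-increment value after inserting x into a table whose
     * current auto-increment value is y.
     *
     * @param offset    plays the role of auto_increment_offset
     * @param increment plays the role of auto_increment_increment
     */
    public static long next(long x, long y, long offset, long increment) {
        if (x < y) {
            return y; // inserted value is below the counter: nothing changes
        }
        // Walk offset, offset + increment, ... until the first value > x.
        long candidate = offset;
        while (candidate <= x) {
            candidate += increment;
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(next(5, 10, 1, 1));  // 10: X < Y, unchanged
        System.out.println(next(10, 10, 1, 1)); // 11: X + 1
        System.out.println(next(10, 3, 1, 2));  // 11: 1, 3, 5, 7, 9, 11
        System.out.println(next(10, 3, 2, 2));  // 12: 2, 4, 6, 8, 10, 12
    }
}
```

With a step of 2 and offsets 1 and 2, two servers would hand out odd and even ids respectively, which is a common reason to change these two parameters.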

However, even when both parameters are set to 1, auto-increment primary key ids are not guaranteed to be continuous. This is because the auto-increment value is not rolled back when an insert statement fails on a unique key conflict or when the transaction is rolled back.

In addition, for statements that insert rows in bulk without knowing the row count in advance (such as INSERT ... SELECT), MySQL allocates auto-increment ids in batches: the first request gets 1 id, once that is used up the second request gets 2, the third gets 4, and so on, doubling each time. For example, when such a statement inserts 5 rows, the first two requests supply 3 ids, and the third request allocates 4 more, of which only 2 are actually used. The unused ids are not taken back, so this also makes the auto-increment primary key ids discontinuous.
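The arithmetic of the doubling strategy is easy to reproduce. The following is a small sketch of the counting only, not of MySQL internals; BatchIdAllocation and allocatedFor are illustrative names.

```java
public class BatchIdAllocation {

    /**
     * Returns how many ids are allocated in total to insert the given number
     * of rows when each request doubles the previous one: 1, 2, 4, ...
     */
    public static long allocatedFor(long rows) {
        long allocated = 0;
        long batch = 1;
        while (allocated < rows) {
            allocated += batch;
            batch *= 2;
        }
        return allocated;
    }

    public static void main(String[] args) {
        long rows = 5;
        long allocated = allocatedFor(rows);
        System.out.println("allocated = " + allocated);          // 7 (1 + 2 + 4)
        System.out.println("unused    = " + (allocated - rows)); // 2
    }
}
```

If the ids started at 1, the five rows would take ids 1 to 5 and the next insert would get id 8, leaving 6 and 7 unused.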

Therefore, there are three reasons why auto-increment primary key ids are not necessarily continuous:

  • When an insert fails (for example on a unique key conflict), the auto-increment value is not rolled back.
  • When a transaction is rolled back, the auto-increment value is not rolled back.
  • When rows are inserted in bulk, ids are allocated in batches, and unused ids are not reclaimed.

Advantages:

  • The implementation is simple: it relies only on the database, and the cost is small.

Disadvantages:

  • When tables need to be merged, their primary keys conflict.
  • Every ID has to be obtained from the database, which costs a database read and write each time and affects performance.

Applicable scenarios:

  • Small-scale business scenarios with a small volume of data access.
  • Scenarios without high concurrency, where the insert rate is controllable.

Based on Snowflake-like algorithm

Snowflake is an ID generation algorithm that Twitter used in its internal distributed projects to generate unique IDs on different machines. It produces a 64-bit number as a distributed ID that is globally unique and increases over time. The generated 64-bit ID is laid out as follows:

  • Sign bit, 1 bit, unused: the first bit is always 0, which means the generated distributed ID is a positive number.
  • Timestamp, 41 bits: 41 bits can hold 2^41 - 1 millisecond values, which is about 69 years. Counted from 1970 that would last until 2039; implementations usually store the offset from a custom start time instead, so the 69 years begin at that start time.
  • Worker machine ID, 10 bits: records the id of the working machine, which means the service can be deployed on up to 2^10 = 1024 machines. Typically 5 of the 10 bits represent the computer room (data center) ID and the other 5 represent the machine ID, allowing up to 2^5 = 32 computer rooms with 2^5 = 32 machines each. The split is arbitrary: for example, you could use 4 bits as a business number and the other 6 bits as the machine number.
  • Sequence number, 12 bits: 12 bits can represent 2^12 = 4096 values (0 to 4095), which distinguishes 4096 different ids within the same millisecond. That is, one machine can generate at most 4096 IDs in the same millisecond. Once this limit is reached, the program spins until the next millisecond and then generates a new distributed ID.
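The layout above maps directly onto shifts and masks. The sketch below packs the four fields into a long and unpacks them again, assuming the common 41/5/5/12 split; the class SnowflakeLayout and its method names are made up for illustration.

```java
public class SnowflakeLayout {
    static final int SEQ_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIME_BITS = 41;

    static final int WORKER_SHIFT = SEQ_BITS;                         // 12
    static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;   // 17
    static final int TIME_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    /** Packs the fields into one 64-bit id; the sign bit stays 0. */
    public static long compose(long millisSinceEpoch, long datacenter, long worker, long sequence) {
        return (millisSinceEpoch << TIME_SHIFT)
                | (datacenter << DATACENTER_SHIFT)
                | (worker << WORKER_SHIFT)
                | sequence;
    }

    public static long timestampOf(long id) {
        return id >>> TIME_SHIFT;
    }

    public static long datacenterOf(long id) {
        return (id >>> DATACENTER_SHIFT) & ((1L << DATACENTER_BITS) - 1);
    }

    public static long workerOf(long id) {
        return (id >>> WORKER_SHIFT) & ((1L << WORKER_BITS) - 1);
    }

    public static long sequenceOf(long id) {
        return id & ((1L << SEQ_BITS) - 1);
    }

    public static void main(String[] args) {
        long id = compose(123456789L, 3, 17, 4095);
        System.out.println(timestampOf(id) + " " + datacenterOf(id) + " "
                + workerOf(id) + " " + sequenceOf(id));
        // Capacity of each field.
        System.out.println("max sequence = " + ((1L << SEQ_BITS) - 1)); // 4095
        double years = (1L << TIME_BITS) / (1000.0 * 60 * 60 * 24 * 365);
        System.out.println("timestamp range in years = " + years);      // about 69.7
    }
}
```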

Generation process:
To obtain a globally unique id, you send a request to a system that runs the SnowFlake algorithm, and that system generates the id. The SnowFlake system must already know the number of the machine it runs on. When it receives the request, it builds a 64-bit long id with binary bit operations. The first of the 64 bits is unused. The next 41 bits hold the current timestamp (in milliseconds), followed by 10 bits for the machine id. Finally, the system checks how many requests this machine in this computer room has already handled within the current millisecond and assigns the request a sequence number, which becomes the last 12 bits of the id.

Advantages:

  • The IDs are unique and increase over time (strictly increasing on a single generator), so inserts into InnoDB's B+ tree are close to sequential, rarely cause page splits, and perform well.
  • The id is 64 bits, so it occupies only 8 bytes and saves storage space.

Disadvantages:

  • The clocks of all machines that generate IDs must be kept in sync; otherwise the IDs do not increase globally.
  • If a machine's clock is set back, duplicate ids may be generated.

Several ideas for solving the clock rollback problem of the snowflake algorithm:

  1. Multiple clocks
    Take 2 bits each from the worker machine ID and the sequence number and use them as a 4-bit clock ID. Each time a clock rollback is detected (that is, the current timestamp is less than the timestamp of the last generated ID), the clock ID is increased by 1, much like the sequence number.
  2. Use "historical time"
    After the process starts, take the current time (actual processing starts after a 10 ms delay) as the initial value of the timestamp field for this machine's process. From then on the timestamp only advances by incrementing: when the sequence number reaches its maximum, the timestamp increases by 1 and the sequence number returns to 0. In effect, the timestamp and the sequence number are incremented together as one large value, and only its initial value comes from the clock.
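The first idea could look roughly like the sketch below. It is one possible reading of the description, not a reference implementation: the bit split (41 time, 4 clock ID, 8 worker, 10 sequence), the class name ClockIdSnowflake, and the injectable clock are all assumptions made for illustration. IDs stay unique only until the 4-bit clock ID wraps around after 16 rollbacks, and IDs issued right after a rollback are smaller than earlier ones.

```java
import java.util.function.LongSupplier;

public class ClockIdSnowflake {
    // Assumed layout: 1 sign | 41 time | 4 clock id | 8 worker | 10 sequence
    private static final int SEQ_BITS = 10;
    private static final int WORKER_BITS = 8;
    private static final int CLOCK_BITS = 4;
    private static final int WORKER_SHIFT = SEQ_BITS;                  // 10
    private static final int CLOCK_SHIFT = WORKER_SHIFT + WORKER_BITS; // 18
    private static final int TIME_SHIFT = CLOCK_SHIFT + CLOCK_BITS;    // 22
    private static final long SEQ_MASK = (1L << SEQ_BITS) - 1;
    private static final long CLOCK_MASK = (1L << CLOCK_BITS) - 1;

    private final long epoch;
    private final long workerId;
    private final LongSupplier clock; // injectable so a rollback can be simulated
    private long lastTimestamp = -1L;
    private long sequence = 0L;
    private long clockId = 0L;

    public ClockIdSnowflake(long epoch, long workerId, LongSupplier clock) {
        this.epoch = epoch;
        this.workerId = workerId & ((1L << WORKER_BITS) - 1);
        this.clock = clock;
    }

    public synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Clock rollback detected: move to the next clock id instead of failing.
            clockId = (clockId + 1) & CLOCK_MASK;
            sequence = 0;
        } else if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQ_MASK;
            if (sequence == 0) {
                // Sequence exhausted in this millisecond: spin until the next one.
                while ((now = clock.getAsLong()) <= lastTimestamp) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - epoch) << TIME_SHIFT)
                | (clockId << CLOCK_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    public static void main(String[] args) {
        long[] fakeNow = {1000L};
        ClockIdSnowflake gen = new ClockIdSnowflake(0L, 1L, () -> fakeNow[0]);
        long first = gen.nextId();
        fakeNow[0] = 990L;  // simulate the clock being set back by 10 ms
        long second = gen.nextId();
        fakeNow[0] = 1000L; // the clock reaches the old timestamp again
        long third = gen.nextId();
        System.out.println(first + " " + second + " " + third);
    }
}
```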

Snowflake algorithm demo:

package com.example.demo;

import java.net.Inet4Address;
import java.net.UnknownHostException;
import java.util.Random;

/**
 * @Author: yzp
 * @Date: 2020-7-27 15:32
 * @description
 */
public class SnowflakeIdWorker {

    /** Number of bits used by the timestamp */
    private static final int TIME_LEN = 41;
    /** Number of bits used by the data center id */
    private static final int DATA_LEN = 5;
    /** Number of bits used by the machine id */
    private static final int WORK_LEN = 5;
    /** Number of bits used by the per-millisecond sequence */
    private static final int SEQ_LEN = 12;

    /** Start time (custom epoch), 2020-07-27 */
    private static final long START_TIME = 1595835560497L;
    /** Timestamp of the last generated ID */
    private static long LAST_TIME_STAMP = -1L;
    /** Left shift of the timestamp: 22 */
    private static final int TIME_LEFT_BIT = 64 - 1 - TIME_LEN;

    /** Data center id, derived automatically (can also be set by hand to a number from 0 to 31) */
    private static final long DATA_ID = getDataId();
    /** Machine id, derived automatically (can also be set by hand to a number from 0 to 31) */
    private static final long WORK_ID = getWorkId();
    /** Maximum data center id: 31 */
    private static final int DATA_MAX_NUM = ~(-1 << DATA_LEN);
    /** Maximum machine id: 31 */
    private static final int WORK_MAX_NUM = ~(-1 << WORK_LEN);
    /** Bound for picking a random data center id: 32 */
    private static final int DATA_RANDOM = DATA_MAX_NUM + 1;
    /** Bound for picking a random machine id: 32 */
    private static final int WORK_RANDOM = WORK_MAX_NUM + 1;
    /** Left shift of the data center id: 17 */
    private static final int DATA_LEFT_BIT = TIME_LEFT_BIT - DATA_LEN;
    /** Left shift of the machine id: 12 */
    private static final int WORK_LEFT_BIT = DATA_LEFT_BIT - WORK_LEN;

    /** Last sequence value used within the current millisecond */
    private static long LAST_SEQ = 0L;
    /** Maximum per-millisecond sequence value: 4095 */
    private static final long SEQ_MAX_NUM = ~(-1 << SEQ_LEN);

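    /**
     * Sums the bytes of the string s and takes the result modulo (max + 1).
     * @param s hostName/hostAddress of the local machine
     * @param max maximum id of the computer room/machine
     * @return an id between 0 and max
     */
    private static int getHostId(String s, int max) {
        byte[] bytes = s.getBytes();
        int sums = 0;
        for (int b : bytes) {
            sums += b;
        }
        // Bytes of non-ASCII characters are negative, so use floorMod to keep
        // the result within 0..max.
        return Math.floorMod(sums, max + 1);
    }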

    /**
     * Derives the id from the host address; if an exception occurs, returns a random number from 0 to 31.
     * @return machine ID
     */
    private static int getWorkId() {
        try {
            return getHostId(Inet4Address.getLocalHost().getHostAddress(), WORK_MAX_NUM);
        } catch (UnknownHostException e) {
            return new Random().nextInt(WORK_RANDOM);
        }
    }

    /**
     * Derives the id from the host name; if an exception occurs, returns a random number from 0 to 31.
     * @return computer room ID (data center ID)
     */
    private static int getDataId() {
        try {
            return getHostId(Inet4Address.getLocalHost().getHostName(), DATA_MAX_NUM);
        } catch (Exception e) {
            return new Random().nextInt(DATA_RANDOM);
        }
    }

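    /**
     * Spins until the clock reaches a millisecond later than lastMillis.
     * @param lastMillis timestamp of the last generated ID
     * @return timestamp of the next millisecond
     */
    private static long nextMillis(long lastMillis) {
        long now = System.currentTimeMillis();
        while (now <= lastMillis) {
            now = System.currentTimeMillis();
        }
        return now;
    }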

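    /**
     * Core algorithm; synchronized to be safe under concurrency.
     * @return a unique ID
     */
    public synchronized static long getUUID() {
        long now = System.currentTimeMillis();

        // If the current time is earlier than the timestamp of the last ID,
        // the system clock has been set back, so throw an exception.
        if (now < LAST_TIME_STAMP) {
            throw new RuntimeException(String.format(
                    "System clock moved backwards! Refusing to generate snowflake IDs for %d ms",
                    LAST_TIME_STAMP - now));
        }

        if (now == LAST_TIME_STAMP) {
            // Same millisecond: advance the sequence; on overflow wait for the next millisecond.
            LAST_SEQ = (LAST_SEQ + 1) & SEQ_MAX_NUM;
            if (LAST_SEQ == 0) {
                now = nextMillis(LAST_TIME_STAMP);
            }
        } else {
            LAST_SEQ = 0;
        }

        // Remember the timestamp of this ID
        LAST_TIME_STAMP = now;

        return ((now - START_TIME) << TIME_LEFT_BIT | (DATA_ID << DATA_LEFT_BIT) | (WORK_ID << WORK_LEFT_BIT) | LAST_SEQ);
    }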

    /**
     * Test entry point
     * @param args unused
     */
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int num = 300000;
        for (int i = 0; i < num; i++) {
            System.out.println(getUUID());
        }
        long end = System.currentTimeMillis();

        System.out.println("Generated " + num + " IDs in " + (end - start) + " ms");
    }
}

Origin blog.csdn.net/weixin_44153131/article/details/129025408