Comparison of four strategies for generating globally unique IDs after database and table sharding

After sharding databases and tables, how should the primary key ID be handled?

When business volume is large and a single database holds too much data, the data has to be split across multiple databases and tables (sharding). After sharding you inevitably face one problem: how to generate IDs? Once the data is spread over multiple tables, you can no longer rely on each table's own auto-increment ID, because every table would count up from 1 and the IDs would collide. A globally unique ID is required, and this is something you must plan for in a real production environment. A global ID generator generally needs the following characteristics:

Uniqueness, high availability, incrementality, security

Common primary key ID generation strategies are as follows:

Database auto-increment ID

Principle:

With this method, every time the system needs an ID it inserts a row with no business meaning into a dedicated table in one database and reads back the auto-increment ID the database assigned. That ID is then used when writing the real record to the target sharded database and table.

The advantages and disadvantages of this approach are as follows:

Advantages: very simple; IDs increase in order, which is convenient for paging and sorting.

Disadvantages:

a. After sharding, the auto-increment IDs of the individual tables easily collide and cannot be used directly (a step size can be configured per table, but the limitations of that workaround are obvious);

b. Overall throughput is relatively low. If a separate database is dedicated to producing unique IDs for distributed applications, it easily becomes a single-point bottleneck under high concurrency because of transaction overhead, even when IDs are pre-generated.
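The step-size workaround mentioned in disadvantage (a) can be illustrated with a small simulation. This is only a sketch: `SteppedSequence` is a hypothetical in-memory stand-in for one shard's auto-increment counter, where each shard starts at its own offset and advances by the total number of shards (in MySQL, for example, this corresponds to the `auto_increment_offset` and `auto_increment_increment` settings).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one shard's auto-increment counter.
class SteppedSequence {
    private final long step; // total number of shards (the increment)
    private long next;       // next value this shard will hand out

    SteppedSequence(long offset, long step) {
        if (offset < 1 || offset > step) {
            throw new IllegalArgumentException("offset must be in [1, step]");
        }
        this.next = offset;
        this.step = step;
    }

    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }
}

class SteppedSequenceDemo {
    // Takes `count` IDs from a shard configured with the given offset and step.
    static List<Long> take(long offset, long step, int count) {
        SteppedSequence seq = new SteppedSequence(offset, step);
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(seq.nextId());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(take(1, 3, 4)); // shard 1 of 3: [1, 4, 7, 10]
        System.out.println(take(2, 3, 4)); // shard 2 of 3: [2, 5, 8, 11]
        System.out.println(take(3, 3, 4)); // shard 3 of 3: [3, 6, 9, 12]
    }
}
```

The limitation shows up as soon as a fourth shard is added: the step baked into every existing shard has to change, and IDs already issued no longer follow the new pattern.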

Usage scenarios: table IDs within a single database instance (including master-slave replication setups); serial numbers that are counted per day, and so on.

It is not suitable for sharded tables or for globally unique IDs.

Redis-generated global ID

Principle:

The Redis INCR/INCRBY commands increment a counter atomically, which guarantees that every ID produced is a unique serial number. The approach is essentially the same as the database one.

Advantages and disadvantages of using Redis to generate global IDs:

Advantages: overall throughput is higher than with a database, because Redis itself has higher throughput than a database.

Disadvantages: if the Redis instance or cluster goes down, recovering the latest ID value is troublesome. However, the algorithm for producing unique IDs can be designed to avoid this situation.

Usage scenario: best suited to counting scenarios, for example user visit counts or order serial numbers (date + serial number).
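The "date + serial number" order number can be sketched as follows. To keep the example runnable without a Redis server, the counter is a hypothetical `AtomicCounter` interface with an in-memory implementation; in a real deployment `incr` would be backed by the Redis INCR command. The key prefix `order:seq:` and the 8-digit padding are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction over an atomic counter; in production this would
// be backed by the Redis INCR command instead of local memory.
interface AtomicCounter {
    long incr(String key);
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryCounter implements AtomicCounter {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    @Override
    public long incr(String key) {
        // Like INCR, a missing key starts at 0 and the incremented value is returned.
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}

class OrderNumberGenerator {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd
    private final AtomicCounter counter;

    OrderNumberGenerator(AtomicCounter counter) {
        this.counter = counter;
    }

    // Builds "yyyyMMdd" followed by an 8-digit zero-padded sequence; one counter key per day.
    String next(LocalDate day) {
        String date = day.format(DAY);
        long seq = counter.incr("order:seq:" + date);
        return date + String.format("%08d", seq);
    }

    public static void main(String[] args) {
        OrderNumberGenerator gen = new OrderNumberGenerator(new InMemoryCounter());
        LocalDate day = LocalDate.of(2023, 2, 1);
        System.out.println(gen.next(day)); // 2023020100000001
        System.out.println(gen.next(day)); // 2023020100000002
    }
}
```

Using one key per day means the sequence restarts daily, and losing a counter only affects that day's numbering.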

Recommended article by Kaige: Redis in Action 9: globally unique ID

UUID/GUID-generated ID

Advantages and disadvantages:

Advantages: very high performance. IDs are generated locally, with no network overhead;

Disadvantages: a UUID is long, takes up a lot of space, and performs poorly as a primary key;

Because UUIDs are not ordered, inserting them causes many random writes in a B+ tree index.

Usage scenario: if you need a randomly generated file name, number, or similar, a UUID is a reasonable choice, but it is not recommended as a database primary key.
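A minimal sketch using the JDK's `java.util.UUID` shows both points: generation is purely local, and the result is a 36-character string with no useful ordering. The class name `UuidDemo` is only for illustration.

```java
import java.util.UUID;

class UuidDemo {
    // Returns a random (version 4) UUID in its canonical 36-character text form.
    static String randomId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String a = randomId();
        String b = randomId();
        // 32 hex digits plus 4 hyphens, compared with 8 bytes for a long key.
        System.out.println(a + " length=" + a.length());
        System.out.println(b);
        // Consecutive values have no predictable order, which is what hurts B+ tree inserts.
        System.out.println("a sorts before b: " + (a.compareTo(b) < 0));
    }
}
```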

Snowflake algorithm

The Snowflake algorithm comes from Twitter and was implemented in Scala. Its IDs are ordered and unique, and it was designed for high performance and low latency (each cluster generates at least 10K IDs per second, with a response time within 2 ms) in multi-node environments (multiple clusters, across data centers). For this reason, an ID produced by the Snowflake algorithm is composed of segments:

a. The time difference from a chosen start date, in milliseconds: 41 bits, enough for about 69 years;

b. Machine ID + cluster ID: 10 bits, supporting up to 1024 machines;

c. Serial number: 12 bits. Each machine can produce up to 4096 serial numbers per millisecond.

The core idea of the Snowflake algorithm is:

The distributed ID is a fixed-size long value. A long occupies 8 bytes, that is, 8*8 = 64 bits. The format of a Snowflake ID is therefore as follows:

1-bit sign (0) | 41-bit timestamp | 10-bit machine ID | 12-bit serial number

The meaning of each segment:

The first segment: the highest bit is the sign bit. Its value is fixed at 0, so every ID is a positive integer.

The second segment: the next 41 bits hold the timestamp, in milliseconds. The largest value 41 bits can represent is 2^41 - 1, so 2^41 - 1 milliseconds can be represented, which converts to about 69 years;

The third segment: the next 10 bits identify the machine. For geographically distributed deployments, multiple clusters can also be configured; the numbering of data centers, clusters, and instances in each location must be planned offline in advance. The segment consists of a 5-bit machine ID and a 5-bit cluster ID, so up to 2^10 = 1024 machines can be deployed.

The fourth segment: the last 12 bits are the serial number, which distinguishes IDs generated within the same millisecond. The largest value 12 bits can represent is 2^12 - 1 = 4095, so counting from 0 these 12 bits can distinguish 4096 different IDs within the same millisecond.
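The segment sizes above can be checked with a few lines of bit arithmetic. The following is a sketch under the layout described here (1 + 41 + 10 + 12 bits); the class and method names are hypothetical, and the 10-bit machine segment is treated as a single `workerId` for simplicity.

```java
class SnowflakeLayout {
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;   // 5-bit cluster ID + 5-bit machine ID
    static final int SEQUENCE_BITS = 12;

    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;     // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    // Packs the three fields into one 64-bit value; the sign bit stays 0
    // as long as the timestamp offset fits in 41 bits.
    static long compose(long timestampOffsetMs, long workerId, long sequence) {
        return (timestampOffsetMs << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    static long timestampOf(long id) { return id >>> (WORKER_BITS + SEQUENCE_BITS); }
    static long workerOf(long id)    { return (id >>> SEQUENCE_BITS) & MAX_WORKER; }
    static long sequenceOf(long id)  { return id & MAX_SEQUENCE; }

    // Years covered by the 41-bit millisecond field, using 365-day years.
    static double yearsCovered() {
        return MAX_TIMESTAMP / (1000.0 * 60 * 60 * 24 * 365);
    }

    public static void main(String[] args) {
        System.out.println(MAX_WORKER + 1);   // 1024 machines
        System.out.println(MAX_SEQUENCE + 1); // 4096 IDs per millisecond per machine
        System.out.println(yearsCovered());   // a little under 70 years
        System.out.println(compose(1L, 1L, 1L)); // 4198401 = 2^22 + 2^12 + 1
    }
}
```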

Advantages and disadvantages of the snowflake algorithm:

Advantages: the millisecond timestamp occupies the high bits and the auto-increment sequence occupies the low bits, so the IDs as a whole trend upward;

It does not depend on third-party systems such as databases; it is deployed as a service, which gives better stability and very high ID-generation performance;

The bits can be allocated according to your own business characteristics, which is very flexible.

Disadvantages:

It relies heavily on the machine clock. If a machine's clock is set back, the result may be duplicate IDs or an unavailable service.
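The trade-off between duplicates and unavailability can be seen in a minimal generator sketch. This is not Twitter's implementation; it is an illustration assuming the 41/10/12 layout described earlier, with a hypothetical injectable clock so that a rollback can be simulated. On a backwards clock this version chooses unavailability: it throws rather than risk issuing a duplicate. The epoch value in `main` (2023-01-01 UTC) is an arbitrary assumption.

```java
import java.util.function.LongSupplier;

class SnowflakeGenerator {
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long epochMs;       // chosen start date, in ms since the Unix epoch
    private final long workerId;      // combined cluster + machine ID
    private final LongSupplier clock; // injectable so a clock rollback can be simulated
    private long lastMs = -1L;
    private long sequence = 0L;

    SnowflakeGenerator(long epochMs, long workerId, LongSupplier clock) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER + "]");
        }
        this.epochMs = epochMs;
        this.workerId = workerId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastMs) {
            // Clock moved backwards: refuse rather than risk a duplicate ID.
            throw new IllegalStateException("clock moved backwards by " + (lastMs - now) + " ms");
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 values used in this millisecond: spin until the next one.
                while ((now = clock.getAsLong()) <= lastMs) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        return ((now - epochMs) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        long epoch = 1672531200000L; // 2023-01-01T00:00:00Z, an arbitrary choice
        SnowflakeGenerator gen = new SnowflakeGenerator(epoch, 1, System::currentTimeMillis);
        System.out.println(gen.nextId());
        System.out.println(gen.nextId());
    }
}
```

Other designs wait until the clock catches up, which keeps the service slow but available for small rollbacks; which behavior is preferable depends on how the business tolerates delays versus errors.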

Conclusion

If you have any questions, please leave a message on my personal blog (www.kaigejava.com) or on my WeChat official account (Kaige Java).

Hello everyone, I am Kaige Java (kaigejava), and I enjoy sharing technical articles. You are welcome to follow "Kaige Java" to get new articles promptly. Let's learn Java together. You are also welcome to come and chat with Kaige whenever you like~


 

Origin blog.csdn.net/kaizi_1992/article/details/128843995