How to solve the unique primary key problem after storage split?

Earlier we discussed splitting databases and tables (sharding). Now consider this problem: with a single database and a single table, the business ID can simply rely on the database's auto-increment primary key. Once storage is split across multiple databases, however, each one increments independently, so continuing to use the database's auto-increment primary key will inevitably produce duplicate primary keys.

So how should we solve the primary key problem? This article looks at how to generate unique primary keys.

What are the options for generating primary keys?

What is the simplest way to generate a unique primary key? The most direct solution is a separate auto-increment table: after the storage is split, create a single-point table whose only job is to issue IDs. For example, to generate order IDs, we create the following table:

CREATE TABLE IF NOT EXISTS `order_sequence`(
   `order_id` INT UNSIGNED AUTO_INCREMENT,
   PRIMARY KEY ( `order_id` )
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

Every time a unique ID is needed, insert a new record into this table and use the returned auto-increment primary key as the business ID.

This solution is simple to implement, but its problems are obvious. First, performance cannot be guaranteed: under high concurrency, every ID requires a write to this one table, so ID generation easily becomes a performance bottleneck. Second, it is a single point of failure: if the database that generates the auto-increment IDs goes down, record creation (for example, placing orders) fails directly.

In actual development there are many ways to implement unique primary keys. Here are some common approaches: using UUIDs, using the Snowflake algorithm, and maintaining auto-increment ID ranges in the database while allocating IDs from them in memory.

Is it possible to use UUID?

Everyone is familiar with UUIDs. Java ships with a built-in class, java.util.UUID, which makes generating one easy:

import java.util.UUID;
public class UuidUtil {
   // Returns a random (version 4) UUID as a 36-character string
   public static String getUUID() {
      return UUID.randomUUID().toString();
   }
}

So can UUID be used to generate a unique primary key?

Although UUIDs satisfy the requirement of global uniqueness very well, they are not suitable as primary keys for database storage. Print one and take a look, for example 135c8321-bf10-46d3-9980-19ba588554e8: it is a 36-character string.

First, a UUID is too long to serve well as a database primary key, which causes relatively large storage overhead. Second, UUIDs are unordered, and using them as primary keys reduces the database's write performance.

Take MySQL as an example: MySQL recommends using an auto-increment ID as the primary key. The InnoDB engine stores table data in a B+ tree index ordered by the primary key. If the primary key is an auto-increment ID, new rows are always appended at the end of the index, so writes are essentially sequential. If the primary key is not increasing, each inserted row must be placed at its proper position in the middle of the tree, which requires a lot of additional data movement, causes page splits, and reduces write performance.
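To make the write-order argument concrete, here is a small Java sketch that counts how many inserts land somewhere other than the end of the key order. It is a deliberate simplification: a sorted ArrayList stands in for the ordered leaf pages of an InnoDB B+ tree, and the class and method names are illustrative only. Increasing keys always append at the end, while random UUID strings mostly land in the middle, which in a real B+ tree is what forces data movement and page splits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

public class InsertPositionDemo {
    // Counts inserts that do NOT go to the end of the sorted key order.
    // The sorted list is a simplified stand-in for a B+ tree's leaf-level key order.
    public static <T extends Comparable<T>> int middleInserts(List<T> keysInArrivalOrder) {
        List<T> sorted = new ArrayList<>();
        int middle = 0;
        for (T key : keysInArrivalOrder) {
            int pos = Collections.binarySearch(sorted, key);
            int insertAt = pos >= 0 ? pos : -(pos + 1); // insertion point for a missing key
            if (insertAt != sorted.size()) {
                middle++; // key lands between existing keys
            }
            sorted.add(insertAt, key);
        }
        return middle;
    }

    public static void main(String[] args) {
        List<Long> autoIncrement = new ArrayList<>();
        List<String> uuids = new ArrayList<>();
        for (long i = 1; i <= 1000; i++) {
            autoIncrement.add(i);
            uuids.add(UUID.randomUUID().toString());
        }
        System.out.println("auto-increment keys inserted mid-index: " + middleInserts(autoIncrement));
        System.out.println("UUID keys inserted mid-index: " + middleInserts(uuids));
    }
}
```

The auto-increment count is always 0; the UUID count varies from run to run but is almost always close to the total.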

Using the Snowflake algorithm

Snowflake is Twitter's open-source distributed ID generation algorithm. Each ID is a 64-bit binary number divided into 4 parts, as shown in the diagram below:

Figure: Snowflake ID layout (1-bit sign | 41-bit timestamp | 10-bit worker ID | 12-bit sequence number)

Where:

  • The first bit is unused. It is the sign bit and is always 0, which keeps the value positive;

  • A 41-bit timestamp stores milliseconds elapsed since a chosen epoch. 41 bits can represent 2^41 milliseconds, which converts to a little over 69 years; this range is generally sufficient for business use;

  • A 10-bit worker (machine) ID supports 2^10 = 1024 nodes;

  • A 12-bit sequence number distinguishes IDs generated by the same machine within the same millisecond. Each node can generate 2^12 = 4096 IDs per millisecond, which is about 4.096 million IDs per second. If more than 4096 IDs are requested within one millisecond, generation waits until the next millisecond.
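The capacity figures above can be checked with a few lines of Java. The class and method names are illustrative, and a year is counted as 365 days.

```java
public class SnowflakeCapacity {
    static final int SIGN_BITS = 1;
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    // Years covered by a 41-bit millisecond timestamp (365-day years)
    public static double timestampYears() {
        double millis = Math.pow(2, TIMESTAMP_BITS);
        return millis / (1000.0 * 60 * 60 * 24 * 365);
    }

    // Number of distinct worker IDs: 2^10
    public static long maxWorkers() {
        return 1L << WORKER_BITS;
    }

    // IDs one node can issue per millisecond: 2^12
    public static long idsPerMillisPerNode() {
        return 1L << SEQUENCE_BITS;
    }

    public static void main(String[] args) {
        System.out.println("total bits: " + (SIGN_BITS + TIMESTAMP_BITS + WORKER_BITS + SEQUENCE_BITS));
        System.out.printf("timestamp range: %.1f years%n", timestampYears());
        System.out.println("worker IDs: " + maxWorkers());
        System.out.println("IDs per ms per node: " + idsPerMillisPerNode());
        System.out.println("IDs per second per node: " + idsPerMillisPerNode() * 1000);
    }
}
```

Running it prints a total of 64 bits, a timestamp range of 69.7 years, 1024 worker IDs, 4096 IDs per millisecond, and 4096000 IDs per second per node.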

Twitter published a reference implementation of the Snowflake algorithm that relies heavily on bit operations; see its code repository for details.
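For readers who want to see the bit operations, below is a minimal Java sketch of the layout described above. It is not Twitter's original code (which is written in Scala). Assumptions: a custom epoch of 2020-01-01T00:00:00Z, a worker ID assigned to each node from outside, and a synchronized nextId() for thread safety. When the clock moves backwards, this sketch simply throws.

```java
public class SnowflakeIdGenerator {
    // Custom epoch (assumption for this sketch): 2020-01-01T00:00:00Z in epoch milliseconds
    private static final long EPOCH = 1577836800000L;
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;      // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;    // 4095
    private static final int WORKER_SHIFT = SEQUENCE_BITS;                   // 12
    private static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS;  // 22

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be between 0 and " + MAX_WORKER_ID);
        }
        this.workerId = workerId;
    }

    // synchronized so concurrent callers never share a (timestamp, sequence) pair
    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: refuse rather than risk duplicate IDs
            throw new IllegalStateException("clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence numbers used in this millisecond: spin until the next one
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L; // new millisecond: restart the sequence
        }
        lastTimestamp = now;
        // sign bit (0) | 41-bit timestamp | 10-bit worker ID | 12-bit sequence
        return ((now - EPOCH) << TIMESTAMP_SHIFT) | (workerId << WORKER_SHIFT) | sequence;
    }

    // Decoding helpers, handy for checking the layout
    public static long workerIdOf(long id) {
        return (id >>> WORKER_SHIFT) & MAX_WORKER_ID;
    }

    public static long sequenceOf(long id) {
        return id & SEQUENCE_MASK;
    }
}
```

Each deployed node needs a distinct workerId between 0 and 1023, for example new SnowflakeIdGenerator(7).nextId() on node 7; how those IDs are handed out (configuration, a registry, and so on) is left open here.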

The Snowflake algorithm can run as a standalone service deployed on multiple machines. The generated IDs trend upward, no third-party system such as a database is required, and performance is very high: the theoretical 4.096 million IDs per second per node is an impressive number that meets most business scenarios. The worker ID bits can be allocated according to business characteristics, which makes the scheme fairly flexible.

The Snowflake algorithm has many advantages, but one disadvantage is that there is a clock rollback problem. What is clock rollback?

A server's local clock is never perfectly accurate. In some business scenarios, such as hourly flash sales in e-commerce, server clocks must be kept synchronized so that users hitting different servers do not see different times. To keep time accurate, clocks are calibrated through NTP (Network Time Protocol), which synchronizes the time of computers on a network.

If NTP synchronization moves a server's clock backwards, Snowflake may generate duplicate IDs. Besides NTP synchronization, leap seconds can also cause the server clock to roll back. Clock rollback is a low-probability event and can generally be ignored when concurrency is low. One way to handle it is to delay and wait until the server time catches up; interested readers can look up further material.
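The "wait until the clock catches up" strategy can be sketched as a guard in front of the timestamp read. This is an illustration under assumptions: the class name, the injectable LongSupplier clock (used so the logic can be tested), and the tolerance threshold are all hypothetical choices. Small rollbacks are waited out; large ones are rejected, because blocking for a long time would stall ID generation.

```java
import java.util.function.LongSupplier;

public class ClockRollbackGuard {
    private final LongSupplier clock;           // source of "now" in ms, e.g. System::currentTimeMillis
    private final long maxToleratedBackwardMs;  // assumption: only small rollbacks are waited out

    public ClockRollbackGuard(LongSupplier clock, long maxToleratedBackwardMs) {
        this.clock = clock;
        this.maxToleratedBackwardMs = maxToleratedBackwardMs;
    }

    // Returns a timestamp no earlier than lastTimestamp, sleeping through small rollbacks
    public long currentTimeNotBefore(long lastTimestamp) throws InterruptedException {
        long now = clock.getAsLong();
        if (now >= lastTimestamp) {
            return now; // normal case: no rollback
        }
        long drift = lastTimestamp - now;
        if (drift > maxToleratedBackwardMs) {
            // Large rollback: fail fast instead of blocking for a long time
            throw new IllegalStateException("clock moved back " + drift + " ms; refusing to generate IDs");
        }
        // Small rollback: wait until the clock catches up with the last issued timestamp
        while (now < lastTimestamp) {
            Thread.sleep(Math.max(1, lastTimestamp - now));
            now = clock.getAsLong();
        }
        return now;
    }
}
```

In a generator like Snowflake, the result of currentTimeNotBefore(lastTimestamp) would replace the direct System.currentTimeMillis() call.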

Database-maintained ID range allocation

Below we introduce a strategy in which the database maintains auto-increment ID ranges and servers allocate IDs from those ranges in memory. This is also the primary key generation strategy used by database middleware such as Taobao's TDDL.

The steps to use this method are as follows.

  • First, create a sequence table in the database. Each row records the maximum ID currently allocated for one business sequence.

The main fields of the sequence table are name and value, where name is the name of the current business sequence, and value stores the maximum value of the ID that has been allocated.

CREATE TABLE `sequence` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Id',
  `name` varchar(64) NOT NULL COMMENT 'sequence name',
  `value` bigint(32) NOT NULL COMMENT 'sequence current value',
   PRIMARY KEY (`id`),
  UNIQUE KEY `unique_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8; 

  • Next, insert a row for the business sequence. When primary keys are needed, each server claims an ID range from the table, caches it locally, and at the same time updates the maximum value recorded in the sequence table.

For example, create a sequence for orders by inserting the following row:

INSERT INTO sequence (name, value) VALUES ('order_sequence', 1000);

When a server needs a new ID range, it first accesses the sequence table in the corresponding database and updates the record to claim a range. For example, with a step size of 200 and a current value of 1000, the update changes the value to 1200, and that server now owns IDs 1001 to 1200.

  • After obtaining an ID range, the server hands out IDs from it locally. Concurrency issues, such as several servers updating the same sequence row at once, can be handled with mechanisms such as optimistic locking.

With the ID range in hand, IDs can be allocated locally using AtomicInteger or a similar atomic counter.
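The range-claiming and local-allocation steps can be sketched in Java as follows. To keep the example self-contained, a hypothetical in-memory SequenceRow stands in for one row of the sequence table; its compare-and-set retry loop plays the role of an optimistic-lock update such as UPDATE sequence SET value = old + step WHERE name = ? AND value = old, which a real system would run against the database. Local allocation uses an AtomicLong cursor (the AtomicInteger idea, widened to long to match the bigint column). All class names here are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SegmentIdAllocator {

    // Hypothetical in-memory stand-in for one row of the `sequence` table.
    // claim() mimics: UPDATE sequence SET value = old + step WHERE name = ? AND value = old,
    // retrying whenever another server changed `value` first.
    public static class SequenceRow {
        private final AtomicLong value;

        public SequenceRow(long initialValue) {
            this.value = new AtomicLong(initialValue);
        }

        // Claims the range (old + 1 .. old + step) and returns its first ID
        public long claim(int step) {
            while (true) {
                long old = value.get();
                if (value.compareAndSet(old, old + step)) {
                    return old + 1;
                }
            }
        }

        public long currentValue() {
            return value.get();
        }
    }

    // One cached range; the atomic cursor lets threads take IDs without locking
    private static final class Segment {
        final AtomicLong cursor;
        final long end; // inclusive

        Segment(long start, long end) {
            this.cursor = new AtomicLong(start);
            this.end = end;
        }
    }

    private final SequenceRow row;
    private final int step;
    private volatile Segment current = new Segment(1, 0); // empty: the first call claims a range

    public SegmentIdAllocator(SequenceRow row, int step) {
        this.row = row;
        this.step = step;
    }

    public long nextId() {
        while (true) {
            Segment s = current;
            long id = s.cursor.getAndIncrement();
            if (id <= s.end) {
                return id; // fast path: ID taken from the local range
            }
            // Range exhausted: only one thread claims the next range
            synchronized (this) {
                if (current == s) {
                    long start = row.claim(step);
                    current = new Segment(start, start + step - 1);
                }
            }
        }
    }
}
```

With two allocators sharing one row that starts at 1000 and a step of 200, the first allocator hands out 1001, the second 1201, and the first then continues with 1002. IDs from different servers therefore interleave rather than increase strictly in time order, while still trending upward.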

IDs assigned by different machines at the same moment come from different ranges, so IDs generated this way are not strictly increasing in time order, but they do trend upward overall. The approach is widely used in production.

In order to prevent single points of failure, the database where the sequence table is located is usually configured with multiple slave databases to achieve high availability.

Besides the solutions above, Redis can also be used in practice through its INCR command. Interested readers can look into it.

Summary

This article shared several ideas for implementing unique primary keys, commonly known as distributed ID generators: using UUIDs, using the Snowflake algorithm, and combining database-maintained ID ranges with in-memory allocation.

Now to summarize, what features should a primary key generator usable in a production environment have?

The first is that the generated primary key must be globally unique and no duplicate IDs can appear. This is the most basic requirement for primary keys.

Second, the IDs should be ordered: monotonically increasing, or at least increasing over a period of time. This matters for business reasons. On the one hand, ordered primary keys preserve write performance in the database. On the other hand, primary keys are often used in business processing, such as sorting by primary key; if the generated keys are unordered, they cannot reflect the order in which records were created.

The third is performance: primary keys must be generated as quickly as possible while the service stays highly available. After the storage is split, business writes depend heavily on the ID generation service; if it is unavailable, creating new orders, products, and so on is blocked, which is absolutely unacceptable in real projects.

Think about your own work: in the modules you are responsible for that involve unique primary keys, have these properties been considered, and how are they implemented? Feel free to leave a comment and share.


Origin blog.csdn.net/caryxp/article/details/135023437