On database primary key strategy

Many of you are already very familiar with the primary key of a database table, or PK for short.

The role of the primary key is to uniquely identify a record. Within one table, the primary key of every record must be unique; otherwise the database system could not locate a record directly by its primary key.

Although the database system itself places no special requirements on the primary key, when writing a program you need to think carefully about what kind of primary key to use. Using primary keys correctly is half the battle in storing data; using them incorrectly can lead an application toward gradual collapse.

Primary keys must not be modified

As far as the database is concerned, a primary key can actually be modified, as long as the new value does not conflict with another primary key. For an application, however, modifying the primary key of a record causes big problems.

This is because the second role of the primary key is to let foreign keys in other tables refer to it, which is what makes the data relational. Once a primary key changes, every row that references it must have its foreign key updated. Many Web applications do not use strong constraints (they reference the primary key without declaring a foreign key constraint), so modifying a primary key directly destroys data integrity.
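The failure mode can be reproduced in a few lines. The following is only a sketch, assuming an in-memory SQLite database and two made-up tables, users and orders, where orders.user_id refers to users.id without a declared foreign key constraint:

```python
import sqlite3


def orphaned_orders_after_pk_change() -> int:
    """Return how many orders point at a missing user after a primary key update."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: no FOREIGN KEY clause, so user_id is only a soft reference.
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, user_id TEXT)")
    conn.execute("INSERT INTO users VALUES ('u1', 'Alice')")
    conn.execute("INSERT INTO orders VALUES ('o1', 'u1')")
    conn.execute("INSERT INTO orders VALUES ('o2', 'u1')")

    # The database accepts the change because 'u2' conflicts with no other key...
    conn.execute("UPDATE users SET id = 'u2' WHERE id = 'u1'")

    # ...but both orders still carry the old key and now match no user.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN users u ON u.id = o.user_id "
        "WHERE u.id IS NULL"
    ).fetchone()
    conn.close()
    return count
```

Calling orphaned_orders_after_pk_change() counts the orders left dangling; nothing in the database complained while the data was being broken.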

Business fields must not be used as primary keys

No business-related field, however unique it appears, should ever be used as a primary key. For example, the Email field of a user table is unique, but if it is used as the primary key, other tables will reference the Email value everywhere, leaking user information.

In addition, changing an Email is an ordinary business operation, and it would directly violate the previous principle.

So, which field should be used for the primary key?

The primary key must be a separate field with no business meaning at all: apart from its two responsibilities of unique identification and immutability, it means nothing to the business.

Likewise, seemingly unique usernames, ID card numbers and the like must not be used as primary keys. These unique fields should get unique index constraints instead.
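As a rough illustration of this rule, here is a sketch using Python's built-in sqlite3 module. The table layout, the index name uni_users_email and the example addresses are all assumptions made up for the demo. The primary key is a meaningless string, while the email is kept unique by a separate unique index and can be changed freely:

```python
import sqlite3
import uuid
from typing import Tuple


def demo_surrogate_key() -> Tuple[bool, bool]:
    """Return (id_unchanged_after_email_change, duplicate_email_rejected)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: id carries no business meaning, email is a business field.
    conn.execute(
        "CREATE TABLE users (id VARCHAR(32) PRIMARY KEY, email VARCHAR(100) NOT NULL)"
    )
    conn.execute("CREATE UNIQUE INDEX uni_users_email ON users (email)")

    user_id = uuid.uuid4().hex
    conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (user_id, "old@example.com"))

    # Changing the email is a plain business update; the primary key stays untouched.
    conn.execute("UPDATE users SET email = ? WHERE id = ?", ("new@example.com", user_id))
    (stored_id,) = conn.execute(
        "SELECT id FROM users WHERE email = ?", ("new@example.com",)
    ).fetchone()

    # The unique index still rejects a second user with the same email.
    try:
        conn.execute(
            "INSERT INTO users (id, email) VALUES (?, ?)",
            (uuid.uuid4().hex, "new@example.com"),
        )
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    finally:
        conn.close()
    return stored_id == user_id, duplicate_rejected
```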

What type of primary key should be used

Should the primary key be an integer or a string? (If you were going to say floating-point numbers, please go and top up your IQ.)

I strongly recommend using strings.

Why?

Let's first look at the problem of using integers.

There are two options for using integers: database auto-increment and self-generated.

Generating the value yourself is really just auto-increment by another name: you save the last used value somewhere and continue incrementing from it the next time. A common practice is to store the current maximum in a separate table. This is complicated to implement and not very reliable, so it is inferior to database auto-increment.

The biggest problem with database auto-increment is not that a single-point database cannot be split horizontally, because most companies do not survive long enough for their business to need splitting across databases.

The biggest problem with auto-increment primary keys is that they completely expose the company's key operational data to competitors and VCs. For example, if the user table uses an auto-increment primary key, you only need to register a user every Monday morning and compare the ID you got last week with the ID you got this week to know how many new users the company gained in a week. If the website claims to have added 100,000 users but the ID has only grown by 1,000, all you can do is chuckle.

Because the purpose of a primary key is to guarantee that records are unique, it does not need to be continuous. In fact, discontinuous keys are better: they avoid leaking operational data and also make it harder for attackers to predict IDs, which improves security.

This problem does not exist with string primary keys. If we use a UUID as the primary key, stored as varchar(32), then apart from taking up more storage space, the string primary key has the advantage of being unpredictable.
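For illustration, a minimal sketch of generating such a key in Python. It assumes a random (version 4) UUID from the standard uuid module; the function name new_uuid_key is made up here:

```python
import uuid


def new_uuid_key() -> str:
    """Return a random UUID as 32 lowercase hex characters, without hyphens."""
    # uuid4() draws on the operating system's random source, so consecutive
    # keys reveal nothing about how many rows exist or what comes next.
    return uuid.uuid4().hex
```

The hyphen-free hex form is exactly 32 characters long, which is why varchar(32) is enough.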

Some people object that a UUID is completely random, so the primary key does not increase with time, which makes it inconvenient to sort directly by primary key. In fact, this is very easy to solve.

Method 1: build the primary key directly as timestamp + UUID, zero-padding the timestamp so that the generated keys sort by time. This method is simple and crude; its disadvantage is that the primary key is rather long.

Method 2: design a custom algorithm that puts the timestamp in the high bits and a sequence number in the low bits, reserves some bits for a machine identifier, and then uses base32 encoding to keep the length within 20 characters.

One may ask: following Method 2, is it feasible to construct a 64-bit integer containing a timestamp and a sequence number as the primary key?
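Method 1 might look like the sketch below. The 15-digit width and the millisecond resolution are assumptions chosen for the example, not something the method prescribes:

```python
import time
import uuid
from typing import Optional


def next_id(now_ms: Optional[int] = None) -> str:
    """Return a zero-padded millisecond timestamp followed by a random UUID."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Padding to a fixed width makes string order agree with numeric order:
    # without it, "999..." would sort after "1000...".
    return f"{now_ms:015d}{uuid.uuid4().hex}"
```

Each key is 15 + 32 = 47 characters long, which shows the drawback mentioned above: the key sorts by time, but it is long.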
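One possible shape of Method 2 is sketched below. Every concrete number is an assumption for illustration: 42 bits of millisecond timestamp, 10 bits of machine identifier and 12 bits of sequence number, encoded with a 32-symbol alphabet whose characters are in ascending ASCII order so that string order matches numeric order (the standard base32 alphabet would not sort this way). The class name IdGenerator is hypothetical.

```python
import threading
import time

# 32 symbols in ascending ASCII order, so encoded keys sort like the numbers.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"
MACHINE_BITS = 10   # assumed width of the machine identifier
SEQ_BITS = 12       # assumed width of the per-millisecond sequence number
ENCODED_LEN = 13    # 13 * 5 = 65 bits, enough for a 64-bit value


def encode_base32(value: int, width: int) -> str:
    """Encode a non-negative integer as a fixed-width base32 string."""
    chars = []
    for _ in range(width):
        chars.append(ALPHABET[value & 31])
        value >>= 5
    return "".join(reversed(chars))


class IdGenerator:
    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self._machine_id = machine_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> str:
        with self._lock:
            # Never move backwards, even if the system clock does.
            now_ms = max(int(time.time() * 1000), self._last_ms)
            if now_ms == self._last_ms:
                self._seq += 1
                if self._seq == (1 << SEQ_BITS):
                    # Sequence exhausted: borrow the next millisecond.
                    now_ms += 1
                    self._seq = 0
            else:
                self._seq = 0
            self._last_ms = now_ms
            value = (
                (now_ms << (MACHINE_BITS + SEQ_BITS))
                | (self._machine_id << SEQ_BITS)
                | self._seq
            )
            return encode_base32(value, ENCODED_LEN)
```

With this layout each key is 13 characters, comfortably within the 20-character budget, and keys produced by one generator come out in increasing order.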

In theory, yes, because a 32-bit timestamp in seconds (up to 0xffffffff) reaches past the year 2100 (it runs out in February 2106). But the remaining bits are not a full 0xffffffff, only 0xfffff. If 0xff of that is assigned as the machine identifier, each machine can generate at most 0xfff+1=4096 primary keys per second, which is not enough for some large applications.
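The bit budget described here can be written out as a small sketch. The field order (timestamp, then machine identifier, then sequence number) and the function names are assumptions for illustration:

```python
from typing import Tuple

MACHINE_BITS = 8    # 0xff
SEQ_BITS = 12       # 0xfff, i.e. 4096 values per second per machine


def pack_id(ts_seconds: int, machine_id: int, sequence: int) -> int:
    """Pack a 32-bit timestamp, 8-bit machine id and 12-bit sequence into one integer."""
    if not 0 <= ts_seconds <= 0xFFFFFFFF:
        raise ValueError("timestamp does not fit in 32 bits")
    if not 0 <= machine_id <= 0xFF:
        raise ValueError("machine id does not fit in 8 bits")
    if not 0 <= sequence <= 0xFFF:
        raise ValueError("sequence does not fit in 12 bits")
    return (ts_seconds << (MACHINE_BITS + SEQ_BITS)) | (machine_id << SEQ_BITS) | sequence


def unpack_id(value: int) -> Tuple[int, int, int]:
    """Split a packed id back into (timestamp, machine id, sequence)."""
    return value >> (MACHINE_BITS + SEQ_BITS), (value >> SEQ_BITS) & 0xFF, value & 0xFFF
```

The largest value this layout can produce uses 32 + 8 + 12 = 52 bits, and the sequence field is what caps throughput at 4096 keys per second per machine.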

Why can a 64-bit integer only use 0xfffff for the bits after the timestamp? Because JavaScript's Number type has only 53 bits of integer precision, and the largest integer it can represent safely is 0x1fffffffffffff (2^53 - 1). Sooner or later we will talk to JavaScript through a REST API, so the 64-bit integer must stay within 0x1fffffffffffff, otherwise values will silently lose precision when they pass through JavaScript. With 32 bits taken by the timestamp, 53 - 32 = 21 bits remain, of which the layout above uses 20 (0xfffff).

Although a 64-bit integer built from a timestamp plus a sequence number works as a primary key in theory, in practice there is no way around interacting with JavaScript. All things considered, a string primary key is the most reliable choice.
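The precision limit is easy to check without a browser, because Python's float is the same IEEE 754 double that JavaScript uses for Number, with a 53-bit significand. A small sketch (the helper name survives_double is made up):

```python
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def survives_double(n: int) -> bool:
    """Return True if n is unchanged after a round trip through a 64-bit double."""
    return int(float(n)) == n
```

Values up to MAX_SAFE_INTEGER pass the round trip, while 2**53 + 1 or a full-width 64-bit ID such as 2**63 - 1 comes back as a different number. No exception is raised, which is what makes the problem easy to miss.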
