[Repost] About business primary keys and logical primary keys

 


http://www.cnblogs.com/sparkbj/p/6015690.html

Over the past few days I have been thinking about logical primary keys, business primary keys and composite primary keys, and I have searched online for related discussions (see the reference links at the bottom). The following is a summary based on SQL Server; other databases (Oracle, MySQL, DB2, ...) should be similar. These are only my current thoughts. If anything is inappropriate, please let me know and I will reconsider and correct it.

 

Definitions (some taken from SQL Server Books Online):

PRIMARY KEY: A table usually has a column or set of columns whose values uniquely identify each row in the table. Such a column or set of columns is called the table's primary key (PK) and is used to enforce the entity integrity of the table.

FOREIGN KEY: A foreign key (FK) is one or more columns used to establish and enforce a link between the data in two tables. In a foreign key reference, a link is created between two tables when the column or columns that hold the primary key value of one table are referenced by a column or columns in another table. The referencing column becomes a foreign key in the second table.

Clustered index: A clustered index sorts and stores the data rows of a table based on their key values. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.

Nonclustered index: A nonclustered index contains the index key values and a row locator that points to where the table data is stored. Multiple nonclustered indexes can be created on a table or indexed view. Typically, nonclustered indexes are designed to improve the performance of frequently used queries that are not served by the clustered index.

Auto-number and identity columns: For each table, you can create an identity column that contains system-generated sequential values that uniquely identify each row in the table.

Business primary key (natural primary key): A field in the database table that carries business meaning is used as the primary key. This is called a "natural key".

Logical primary key (surrogate primary key): A field that has nothing to do with the business information in the table is used as the primary key. This is called a "surrogate key".

Composite primary key (joint primary key): A combination of two or more fields is used as the primary key.

 

Principle analysis:

The main reason for using a logical primary key is that once a business primary key changes, every part of the system associated with that key inevitably has to be modified, and the more references there are, the larger the change. With a logical primary key, only the business logic related to the business key itself needs to change, which limits how far a business key change spreads through the system. Changes in business logic are unavoidable, because "the only constant is change": no company stays the same forever, and neither does any business. The most typical examples are the lengthening of ID card numbers and the replacement of driver's license numbers with ID card numbers. In reality, ID card numbers have also turned out to be duplicated, so using the ID card number as a primary key causes difficulties too. Of course, there are many ways to respond to change. One of them is to build a new system that keeps up with the times, which is certainly good news for software companies.

Another reason for using a logical primary key is that a business primary key may be too large, which is bad for transmission, processing and storage. As a rule of thumb, I think that once a business primary key exceeds 8 bytes you should consider a logical primary key, because int is 4 bytes and bigint is 8 bytes. A business primary key is usually a string, and even when an 8-byte bigint is compared with an 8-byte string, the bigint is naturally more efficient to transmit and process: just imagine the difference in the assembly code for code == "12345678" and id == 12345678. Of course, a logical primary key is not necessarily int or bigint, a business primary key is not necessarily a string (it can also be int, datetime or another type), and what gets transmitted is not necessarily the primary key. Each case needs its own analysis, but the principle is similar, and only the usual situation is discussed here. In addition, other tables that reference the primary key must store it as well, so the storage cost differs too. The referencing fields in those tables are usually foreign keys, or are indexed to speed up lookups, which adds further storage cost and again requires case-by-case analysis.
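The storage argument can be made concrete with a rough calculation. The sketch below is only an illustration: the key values and the row count are made up, it counts raw key bytes only, and it ignores index structures and any per-value overhead a real database adds.

```python
import struct

# Hypothetical values, chosen only for illustration.
surrogate_id = 12345678             # logical key stored as an 8-byte bigint
short_code = "12345678"             # 8-character business key
long_code = "ABCDEFGHIJ12345678"    # made-up 18-character business key

bigint_bytes = len(struct.pack("<q", surrogate_id))  # signed 64-bit integer
short_bytes = len(short_code.encode("ascii"))
long_bytes = len(long_code.encode("ascii"))


def reference_cost(key_bytes: int, referencing_rows: int) -> int:
    """Raw bytes spent storing the key in the rows that reference it."""
    return key_bytes * referencing_rows


rows = 1_000_000  # assumed number of referencing rows
extra = reference_cost(long_bytes, rows) - reference_cost(bigint_bytes, rows)
print(bigint_bytes, short_bytes, long_bytes, extra)
```

Under these assumptions the 8-character string and the bigint occupy the same number of bytes, so the difference there is in comparison cost rather than size, while the longer business key costs extra bytes in every referencing row.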

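Another reason for using a logical primary key is that join queries on int or bigint foreign keys are faster than join queries on string foreign keys. The principle is the same as above, so I will not repeat it here.

A further reason for using a logical primary key is that users or maintenance staff may enter wrong data into the business primary key, for example typing RXB instead of RMB. Every reference then points at the wrong value, and correcting it later is very troublesome. With a logical primary key the problem is easy to solve; with a business primary key the correction also affects the foreign key data in other tables. Cascading updates can handle this in some cases, but not every reference can be cascaded.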
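The RMB / RXB case can be sketched with Python's built-in sqlite3 module standing in for SQL Server. The table and column names (currency, payment, currency_id, code) are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE currency (
        currency_id INTEGER PRIMARY KEY,   -- logical (surrogate) key
        code        TEXT NOT NULL UNIQUE   -- business key, e.g. 'RMB'
    );
    CREATE TABLE payment (
        payment_id  INTEGER PRIMARY KEY,
        currency_id INTEGER NOT NULL REFERENCES currency(currency_id),
        amount      INTEGER NOT NULL
    );
""")

# The business key is mistyped as 'RXB', and payments already reference the row.
conn.execute("INSERT INTO currency (currency_id, code) VALUES (1, 'RXB')")
conn.executemany(
    "INSERT INTO payment (currency_id, amount) VALUES (?, ?)",
    [(1, 100), (1, 250)],
)

# The fix touches exactly one row; the referencing rows hold only the logical key.
cur = conn.execute("UPDATE currency SET code = 'RMB' WHERE code = 'RXB'")
rows_touched = cur.rowcount

codes = [
    row[0]
    for row in conn.execute(
        "SELECT c.code FROM payment p "
        "JOIN currency c ON c.currency_id = p.currency_id "
        "ORDER BY p.payment_id"
    )
]
print(rows_touched, codes)
```

Had payment stored the code itself as its foreign key, the same correction would have had to reach every payment row, either by hand or through a cascading update.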

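The main reason for using a business primary key is that a logical primary key adds a field that has nothing to do with the business, while users usually search by business fields (such as an employee number or a book's ISBN). Besides indexing the logical primary key, we must then also index those business fields, which lowers database performance and increases storage overhead. So for basic (reference) data that really does rarely change in business terms, a business primary key is a reasonably good choice. On the other hand, basic data sees few inserts, deletes and updates, so this overhead will not be large either; if you are worried about changes in business logic, a logical primary key is still worth considering. This calls for case-by-case analysis.

Another reason given for using a business primary key is that users always operate through business fields, so with a logical primary key an extra mapping step is required. I think this worry is unnecessary: querying directly by the business primary key returns the result without touching the logical primary key at all, unless the business primary key is not unique in the first place. Moreover, if the logical primary key is planned at design time, the code will be written around that key, and the same key is used throughout the system for transmission, processing and storage, so no conversion arises. A conversion problem exists only when an existing system built on business primary keys has to be changed to logical primary keys. For now I cannot think of any other scenario that requires such a conversion.

A further reason for using a business primary key is that in banking systems security matters more than performance. The business primary key can then serve both as the primary key and as redundant data, avoiding the loss of associations that a logical primary key might cause. If the relationship between a master table and its child tables were lost for some reason, the bank would face irreparable losses. To rule this out, the business primary key needs to exist redundantly in the important tables, and the best way to achieve that is to use the business primary key directly, for example the ID card number, passbook number or card number. Banking systems therefore usually require business primary keys, a requirement driven by security rather than performance.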

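The main reason for using a composite primary key is related to using a business primary key: often a single business field is not enough, so several fields have to be used. For example, when the name field no longer suffices, a birthday field is added. Composite primary keys of this kind are very inefficient, for much the same reasons as the large business primary keys discussed above. In addition, any other table that references this table has to reference every field of the composite key, which is no longer only a performance problem but also a storage problem. You could regard this as reasonable data redundancy that makes queries convenient, but it feels like the cost outweighs the benefit.

Another reason for using a composite primary key is that a relationship table must reference the primary keys of two entity tables in order to represent the relationship between them, so those two keys can simply be combined into a composite primary key. If several relationships exist between the same two entities, a sequence field can be added to the composite key, but this brings back the drawbacks of business primary keys. Alternatively, the relationship table can be given its own logical primary key, which avoids those drawbacks and also makes it easier for other tables to reference.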
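A relationship table keyed by the two entity keys might look like the following sketch, again using sqlite3 as a stand-in for SQL Server. The student / course / enrollment schema is a made-up example, not something taken from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Relationship table: the two entity keys together form the primary key.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Ann');
    INSERT INTO course  VALUES (10, 'Databases');
    INSERT INTO enrollment VALUES (1, 10);
""")

# The composite key rejects a second copy of the same relationship.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

The alternative mentioned above would give enrollment its own logical key column and turn (student_id, course_id) into a unique constraint, so that other tables can reference a single column.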

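On the whole, most people online lean towards logical primary keys, and few would endorse composite primary keys on entity tables. Supporters of business primary keys often have a misconception that a logical primary key must be meaningful to users; in fact it is used only inside the system, and users never need to know about it.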

 

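Conclusions or inferences:

1. Avoid business primary keys as far as possible and prefer logical primary keys.

2. If you must use a business primary key, make sure that the probability of the related business logic changing is 0, that the key is not too large, and that users cannot modify it.

3. Apart from relationship tables, avoid composite primary keys as far as possible.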

 

 

Best-practice guidelines for using logical primary keys:

1. Just enough is enough. Assume the system's service life is capped at 100 years and choose the data type of the logical primary key according to the table below. When in doubt, use int.

| Data volume | Data type | Size | Generation rate | Remarks |
|---|---|---|---|---|
| < 128 | tinyint | 1 byte | 1 per year | Rate is too low and not very reliable; not recommended |
| < 30,000 | smallint | 2 bytes | 27 per month | Low rate; use with caution |
| < 2.1 billion | int | 4 bytes | 40 per minute | Meets most needs |
| < 9.22 × 10^18 | bigint | 8 bytes | 2.92 million per millisecond | Meets the vast majority of needs |
| >= 9.22 × 10^18 | uniqueidentifier | 16 bytes | 10 billion users each generating 1 billion records per millisecond could continue for 1 billion years | Suitable for distributed, high-concurrency applications |
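The generation rates in the table can be checked with a little arithmetic. The sketch below assumes a 365.25-day year and takes the signed maximum of each integer width (2^7 - 1, 2^15 - 1, 2^31 - 1, 2^63 - 1) as the capacity, which appears to be how the table was derived. For uniqueidentifier it treats the type as a plain 128-bit space, which is a simplification.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed year length
LIFETIME_YEARS = 100                      # service life assumed in guideline 1

# Capacities assumed by this sketch (signed maximum of each width).
max_values = {
    "tinyint": 2**7 - 1,
    "smallint": 2**15 - 1,
    "int": 2**31 - 1,
    "bigint": 2**63 - 1,
}

per_year = max_values["tinyint"] / LIFETIME_YEARS
per_month = max_values["smallint"] / (LIFETIME_YEARS * 12)
per_minute = max_values["int"] / (LIFETIME_YEARS * SECONDS_PER_YEAR / 60)
per_ms = max_values["bigint"] / (LIFETIME_YEARS * SECONDS_PER_YEAR * 1000)

# uniqueidentifier row: 10 billion users x 1 billion records per millisecond
# for 1 billion years, compared against a 128-bit value space.
ms_in_billion_years = 1e9 * SECONDS_PER_YEAR * 1000
guid_demand = 1e10 * 1e9 * ms_in_billion_years
guid_fits = guid_demand < 2**128

print(f"tinyint:  {per_year:.2f} per year")
print(f"smallint: {per_month:.2f} per month")
print(f"int:      {per_minute:.2f} per minute")
print(f"bigint:   {per_ms:,.0f} per millisecond")
print(f"uniqueidentifier claim fits in 128 bits: {guid_fits}")
```

The point of the exercise is the order of magnitude: int already sustains dozens of new rows per minute for a century, which is why it is the suggested default.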

2. Generally, generate the values with auto-increment (IDENTITY) or with the NEWID() function.

3. Name the primary key field "table name + ID", which makes it easy to recognize and convenient for joins.

4. If the table is used in a distributed deployment, consider auto-incrementing with different starting values and the same step size. For example, with 3 databases deployed in different locations, you could design it as follows:

| Starting value | Step size |
|---|---|
| 1 | 10 |
| 2 | 10 |
| 3 | 10 |

The step size is uniformly set to 10 to leave room for future expansion. This keeps primary keys unique across the databases and makes the data easy to merge.
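The effect of the starting value / step size scheme can be simulated in a few lines. In SQL Server the corresponding column definitions would be IDENTITY(1,10), IDENTITY(2,10) and IDENTITY(3,10); the Python below only imitates the sequences those definitions would produce, and the function name site_ids is a hypothetical helper.

```python
from itertools import count, islice

STEP = 10  # shared step size from the table above


def site_ids(start: int, step: int = STEP):
    """Return the auto-increment sequence of one database: start, start + step, ..."""
    return count(start, step)


# First five keys generated by each of the three databases.
sites = {start: list(islice(site_ids(start), 5)) for start in (1, 2, 3)}

all_ids = [key for ids in sites.values() for key in ids]
no_collisions = len(all_ids) == len(set(all_ids))

# While every starting value is below STEP, key % STEP identifies the source database.
origins_of_site_2 = {key % STEP for key in sites[2]}

print(sites)
print(no_collisions, origins_of_site_2)
```

With a step of 10 and only three starting values in use, up to seven more databases can be added later without touching the existing keys.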

5. If there are high-concurrency requirements or the table's data needs to be migrated, consider the uniqueidentifier type together with the NEWID() function.

6. Consider creating a unique index on the business primary key to enforce the business requirement that it be unique.

7. If queries on the business primary key are performance-critical, you can create the clustered index on the business primary key and give the logical primary key only the primary key constraint with a nonclustered index.

8. A composite primary key can be considered for relationship tables; do not use composite primary keys on entity tables.
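Guidelines 2, 3 and 6 can be combined in one small sketch: an auto-increment logical key named after the table, plus a unique index on the business key. It uses sqlite3 as a stand-in for SQL Server, and the employee table with its columns is a hypothetical example. SQLite does not offer the clustered / nonclustered choice from guideline 7, so that part is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- logical key, "table name + ID"
        employee_no TEXT NOT NULL,                      -- business key
        name        TEXT NOT NULL
    );
    -- Guideline 6: uniqueness of the business key is enforced by a unique index.
    CREATE UNIQUE INDEX ux_employee_no ON employee (employee_no);
""")

conn.execute("INSERT INTO employee (employee_no, name) VALUES ('E001', 'Ann')")

# A second row with the same business key is rejected by the unique index.
try:
    conn.execute("INSERT INTO employee (employee_no, name) VALUES ('E001', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Users still look rows up by the business key; the logical key stays internal.
row = conn.execute(
    "SELECT employee_id, name FROM employee WHERE employee_no = ?", ("E001",)
).fetchone()
print(duplicate_rejected, row)
```

Other tables would reference employee_id, so a later correction to employee_no stays confined to this one table.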
