Database sharding series (2) Global primary key generation strategy

This article will mainly introduce some common global primary key generation strategies, and then focus on a very good global primary key generation scheme used by flickr. For the splitting strategy and implementation details of sharding, please refer to the previous article in this series: Database sharding series (1) Splitting implementation strategy and example demonstration  The original link of this article:  http ://blog.csdn.net/bluishglc/article/details/7710738  , please indicate the source for reprinting!

 

Part 1: Some Common Primary Key Generation Strategies

 

Once the database is split into multiple physical nodes, we can no longer rely on the database's own primary key generation mechanism. On the one hand, the self-generated ID of a partitioned database cannot be guaranteed to be globally unique; on the other hand, the application needs to obtain the ID before inserting data for SQL routing. At present, several feasible primary key generation strategies are:
1. UUID: Using UUID as the primary key is the simplest solution, but the disadvantages are also very obvious. Because the UUID is very long, in addition to occupying a lot of storage space, the main problem is the index. There are performance problems when creating an index and querying based on the index.
2. Maintain a Sequence table in combination with the database: The idea of ​​​​this scheme is also very simple. Create a Sequence table in the database. The structure of the table is similar to:

 

[sql]  view plain copy  
 
  1. CREATETABLE `SEQUENCE` (   
  2.     `tablename` varchar(30) NOTNULL,   
  3.     `nextid` bigint(20) NOTNULL,   
  4.     PRIMARYKEY (`tablename`)   
  5. ) ENGINE=InnoDB   

Whenever an ID needs to be generated for a new record of a certain table, the nextid of the corresponding table is taken out from the Sequence table, and the value of the nextid is incremented by 1 and then updated to the database for the next use. This scheme is also simpler, but the disadvantages are equally obvious: because all inserts need to access the table, the table can easily become a system performance bottleneck, and it also has a single point of problems, once the table database fails, the entire application will not work . Some people propose to use Master-Slave for master-slave synchronization, but this can only solve the single-point problem, and cannot solve the access pressure problem with a read-write ratio of 1:1.

In addition, there are some solutions, such as partitioning IDs for each database node, and some ID generation algorithms on the Internet. Because of the lack of operability and practical tests, this article does not recommend them. In fact, what we will introduce next is a primary key generation scheme used by Fickr. This scheme is the best one I know so far, and it has been tested by practice and can be used for reference by most application systems.

 

 

Part II: An excellent primary key generation strategy

 

flickr开发团队在2010年撰文介绍了flickr使用的一种主键生成测策略,同时表示该方案在flickr上的实际运行效果也非常令人满意,原文连接:Ticket Servers: Distributed Unique Primary Keys on the Cheap 这个方案是我目前知道的最好的方案,它与一般Sequence表方案有些类似,但却很好地解决了性能瓶颈和单点问题,是一种非常可靠而高效的全局主键生成方案。

图1. flickr采用的sharding主键生成方案示意图(点击查看大图)

 

flickr这一方案的整体思想是:建立两台以上的数据库ID生成服务器,每个服务器都有一张记录各表当前ID的Sequence表,但是Sequence中ID增长的步长是服务器的数量,起始值依次错开,这样相当于把ID的生成散列到了每个服务器节点上。例如:如果我们设置两台数据库ID生成服务器,那么就让一台的Sequence表的ID起始值为1,每次增长步长为2,另一台的Sequence表的ID起始值为2,每次增长步长也为2,那么结果就是奇数的ID都将从第一台服务器上生成,偶数的ID都从第二台服务器上生成,这样就将生成ID的压力均匀分散到两台服务器上,同时配合应用程序的控制,当一个服务器失效后,系统能自动切换到另一个服务器上获取ID,从而保证了系统的容错。

关于这个方案,有几点细节这里再说明一下:

1. flickr的数据库ID生成服务器是专用服务器,服务器上只有一个数据库,数据库中表都是用于生成Sequence的,这也是因为auto-increment-offset和auto-increment-increment这两个数据库变量是数据库实例级别的变量。
2. flickr的方案中表格中的stub字段只是一个char(1) NOT NULL存根字段,并非表名,因此,一般来说,一个Sequence表只有一条纪录,可以同时为多张表生成ID,如果需要表的ID是有连续的,需要为该表单独建立Sequence表

3. 方案使用了MySQL的LAST_INSERT_ID()函数,这也决定了Sequence表只能有一条记录。
4. 使用REPLACE INTO插入数据,这是很讨巧的作法,主要是希望利用mysql自身的机制生成ID,不仅是因为这样简单,更是因为我们需要ID按照我们设定的方式(初值和步长)来生成。

5. SELECT LAST_INSERT_ID()必须要于REPLACE INTO语句在同一个数据库连接下才能得到刚刚插入的新ID,否则返回的值总是0
6. 该方案中Sequence表使用的是MyISAM引擎,以获取更高的性能,注意:MyISAM引擎使用的是表级别的锁,MyISAM对表的读写是串行的,因此不必担心在并发时两次读取会得到同一个ID(另外,应该程序也不需要同步,每个请求的线程都会得到一个新的connection,不存在需要同步的共享资源)。经过实际对比测试,使用一样的Sequence表进行ID生成,MyISAM引擎要比InnoDB表现高出很多!

7. The operation of the Sequence table can be realized by using pure JDBC, so as to obtain higher efficiency. Experiments show that the performance of only using spring  JDBC is not as fast as that of pure JDBC!

 

To implement this solution, the application also needs to do some processing, mainly in two aspects:


1. Automatically balance the access of the database ID generation server
2. Ensure that in the event of a database ID generation server failure, the request can be forwarded to other servers for execution.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=326217662&siteId=291194637