Sharding scheme: splitting databases and tables so that data capacity expands automatically while hot spots are avoided

From: https://www.toutiao.com/i6678174710284419588/?group_id=6678174710284419588

Table of Contents

  1. Foreword
  2. Design idea
  3. Design
  4. Core main flow
  5. How to expand
  6. System design

Foreword

Do you know how to split databases and tables (sharding)? How can it be done so that data never has to be migrated and hot spots are avoided? We previously introduced the common sharding schemes, each with its own advantages and disadvantages:

Hash-modulo scheme: no hot spots, but data has to be migrated when capacity is expanded.

Range scheme: no data migration, but there are hot spots.
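As a rough illustration of this trade-off, here is a minimal Python sketch with made-up numbers (12,000 keys, growing from 3 to 4 shards, ranges of 2.5 million ids). It counts how many keys change shard when a plain `id % n` scheme gains a shard, and shows that under a range scheme the newest sequential ids all land on the same shard. The function names are hypothetical.

```python
def moved_fraction(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Share of keys 0..num_keys-1 whose shard changes when id % old_shards becomes id % new_shards."""
    moved = sum(1 for key in range(num_keys) if key % old_shards != key % new_shards)
    return moved / num_keys


def range_shard(key: int, range_size: int) -> int:
    """Range scheme: the shard is the index of the fixed-size id range the key falls into."""
    return key // range_size


if __name__ == "__main__":
    # Hash modulo: going from 3 to 4 shards relocates 3 out of every 4 existing keys.
    print(moved_fraction(12_000, 3, 4))  # 0.75
    # Range: the 1,000 most recent sequential ids (ranges of 2.5 million) all hit shard 3.
    newest_ids = range(9_999_000, 10_000_000)
    print({range_shard(key, 2_500_000) for key in newest_ids})  # {3}
```

With modulo routing most existing rows must move when a shard is added; with range routing nothing moves, but all current writes go to the last range.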

So is there a scheme that combines the advantages of both?

There is also a real-world requirement: can data be distributed according to each server's performance and storage capacity, so that each server stores an appropriate share?

How to never migrate data and avoid hot spots? Allocate the amount of data (Secret papers) based on server metrics

 

Design idea

Hash solves the problem of distributing data evenly, and range solves the problem of data migration. So can we combine the two and take advantage of the strengths of each?

When data grows, the value of the routing key (such as id) grows; this is certain. So we first make sure that, as data grows, it falls into a predefined range, just as in the range scheme. When ids later grow beyond that range, the earlier data does not need to be migrated.

But we also want the data to be evenly distributed: can data be spread evenly within a given range? Each time we design an expansion, we decide the size of the new range in advance, so it is enough to guarantee even distribution within that range.

Design

We first define the concept of a group. A group contains several databases (DBs), each holding several tables, as shown in the following figure.

There are a few key points in the figure:

1) Ids in the range 0 to 40 million always fall into group01.

2) group01 has 3 DBs. How is an id routed to one of them?

3) The DB is located by hash modulo. What is the modulus? It is the total number of tables in all DBs of this group, which is 10 in the figure. Why the total number of tables, rather than the total number of DBs (3)?

4) For example, for id = 12, id % 10 = 2. Which DB does the value 2 map to? This mapping is pre-configured as part of the design; how is it set?

5) Once the DB has been determined, which table in that DB does the id fall into?
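Taken together, the key points describe a four-step lookup. The Python sketch below is one way to express it, not the author's code: the class and function names are hypothetical, DB_0's four tables are assumed to cover 10 million ids each, and the table boundaries for DB_1 and DB_2 (apart from table_0 = [0, 10 million)) are invented for the example.

```python
from dataclasses import dataclass

TEN_MILLION = 10_000_000


@dataclass(frozen=True)
class Table:
    name: str
    start: int  # inclusive lower bound of the id range
    end: int    # exclusive upper bound of the id range


@dataclass(frozen=True)
class Db:
    name: str
    slots: frozenset  # values of id % modulus that are routed to this DB
    tables: tuple     # Table objects whose ranges together cover the group's range


@dataclass(frozen=True)
class Group:
    name: str
    start: int  # inclusive
    end: int    # exclusive
    dbs: tuple

    @property
    def modulus(self) -> int:
        # Key point 3: modulo the total number of tables in the group, not the number of DBs.
        return sum(len(db.tables) for db in self.dbs)


def make_tables(bounds):
    """table_i covers [bounds[i], bounds[i + 1])."""
    return tuple(Table(f"table_{i}", lo, hi) for i, (lo, hi) in enumerate(zip(bounds, bounds[1:])))


GROUP01 = Group("group01", 0, 4 * TEN_MILLION, (
    # DB_0: four tables of 10 million ids each (assumed layout).
    Db("DB_0", frozenset({0, 1, 2, 3}),
       make_tables([0, TEN_MILLION, 2 * TEN_MILLION, 3 * TEN_MILLION, 4 * TEN_MILLION])),
    # DB_1 and DB_2: three tables; only table_0 = [0, 10M) comes from the text, the rest is assumed.
    Db("DB_1", frozenset({4, 5, 6}), make_tables([0, TEN_MILLION, 25_000_000, 4 * TEN_MILLION])),
    Db("DB_2", frozenset({7, 8, 9}), make_tables([0, TEN_MILLION, 25_000_000, 4 * TEN_MILLION])),
))


def route(groups, id_):
    """Return (group, DB, table) names for an id."""
    # Key point 1: the id range selects the group.
    group = next((g for g in groups if g.start <= id_ < g.end), None)
    if group is None:
        raise LookupError(f"no group covers id {id_}")
    # Key points 2 to 4: id % (total tables) gives a slot; the pre-configured slot sets pick the DB.
    slot = id_ % group.modulus
    db = next(d for d in group.dbs if slot in d.slots)
    # Key point 5: inside that DB, the id range selects the table.
    table = next(t for t in db.tables if t.start <= id_ < t.end)
    return group.name, db.name, table.name


print(route([GROUP01], 12))          # ('group01', 'DB_0', 'table_0')
print(route([GROUP01], 25_000_004))  # ('group01', 'DB_1', 'table_2')
```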

Core main flow

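Following this flow, we can locate any id with these rules: the id range selects the group, the id modulo the group's total number of tables selects the DB, and the id range selects the table inside that DB. Let's check whether hot spots are avoided.

Consider ids in the range [0, 10 million]. According to the flow above, all ids below 10 million are spread across the Table_0 tables of the three databases DB_0, DB_1 and DB_2. Why is the spread even? Because we use a hash scheme: the id is taken modulo 10, so consecutive ids rotate through all ten slots.

Lao Gu (the author) also raised a question above: why take the modulus by the total number of tables (10) rather than by the total number of DBs (3)? And why does DB_0 have 4 tables while the other two, DB_1 and DB_2, have only 3 each?

When allocating servers, some have higher performance and more storage, so they can be given more data, while lower-performance servers get less. If we took the modulus by the number of DBs (3), the data in [0, 40 million] would be split equally among the 3 DBs, and we could not allocate data according to each server's capacity.

Taking the modulus by the total number of tables (10) achieves this. Here is how: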

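In the figure above, the id is taken modulo 10. If the result is 0, 1, 2 or 3, the id is routed to DB_0; 4, 5 or 6 route to DB_1; and 7, 8 or 9 route to DB_2. This design puts more data in DB_0 and less in the other two DBs: DB_0 holds 4/10 of the data, DB_1 holds 3/10, and DB_2 also holds 3/10. group01 as a whole holds the data in [0, 40 million].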
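The effect of the 10-slot mapping can be measured directly. In this Python sketch the slot assignment (0 to 3, 4 to 6, 7 to 9) comes from the text, while the function names and the sample of 300,000 consecutive ids are arbitrary choices for the illustration. It compares routing by `id % 10` with routing by `id % 3`.

```python
from collections import Counter

# Slot assignment from the example: id % 10 in 0-3 -> DB_0, 4-6 -> DB_1, 7-9 -> DB_2.
SLOT_TO_DB = {slot: "DB_0" for slot in range(0, 4)}
SLOT_TO_DB.update({slot: "DB_1" for slot in range(4, 7)})
SLOT_TO_DB.update({slot: "DB_2" for slot in range(7, 10)})


def db_by_table_count(id_: int) -> str:
    """Modulus = total number of tables (10), so DBs with more tables get more slots."""
    return SLOT_TO_DB[id_ % 10]


def db_by_db_count(id_: int) -> str:
    """Modulus = number of DBs (3): every DB necessarily gets the same share."""
    return f"DB_{id_ % 3}"


def shares(router, ids):
    """Fraction of the given ids that each DB receives under a routing function."""
    counts = Counter(router(i) for i in ids)
    total = sum(counts.values())
    return {db: counts[db] / total for db in sorted(counts)}


ids = range(0, 300_000)  # consecutive ids, the write pattern that creates range hot spots
print(shares(db_by_table_count, ids))  # {'DB_0': 0.4, 'DB_1': 0.3, 'DB_2': 0.3}
print(shares(db_by_db_count, ids))     # each DB gets one third
# Even ten consecutive inserts touch all three DBs instead of piling onto one.
print([db_by_table_count(i) for i in range(1_000, 1_010)])
```

Consecutive ids rotate through all three DBs in a 4:3:3 ratio, whereas `id % 3` could only ever produce an equal split.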

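Note: do not be confused by the fact that the tables in DB_1 and DB_2 also cover the range 0 to 40 million. These are range intervals: they only decide which table an id lands in, based on the range the id falls into.

The explanation above solves the hot-spot problem and also lets us allocate data according to server metrics.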

How to expand

Once the design idea above is clear, the expansion method follows directly: when expanding, design a new group, group02, and define its data range.

Because group02 is an entirely new group, there is no data migration at all. The new group still avoids hot spots: data in [40 million, 55 million] is spread evenly across the table_0 tables of its three DBs, and data in [55 million, 70 million] across their table_1 tables.
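A small simulation can confirm that adding group02 leaves existing rows where they are. This Python sketch is hypothetical: group02's ranges follow the text, but its DB names (DB_3 to DB_5), the even two-slots-per-DB split, and group01's table boundaries for DB_1 and DB_2 are assumptions.

```python
from dataclasses import dataclass

M = 1_000_000  # one million


def make_tables(bounds):
    """table_i covers [bounds[i], bounds[i + 1])."""
    return [(f"table_{i}", lo, hi) for i, (lo, hi) in enumerate(zip(bounds, bounds[1:]))]


@dataclass
class Group:
    name: str
    start: int         # inclusive
    end: int           # exclusive
    slot_to_db: dict   # id % modulus -> DB name; one slot per table, so modulus = len(slot_to_db)
    db_tables: dict    # DB name -> list of (table name, start, end)

    def locate(self, id_):
        db = self.slot_to_db[id_ % len(self.slot_to_db)]
        table = next(name for name, lo, hi in self.db_tables[db] if lo <= id_ < hi)
        return self.name, db, table


def route(groups, id_):
    """Pick the group by id range, then let the group pick DB and table."""
    for group in groups:
        if group.start <= id_ < group.end:
            return group.locate(id_)
    raise LookupError(f"no group covers id {id_}")


group01 = Group(
    "group01", 0, 40 * M,
    {**{s: "DB_0" for s in range(0, 4)}, **{s: "DB_1" for s in range(4, 7)}, **{s: "DB_2" for s in range(7, 10)}},
    {"DB_0": make_tables([0, 10 * M, 20 * M, 30 * M, 40 * M]),
     "DB_1": make_tables([0, 10 * M, 25 * M, 40 * M]),   # assumed boundaries
     "DB_2": make_tables([0, 10 * M, 25 * M, 40 * M])},  # assumed boundaries
)

# group02 as described: [40M, 55M) -> table_0 and [55M, 70M) -> table_1 in each of three DBs.
# The DB names and the 2-2-2 slot split are assumptions.
group02 = Group(
    "group02", 40 * M, 70 * M,
    {0: "DB_3", 1: "DB_3", 2: "DB_4", 3: "DB_4", 4: "DB_5", 5: "DB_5"},
    {db: make_tables([40 * M, 55 * M, 70 * M]) for db in ("DB_3", "DB_4", "DB_5")},
)

old_ids = [0, 12, 9_999_999, 25_000_004, 39_999_999]
before = [route([group01], i) for i in old_ids]
after = [route([group01, group02], i) for i in old_ids]
print(before == after)                        # True: existing rows stay put, nothing migrates
print(route([group01, group02], 45_000_000))  # ('group02', 'DB_3', 'table_0')
print(route([group01, group02], 60_000_001))  # ('group02', 'DB_3', 'table_1')
```

Because the group is chosen by id range first, registering a new group only affects ids that no existing group covers.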

System design

Once the idea is settled, the design is relatively simple: just three tables that record the relationships among groups, DBs and tables.

Relationship between group and DB

Relationship between table and DB
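One possible shape for the three routing tables is sketched below with Python's built-in `sqlite3` module and an in-memory database. The table and column names (`route_group`, `route_db`, `route_table`, `hash_values`) are hypothetical, the DB_1/DB_2 table boundaries are assumed, and `locate` simply walks the group, DB and table relationships for one id.

```python
import sqlite3

M = 1_000_000

# Hypothetical schema for the three routing tables.
SCHEMA = """
CREATE TABLE route_group (
    group_name TEXT PRIMARY KEY,
    start_id   INTEGER NOT NULL,   -- inclusive
    end_id     INTEGER NOT NULL    -- exclusive
);
CREATE TABLE route_db (            -- relationship between group and DB
    db_name     TEXT PRIMARY KEY,
    group_name  TEXT NOT NULL REFERENCES route_group(group_name),
    hash_values TEXT NOT NULL      -- comma-separated values of id % modulus
);
CREATE TABLE route_table (         -- relationship between table and DB
    db_name    TEXT NOT NULL REFERENCES route_db(db_name),
    table_name TEXT NOT NULL,
    start_id   INTEGER NOT NULL,
    end_id     INTEGER NOT NULL,
    PRIMARY KEY (db_name, table_name)
);
"""


def build_demo_db():
    """In-memory metadata for group01 (DB_1/DB_2 table boundaries are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO route_group VALUES ('group01', 0, ?)", (40 * M,))
    conn.executemany("INSERT INTO route_db VALUES (?, 'group01', ?)",
                     [("DB_0", "0,1,2,3"), ("DB_1", "4,5,6"), ("DB_2", "7,8,9")])
    rows = [("DB_0", f"table_{i}", i * 10 * M, (i + 1) * 10 * M) for i in range(4)]
    for db in ("DB_1", "DB_2"):
        rows += [(db, "table_0", 0, 10 * M), (db, "table_1", 10 * M, 25 * M), (db, "table_2", 25 * M, 40 * M)]
    conn.executemany("INSERT INTO route_table VALUES (?, ?, ?, ?)", rows)
    return conn


def locate(conn, id_):
    """Resolve an id to (group, DB, table) by querying the three routing tables."""
    row = conn.execute("SELECT group_name FROM route_group WHERE start_id <= ? AND ? < end_id",
                       (id_, id_)).fetchone()
    if row is None:
        raise LookupError(f"no group covers id {id_}")
    group = row[0]
    # Modulus = total number of tables across all DBs of the group.
    (modulus,) = conn.execute(
        "SELECT COUNT(*) FROM route_table t JOIN route_db d ON t.db_name = d.db_name "
        "WHERE d.group_name = ?", (group,)).fetchone()
    slot = id_ % modulus
    dbs = conn.execute("SELECT db_name, hash_values FROM route_db WHERE group_name = ?",
                       (group,)).fetchall()
    for db, hash_values in dbs:
        if slot in {int(v) for v in hash_values.split(",")}:
            (table,) = conn.execute(
                "SELECT table_name FROM route_table WHERE db_name = ? AND start_id <= ? AND ? < end_id",
                (db, id_, id_)).fetchone()
            return group, db, table
    raise LookupError(f"slot {slot} of {group} is not assigned to any DB")


conn = build_demo_db()
print(locate(conn, 12))          # ('group01', 'DB_0', 'table_0')
print(locate(conn, 25_000_004))  # ('group01', 'DB_1', 'table_2')
```

Each lookup here issues several queries against the metadata, which is why caching these three tables is worthwhile.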

Associating these tables is fairly simple once the principle is clear. During development you do not need to query the three tables on every request: they can be kept in a cache (a local JVM cache), so performance is not affected.
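A minimal sketch of such a local cache, written in Python rather than for the JVM: `LocalRoutingCache` and `load_rows_from_metadata_db` are hypothetical names, the loader returns hard-coded group01 rows (with assumed table boundaries for DB_1 and DB_2) in place of a real query, and the cache builds its lookup index the first time it is used. The `load_count` counter shows that many lookups cost a single metadata read.

```python
from typing import Callable

M = 1_000_000


def load_rows_from_metadata_db():
    """Stand-in for reading the three routing tables; returns group01 as hard-coded rows."""
    groups = [("group01", 0, 40 * M)]                              # (group, start, end)
    dbs = [("DB_0", "group01", (0, 1, 2, 3)),                      # (db, group, slots)
           ("DB_1", "group01", (4, 5, 6)),
           ("DB_2", "group01", (7, 8, 9))]
    tables = [("DB_0", f"table_{i}", i * 10 * M, (i + 1) * 10 * M) for i in range(4)]  # (db, table, start, end)
    for db in ("DB_1", "DB_2"):  # assumed boundaries for the three-table DBs
        tables += [(db, "table_0", 0, 10 * M), (db, "table_1", 10 * M, 25 * M), (db, "table_2", 25 * M, 40 * M)]
    return groups, dbs, tables


class LocalRoutingCache:
    """Reads the routing rows once and answers every later lookup from process memory.

    Single-threaded sketch: the lazy first load is not guarded by a lock.
    """

    def __init__(self, loader: Callable[[], tuple]):
        self._loader = loader
        self._index = None
        self.load_count = 0  # number of times the metadata was actually read

    def _build_index(self):
        groups, dbs, tables = self._loader()
        self.load_count += 1
        index = []
        for name, start, end in groups:
            slot_to_db = {slot: db for db, group, slots in dbs if group == name for slot in slots}
            db_tables = {db: [(t, lo, hi) for d, t, lo, hi in tables if d == db]
                         for db in set(slot_to_db.values())}
            modulus = sum(len(ts) for ts in db_tables.values())  # total tables in the group
            index.append((name, start, end, modulus, slot_to_db, db_tables))
        return index

    def locate(self, id_):
        if self._index is None:
            self._index = self._build_index()
        for name, start, end, modulus, slot_to_db, db_tables in self._index:
            if start <= id_ < end:
                db = slot_to_db[id_ % modulus]
                table = next(t for t, lo, hi in db_tables[db] if lo <= id_ < hi)
                return name, db, table
        raise LookupError(f"no group covers id {id_}")


cache = LocalRoutingCache(load_rows_from_metadata_db)
results = [cache.locate(i) for i in range(0, 40 * M, 4 * M + 1)]
print(len(results), cache.load_count)  # 10 1
```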

When capacity needs to be expanded, we add group02 and its associations. Does the application service then need to be restarted?

The simple approach is to change the configuration in the early morning and restart the application service. But a large company cannot allow that, because orders keep coming in even in the early morning. So what should we do? How can the local JVM cache be updated?

There are actually many solutions; Lao Gu recommends ZooKeeper. Readers can search online for the details, and a later article by Lao Gu will describe it.
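Whatever the notification mechanism (the author suggests ZooKeeper), the application side usually comes down to rebuilding the routing snapshot and swapping it in atomically. The Python sketch below shows only that part and routes only to the group level; `RefreshableRoutes` and `on_config_changed` are hypothetical names, and no ZooKeeper client API is used or implied.

```python
import threading


class RefreshableRoutes:
    """Holds an immutable group snapshot and replaces it without restarting the process."""

    def __init__(self, groups):
        self._lock = threading.Lock()   # serializes writers only
        self._groups = tuple(groups)    # each entry: (group name, start id, end id)
        self.version = 1

    def on_config_changed(self, new_groups):
        """Hypothetical hook that a change notification (e.g. from ZooKeeper) would trigger."""
        snapshot = tuple(new_groups)    # build the complete new snapshot first
        with self._lock:
            self._groups = snapshot     # publish it with a single reference assignment
            self.version += 1

    def group_for(self, id_):
        groups = self._groups           # read the reference once; readers take no lock
        for name, start, end in groups:
            if start <= id_ < end:
                return name
        raise LookupError(f"no group covers id {id_}")


M = 1_000_000
routes = RefreshableRoutes([("group01", 0, 40 * M)])
print(routes.group_for(12))  # group01
try:
    routes.group_for(45 * M)
except LookupError as exc:
    print(exc)  # no group covers id 45000000
# group02 is added to the metadata; the watcher calls the hook and the service keeps running.
routes.on_config_changed([("group01", 0, 40 * M), ("group02", 40 * M, 70 * M)])
print(routes.group_for(45 * M), routes.version)  # group02 2
```

Readers never block; each lookup sees either the old or the new snapshot, never a half-built one. In a JVM service the same pattern is commonly built on a `volatile` field or an `AtomicReference`.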

That concludes the overall scheme. I hope it helps. Thank you!

One key point is implied here: the value of the routing key (such as id) is critical. It must be ordered and increasing, which involves generating unique ids in a distributed system. Lao Gu's next article will explain how to design a distributed primary key.

Origin www.cnblogs.com/liuqingsha3/p/11584429.html