Distributed Design and Development (4)------Data Split

In a large-scale system, performance and availability problems are most likely to occur in the database, so an important area of distributed design and development is making the data layer scalable. Database scaling is divided into Scale Up and Scale Out. Scale Up, put bluntly, means upgrading the server configuration, so it is not considered in distributed design. Scale Out improves processing power by adding machines, and generally raises the following two issues:

  • Data splitting
  • Database high availability architecture

Data splitting is the first thing that comes to mind. The principle is simple: when a single table holds more data than it can handle, it is split into multiple tables. This is easy to state, but applying it in a real project involves many details that need careful study. They generally fall into two areas:

  • Segmentation strategy
  • Integration strategy with application side

Segmentation strategy

The segmentation strategy is generally divided into vertical segmentation, horizontal segmentation and a mix of the two.

1) Vertical segmentation

Vertical segmentation divides tables into different databases by module, which is very common in the evolution of large websites. When a website is still small, only a few people develop and maintain it, and all modules and tables live together. As the site grows, there is a need to divide the tables by module and function.

[Figure: tables divided into separate databases by module]
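As a minimal sketch of what a vertical split means for table lookup, the snippet below maps each table to the module that owns it and each module to its own database. The module, table, and database names are hypothetical and chosen only for illustration.

```python
# Hypothetical names: none of these modules, tables, or databases
# come from the original post.
MODULE_DATABASES = {
    "user": "user_db",
    "trade": "trade_db",
    "shop": "shop_db",
}

TABLE_MODULES = {
    "users": "user",
    "user_profiles": "user",
    "orders": "trade",
    "payments": "trade",
    "shops": "shop",
}


def database_for_table(table: str) -> str:
    """Return the database that owns a table after a vertical split."""
    module = TABLE_MODULES[table]
    return MODULE_DATABASES[module]


print(database_for_table("orders"))  # trade_db
```

Every table belongs to exactly one module, so the lookup never depends on the data in a row, only on which table is being accessed.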

Service-oriented transformation goes a step further than vertical segmentation. Put simply, it splits the original strongly coupled system into multiple weakly coupled services that meet business needs by calling one another. Once a module's tables are split out, they are exposed as a service instead of being queried directly by other modules. Taobao's architecture has kept evolving in this way, and the most important part has been the move to services: core concepts such as users, transactions, stores, and items are extracted into independent services, which also makes local optimization and governance easier and helps keep the core modules stable. Such a split also comes at a cost:

  • Tables can no longer be joined at the database level
  • There is still a performance bottleneck for large data volumes in a single table
  • Transaction guarantees are more complicated
  • Application-side complexity increases

These problems are obvious. The key to dealing with them is decoupling the modules. This looks like a technical problem, but it is really a business design problem: only when the business is loosely coupled can the modules be isolated technically. Without coupling, there is no need for cross-module joins or transactions. The bottleneck of large data volumes is addressed by horizontal segmentation, described next.
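To make the first cost in the list above concrete, here is a rough sketch of a join that has to be assembled in application code once users and orders live in different databases. Two in-memory SQLite databases stand in for separate servers; the schemas and sample rows are assumptions made for the example.

```python
import sqlite3

# Two in-memory SQLite databases stand in for separately hosted databases.
user_db = sqlite3.connect(":memory:")
trade_db = sqlite3.connect(":memory:")

user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
trade_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)"
)
user_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
trade_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "book"), (11, 2, "pen"), (12, 1, "lamp")],
)


def orders_with_user_names():
    """Join orders to user names in the application, one query per database."""
    orders = trade_db.execute(
        "SELECT id, user_id, item FROM orders ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in orders})
    if not user_ids:
        return []
    placeholders = ",".join("?" for _ in user_ids)
    names = dict(
        user_db.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
        )
    )
    return [(order_id, names[user_id], item) for order_id, user_id, item in orders]


print(orders_with_user_names())
```

The extra round trip and the in-memory lookup table are exactly the application-side complexity that the list of costs refers to.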

2) Horizontal segmentation

As mentioned above, vertical segmentation only divides tables into different databases by module; it does not solve the problem of a single table holding too much data. Horizontal segmentation divides the rows of one table into different tables or databases according to some rule. In a billing system, for example, it is more appropriate to split by time, because the system usually processes the data of a given period. For SaaS applications it is more appropriate to split by user, because users are isolated from one another and a request rarely touches the data of more than one user. The following is a relatively simple example of horizontal segmentation by user_id:

[Figure: rows of one table divided by user_id]
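The two rules mentioned in this section can be sketched in a few lines. The shard count, the table name prefixes, and the choice of one billing table per month are assumptions for illustration, not details from the original post.

```python
from datetime import date

NUM_USER_SHARDS = 4  # assumed shard count


def user_table(user_id: int) -> str:
    """Pick a table for a user by taking user_id modulo the shard count."""
    return f"user_data_{user_id % NUM_USER_SHARDS}"


def billing_table(day: date) -> str:
    """Pick a monthly billing table from the date of the record."""
    return f"bill_{day.year:04d}{day.month:02d}"


print(user_table(10))                    # user_data_2
print(billing_table(date(2012, 3, 15)))  # bill_201203
```

Both functions are pure mappings from a row attribute to a table name, which is what keeps simple rules cheap for the application side to apply.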

Horizontal segmentation does not break the relationships between tables: related tables can be placed in the same database, so the business requirements of the application side are unaffected, and this kind of segmentation fundamentally solves the problem of large data volumes. Its problems are also obvious:

  • When the segmentation rules are complex, it increases the difficulty of application-side calls
  • Data maintenance is relatively difficult. When the splitting rules change, the data needs to be migrated

For the first problem, see the discussion below of how to integrate the application side with the database side. For the second, consistent hashing and similar mapping strategies can reduce the cost of data maintenance; see the earlier post Distributed Design and Development (2) ------ Several Distributed Algorithms That Must Be Understood.
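The sketch below compares how many keys change location when a fifth database is added, first with plain modulo and then with a small consistent hash ring. It is an illustrative implementation written for this comparison, not the one from the earlier post; the node names, the number of virtual points per node, and the use of SHA-256 are all assumptions.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Hash a string to a large integer that is stable across runs."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Maps keys to nodes; each node owns many virtual points on the ring."""

    def __init__(self, nodes, points_per_node=100):
        self._points_per_node = points_per_node
        self._ring = []  # sorted list of (point_hash, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._points_per_node):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key) -> str:
        # First ring point at or after the key's hash, wrapping around.
        index = bisect.bisect_left(self._ring, (stable_hash(str(key)),))
        return self._ring[index % len(self._ring)][1]


def moved_keys(before, after, keys):
    """Return the keys whose assignment differs between two mappings."""
    return [k for k in keys if before(k) != after(k)]


keys = range(10_000)

# Plain modulo: going from 4 to 5 shards remaps every key with k % 4 != k % 5.
modulo_moved = moved_keys(lambda k: k % 4, lambda k: k % 5, keys)

# Consistent hashing: only keys claimed by the new node change location.
old_ring = ConsistentHashRing(["db0", "db1", "db2", "db3"])
new_ring = ConsistentHashRing(["db0", "db1", "db2", "db3", "db4"])
ring_moved = moved_keys(old_ring.node_for, new_ring.node_for, keys)

print(f"modulo moved {len(modulo_moved)} keys, ring moved {len(ring_moved)} keys")
```

With plain modulo, 8,000 of the 10,000 keys change shard. With the ring, only the keys that now fall to db4 move, which should be on the order of one fifth of them, so far less data has to be migrated when the splitting rule changes.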

3) Vertical and horizontal joint segmentation

As seen above, vertical segmentation makes module boundaries clearer and allows differentiated governance, while horizontal segmentation solves the performance bottleneck of large data volumes. The two are therefore often combined, which is a common strategy for large websites and brings the advantages of both. The disadvantage is that it is more complex and costly, and it is not suitable for small websites. The following combines the previous two examples:

[Figure: combined vertical and horizontal segmentation]
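A combined rule can be sketched as two steps: choose the module's database group first (the vertical cut), then choose a shard within it by user_id (the horizontal cut). The module names and per-module shard counts here are hypothetical.

```python
# Hypothetical shard counts per module; a busier module gets more shards.
SHARDS_PER_MODULE = {"user": 2, "trade": 4}


def locate(module: str, user_id: int) -> str:
    """Vertical cut picks the module's databases, horizontal cut picks the shard."""
    shard_count = SHARDS_PER_MODULE[module]
    return f"{module}_db_{user_id % shard_count}"


print(locate("trade", 10))  # trade_db_2
print(locate("user", 10))   # user_db_0
```

Because each module carries its own shard count, heavily loaded modules can be split further without touching the others, at the cost of more configuration to maintain.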

Integration strategy with application side

Splitting the data is only the first step. The key is how the application side can still access it conveniently: data splitting must not make access incorrect or overly complicated. Generally there are three strategies, from the front end to the back end:

  • Database routing on the application side
  • Add a proxy server between the application side and the server side for routing
  • The database side does its own routing

1) The application side does database routing

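Database routing on the application side is relatively simple to implement. At each point where the database is called, a toolkit adds routing information to the call; in other words, it analyzes each call and routes it to the correct database. This approach is, to some degree, not transparent to the application side. If the routing strategy changes, the application side has to be modified as well, and such changes are hard to make dynamically. Most critically, the connection pool on the application side becomes more complex to design: the connections in the pool are no longer stateless, which makes them harder to manage and scale.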
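A minimal sketch of application-side routing follows. In-memory SQLite databases stand in for separate database servers, and the class name, table schema, and modulo rule are assumptions made for the example.

```python
import sqlite3


class ShardRouter:
    """Application-side router that holds one connection per shard."""

    def __init__(self, shard_count: int):
        # In-memory SQLite databases stand in for separate database servers.
        self._conns = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self._conns:
            conn.execute("CREATE TABLE orders (user_id INTEGER, item TEXT)")

    def _conn_for(self, user_id: int) -> sqlite3.Connection:
        # The routing rule lives in application code.
        return self._conns[user_id % len(self._conns)]

    def add_order(self, user_id: int, item: str) -> None:
        conn = self._conn_for(user_id)
        conn.execute("INSERT INTO orders VALUES (?, ?)", (user_id, item))
        conn.commit()

    def orders_for(self, user_id: int) -> list:
        conn = self._conn_for(user_id)
        rows = conn.execute(
            "SELECT item FROM orders WHERE user_id = ? ORDER BY rowid", (user_id,)
        )
        return [item for (item,) in rows]


router = ShardRouter(shard_count=2)
router.add_order(1, "book")
router.add_order(2, "pen")
router.add_order(1, "lamp")
print(router.orders_for(1))  # ['book', 'lamp']
```

Each connection is bound to a particular shard, so connections are not interchangeable, and changing the rule in `_conn_for` means redeploying the application. Both are the drawbacks described above.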

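2) Add a proxy server between the application side and the server side for routing

Routing through a proxy server hides the details of the back-end database split from the client and makes the splitting rules easier to maintain. Generally, the proxy needs to provide the following features: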

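  • Connection management and security authentication for clients and database servers
  • Configurable routing of database requests
  • Parsing of commands and SQL
  • Filtering and merging of call results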
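The last feature in the list can be illustrated with a small sketch: when a query fans out to several shards and each shard returns rows already sorted, the proxy merges them into one ordered result and applies the limit. The function name and row layout are hypothetical; this is not how any particular proxy implements it.

```python
import heapq
from itertools import islice


def merge_sorted_results(shard_results, key, limit=None):
    """Merge per-shard row lists that are each already sorted by key."""
    merged = heapq.merge(*shard_results, key=key)
    return list(islice(merged, limit))


# Rows as (id, name), each shard's result sorted by id.
shard_a = [(1, "alice"), (4, "dave")]
shard_b = [(2, "bob"), (3, "carol")]

rows = merge_sorted_results([shard_a, shard_b], key=lambda row: row[0], limit=3)
print(rows)  # [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```

Merging lazily means the proxy only consumes as many rows from each shard as the limit requires.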

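Some open source frameworks now provide similar functionality, for example Amoeba. Its general structure was introduced in the earlier post Designing and Developing an Application Server (1) ------ Common Patterns, and its practical use was covered in The Road to Building High-Performance Web ------ MySQL Read/Write Splitting in Practice. Interested readers can refer to those posts.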

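3) The database side does its own routing

For example, MySQL provides a proxy product, MySQL Proxy, that can do routing on the database side. The structure is as follows:

[Figure: MySQL Proxy structure]

The biggest problem with this approach is that the configuration of splitting rules is not flexible enough, so it may not meet the application side's varied partitioning needs.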

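The above introduced some data splitting strategies and the strategies that support them. A later post will look at the database high availability architecture mentioned earlier.

(Much of the material comes from Jian Chaoyang's "MySQL Performance Tuning and Architecture Design"; readers who want to study the topic in depth can refer to that book.)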
