Splitting Databases and Tables (Sharding)

Foreword

  As the volume of data grows, a single table eventually becomes large enough to hurt performance. We then need a way to split that table into several smaller tables; this is table splitting. As the data keeps growing, a single database ends up holding many sub-tables, and one database server can no longer handle the request volume. We then need a way to spread the load across several databases; this is database splitting.

I. Introduction

  Table splitting: a table is divided into several sub-tables according to a predetermined rule. A query can use the same rule to go directly to the right sub-table, which improves query efficiency.
  Database splitting: table splitting produces a large number of tables, and too many tables in one database hurts its performance. In addition, a single server eventually cannot hold the data volume or meet the application's needs. The database is therefore divided according to some rule, and the tables that match each rule are migrated to the corresponding database.
  Both can be done horizontally or vertically. A horizontal split distributes rows: different rows go to different tables. A vertical split distributes columns: different columns go to different tables. This resembles partitioned tables. The idea is the same, but the application scenarios differ, and so do the implementations.

ps: Too many tables in one database affects performance for the following reasons:
table_definition_cache. This cache stores table definitions. When MySQL uses many tables, entries are swapped in and out of the cache frequently, which hurts performance. Note that this setting is the number of table definitions that can be cached, not a memory size.
table_open_cache. This is the number of open tables that can be cached. Again it is a count rather than a memory size: it defines how many open table handles can be kept in the cache. If it is set too small, MySQL must keep closing already open tables in order to open the ones it needs, and performance suffers.

II. Methods

1. Vertical table splitting
  That is, "splitting a big table into small tables" by column. A table usually has many fields, and the fields that are rarely used, hold large data, or are long (such as text fields) are split out into an "extension table". This is generally applied to large tables with hundreds of columns, and it also avoids the "cross-page" problem that oversized rows cause during queries. As the data volume keeps growing, however, a single table will still become too large and affect performance.
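  A minimal sketch of moving a large text field into an extension table, using Python's built-in sqlite3 as a stand-in for MySQL. The table and column names (t_article, t_article_ext, body) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Frequently read, narrow columns stay in the main table.
conn.execute(
    "CREATE TABLE t_article (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
)
# The large, rarely read text column moves to an extension table
# that shares the same primary key value.
conn.execute(
    "CREATE TABLE t_article_ext (article_id INTEGER PRIMARY KEY, body TEXT)"
)

conn.execute("INSERT INTO t_article VALUES (1, 'Sharding notes', 'alice')")
conn.execute("INSERT INTO t_article_ext VALUES (1, 'a very long body ...')")

# A list page touches only the narrow main table.
titles = conn.execute("SELECT id, title FROM t_article").fetchall()

# A detail page joins the extension table only when the body is needed.
detail = conn.execute(
    "SELECT a.title, e.body FROM t_article a "
    "JOIN t_article_ext e ON e.article_id = a.id WHERE a.id = ?",
    (1,),
).fetchone()
```

  The main table's rows stay short, so scans over it read less data; the cost is an extra join (or second query) whenever the large field is required.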
2. Vertical database splitting
  A vertical database split divides a system by business area. The resulting databases can be placed on different servers to gain more processing capacity. To some extent this relieves bottlenecks caused by hardware resources. The approach is simple and adds little architectural complexity, but if the data keeps growing and the number of tables increases, performance will still degrade.
3. Horizontal table splitting
  For a single table with a huge amount of data (such as an order table), the rows are distributed across multiple tables according to some rule. These tables are still in the same database, however, so operations at the database level still face an IO bottleneck.
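  One common rule is to take the key modulo the number of tables. The sketch below assumes four physical tables named t_order_0 to t_order_3 and an integer order_id; the table count and the naming scheme are illustrative assumptions, not something the text prescribes.

```python
TABLE_COUNT = 4  # assumed number of physical order tables


def order_table(order_id: int) -> str:
    """Return the name of the physical table that holds this order."""
    return f"t_order_{order_id % TABLE_COUNT}"


# Every read and write for an order is sent to the table the rule selects.
sql = f"SELECT * FROM {order_table(10)} WHERE order_id = %s"
```

  Because the same rule is applied on insert and on lookup, a query by order_id touches exactly one table.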
4. Horizontal database splitting
  For a database with a huge amount of data, the data is divided across multiple databases according to some rule. The databases have the same structure but sit on different servers, and each row is stored in the database that the rule selects, which brings in more hardware resources.
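  Database and table routing are often combined into one rule. The sketch below is one possible scheme, assuming two databases with four tables each; the names order_db_N and t_order_N and both counts are hypothetical.

```python
from typing import NamedTuple

DB_COUNT = 2       # assumed number of database servers
TABLES_PER_DB = 4  # assumed number of order tables in each database


class Location(NamedTuple):
    database: str
    table: str


def locate_order(order_id: int) -> Location:
    """Map an order id to the database and table that store it."""
    # Spread ids over all (database, table) slots, then split the slot
    # number into a database index and a table index.
    slot = order_id % (DB_COUNT * TABLES_PER_DB)
    return Location(
        database=f"order_db_{slot // TABLES_PER_DB}",
        table=f"t_order_{slot % TABLES_PER_DB}",
    )
```

  Each database has the same set of tables, so the application only needs the rule and a connection per database to find any row.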

 

III. General approach to splitting databases and tables

  The most important decision is the choice of split key. When a query uses the split key, it can be routed quickly to the right table. A good split key improves query efficiency significantly, while a poor one degrades it.
1. Single split key

  As the name suggests, the data is split by a single key. When a query does not use the split key, however, every table must be scanned, which is inefficient. There are two solutions: full redundancy and a redundant relation table.
Full redundancy
  For example, the order table t_order is stored as three tables: t_order, t_user_order, and t_merchant_order. They use three different sharding columns: order_id, user_id, and merchant_code respectively. Each table holds the full data, and the binlog can be used to keep them in sync.
Redundant relation table
  Only one table holds the full data; the others are relation tables. For example, the order table t_order is stored as t_order, t_user_order, and t_merchant_order. t_order holds the full data with order_id as the split key, while t_user_order stores only user_id and order_id, with user_id as the split key.
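  The two-step lookup through a relation table can be sketched with in-memory dictionaries standing in for the sharded tables. The shard count, the modulo rule, and the amount field are assumptions made for the example.

```python
SHARDS = 3  # assumed shard count for both tables

# t_order: full rows, split by order_id.
t_order = [dict() for _ in range(SHARDS)]
# t_user_order: only user_id -> order ids, split by user_id.
t_user_order = [dict() for _ in range(SHARDS)]


def insert_order(order_id: int, user_id: int, amount: int) -> None:
    row = {"order_id": order_id, "user_id": user_id, "amount": amount}
    t_order[order_id % SHARDS][order_id] = row
    # The relation table stores just the two keys, not the full row.
    t_user_order[user_id % SHARDS].setdefault(user_id, []).append(order_id)


def orders_of_user(user_id: int) -> list:
    # Step 1: one shard of the relation table yields the order ids.
    order_ids = t_user_order[user_id % SHARDS].get(user_id, [])
    # Step 2: each order id routes directly to one shard of t_order.
    return [t_order[oid % SHARDS][oid] for oid in order_ids]
```

  A query by user_id never scans all shards, but it always pays for the second lookup in t_order.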

Full redundancy vs. redundant relation table
  Speed: full redundancy is faster. A relation table requires a second query, and even with a cache there is one more network round trip.
  Storage cost: full redundancy needs several times the storage of relation tables.
  Maintenance cost: full redundancy costs more to maintain, because every data change must be applied to several tables.

2. Multiple split keys
  For example, login requires both the user name and the password, so the user name and password together form the split key.

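3. Split key + ES
  Everything above concerns SQL whose conditions include the split key. There are always some queries whose conditions do not include it, though, and we cannot keep adding redundant split copies without limit for queries whose request volume is low. How should such SQL be handled? Taking sharding-jdbc as an example, the query is routed concurrently to every database and table that exists, and the results are then merged. Such queries perform noticeably worse than queries that include the split key.
  Worse still are fuzzy searches in back-office systems, or filters over ten or more conditions. Take Taobao's "all my orders" page: it has several filter conditions, and the product title supports fuzzy matching. Even a single table cannot solve this, let alone split databases and tables.
  In the split key + ES model, all data from the split databases and tables is replicated in full to ES (Elasticsearch), and the complex queries are handed to ES.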
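  The "route to every shard, then merge" behavior for queries without a split key can be sketched as follows. This is not sharding-jdbc's implementation; it is a plain Python illustration in which in-memory lists stand in for shards and query_shard is a hypothetical placeholder for running the same SQL on one shard.

```python
from concurrent.futures import ThreadPoolExecutor

# Three in-memory "shards" of t_order, split by order_id % 3 (sample data).
shards = [
    [{"order_id": 3, "status": "paid"}, {"order_id": 6, "status": "new"}],
    [{"order_id": 1, "status": "new"}, {"order_id": 4, "status": "paid"}],
    [{"order_id": 2, "status": "paid"}],
]


def query_shard(rows, status):
    """Stand-in for executing the same SQL against a single shard."""
    return [r for r in rows if r["status"] == status]


def query_all_shards(status):
    # The condition has no split key, so every shard must be queried.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(query_shard, shards, [status] * len(shards))
        # Merge the partial results into a single result set.
        merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["order_id"])
```

  The work grows with the number of shards even when few rows match, which is why such queries are worth offloading when they become common.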

IV. Problems introduced by splitting databases and tables

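1. Transactions
  After splitting, data is stored in different databases, so local transactions no longer work and distributed transactions are needed. The database's own distributed transaction support is quite inefficient; controlling transactions from the application adds programming burden and intrudes into business logic code.
2. Joins
  Once data is spread across different databases, join queries become unavailable. What used to take one query now needs several.
3. Extra load on the databases
  Operations such as group by, order by, and limit must run on all database nodes at once, and the results must then be aggregated, which adds extra load.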
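  As an illustration of the aggregation step, consider ORDER BY amount LIMIT 3 across three nodes. Each node has to return its own first three rows, and the caller merges them and keeps three. The sketch below uses plain lists of amounts as hypothetical per-node results.

```python
import heapq
from itertools import islice


def top_n(shard_results, n):
    """Merge per-node results, each already sorted ascending, and keep n."""
    return list(islice(heapq.merge(*shard_results), n))


# Each node has already applied ORDER BY amount LIMIT 3 locally.
node_a = [5, 20, 40]
node_b = [1, 7, 90]
node_c = [3, 8, 9]

first_three = top_n([node_a, node_b, node_c], 3)
```

  Nine rows are fetched to produce three, and the ratio gets worse with more nodes or with deep pagination, since every node must return all rows up to the requested offset plus the limit.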

V. When to split databases and tables

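  Splitting databases and tables is not the optimal solution; it should be the last resort. It brings a considerably more complex architecture and substantial development and maintenance costs, and if the architecture is designed badly, the resulting problems are enormous. Adopt it only when no other method solves the performance problem. Optimizations to consider first include:
  Set up a remote database. If the application is monolithic, with all components on the same server, moving the database to its own machine can improve database performance. Because the tables stay unchanged, this adds none of the complexity of sharding.
  Implement caching. If the application's read performance is poor, caching is one way to improve it. Caching temporarily stores already requested data in memory so that it can be accessed quickly later.
  Read-write splitting. Another strategy that helps read performance: data is replicated from one database server (the primary) to one or more replicas. After that, every new write goes to the primary before being replicated to the replicas, while reads are served only by the replicas. Distributing reads and writes this way keeps any one machine from taking too much load, which helps prevent slowdowns and crashes. Note that read replicas require more server resources and therefore cost more, which may be a significant constraint for some.
  Upgrade to a larger server. In most cases, scaling a database server up to a machine with more resources takes less work than sharding. As with read-write splitting, a server with more resources costs more.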
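  Of the options above, read-write splitting is the one that changes how the application picks a connection. A minimal sketch of that routing decision, assuming one primary and a list of replicas identified by hypothetical names; a real setup would hold connections or data sources instead of strings.

```python
import itertools


class ReadWriteRouter:
    """Send reads to replicas in round-robin order and writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        # Look only at the first keyword of the statement.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._replicas)
        # INSERT, UPDATE, DELETE and everything else go to the primary.
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
```

  This sketch ignores replication lag: a read issued right after a write may not see that write on a replica, so reads that must be up to date (and locking reads such as SELECT ... FOR UPDATE) would need to go to the primary.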


Origin www.cnblogs.com/ouhaitao/p/11117546.html