Solutions for database IO and CPU bottlenecks: sub-library and sub-table

1. The database bottleneck

Whether the bottleneck is IO or CPU, it eventually increases the number of active database connections until they approach or reach the maximum the database can carry. From the business service's point of view, few or no database connections are available, and the consequences are easy to imagine: reduced concurrency, reduced throughput, and crashes.

1. IO bottleneck

The first type: disk read IO bottleneck. There is too much hot data for the database cache to hold, so each query generates a large amount of IO and slows down -> sub-library and vertical sub-table.

The second type: network IO bottleneck. Too much data is requested and the network bandwidth is insufficient -> sub-library.

2. CPU bottleneck

The first type: SQL problems. For example, the SQL contains join, group by, order by, or conditions on non-indexed fields, all of which increase CPU work -> optimize the SQL, create suitable indexes, and perform business calculations in the business service layer.

The second type: a single table holds too much data, so queries scan too many rows, SQL efficiency is low, and the CPU becomes the first bottleneck -> horizontal sub-table.
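For the first type of CPU bottleneck, the effect of "establish a suitable index" can be seen in a query plan. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for the production database, and the table t_user, its columns, and the index name idx_user_name are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_user (id INTEGER PRIMARY KEY, user_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO t_user (user_name, age) VALUES (?, ?)",
    [(f"user{i}", 20 + i % 50) for i in range(1000)],
)

QUERY = "SELECT * FROM t_user WHERE user_name = ?"


def query_plan(connection, sql, params):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Before the index exists, the condition on user_name forces a full scan.
plan_without_index = query_plan(conn, QUERY, ("user42",))

# After the index is created, the same query becomes an index lookup.
conn.execute("CREATE INDEX idx_user_name ON t_user (user_name)")
plan_with_index = query_plan(conn, QUERY, ("user42",))

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan differs between SQLite versions and between database products, but the change from a scan to an index search is the point.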

2. Sub-library and sub-table

1. Horizontal sub-library

Concept: Based on a field, split the data in one library (database) into multiple libraries according to a certain strategy (hash, range, etc.).

Result:

  • The structure of each library is the same;

  • The data of each library is different, with no intersection;

  • The union of all libraries is the full data.

Scenario: The absolute concurrency of the system has gone up, sub-table alone cannot fundamentally solve the problem, and there is no obvious business boundary along which to split the library vertically.

Analysis: With more libraries, the IO and CPU pressure is naturally relieved several times over.
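The hash and range strategies mentioned in the concept can be sketched as two small routing functions. This is an illustration under assumptions, not the routing of any particular tool: the number of libraries (DB_COUNT), the id ranges (RANGES), and the use of CRC32 as the hash are all choices made for the example.

```python
import zlib

DB_COUNT = 4  # assumed number of libraries


def route_by_hash(key: str, db_count: int = DB_COUNT) -> int:
    """Hash strategy: a stable hash of the key modulo the number of libraries."""
    # zlib.crc32 is stable across processes, unlike the built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % db_count


# Range strategy: each library owns a half-open id range [low, high).
RANGES = [
    (0, 1_000_000),
    (1_000_000, 2_000_000),
    (2_000_000, 3_000_000),
    (3_000_000, 4_000_000),
]


def route_by_range(record_id: int) -> int:
    """Return the index of the library whose range contains record_id."""
    for index, (low, high) in enumerate(RANGES):
        if low <= record_id < high:
            return index
    raise ValueError(f"id {record_id} is outside every configured range")


# Split some sample keys across the libraries with the hash strategy.
keys = [f"user{i}" for i in range(1000)]
libraries = [set() for _ in range(DB_COUNT)]
for key in keys:
    libraries[route_by_hash(key)].add(key)
```

Because every key is routed to exactly one library, the libraries have no intersection and their union is the full data, which matches the result listed above.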

2. Horizontal sub-table

Concept: Based on a field, split the data in one table into multiple tables according to a certain strategy (hash, range, etc.).

Result:

  • The structure of each table is the same;

  • The data of each table is different, with no intersection;

  • The union of all tables is the full data.

Scenario: The absolute concurrency of the system has not gone up, but a single table holds too much data, which hurts SQL efficiency and increases the CPU burden until it becomes the bottleneck.

Analysis: With less data per table, each SQL statement executes more efficiently, which naturally reduces the CPU burden.

3. Vertical sub-library

Concept: Based on tables, split different tables into different libraries according to the business they belong to.

Result:

  • The structure of each library is different;

  • The data of each library is also different, with no intersection;

  • The union of all libraries is the full data.

Scenario: The absolute concurrency of the system has gone up, and independent business modules can be abstracted.

Analysis: At this stage the system can basically be split into services. For example, as the business grows there are more and more common configuration tables, dictionary tables, and so on; these tables can be moved into a separate library, or even turned into a service. Later, as the business grows further and a complete business model emerges, its related tables can likewise be moved into a separate library, or even turned into a service.

4. Vertical sub-table

Concept: Based on fields, split the fields of one table into different tables (a main table and an extended table) according to how active the fields are.

Result:

  • The structure of each table is different;

  • The data of each table is also different. Generally, the tables share at least one column, usually the primary key, which is used to associate the data;

  • The union of all tables is the full data.

Scenario: The absolute concurrency of the system has not gone up. The table does not have many records, but it has many fields, and hot and non-hot data are stored together, so a single row takes a lot of storage space. As a result, the database can cache fewer rows, and queries generate a large amount of random read IO when reading from disk, which produces an IO bottleneck.

Analysis: List pages and detail pages are a helpful way to think about it. The principle of vertical sub-table is to put hot data (data that is likely to be queried together frequently) in the main table and non-hot data in the extended table. More hot data can then be cached, which reduces random read IO. After the split, getting all the data requires fetching from both tables and associating the results.

But remember, do not use join: a join not only increases the CPU burden, it also couples the two tables together (they must be on the same database instance). Associate the data in the business service layer instead: fetch the main table data and the extended table data separately, then combine them through the associated field.
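Associating the data in the business service layer instead of with a join might look like the sketch below. It assumes two separate sqlite3 in-memory databases to stand in for two database instances, and the tables t_item (main table) and t_item_ext (extended table) and their columns are hypothetical.

```python
import sqlite3

# Two independent connections stand in for two database instances.
main_db = sqlite3.connect(":memory:")
ext_db = sqlite3.connect(":memory:")

# Main table: hot fields shown on list pages. Extended table: non-hot fields.
main_db.execute(
    "CREATE TABLE t_item (id INTEGER PRIMARY KEY, title TEXT, price INTEGER)"
)
ext_db.execute(
    "CREATE TABLE t_item_ext (id INTEGER PRIMARY KEY, description TEXT)"
)
main_db.execute("INSERT INTO t_item VALUES (1, 'keyboard', 300)")
ext_db.execute("INSERT INTO t_item_ext VALUES (1, 'a long description')")


def list_items():
    """List page: only the main table is read."""
    rows = main_db.execute(
        "SELECT id, title, price FROM t_item ORDER BY id"
    ).fetchall()
    return [{"id": r[0], "title": r[1], "price": r[2]} for r in rows]


def get_item_detail(item_id):
    """Detail page: read both tables separately and merge on the primary key."""
    main_row = main_db.execute(
        "SELECT id, title, price FROM t_item WHERE id = ?", (item_id,)
    ).fetchone()
    if main_row is None:
        return None
    ext_row = ext_db.execute(
        "SELECT description FROM t_item_ext WHERE id = ?", (item_id,)
    ).fetchone()
    return {
        "id": main_row[0],
        "title": main_row[1],
        "price": main_row[2],
        "description": ext_row[0] if ext_row else None,
    }
```

Because the two reads are independent, the main table and the extended table no longer need to live on the same database instance.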

3. Sub-library and sub-table tools

  • ShardingSphere: jar, formerly Sharding-JDBC;

  • TDDL: jar, Taobao Distributed Data Layer;

  • Mycat: middleware.

Note: Research the pros and cons of each tool yourself, giving priority to the official websites and communities.

4. Sub-library and sub-table steps

Evaluate the number of sub-libraries or sub-tables from the capacity (current capacity and growth) -> choose the key (it must distribute data evenly) -> choose the splitting rule (hash, range, etc.) -> execute (generally with double write) -> plan for expansion (minimize data movement).
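The first step, evaluating the number of sub-tables from current capacity and growth, is simple arithmetic. The sketch below is one possible way to do it: the per-table row limit, the planning horizon, and the rounding up to a power of two are assumptions for illustration, not rules from any tool.

```python
import math


def estimate_table_count(current_rows, monthly_growth_rows, months,
                         max_rows_per_table):
    """Estimate how many sub-tables are needed over the planning horizon."""
    projected_rows = current_rows + monthly_growth_rows * months
    needed = max(1, math.ceil(projected_rows / max_rows_per_table))
    # Assumption: round up to a power of two so that a later expansion
    # can simply double the number of tables.
    return 1 << (needed - 1).bit_length()


# Example: 30 million rows today, 2 million new rows per month,
# a 24-month horizon, and at most 10 million rows per table.
table_count = estimate_table_count(30_000_000, 2_000_000, 24, 10_000_000)
print(table_count)
```

In this example the projected volume is 78 million rows, which needs 8 tables at 10 million rows each.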


5. Problems with sub-library and sub-table

1. Non-partition key query problem

Premise: horizontal sub-library and sub-table, with the commonly used hash strategy.

Besides the partition key, only one non-partition key is used as a query condition

Mapping method
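The mapping method is only named here. One common reading, assumed in this sketch, is to keep a mapping from the non-partition key (user_name) to the partition key (user_id), look up the partition key first, and then route as usual. The in-memory dictionaries stand in for real tables, and all names are hypothetical.

```python
TABLE_COUNT = 8

# shards[i] stands in for sub-table i: {user_id: row}
shards = [dict() for _ in range(TABLE_COUNT)]
# Mapping from the non-partition key to the partition key.
name_to_id = {}


def route(user_id: int) -> int:
    """Hash strategy on the partition key."""
    return user_id % TABLE_COUNT


def insert_user(user_id: int, user_name: str) -> None:
    """Write the row to its shard and record the name -> id mapping."""
    shards[route(user_id)][user_id] = {"user_id": user_id, "user_name": user_name}
    name_to_id[user_name] = user_id


def find_by_id(user_id: int):
    return shards[route(user_id)].get(user_id)


def find_by_name(user_name: str):
    """Two steps: look up the partition key, then query a single shard."""
    user_id = name_to_id.get(user_name)
    if user_id is None:
        return None
    return find_by_id(user_id)


insert_user(1001, "alice")
insert_user(1002, "bob")
```

The cost is one extra lookup per query by user_name, plus keeping the mapping consistent with the data.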

Gene method

Note: On write, the gene method generates user_id as shown in the figure. As for the x-bit gene: with 8 tables, for example, 2^3 = 8, so x = 3, that is, a 3-bit gene. A query by user_id can be routed directly to the corresponding sub-library or sub-table by taking the modulus.

 

When querying by user_name, first compute user_name_code with the user_name_code generation function, then take the modulus to route to the corresponding sub-library or sub-table. IDs are commonly generated with the snowflake algorithm.
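The gene (genetic) method above can be sketched as follows, for 8 tables and a 3-bit gene. This is a simplified illustration: CRC32 is an assumed choice for the user_name_code function, and a plain counter stands in for a snowflake-style generator.

```python
import itertools
import zlib

TABLE_COUNT = 8
GENE_BITS = 3                     # 2**3 == 8 tables, so a 3-bit gene
GENE_MASK = (1 << GENE_BITS) - 1  # 0b111

# Stand-in for a snowflake-style unique id generator (assumption).
_sequence = itertools.count(1)


def user_name_code(user_name: str) -> int:
    """Assumed code function: a stable hash of user_name."""
    return zlib.crc32(user_name.encode("utf-8"))


def generate_user_id(user_name: str) -> int:
    """Build user_id so that its low 3 bits equal those of user_name_code."""
    gene = user_name_code(user_name) & GENE_MASK
    return (next(_sequence) << GENE_BITS) | gene


def route_by_user_id(user_id: int) -> int:
    return user_id % TABLE_COUNT


def route_by_user_name(user_name: str) -> int:
    return user_name_code(user_name) % TABLE_COUNT


names = [f"user{i}" for i in range(100)]
user_ids = {name: generate_user_id(name) for name in names}
```

Since the table count is a power of two, taking the modulus only looks at the low 3 bits, so a query by user_id and a query by user_name reach the same sub-table without any mapping lookup.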

Besides the partition key, more than one non-partition key is used as a query condition

Mapping method

Redundancy method

Note: Queries by order_id or buyer_id are routed to the db_o_buyer library, and queries by seller_id are routed to the db_o_seller library. This feels a bit like putting the cart before the horse. Is there a better way? What about changing the technology stack?
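The routing in the note above can be sketched as keeping two copies of every order: one sharded by buyer in db_o_buyer and one sharded by seller in db_o_seller. In this sketch the lists stand in for real libraries, and embedding the buyer's shard number in the low bits of order_id (so that order_id alone can be routed to db_o_buyer) is an assumption made for the example.

```python
import itertools

DB_COUNT = 4
GENE_BITS = 2  # 2**2 == 4 libraries on each side

db_o_buyer = [[] for _ in range(DB_COUNT)]   # copies sharded by buyer_id
db_o_seller = [[] for _ in range(DB_COUNT)]  # copies sharded by seller_id
_sequence = itertools.count(1)


def create_order(buyer_id: int, seller_id: int) -> dict:
    """Write the order twice: once per buyer shard, once per seller shard."""
    # Assumption: the low bits of order_id carry the buyer's shard number.
    order_id = (next(_sequence) << GENE_BITS) | (buyer_id % DB_COUNT)
    order = {"order_id": order_id, "buyer_id": buyer_id, "seller_id": seller_id}
    db_o_buyer[buyer_id % DB_COUNT].append(order)
    db_o_seller[seller_id % DB_COUNT].append(dict(order))  # redundant copy
    return order


def find_by_order_id(order_id: int):
    for order in db_o_buyer[order_id % DB_COUNT]:
        if order["order_id"] == order_id:
            return order
    return None


def find_by_buyer_id(buyer_id: int):
    shard = db_o_buyer[buyer_id % DB_COUNT]
    return [o for o in shard if o["buyer_id"] == buyer_id]


def find_by_seller_id(seller_id: int):
    shard = db_o_seller[seller_id % DB_COUNT]
    return [o for o in shard if o["seller_id"] == seller_id]


first = create_order(buyer_id=7, seller_id=10)
second = create_order(buyer_id=7, seller_id=11)
third = create_order(buyer_id=8, seller_id=10)
```

Every query touches a single library, at the price of storing each order twice and keeping the two copies consistent.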

Besides the partition key, the back office runs queries on various combinations of non-partition keys

NoSQL method

Redundancy method

2. Non-partition key cross-library, cross-table paging query problem

Premise: horizontal sub-library and sub-table, with the commonly used hash strategy.

Note: Use the NoSQL method (Elasticsearch, etc.).

3. Capacity expansion

Premise: horizontal sub-library and sub-table, with the commonly used hash strategy.

Horizontally expanding libraries (promote-the-replica method)

Note: The number of libraries is doubled on each expansion.
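Why the expansion is doubled can be checked with a little arithmetic. With a modulus-based hash strategy, every key that lived in old library i ends up in new library i or i + N after doubling from N to 2N libraries. The sketch below verifies this for an assumed N of 4 and integer keys.

```python
OLD_COUNT = 4
NEW_COUNT = OLD_COUNT * 2


def old_route(key: int) -> int:
    return key % OLD_COUNT


def new_route(key: int) -> int:
    return key % NEW_COUNT


# For each old library, collect the new libraries its keys are routed to.
destinations = {}
for key in range(10_000):
    destinations.setdefault(old_route(key), set()).add(new_route(key))

for old_index in sorted(destinations):
    print(old_index, sorted(destinations[old_index]))
```

Under this reading, a replica of old library i already holds every row that new library i + N needs, so it can be promoted to serve as library i + N without copying data between libraries; the rows that no longer belong on each side can be cleaned up afterwards.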

Horizontally expanding tables (double-write migration method)

  • Step 1: (synchronous double write) modify the application configuration and code to add double write, then deploy;

  • Step 2: (synchronous double write) copy the old data from the old library to the new library;

  • Step 3: (synchronous double write) verify the old data in the new library, taking the old library as the reference;

  • Step 4: (synchronous double write) modify the application configuration and code to remove double write, then deploy.

Note: Double write is a general-purpose scheme.
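The four steps above can be sketched with two dictionaries standing in for the old and new libraries. This is a simplified, single-process illustration; the class name Migrator and its mode values are hypothetical, and a real migration also has to handle deletes, failures, and concurrent writes.

```python
class Migrator:
    """Minimal double-write migration between an old and a new store."""

    def __init__(self, old: dict, new: dict):
        self.old = old
        self.new = new
        self.mode = "old_only"  # old_only -> double_write -> new_only

    def write(self, key, value) -> None:
        if self.mode in ("old_only", "double_write"):
            self.old[key] = value
        if self.mode in ("double_write", "new_only"):
            self.new[key] = value

    def backfill(self) -> None:
        """Step 2: copy old rows, without overwriting double-written rows."""
        for key, value in list(self.old.items()):
            self.new.setdefault(key, value)

    def verify(self) -> list:
        """Step 3: repair the new store from the old one; return bad keys."""
        mismatched = [k for k, v in self.old.items() if self.new.get(k) != v]
        for key in mismatched:
            self.new[key] = self.old[key]
        return mismatched


migrator = Migrator(old={1: "a", 2: "b"}, new={})

migrator.mode = "double_write"   # step 1: deploy with double write
migrator.write(2, "b2")          # live traffic now reaches both stores
migrator.write(3, "c")

migrator.backfill()              # step 2: copy the old data
mismatches = migrator.verify()   # step 3: verify against the old store

migrator.mode = "new_only"       # step 4: remove double write
migrator.write(4, "d")
```

Double write stays on during steps 1 to 3 so that the new store never falls behind while the old data is being copied and verified.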

6. Summary of sub-library and sub-table

  • Before splitting, first find out where the bottleneck is; only then can you split reasonably (sub-library or sub-table? horizontal or vertical? into how many?). Do not split just for the sake of splitting.

  • Choosing the key is very important: consider not only whether it splits the data evenly, but also how non-partition key queries will be served.

  • As long as the requirements are met, the simpler the splitting rule, the better.


Origin blog.csdn.net/blue_heart_/article/details/105505361