The general idea and practice of sub-database and sub-table (sharding)

Cause: The company's project now holds more than 20 TB of data, and queries have become slow. To protect the user experience and improve efficiency, the database needs to be optimized.

Project: A distributed project in which each individual system is already clustered. On an average day there are about 20 million queries and about 8 million transactions.

Features: Large data volume, high concurrency

*** (My project is the core system, which interacts with the database directly; other systems call the core system's interfaces. I will not elaborate on those and only record the practice of this system.)

Database: Oracle

Language: Java

Technology: ZooKeeper + Dubbo + Spring + MyBatis + internal framework

Sub-database and sub-table: Mycat + internal middleware

=============================================Separation line===== ===============================================

Before going further, here are the constraints:

1. The project is running stably in production. To keep serving users, it cannot be shut down.

2. The version switch must be done quickly, in as little time as possible.

3. Code changes must stay compatible with the old system and with other systems.

4. The business must not be affected.

5. The transition must be smooth.

=============================================Separation line===== ===============================================

Database part

Old version: The original system uses a single Oracle database, and all businesses point to that one database.

New version: The single database is split into several databases along business lines.

For example:

Old system business = (order + product + user + ....) DB (one DB)

New system business = order DB + product DB + user DB + .... (multiple DBs)

The above is the idea behind splitting the database.

=============================================Separation line===== ===============================================

Related questions:

Some people will point out that the original code and SQL all target one data source, so pointing them at several different data sources after the split will cause problems.

For example:

1. For a huge system that has been running stably, with old and new code mixed together, how do you know which SQL will break under sub-database and sub-table?

2. How can a multi-table query work when the tables are no longer in the same data source?

3. How do you switch between multiple data sources and still maintain transactions?

4. How should the original business code logic be changed?

5. How do you handle concurrency and prevent duplicate operations?

First, do the preparatory work.

=============================================Separation line===== ===============================================

Initial preparation work

1. For a huge system that has been running stably, with old and new code mixed together, how do you know which SQL will break under sub-database and sub-table, and which SQL needs to be modified?

Solution: Monitoring system

Idea: A simple monitoring system can be built from filters or interceptors, and MyBatis itself can be configured to print SQL.

Practical approach: We added a monitoring system (the real one is more complicated than this), ran it for a monitoring period (the length needs to be evaluated), and collected the logs every day, organizing and summarizing them with Python scripts.

At the same time, the project was divided into several large blocks by business logic and assigned to different developers. Each developer sorted out and summarized the SQL used in their block, compared it with the results from the monitoring system, and produced the list of relevant SQL.
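
The summarizing step can be sketched roughly as follows. This is only an illustration, not our actual scripts (those were written in Python and are not shown here); Java is used to match the project's language. The class name, the table names, and the table-to-library mapping are all hypothetical, and the regular expression is deliberately naive: it does not handle comma-separated table lists or subqueries. The idea is to find the tables each logged statement touches and flag the statements that would span more than one library after the split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrossLibrarySqlFinder {
    // Hypothetical mapping; the real one would come from the sub-database plan.
    static final Map<String, String> TABLE_TO_LIBRARY = Map.of(
            "brand", "public_db",
            "orders", "order_db",
            "users", "user_db");

    // Naive rule: the identifier after FROM, JOIN, UPDATE or INTO is a table name.
    static final Pattern TABLE = Pattern.compile(
            "\\b(?:from|join|update|into)\\s+([a-z_][a-z0-9_]*)",
            Pattern.CASE_INSENSITIVE);

    // Returns the set of libraries that one SQL statement would touch.
    static Set<String> librariesOf(String sql) {
        Set<String> libraries = new TreeSet<>();
        Matcher m = TABLE.matcher(sql);
        while (m.find()) {
            String table = m.group(1).toLowerCase(Locale.ROOT);
            libraries.add(TABLE_TO_LIBRARY.getOrDefault(table, "unknown"));
        }
        return libraries;
    }

    // Keeps only the statements that touch more than one library.
    static List<String> crossLibrary(List<String> loggedSql) {
        List<String> result = new ArrayList<>();
        for (String sql : loggedSql) {
            if (librariesOf(sql).size() > 1) {
                result.add(sql);
            }
        }
        return result;
    }
}
```

Statements returned by `crossLibrary` are the candidates for the SQL splitting described in the next point; everything else can keep running against a single library.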

 

2. How can a multi-table query work when the tables are no longer in the same data source?

Solution: SQL splitting + interface transformation

Idea: Split each SQL statement that crosses data sources into several SQL statements, and rearrange the business code accordingly.

Practical approach:

a. Following the business split, assign each part to a different developer to sort out and summarize its business interfaces.

b. For each interface, go through its business logic and SQL, and compare and sort out the SQL that is actually in use.

c. Following the sub-database logic, split every multi-table SQL statement that crosses data sources into single-data-source statements. A statement that does not cross data sources can still join multiple tables.

 

3. How do you switch between multiple data sources and still maintain transactions?

Solution: Single data source, single transaction

Idea: The premise is that SQL crossing data sources has already been split, so the SQL for each data source runs in its own transaction.

Practical approach: Because of the internal framework, we packaged our own jar and open, commit, and roll back transactions manually.

For example:

Old system: However many businesses there are, there is one database, containing five tables A, B, C, D, and E. A query can join the five tables directly, and an update simply updates them; you can do whatever you want.

New system: 5 libraries for 5 businesses: public area (brand) + orders + users + transactions + points. Each library holds different tables (listed here to show how they relate):

a. Public area library: brand table A

b. Order library: order table B

c. User library: user table C

d. Transaction library: transaction table D

e. Points library: points table E

For example, one business needs to query a brand's order records within a given time period, together with the user information for those orders:

<1>, first query the brand from table A

<2>, pass the result of <1> as a parameter to the order table, together with the time range, to obtain the order information

<3>, pass the result of <2> as a parameter to the user table to obtain the user information

<4>, assemble the required result
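
The four steps above can be sketched as follows. This is a minimal illustration in which in-memory maps and arrays stand in for the three libraries; the class name, method names, and all rows are made up, and a real version would issue one SQL statement per step against the corresponding data source. The point is that the join moves out of SQL and into the business code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BrandOrderReport {
    // In-memory stand-ins for three separate libraries; all data is made up.
    static final Map<String, Integer> BRAND_TABLE_A = Map.of("acme", 1);
    // Each order row is {orderId, brandId, userId, day}.
    static final int[][] ORDER_TABLE_B = {{100, 1, 7, 3}, {101, 1, 8, 9}, {102, 2, 7, 4}};
    static final Map<Integer, String> USER_TABLE_C = Map.of(7, "alice", 8, "bob");

    // <1> one query against the public area library
    static Integer findBrandId(String brand) {
        return BRAND_TABLE_A.get(brand);
    }

    // <2> one query against the order library, filtered by brand and time range
    static List<int[]> findOrders(int brandId, int fromDay, int toDay) {
        List<int[]> rows = new ArrayList<>();
        for (int[] order : ORDER_TABLE_B) {
            if (order[1] == brandId && order[3] >= fromDay && order[3] <= toDay) {
                rows.add(order);
            }
        }
        return rows;
    }

    // <3> one batched query against the user library
    static Map<Integer, String> findUsers(Set<Integer> userIds) {
        Map<Integer, String> users = new HashMap<>();
        for (Integer id : userIds) {
            String name = USER_TABLE_C.get(id);
            if (name != null) {
                users.put(id, name);
            }
        }
        return users;
    }

    // <4> assemble the result in application code instead of in a SQL join
    static List<String> report(String brand, int fromDay, int toDay) {
        List<String> result = new ArrayList<>();
        Integer brandId = findBrandId(brand);
        if (brandId == null) {
            return result;
        }
        List<int[]> orders = findOrders(brandId, fromDay, toDay);
        Set<Integer> userIds = new LinkedHashSet<>();
        for (int[] order : orders) {
            userIds.add(order[2]);
        }
        Map<Integer, String> users = findUsers(userIds);
        for (int[] order : orders) {
            result.add("order " + order[0] + " placed by " + users.get(order[2]));
        }
        return result;
    }
}
```

Collecting the user ids first and querying the user library once keeps the number of round trips fixed at three, however many orders step <2> returns.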

Another example: a user places an order for several items, and points are added to the user according to the order amount:

<1>, the user places an order, and a row is added to the order table

<2>, the transaction table records the transaction flow

<3>, query the total amount of this user's order

<4>, pass the result of <3> as a parameter to the points table to calculate the points

Each library opens and commits its own transaction, and the program controls each of these small transactions.
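
A rough sketch of what "one small transaction per library, controlled by the program" can look like is given below. It is not our internal jar: the `Library` class is a hypothetical in-memory stand-in for a connection with manual begin/commit/rollback, and the rule of one point per unit of order amount is invented for the example. Step <3> is simplified by passing the amount in directly.

```java
import java.util.ArrayList;
import java.util.List;

class PerLibraryTransactions {
    /** Hypothetical stand-in for one library with manual transaction control. */
    static class Library {
        final String name;
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        Library(String name) { this.name = name; }

        void begin() { pending.clear(); }
        void write(String row) { pending.add(row); }
        void commit() { committed.addAll(pending); pending.clear(); }
        void rollback() { pending.clear(); }
    }

    /** One unit of work that runs against a single library. */
    interface Work {
        void run(Library library) throws Exception;
    }

    /** Runs the work in its own transaction; returns true only if it committed. */
    static boolean inTransaction(Library library, Work work) {
        library.begin();
        try {
            work.run(library);
            library.commit();
            return true;
        } catch (Exception e) {
            library.rollback();
            return false;
        }
    }

    /** Order, then transaction flow, then points; each step commits on its own. */
    static String placeOrder(Library orders, Library trades, Library points, int amount) {
        if (!inTransaction(orders, lib -> lib.write("order amount=" + amount))) {
            return "order failed";
        }
        if (!inTransaction(trades, lib -> lib.write("flow amount=" + amount))) {
            return "flow failed"; // the order row is already committed at this point
        }
        // Invented rule for the sketch: one point per unit of order amount.
        if (!inTransaction(points, lib -> lib.write("points +" + amount))) {
            return "points failed"; // order and flow rows are already committed
        }
        return "done";
    }
}
```

Because each step commits independently, a failure in a later step leaves the earlier steps committed. The returned status tells the caller which step failed so it can be logged and followed up.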

 

Dirty data is watched for by a separate monitoring system.

=============================================Separation line===== ===============================================

4. How should the original business code logic be changed?

Sub-database and sub-table work generally means some tables are split and some are retired. Where that happens, you need to read the code and refactor whatever touches the retired or split tables.

In principle this takes several people dividing the work and reading the code carefully.

5. About concurrency and preventing duplicates

There are many solutions for this online.

The approaches we use are:

1. Optimistic locking

2. Pessimistic locking

3. Idempotency checks

4. A small amount of Java code-level locking, such as synchronized and Lock (the first three are generally preferred, because code-level locks cost performance)
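
To make the first and third items concrete, here is a small in-memory sketch. It is an illustration under assumptions, not our production code: the class and method names are hypothetical, `PointsRow` stands in for a database row with a version column, and the idempotency check uses a set inside one JVM, whereas a clustered system would need a store shared by all nodes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DuplicateGuards {
    /** In-memory stand-in for a row that carries a version column. */
    static class PointsRow {
        private int points;
        private int version;

        synchronized int version() { return version; }
        synchronized int points() { return points; }

        // Plays the role of an update guarded by "WHERE version = ?":
        // the write succeeds only if nobody changed the row since it was read.
        synchronized boolean updateIfVersion(int expectedVersion, int newPoints) {
            if (version != expectedVersion) {
                return false; // stale read; the caller should re-read and retry
            }
            points = newPoints;
            version++;
            return true;
        }
    }

    /** Idempotency check: each request id is accepted only once. */
    static class IdempotencyChecker {
        private final Set<String> seen = ConcurrentHashMap.newKeySet();

        // Set.add is atomic here and returns false for an id already recorded.
        boolean firstTime(String requestId) {
            return seen.add(requestId);
        }
    }
}
```

With optimistic locking a losing writer is told to retry instead of waiting on a lock, which is why it suits workloads where conflicts are rare.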

=============================================Separation line===== ===============================================

Before the switch is complete there is a transition period. Why is there a transition period?

1. The code is written bit by bit, not all at once.

2. Multiple teams collaborate, and your version going online does not mean the other teams' versions go online at the same time.

3. Different teams are responsible for different things.

 

In reality:

1. The actual sub-database and sub-table work on the database side is the DBA's responsibility.

2. The layer that interacts with the database is the middleware team's responsibility.

3. The core system only modifies its own business code and SQL statements to match.

For example, suppose one library is to be split into 5 areas with 10 libraries each, 50 libraries in total. How is the transition made?

<1>, first the core system switches its configuration: the configuration for all 50 libraries is put in place, but during the transition every entry still points to the same library

<2>, at the time agreed with the middleware team, the middleware team's middleware starts routing to each library

<3>, the actual switch does not move all 50 libraries at once; they are cut over one by one, about 5 minutes each. While a library is being cut over, only queries are allowed and all writes to the library are stopped.
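
The transition can be pictured with the sketch below. It is only a model of the idea, not the middleware team's implementation: the class name, the logical library names such as `area1_db1`, and the target strings are hypothetical, and it assumes that writes are blocked only for the library currently being cut over.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class LibraryRouter {
    private final Map<String, String> target = new LinkedHashMap<>();
    private final Set<String> readOnly = new HashSet<>();

    // <1> configure every logical library, all pointing at the old library for now.
    LibraryRouter(int areas, int librariesPerArea, String oldLibrary) {
        for (int a = 1; a <= areas; a++) {
            for (int i = 1; i <= librariesPerArea; i++) {
                target.put("area" + a + "_db" + i, oldLibrary);
            }
        }
    }

    synchronized int size() {
        return target.size();
    }

    // Returns where a statement for the given logical library should be sent.
    synchronized String route(String logicalName, boolean isWrite) {
        if (!target.containsKey(logicalName)) {
            throw new IllegalArgumentException("unknown library: " + logicalName);
        }
        if (isWrite && readOnly.contains(logicalName)) {
            throw new IllegalStateException("writes are stopped during cut-over: " + logicalName);
        }
        return target.get(logicalName);
    }

    // <3> start cutting one library: queries continue, writes are rejected.
    synchronized void beginCut(String logicalName) {
        readOnly.add(logicalName);
    }

    // Finish the cut: repoint the logical name and allow writes again.
    synchronized void finishCut(String logicalName, String newLibrary) {
        target.put(logicalName, newLibrary);
        readOnly.remove(logicalName);
    }
}
```

Because the application already addresses 50 logical libraries before any data moves, each cut-over only changes one routing entry and the business code does not change again.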

 

=============================================Separation line===== ===============================================

The plan above is still being optimized, but this is the direction we are heading in. The project is huge: it is measured in years and cannot be completed in one or two months. After each step is finished, the following steps may all need further optimization and improvement.

 
