"Sharding databases and tables"? Choose the tooling and the process carefully, or the project will get out of control [Program guide]

Congratulations, your company has finally grown to the size where high availability, and even sharding databases and tables, has to be considered. But do you know what sharding actually requires? The splitting process is complex, so plan ahead. Do not wait until the work has really started, only to be hit by all kinds of unexpected tasks and a project that slips out of control.

This article aims to cover the breadth of database middleware rather than the depth of any implementation, and it does not spend much time explaining the concepts of, or reasons for, splitting databases and tables vertically and horizontally. It is therefore intended for professionals with some research and development experience who are looking for guidance on tool selection and on the splitting process.

Cut-in layers

From here on, the scope is limited to Java and MySQL. We first look at the layers at which sharding can cut in.

① Coding layer

Create multiple data sources in the same project and use if/else branches to route directly in code according to conditions. Spring provides an abstract class for switching data sources dynamically; see AbstractRoutingDataSource in particular.
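As a rough illustration of routing at the coding layer, the sketch below picks a data source with a plain if/else. The class name, the data source names, and the even/odd rule are all assumptions made for this example; real code would hold javax.sql.DataSource instances rather than strings.

```java
import java.util.List;

// Hypothetical sketch: hand-written routing between two data sources.
// The data source names and the even/odd rule are assumptions for illustration.
class CodingLayerRouter {
    // In a real project these would be javax.sql.DataSource instances.
    static final List<String> DATA_SOURCES = List.of("ds_user_0", "ds_user_1");

    // Route by user id with a plain if/else, as described above.
    static String route(long userId) {
        if (userId % 2 == 0) {
            return DATA_SOURCES.get(0);
        } else {
            return DATA_SOURCES.get(1);
        }
    }

    // A query without a user id has to visit every data source and merge in code.
    static List<String> routeAll() {
        return DATA_SOURCES;
    }
}
```

With Spring, the same decision would typically live in a subclass of AbstractRoutingDataSource that overrides determineCurrentLookupKey(); the branching logic itself stays in application code either way.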

If the project is not very large, this approach lets you split the database quickly. The drawbacks are obvious, though: you have to write a lot of code and look after every branch. When it comes to cross-database queries, aggregation, or scenarios that loop over the shards and merge the results, the workload is huge.

If the project later splits into several projects, most of this code cannot be shared properly and ends up being shared by copy and paste. Over time, the code base becomes unmaintainable.

② Framework layer

This approach suits companies whose ORM framework is unified, which in many cases is not realistic. It mainly modifies or enhances the existing ORM framework, adding custom primitives or hints to the SQL to achieve routing.

Implementing interceptors (such as the Interceptor interface in MyBatis, which controls the data flow) and adding a custom resolver works better, but it changes some existing programming habits.

In many cases this means modifying the framework's source code, so it is not recommended.

③ Driver layer

Given the shortcomings of cutting in at the coding layer and the framework layer, real database middleware starts at the driver layer at the earliest. What does that mean? In effect, it rewrites a JDBC driver, maintains a routing table in memory, and forwards each request to the real database connection.

TDDL, ShardingJDBC, and similar products cut in at this layer.

This also includes the failover protocols of MySQL Connector/J (specifically "load balancing", "replication", "fabric", etc.), which are likewise implemented by modifying the driver directly.

A request at this layer typically flows from the business code, through the middleware's JDBC driver and its in-memory routing table, to the real database connections.

④ Proxy layer

Proxy-layer database middleware disguises itself as a database and accepts connections from the business side. It then load-balances the business side's requests, or parses them and forwards them to the real databases.

MySQL Router, MyCat, and similar products cut in at this layer.

A request at this layer typically flows from the business side, through the stand-alone proxy, to the real databases.

⑤ Implementation layer

Special database distributions provide support at this layer: for example, MySQL Cluster with its own set of features, MariaDB Galera Cluster with its support for peer masters, and Greenplum with its support for sharding.

This requires changing the storage itself. It is generally a complete solution in its own right and is not discussed here.

Technologies eventually converge, and any of these choices is feasible. The final selection, however, is influenced by many factors: developer familiarity, community activity, fit with the company, degree of official maintenance, scalability, and the company's existing database products. Selecting or developing a suitable one will make your colleagues a lot happier.

Comparing the driver layer and the proxy layer

From the descriptions above, it is clear that the middleware we select or develop is concentrated in the driver layer and the proxy layer. These two layers allow stronger control and finer-grained management of database connections and routing. But the differences between them are also obvious.

Driver Layer Features

Supports only Java, supports many databases

Driver-layer middleware supports only the Java development language, but it supports all relational databases on the back end. If your development language is fixed and your back-end data source types are varied, this approach is recommended.

Occupies more database connections

Driver-layer middleware has to maintain many database connections. For example, for a table split across 10 databases, each Java Connection has to maintain 10 database connections. If there are too many projects, a connection explosion occurs. Let us count: with 10 databases, 6 instances per project, a connection pool minIdle of 5, and 3 projects, the total is 10 * 6 * 5 * 3 = 900 connections. For a database such as Postgres, which uses one process per connection, the pressure will be great.
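The estimate above can be written as a small helper, which makes it easy to plug in your own numbers. The class and parameter names are assumptions made for this sketch; the inputs in the usage note are the example figures from the text, not measurements.

```java
// Worked version of the connection estimate above.
class ConnectionEstimate {
    static long totalConnections(int shardDatabases, int instancesPerProject,
                                 int minIdlePerPool, int projects) {
        // Each instance keeps one pool per shard database, and each pool holds
        // at least minIdle connections open.
        return (long) shardDatabases * instancesPerProject * minIdlePerPool * projects;
    }
}
```

Calling ConnectionEstimate.totalConnections(10, 6, 5, 3) reproduces the 900 connections counted above. Note that this is a floor based on minIdle; pools that grow toward their maximum size under load would push the figure higher.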

Data aggregation is performed in the service instance

Data aggregation, such as count and sum, is done by issuing multiple queries and then merging the results in the memory of the service instance.
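A minimal sketch of that in-memory merge is shown below, assuming each shard has already answered a query of the form SELECT COUNT(*), SUM(amount). The class names and the Partial holder are hypothetical and do not come from any particular middleware.

```java
import java.util.List;

// Hypothetical sketch of merging per-shard aggregates in the service instance.
class ShardAggregator {
    // Partial result returned by one shard for "SELECT COUNT(*), SUM(amount) ...".
    static final class Partial {
        final long count;
        final long sum;

        Partial(long count, long sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    static long mergeCount(List<Partial> partials) {
        return partials.stream().mapToLong(p -> p.count).sum();
    }

    static long mergeSum(List<Partial> partials) {
        return partials.stream().mapToLong(p -> p.sum).sum();
    }

    // AVG cannot be merged by averaging the per-shard averages; it has to be
    // rebuilt from the merged SUM and COUNT.
    static double mergeAvg(List<Partial> partials) {
        long count = mergeCount(partials);
        return count == 0 ? 0.0 : (double) mergeSum(partials) / count;
    }
}
```

Count and sum merge by simple addition, which is why they are cheap. Aggregates that need the full row set, such as a global sort or distinct, are the ones that put real memory pressure on the service instance.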

The routing table lives in the memory of the service instance and is updated either by polling or by passive notification.
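One way to picture such a routing table is an immutable snapshot that is swapped atomically whenever a poller or a notification listener delivers a new version. The sketch below assumes shard indexes 0..n-1 and a modulo rule; the class name, the JDBC URLs in the usage note, and the refresh method are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of an in-memory routing table inside the driver layer.
class RoutingTable {
    // Immutable snapshot: shard index -> JDBC URL of the real database.
    private final AtomicReference<Map<Integer, String>> routes;

    RoutingTable(Map<Integer, String> initial) {
        this.routes = new AtomicReference<>(Map.copyOf(initial));
    }

    // Called by a polling thread or a notification listener with a new snapshot.
    void refresh(Map<Integer, String> latest) {
        routes.set(Map.copyOf(latest));
    }

    // Pick the real database for a sharding key value.
    String resolve(long shardingKey) {
        Map<Integer, String> snapshot = routes.get();
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("routing table is empty");
        }
        int shard = (int) Math.floorMod(shardingKey, (long) snapshot.size());
        String url = snapshot.get(shard);
        if (url == null) {
            throw new IllegalStateException("no route for shard " + shard);
        }
        return url;
    }
}
```

For example, a table built from Map.of(0, "jdbc:mysql://db0/app", 1, "jdbc:mysql://db1/app") resolves key 5 to the second URL. Swapping whole snapshots means a request never sees a half-updated table, at the cost of every service instance holding its own copy.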

Centralized Management

The configuration of all clusters is managed in one place, so the operations burden is small and a DBA can complete the related operations.

A typical implementation

Proxy Layer Features

Supports heterogeneous languages, limited database support

Proxy-layer middleware is the opposite. It supports only one kind of back-end relational database, but it supports a variety of development languages. If your system is heterogeneous and its parts share the same SLA requirements, this approach is recommended.

Heavy operations burden

The proxy layer only needs to maintain a limited number of connections to the databases (MySQL Router, with its sticky connections, is an exception). But as a stand-alone service it has to be deployed separately and its own availability has to be considered, which adds many extra nodes, not to mention companies that also run shadow nodes.
Furthermore, the proxy layer is the single entry point for requests, so its stability requirements are high. If a memory-hungry aggregate query brings a node down, the result is a disastrous incident.

A typical implementation

Common ground

Space is limited, so this is not discussed at length. Visit any middleware's promotional page and you will see a long list of features, which is the whitelist, and also a long list of restrictions, which is the blacklist. Together they define how you are allowed to play inside the enhanced distributed capabilities: a sharded database is itself a crippled database.

Limitations

Keep data balanced: split the data across databases as evenly as possible. For example, splitting the user database by province is uneven, while taking userid modulo the number of shards is relatively even.
Avoid deep paging: deep paging without the sharding key fetches every page of data up to the requested one from all databases and sorts it in memory, which can easily cause a memory overflow.
Reduce subqueries: subqueries can confuse SQL parsing and cause parse errors, so keep subqueries in SQL to a minimum.
Keep transactions minimal: confine each transaction to a single database wherever possible, that is, reduce cross-database operations and place the databases and tables that are operated on together in the same shard.
Special functions: distinct, having, union, in, or, and the like are generally not supported. Where they are supported, using them adds risk and the SQL needs to be reworked.
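To make the deep paging limitation concrete, the sketch below merges one page across shards the way a middleware without the sharding key has to. It is a simplified model under stated assumptions: each shard is represented as an already sorted list of ids, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of why deep paging is expensive without the sharding key.
class DeepPagingMerge {
    // For "ORDER BY id LIMIT offset, limit", every shard has to return its first
    // (offset + limit) rows, because any of them could belong to the global page.
    static List<Long> page(List<List<Long>> sortedShards, int offset, int limit) {
        List<Long> merged = new ArrayList<>();
        for (List<Long> shard : sortedShards) {
            int take = Math.min(shard.size(), offset + limit);
            merged.addAll(shard.subList(0, take));
        }
        Collections.sort(merged); // the global sort happens in the service's memory
        int from = Math.min(offset, merged.size());
        int to = Math.min(offset + limit, merged.size());
        return new ArrayList<>(merged.subList(from, to));
    }

    // Upper bound on rows held in memory for one page request across all shards.
    static long rowsFetched(int shards, int offset, int limit) {
        return (long) shards * ((long) offset + limit);
    }
}
```

With 10 shards, an offset of 100000, and a page size of 10, rowsFetched gives 1000100 rows pulled into memory to return 10. That growth with the offset is what leads to the memory overflow mentioned above.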

Products

It is recommended to focus on MyCat and ShardingJDBC. There are numerous other middleware products, but it is not advisable to adopt an unfamiliar one lightly.
Database middleware is not easy to maintain, and you will find plenty of half-dead projects.

The following list is in no particular order. A few of these products offer only HA functionality and no splitting:

Atlas, Kingshard, DBProxy, MySQL Router, MaxScale, 58 Oceanus, ArkProxy, Ctrip DAL, Tsharding, YouTube Vitess, NetEase DDB, Heisenberg, ProxySQL, Mango, DDAL, Datahekr, MTAtlas, MTDDL, Zebra, Cobar

Almost every vendor has its own database middleware (it also turns out that several companies like to take an open source component, add a prefix, and call it a product), but that does not give us much to go on.

Process

No matter which layer you cut in at to shard databases and tables, you face the following work process.

Collecting information

Tally the affected business lines and projects

The wider the range of affected projects, the harder the sharding. Sometimes a single complex SQL statement involves four or five business parties; such SQL needs special attention.

Determine the scale of the sharding: whether only a few tables are split, or all of them. The more tables are split, the greater the workload, growing almost linearly.

In some projects, touching one part moves the whole. When a flow spans several systems, for example, the affected links go well beyond the sharded database itself.

Determine participants

Besides the technical support staff for the sharding component, the people who most need to be involved are the few who know the system and its existing code best. Only they can determine which SQL is obsolete and which SQL has side effects.

Determine the sharding strategy

Determine the sharding dimensions and the sharding key. Once the sharding key (that is, the column used to route data) is established, it cannot be modified, so it should be settled first, early in the architecture design, so that follow-up work can proceed. Multiple dimensions mean that the same data has different sharding keys in order to serve queries on different conditions. This involves data redundancy (double writes or data synchronization) and is more complicated.

Preparation

Regularize the data

Where database and table structures do not meet the requirements, they need to be regularized in advance, for example when the sharding key has different field names or different types in different tables. When the sharding strategy is implemented, too many such one-off rules make maintenance difficult.

Scan all SQL

Scan every SQL statement in the project and determine, one by one, whether it can run normally under the sharding key.
This process is certain to turn up a large number of non-compliant SQL statements, and each needs a rework plan. This is one of the major parts of the workload.

Verification Tool Support

Making changes and validating them directly in the original project is feasible, but it runs into many problems, mainly low efficiency. I prefer to first design some validation tools that take a single SQL statement or a list of them and print the routing information for inspection.
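A validation tool of that kind could start out as small as the sketch below. It is deliberately naive: it uses a regular expression instead of a SQL parser, only recognizes equality predicates on a numeric sharding key, and assumes a modulo routing rule. The class name and the report format are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a pre-flight checker for SQL statements. It uses a naive
// regular expression, not a SQL parser, so it only handles simple equality predicates.
class SqlRouteChecker {
    private final Pattern keyPattern;
    private final int shardCount;

    SqlRouteChecker(String shardingKey, int shardCount) {
        // Matches "<key> = <digits>" case-insensitively, e.g. "user_id = 42".
        this.keyPattern = Pattern.compile(
                "\\b" + Pattern.quote(shardingKey) + "\\s*=\\s*(\\d+)",
                Pattern.CASE_INSENSITIVE);
        this.shardCount = shardCount;
    }

    // Returns a one-line routing report for the statement.
    String check(String sql) {
        Matcher m = keyPattern.matcher(sql);
        if (m.find()) {
            long key = Long.parseLong(m.group(1));
            return "shard " + (key % shardCount) + " <- " + sql;
        }
        return "ALL SHARDS (no sharding key) <- " + sql;
    }
}
```

Running every scanned statement through check and reviewing the lines flagged ALL SHARDS gives a first list of SQL that needs rework. In practice the routing decision should come from the chosen middleware itself, so that the tool reports what will really happen rather than what a regular expression guesses.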

Technical preparations

For each item mentioned below, it is advisable to find an example to gain experience with, and then estimate the difficulty according to your own team's situation.

The items are as follows:
Compile all the SQL types the middleware does not support
Compile the caveats that can easily cause a crash
Provide a way to handle unsupported SQL
Consider a common primary key generator
Consider how to handle SQL that does not carry the sharding key
Consider how scheduled tasks and similar jobs that scan the whole database should traverse it
Consider how to rework cross-database and cross-table queries
Prepare a set of tools
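For the common primary key generator in the list above, one widely used shape is a Snowflake-style id that packs a timestamp, a worker id, and a sequence number into a long. The sketch below is one possible version, not a recommendation of specific parameters: the bit layout (10-bit worker, 12-bit sequence), the custom epoch, and the handling of clock rollback are all assumptions.

```java
// Hypothetical sketch of a shared primary key generator in the Snowflake style.
// The bit layout and the epoch are assumptions for illustration.
class IdGenerator {
    private static final long EPOCH_MS = 1_600_000_000_000L; // arbitrary custom epoch
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMs = -1L;
    private long sequence = 0L;

    IdGenerator(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            now = lastMs; // clock moved backwards: keep using the last timestamp
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                now = lastMs + 1; // sequence exhausted: borrow the next millisecond
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Each service instance would be given its own workerId so that ids never collide across shards, and ids from one generator are strictly increasing. How worker ids are assigned is left open here; it usually needs some central registry.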

Implementation phase

Data migration

Sharding redistributes the data. Whether for the full data set or for increments, data migration is involved, so a Databus is necessary.

The ideal state is that every insert, update, and delete is published as a message, so that double writes can be achieved by subscribing to the MQ.

In general, however, this state has to be simulated, for example by using the Canal component.

How to switch over while keeping the data safe is discussed in a separate article.

Adequate testing

Sharding must be tested thoroughly, and every SQL statement must go through rigorous verification. If you have unit tests or automated testing tools, complete coverage is necessary. Once data is routed incorrectly, especially for inserts, updates, and deletes, it causes a great deal of trouble.

During the test phase, write the verification output to a separate log file, and once testing is complete, review that log file for data that flowed to the wrong place.
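Reviewing such a log by hand does not scale, so the check can be scripted. The sketch below assumes a purely hypothetical log line format, key=<n> shard=<n>, and a modulo routing rule; both would have to be replaced with whatever your middleware and verification code actually write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of reviewing a routing log after a test run. The log line
// format "key=<n> shard=<n>" and the modulo rule are assumptions for illustration.
class RoutingLogReview {
    private static final Pattern LINE = Pattern.compile("key=(\\d+)\\s+shard=(\\d+)");

    // Returns the log lines whose recorded shard disagrees with the expected rule.
    static List<String> misrouted(List<String> logLines, int shardCount) {
        List<String> bad = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // not a routing line
            }
            long key = Long.parseLong(m.group(1));
            long actual = Long.parseLong(m.group(2));
            if (key % shardCount != actual) {
                bad.add(line);
            }
        }
        return bad;
    }
}
```

An empty result from misrouted is a necessary condition for going live, not a sufficient one: it only covers the statements the tests actually exercised.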

SQL re-review

A unified re-review of all SQL is strongly recommended. It mainly checks the correctness of each SQL statement against the functional description, which is what is usually called a review.

Drills

Rehearse the plan several times in a non-production environment until you are sure of it.

Establish a new SQL specification

After sharding, the project's SQL is shackled and can no longer be written freely. Many commonly supported operations may not run in a sharded environment. So before going live, every SQL statement involved should go through a confirmation process, even if it has passed sufficient testing.

Digression

Do not take on this work without backing; it will not succeed.

Sharding is a strategic technical program that, in many cases, cannot be rolled back, or whose rollback plan is complex. If the databases and tables to be split involve multiple business parties and the company's technical organization is complex, the CTO needs to coordinate in person and a professional architect needs to supervise closely. A coordinator without authority ends up in an awkward position, the process slips out of control, and the project struggles to be delivered.

Anyone who has really been through it knows how painful it is!


Origin blog.csdn.net/lycyingO/article/details/95164935