High concurrency: splitting databases and tables

  The following text is derived from a GitHub article and is not original.

Why split databases and tables? (When designing a high-concurrency system, how do you design the database layer?)

  To put it plainly, splitting databases and splitting tables are two different things, and we should not confuse them. You might split the databases without splitting the tables, or split the tables without splitting the databases; either is possible.

  Let me throw out a scenario.

  Suppose we are a small start-up (or a newly launched department at a BAT company). We have 200,000 registered users and 10,000 daily active users, a single table gains 1,000 rows per day, and the peak is at most 10 concurrent requests per second. Honestly, for a system like this you could find someone with a few years of experience, give them a few people fresh out of training, and they would manage without much effort.

  Then, to our surprise, luck was on our side: our CEO led us down the right road and the business grew rapidly. A few months later, registered users reached 20 million, with 1 million daily active users. A single table gained 100,000 rows per day, and the peak reached 1,000 requests per second. The company also closed two rounds of financing, raising several hundred million yuan, and its valuation reached hundreds of millions of dollars. That is the pace of a small unicorn!

  Now we are starting to feel some pressure. Why? With 100,000 new rows every day, a single table gains 3 million rows a month. The table already holds millions of rows and will soon pass ten million. Still, we can barely hold on. The peak is now 1,000 requests per second, so we deploy a few machines online with load balancing in front, and the database can more or less handle 1,000 QPS. But we are starting to worry: what do we do next?

  Over the next few months, the CEO proved remarkably capable: the company reached 100 million users and raised billions of yuan more. Its valuation reached billions of dollars, and it became the star among this year's domestic start-ups. We were very lucky indeed.

  But we are also unlucky, because daily active users now number in the millions, a single table gains as many as 500,000 rows per day, and one table now holds tens of millions of rows. It cannot carry the load! Database disk capacity keeps being consumed, and peak concurrency has reached 5,000 to 8,000 requests per second. No kidding: I can assure you that your system cannot hold up under this, and it will go down.

  By now you should have a rough idea of what splitting databases and tables is about. It follows the growth of your business: the better the business does, the more users you have, the larger the data volume and the request volume become, and a single database will certainly not be able to carry it.

Splitting tables

  For example, a single table holds tens of millions of rows. Are you sure it can cope? Certainly not: when a single table holds too much data, SQL execution performance suffers badly, and eventually your SQL may run very slowly. In my experience, once a single table reaches several million rows, performance degrades noticeably and it is time to split the table.

  What does splitting a table mean? It means spreading the data of one table across multiple tables, so that a query only has to look in one of them. For example, split by user id so that all of one user's data lives in a single table; when you operate on that user, you operate only on that table. This keeps the amount of data in each table within a controllable range, for example by capping each table at 2 million rows.
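
  The idea of routing a user to one table can be sketched in a few lines. This is only an illustration under assumptions: the table count of 4, the base name user_order, and the modulo rule are all hypothetical, not something the text or any particular middleware prescribes.

```python
def table_for_user(user_id: int, table_count: int = 4, base_name: str = "user_order") -> str:
    """Return the name of the physical table that holds this user's rows.

    table_count and base_name are illustrative assumptions.
    """
    if table_count <= 0:
        raise ValueError("table_count must be positive")
    # The same user id always maps to the same table, so a query for one
    # user only ever touches one physical table.
    return f"{base_name}_{user_id % table_count}"


for uid in (7, 8, 1001):
    print(uid, "->", table_for_user(uid))
```

  Because the mapping is deterministic, reads and writes for a user agree on the table without any lookup.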

Splitting databases

  What does splitting a database mean? In our experience, a single database can support at most about 2,000 concurrent requests, at which point you must scale out; a healthy single database is best kept at around 1,000 concurrent requests per second. So you split the data of one database across multiple databases, and each access goes to just one of them.

  That is what splitting databases and tables means. Why do it? You get the idea.
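
  As a rough back-of-the-envelope check, the two rules of thumb above (about 1,000 healthy concurrent requests per database, about 2 million rows per table) can be turned into a lower bound on how many databases and tables you need. The function name and the 30 million row figure below are assumptions for illustration; real sizing also depends on hardware and on how complex the SQL is.

```python
import math
from typing import Tuple

HEALTHY_QPS_PER_DB = 1000    # "healthy" concurrency per database, from the text
ROWS_PER_TABLE = 2_000_000   # per-table row cap used as an example in the text


def estimate_split(peak_qps: int, total_rows: int) -> Tuple[int, int]:
    """Return (database_count, table_count) as a rough lower bound."""
    databases = max(1, math.ceil(peak_qps / HEALTHY_QPS_PER_DB))
    tables = max(1, math.ceil(total_rows / ROWS_PER_TABLE))
    return databases, tables


# Peak of 8,000 requests per second and a hypothetical 30 million rows.
print(estimate_split(8000, 30_000_000))  # (8, 15)
```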

| # | Before splitting databases and tables | After splitting databases and tables |
|---|---|---|
| Concurrency support | MySQL deployed on a single machine; cannot carry high concurrency | MySQL goes from one machine to several; the concurrency it can carry increases many times over |
| Disk usage | The disk of the single MySQL machine is almost full | Split into multiple databases; disk usage on each database server drops sharply |
| SQL execution performance | Too much data in a single table; SQL runs slower and slower | Less data per table; SQL execution efficiency improves noticeably |

Which middleware have you used for splitting databases and tables? What are the advantages and disadvantages of the different options?

  This question really asks which of these middleware you know, what the advantages and disadvantages of each are, and which ones you have actually used.

The more common ones include:

  • cobar
  • TDDL
  • atlas
  • sharding-jdbc
  • mycat

cobar

  Developed and open-sourced by Alibaba's B2B team, cobar is a proxy-layer solution that sits between the application servers and the database servers. The application accesses the cobar cluster through the JDBC driver; cobar breaks the SQL down according to the SQL itself and the splitting rules, then distributes it to different database instances in the MySQL cluster for execution. It saw some use in its early years, but it has not been updated recently and almost nobody uses it any more, so it can be regarded as more or less abandoned. It also does not support read/write splitting, stored procedures, cross-database joins, or pagination.

TDDL

  Developed by the Taobao team, TDDL is a client-layer solution. It supports basic CRUD syntax and read/write splitting, but not joins or multi-table queries. It is not widely used at present, because it also depends on Taobao's diamond configuration management system.

atlas

  Open-sourced by 360, atlas is a proxy-layer solution. Some companies used it in the past, but its big problem is that the community last maintained it about five years ago, so very few companies use it now.

sharding-jdbc

  Open-sourced by Dangdang, sharding-jdbc is a client-layer solution. It has been used fairly widely, because its SQL syntax support is relatively broad, with few restrictions. Version 2.0 supports splitting databases and tables, read/write splitting, distributed id generation, and flexible transactions (best-effort delivery transactions and TCC transactions). Quite a few companies have adopted it (companies can register their use on the official website, and the list shows many adopters from 2017 onward). The community continues to develop and maintain it and is still fairly active, so I personally consider it a viable choice today.

mycat

  Derived from cobar, mycat is a proxy-layer solution with fairly complete feature support. It is currently a very popular database middleware and still gaining ground; the community is active and some companies have started to use it. Compared with sharding-jdbc, however, it is younger and less battle-tested.

Summary

  To sum up, the two options worth considering now are sharding-jdbc and mycat.

  The advantage of a client-layer solution such as sharding-jdbc is that nothing extra has to be deployed, so operation and maintenance costs are low, and requests are not forwarded a second time through a proxy layer, so performance is high. The drawback is that every system is coupled to the sharding-jdbc dependency: when it needs an upgrade, each system has to pick up the new version and be released again.

  The drawback of a proxy-layer solution such as mycat is that you have to deploy and operate the middleware yourself, so operation and maintenance costs are high. The benefit is that it is transparent to every project: when an upgrade is needed, you handle it in the middleware alone.

  Generally speaking, either solution is a valid choice. I personally recommend that small and medium-sized companies use sharding-jdbc: a client-layer solution is lightweight and cheap to maintain, needs no extra staff, and such companies tend to have less complex systems and fewer projects. Large companies are better off with a proxy-layer solution such as mycat: they may have a great many systems and projects and a large, well-staffed team, so it is best to assign dedicated people to study and maintain mycat, after which a large number of projects can use it transparently.

How exactly do you split the database: vertically or horizontally?

  Horizontal splitting means taking the data of one table in one database and spreading it across multiple tables in multiple databases. Every database has the same table structure, but each holds different data, and the data in all the databases added together is the complete data set. The point of horizontal splitting is to spread the data over more databases, so that multiple databases carry higher concurrency and their combined storage capacity allows expansion.

[Figure: horizontal database splitting]

  Vertical splitting means taking a table with many columns and splitting it into multiple tables, possibly in multiple databases. The tables in each database have a different structure, and each table contains only some of the columns. In general, the few frequently accessed columns go into one table and the many rarely accessed columns go into another. Because the database has a cache, the fewer columns each frequently accessed row has, the more rows fit in the cache and the better the performance. This is generally done more at the table level.

[Figure: vertical database splitting]

  This is actually quite common, and many of you have probably done it yourselves: splitting one large table into, say, an orders table, an order payments table, and an order items table.
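
  To make the vertical split concrete, here is a minimal sketch that divides one wide order row into a frequently accessed part and a rarely accessed part. The column names and the choice of which columns count as frequently accessed are hypothetical; in practice you would decide this from your own access patterns.

```python
from typing import Dict, Tuple

# Hypothetical set of frequently read columns; everything else is treated as rarely read.
HOT_COLUMNS = {"order_id", "user_id", "status"}


def split_vertically(row: Dict[str, object]) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Split one wide row into (hot_row, cold_row); both keep the primary key."""
    hot = {k: v for k, v in row.items() if k in HOT_COLUMNS}
    cold = {k: v for k, v in row.items() if k not in HOT_COLUMNS}
    # The primary key is repeated in the cold table so the halves can be joined again.
    cold["order_id"] = row["order_id"]
    return hot, cold


wide_row = {
    "order_id": 1,
    "user_id": 42,
    "status": "PAID",
    "shipping_note": "leave at the door",
    "invoice_title": "ACME Ltd",
}
hot_row, cold_row = split_vertically(wide_row)
print(hot_row)
print(cold_row)
```

  The hot rows are narrower, so more of them fit in the database cache, which is the benefit described above.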

  There is also horizontal splitting at the table level, that is, splitting tables: one table becomes N tables, so that the amount of data in each table stays within a certain range and SQL performance is preserved. Otherwise, the more data a single table holds, the worse SQL performs. Around 2 million rows per table is a common ceiling, but it depends on how you use the table: it could be 5 million, or 1 million. The more complex your SQL, the fewer rows you should keep in a single table.

  Whether you split databases or tables, all of the database middleware mentioned above can support it. After the split, the middleware takes the value of a field you specify, for example userid, and automatically routes the request to the corresponding database and then to the corresponding table.
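
  The two-step routing described here (first the database, then the table) can be sketched as follows. This is not how cobar, mycat, or sharding-jdbc implement it; the database and table names, the counts, and the modulo rule are assumptions chosen only to show the shape of the idea.

```python
from typing import NamedTuple


class Route(NamedTuple):
    database: str
    table: str


def route(userid: int, db_count: int = 2, tables_per_db: int = 4) -> Route:
    """Map a userid to one database and one table inside that database."""
    db_index = userid % db_count
    # Use the quotient for the table index so that it is not tied to the
    # database index and all tables in every database get used.
    table_index = (userid // db_count) % tables_per_db
    return Route(f"order_db_{db_index}", f"t_order_{table_index}")


print(route(13))  # Route(database='order_db_1', table='t_order_2')
```

  The application only supplies userid; where the row physically lives is decided entirely by the routing rule.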

  You need to think about how to split databases and tables in your own project. In general, vertical splitting can be done at the table level, by splitting off some of a table's columns. Horizontal splitting is for when concurrency is more than you can carry, or the data volume exceeds your capacity, and you need to think carefully about which field to split on. As for splitting tables: even after you have split into several databases and concurrency and capacity are both fine, if the tables in each database are still too large, split the tables so that each one holds a manageable amount of data.

There are two ways to split databases and tables:

  • By range: each database holds a contiguous run of data, typically split by time range. This is used less often, because it is prone to hotspots: most of the traffic lands on the newest data.
  • By hashing a field so that the data is spread evenly; this is the more common approach.

  Splitting by range has the advantage that expansion is simple: just prepare a new database for each month, and when the new month arrives, writes naturally go to the new database. The drawback is that most requests access the latest data. Whether to use range splitting in production depends on the scenario.
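
  A month-per-database range rule like the one just described could look like this. The order_db prefix and the naming pattern are hypothetical.

```python
from datetime import date


def database_for(created_on: date, prefix: str = "order_db") -> str:
    """Pick the database by calendar month, for example order_db_2024_05."""
    return f"{prefix}_{created_on.year:04d}_{created_on.month:02d}"


print(database_for(date(2024, 5, 31)))  # order_db_2024_05
print(database_for(date(2024, 6, 1)))   # order_db_2024_06
```

  Expansion needs no data migration, since old months never move, but all new writes go to the current month's database, which is the hotspot problem mentioned above.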

  Hash distribution has the benefit that data volume and request pressure are spread evenly across the databases. The downside is that expansion is troublesome: it involves a data migration, in which existing data has its hash recomputed and is reassigned to a different database or table.
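
  To see why expansion is painful, the sketch below counts how many keys land on a different shard when the shard count changes, assuming the simplest possible rule (key modulo shard count). Real middleware may use other hash functions; this only illustrates the migration cost.

```python
from typing import Iterable


def moved_fraction(keys: Iterable[int], old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when going from old_count to new_count shards."""
    key_list = list(keys)
    moved = sum(1 for k in key_list if k % old_count != k % new_count)
    return moved / len(key_list)


print(moved_fraction(range(20_000), 4, 5))  # 0.8
print(moved_fraction(range(20_000), 4, 8))  # 0.5
```

  Going from 4 to 5 shards moves four fifths of the keys, while doubling from 4 to 8 moves only half, which is one reason expansions are often planned as doublings.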

Origin www.cnblogs.com/peterxiao/p/10944180.html