Interview Series 36: Why split databases and tables?

(1) Why split databases and tables? (When designing a high-concurrency system, how do you design the database layer?)

 

To put it plainly, splitting databases and splitting tables are two different things, and we should not confuse them. You may split databases without splitting tables, or split tables without splitting databases; either is possible. Let me walk you through a scenario.

 

Suppose we are a small start-up (or a newly formed department at a BAT company). We have 200,000 registered users and 10,000 daily active users, a single table grows by 1,000 rows a day, and peak load is at most 10 concurrent requests per second. Goodness, a system like this can be built any way you like by a couple of engineers with a few years of experience plus a few people fresh out of training.

 

Then, luckier than we ever expected, we run into a CEO who leads us onto the high road. The business grows rapidly, and a few months later the number of registered users has reached 20 million! One million daily active users! A single table grows by 100,000 rows a day! Peak load reaches 1,000 requests per second! Along the way the company also closes two rounds of financing and takes in several hundred million yuan! The company's valuation reaches a staggering several hundred million dollars! This is the pace of a small unicorn!

 

Well, now we are starting to feel some pressure. Why? With more than 100,000 new rows every day, that is more than 3 million rows a month; a single table already holds several million rows and will soon pass ten million. Still, we can just about hold on. Peak load is now 1,000 requests per second; we have several machines deployed in production behind a load balancer, and the database can still handle 1,000 QPS. But we are starting to get a little worried: what do we do next?

 

Over the next few months, my goodness, the CEO turns out to be brilliant: the company reaches 100 million users and raises billions of yuan in further financing! The valuation hits a staggering several billion dollars, and the company becomes the star domestic start-up of the year! Goodness, we are too lucky.

 

But we are also unlucky, because daily active users now number in the millions, a single table gains as many as 500,000 new rows every day, and the total in one table has already reached tens of millions of rows! It cannot hold up! Database disk capacity keeps being consumed! Peak concurrency reaches a staggering 5,000 to 8,000 requests per second! No kidding, my friend: I guarantee your system cannot support this now, and it will go down!

 

Okay, by now you should more or less understand what splitting databases and tables is all about. It simply follows your company's business growth: the better the business does, the more users you have, the more data you store, and the more requests you receive, until a single database certainly cannot hold up.

 

For example, with tens of millions of rows in a single table, are you sure it can cope? Absolutely not. When a single table holds too much data, it seriously hurts SQL execution performance, and eventually your SQL may run very slowly. In my experience, once a single table reaches several million rows, performance becomes relatively poor and it is time to split the table.

 

What does table splitting mean? It means spreading the data of one table across multiple tables, so that a query only has to look in one of them. For example, split by user id and keep all of a given user's data in one table; then any operation for that user only touches that one table. This keeps the amount of data in each table within a controllable range, for example about 2 million rows per table.
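As a rough illustration of splitting a table by user id, here is a minimal Python sketch. The table count, the `orders` base name, and the function name are all hypothetical and chosen only for this example; real middleware would be configured rather than hand-coded like this.

```python
TABLE_COUNT = 8  # hypothetical number of physical tables


def table_for_user(user_id: int, base_name: str = "orders") -> str:
    """Return the physical table that holds every row for this user."""
    # All rows of one user share the same remainder, so they land in one table.
    return f"{base_name}_{user_id % TABLE_COUNT}"


print(table_for_user(10007))  # orders_7
print(table_for_user(16))     # orders_0
```

Because the table is derived from the user id alone, any operation for a single user touches exactly one table.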

 

What does database splitting mean? In our experience, a single database can generally support up to about 2,000 concurrent requests, at which point it must be scaled out; a healthy single database is best kept at around 1,000 concurrent requests per second, not much more. So you split the data of one database across multiple databases, and each access goes to just one of them.
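The rules of thumb above can be turned into a back-of-the-envelope estimate. The sketch below assumes the figures mentioned in this article (about 1,000 requests per second per healthy database, about 2 million rows per table); the function names and the 30 million row total are hypothetical inputs for illustration only.

```python
import math

HEALTHY_QPS_PER_DB = 1000          # rule of thumb from the text
TARGET_ROWS_PER_TABLE = 2_000_000  # example per-table size from the text


def databases_needed(peak_qps: int) -> int:
    """Smallest number of databases that keeps each one near the healthy load."""
    return max(1, math.ceil(peak_qps / HEALTHY_QPS_PER_DB))


def tables_needed(total_rows: int) -> int:
    """Smallest number of tables that keeps each one near the target row count."""
    return max(1, math.ceil(total_rows / TARGET_ROWS_PER_TABLE))


print(databases_needed(8000))     # 8
print(tables_needed(30_000_000))  # 15
```

These numbers are only a starting point; the right thresholds depend on hardware and on how complex the SQL is.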

 

That is what splitting databases and tables means, and why you do it. You get the point.

 

(2) Which database and table splitting middleware have you used? What are the advantages and disadvantages of the different middleware options?

 

This question really checks which splitting middleware you know, what the advantages and disadvantages of each one are, and which one you have actually used.

 

The more common ones include: cobar, TDDL, atlas, sharding-jdbc, and mycat.

 

cobar: developed and open-sourced by Alibaba's B2B team; a proxy-layer solution. It was usable in earlier years, but it has not been updated recently and almost nobody uses it any more, so it can be regarded as abandoned. It also does not support read/write splitting, stored procedures, cross-database joins, or pagination.

 

TDDL: developed by the Taobao team; a client-layer solution. It does not support joins or multi-table queries; basic CRUD syntax is fine, and it does support read/write splitting. It is not widely used at present, because it also depends on Taobao's diamond configuration management system.

 

atlas: open-sourced by 360; a proxy-layer solution. Some companies used it in the past, but its big problem is that the community last maintained it about five years ago, so very few companies use it now.

 

sharding-jdbc: open-sourced by Dangdang; a client-layer solution. It has been used fairly widely, because its SQL syntax support is broader and less restrictive. The current 2.0 release supports database and table splitting, read/write splitting, distributed id generation, and flexible transactions (best-effort delivery transactions and TCC transactions). Quite a few companies have used it (the company registrations on the official website show that many companies have been using it from 2017 until now), and the community is still developing and maintaining it and is fairly active. Personally, I think it is now a viable choice.

 

mycat: built on top of cobar; a proxy-layer solution with a fairly complete feature set. It is currently a very popular database middleware and is still gaining popularity; the community is very active and some companies have started using it. Compared with sharding-jdbc, though, it is younger and less battle-tested.

 

In summary, the options worth considering now are sharding-jdbc and mycat; both can be considered.

 

sharding-jdbc is a client-layer solution. Its advantages are that nothing extra has to be deployed, operations and maintenance costs are low, and requests are not forwarded a second time through a proxy layer, so performance is high. The drawback is that when it needs an upgrade, every system has to upgrade and redeploy, and every system is coupled to the sharding-jdbc dependency.

 

mycat is a proxy-layer solution. Its drawback is that you have to deploy and operate a set of middleware yourself, so operations and maintenance costs are high. The benefit is that it is transparent to every project: when an upgrade is needed, you handle it in the middleware alone.

 

Generally speaking, either option can be chosen, but I personally recommend that small and medium-sized companies use sharding-jdbc. A client-layer solution is lightweight and cheap to maintain and needs no extra staff, and systems at small and medium-sized companies are less complex, with fewer projects.

 

Large companies, however, are better off choosing a proxy-layer solution such as mycat. A large company may have many systems and projects and a large, well-staffed team, so it is best to have someone dedicated to studying and maintaining mycat; then a large number of projects can use it transparently.

 

In our case, the database middleware was developed in-house; we have used a proxy-layer approach and, later, a client-layer approach as well.

 

(3) How exactly did you split the database: vertically or horizontally?

 

Horizontal splitting means taking the data of one table and spreading it across multiple tables in multiple databases. The table structure is the same in every database, but each database holds different rows, and the rows of all the databases together make up the full data set. The point of horizontal splitting is to spread the data over more databases, so that multiple databases can handle higher concurrency and their combined storage capacity gives room to grow.

 

Vertical splitting means splitting a table with many fields into multiple tables, or even across multiple databases. The table structure differs from one to the next, and each table holds only some of the fields. In general, the few frequently accessed fields go into one table and the many infrequently accessed fields go into another. Because the database caches rows, the fewer fields each frequently accessed row carries, the more rows fit in the cache and the better the performance. This is generally done more at the table level.
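A small sketch of a vertical split, using Python's built-in `sqlite3` module with an in-memory database. The `user_base` and `user_profile` tables and their columns are hypothetical; the point is only that the few hot columns live in a narrow table while the bulky, rarely read columns live in a second table keyed by the same id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hot table: a few small, frequently read columns.
    CREATE TABLE user_base (
        user_id  INTEGER PRIMARY KEY,
        nickname TEXT NOT NULL,
        status   INTEGER NOT NULL
    );
    -- Cold table: bulky, rarely read columns, keyed by the same user_id.
    CREATE TABLE user_profile (
        user_id   INTEGER PRIMARY KEY,
        biography TEXT,
        address   TEXT
    );
""")
conn.execute("INSERT INTO user_base VALUES (1, 'alice', 1)")
conn.execute(
    "INSERT INTO user_profile VALUES (1, 'long biography text', 'some address')"
)

# The frequent query touches only the narrow hot table.
hot_row = conn.execute(
    "SELECT nickname, status FROM user_base WHERE user_id = ?", (1,)
).fetchone()

# The rare query joins the two halves back together on user_id.
full_row = conn.execute(
    "SELECT b.nickname, p.address FROM user_base b "
    "JOIN user_profile p ON p.user_id = b.user_id WHERE b.user_id = ?",
    (1,),
).fetchone()

print(hot_row)   # ('alice', 1)
print(full_row)  # ('alice', 'some address')
```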

 

This is actually quite common, and I hardly need to explain it; many of you have probably done it yourselves: break up one large table, for example into an orders table, an order payment table, and an order items table.

 

Horizontal splitting at the table level is table splitting: one table becomes N tables, so that the amount of data in each table stays within a certain range and SQL performance is preserved. Otherwise, the more data a single table holds, the worse SQL performs. Generally that means around 2 million rows per table, not much more, but it depends on your specific operations: it could be 5 million rows, or 1 million. The more complex your SQL, the fewer rows a single table should hold.

 

Whether you split databases or tables, all of the database middleware mentioned above can support it. Basically, once you have split your databases and tables, the middleware takes the value of a field you specify, for example userid, and automatically routes to the corresponding database and then to the corresponding table.
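To make the two-step routing concrete, here is a minimal Python sketch of one possible scheme. It is not how any particular middleware implements routing; the database count, table count, and the `order_db_` / `orders_` names are hypothetical.

```python
from typing import Tuple

DB_COUNT = 4       # hypothetical number of databases
TABLES_PER_DB = 8  # hypothetical number of tables inside each database


def route(user_id: int) -> Tuple[str, str]:
    """Map a userid to a (database, table) pair."""
    # One slot per physical table across all databases.
    slot = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB    # first pick the database
    table_index = slot % TABLES_PER_DB  # then the table inside it
    return f"order_db_{db_index}", f"orders_{table_index}"


print(route(10007))  # ('order_db_2', 'orders_7')
```

With middleware in place, the application would issue ordinary SQL and leave this mapping to the routing layer.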

 

You have to think about how to split databases and tables in your own project. Vertical splitting can generally be done at the table level, splitting some fields out of specific tables. Horizontal splitting is for when concurrency or data volume is more than you can carry, and you need to think carefully about which field to split on. As for table splitting, consider that even after you spread data across databases and the concurrency and capacity of each are fine, the tables in each database may still be too big; then you split the tables so that the amount of data in each one stays small.

 

There are also two ways to split databases and tables. One is by range: each database holds a contiguous segment of data, usually split by time range. This is used less often, because it easily creates hot spots, with a large share of traffic hitting the most recent data. The other is to hash a chosen field and spread the data evenly, which is more commonly used.

 

Splitting by range: the advantage is that later expansion is easy. You just prepare a new database for each month, and when the new month arrives, writes naturally go to the new database. The disadvantage is that most requests access the latest data. Whether to use range splitting in production depends on the scenario: it fits when your users do not access only the latest data but access current and historical data evenly.
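A minimal sketch of range splitting by month, assuming one database per calendar month. The `order_db_` naming and the function name are hypothetical. It shows both properties described above: a new month simply maps to a new database, and all writes within the current month land on the same database.

```python
from datetime import date


def database_for(created_on: date) -> str:
    """Pick the database for a row from the month it was created in."""
    return f"order_db_{created_on:%Y%m}"


print(database_for(date(2019, 7, 28)))  # order_db_201907
print(database_for(date(2019, 8, 1)))   # order_db_201908

# Every write made during July goes to the same database (the hot spot).
july_targets = {database_for(date(2019, 7, day)) for day in range(1, 32)}
print(july_targets)  # {'order_db_201907'}
```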

 

Splitting by hash: the benefit is that data volume and request load are spread evenly across the databases; the downside is that expansion is troublesome, because it involves a data migration.
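The trade-off of hash splitting can be seen in a small Python experiment. It assumes plain modulo hashing on an integer user id and a hypothetical set of 100,000 evenly spread ids; other hashing schemes would move a different share of the data.

```python
from collections import Counter


def shard_for(user_id: int, shard_count: int) -> int:
    """Plain modulo hashing: the remainder picks the database."""
    return user_id % shard_count


user_ids = range(100_000)  # hypothetical, evenly spread user ids

# Benefit: with 4 databases, every database gets the same share of users.
per_shard = Counter(shard_for(uid, 4) for uid in user_ids)
print(sorted(per_shard.items()))  # [(0, 25000), (1, 25000), (2, 25000), (3, 25000)]

# Drawback: going from 4 to 8 databases changes the target for half the users,
# and the rows of those users have to be migrated.
moved = sum(1 for uid in user_ids if shard_for(uid, 4) != shard_for(uid, 8))
print(moved / len(user_ids))  # 0.5
```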

 


Origin www.cnblogs.com/xiufengchen/p/11259293.html