Four rapid-fire interview questions on splitting databases and tables (sharding): never be stumped again!

Interview questions on database and table sharding tend to come as a rapid-fire chain. Today we go through them one by one, and we hope this helps with your interviews!

Why split databases and tables?

Which sharding middleware have you used?

What are the advantages and disadvantages of the different sharding middleware?

How exactly do you split a database vertically or horizontally?

First, the interview questions

Why split databases and tables (when designing a high-concurrency system, how do you design the database layer)? Which sharding middleware have you used? What are the advantages and disadvantages of the different sharding middleware? How exactly do you split a database vertically or horizontally?

Second, what the interviewer is looking for

This question is squarely about high concurrency, because splitting databases and tables exists precisely to cope with two problems: high concurrency and large data volumes. To be honest, Internet companies in particular love asking about it. Database and table sharding is such a common technique that interviewers are almost certain to bring it up, and if you don't know it, you really have no excuse!

Third, analyzing the questions

3.1 Why split databases and tables? (When designing a high-concurrency system, how do you design the database layer?)

To put it plainly, splitting databases and splitting tables are two different things, so don't confuse them. You might split only the database and not the tables, or split only the tables and not the database; both are possible.

Let me set up a scenario.

Suppose we are a small start-up (or a brand-new department inside one of the BAT companies) with 200,000 registered users and 10,000 daily active users; a single table gains 1,000 rows a day, and peak load is at most 10 requests per second. Honestly, a system like that can be handled by one engineer with a few years of experience leading a couple of fresh bootcamp graduates, no sweat.

Then, unexpectedly, we get lucky: our CEO leads us onto the fast track and the business grows rapidly. A few months later we have 20 million registered users! One million daily active users! A single table gains 100,000 rows a day! Peak load reaches 1,000 requests per second! The company closes two more funding rounds and raises several hundred million yuan! Its valuation reaches a staggering hundreds of millions of dollars! This is the pace of a budding unicorn!

Well, now we start to feel some pressure. Why? With more than 100,000 new rows a day, that is over 3 million a month; our single table already holds several million rows and will soon pass ten million. We can barely keep going. Peak load is now 1,000 requests per second; we deploy the application on a few machines behind a load balancer, and the database can still handle 1,000 QPS. But we are starting to worry: what do we do next?

Over the next few months, my goodness, our CEO is incredible: the company reaches 100 million users and keeps raising billions of yuan! Its valuation hits a staggering billions of dollars, making it the star domestic start-up of the year! We are so lucky.

But we are also unlucky: with daily active users now in the tens of millions, a single table gains as many as 500,000 new rows a day, and its total has already reached tens of millions of rows! We can't hold up! Database disk space keeps getting eaten up! Peak concurrency hits a staggering 5,000 to 8,000 requests per second! Come on, brother. I promise you, your system could never have lasted this long; it would have gone down already!

Okay, by now you should roughly understand what splitting databases and tables is about. It follows your company's business growth: the better the business does, the more users, the more data, and the more requests you have, and sooner or later a single database simply cannot keep up.

Splitting tables

For example, if a single table holds tens of millions of rows, do you really think it can cope? Absolutely not. When a single table gets too large, it badly hurts SQL performance, and eventually your queries may run very slowly. In my experience, once a single table reaches several million rows, performance becomes noticeably worse, and it is time to split the table.

What does splitting a table mean? It means spreading one table's data across multiple tables, so that a query only needs to hit one of them. For example, you can split by user id so that all of a given user's data lives in one table; every operation for that user then touches only that table. This keeps the amount of data in each table within a controllable range, for example under 2,000,000 rows per table.
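The user-id rule above can be sketched as follows. This is a minimal illustration assuming a hypothetical `order` table split into four physical tables, `order_0` to `order_3`, with plain modulo routing; the table names and the modulo rule are assumptions, not the behavior of any particular middleware.

```python
# Hypothetical setup: the logical "order" table is split into N physical tables.
NUM_TABLES = 4


def table_for_user(user_id: int, num_tables: int = NUM_TABLES) -> str:
    """Return the physical table that holds all rows for this user.

    Modulo routing keeps every row of one user in the same table, so a
    per-user query touches exactly one table.
    """
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"order_{user_id % num_tables}"


def select_orders_sql(user_id: int) -> str:
    # Parameterized query against the single table that owns this user.
    return f"SELECT * FROM {table_for_user(user_id)} WHERE user_id = ?"


if __name__ == "__main__":
    for uid in (1001, 1002, 1005):
        print(uid, "->", table_for_user(uid))
```

Because the rule depends only on `user_id`, users 1001 and 1005 both map to `order_1`, and every query for one user stays on a single table.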

Splitting databases

What does splitting a database mean? In our experience, a single database can support at most about 2,000 concurrent requests per second before it must be scaled out, and a healthy single database is best kept at around 1,000 requests per second, no more. So you split the data of one database across multiple databases, and each request only goes to one of them.
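Applying that rule of thumb to the scenario's peak figures gives a quick estimate. The sketch assumes traffic spreads evenly across databases and treats 1,000 requests per second as the healthy per-database load, which is the author's experience rather than a fixed limit.

```python
import math

# Rule-of-thumb figure from the text: keep each database at about
# 1,000 requests per second for healthy operation.
HEALTHY_QPS_PER_DB = 1000


def databases_needed(peak_qps: int, healthy_qps_per_db: int = HEALTHY_QPS_PER_DB) -> int:
    """Smallest number of databases that keeps each one at or below the
    healthy load, assuming requests are spread evenly across them."""
    if peak_qps <= 0:
        return 1
    return math.ceil(peak_qps / healthy_qps_per_db)


if __name__ == "__main__":
    # Peak figures from the scenario above.
    for peak in (10, 1000, 5000, 8000):
        print(peak, "QPS ->", databases_needed(peak), "database(s)")
```

At the scenario's final peak of 8,000 requests per second, this estimate calls for 8 databases.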

So that is what splitting databases and tables means, and now you know why we do it.


3.2 Which sharding middleware have you used? What are the advantages and disadvantages of the different sharding middleware?

This question checks which sharding middleware you know, what each one's strengths and weaknesses are, and which one you have actually used.

The more common ones include:

cobar

TDDL

atlas

sharding-jdbc

mycat

cobar

Developed and open-sourced by Alibaba's B2B team, cobar is a proxy-layer solution that sits between the application servers and the database servers. Applications access the cobar cluster through a JDBC driver; cobar breaks each SQL statement down according to the SQL and the sharding rules, then dispatches the pieces to the appropriate MySQL instances in the cluster for execution. It was usable in earlier years, but it has not been updated for a long time, hardly anyone uses it, and it can be considered essentially abandoned. It also does not support read/write splitting, stored procedures, cross-database joins, paging, or similar operations.

TDDL

Developed by the Taobao team, TDDL is a client-layer solution. It supports basic CRUD syntax and read/write splitting, but not joins or multi-table queries. It is not widely used today, partly because it also depends on Taobao's diamond configuration management system.

atlas

Open-sourced by 360, atlas is a proxy-layer solution. Some companies used it in the past, but its big problem is that the community's most recent maintenance was five years ago, so very few companies use it now.

sharding-jdbc

Open-sourced by Dangdang, sharding-jdbc is a client-layer solution. It has been fairly widely used, because it supports a broad range of SQL syntax without too many restrictions. The 2.0 release supports database and table sharding, read/write splitting, distributed ID generation, and flexible transactions (best-effort delivery transactions and TCC transactions). Quite a few companies have used it (the company registrations on its official website show many users from 2017 up to now), and the community is still actively developing and maintaining it. Personally, I think it is a solid option today.

mycat

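Reworked from cobar, mycat is a proxy-layer solution with very complete features. It is currently a very popular and increasingly widespread database middleware with an active community, and some companies have started using it. Compared with sharding-jdbc, however, it is younger and less battle-tested.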

Summary

In short, the options really worth considering today are sharding-jdbc and mycat; both are viable.

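The advantage of a client-layer solution like sharding-jdbc is that there is nothing to deploy, operating costs are low, and requests are not forwarded a second time through a proxy, so performance is high. The downside is that any upgrade requires every system to bump its version and redeploy, and every system is coupled to the sharding-jdbc dependency.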

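The downside of a proxy-layer solution like mycat is that you must deploy and operate a separate middleware cluster yourself, which is costly to run. The upside is that it is transparent to every project: upgrades and similar changes are all handled in the middleware itself.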
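To make "client layer" concrete, here is a toy in-process router. It assumes SQLite in-memory databases stand in for separate MySQL instances and that `user_id` is the shard key; the class and method names are hypothetical and this is not sharding-jdbc's API. It only shows that routing happens inside the application, with no proxy hop.

```python
import sqlite3


class ClientSideRouter:
    """Toy client-layer sharding: the application holds one connection per
    shard and chooses the connection itself, so no proxy sits in between."""

    def __init__(self, num_shards: int) -> None:
        # In-memory SQLite databases stand in for separate database servers.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.shards:
            conn.execute(
                "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER)"
            )

    def _shard_for(self, user_id: int) -> sqlite3.Connection:
        # The routing rule lives in application code (here: user_id modulo shard count).
        return self.shards[user_id % len(self.shards)]

    def insert_order(self, order_id: int, user_id: int, amount: int) -> None:
        conn = self._shard_for(user_id)
        conn.execute(
            "INSERT INTO orders (id, user_id, amount) VALUES (?, ?, ?)",
            (order_id, user_id, amount),
        )
        conn.commit()

    def orders_for_user(self, user_id: int) -> list:
        # Only the shard that owns this user is queried.
        cur = self._shard_for(user_id).execute(
            "SELECT id, amount FROM orders WHERE user_id = ? ORDER BY id", (user_id,)
        )
        return cur.fetchall()


if __name__ == "__main__":
    router = ClientSideRouter(num_shards=2)
    router.insert_order(1, user_id=10, amount=50)
    router.insert_order(2, user_id=11, amount=70)
    router.insert_order(3, user_id=10, amount=20)
    print(router.orders_for_user(10))
```

The trade-off from the text shows up directly: the routing rule is compiled into every application that uses it, so changing it means upgrading and redeploying each one, whereas a proxy keeps that rule in one place.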

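Generally speaking, either option works. Personally, I recommend sharding-jdbc for small and mid-sized companies: the client-layer approach is lightweight and cheap to maintain, needs no extra staff, and such companies tend to have simpler systems and fewer projects. Mid-sized and large companies are better off with a proxy-layer solution like mycat: they may have a great many systems and projects, a large team, and plenty of people, so it is best to have dedicated staff research and maintain mycat while the many projects simply use it transparently.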

3.3 How exactly do you split your databases vertically or horizontally?

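A horizontal split puts one table's data into multiple tables across multiple databases. Every database has the same table structure; they just hold different data, and the data in all of them together makes up the full dataset. The point of a horizontal split is to spread data evenly across more databases, so that multiple databases together can handle higher concurrency, and to expand storage using the combined capacity of multiple databases.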
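A minimal sketch of the horizontal-split property, assuming hypothetical order rows keyed by `user_id` and modulo routing: every shard keeps the same columns, each row lands in exactly one shard, and merging the shards gives back the full dataset.

```python
# Hypothetical column list shared by every shard: a horizontal split keeps
# the schema identical and only divides the rows.
COLUMNS = ("order_id", "user_id", "amount")


def split_horizontally(rows, num_shards):
    """Distribute rows across shards by user_id; every shard keeps all columns."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row["user_id"] % num_shards].append(row)
    return shards


if __name__ == "__main__":
    rows = [{"order_id": i, "user_id": i * 7, "amount": i * 10} for i in range(1000)]
    shards = split_horizontally(rows, 4)
    # Each of the 4 shards ends up with 250 rows for this sample data.
    print([len(s) for s in shards])
```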


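A vertical split breaks a table with many columns into multiple tables, possibly placed in different databases. Each resulting table has a different structure and contains only some of the columns. Typically, you put the few frequently accessed columns in one table and the many rarely accessed columns in another. Because the database caches data, the fewer columns the frequently accessed rows have, the more of those rows fit in the cache, and the better the performance. This is usually done at the table level.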
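A sketch of a vertical split applied to one row, assuming a hypothetical wide user table and an assumed choice of hot columns (in practice you would pick them from your own access statistics). Both parts keep the primary key so they can be joined back.

```python
# Hypothetical wide user table: which columns count as "hot" is an assumption.
# The primary key must be in the hot set so both halves stay joinable.
HOT_COLUMNS = ("user_id", "username", "status")


def split_vertically(row: dict, hot_columns=HOT_COLUMNS):
    """Split one wide row into a narrow hot row and a cold row.

    Both parts keep the primary key (user_id) so they can be joined back.
    """
    hot = {c: row[c] for c in hot_columns if c in row}
    cold = {c: v for c, v in row.items() if c not in hot_columns}
    cold["user_id"] = row["user_id"]
    return hot, cold


if __name__ == "__main__":
    wide = {
        "user_id": 42,
        "username": "alice",
        "status": "active",
        "bio": "a long profile text",
        "address": "somewhere",
        "preferences": "dark_mode=1",
    }
    hot, cold = split_vertically(wide)
    print(hot)
    print(cold)
```

The hot row is small, so more of those rows fit in the database cache, which is exactly the effect described above.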


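This is actually quite common; many of you have probably done it yourselves without my telling you: splitting one large table into an order table, an order payment table, and an order item table.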

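Then there is splitting at the table level, i.e. table splitting: turning one table into N tables so that each table's data volume stays within a certain range and SQL performance is preserved. Otherwise, the larger a single table grows, the worse SQL performance becomes. Around 2 million rows per table is typical, no more, but it depends on your workload: it might be 5 million, or 1 million. The more complex your SQL, the fewer rows per table you should aim for.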

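Whether you split databases or tables, all of the middleware mentioned above supports it. After sharding, the middleware can take the value of a field you specify, such as userid, automatically route the request to the corresponding database, and then route it to the corresponding table within that database.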
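Roughly what that two-level routing does, as a sketch assuming a hypothetical layout of 4 databases with 8 order tables each and integer user ids. Real middleware expresses such rules in its own configuration; the formula here is only one common choice.

```python
from typing import NamedTuple

# Hypothetical layout: 4 databases, each holding 8 copies of the order table.
NUM_DBS = 4
TABLES_PER_DB = 8


class Route(NamedTuple):
    database: str
    table: str


def route(user_id: int, num_dbs: int = NUM_DBS, tables_per_db: int = TABLES_PER_DB) -> Route:
    """Route a shard-key value first to a database, then to a table inside it.

    The remainder picks the database and the quotient picks the table, so
    rows spread over all num_dbs * tables_per_db tables. (Using
    user_id % tables_per_db for the table would tie the table index to the
    database index and leave some tables empty.)
    """
    db_index = user_id % num_dbs
    table_index = (user_id // num_dbs) % tables_per_db
    return Route(f"order_db_{db_index}", f"order_{table_index}")


if __name__ == "__main__":
    print(route(0))
    print(route(13))
    # Ids 0..31 cover every one of the 32 (database, table) pairs.
    print(len({route(uid) for uid in range(NUM_DBS * TABLES_PER_DB)}))
```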

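So you need to think through how to split databases and tables in your own project. Generally: a vertical split can be done at the table level, breaking up tables that have a great many columns. A horizontal split is for when the load exceeds what one database can handle, or the data volume exceeds its storage capacity; you split it, and you decide which field to split by. As for splitting tables: if, even after spreading data across databases, concurrency and capacity are fine but each database's tables are still too large, then split those tables so that no single table holds too much data.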

There are also two ways to split databases and tables:

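One is by range: each database holds a contiguous segment of data, usually by time range. This is used less often, because it easily creates hotspots: most of the traffic hits the newest data.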

The other is to hash some field to spread the data evenly; this is more common.

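The advantage of splitting by range is that scaling out is simple: you just prepare a database for each month in advance, and when a new month starts, writes naturally go to the new database. The disadvantage is that most requests access the newest data. Whether to use range in production depends on the scenario.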
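A sketch of range routing by month, assuming a hypothetical `order_db_YYYYMM` naming scheme. It shows both sides of the trade-off: preparing future databases is trivial, but every write in the current month goes to the same database.

```python
from datetime import datetime


def database_for(ts: datetime) -> str:
    """Range routing (hypothetical naming): every row created in the same
    calendar month lands in the same database."""
    return f"order_db_{ts.year:04d}{ts.month:02d}"


def databases_to_prepare(year: int) -> list:
    """Scaling out under range sharding: create next year's monthly databases in advance."""
    return [f"order_db_{year:04d}{m:02d}" for m in range(1, 13)]


if __name__ == "__main__":
    print(database_for(datetime(2024, 1, 31, 23, 59)))
    print(database_for(datetime(2024, 2, 1, 0, 0)))
    print(databases_to_prepare(2025)[:3])
    # Hotspot: every write during March 2024 goes to a single database.
    print({database_for(datetime(2024, 3, day)) for day in range(1, 32)})
```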

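The advantage of hash distribution is that it spreads data volume and request load evenly across databases. The downside is that scaling out is troublesome: it requires a data migration, in which existing data must have its hash recomputed and be redistributed to different databases or tables.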
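A quick way to see why hash sharding makes scale-out painful, assuming integer keys routed by plain modulo (a simplification of "hash"): count how many keys land on a different shard after resizing. Every such key is a row to migrate.

```python
def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when modulo routing is resized.

    Each moved key corresponds to a row that must be migrated during scale-out.
    """
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_shards != k % new_shards)
    return moved / len(keys)


if __name__ == "__main__":
    ids = range(100_000)
    print(moved_fraction(ids, 4, 8))  # doubling the shard count -> 0.5
    print(moved_fraction(ids, 4, 5))  # adding a single shard -> 0.8
```

Doubling from 4 to 8 shards moves exactly half of the keys; going from 4 to 5 moves 80%, because a key stays put only when its remainders modulo 4 and modulo 5 agree.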

Author: Yang Libin

Source: https://github.com/doocs/advanced-java

Origin www.cnblogs.com/javafirst0/p/11388768.html