[Reprint] Thoughts on database middleware

Author: kimmking
Link: https://www.zhihu.com/question/352256403/answer/878523206
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Hello to the question asker, and congratulations: you have thought deeply about a number of important issues. As an open source enthusiast and a data middleware engineer, I will try to answer the asker's questions:

  1. When data middleware is needed, and what it can do
  2. How data middleware works, and which open source data middleware exists
  3. Why it is all open-sourced in China, and why most of it has stopped updating
  4. Which data middleware is recommended, and what its advantages are

First, when data middleware is needed, and what it can do

As the asker said, as the business grows, the tables in a MySQL or Oracle database keep growing, and after a year or two, tables with tens of millions or even hundreds of millions of records appear. (It is generally believed that when the table structure is fairly complex, MySQL at millions to tens of millions of rows, and Oracle at tens of millions of rows, will run into performance problems with complex queries or changes.) At that point complex queries become slow, inserts and updates become slow, and DDL runs so slowly that changing a column type, adding an index, or adding a column becomes impractical. What can be done? There are several common approaches:

  • History tables: split historical data out by time into history tables to reduce the data volume. This used to be fairly common. It is in fact a special kind of horizontal split, and it is intrusive to the business code.
  • Vertical split: split by column. A wide table with dozens or hundreds of columns is split into several tables with fewer columns (that is, less data per record). This does not reduce the number of records, but it reduces the total data volume and index size of each table.
  • Horizontal split: use the hash of one or more columns to spread the data evenly across multiple databases or tables with the same structure, which directly reduces the amount of data in each database and table. For example, splitting into 1024 sub-tables reduces the data per table by three orders of magnitude: a table of one hundred million rows becomes tables of about 100,000 rows each, and complex operations on a single table become fast. The drawback is that where you used to operate on one table, you now need to know which table to operate on before each operation. Take a user table split by uid. The original SQL1: select * from users where uid = 1025. Now you have to know that uid is 1025 and that 1025 % 1024 == 1, so the SQL becomes SQL2: select * from users_0001 where uid = 1025. This also looks intrusive to the business code. Making it transparent to the business requires middleware that automatically turns SQL1 into SQL2, so that whether or not we split databases, whether or not we split tables, and however many shards we use, the code stays much the same and needs little modification.
  • Read/write splitting: for example, MySQL's TPS/QPS is already high, in the thousands or tens of thousands, and one primary with three replicas has been set up. We want these four instances to share the load, especially in read-heavy, write-light workloads. If reads are spread evenly across all of them, the read pressure on the primary drops; note that writes still have to go to the primary. This also requires middleware to route data requests to the different databases.
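
As a rough illustration of the horizontal split and read/write splitting items above, the following Python sketch shows the two routing decisions a middleware has to make. It is a minimal sketch under stated assumptions, not how any particular middleware is implemented: the names shard_table, rewrite_sql, and ReadWriteRouter are made up for this example, the regex replacement stands in for real SQL parsing, and the node names are placeholders.

```python
import itertools
import re

SHARD_COUNT = 1024  # number of sub-tables, as in the example above


def shard_table(logical_table, uid):
    """Physical table name for a uid, e.g. users_0001 for uid 1025."""
    return f"{logical_table}_{uid % SHARD_COUNT:04d}"


def rewrite_sql(sql, logical_table, uid):
    """Replace the logical table name (whole word only) with the physical one."""
    pattern = rf"\b{re.escape(logical_table)}\b"
    return re.sub(pattern, shard_table(logical_table, uid), sql)


class ReadWriteRouter:
    """Hypothetical router: reads rotate over all nodes, writes go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Reads are shared by the primary and every replica, in round-robin order.
        self._read_nodes = itertools.cycle([primary] + list(replicas))

    def pick(self, sql):
        # Crude classification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._read_nodes) if is_read else self.primary


# SQL1 from the text becomes SQL2 without the caller naming the sub-table.
sql2 = rewrite_sql("select * from users where uid = 1025", "users", 1025)
print(sql2)

router = ReadWriteRouter("primary", ["replica1", "replica2", "replica3"])
print(router.pick("update users set name = 'x' where uid = 1025"))
print(router.pick(sql2))
```

On the sizing claim: 100,000,000 rows divided over 1024 sub-tables is roughly 97,656 rows per table, which is where the figure of about 100,000 comes from. A real middleware would also have to deal with replication lag when reading from replicas, which this sketch ignores.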

If the business has reached the point where it needs to reduce the pressure on a single database or table, or to split reads and writes, but the R&D team is not large and lacks the technical depth to build its own middle layer for this, as in the asker's case, then introducing data middleware should be considered. Why does most open source data middleware come from China? Small companies either do not have enough data or do not have enough engineering capacity, and have no need to develop their own middleware; once volume grows, if the usage scenario is simple, adopting open source is the most economical option. Large companies have the ability to build data middleware, and the open source projects we know today came out of them. In recent years in particular, as another answer to this question points out, many of them have moved on to distributed databases, whose maximum capacity is far larger than that of a traditional relational database such as MySQL or Oracle. This amounts to baking part of the middleware functionality into the database, so these companies care less about middleware. On the other hand, some data middleware has been absorbed into cloud platforms and has become a closed-source part of RDS.

Second, how data middleware works, and which open source data middleware exists

Simply put, there are two principles:

  • JDBC client mode: the middleware is a library, such as a jar package, that is referenced directly in the project, as shown schematically in the original figure. You configure the database and table sharding rules and wrap the JDBC data source with the middleware layer. On each call, the JDBC wrapper classes automatically rewrite the SQL and then call the actual JDBC driver with the rewritten SQL to complete the operation. However, because it ends up operating directly on individual databases and tables, there are some restrictions on SQL: statements must carry conditions that determine the database and table shard, and aggregations cannot be too complex.
  • Proxy mode:
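
To make the JDBC client mode above more concrete, here is a minimal Python sketch of the same idea: a wrapper object that sits in front of the real connections, rewrites the logical table name, and forwards the statement to the right physical database. Everything here is an assumption for illustration: ShardingConnection, DB_COUNT, and TABLE_COUNT are hypothetical, in-memory SQLite databases stand in for the real MySQL instances, and the sharding key is passed explicitly because a real middleware would extract it by parsing the SQL.

```python
import re
import sqlite3

DB_COUNT = 2     # hypothetical number of physical databases
TABLE_COUNT = 4  # hypothetical total number of users_NNNN sub-tables


class ShardingConnection:
    """Client-side wrapper that hides sub-databases and sub-tables from the caller."""

    def __init__(self):
        # One in-memory SQLite database per physical database.
        self.dbs = [sqlite3.connect(":memory:") for _ in range(DB_COUNT)]
        for t in range(TABLE_COUNT):
            self.dbs[t % DB_COUNT].execute(
                f"create table users_{t:04d} (uid integer primary key, name text)"
            )

    def execute(self, sql, params=(), shard_key=None):
        # The restriction from the text: without a sharding condition
        # the wrapper cannot decide which database and table to use.
        if shard_key is None:
            raise ValueError("statement must carry the sharding key")
        t = shard_key % TABLE_COUNT
        # Rewrite the logical table name, then call the real connection.
        physical_sql = re.sub(r"\busers\b", f"users_{t:04d}", sql)
        return self.dbs[t % DB_COUNT].execute(physical_sql, params)


conn = ShardingConnection()
conn.execute(
    "insert into users (uid, name) values (?, ?)", (1025, "alice"), shard_key=1025
)
row = conn.execute(
    "select uid, name from users where uid = ?", (1025,), shard_key=1025
).fetchone()
print(row)
```

The caller only ever writes SQL against the logical users table. A statement without a sharding key, such as a count over all users, is rejected here, whereas a fuller implementation would have to fan it out to every shard and merge the results, which is why complex aggregation is limited in this mode.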

The early mainstream open source middleware is shown in the following figure:

Quoted from:

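  1. Cobar: a distributed relational database system developed by Alibaba B2B, managing nearly 3,000 MySQL instances. It withstood the test of production at Alibaba. Later, because its authors left and for other reasons, Cobar was no longer maintained, and Alibaba developed TDDL to replace it.
  2. MyCAT: community enthusiasts did secondary development on top of Alibaba's Cobar, fixing some of the problems Cobar had at the time and adding many new features. The MyCAT community is currently very active, and some companies are already using MyCAT. Overall, support is fairly good, and it is expected to remain maintained.
  3. OneProxy: developed by Mr. Lou, a well-known figure in the database world and former head of the Alipay database team. It is written in C and based on the ideas of the official MySQL Proxy. OneProxy is paid commercial middleware. Mr. Lou dropped some features to focus on performance and stability. Friends who have tested it report that it is very stable under high concurrency.
  4. Vitess: this middleware is used in production at YouTube, but its architecture is complex. Unlike earlier middleware, adopting Vitess requires fairly large application changes, because you have to use the language APIs it provides. Some of its design ideas are worth borrowing.
  5. Kingshard: developed in Go by Chen Fei, formerly of the 360 Atlas middleware team, in his spare time. About three people currently contribute to it. At present it is not yet a mature, production-ready product and still needs continued improvement.
  6. Atlas: the 360 team rewrote the Lua parts of MySQL Proxy in C. The original version supported table sharding, and a version with database and table sharding has since been released. Some people online report that it often crashes under high concurrency, so test thoroughly beforehand if you plan to use it.
  7. MaxScale and MySQL Router: both can be considered official. MaxScale is developed by MariaDB (a version maintained by the original author of MySQL), and the current version does not support database or table sharding. MySQL Router is middleware released by Oracle, the current official owner of MySQL.
  8. ShardingSphere: a rising star that grew out of the Sharding-JDBC framework from Dangdang's architecture department.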

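All of the above is middleware for database and table sharding and for read/write splitting. There is also middleware focused on distributed transactions, data replication and transfer, and so on, such as fescar, canal, and otter.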

In fact, Taobao open-sourced TDDL (Taobao Distributed Data Layer) early on, but only the JDBC client mode was open-sourced, not the proxy mode.

Third, why it is all open-sourced in China, and why most of it has stopped updating

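Open source in China is partly driven by large companies as a way to project technical influence, and partly the work of individuals contributing to the community out of personal interest. Either way, there is no direct, significant return. In other words, this area has never had a stable, viable business model behind it, so large companies have not really valued it, because it does not make money. Things without a return cannot last, so updates naturally stopped. For the few companies with cloud services, once this technology matures it can be folded into the cloud as a data service, or developed further into a distributed database. At that point it can be monetized, so it goes closed source. As a result, few open source data middleware projects remain active today. Below I recommend one that is.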

Fourth, which data middleware to recommend: ShardingSphere

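I recommend the ShardingSphere project, the first data middleware to join the Apache Foundation, which it did recently, and which was also developed by Chinese engineers. You can see directly from the project's GitHub commit history that it is very active: there are commits every day, and issues are continuously maintained. Why is it still doing so well? Because Zhang Liang's team works on developing, maintaining, and promoting it full-time.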

For detailed documentation and code, see:

ShardingSphere (shardingsphere.apache.org)
apache/incubator-shardingsphere (github.com)

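ShardingSphere is an ecosystem of open source distributed database middleware solutions. It consists of three independent products: Sharding-JDBC, Sharding-Proxy, and Sharding-Sidecar (planned). They all provide standardized data sharding, distributed transactions, and database governance, and can be applied to a wide variety of scenarios such as homogeneous Java applications, heterogeneous languages, and cloud native deployments.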

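ShardingSphere is positioned as relational database middleware. It aims to make full and reasonable use of the compute and storage capabilities of relational databases in distributed scenarios, rather than to implement a brand new relational database. It coexists with NoSQL and NewSQL rather than excluding them. NoSQL and NewSQL, as the frontier of new technology exploration, look to the future and embrace change, and are well worth recommending. Conversely, one can look at the problem another way: look to the future, focus on what does not change, and thereby grasp the essence of things. Relational databases still hold a huge market today and are the cornerstone of every company's core business, and they will be hard to displace in the future. At this stage we focus more on incremental improvement on the existing foundation than on disruption.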

Later I will invite the two main PMC members of the ShardingSphere project, @张亮 and @曹昊, to take a look at this question.


Origin www.cnblogs.com/jinanxiaolaohu/p/11801885.html