All it takes is one article to keep you on top of the current state of all open source databases

The database sits at the core of the business and is a very important part of the basic software stack. In recent years, many new solutions and ideas have emerged in the community. Below I summarize some of the mainstream open source database solutions of recent years, their design ideas, and the scenarios they suit; apologies in advance for any omissions or mistakes. This article focuses on databases for structured data storage, covering both the OLTP and NoSQL fields, and will not touch on OLAP, object storage, or distributed file systems.
1 The rise of open source RDBMS and the Internet
For a long time, relational databases were the preserve of large companies, and the market was firmly dominated by commercial databases such as Oracle and DB2. However, with the rise of the Internet and the growth of the open source community, the release of MySQL 1.0 in the mid-1990s meant the relational database world finally had an alternative. The first stand-alone RDBMS I will introduce is MySQL.
MySQL
I believe most readers are already very familiar with MySQL. Basically, the growth history of MySQL is the growth history of the Internet. The first MySQL version I came into contact with was MySQL 4.0; later, MySQL 5.5 became the classic release, used by basically every Internet company.
MySQL also popularized the concept of "pluggable" storage engines, and choosing different storage engines for different business scenarios is an important part of MySQL tuning. For example, use InnoDB for workloads with transaction requirements, while MyISAM may be better suited to read-heavy concurrent workloads; these days, though, I recommend InnoDB in most cases, since it has been the official default engine since 5.5. Most readers already know what MySQL is good for (almost any scenario that needs persistent structured data), so I won't go into detail.
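The engine is chosen per table. Here is a minimal sketch, assuming the PyMySQL driver and a running local MySQL instance (the table and column names are made up for illustration):

```python
import pymysql  # assumes the PyMySQL driver and a running local MySQL server

conn = pymysql.connect(host="127.0.0.1", user="root", password="", database="test")
with conn.cursor() as cur:
    # Transactional data goes to InnoDB; the engine is declared per table,
    # which is what "pluggable storage engine" means in practice.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id BIGINT PRIMARY KEY,
            amount DECIMAL(10, 2)
        ) ENGINE=InnoDB
    """)
    # A read-mostly table could instead declare ENGINE=MyISAM.
    cur.execute("SHOW CREATE TABLE orders")
    print(cur.fetchone()[1])
conn.commit()
conn.close()
```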
It is also worth mentioning that MySQL 5.6 introduced multi-threaded replication and GTID, which make failure recovery and master-slave operations much more convenient. In addition, 5.7 (now GA) is a major update of MySQL, mainly for its large gains in read/write performance and replication performance. (Schema-level parallel replication already existed in 5.6, but it was of limited value; MariaDB's multi-threaded parallel replication, by contrast, is excellent, and many people choose MariaDB for that feature alone. MySQL 5.7's MTS supports two modes: one is the same as 5.6, the other is multi-threaded replication based on binlog group commit, meaning binlogs committed together on the master can also be applied in parallel on the slave, achieving parallel replication.)
If you are doing technology selection for a stand-alone database, you basically only need to consider 5.7 or MariaDB; since Oracle took over, 5.6 and 5.7 have improved significantly in both performance and stability.
PostgreSQL
PostgreSQL also has a long history. Its predecessor was UCB's Ingres, and Michael Stonebraker, who led that project, won the Turing Award in 2015. Later the project was renamed Post-Ingres and open sourced under the BSD license. In 1995, several UCB graduate students built a SQL interface for Post-Ingres and officially released Postgres95, which then gradually grew up in the open source community.
Like MySQL, PostgreSQL is also a stand-alone relational database, but in contrast to MySQL's rather loose SQL dialect, PostgreSQL's SQL support is very powerful: built-in types, JSON support, GIS types, complex query support, PL/pgSQL and more are all far stronger than in MySQL. From the point of view of code quality, PostgreSQL's code is also better than MySQL's. In addition, PostgreSQL's SQL optimizer is much stronger than MySQL's: on almost any moderately complex query (admittedly I have not compared against MySQL 5.7, so this information may be outdated), PostgreSQL outperforms MySQL.
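To give a feel for that richer SQL surface, here is a minimal sketch of PostgreSQL's JSONB support, assuming the psycopg2 driver and a running local instance (the table and field names are made up for illustration):

```python
import psycopg2  # assumes the psycopg2 driver and a running local PostgreSQL server

conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, payload jsonb)")
cur.execute(
    "INSERT INTO events (payload) VALUES (%s)",
    ('{"user": "alice", "tags": ["db", "pg"]}',),
)
# Query inside the JSON document with the ->> and @> operators
cur.execute(
    "SELECT payload->>'user' FROM events WHERE payload @> %s::jsonb",
    ('{"tags": ["pg"]}',),
)
print(cur.fetchall())
conn.commit()
cur.close()
conn.close()
```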
Judging from the trend of recent years, PostgreSQL's momentum is also very strong. Its weakness, I think, is that it does not have MySQL's huge community and user base; after so many years of development, MySQL has accumulated a great many operation and maintenance tools and best practices. PostgreSQL, as the relative latecomer, has the better design and richer functionality, and versions after PostgreSQL 9 are stable enough, so it is a good choice when selecting technology for a new project. In addition, many new database projects are built on the PostgreSQL code base, such as Greenplum.
In my opinion, the days of the stand-alone database are numbered. There is an upper limit to the hardware dividend brought by Moore's Law, and the data volumes and traffic of modern businesses, together with the demands modern data science places on databases, are hard to satisfy on a single machine. Network cards, disk I/O and CPUs always hit bottlenecks, and latency-sensitive online businesses also have to bear the risk of a single point of failure (SPOF). In the master-slave replication model, when the master dies, do you fail over or not? How do you reconcile the data after failover? What if there is a network partition between master and slave, or even a partition affecting the monitoring environment? These are all real problems.
So my point is this: no matter how good single-machine performance looks (many of the astonishing benchmark numbers are tuned for specific scenarios, and some are not even run over the network; in most cases the first bottleneck of a database is actually the network card and concurrent connections), with the vigorous growth of the Internet and then the arrival of the mobile Internet, database systems went through their first distributed baptism.
2 The Distributed Era: The Renaissance of NoSQL and the Power of Model Simplification
Before introducing NoSQL, I would like to mention two companies: Google and Amazon.
Google
Google was probably the first company to apply distributed storage technology in large-scale production environments, and it is also the company with the deepest accumulated experience in distributed systems. It is fair to say that most of the industry's engineering practice and thinking around distributed systems derives from Google. For example, the GFS paper in 2003 pioneered the distributed file system, and the Bigtable paper in 2006 pioneered the distributed key-value store and directly spawned the Hadoop ecosystem; as for Spanner and F1, whose papers came out in 2012 and 2013, they are milestone projects pointing to the future direction of relational databases, and we will come back to them later.
Amazon
The other company is Amazon. The Dynamo paper published in 2007 introduced eventual consistency, the N/W/R quorum model, and the use of vector clocks, and combined several then-trendy techniques such as consistent hashing and Merkle trees, officially marking the birth of NoSQL. Its influence on the industry afterwards was also great: later databases such as Cassandra, Riak, and Voldemort were all designed on the basis of Dynamo.
A new trend of thought
Another important trend of this period (from around 2006 to the present) is that the database (persistence) layer and the cache layer began to separate clearly; I think this trend started with memcached. As business concurrency kept rising, so did the demand for low latency; another reason is that as memory became cheaper and cheaper, memory-based storage solutions gradually became popular. Of course, in-memory caching also went through an evolution from stand-alone to distributed, but that process was much faster than the evolution of relational databases.
This also reflects another important hallmark of NoSQL: the change in data model. Most NoSQL systems abandoned the relational model and chose simpler key-value or document storage instead. The data structures and query interfaces are relatively simple, and without the burden of SQL the implementation difficulty drops sharply.
In addition, NoSQL designs almost always choose to sacrifice complex SQL and ACID transaction support in exchange for elastic scalability. That choice was grounded in the realities of the Internet at the time: simple business models, massive concurrency brought by explosive growth, exploding total data volumes, little legacy burden, strong engineers, and so on. Most importantly, the business models were relatively simple.
Embedded storage engines
Before introducing complete open source solutions, I would like to talk about embedded storage engines.
With the development of NoSQL, not only did caching and persistent storage begin to be split apart, storage engines themselves also began to differentiate and step into the foreground. It used to be hard to imagine a storage engine serving clients directly, independent of a database, just as you would not use InnoDB or MyISAM, let alone a bare B-tree, directly (with the famous BerkeleyDB as an exception, of course). People build further on top of these open source storage engines, adding a network protocol layer, a replication mechanism and so on, step by step assembling complete NoSQL products with different styles.
Here I select a few more famous storage engines to introduce.
TC
The first one I came into contact with was Tokyo Cabinet (TC), which I believe many people have heard of. TC is a hybrid key-value storage engine developed and open sourced by Mixi, the largest social networking site in Japan, and it includes both hash table and B+ tree implementations. One defect of this engine, however, is that performance degrades very noticeably as the data volume grows, and it is basically unmaintained now, so tread carefully. Tokyo Tyrant (TT), used together with TC, is a network library that gives TC a network interface and turns it into a database service. TT + TC was an early attempt at NoSQL.
LevelDB
In 2011, Google open sourced LevelDB, an embedded key-value storage engine written in C++ by the authors of Bigtable and modeled on Bigtable's underlying tablet design. Its data structure is the LSM-tree; analyses of the LSM-tree algorithm are easy to find online, so I won't go into detail. Its strength is that it is extremely friendly to writes, since the LSM design avoids large amounts of random writes, and it can also deliver good read performance for particular workloads (hot data held in memory). In addition, like a B-tree, the LSM-tree supports ordered scans. And LevelDB comes from the hands of Jeff Dean; anyone who follows distributed systems knows his deeds, and if you don't, Google him.
LevelDB offers excellent write performance, thread safety, batch writes, snapshots and other features, which make it easy to build an MVCC system or a transaction model on top of it; that matters a great deal for databases.
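A minimal sketch of that interface, assuming the plyvel Python bindings for LevelDB (the path and keys are made up):

```python
import plyvel  # assumes the plyvel bindings for LevelDB

db = plyvel.DB("/tmp/leveldb-demo", create_if_missing=True)
db.put(b"user:1", b"alice")

# A write batch is applied atomically
with db.write_batch() as wb:
    wb.put(b"user:2", b"bob")
    wb.put(b"user:3", b"carol")

# A snapshot gives a consistent point-in-time view, the raw material for MVCC
snap = db.snapshot()
db.put(b"user:1", b"alice-v2")
print(snap.get(b"user:1"))   # b'alice'
print(db.get(b"user:1"))     # b'alice-v2'

# Ordered scan over a key range: the LSM-tree keeps keys sorted
for key, value in db.iterator(start=b"user:1", stop=b"user:9"):
    print(key, value)
db.close()
```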
It is also worth mentioning that Facebook maintains an active fork of LevelDB called RocksDB. RocksDB makes many improvements over LevelDB, such as multi-threaded compaction, per-level custom compression, and multiple memtables. In addition, RocksDB exposes a great deal of configuration, which can be tuned for different workloads; meanwhile Facebook is using RocksDB internally to implement a new MySQL storage engine, MyRocks, which is worth watching. The RocksDB community is responsive and friendly; in fact PingCAP is also a community contributor to RocksDB. My suggestion: if a new project is torn between LevelDB and RocksDB, choose RocksDB without hesitation.
B-tree family
Of course, besides the LSM-tree there are still many good engines in the B-tree family. Most traditional stand-alone database storage engines choose the B+ tree, which is friendlier to disk reads. The best-known pure B+ tree implementation among third-party storage engines is LMDB. LMDB implements its B+ tree inside a memory-mapped file (mmap) and uses copy-on-write to provide MVCC, so reads in concurrent transactions are lock-free, which suits highly concurrent read workloads; and because it uses mmap it can be read across processes. Since I have not used LMDB in a production environment, I cannot point out its shortcomings, sorry.
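A minimal sketch of LMDB's transaction interface, assuming the py-lmdb bindings (the path and keys are made up):

```python
import lmdb  # assumes the py-lmdb bindings

env = lmdb.open("/tmp/lmdb-demo", map_size=1 << 30)  # B+ tree inside an mmap'd file

with env.begin(write=True) as txn:
    txn.put(b"config:mode", b"production")

# Readers see a copy-on-write snapshot (MVCC), so read transactions
# proceed lock-free and concurrently with a writer.
with env.begin(write=False) as txn:
    print(txn.get(b"config:mode"))

env.close()
```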
hybrid engine
There are also storage engines that mix several data structures. The most famous is probably WiredTiger, which was acquired by MongoDB in 2014 and has since become MongoDB's default storage engine. WiredTiger provides both an LSM-tree and a B-tree implementation behind one set of interfaces, so you can choose freely according to the workload. Other storage engines with special data structures shine in particular niches, for example TokuDB with its extremely high compression ratio: it uses a data structure called a fractal tree and achieves a very high compression ratio while keeping read and write pressure acceptable.
NoSQL
Having covered several well-known storage engines, let's talk about the better-known NoSQL systems. By my definition NoSQL means Not Only SQL, so it can include in-memory databases, persistent databases and so on; in short, structured data storage systems that differ from the stand-alone relational database.
Let's start with caching first.
Memcached
As mentioned earlier, memcached was probably the first cache database used at scale in the industry. Its implementation is extremely simple: it treats memory as one big hash table supporting only get/set/counter operations, wrapped in a network layer and a text protocol (there is also a simple binary protocol) built on libevent. Although it supports some CAS operations, on the whole it is very simple.
However, memcached's memory utilization is not that high. To avoid the fragmentation caused by frequent allocations, memcached uses its own slab allocator: memory is allocated slab by slab and ultimately stored in fixed-length chunks, with the chunk as the smallest allocation unit. In addition, libevent's performance is not pushed to the extreme. None of that stopped memcached from becoming the de facto standard for open source caching at the time. (A bit of trivia: Brad Fitzpatrick, the author of memcached, is now at Google; if you use Golang, Go's official HTTP package was written by him. He is a very prolific engineer.)
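The interface really is just get/set/counter. A minimal sketch, assuming the pymemcache client and a memcached instance on the default port (key names made up):

```python
from pymemcache.client.base import Client  # assumes the pymemcache client

mc = Client(("127.0.0.1", 11211))

mc.set("session:42", "alice", expire=300)   # cache with a TTL
print(mc.get("session:42"))                 # b'alice'

mc.set("page_views", "0")
mc.incr("page_views", 1)                    # the counter operation
print(mc.get("page_views"))                 # b'1'
```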
Redis
If I remember correctly, around 2009 an Italian engineer, antirez, open sourced Redis, and the cache market has been completely reshaped since then. Most caching workloads now run on Redis, and memcached has basically exited the stage. Redis's biggest feature is its rich data structure support: not just simple key-value pairs but also lists, sets, sorted sets and more, giving great expressive power. Redis also provides conveniences beyond the scope of a database, such as pub/sub, and almost overnight everyone fell into Redis's arms.
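A minimal sketch of that expressive power, assuming the redis-py client (3.x-style zadd) and a local Redis instance; the key names are made up:

```python
import redis  # assumes the redis-py client (3.x or later)

r = redis.Redis(host="127.0.0.1", port=6379)

r.set("counter", 1)                                # plain key-value
r.lpush("queue:jobs", "job-1", "job-2")            # list used as a queue
r.sadd("tags:post:1", "db", "nosql")               # set
r.zadd("leaderboard", {"alice": 120, "bob": 95})   # sorted set
print(r.zrevrange("leaderboard", 0, 2, withscores=True))

r.publish("chat", "hello")                         # pub/sub, beyond plain storage
```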
Twemproxy
However, as Redis became more popular, usage grew, and memory got cheaper, people began to look for ways to scale beyond a single Redis instance. The earliest attempt was twemproxy, open sourced by Twitter. twemproxy is Redis middleware that offers basically only the simplest data routing, with no dynamic scalability, yet it was still embraced by many companies because there was really no alternative. The follow-up, Redis Cluster, also had a long and difficult gestation: over several years it went through 7 RC versions before it was finally released.
At the end of 2014 we open sourced Codis to solve the elastic scaling problem at the Redis middleware layer; it is now widely used by major Chinese Internet companies, and there are plenty of articles about it online, so I won't expand on it here. On the caching side, then, the open source community has largely converged on Redis and the scaling solutions around it.
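To illustrate the kind of routing such middleware performs, here is a toy consistent-hash ring in plain Python. It is only a sketch of the general idea, not twemproxy's or Codis's actual algorithm (Codis, for instance, routes by pre-sharded slots rather than a hash ring):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring for routing keys to Redis shards."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node), with virtual nodes for balance
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["redis-1:6379", "redis-2:6379", "redis-3:6379"])
print(ring.node_for("user:1001"))  # every proxy routes this key to the same shard
```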
MongoDB
In the NoSQL family, MongoDB is actually the odd one out. Most NoSQL systems abandon SQL in pursuit of more extreme performance and scalability, whereas MongoDB deliberately chose documents, very similar to JSON, as its external interface. The schema-less property is significant for many lightweight and fast-changing Internet businesses, and MongoDB is easy to use and basically works out of the box: developers do not need to labor over the table structure of their data, they just save it. That really did attract a large number of developers.
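A minimal sketch of that "just save it" workflow, assuming the pymongo driver and a local MongoDB instance (the collection and fields are made up):

```python
from pymongo import MongoClient  # assumes the pymongo driver

client = MongoClient("mongodb://127.0.0.1:27017")
posts = client.blog.posts

# Schema-less: no table definition up front, just save the document
posts.insert_one({"title": "hello", "tags": ["db"], "author": {"name": "alice"}})
print(posts.find_one({"tags": "db"}, {"title": 1, "_id": 0}))
```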
Although early versions of MongoDB were unstable and performance was not great (early Mongo had no real storage engine and used mmap'd files directly), and cluster mode was riddled with problems (for example the still-unresolved issue of cluster synchronization consuming too much bandwidth), Mongo was a good choice for the rapid iteration of a project's early days precisely because it is so convenient.
But that is exactly where the problems start: I have heard more than once of teams going back to relational databases once the project got big or "serious". In any case, after MongoDB acquired WiredTiger at the end of 2014, the engine made its debut in version 2.8 and became the default storage engine from version 3.0 onwards, and both performance and stability have improved greatly.
On the other hand, whether schema-less is good or bad for software engineering is open to debate. I personally stand on the side of schemas, but there is no doubt that using Mongo in small projects, or in projects that need to move fast, can indeed improve development efficiency quite a lot.
HBase
When it comes to NoSQL, HBase has to be mentioned. As an important part of the Hadoop ecosystem, HBase is the orthodox open source implementation of Google's Bigtable, so Bigtable itself deserves a word. Bigtable is a distributed database widely used inside Google, and its interface is not simple key-value: the paper calls it a multi-dimensional sorted map, with values partitioned by column. Bigtable is built on GFS and makes up for a distributed file system's weaknesses when handling inserts, updates, and random reads of massive numbers of small, structured records.
HBase is an implementation of such a system, with HDFS underneath. HBase itself does not really store data: the persistent logs and SST files (HBase also uses the LSM-tree structure) are stored directly on HDFS, while the Region Server (RS) maintains a MemTable to serve fast queries. Writes go to the log first, compaction runs in the background, and direct random reads and writes against HDFS are avoided.
Data is logically partitioned into Regions, and load balancing is achieved by adjusting which Region ranges each Region Server is responsible for. When a Region grows too large it is split, and the resulting Regions may later be served by different RSs. But as noted above, HBase itself does not store the data; the Region is only a logical unit, and the data still lives as files on HDFS. HBase therefore does not need to care about replication, horizontal scaling, or data placement; HDFS handles all of that.
Like Bigtable, HBase provides row-level consistency. In terms of the CAP theorem it is strictly a CP system, but unfortunately it does not go further and provide ACID cross-row transactions. The benefits of HBase need little elaboration: by adding Region Servers the system's throughput grows almost linearly, and HDFS itself scales horizontally.
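To make the multi-dimensional sorted map concrete, here is a minimal sketch using the happybase client, which talks to HBase through its Thrift gateway; the host, table, column families, and row keys are all made up, and the table is assumed to already exist:

```python
import happybase  # assumes the happybase client and an HBase Thrift gateway

conn = happybase.Connection("hbase-thrift-host")
table = conn.table("pages")  # assumed to exist with column families "content" and "meta"

# (row key, column family:qualifier) -> value, with rows kept sorted by row key
table.put(b"com.example/index", {b"content:html": b"<html>...</html>",
                                 b"meta:lang": b"en"})

# Ordered scan over a row-key prefix
for row_key, data in table.scan(row_prefix=b"com.example"):
    print(row_key, data)

conn.close()
```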
But there are still disadvantages.
First, the Hadoop software stack is Java, and JVM GC tuning is a rather annoying business; even when well tuned, average latency is still in the tens of milliseconds.
Second, in terms of architecture, because HBase itself does not store data, the data an RS serves may live on a different HDFS DataNode, which adds an extra RPC out of thin air.
Third, like Bigtable, HBase does not support cross-row transactions. Inside Google there were teams that built distributed transaction support on top of Bigtable, such as MegaStore and Percolator. Jeff Dean later said in an interview that he regretted not adding cross-row transactions to Bigtable; fortunately, that regret was made up for in Spanner, which we will come to later.
In general, HBase is a robust and time-tested system, but you need a deep understanding of Java and Hadoop to drive it well, and that usability problem afflicts the whole Hadoop ecosystem. Moreover, the community evolves relatively slowly, partly because of its heavy historical baggage.
Cassandra
As for Cassandra (C*), although it is also an open source implementation of Dynamo, it does not quite feel like one. C* has indeed had a troubled life: it was first developed and open sourced by Facebook in 2008, and the early releases were almost all bugs. Facebook eventually stopped maintaining it, switched to HBase, and threw the mess straight at the community. Fortunately DataStax picked the project up and commercialized it, and after a couple of years it finally took off.
C* cannot simply be summed up as read-fast-write-slow or read-slow-write-fast, because it uses a quorum model: by adjusting the number of replicas and the number of nodes consulted on reads you can obtain different trade-offs. For workloads that do not need particularly strong consistency, you can choose to read from just one node for maximum read performance. In addition, C* does not depend on a distributed file system; data is stored directly on local disks and each storage node maintains its own replication relationships, which removes one layer of RPC and gives it a certain latency advantage over HBase.
But even with the quorum model, C* is not a strongly consistent system. C* does not resolve conflicts for you: even if W (replicas acknowledging a write) + R (replicas consulted on a read) > N (total replicas), C* cannot decide which replica holds the newer version, because each record's version is an NTP timestamp, or one supplied by the client, and every machine's clock may drift, so it may not be accurate. That is why C* is an AP system. On the friendlier side, C* provides CQL, a simple SQL dialect, which gives it an obvious usability advantage over HBase.
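A toy sketch of why W + R > N alone is not enough: the overlap guarantees that the read set contains the latest write, but picking the "latest" value still relies on timestamps that may be wrong:

```python
def quorum_read(responses):
    """Pick a winner from the replicas consulted on a read (last-write-wins).

    The Dynamo/Cassandra quorum rule W + R > N guarantees these responses
    overlap the most recent write set, but the winner is chosen by timestamp,
    and if the timestamps come from loosely synchronized clocks the "newest"
    value can still be the wrong one. That is why such a system stays AP.
    """
    return max(responses, key=lambda v: v["ts"])

responses_from_r_nodes = [
    {"value": "v1", "ts": 1001.0},   # node A: wrote earlier, but its clock runs fast
    {"value": "v2", "ts": 1000.5},   # node B: actually wrote later, but its clock lags
]
print(quorum_read(responses_from_r_nodes))   # picks v1, losing the later write
```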
Even as an AP system C* is already quite fast, but the pursuit of higher performance never stops. The release of ScyllaDB at the beginning of this year is a typical example. ScyllaDB is a NoSQL database compatible with C*; the difference is that ScyllaDB is written entirely in C++ and uses black-magic techniques like DPDK. I won't expand on the details here; if you are interested, the Scylla website is worth a look. By the way, Mogujie in China was an early adopter of ScyllaDB and shared their solution on Scylla's official website, and the performance is quite good.
3 Middleware and sharding (splitting databases and tables)
That is it for NoSQL for now. Next I want to talk about middleware and sharding (splitting databases and tables) solutions built on top of stand-alone relational databases.
This area really does have a long history, and often there was no other choice. A relational database is not like Redis: you cannot simply knock out a piece of middleware along the lines of twemproxy. Database middleware has a great deal to consider, such as parsing SQL, extracting the sharding key, dispatching requests according to the sharding key, and then merging the results. On top of that, databases have transactions, so session and transaction state must be maintained at the middleware layer, and most solutions have no way to support cross-shard transactions.
That inevitably makes things more troublesome for the business: code has to be rewritten and logic becomes more complex, to say nothing of dynamic scaling in and out and automatic failure recovery. As clusters grow larger and larger, the complexity of operations and of DDL rises exponentially.
Inventory of middleware projects
The earliest database middleware project is probably MySQL Proxy, used for read/write splitting. Later, Chinese engineers produced many well-known open source projects in this field, such as Alibaba's Cobar and TDDL (not fully open sourced), then MyCAT, a community improvement on Cobar, Qihoo 360's open source Atlas, and so on; all of these belong to this category of middleware products.
The open source project that takes the middleware approach about as far as it can go is probably YouTube's Vitess. Vitess is basically an integrated middleware product, with built-in hot data caching, dynamic horizontal sharding, read/write splitting and more, but the cost is that the whole project is very complex and the documentation is not great. About a year ago we tried to build a complete Vitess cluster and did not succeed, which says something about its complexity.
Another project worth mentioning is Postgres-XC. Postgres-XC is very ambitious; its overall architecture somewhat resembles early versions of OceanBase, with a central node handling and coordinating distributed transactions and resolving conflicts while the data is spread across the storage nodes. It should be the best distributed scaling solution available in the PostgreSQL community at present. I will leave the others aside.
4 Where is the future? NewSQL
In a word, NewSQL is the future.
Google published the Spanner paper at OSDI 2012 and the F1 paper at SIGMOD 2013. These two papers gave the industry its first glimpse of the possibility of combining the relational model with NoSQL-style scalability at very large cluster scale. Before that, this was widely believed to be impossible, and even Google had been through the failure of systems like Megastore.
Spanner Review
Spanner's innovation was to solve clock synchronization with hardware (GPS clocks plus atomic clocks). In a distributed system, the clock is the most troublesome problem; we just saw that clocks are precisely why C* is not a strongly consistent system. The remarkable thing about Spanner is that even with data centers very far apart and no communication between them (communication is too costly; at best you move at the speed of light), the TrueTime API keeps the clock uncertainty within a very small bound (on the order of 10ms). Beyond that, Spanner follows a lot of Bigtable's design, such as Tablets and Directories, while using Paxos replication at the replica layer instead of relying entirely on the underlying distributed file system. That said, Spanner's lower layer still uses Colossus, although the paper notes this could be improved in the future.
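A toy sketch of the TrueTime idea in plain Python: expose time as an interval and make commits wait out the uncertainty. This only conveys the mechanism, not Spanner's actual implementation, and the uncertainty bound here is an arbitrary assumption:

```python
import time

# Assumed uncertainty bound, for illustration only. TrueTime keeps this bound
# tight (a few milliseconds) with GPS receivers and atomic clocks in every data center.
EPSILON = 0.007

def tt_now():
    """TrueTime-style interval: the real time is guaranteed to lie in [earliest, latest]."""
    t = time.time()
    return t - EPSILON, t + EPSILON

def commit_with_wait(make_visible):
    """Choose a commit timestamp, then 'commit wait' until it is surely in the past."""
    _, latest = tt_now()
    commit_ts = latest          # no clock anywhere has seen a later time than this
    # Commit wait: only after commit_ts has certainly passed on every clock may the
    # transaction's effects become visible, so timestamp order matches real-time order.
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)
    make_visible(commit_ts)
    return commit_ts

print(commit_with_wait(lambda ts: None))
```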
Inside Google, most database storage keeps 3 to 5 replicas, and the more important data keeps 7 replicas spread across data centers on every continent. Thanks to the widespread use of Paxos, latency can be kept within an acceptable range. Google's style has always been to pursue horizontally scalable throughput rather than low latency, which you can also see in the choice of pessimistic locking: cross-data-center replication is a must, so latency cannot be low, and latency-sensitive business can solve the problem at the application layer or rely on caches.
In addition, the auto-failover capability brought by Paxos keeps the whole cluster transparent and unaffected even if an entire data center goes down. F1 is built on top of Spanner and provides richer SQL syntax support. F1 is more like a distributed MPP SQL layer: F1 itself does not store data, but translates client SQL into MapReduce-like tasks and calls Spanner to complete the request.
In fact, apart from TrueTime, the whole system uses no novel algorithms; rather, Spanner and F1 pull together the distributed systems techniques of recent years and mark the first NewSQL to serve a production environment.
There are several key points:
1. Complete SQL support and ACID transactions;
2. Elastic scalability;
3. Automatic failover and high availability, with cross-data-center geo disaster recovery.
These NewSQL characteristics are indeed very attractive, and inside Google a large number of businesses have already migrated from Bigtable to Spanner. I believe the whole industry will follow the same trend over the next few years, just as with Hadoop back in the day: Google's trends in basic software run ahead of the community.
Community Response
After the Spanner paper was published, the community naturally produced followers who set out to implement it (like us); the first such team was CockroachDB in New York. CockroachDB's team lineup is quite deluxe: the early team included members of Google's Colossus distributed file system team. Technically, Cockroach's design is very close to Spanner's. The difference is that instead of TrueTime it chose HLC (hybrid logical clock), that is, NTP plus a logical clock, in place of TrueTime timestamps. In addition, Cockroach uses Raft instead of Paxos for replication and automatic disaster recovery, the underlying storage relies on RocksDB, the whole project is written in Go, and the external interface is a subset of PostgreSQL's SQL.
CockroachDB
CockroachDB's technology choices are more aggressive, for example relying on HLC for transaction timestamps. But for the wait in the commit-wait phase of Spanner's transaction model, CockroachDB has no way to achieve a bound within 10ms; the commit wait has to be specified by the user, and who can say how many milliseconds NTP clock error stays within? Personally I think only a hardware clock can really handle clock synchronization across intercontinental data centers; HLC cannot solve that.
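For reference, here is a minimal hybrid logical clock in plain Python, roughly following the published HLC algorithm (wall-clock time plus a logical counter); it is a sketch of the idea, not CockroachDB's actual code:

```python
import time

class HybridLogicalClock:
    """Minimal HLC: NTP wall time plus a logical counter to cover skew and ties."""

    def __init__(self):
        self.l = 0   # highest wall-clock time observed so far (milliseconds)
        self.c = 0   # logical counter for events within the same millisecond

    def _pt(self):
        return int(time.time() * 1000)

    def now(self):
        """Timestamp a local or send event."""
        prev = self.l
        self.l = max(prev, self._pt())
        self.c = self.c + 1 if self.l == prev else 0
        return self.l, self.c

    def update(self, remote_l, remote_c):
        """Merge a timestamp received from another node so causal order is preserved."""
        prev = self.l
        self.l = max(prev, remote_l, self._pt())
        if self.l == prev and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        return self.l, self.c

clock = HybridLogicalClock()
print(clock.now())
print(clock.update(int(time.time() * 1000) + 5, 0))  # message from a node whose clock runs ahead
```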
In addition, Cockroach uses gossip to synchronize node information, and as the cluster grows the gossip heartbeat becomes a very large overhead. Of course, the upside of these technical choices is very good ease of use: all the logic lives in a single binary and it works out of the box, which is a very big advantage.
TiDB
Globally, the other product moving toward an open source implementation of Spanner and F1 is TiDB (finally, our own product). TiDB is essentially a more orthodox implementation of Spanner and F1: rather than fusing the SQL layer and the key-value layer as CockroachDB does, it separates them, as Spanner and F1 do. This layering runs through the whole TiDB project and pays off in testing, rolling upgrades, and keeping the complexity of each layer under control. In addition, TiDB chose compatibility with the MySQL protocol and syntax, so the MySQL community's ORM frameworks and operations tools can be applied directly to TiDB.
Like Spanner, TiDB is a stateless MPP SQL layer; underneath, the whole system relies on TiKV to provide distributed storage and distributed transactions. TiKV's distributed transaction model follows Google's Percolator, with many optimizations on top. Percolator's advantage is a very high degree of decentralization: the cluster needs no independent transaction management module, and transaction commit status is in fact spread evenly across the metadata of every key in the system. The only thing the whole model depends on is a timestamp allocation service.
In our system, this timestamp service can, in the extreme case, allocate more than 4 million monotonically increasing timestamps per second, which is enough in most cases (after all, workloads of Google's magnitude are rare); and in TiKV the timestamp service itself is highly available, with no single point of failure.
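A toy sketch of what such a timestamp service boils down to; a production implementation batches allocations and persists a high-water mark for crash safety, which this sketch omits:

```python
import threading

class TimestampOracle:
    """Toy centralized timestamp allocator in the spirit of Percolator's timestamp oracle."""

    def __init__(self):
        self._ts = 0
        self._lock = threading.Lock()

    def next_ts(self):
        with self._lock:          # a single source of monotonically increasing timestamps
            self._ts += 1
            return self._ts

tso = TimestampOracle()
start_ts = tso.next_ts()    # the transaction reads a consistent snapshot as of start_ts
commit_ts = tso.next_ts()   # commit_ts > start_ts fixes the transaction's order
assert commit_ts > start_ts
```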
Like CockroachDB, TiKV also chose Raft as the foundation of the whole database. The difference is that TiKV is written entirely in Rust; as a language without GC or a heavy runtime, it leaves more headroom for performance.
About the future
I think there will be several trends for databases in the future, and they are also the goals the TiDB project pursues:
Databases will move to the cloud along with the business. In the future all business will run in the cloud, whether private or public, and what operations teams face may no longer be a real physical machine but an isolated container or a pool of "computing resources". This is also a challenge for databases, because databases are inherently stateful: data ultimately lives on physical disks, and the cost of moving data can be much higher than the cost of moving a container.
Multi-tenancy will become standard: one large database carries all the business, data is unified at the bottom layer, and the upper layer is isolated through permissions, containers, and similar techniques. Opening up and scaling data will become extremely simple. Combined with the cloudification mentioned in the first point, the business layer no longer needs to care about the capacity and topology of physical machines; it can treat the underlying layer as an effectively unbounded database platform and stop worrying about single-machine capacity and load balancing.
OLAP and OLTP will become further specialized. The underlying storage may be shared, but the SQL optimizer layers will be implemented very differently. For users, being able to read, write, and analyze data with one set of standard syntax and rules makes for a better experience.
In future distributed database systems, backward-looking backup mechanisms such as master-slave log shipping will be replaced by stronger distributed consensus algorithms such as Multi-Paxos and Raft. Manual database operations are impossible when managing large-scale clusters; failure recovery and high availability will be highly automated.
