Thinking: Does a truly distributed database make the concept of "data lake" a thing of the past?

Original address: http://www.fromgeek.com/ai/152830.html

 

A recent article by Wu Ningchuan, "Amazing, Ant Financial! Created China's own database OceanBase", recounts the background and story of how OceanBase came to be. It is very detailed and worth sharing. I would also like to share a few miscellaneous thoughts of my own:

First, "killing familiarity" (overcharging the customers who depend on you most) is not only a product of the era of big data

Cases came to light a while ago in which an online platform charged its regular customers more for hotel bookings or car-hailing trips. They show that in the era of big data, each of us can be treated like a naive newcomer and fleeced at any time.

In fact, this phenomenon exists in every field. Technical barriers, for example, are one of the conditions that make it possible. As the article mentions, Wang Jian proposed "de-IOE" (moving away from IBM minicomputers, Oracle databases, and EMC high-end storage) when he was at Ali in 2008, precisely because technical barriers had created this kind of lock-in. Normally, IT procurement is a tool for improving business efficiency. But when the purchases are minicomputers, high-end storage, and databases, costs grow geometrically as more is bought. IT procurement then stops being a driving factor and can even seriously hinder the development of the enterprise.

The cost of IOE-style equipment kept rising as Alibaba Cloud's business grew at scale. For Ali, technology had lost its power to drive production. It was under these circumstances that OceanBase, the database independently developed by Ant Financial, came into being.

Second, a real distributed database was born, breaking the traditional "data lake" concept.

What is the traditional "data lake" concept? It is to treat multiple physical disks as a single virtual storage unit. Chen Mengmeng, the head of SQL development on the OceanBase team, said that in this design all database instances see the same data disk. Shared data access ensures that all data can be reached, but it places high demands on hardware: the underlying hardware itself must be stable and reliable. This concept is accepted by the vast majority of traditional companies and even by Internet companies.

And Ali broke this concept. There are only two companies in the world that have broken this concept, one is Ali and the other is Google.

Chen Mengmeng believes that there are only two real distributed databases in the world, Ali's OceanBase and Google's self-developed Spanner distributed database cloud service released in February 2017.

Even the design of the Aurora database launched by AWS is closer to the shared-disk design of traditional databases.

Specifically, when OceanBase handles data access, it is as if a single minicomputer or storage device were "sliced" vertically into many machines, with the data then distributed across those scattered machines. My personal understanding is that one overall "data lake" is divided into many smaller "data pools".
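The "data lake into data pools" idea can be illustrated with a minimal sketch of hash-based partitioning. This is not OceanBase's actual partitioning scheme, which the article does not describe; the function names `shard_for` and `distribute`, the sample keys, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def shard_for(key: str, num_machines: int) -> int:
    """Map a key to one machine index (hypothetical helper, not an OceanBase API)."""
    # A stable hash keeps the same key on the same machine across runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def distribute(rows, num_machines):
    """Split one big collection of (key, value) rows into per-machine "pools"."""
    pools = {machine: [] for machine in range(num_machines)}
    for key, value in rows:
        pools[shard_for(key, num_machines)].append((key, value))
    return pools


if __name__ == "__main__":
    # Illustrative data only.
    rows = [(f"account-{i}", i * 10) for i in range(12)]
    for machine, pool in distribute(rows, 4).items():
        print(machine, [key for key, _ in pool])
```

Each machine holds only its own pool, so no single shared disk has to serve every request. The trade-off is that a query touching several keys may need to contact several machines.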

One of the basic design ideas of OceanBase is to store each piece of data on three different machines. If the probability of a single PC server failing is one in a thousand, then, assuming failures are independent, the probability of two failing at the same time is about one in a million, and the probability of three failing at the same time is about one in a billion.
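The arithmetic behind these figures can be checked with a short sketch. It assumes that machine failures are statistically independent, which the article implies but does not state, and the function name is hypothetical.

```python
from fractions import Fraction


def simultaneous_failure_probability(p_single: Fraction, replicas: int) -> Fraction:
    """Probability that all replicas fail together, ASSUMING independent failures."""
    return p_single ** replicas


if __name__ == "__main__":
    p = Fraction(1, 1000)  # one-in-a-thousand figure taken from the text
    for replicas in (1, 2, 3):
        print(replicas, simultaneous_failure_probability(p, replicas))
    # prints:
    # 1 1/1000
    # 2 1/1000000
    # 3 1/1000000000
```

In practice failures are often correlated (a shared rack, power supply, or software bug), so the real figures would be less favorable than this idealized calculation suggests.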

Third, can the OceanBase distributed database be combined with blockchain technology?

First of all, Wang Jian proposed that Ali build a distributed database at about the same time that Satoshi Nakamoto put forward the Bitcoin white paper. So since 2009, Wang Jian has been thinking about a distributed database truly suited to future Internet business. Seen from another angle, it was in the same period that Satoshi Nakamoto proposed a peer-to-peer electronic cash system, built on blockchain technology (also dubbed "the slowest distributed database in history").

The difference is that OceanBase is a commercial project. After several years of continuous development, it not only stores data in a distributed way but also optimizes database queries. In real application scenarios, a traditional bank spends a great deal of time serving customers at staffed counters, whereas Ant Financial gives users a high-quality online service experience through Internet financial applications built on OceanBase.

As for blockchain, with its slow distributed database technology, it could draw on the database technology of Ali's OceanBase or Google's Spanner. Doing so would help advance blockchain technology.
