Understanding Blockchain with Database Thinking

As a veteran of the database industry, I have noticed that amid the blockchain boom, people with a traditional IT background have kept a very rational, even dismissive, attitude toward it. Rather than taking either pole, hype or rejection, I think we should discuss blockchain from an angle that IT people can understand. Because blockchain is in essence very similar to database technology, many of its mechanisms can be understood intuitively and accurately through database concepts.

As for the relationship between blockchain and traditional data technology, I think the theme of blockchain's future development is "fusion". This article interprets the main technical points of the blockchain stack from a database perspective and, through the concept of a "decentralized database", looks at how blockchain and database technology can be better integrated.

1. Status Quo of Blockchain Technology

In the current blockchain world, some projects label themselves 1.0, 2.0, or even 3.0, but judged by how polished the products and technology are, I think today's blockchain is where the database was in the 1980s: an era of endless ideas.

For technicians, this is the best of times: fresh ideas keep bursting out and bring breakthroughs to a technical field that had grown dull. It is also the worst of times: no product or direction is certain to become the future mainstream, and any fresh idea may prove unfeasible a few months later.

To understand how blockchain technology is changing and developing, let's compare it with the path the database has taken and see how the blockchain world may develop in the future.

1. Technological evolution route

First, I think blockchain will evolve from today's specialization toward generalization. At present, almost every public-chain product is implemented and optimized for one specific scenario, but I think the future will not be one chain per application. Instead there will be a general development paradigm, just as with traditional databases: whatever kind of application you develop, a limited number of general-purpose products can satisfy most business scenarios.

Second, it will evolve toward standardization. Today each chain basically has its own development paradigm, and many public chains even imitate Ethereum and try to create their own programming language, which is a sign that the industry is still in its infancy. How do you judge that an industry is beginning to mature? Its business models and development methods become basically fixed, so that they can be spread among a large number of programmers.

Third, productization and modularization will keep strengthening. Whether it is Ethereum, Bitcoin, or the many newer public chains, most architectures are very tightly coupled. Compare this with Hadoop in the big data field, where almost every module can be configured and customized as an independent plug-in. I therefore believe that as blockchain technology matures and stabilizes, a mature product will emerge that supports a variety of consensus algorithms and security mechanisms through pluggable configuration and plug-ins.

Finally, performance and scalability will improve. This is also the path the database has taken. The blockchain world is now trying to cover decades of database evolution in a short period through mechanisms such as sidechains and sharding.

Below, I will look from the database perspective at where the biggest bottlenecks in blockchain performance and scalability lie and how they might be optimized.

2. Development status

Let’s go back and look at the current state of the blockchain industry.

Setting aside the financial applications and innovations built on top of blockchain, from a technical point of view I have always thought that its biggest innovation is the establishment of a peer-to-peer data storage mechanism.

The database industry has always followed the master-slave architecture. A fully "multi-active" system, in which every node accepts writes, has remained a legend since it was proposed decades ago, and in my view no product has ever truly achieved it.

When we look at current blockchain technology as an innovative multi-active database, we find three problems that need to be improved:

  • First, the architecture of blockchain is currently very chaotic. It has not been broken down into modules such as transactions, stored procedures, authentication, and master-slave synchronization, the way traditional databases have. Most people's understanding of blockchain is still at the "mysterious black box" stage.
  • Second, blockchain development languages are completely fragmented. After its own early "Warring States period", the database industry gradually unified around SQL. Blockchain is clearly still in its "Warring States period", with no unified standard for development and use.
  • Third, requirements vary wildly. Some of the requirements or business descriptions in white papers are sound, while others are fanciful and unintelligible. This is related to the brand-new business models that blockchain brings: many people are still exploring them, so there is no standard paradigm for requirements yet.

2. Blockchain vs Database Technology: Similarities

From the database perspective, blockchain technology is a decentralized multi-active database technology, and there is no essential difference between the two.

Below I list some of the more important technical points in blockchain and the form each one takes in the database field. The correspondence is as follows:

Consensus Mechanism

Consistency Control - Consensus Mechanism

In distributed databases this is called consistency control, and it includes traditional master-slave replication as well as newer algorithms such as Raft and Paxos. To handle the additional Byzantine fault problem, blockchains use protocols such as PBFT, PoW, and PoS.
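To make the contrast with leader-based replication concrete, here is a minimal proof-of-work sketch. It is a toy and not Bitcoin's actual scheme: the helper names `block_hash`, `is_valid`, and `mine`, the string payload, and the difficulty expressed as a number of leading zero hex digits are all assumptions made for illustration.

```python
import hashlib

def block_hash(payload: str, nonce: int) -> str:
    """SHA-256 over the payload and nonce, as a hex string."""
    return hashlib.sha256(f"{payload}:{nonce}".encode()).hexdigest()

def is_valid(payload: str, nonce: int, difficulty: int) -> bool:
    """Cheap check any node can run: does the hash start with enough zeros?"""
    return block_hash(payload, nonce).startswith("0" * difficulty)

def mine(payload: str, difficulty: int) -> int:
    """Expensive search: try nonces until one satisfies the difficulty."""
    nonce = 0
    while not is_valid(payload, nonce, difficulty):
        nonce += 1
    return nonce

nonce = mine("alice pays bob 5", difficulty=3)
print(nonce, block_hash("alice pays bob 5", nonce))
```

The asymmetry is the point: finding the nonce takes many hash attempts, while verifying it takes one, so nodes that do not trust each other can still agree on who earned the right to append.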

Storage Mechanism

Database Log - Ledger

The blockchain structure is basically equivalent to a database transaction log. The main addition is the Merkle tree, which is used to quickly verify the correctness of the data. Conversely, a database log also carries enterprise-level capabilities such as transaction control, which the blockchain data structure does not provide.
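As a rough illustration of what the Merkle tree adds on top of a plain log, the sketch below folds a list of log records into a single root hash. It is simplified: it uses single SHA-256 and duplicates the last hash on odd levels, and the record strings and function names are hypothetical.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root hash remains."""
    if not records:
        return sha256(b"")
    level = [sha256(r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

log = [b"tx1: alice->bob 5", b"tx2: bob->carol 2", b"tx3: carol->dave 1"]
root = merkle_root(log)
tampered = merkle_root([log[0], b"tx2: bob->carol 200", log[2]])
print(root != tampered)  # True: any changed record changes the root
```

A node that trusts only the 32-byte root can therefore detect a modified record without comparing the whole log.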

Smart Contract

Stored Procedures - Smart Contracts

A smart contract, like a database stored procedure, is a piece of managed code. In essence there is no difference between the two: both execute code in response to an external call, possibly inside a virtual machine, and both let the managed code be shared so that other users can call it.

Sharding

Database sharding has existed since the MPP database era. Dividing a large data set into shards bounds the amount of data in each shard and increases total throughput and storage capacity.
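A minimal sketch of hash-based shard routing, the simplest form of the idea, is shown below. The account names and the shard count of 4 are made up for the example, and real systems usually add rebalancing on top of this.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shards = {i: [] for i in range(4)}
for account in ["alice", "bob", "carol", "dave", "erin", "frank"]:
    shards[shard_for(account, 4)].append(account)
print(shards)
```

Because the routing depends only on the key, any node can compute where a record lives without consulting a central directory.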

Application Development Interface

Blockchain today is at a stage similar to the early database era, and its interfaces are not yet standardized. Depending on the project, the interface may be defined in the style of a database, an object store, an API, or even a PaaS platform.

Safety

Blockchain security has similarities with database security. Database security is generally divided into two modules, authentication and authorization, which cover user login and access rights respectively. Blockchain currently supports only record-level write authorization, while reads are fully shared. In terms of security policy, therefore, the database is well ahead of today's blockchain.

3. Blockchain vs Database Technology: Differences

1. Functional Architecture

[Figure: database and blockchain functional architecture diagram]

In the diagram, the yellow parts are features that both blockchain and database architectures have. The white parts are functions currently unique to the database.

As mentioned above, SQL is an important part of what makes the database general-purpose. A comparable standard interface will be very important if blockchain development methods are to become fixed in the future.

Index management mainly improves the efficiency of data management and queries in the database. Once concrete application scenarios appear, performance becomes the next thing that needs to improve, and indexing of stored data then becomes a very important component.

2. Mechanism

In terms of mechanism, the main differences between blockchain and database are as follows:

  • Consistency

The biggest difference between the design philosophy of blockchain and that of the traditional database is multi-active operation, or more precisely the different consistency model that a decentralized system brings.

Traditional relational databases follow the ACID strong consistency model: a written record can be read immediately. Some newer distributed databases use eventual consistency, the BASE model: written data may not be readable for a while, but it will eventually be there.

Blockchain, or the decentralized database, differs clearly in design: no operation is ever "permanently confirmed". Even in Bitcoin, by its core principle, content buried more than 6 blocks deep is only "basically not going to be rolled back".
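The phrase "basically will not be rolled back" can be given a rough number. The sketch below uses the simplified gambler's-ruin estimate that appears in the Bitcoin whitepaper: an attacker holding a share q of the hash power and trailing by z blocks ever catches up with probability (q/p)^z, where p = 1 - q, and with probability 1 if q >= p. The function name is my own, and the whitepaper's fuller calculation, which also models attacker progress while confirmations accumulate, is omitted here.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever erases a z-block lead."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # an attacker with at least half the power eventually catches up
    return (q / p) ** z

for z in (1, 3, 6):
    print(z, catch_up_probability(0.1, z))
```

The probability shrinks geometrically with depth but never reaches zero, which is exactly why confirmation is probabilistic and not permanent.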

To give an extreme example: if the wide area network between China and the United States were suddenly cut for three days and then restored, Bitcoin would undergo a large-scale fork. Restoring a single main chain would mean rolling back the losing branch, sacrificing the transactions of a large number of people.

Since a peer-to-peer architecture cannot guarantee strong consistency, consistency in a blockchain system is fundamentally different from that in a traditional database, and this leads to a series of further design differences.

In any traditional master-slave database, people do everything possible to prevent "split brain", where two nodes in the same cluster both believe they are the master.

In a peer-to-peer database system, however, this can happen at any time. In blockchain it is called a fork, and it is very different from the traditional database consistency model.
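How a fork gets resolved can be sketched with a simplified longest-chain rule. This is an approximation: real Bitcoin nodes compare accumulated proof of work and not block count, and the block labels and function names below are hypothetical.

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading blocks two chains share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def resolve_fork(local: list[str], remote: list[str]) -> tuple[list[str], list[str]]:
    """Return (winning chain, blocks rolled back from the local chain)."""
    if len(remote) <= len(local):
        return local, []  # keep our chain; nothing is rolled back
    fork_point = common_prefix_len(local, remote)
    return remote, local[fork_point:]

# Two network partitions extended the same prefix independently.
side_a = ["g", "b1", "b2", "a3"]
side_b = ["g", "b1", "b2", "c3", "c4"]
chain, rolled_back = resolve_fork(side_a, side_b)
print(chain, rolled_back)
```

Where a master-slave cluster treats split brain as a failure to be prevented, this model accepts it and defines a deterministic rule for which side loses its writes.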

  • Lock mechanism

The lock mechanism is arguably the biggest difference between blockchain and the database in how they ensure data consistency.

Anyone who has studied databases has heard of locks. During a transaction, every record the session changes is locked until commit and cannot be modified by other sessions.

In a decentralized database, each ledger node operates on local data and changes are propagated asynchronously, so there is no global lock that can notify other nodes when a record changes. Without locks, how does a decentralized database, that is, a blockchain, ensure data consistency?

Bitcoin uses the UTXO structure, which is somewhat similar to "optimistic locking" in databases: nothing is locked during the operation, and only at final commit is it checked whether the record has changed.

Bitcoin decides whether transactions conflict by checking whether a coin has already been spent. Ethereum, on the other hand, uses a nonce, an incrementing counter kept per account, to detect duplicate transactions from an account, which is in effect a row-level lock implemented in disguise.
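Both checks can be sketched in a few lines. The classes below are illustrative only and assume a much simpler data model than either Bitcoin or Ethereum uses: `UtxoSet` treats an unspent output as a `(tx_id, output_index)` pair, and `Account` keeps nothing but a nonce.

```python
class DoubleSpend(Exception):
    pass

class UtxoSet:
    """Unspent outputs keyed by (tx_id, output_index)."""
    def __init__(self, unspent: set[tuple[str, int]]):
        self.unspent = set(unspent)

    def apply(self, inputs: list[tuple[str, int]], new_outputs: list[tuple[str, int]]) -> None:
        # Optimistic check at "commit" time: every input must still be unspent.
        if len(set(inputs)) != len(inputs) or any(i not in self.unspent for i in inputs):
            raise DoubleSpend(inputs)
        self.unspent.difference_update(inputs)
        self.unspent.update(new_outputs)

class Account:
    """Account-model state with an incrementing nonce."""
    def __init__(self) -> None:
        self.nonce = 0

    def apply(self, tx_nonce: int) -> bool:
        # Accept only the next expected nonce; replays and gaps are rejected.
        if tx_nonce != self.nonce:
            return False
        self.nonce += 1
        return True

utxos = UtxoSet({("tx0", 0)})
utxos.apply([("tx0", 0)], [("tx1", 0)])      # first spend succeeds
try:
    utxos.apply([("tx0", 0)], [("tx2", 0)])  # second spend of the same coin
except DoubleSpend:
    print("conflict detected")
```

In both cases no record is held locked while the transaction is in flight. The conflict is discovered only when the transaction is applied, which is the optimistic pattern described above.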

  • Security mechanism

Another topic that everyone in the blockchain industry talks about is the security mechanism.

I am not an expert in encryption algorithms, so I will not discuss the specific algorithms used. Instead I will look at the design of the security model of the whole storage system and discuss how blockchain technology ensures data security under a fully peer-to-peer architecture.

In my opinion, blockchain security is divided into three levels: record level, block level, and chain level.

Record-level security mainly determines whether a given operation record is legal. In some implementations it also covers whether the record is readable and writable by different users.

At the block level, the question is how a node that receives a block from another node can tell that the block itself has not been tampered with. This can be done through mechanisms such as the Merkle tree and the mining result.

Finally, how is the integrity of the chain ensured? Each block must include a verification value (such as a hash) of the previous block in the chain, and there must be rules for rolling back when a fork occurs. Together these ensure the integrity of the whole chain structure.
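Chain-level integrity can be illustrated with a small hash-linked list. This is a sketch under simplifying assumptions: blocks are plain dictionaries hashed through their JSON encoding, and the field names `height`, `prev_hash`, and `records` are invented for the example.

```python
import hashlib
import json

def block_digest(block: dict) -> str:
    """Hash a block's canonical JSON encoding."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list[dict], records: list[str]) -> None:
    """Append a block that commits to the digest of its predecessor."""
    prev = block_digest(chain[-1]) if chain else "0" * 64
    chain.append({"height": len(chain), "prev_hash": prev, "records": records})

def verify_chain(chain: list[dict]) -> bool:
    """Each block must reference the digest of the block before it."""
    for prev, cur in zip(chain, chain[1:]):
        if cur["prev_hash"] != block_digest(prev):
            return False
    return True

chain: list[dict] = []
append_block(chain, ["alice->bob 5"])
append_block(chain, ["bob->carol 2"])
append_block(chain, ["carol->dave 1"])
print(verify_chain(chain))  # True
chain[1]["records"] = ["bob->carol 200"]
print(verify_chain(chain))  # False: the next block's prev_hash no longer matches
```

Editing any historical block invalidates every later link, so a tamperer would have to rewrite the rest of the chain as well.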

4. Decentralized Database Architecture

What will be the result of the fusion of blockchain technology and database technology?

Can we organize the existing blockchain in a database structure and divide it into different modules such as kernel, runtime, plug-in, and SQL parsing and optimization?

Since the core of a database is still an immutable transaction log, which is equivalent to the chain structure of a blockchain, what if we put a SQL engine on top of the state store, or even let the SQL engine access the data in the chain directly? Wouldn't that give us a common programming and access interface?
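As a thought experiment, the sketch below replays a tiny ledger into an in-memory SQLite table and then queries the resulting state with ordinary SQL. It uses Python's standard `sqlite3` module. The ledger contents, the `balances` table, and the `credit` helper are all hypothetical, and a real design would also have to handle forks and rollbacks.

```python
import sqlite3

# Hypothetical ledger entries: (block height, sender, receiver, amount).
ledger = [
    (1, None, "alice", 10),  # issuance with no sender
    (2, "alice", "bob", 4),
    (3, "bob", "carol", 1),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")

def credit(account: str, delta: int) -> None:
    """Create the account row if needed, then adjust its balance."""
    db.execute("INSERT OR IGNORE INTO balances VALUES (?, 0)", (account,))
    db.execute("UPDATE balances SET amount = amount + ? WHERE account = ?", (delta, account))

# Replay the log in order to materialize the state store.
for _height, sender, receiver, amount in ledger:
    if sender is not None:
        credit(sender, -amount)
    credit(receiver, amount)

rows = db.execute("SELECT account, amount FROM balances ORDER BY account").fetchall()
print(rows)  # [('alice', 6), ('bob', 3), ('carol', 1)]
```

The chain stays the source of truth, and the SQL table is a derived view that can always be rebuilt by replaying the log.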

Another example is the security component. Could we implement column-level, row-level, table-level, and node-level security, and at the same time specify through configuration which tables must be digitally signed, which fields of which tables are shared, and which other fields must be protected by multi-signature encryption?

As for consistency, could we declare some tables to be globally shared and others to be local, so that one system replaces today's separate deployments of blockchain and database?

I think there will be a "decentralized database" that combines the two in the future.

Basic features of a decentralized database:

  • Decentralization: The architecture is completely decentralized. There is no central control node, every node can read and write, and the data on every node is consistent;
  • No global lock: In a peer-to-peer architecture over a WAN, a decentralized database cannot implement a global lock, so the system has to settle for weakened locking and consistency in order to meet high-availability needs;
  • Logs generated by non-fixed nodes: The log is the log of the entire database. Any node has the right to write it, which yields a decentralized architecture with no master node: every node has the chance to temporarily become the accounting node that produces a block;
  • Asynchronous transaction confirmation: Without a global lock, some transaction mechanisms must be adjusted relative to traditional databases. Making transaction commit and rollback asynchronous may be a more feasible approach;
  • Consistency strategy adjustment: In a multi-active blockchain setting, the data consistency strategy will differ from that of traditional databases;
  • Row-level security and triggers: A decentralized database will guarantee data security down to the row level or even the column level.
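The "asynchronous transaction confirmation" item in the list above might look roughly like this from a client's point of view. The sketch is hypothetical: the `Node` class, its status strings, and the confirmation threshold are assumptions for illustration and are not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PendingTx:
    tx_id: str
    included_at: Optional[int] = None  # block height, once included

@dataclass
class Node:
    confirmations_required: int = 6  # assumed threshold, echoing the Bitcoin example
    height: int = 0
    txs: dict = field(default_factory=dict)

    def submit(self, tx_id: str) -> None:
        """'Commit' returns immediately; the transaction is only pending."""
        self.txs[tx_id] = PendingTx(tx_id)

    def on_block(self, included: list) -> None:
        """A new block arrived; record which pending transactions it contains."""
        self.height += 1
        for tx_id in included:
            if tx_id in self.txs:
                self.txs[tx_id].included_at = self.height

    def status(self, tx_id: str) -> str:
        tx = self.txs[tx_id]
        if tx.included_at is None:
            return "pending"
        depth = self.height - tx.included_at + 1
        return "confirmed" if depth >= self.confirmations_required else "included"

node = Node(confirmations_required=3)
node.submit("t1")
print(node.status("t1"))  # pending
node.on_block(["t1"])
print(node.status("t1"))  # included
node.on_block([])
node.on_block([])
print(node.status("t1"))  # confirmed
```

Unlike a traditional COMMIT, which blocks until the outcome is durable, the caller here gets an immediate acknowledgement and must poll or subscribe to learn when the transaction is deep enough to rely on.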

5. Blockchain and Database Technology Integration: Decentralized Database

To repeat: for blockchain and traditional data technology, I think the theme of blockchain's future development is "fusion"!

Blockchain business concepts are developing rapidly, but from the perspective of the technology itself, I think blockchain is still at a stage similar to database technology in the 1980s, a period of technical growth. As mentioned above, it still has a long way to go in generality and standardization.

Given the similarity in technical route and architecture design, the integration of database and blockchain technology is the general trend. By introducing blockchain techniques and mechanisms, the decentralized database may become an important direction for future technological development.
