Paper research on blockchain-based distributed storage system development (1)

Paper 1 "Research on Application System Development Methods Based on Blockchain" - Cai Weide

Paper cited: [1] Cai Weide, Yu Lian, Wang Rong, Liu Na, Deng Enyan. Research on application system development methods based on blockchain [J]. Journal of Software, 2017, 28(06): 1474-1487.

1 Introduction to blockchain

  Blockchain is a distributed data system maintained jointly by multiple independent nodes; it can also be understood as a distributed ledger technology (DLT). Its data is difficult to tamper with, difficult to forge, and traceable. The blockchain records all transaction information, the process is transparent, and the data is highly secure. Blockchain technology can be used in any application field that requires justice, fairness, and honesty.
  Concretely, a blockchain divides data into blocks. Each block is linked to the previous block through specific information, and the blocks are connected in sequence to present a complete set of data. The header of each block contains the hash of the previous block (the previous block hash), obtained by applying a hash function to the previous block's header. Simply put, a blockchain is a "linked list" that replaces ordinary pointers with hash pointers, as shown below.
[Figure: blocks linked by hash pointers]
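
The "linked list with hash pointers" idea can be sketched in a few lines of Python. This is only an illustration: the block layout, the field names (prev_hash, tx_digest), and the helper functions are assumptions made for this example and are not taken from the paper or from Beihang Chain.

```python
import hashlib
import json


def sha256_hex(obj) -> str:
    """Hash any JSON-serialisable object with SHA-256 (keys sorted for stability)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(prev_block, transactions):
    """Build a block whose header stores the hash of the previous block's header."""
    prev_hash = sha256_hex(prev_block["header"]) if prev_block else "0" * 64
    header = {"prev_hash": prev_hash, "tx_digest": sha256_hex(transactions)}
    return {"header": header, "transactions": transactions}


def verify_chain(chain) -> bool:
    """Check every block's transaction digest and every hash pointer."""
    for i, block in enumerate(chain):
        if block["header"]["tx_digest"] != sha256_hex(block["transactions"]):
            return False
        if i > 0 and block["header"]["prev_hash"] != sha256_hex(chain[i - 1]["header"]):
            return False
    return True


# Hypothetical sample data: three blocks chained together.
genesis = make_block(None, ["alice pays bob 5"])
second = make_block(genesis, ["bob pays carol 2"])
third = make_block(second, ["carol pays dave 1"])
chain = [genesis, second, third]
print(verify_chain(chain))
```

Changing a transaction in an early block breaks that block's digest, and recomputing the digest changes the header, which breaks the hash pointer stored in the next block. This is why tampering is easy to detect.
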
From a technical perspective, the core elements of blockchain are the following three:
(1) Chained block structure: each block carries the hash of the previous block, which is used to verify each transaction;
(2) Multiple independent replicas: every node stores the same information and has the same rights; nodes operate independently, distrust one another, and supervise one another;
(3) Byzantine fault tolerance: the system keeps working normally as long as fewer than one-third of the nodes cheat maliciously or are compromised by attackers.
  Element (1) says that the blockchain is a "ledger". Element (2) says that it is a "distributed ledger"; the "same rights" property is crucial, and a system that cannot guarantee it cannot be called a blockchain. In addition, unlike existing distributed storage, the replicas of a blockchain ledger are written synchronously, rather than being backed up after a single ledger has been formed. Element (3) says that the blockchain is a "consistent, synchronized distributed ledger".
  Another important component of the blockchain is the consensus algorithm. The first generation of blockchains, represented by Bitcoin, uses PoW (51% majority). Ethereum's private chain, representing the second generation, uses PBFT. Beihang Chain, representing the third generation, uses CBFT, which improves performance.
There are two kinds of consensus algorithm for the Byzantine Generals model, serial and parallel:
  (1) The Byzantine consensus protocol PBFT (practical Byzantine fault tolerance): transactions and voting are processed serially, and building a block requires 3 rounds of voting;
  (2) The concurrent Byzantine consensus protocol CBFT (concurrent Byzantine fault tolerance): transactions and voting are processed in parallel.
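
The "fewer than one-third" bound on faulty nodes can be made concrete with a little arithmetic. The sketch below uses the standard Byzantine fault tolerance relation n >= 3f + 1 between the number of nodes n and the number of tolerated faulty nodes f; the function names are made up for this example and do not come from the paper.

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1, i.e. fewer than one-third of n nodes."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def min_nodes(f: int) -> int:
    """Smallest network size that tolerates f Byzantine nodes."""
    return 3 * f + 1


def quorum(f: int) -> int:
    """Matching votes needed in a network of exactly 3f + 1 nodes."""
    return 2 * f + 1


for n in (3, 4, 7, 10):
    print(n, "nodes tolerate", max_faulty(n), "faulty node(s)")
```

For example, 4 nodes tolerate 1 faulty node and 7 nodes tolerate 2, while 3 nodes tolerate none.
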
At the application level, blockchain has the following characteristics:
  (1) Extremely difficult to tamper with. Once data enters the blockchain, even staff inside the system cannot change it. This follows from the mechanism of the blockchain itself: modifications must go through the consensus algorithm, and malicious tampering is rejected by the honest participants on the chain (provided the vast majority of participants are not malicious);
  (2) On-chain code. Contracts or legal documents loaded into the blockchain are executable. When their conditions are met, the corresponding legal actions are triggered automatically; Ethereum calls these "smart contracts";
  (3) Everyone participating in an activity has the complete data, so everyone can make decisions based on their own copy;
  (4) Everyone has the complete historical data, so it is hard to be deceived. Blockchain can build a network of trust among parties that do not trust each other, because everyone holds the complete data, trusts their own copy, and knows that this copy has been agreed by consensus;
  (5) The blockchain architecture is shared, distributed, and replicated, and works from locally held data.
Depending on the characteristics of the application field, different types of blockchain can be chosen. They are generally divided into public chains and permissioned chains.
  (1) Public chain: all nodes are neutral and open, and all of them can vote, keep accounts, and build blocks. Because the entire network has to vote, public chains are very slow.
  (2) Permissioned chain: only authorized nodes can take part in voting, accounting, and block building. This category covers all non-public chains, such as private chains, consortium chains, and enterprise chains. The paper predicts that permissioned chains will become the mainstream in commercial applications.

2 Requirements and architecture design of blockchain application systems

2.1 Requirements for blockchain application systems
2.1.1 Consistency requirements

  In a distributed environment, a consistency protocol is needed to keep data consistent. Public chains mainly use PoW (Proof of Work) and PoS (Proof of Stake), while permissioned chains mainly use PBFT and CBFT. Generally speaking, the more consensus a blockchain system has the better, but consensus is expensive: much of the computing power and inter-node communication is spent on the consensus mechanism. For example, PBFT requires 3 rounds of voting. Each round uses broadcast communication, and every message must be signed and have its signature verified; in addition, every transaction must be signed and verified. As a result, 80% of the computing power is spent on consensus processing. PoW faces speed and scalability problems, while PBFT faces concurrency problems. PoW relies on the computing power of the nodes to reach consensus, whereas PBFT does not.
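
To get a feel for why three rounds of broadcast voting are expensive, the sketch below counts the messages exchanged to agree on one request under a simplified model of the classic PBFT normal case (pre-prepare, prepare, commit). Client requests, replies, and view changes are ignored. The model and the function name are assumptions made for illustration; they are not taken from the paper, and the 80% figure above is not derived from this count.

```python
def pbft_message_count(n: int) -> dict:
    """Messages per agreed request among n replicas in a simplified PBFT model."""
    if n < 4:
        raise ValueError("PBFT needs at least 4 replicas to tolerate one fault")
    pre_prepare = n - 1          # the primary sends the proposal to every backup
    prepare = (n - 1) * (n - 1)  # each backup multicasts to all other replicas
    commit = n * (n - 1)         # every replica multicasts to all other replicas
    return {
        "pre_prepare": pre_prepare,
        "prepare": prepare,
        "commit": commit,
        "total": pre_prepare + prepare + commit,
    }


for n in (4, 10, 31):
    print(n, pbft_message_count(n)["total"])
```

Under this model the total is 2n(n - 1), so it grows quadratically with the number of nodes: 24 messages for 4 nodes and 180 for 10. Since every message is signed once and verified by its receiver, the signature workload grows at the same rate.
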

2.1.2 Software design requirements

  Applying blockchain technology to system development can eliminate many intermediate steps and simplify the process. However, designing a blockchain application system raises a new question: a function can be placed either in the application system or on the blockchain, where it is executed as on-chain code. Executing functions on the chain consumes a lot of computing power, so the paper recommends keeping most functions in the application system.

2.1.3 Scalability requirements

  Some current solutions address scalability by abandoning the definition of blockchain, for example by giving up the requirement for multiple replicas in order to increase transaction speed.
  Notably, Beihang Chain approaches scalability in three steps: (1) voting with the parallel CBFT algorithm to build blocks faster; (2) a dual-chain architecture of ABC and TBC, which protects privacy, enables parallel computing, saves computing power, and simplifies the application architecture; (3) exploiting the dual-chain design, one chain can be split into two chains at runtime, with two separate sets of hardware executing the two chains to increase speed. In this way the original definition of blockchain is preserved while still achieving high visibility and scalability.
  Beihang Chain can serve as a reference for private chains.

2.1.4 Database requirements

  Although a blockchain acts as a distributed database, it is completely different from a traditional database. High-speed blockchains also differ from low-speed ones: in a low-speed environment transactions are processed serially, so consistency is easy to guarantee; in a high-speed environment transactions and block building run in parallel, so consistency becomes a new problem.
  Traditional databases maintain consistency per transaction, while a blockchain maintains consistency per block. One significant difference is that a traditional database allows at most one write to the same data at a time, while a blockchain must allow thousands of writes to the same data within a single block.

2.1.5 On-chain code requirements

  On-chain code and the block building process interact with each other. How to resolve conflicts between on-chain code execution and block building is a key issue.

2.2 System architecture of Beihang Chain

  Beihang Chain is a permissioned chain jointly developed by Beihang University and Peking University. It was originally designed to serve public trust and finance. Beihang Chain abandons the P2P network and the mining mechanism, takes scalability as its first goal, and pays attention to speed optimization. To ensure system security, Beihang Chain adds a node credit system. This is the first time that a reputation system is used to identify cheating nodes: once a node's cheating behavior is discovered, the node is immediately excluded from the voting nodes. Figure 2 is the Beihang Chain architecture diagram.
[Figure 2: Beihang Chain architecture diagram]

  • Storage layer: includes operating system and database services;
  • Basic blockchain layer: the Transport service puts cached transactions into buckets; the Block service creates bitmaps for the transactions in each bucket; Round Robin selects threads in round-robin fashion to create blocks and send them to all other nodes for further execution and reputation calculation; the synchronizer broadcasts the length of the local blockchain, receives missing blocks, and stores received blocks; ABC (Account Blockchain) synchronizes the blockchain to keep state consistent across nodes, builds account indexes to speed up queries, and provides public and private key services for accounts; TBC (Transaction Blockchain) handles transactions: for on-chain code transactions it first executes the on-chain code and then puts the results into a bucket, while other transactions are put into a bucket directly, ready for block creation;
  • Cache layer: caches temporary information in memory, including new transactions and on-chain code received from users, blocks that have not yet been written to disk, and temporary data needed for system operation;
  • API layer: provides external and internal APIs. The internal API is used for communication between nodes, such as voting and broadcasting blocks; the external API serves external users, for example accepting new transactions and answering queries;
  • On-chain code layer: provides contract-related services. On-chain code is written according to domain-specific requirements, verified for legal correctness by all stakeholders, and then deployed for execution in the blockchain system. This layer has 3 functions: user interaction (editing), a process execution engine, and contract services that support account management, state storage, and sending transactions;
  • Application layer: contains the applications, such as banking systems, computational legal systems, credit certification systems, and supply chain systems.
  When designing a blockchain, the more nodes there are, the more secure the system is, but the slower consensus becomes and the more computing power it consumes.

2.3 Blockchain interface design

  OBCC (open blockchain connector) is a unified set of interfaces that the blockchain exposes to applications, making it easy for universities to use blockchain functions, including storing user data in the blockchain and querying the information users need, as shown in Figure 3.
[Figure 3]

The interface for writing to the blockchain is defined as put(action, data), where:

  • The parameter action indicates what the user intends to do with the data: create, insert, update, or delete. Note: data already stored in the blockchain cannot be changed. Unlike in a database, update and delete here do not modify or remove the data; they record the operation on the data as a new transaction on the blockchain;
  • The parameter data is the user's data. Its format and content differ between application fields.

  The interface for querying the blockchain is defined as get(condition), where the parameter condition is the user's query condition: the hash of a block, the hash of a transaction, or application-related keywords. Inverted indexes and big data analysis techniques allow users to obtain valuable query results quickly and efficiently.
  OBCC provides a toolkit that can be imported directly into your own software project, so that during development the blockchain's functional interfaces are called just like local functions or methods, as shown in Figure 4.
[Figure 4]
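
As a rough illustration of the put(action, data) and get(condition) semantics described above, here is a minimal in-memory sketch in Python. It is not the OBCC or JBCC API: the class name InMemoryConnector, the transaction layout, and the keyword matching rule are all assumptions made for this example, and there is no real blockchain, consensus, or inverted index behind it.

```python
import hashlib
import json

ALLOWED_ACTIONS = {"create", "insert", "update", "delete"}


class InMemoryConnector:
    """Hypothetical stand-in for a blockchain connector, backed by an append-only list."""

    def __init__(self):
        self._log = []  # transactions are only ever appended, never changed

    def put(self, action: str, data: dict) -> str:
        """Record the operation as a new transaction and return its hash."""
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        tx = {"seq": len(self._log), "action": action, "data": data}
        tx_hash = hashlib.sha256(
            json.dumps(tx, sort_keys=True).encode("utf-8")
        ).hexdigest()
        self._log.append({"hash": tx_hash, **tx})
        return tx_hash

    def get(self, condition: str) -> list:
        """Return transactions whose hash equals, or whose data values contain, the condition."""
        return [
            tx for tx in self._log
            if tx["hash"] == condition
            or any(condition in str(value) for value in tx["data"].values())
        ]


# Hypothetical usage: an "update" does not overwrite the earlier record.
conn = InMemoryConnector()
h1 = conn.put("create", {"student": "S001", "status": "enrolled"})
h2 = conn.put("update", {"student": "S001", "status": "graduated"})
history = conn.get("S001")
print([tx["action"] for tx in history])
```

Querying by the keyword S001 returns both transactions in order, so the full history of the record stays visible even after the update.
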

  The paper implements a Java version of the blockchain connector, JBCC, which has supported the development of several blockchain application systems, including a CCTV micro-movie management platform, a university student status and records management system, a cross-border financial payment system, a bank credit card consumption management system, and a cross-industry points tracking and management system. Application development based on the OBCC blockchain interface is characterized by a short development cycle, high scalability, and fast execution.

3 Research on blockchain application development methods

3.1 Research on dual-chain design of blockchain

  At present, most blockchain applications have only one blockchain, and all accounts, contracts, transactions, and so on are placed on it. For example, the European Union Bank proposed a universal chain concept. Under this design, every participating institution has to share its internal information with the other participants, and every institution votes as a node on the chain to keep the accounts consistent. This design has poor scalability and low throughput: as business volume grows, latency rises and performance drops.
  The paper proposes a new architecture in which all participating institutions share metadata and protocols but do not share data (here, data means accounts). Every participant can trade with the others while its privacy is preserved. Under this concept there are at least the following two types of blockchain, as shown in Figure 5.
(1) ABC (account blockchain): ABC only stores account information and post-transaction information, and does not execute transactions;
(2) TBC (trading blockchain): TBC only stores the information needed for transactions and executes the related transactions.
  ABC is responsible for querying, saving accounts, and building blocks. For example, ABC stores the account information of financial institutions or households. Account information within a chain is shared, which makes it difficult to tamper with. ABC also provides scalability: when the processing load of the blockchain exceeds a limit, the chain can be split into multiple sub-ABCs hosted on different machines to keep the workload balanced. As shown in Figure 6, a blockchain (chain 1) consisting of block 1, block 2, and block 3 can be divided into two blockchains: the first (chain 2) is block 1, block 2, block 3, and block 4A, and the second (chain 3) is block 1, block 2, block 3, and block 4B. Both of these blockchains still meet the definition of a blockchain.
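
The split of chain 1 into chain 2 and chain 3 can be sketched as two chains that share a common prefix and then diverge. The block layout and function names below are assumptions made for illustration; how Beihang Chain actually partitions accounts between sub-ABCs is not described here.

```python
import hashlib

GENESIS_PREV = "0" * 64


def block_hash(prev_hash: str, payload: str) -> str:
    """Hash of a block, covering the previous block's hash and the payload."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_block(chain: list, payload: str) -> list:
    """Return a new chain with one more block; the input chain is left untouched."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    block = {
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(prev_hash, payload),
    }
    return chain + [block]


def is_valid(chain: list) -> bool:
    """Check that every block points at the hash of the block before it."""
    prev_hash = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


chain1 = []
for payload in ("block 1", "block 2", "block 3"):
    chain1 = append_block(chain1, payload)

chain2 = append_block(chain1, "block 4A")  # continues on one set of hardware
chain3 = append_block(chain1, "block 4B")  # continues on another set of hardware
print(is_valid(chain2), is_valid(chain3))
```

Both chains pass validation because each one is still an unbroken sequence of hash pointers back to block 1; they simply share blocks 1 to 3 and differ from block 4 onward.
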

3.2 Legally based blockchain application development technology

  Traditional requirements analysis and modeling usually cover functions, performance, security, interfaces, and so on. Requirements analysis and modeling for blockchain applications must also consider the law, because many blockchain applications are legally relevant. For example, to use blockchain to store electronic evidence in China, the following three conditions must be met:
(1) Timeliness: data must be collected in a timely manner;
(2) Process: process data must be recorded;
(3) Immutability: it must be provable that the collected and stored data has not been tampered with.

3.3 Research on on-chain code design

  The smart contracts used in blockchain systems are intended as an upgraded form of contract, expressed in code, that cannot be tampered with or manipulated by humans.
  In a traditional reliable database management system (DBMS), transactions should have the four ACID properties (ISO/IEC 10026-1:1992): atomicity, consistency, isolation, and durability. The CAP theorem states that a distributed computing system cannot guarantee consistency, availability, and partition tolerance all at the same time.
  A blockchain database does not follow the ACID principles. The blockchain uses a distributed ledger to keep data consistent, and maintains consistency by building blocks, each of which contains many transactions. Blockchain transactions therefore work differently from traditional database transactions; see Table 2.

4 Summary

  The paper mainly studies application system development based on the characteristics of blockchain. Its general outline is as follows. The first part introduces the main advantage of blockchain for system development, namely resistance to tampering, and then the key mechanism of the blockchain, the consensus mechanism. It analyzes several existing consensus mechanisms: PoW is relatively inefficient, and the original Byzantine consensus method works serially. The parallel Byzantine method proposed in the paper makes up for this shortcoming.
  The second part presents a relatively reasonable architecture, Beihang Chain, from the perspective of application system development. The third part establishes the central role of the dual-chain model in Beihang Chain (the paper illustrates this section with a case study; if you are interested, you can read it yourself).

5 Thoughts

  My feeling is that in our upcoming development tasks we can try to move toward the dual-chain model. The properties of a private chain suit application systems better, and the dual-chain design better protects the privacy of the institutions that join the chain.
  Much of the paper's content is fairly broad. For example, its treatment of smart contracts and on-chain code is relatively superficial. We can discuss these topics further as the research deepens.

6 This article is only a personal study record. Comments and corrections are welcome!

Origin blog.csdn.net/qq_41247688/article/details/129289858