Detailed explanation of Bitcoin and blockchain


The first thing to get clear is that Bitcoin is a blockchain, but a blockchain is not Bitcoin.

Therefore, when answering the question of what a blockchain is, terms such as "miners", "mining", "longest chain", and "fork" are actually inaccurate: they describe Bitcoin, not blockchains in general.

Here is a short write-up of the content of last month's lecture. I personally feel that most answers, including those from Google searches or wikis, fail to explain what a blockchain is. Many people talk about Bitcoin and many people understand Bitcoin, but when it comes to blockchain, there is no clear definition of what it is. Basically, all the introductions go like this:

Bitcoin -> Blockchain is the underlying technology of Bitcoin.

or

Bitcoin -> Bitcoin is a blockchain.

As for the question of what a blockchain is, I have not seen a good definition or introduction. What exists is either empty, generalized talk about the significance of blockchain, or else nothing but miners and mining. So let me give my personal definition of blockchain from a purely theoretical perspective:

1. The blockchain is a distributed database (system) placed in a non-secure environment.

2. The blockchain uses cryptography to ensure that existing data cannot be tampered with.

3. The blockchain adopts a consensus algorithm to reach a consensus on new data.

A system with the above three properties is a blockchain.



1. The blockchain is a distributed database (system) placed in a non-secure environment.

There are two main points here: (1) distributed, (2) non-secure environment.

First, this is a distributed, decentralized system. Therefore, a system with a central server or node is not a blockchain. Likewise, a system in which every node is guaranteed to be safe and harmless is not a blockchain. Similarly, from an application point of view, if your application must use a central node (such as a supercomputer for deep learning) or does not need to consider insecure nodes (such as sensors in a secure factory), then there is no need to consider blockchain technology.

As for the word "database": most mature blockchains today are databases. For example, Bitcoin is a distributed ledger, and a ledger is just data. By the format of the data, blockchains can be divided into three types. (1) The data items are completely unrelated to one another; the nodes only reach consensus on them, and there is no notion of valid or invalid. (2) The data has some logical structure, such as a transaction in a ledger. Besides the amount, a transaction has inputs and outputs, and the inputs refer to previous transactions. Such data must be verified logically (for example, a node must check that the transaction an input refers to is valid). (3) The data carries Turing-complete logic, and verification requires nodes to spend computation. Each transaction can have different outputs and states, so each node must not only verify the authenticity of the transaction and the correctness of its inputs, but also read in the values according to the logic in the transaction, re-run the computation, and verify the result.

Bitcoin's system is the second type, also known as a distributed ledger; Ethereum is the third type, which can support smart contracts.

Taking Bitcoin as an example: (1) it is a completely decentralized system; (2) it is placed in a non-secure environment, in that it does not require everyone who uses Bitcoin to be non-malicious.


2. The blockchain uses cryptography to ensure that existing data cannot be tampered with.

This is the most misunderstood part, because many people only think of this when they mention blockchain. Granted, this part is important, and indeed the blockchain gets its name from it, but that's only part of the definition of a blockchain.

The two core points of this section are: (1) cryptographic hash functions, (2) asymmetric encryption.

Both are basic concepts of cryptography, and there are very clear definitions on the Internet. I will only briefly say:

Cryptographic hash function: a function Y = H(X) with the following properties: (1) it is easy to compute Y from X; (2) it is computationally infeasible to recover X from Y; (3) given X and Y, it is computationally infeasible to find another X' such that H(X') = Y; (3.5) even if X and X' differ very little, H(X) and H(X') look completely unrelated.

A hash function is mainly used to verify the integrity of information. The sender appends the hash value of a message to the message itself; this value is very small (for example, 256 bits) and easy to compute. After receiving the message, the recipient computes the hash value again and compares the two to know whether the message has been tampered with. If it has been tampered with, even by a single bit, the entire hash value will be completely different. By the properties of the hash function, no one can feasibly forge another message with the same hash value, which means that tampered data cannot pass the hash check.
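The integrity check described above can be sketched in a few lines of Python. SHA-256 is used here only as one example of a 256-bit hash; the function names `digest` and `is_intact` and the sample messages are made up for illustration.

```python
import hashlib

def digest(message: bytes) -> str:
    """Return the SHA-256 hash of a message as a hex string (256 bits)."""
    return hashlib.sha256(message).hexdigest()

def is_intact(message: bytes, attached_hash: str) -> bool:
    """Receiver side: recompute the hash and compare it with the attached one."""
    return digest(message) == attached_hash

original = b"pay Alice 10 coins"
attached = digest(original)            # the sender appends this to the message

tampered = b"pay Alice 11 coins"       # a single character has been changed

print(is_intact(original, attached))   # True
print(is_intact(tampered, attached))   # False
```

Comparing `digest(original)` and `digest(tampered)` by eye also shows property (3.5): the two hex strings have nothing visibly in common.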

Asymmetric encryption: this is easiest to understand by contrast with symmetric encryption. In symmetric encryption there is a single key, which you can think of as the key to a safe. You use it to encrypt a message into ciphertext that no one else can read, and the same key decrypts the ciphertext back into the original message.

Asymmetric encryption means that there are two keys, one called the public key and the other called the private key. If one is used to encrypt, only the other can be used to decrypt, and vice versa. Another important property is that, even given the ciphertext, the plaintext, and the public key, you still cannot work out the private key. The security rests on hard mathematical problems such as integer factorization and discrete logarithms. Commonly used schemes include RSA, Diffie-Hellman, and ECC (elliptic-curve cryptography); Bitcoin uses elliptic curves.

Besides encrypting information as symmetric encryption does, asymmetric encryption has another purpose: authentication. We usually assume that in a key pair the public key is public and the private key is held only by its owner, so if someone holds the matching private key, we can identify that person as the owner. One important application is the digital signature. The sender hashes the message, encrypts the hash with the private key, and appends the result to the message. The recipient hashes the message, decrypts the digital signature with the corresponding public key, and compares the two hash values. If they are the same, the message was sent by the key's owner and has not been tampered with.
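The hash-then-sign flow above can be sketched with textbook RSA. This is only a toy under loud assumptions: the primes 61 and 53 are tiny, there is no padding, and Bitcoin itself uses elliptic-curve signatures rather than RSA. The names `short_hash`, `sign`, and `verify` are hypothetical.

```python
import hashlib

# Toy RSA key pair built from tiny primes p=61, q=53 (illustration only).
N = 61 * 53   # modulus 3233, shared by both keys
E = 17        # public exponent
D = 2753      # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1

def short_hash(message: bytes) -> int:
    """Hash the message, then reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sender: 'encrypt' the hash with the private key."""
    return pow(short_hash(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Recipient: 'decrypt' the signature with the public key and compare hashes."""
    return pow(signature, E, N) == short_hash(message)

message = b"A pays B 5 coins"
signature = sign(message)

print(verify(message, signature))            # True
print(verify(message, (signature + 1) % N))  # False: a forged signature fails
```

Because the toy modulus is so small, two different messages can occasionally share a reduced hash; real schemes avoid this by using keys and hashes hundreds of bits long.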


The above is the basic knowledge. As for how the blockchain is implemented, it is very simple:

Transactions (data) are written in blocks.

The first block is called the genesis block, and it can contain anything.

Starting with the second block, the first part of each block has the hash of the previous block. In addition, every transaction (data) in the block has the digital signature of the initiator to ensure authenticity and legitimacy. Thus, no data in previous blocks can be tampered with, for the reasons above.
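A minimal sketch of this structure follows, assuming a block is just a dictionary holding the previous block's hash and a list of data items. Per-transaction signatures are left out to keep it short, and the field names `prev_hash` and `data` are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's full contents, including its link to the previous block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, data: list) -> None:
    """Add a block whose first field is the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "data": data})

def is_valid(chain: list) -> bool:
    """Check that every block still points at the actual hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
append_block(chain, ["genesis: anything can go here"])
append_block(chain, ["A pays B 5"])
append_block(chain, ["B pays C 2"])
print(is_valid(chain))                 # True

chain[1]["data"] = ["A pays B 500"]    # tamper with an old block
print(is_valid(chain))                 # False
```

Changing any old block changes its hash, so the next block's `prev_hash` no longer matches; hiding the change would require rewriting every later block as well.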


At this point, some people may ask: why build a chain at all? Wouldn't it be enough to just attach one hash value to all the data?

Because - this database is not static.

The data in the database keeps growing, and each batch of new data is a block, so blocks generated at different times are linked together in this form.

As for how to add blocks, it involves the third part - the consensus algorithm.


3. The blockchain adopts a consensus algorithm to reach a consensus on new data.


The purpose of the consensus algorithm is to let all nodes reach agreement on the newly added block; that is, everyone must approve it. In a system with a center this is very simple, because everyone accepts what the center says. In a decentralized system, especially one where some nodes are malicious, it is very complicated. Computer science has a name for the corresponding problem: the "Byzantine Generals Problem", or "Byzantine Fault Tolerance" (BFT).

There is plenty of material on BFT that uses the example Lamport gave, so I'll take a different angle here.

When Lamport raised this problem, he was working on a project for NASA at SRI (the Stanford Research Institute). He did not raise it with application scenarios like Bitcoin in mind (thousands of users across the whole Internet), but with a far more specialized and simpler system in mind:

The control system of the space shuttle.

Readers with an aviation background may know that an aircraft has three independent control systems. Why? Because no system can be guaranteed never to fail: even if the failure rate of an aircraft control system is extremely low, there is still a chance it will break down mid-flight. With two independent systems, the probability that both fail at the same time is greatly reduced.

But two independent systems are still not enough to tolerate a mistake by one of them. Suppose another aircraft is approaching head-on, and one of the two systems says to dodge while the other says not to. Should the aircraft dodge or not? So we need three independent systems: if one fails, two still work properly, and the correct result is obtained by having the minority obey the majority. Readers who have studied error-correcting codes will find this familiar. The Hamming distance between the outputs of this system is 3, so one-bit errors can be corrected.
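The voting rule can be sketched as follows. The function name `majority` and the command strings are made up for illustration; the point is only that two systems can deadlock while three can outvote a single faulty one.

```python
from collections import Counter

def majority(outputs: list):
    """Return the value more than half of the systems agree on, or None if no such value."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

# Two systems that disagree cannot decide anything.
print(majority(["dodge", "stay"]))            # None

# With three systems, one faulty output is simply outvoted.
print(majority(["dodge", "dodge", "stay"]))   # dodge
```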

However, for the space shuttle, in the context of the Cold War, what if a system is not broken, but is controlled by the enemy? Are three systems enough?

The answer is no, because unlike nodes that are simply broken, malicious nodes can do something else to prevent the entire system from reaching consensus.

This part is a little complicated and would need a separate post to cover fully, so here we only discuss the simplest case (a synchronous system without signatures).

Call the three systems A, B, and C. In the normal workflow, each of the three tells the others its result every time it gets one, and then each picks the result that the majority agrees on. This is a distributed system without a central node, which means the three cannot all sit down in one meeting; they can only communicate in pairs. Now suppose C is malicious and its goal is to disrupt the system. Assume the correct reading is 1, so A and B both get 1. The troublemaker C then tells A, "My result is 0, and B also thinks it is 0," and at the same time calls B and says, "Hey, I think it's 0, and A said the same." A and B are both left confused. Put yourself in A's position: you have heard two different versions of B's answer. B says he chose 1, while C says B chose 0. But A has no way of knowing which of B and C is the liar, because if B had actually told A that he chose 1 and then told C that he chose 0, A would hear exactly the same thing as he does now.

So the conclusion is that Byzantine fault tolerance, that is, tolerating one malicious system rather than one merely faulty system, requires 4 independent systems.
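The argument can be made concrete with a small sketch. It models only what A hears about B's value: B's direct message, plus what C claims B said. The function names are hypothetical. The helper `min_systems` encodes the standard bound for the unsigned case, n >= 3f + 1, of which the 4 systems above are the f = 1 instance.

```python
def a_hears_about_b(b_to_a: int, b_to_c: int, c_relays_honestly: bool) -> tuple:
    """Return (what B told A, what C claims B said), i.e. everything A sees about B."""
    c_claim = b_to_c if c_relays_honestly else 1 - b_to_c
    return (b_to_a, c_claim)

# Case 1: B is honest and says 1 to everyone; C lies about what B said.
c_is_liar = a_hears_about_b(b_to_a=1, b_to_c=1, c_relays_honestly=False)

# Case 2: B is the liar (1 to A, 0 to C); C honestly relays what it heard.
b_is_liar = a_hears_about_b(b_to_a=1, b_to_c=0, c_relays_honestly=True)

print(c_is_liar, b_is_liar)      # (1, 0) (1, 0)
print(c_is_liar == b_is_liar)    # True: A cannot tell the two cases apart

def min_systems(malicious: int) -> int:
    """Smallest number of systems that tolerates `malicious` Byzantine ones (no signatures)."""
    return 3 * malicious + 1

print(min_systems(1))            # 4
```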

(Of course, signatures can solve this problem, but only in a synchronous system. In an asynchronous system the problem becomes more complicated, because replies from normal nodes may be delayed and malicious nodes may not reply at all. A normal node therefore has to wait for replies from other nodes without knowing whether a reply will ever come, since the other party may be malicious, and before receiving a reply it has no way at all to tell whether the other party is a normal node or a malicious one. This problem is called asynchronous BFT and is the most complex case of BFT; I will not explain it further here. The BFT algorithms mentioned below are in fact asynchronous BFT algorithms.)

After Lamport raised this problem, countless algorithms were proposed, collectively called BFT (Byzantine Fault Tolerance) algorithms, the most representative of which is PBFT. With the recent popularity of blockchain, countless BFT algorithms optimized for blockchain scenarios have also emerged. An important problem, however, is that all current BFT algorithms can only be applied in small networks. The reason is simple: the BFT problem was posed for scenarios like the space shuttle control system, and that is what early algorithms mainly considered. The PBFT paper considers a 5-node system. Even counting the newly proposed BFT algorithms, they can be used in networks of at most about 100 nodes.

This problem sat unresolved for a long time, until the birth of Bitcoin. In a sense, Satoshi Nakamoto simplified it. Bitcoin also faces a consensus problem, but Satoshi Nakamoto introduced an important assumption: reward. He could do this because he was designing a digital currency, which means that consensus itself has value.

So on such a system, he proposed a proof-of-work mechanism.
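For readers who have not seen proof of work before, here is a simplified sketch. The "price" of proposing a block is the computation needed to find a nonce whose hash starts with a given number of zero hex digits. Bitcoin's real scheme hashes a block header twice with SHA-256 and compares the result against a numeric target; the string format and the names `pow_hash`, `mine`, and `check` below are assumptions for illustration.

```python
import hashlib

def pow_hash(block_data: str, nonce: int) -> str:
    """Hash the block contents together with a candidate nonce."""
    return hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()

def mine(block_data: str, difficulty: int) -> int:
    """Try nonces one by one until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not pow_hash(block_data, nonce).startswith(prefix):
        nonce += 1
    return nonce

def check(block_data: str, nonce: int, difficulty: int) -> bool:
    """Verification costs a single hash, however long the search took."""
    return pow_hash(block_data, nonce).startswith("0" * difficulty)

nonce = mine("A pays B 5", difficulty=3)   # about 16**3 = 4096 attempts on average
print(check("A pays B 5", nonce, 3))       # True
```

Each extra zero digit multiplies the expected search time by 16, while checking stays at one hash. That asymmetry is what makes speaking expensive and verifying cheap.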

Mining, miners, longest chains, forks, and so on can all be boiled down to one sentence:

Speaking has a price, telling the truth has benefits, telling lies will cost you money...

This is the core difference between the current two types of consensus algorithms:

BFT consensus model: Malicious nodes can do anything.

Bitcoin consensus model: There is a recognized "value" in the model. Every node must pay a certain price to speak. Honest nodes are rewarded, and malicious nodes are effectively punished, because they pay the price and receive no reward.

That is to say, the BFT consensus model actually covers the scenarios of the Bitcoin consensus model, and the Bitcoin consensus actually relaxes the restrictions of the BFT consensus model.

The advantage of Bitcoin consensus over BFT is that, because the abilities of malicious nodes are limited, the damage they can cause is greatly reduced, especially in asynchronous systems. In BFT consensus a malicious node can simply refuse to respond while honest nodes still have to wait for it (because they do not know whether it is malicious). With Bitcoin consensus, it does not matter: if you don't respond, you don't get a reward. Therefore, the Bitcoin consensus algorithm can be applied to thousands of nodes, and anyone can join at any time without registering an identity in the network in advance (in BFT algorithms, the number and identity of the nodes in the network must be known).

But the flaw of Bitcoin consensus is, first of all, that there must be something of value. That is fine for Bitcoin, and Ethereum may be fine by now, but what about other digital currencies... BFT consensus has a strict limit: malicious nodes must make up less than 1/3 of the total. Bitcoin consensus has no such limit. Its only assumption is that most nodes are rational and profit-seeking, that is, they will adopt the best strategy to earn the most value. Strictly speaking, therefore, selfish mining is allowed under Bitcoin consensus, and most so-called attacks are not really attacks, because they do not break out of the framework of Bitcoin consensus. If this value were infinitely large, Bitcoin consensus would be very reliable. In practice it is not, because not every virtual currency is as valuable as Bitcoin, and when the value is low, the premise of Bitcoin consensus no longer holds. When the potential loss runs to tens of thousands, it is reasonable to assume that everyone is rational, but when the loss is only a few cents, that assumption is nonsense. In fact, there has already been a case in which a Bitcoin mining pool went over to another currency and mined maliciously in order to bring a competitor down.

In addition, Bitcoin consensus is longest-chain consensus, meaning longest chain --> majority --> rational, so forks are allowed. This leads to some side problems. For example, if there is network delay, how do you know that the chain you hold is currently the longest chain in the whole network? The more data has to be transmitted, the greater the delay; as the delay grows, more people hold a chain that is not the longest in the network, and so the longest chain can no longer represent the majority. This breaks the foundation of Bitcoin consensus, which is why Bitcoin produces one block about every 10 minutes. Bitcoin has a famous upper limit of about 7 transactions per second, and scaling is now being hotly debated. Ethereum's transaction format is different, it uses a new proof of work, and it wants to switch to proof of stake, but none of this is essential. The real essence is that under current network conditions, if deployed across the whole network, the throughput of Bitcoin consensus basically cannot exceed the order of 100 transactions per second.
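The figure of roughly 7 transactions per second can be reproduced with back-of-the-envelope arithmetic. The 10-minute block interval comes from the text above; the 1 MB block size limit and the 250-byte average transaction size are assumptions added here for illustration.

```python
BLOCK_SIZE_BYTES = 1_000_000   # assumption: the classic 1 MB block size limit
AVG_TX_BYTES = 250             # assumption: a rough average transaction size
BLOCK_INTERVAL_S = 600         # one block about every 10 minutes

tx_per_block = BLOCK_SIZE_BYTES // AVG_TX_BYTES
tx_per_second = tx_per_block / BLOCK_INTERVAL_S

print(tx_per_block)              # 4000
print(round(tx_per_second, 1))   # 6.7
```

Raising throughput means either bigger blocks or more frequent blocks, and both increase the share of nodes that are working on a stale chain, which is exactly the delay problem described above.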

The above paragraphs may be too deep. In short, the difference between BFT consensus and Bitcoin consensus can be understood as follows:


BFT Consensus: Come, let's hold a meeting and brainstorm, and keep discussing until everyone is satisfied with the result.

Problem: Everyone knows how efficient meetings are. The more people there are, the harder it is to produce a result. It can only be used for a few nodes. If it were used for thousands of nodes... imagine the National People's Congress being held once a day.


Bitcoin Consensus: You read your poems well, so the organization has decided that you are the leader today. If you do well, you will be rewarded; if you do badly, your pay will be docked.

Problem: A reward of a few thousand yuan is fine, but who will do a good job for a few cents?



The blockchain is therefore divided into two distinct categories. Many people have heard of public chains versus private and consortium chains. However, if you think the distinction is drawn by application, you are wrong. The most essential difference between the two kinds of blockchain is the consensus model or algorithm. BFT algorithms cannot be applied to a large number of nodes, so they cannot be used for public chains. Bitcoin consensus requires a value system, and it is very unreliable to use it for private or consortium chains: assuming that an individual simply seeks profit is still reasonable, but when the participant is a company, its interests are far more complicated, and you cannot simply assume it only pursues the value on the blockchain.

1. Public chains. Represented by Bitcoin, Ethereum, and all the virtual currencies, these all adopt Bitcoin consensus, and the consensus algorithm is basically proof of work, that is, mining. This mechanism has been explained clearly enough in other answers, so I will skip it. Proof of work is fine in every respect except the electricity... how much electricity? For Bitcoin, about as much as a city of a million people. In addition, the founders of Ethereum are particularly fond of Proof-of-Stake, and it seems it will soon be used on a small scale (one in every 100 blocks would be Proof-of-Stake). So far, though, everyone is taking a wait-and-see attitude toward its reliability.

2. Private chains and consortium chains. Represented by IBM's Hyperledger Fabric, plus a whole bunch of others such as Tendermint, and even R3 Corda and Ripple, these all use BFT consensus. There are in fact already many applications in this area. The problems are: (1) at present, basically all of these applications give the impression of using blockchain for the sake of using blockchain; (2) because of this, the security and reliability of many scenarios remain questionable, which supporters of public chains often criticize.


Well, the above is my personal definition of blockchain. Along the way, I have also given an overview of the current state of the blockchain field.

