"Blockchain Technology and Application" Class Notes (5): The Implementation Principle of Bitcoin System

The blockchain is a decentralized ledger, and Bitcoin adopts a transaction-based ledger model. Only transfer transactions and coinbase (coin-minting) transactions are recorded; the balance of each account is not recorded directly. To find out how much money a Bitcoin account holds, you have to compute it from the transaction records.

UTXO: Unspent Transaction Output

A full node in Bitcoin needs to maintain a data structure called the UTXO set (Unspent Transaction Outputs): the collection of all transaction outputs that have not yet been spent. A transaction may have multiple outputs, and those that have been spent are not in the UTXO set. For example, suppose A transfers 5 BTC to B and 3 BTC to C in one transaction. If B then spends his 5 BTC, that output is no longer kept in the UTXO set; C has not spent his 3 BTC, so that output remains in the UTXO set.

Each element of the UTXO set gives the hash of the transaction that generated the output and the index of the output within that transaction. These two pieces of information are enough to locate a particular output of a particular transaction.

Why maintain such a data structure?

To prevent double-spending attacks, a node that checks whether a transaction is legal must check whether the BTC being spent is in the UTXO set. The transaction is legal only if it is. If the BTC is not in the set, it either never existed or has already been spent. For this reason a full node keeps the UTXO set in memory, so that double spending can be detected quickly.

As transactions are published, each transaction consumes some outputs and generates new ones.

For example, if A transfers 5 BTC to B and B then transfers them to D, the output of A->B is deleted from the UTXO set and the output of B->D is added to it.

[Figure: A transfers 5 BTC to B, then B transfers them to D]
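The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the transaction IDs (`tx_A`, `tx_B`), the function name `apply_transaction`, and the use of whole BTC amounts are all hypothetical, and signatures are ignored entirely.

```python
from typing import Dict, List, Tuple

# An outpoint locates one output: (hash of the creating transaction, output index).
Outpoint = Tuple[str, int]

def apply_transaction(utxo: Dict[Outpoint, int], txid: str,
                      inputs: List[Outpoint], outputs: List[int]) -> None:
    """Check a transaction against the UTXO set, then update the set in place."""
    if len(set(inputs)) != len(inputs):
        raise ValueError("the same output is referenced twice")
    for outpoint in inputs:
        # An output that is missing either never existed or was already spent.
        if outpoint not in utxo:
            raise ValueError(f"{outpoint} does not exist or was already spent")
    for outpoint in inputs:
        del utxo[outpoint]                 # consumed outputs leave the set
    for index, amount in enumerate(outputs):
        utxo[(txid, index)] = amount       # new outputs enter the set

# tx_A paid 5 BTC to B (output 0) and 3 BTC to C (output 1).
utxo = {("tx_A", 0): 5, ("tx_A", 1): 3}

# B spends his 5 BTC by paying D in tx_B.
apply_transaction(utxo, "tx_B", [("tx_A", 0)], [5])
print(utxo)
```

After the call, the A->B output is gone and the B->D output has been added, while C's unspent output is untouched. A second attempt to spend `("tx_A", 0)` raises `ValueError`, which is the double-spending check.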

If someone receives BTC but never spends it, the output stays in the UTXO set forever. The user may not want to spend the coins (for example, Satoshi Nakamoto) or may have lost the private key and be unable to spend them. The UTXO set therefore grows gradually, but at present it can still be held entirely on an ordinary server.

total inputs = total outputs in transactions

Each transaction can have multiple inputs and multiple outputs. Leaving transaction fees aside, the sum of the inputs equals the sum of the outputs (total inputs = total outputs); the outputs can never exceed the inputs.

It may seem counterintuitive that a transaction can have not only multiple outputs but also multiple inputs, and the inputs do not even have to come from the same address. Each input address must provide its own signature, so a transaction may carry multiple signatures.

The second incentive mechanism: transaction fee

The total input of some transactions may be slightly greater than the total output. For example, the total input may be 1 BTC and the total output 0.99 BTC. The difference goes to the node that obtains the bookkeeping right, as a bookkeeping fee.

The reason for this design is that the block reward alone is not enough. Why should the node that wins the bookkeeping right include other people's transactions at all? What does it gain? A selfish node could pack only its own transactions. Including other people's transactions means verifying their legality, and the more transactions a block contains, the more bandwidth it consumes and the more slowly it propagates through the network. The transaction fee solves this problem of motivation.

Here 0.01 BTC is already a large transaction fee. Some very simple transactions carry no transaction fee at all, that is, they satisfy total inputs = total outputs exactly.
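As a small worked example of the rule above, the fee is simply the difference between the two sums. The function name `transaction_fee` is hypothetical; amounts are expressed in satoshi (1 BTC = 100,000,000 satoshi) so that the arithmetic stays in integers.

```python
SATOSHI_PER_BTC = 100_000_000

def transaction_fee(input_values, output_values):
    """Return the implied fee; reject transactions whose outputs exceed their inputs."""
    total_in, total_out = sum(input_values), sum(output_values)
    if total_in < total_out:
        raise ValueError("outputs exceed inputs: invalid transaction")
    return total_in - total_out

# Total input 1 BTC, total output 0.99 BTC, as in the example above.
fee = transaction_fee([1 * SATOSHI_PER_BTC], [99_000_000])
print(fee / SATOSHI_PER_BTC)  # 0.01
```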

At present, the main reason miners mine is still the first incentive mechanism: the block reward. The block reward is gradually reduced: it is halved every 210,000 blocks. Since the average block generation time of the Bitcoin system is 10 minutes, the reward is halved approximately every 4 years. After many years the block reward will become very small, and transaction fees will then become the main motivation.
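The "approximately every 4 years" figure and the halving rule can be checked with a short sketch. The initial reward of 50 BTC is not stated in these notes and is supplied here as a known fact about Bitcoin; the function name `block_reward` is hypothetical, and floating-point BTC is used for readability rather than integer satoshi.

```python
HALVING_INTERVAL = 210_000     # blocks between halvings (from the text)
TARGET_BLOCK_MINUTES = 10      # average block time (from the text)
INITIAL_REWARD_BTC = 50.0      # assumption: Bitcoin's initial block reward

def block_reward(height: int) -> float:
    """Block reward, in BTC, for the block at the given height."""
    return INITIAL_REWARD_BTC / (2 ** (height // HALVING_INTERVAL))

# 210,000 blocks at 10 minutes each, converted to years.
years_per_halving = HALVING_INTERVAL * TARGET_BLOCK_MINUTES / (60 * 24 * 365)

print(round(years_per_halving, 2))
print([block_reward(h) for h in (0, 210_000, 420_000, 630_000)])
```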

Besides transaction-based ledgers such as Bitcoin's, some systems use an account-based ledger, for example Ethereum, which is covered later in the course. In that model the system explicitly records how many coins each account holds.
Bitcoin's transaction-based model offers better privacy protection, but it comes at a cost. For example, a transfer transaction must state the source of the coins (which output of which previous transaction they come from) in order to prevent double-spending attacks.

An example of a block

A screenshot of a block on blockchain.info:

Note that the nonce in the block header is a 4-byte (32-bit) integer, so it can take only $2^{32}$ values. Because Bitcoin has become so popular in recent years and so many people are mining, the mining difficulty has been adjusted very high. It is quite likely that no solution meeting the difficulty requirement can be found by adjusting the nonce alone (the search space is not large enough). So which fields in the block header can be changed? Let's review the fields in the block header again (the number of bytes in parentheses):

  • Version number (4): cannot be changed
  • Previous block header hash value (32): cannot be changed
  • Merkle tree root hash (32): can be changed indirectly, by modifying the coinbase field of the coinbase transaction
  • Block generation time (4): has some room for adjustment. The Bitcoin system does not require a very precise time, so this field can be adjusted within a certain range
  • Encoded mining target threshold (4): can only be adjusted periodically as the protocol requires, and cannot be changed at will
  • nonce (4): can be changed
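The six fields listed above add up to an 80-byte block header. The sketch below packs them with Python's `struct` module to make the layout concrete. The function name `serialize_header` is hypothetical and the field values are only illustrative placeholders; integers are packed little-endian, which is how Bitcoin serializes them.

```python
import struct

def serialize_header(version: int, prev_hash: bytes, merkle_root: bytes,
                     timestamp: int, bits: int, nonce: int) -> bytes:
    """Pack the six header fields: 4 + 32 + 32 + 4 + 4 + 4 bytes."""
    return struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)

# Illustrative values only: all-zero hashes, an arbitrary time and target.
header = serialize_header(1, bytes(32), bytes(32), 1231006505, 0x1D00FFFF, 0)
print(len(header))  # 80
```

Only the last 4 bytes change when the nonce changes, which is why the nonce alone gives just $2^{32}$ candidate headers.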

Because the coinbase transaction has no source of coins, anything can be written in its coinbase field. A change to the coinbase transaction changes the hash of that transaction, which changes the root hash of the Merkle tree, which in turn changes the hash of the block header.

This field can therefore be used as an extra nonce. If the nonce field of the block header is not enough, some bytes of the coinbase field can be adjusted as well to enlarge the search space. For example, if the first 8 bytes of the field are used as the extra nonce, the search space grows to $2^{32} \times 2^{8 \times 8} = 2^{96}$.

In actual mining, two nested loops are generally used for this purpose. The outer loop adjusts the extra nonce in the coinbase field of the coinbase transaction and recomputes the root hash of the Merkle tree; the inner loop adjusts the nonce in the block header and computes the hash of the whole block header.

The figure below shows the Merkle tree of a small block. Assuming that the transaction in the lower-left corner is the coinbase transaction, a change to that transaction propagates upward level by level and finally changes the root hash of the Merkle tree.

[Figure: Merkle tree of a small block, with the coinbase transaction as the lower-left leaf]
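The upward propagation can be demonstrated with a toy Merkle tree. This is a simplified sketch: transactions are stand-in byte strings rather than real serialized transactions, the names `double_sha256` and `merkle_root` are hypothetical, and an odd-sized level is padded by repeating its last hash.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions):
    """Root hash of a Merkle tree over a non-empty list of transactions."""
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])        # pad an odd level with its last hash
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# The first (lower-left) leaf plays the role of the coinbase transaction.
txs = [b"coinbase|extra_nonce=0", b"tx1", b"tx2", b"tx3"]
root_before = merkle_root(txs)

txs[0] = b"coinbase|extra_nonce=1"         # change only the coinbase field
root_after = merkle_root(txs)

print(root_before != root_after)  # True
```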

To summarize, actual mining uses two nested loops. The outer loop adjusts the coinbase field (by convention, only its first x bytes are used as the extra nonce) and recomputes the Merkle root in the block header; the inner loop then adjusts the nonce.
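A minimal sketch of the two loops follows. It is a toy under stated assumptions, not a real miner: the block contains only a stand-in coinbase transaction (so the Merkle root is just its hash), the time and target fields are set to zero, the inner nonce space is shrunk to $2^{12}$ so that the outer loop is actually exercised, and the difficulty is very low. The names `mine` and `TOY_TARGET` are hypothetical.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(prev_hash: bytes, target: int, nonce_space: int = 1 << 12,
         max_extra_nonce: int = 1 << 16):
    """Return (extra_nonce, nonce, digest) for the first hash below target, else None."""
    for extra_nonce in range(max_extra_nonce):           # outer loop
        # Toy coinbase transaction: its first 8 bytes serve as the extra nonce.
        coinbase_tx = struct.pack("<Q", extra_nonce) + b"toy coinbase transaction"
        # Single-transaction block, so the Merkle root is the hash of that transaction.
        merkle_root = double_sha256(coinbase_tx)
        # Everything except the nonce is fixed while the inner loop runs.
        prefix = struct.pack("<I32s32sII", 1, prev_hash, merkle_root, 0, 0)
        for nonce in range(nonce_space):                  # inner loop
            digest = double_sha256(prefix + struct.pack("<I", nonce))
            if int.from_bytes(digest, "little") < target:
                return extra_nonce, nonce, digest
    return None

TOY_TARGET = 1 << 240    # toy difficulty: about one success per 2**16 attempts
result = mine(bytes(32), TOY_TARGET)
print(result is not None)
```

With these toy parameters a solution usually requires several passes of the outer loop, which mirrors the situation described above where the header nonce alone does not provide enough search space.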

An example of a transfer transaction

Take this transaction as an example.

In this transfer transaction, the left side shows the two inputs of the transaction (they are labeled Output because each one spends an output of a previous transaction); the right side shows the two outputs of the transaction (the green Unspent label shows that they have not been spent yet, so they are stored in the UTXO set).

[Figure: a transfer transaction with two inputs and two outputs]

As can be seen, the inputs and outputs of transactions in Bitcoin are specified by scripts. To verify a transaction, input scripts and output scripts are executed in pairs. The pairing is not between the input and output scripts of the same transaction: the input script of this transaction is paired with the output script of the earlier transaction that provides the coins. If every pair executes successfully, the transaction passes verification.

Probabilistic Analysis of the Mining Process

Mining is the process of repeatedly trying nonces to solve the puzzle. Each attempt can be regarded as a Bernoulli trial (a random experiment with a binary outcome). Tossing a coin is the simplest Bernoulli trial: it lands either heads up or tails up, and the two probabilities do not have to be equal. In mining, the probabilities of success and failure are very different: the probability of success is very small.

A sequence of independent Bernoulli trials forms a Bernoulli process. One property of the Bernoulli process is that it is memoryless: the results of earlier trials have no effect on later ones. For example, if a coin has come up tails many times in a row, the probability that the next toss comes up heads does not increase.

The number of successes in n Bernoulli trials follows a binomial distribution. When n is large and p is small (many trials, each with a small probability of success), the binomial distribution can be approximated by a Poisson distribution. Mining is a Bernoulli process with a very large n and a very small p, so it can be approximated by a Poisson process.
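The quality of this approximation can be checked numerically. The sketch below compares the binomial probability of k successes with the Poisson probability for the same mean; the values of n and p are arbitrary illustrative choices, not actual mining parameters.

```python
import math

def binomial_pmf(n: int, p: float, k: int) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam: float, k: int) -> float:
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1_000_000, 1e-6     # many trials, tiny success probability
lam = n * p                # expected number of successes

for k in range(4):
    print(k, binomial_pmf(n, p, k), poisson_pmf(lam, k))
```

For each k the two columns agree to several decimal places, which is what makes the Poisson model a convenient stand-in for counting mined blocks.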

Progress free and computing power advantage: the guarantee of mining fairness

The block generation time follows an exponential distribution. The Bitcoin protocol adjusts the block generation time of the whole system to about 10 minutes; the block generation time of an individual miner depends on the share of the system's total computing power that the miner controls. For example, if a miner holds 1% of the total computing power, then on average 1 out of every 100 blocks is mined by him, so he mines one block about every 1000 minutes on average.

 

The exponential distribution is also memoryless: if the distribution is truncated at any point, the remaining part still follows the same exponential distribution. How much longer it will take to mine a block has nothing to do with how long mining has already been going on. In Bitcoin this means that no matter how long everyone has been mining, the expected time until the next block is still about 10 minutes.
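The memoryless property follows directly from the form of the exponential distribution. Writing $T$ for the time until the next block and $\lambda$ for the rate (both symbols are introduced here for illustration), the tail probability is $P(T > t) = e^{-\lambda t}$, and conditioning on having already waited a time $s$ gives:

```latex
P(T > s + t \mid T > s)
  = \frac{P(T > s + t)}{P(T > s)}
  = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t}
  = P(T > t)
```

The time already spent, $s$, cancels out, so the remaining waiting time has the same distribution as a fresh start.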

This property is called progress free: the amount of work already done does not change the probability of subsequent success.

This property may seem harsh, but it is necessary. Suppose a cryptocurrency were not progress free, so that the more work a miner had done in the past, the greater its probability of success in the future. Miners with more computing power would then gain a disproportionate advantage, larger than their share of the computing power.

Analysis of Bitcoin Total Volume

Block rewards are the only way new bitcoins are generated in the system, and the block reward is halved every 210,000 blocks (approximately every 4 years), so the amounts of newly generated bitcoins form a geometric series.
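The sum of this geometric series gives the total supply. Assuming the initial block reward of 50 BTC (a known fact about Bitcoin that is not stated in these notes) and ignoring the rounding of very small rewards, the calculation is:

```latex
210000 \times 50 \times \left(1 + \frac{1}{2} + \frac{1}{4} + \cdots\right)
  = 210000 \times 50 \times \frac{1}{1 - \frac{1}{2}}
  = 21{,}000{,}000 \text{ BTC}
```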

It is a misconception that Bitcoin mining solves some meaningful mathematical problem. Solving the mining puzzle has no practical significance other than comparing computing power. Bitcoin is becoming harder to mine only because the block reward is artificially reduced over time and because, as more and more miners join, the mining difficulty is raised to keep the average block time of the system constant.

Note, however, that although the mining process has no practical significance in itself, it is crucial to the security of the Bitcoin system: Bitcoin is secured by mining. Mining provides an effective way of voting with computing power.

If most of the computing power is in the hands of honest users, can it be guaranteed that bad transactions will never be written into the blockchain?
Note that a node with little computing power is not completely unable to obtain the bookkeeping right; it only obtains it with low probability. So even a malicious node with a small share of the computing power has some chance of obtaining the bookkeeping right for a given block.

Transferring someone else's BTC

Suppose a malicious node M obtains the bookkeeping right and wants to transfer A's money to itself. It cannot forge A's signature because it does not have A's private key, and if it writes an incorrect signature, honest nodes will reject the candidate block and continue to extend the previous block. Because the block is illegal, a chain containing it is not the longest legal chain no matter how long it grows, so the attack fails.

Fork attack

M transfers BTC to A. After the M->A transaction has been written into a block, M mines a competing block that extends the block before it and contains a transaction in which M transfers the same BTC to himself. M hopes that the chain along this block becomes the longest legal chain, so that the transfer to A is squeezed out and the spent BTC is rolled back. This is also a kind of double-spending attack.

[Figure: fork attack, with a block containing M->M competing against the block containing M->A]

Imagine that A is a shopping website that accepts BTC payments. If A considers M's payment successful as soon as the transaction M->A is written into the blockchain, the problem above can indeed occur.

What if A waits until several more blocks have followed the block containing M->A? The attack becomes much harder. The attacker's best option is still to fork from the block just before the one containing M->A, but making that fork the longest legal chain is very difficult: it starts out shorter than the current chain, and honest nodes only extend the longest legal chain.

If most of the computing power is in the hands of honest nodes, the attack is very difficult: the malicious node would have to win the bookkeeping right many times in a row to change the longest legal chain. So one of the simplest defenses is to wait for a few more blocks, also called waiting for a few more confirmations.

The default in Bitcoin is to wait for 6 confirmations (about one hour) before considering the transactions in a block to be tamper-proof. As the analysis above shows, the irreversibility of transactions on the Bitcoin ledger is only a probabilistic guarantee.
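To give a feel for how quickly this probabilistic guarantee strengthens, the sketch below uses a result from outside these notes: in the gambler's-ruin analysis in the Bitcoin whitepaper, an attacker holding a fraction q of the computing power who is z blocks behind ever catches up with probability $(q/p)^z$, where $p = 1 - q$ (and with probability 1 if $q \ge p$). This is a simplified model, and the function name `catch_up_probability` is hypothetical.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Chance that an attacker with hash-power share q ever erases a z-block deficit."""
    p = 1.0 - q                  # share held by honest nodes
    if q >= p:
        return 1.0               # a majority attacker eventually catches up
    return (q / p) ** z

# An attacker with 10% of the computing power, at various confirmation depths.
for z in (0, 1, 2, 6):
    print(z, catch_up_probability(0.1, z))
```

Under this model the probability shrinks geometrically with every additional confirmation, which is why waiting for a few more blocks is such an effective defense.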

Note that the block to which a candidate block will be attached is decided before mining starts, not after the bookkeeping right is obtained, because the block header contains the hash of the previous block.

Zero confirmation

Zero confirmation means that a transaction is treated as final as soon as it has been published, before it has been written into the blockchain.

Zero confirmation is actually widely used, for two reasons:

  • If two transactions conflict, a node accepts the one it heard first. In the fork attack example above, the M->M transaction published after M->A will be rejected by most honest nodes.
  • The shopping website entrusts a full node to monitor the blockchain. There is usually a fairly long processing time between successful payment and delivery. If the transaction turns out not to be on the longest legal chain, the website can cancel the delivery.

Deliberately not including certain legitimate transactions

This is not a problem, because the Bitcoin protocol does not dictate which transactions the node that wins the bookkeeping right must include. A transaction left out of this block can be written into the next one, and there are always honest nodes willing to include it. Even when the Bitcoin system is working normally, some transactions are delayed, for example when too many transactions arrive in a short period: after all, a block cannot exceed 1 MB.

Selfish mining

Under normal circumstances, a node publishes a block as soon as it mines one, in order to claim the block reward and collect the transaction fees. In selfish mining, a node withholds the blocks it has mined instead of publishing them. One motivation is the fork attack above: the attacker mines a private fork, waits until the M->A transaction has received 6 confirmations, and then releases the whole fork at once to replace the longest legal chain.

In practice this is still very difficult, because the malicious node's computing power would have to exceed that of all the honest nodes combined before its private fork could be expected to overtake the honest chain. Most honest nodes are already extending the block that contains M->A, so the attacker would need many accomplices.


Even when no attack is intended and the goal is only to earn block rewards and collect transaction fees, selfish mining can be beneficial, because it reduces competition. For example, in the picture below everyone is mining on top of block A. A certain node mines B and hides it. While the others are still mining on top of A, this node goes on to mine C and then publishes B and C together, so it faced no competition while mining C.

Alternatively, the node keeps mining in private. When it hears that someone has published D, it releases B and C together; the longest legal chain then runs along A-B-C, and the D mined by others becomes invalid.

[Figure: selfish mining, with a hidden chain A-B-C competing against a block D mined on top of A]

But this carries considerable risk. Suppose someone mines D and publishes it before the selfish node has mined C. The node can then only publish B as quickly as possible, and it may well lose the race for the bookkeeping right of that block.

With this motivation, the return from selfish mining is not very high: it only makes others do useless work and reduces one's own competition, while the risk is quite high.

Origin blog.csdn.net/djklsajdklsajdlk/article/details/127702748