If you understand HASH, you will understand more than half of the blockchain

"For the first time in human history, people all over the world, at huge cost, are scrambling to find the results of mathematical operations in the aesthetic sense."

— guard sir

Beeple Encrypted Artwork "Blockchain"

When it comes to blockchain, everyone seems to understand a little, but ask for details and it turns out they don't really understand it after all.

For example, ask someone why people mine and what exactly is being mined.

I'm afraid not many people can answer.

This article is here to explain it to you!

Foreword

When people talk about blockchain, they often say it is tamper-proof. So how is this tamper-proofing achieved?

It relies mainly on HASH.

What people call mining is actually mining for a HASH.

Digging for a good-looking HASH.

But, what is HASH?

Laymen can't explain it clearly, and experts seldom seem able to explain it clearly to laymen.

Many people like to explain HASH to laymen with metaphors. In fact, even if you come up with 100 metaphors (mahjong, poker, dice and so on), those who don't understand HASH still won't.

The more naive among them may even wonder: is mining digging for mahjong tiles?

If you really want someone to understand HASH, show them what a HASH looks like!

What is HASH

HASH is an algorithm: you give it a string of numbers and it gives you back a string of numbers.

What you give it is called input, what it gives you is called output.

That is to say, the data is input to the HASH function, and the HASH function outputs a string of numbers.

HASH(input) = output

It's like you feed a pig into a sausage machine, and the sausage machine gives you a piece of sausage.

Let's take the most common HASH algorithm, MD5, as an example. (There are many HASH algorithms, of course; MD5 is one of the best known.)

MD5 ("weijianfan") = c49262b1117b9fd1d6ebd74aaa016f3e

In the above example, "weijianfan" is the input, and the string of numbers after it is the output.

Another example:

MD5 ("weiyuerenhua") = a799c7c504a1a80f95ebe69a86c42637

Note that the HASH function has an important feature: no matter how long the input is, the output has a fixed length.

And, as with any ordinary algorithm, as long as the input stays the same, the output stays the same.

Take MD5: the input can be any length, and the output is always 128 bits, that is 16 bytes, written as 32 hexadecimal digits. In the first example above, c4 is one byte, representing 8 binary bits, 11000100, and the final 3e is also one byte, 00111110.
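If you would rather check the fixed-length property yourself, here is a minimal sketch using Python's standard hashlib module. The helper name md5_hex is made up for this illustration, and it assumes the text is encoded as UTF-8 before hashing (a different encoding gives a different HASH).

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

short_hash = md5_hex("hello world")
long_hash = md5_hex("hello world " * 10_000)  # about 120,000 characters

# Both digests are 128 bits: 16 bytes, written as 32 hex characters.
print(len(short_hash), len(long_hash))

# The first pair of hex characters is one byte; show it as 8 binary digits.
first_byte = bytes.fromhex(short_hash)[0]
print(short_hash[:2], format(first_byte, "08b"))
```

However long the input gets, the printed lengths stay at 32.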

Let's make the input longer:

MD5 ("In Math We Trust.") = 767fa54f12bab6b71fb411f265814bb7

Take Chinese characters as input:

MD5 ("博鳌亚洲论坛2021年年会数字支付与数字货币分论坛4月18日晚举行。") = 36b5e89c797d22d14ccefe4ec79f56c2

To make it longer:

MD5 ("Andreas M. Antonopoulos is a noted technologist and serial entrepreneur who has become one of the most well-known and well-respected figures in bitcoin. An engaging public speaker, teacher, and writer, Andreas makes complex subjects accessible and easy to understand. Andreas advises multiple technology startups and speaks regularly at conferences and community events around the world.") = 5ac503b01e213d4794d92134096ad313

Longer Chinese characters:

MD5 ( "Andreas M. Antonopoulos 是⼀位著名的技术学家和连续创业企业家,⽐特币界最著名和倍受尊敬的⼈物之⼀。⾝为⼀名迷 ⼈的公共演说家、教师和作家,他善于把复杂的问题变得简单⽽易于理解。作为⼀名顾问,他则帮助初创者认知、评估并指 引减⼩安全和业务风险。") = 90f039293e0b3da5516e251b93434795

Another feature of HASH is that as long as the input changes a little, the output will be completely different, as if the input is completely different.

For example, delete a single character, the "一" in "是一位" near the start of the text above, and then do the HASH again:

MD5 ("Andreas M. Antonopoulos 是位著名的技术学家和连续创业企业家,⽐特币界最著名和倍受尊敬的⼈物之⼀。⾝为⼀名迷 ⼈的公共演说家、教师和作家,他善于把复杂的问题变得简单⽽易于理解。作为⼀名顾问,他则帮助初创者认知、评估并指 引减⼩安全和业务⻛险。") = 159c9d192e45fbd2eaa0c3f068a78508

As you can see, a small change in the input results in a completely different output.
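To put a number on "completely different", here is a small sketch that counts how many of the 128 output bits change when one character of the input changes. The function names are invented for this example and UTF-8 encoding is assumed; for a well-behaved HASH you would expect roughly half of the bits to flip, though the exact count depends on the inputs.

```python
import hashlib

def md5_as_int(text: str) -> int:
    """MD5 digest of a UTF-8 string, as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Number of bit positions in which the two MD5 digests differ."""
    return bin(md5_as_int(a) ^ md5_as_int(b)).count("1")

# The two inputs differ only in the final punctuation mark.
changed = differing_bits("In Math We Trust.", "In Math We Trust!")
print(f"{changed} of 128 output bits differ")
```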

HASH is not picky about its input: a string or a file, text, an image or a video, as long as it is digital data it is treated as binary input.

For example, give the following image to MD5, you can get:

MD5 (Emily Blunt.jpg) = b818d284ef28f733c701f7bc1ee5f669

Figure | Emily Blunt, Britain's most popular actress

If you have the md5 tool on your computer, you can try hashing for yourself. For example, on a Mac, type md5 -s "xxx" in the Terminal to hash a string, or md5 1.txt to hash a file.

Several gigabytes of video files are no problem:

MD5 (伟大的转折-01.mp4) = 9093c85d13f79609978f52c48e19aa65

Figure | The 38-episode revolutionary historical drama "The Great Turning Point", Zunyi Conference Site

Even if you change any pixel of any frame in this video, the result calculated by MD5 will be completely different.

Therefore, to determine whether a file has been changed by someone, just calculate its HASH. As long as the HASH has not changed, the file has not been altered. (The premise is that the HASH algorithm is a sound one!)

By now you probably have a feel for HASH. In short, whatever digital input you give it, it gives you a fixed-length output, and each output is different.
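As a rough sketch of that file check: the code below hashes a file in chunks, so even a file of several gigabytes never has to fit in memory, and shows that appending a single byte changes the HASH. The function name file_md5 and the temporary file 1.txt are just for this demo.

```python
import hashlib
import os
import tempfile

def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file 1 MiB at a time and return the MD5 as hex."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a throwaway file: hash it, change it slightly, hash it again.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1.txt")
    with open(path, "wb") as f:
        f.write(b"hello world")
    before = file_md5(path)
    with open(path, "ab") as f:
        f.write(b"!")          # append one byte
    after = file_md5(path)

print(before)
print(after)
print("unchanged" if before == after else "file was modified")
```

Storing the first HASH somewhere safe and comparing later is all it takes to notice a modification.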

What is a good-looking HASH

A: A good-looking HASH is a HASH with many 0 bits in front. — guard sir

The HASH output is fairly evenly distributed. If you hash any 16 different inputs, the probability that the first bit is 0 is 1/2 (about 8 of them), the probability that the first two bits are 0 is 1/4 (about 4), the probability that the first 3 bits are 0 is 1/8 (about 2), and the probability that the first 4 bits are 0 is 1/16 (about 1).

Let's do an experiment to see if that's right. The following inputs were chosen arbitrarily:

1. MD5 ("ionnoouyd") = c78a3b60314d11fc9e739aef407989f5
2. MD5 ("njjiuhbh") = 9aee0690f6002392b2c6fc0d2224adb2
3. MD5 ("88990933") = b19bf0928fc649d99b1cdf02748ae88e
4. MD5 ("-sr&&fvbgt") = 24cc429a2636ac1b9092cf3f681bba09
5. MD5 ("区块链技术") = 649e6048a32c09299e1c952347ccac7e
6. MD5 ("hashcash is very good") = 8015b497b3a9bd6ca7ba1213a731b1a1
7. MD5 ("CC0-MIT") = cd08de7f0f219d8e13437c65974f9773
8. MD5 ("beihaimuchang") = 7cd9a24efce05bbceb01c9020d904294
9. MD5 ("wqqwrr2") = 1cb1797fb57add01523fbd6e86ca2b73
10. MD5 ("1123ed") = ae49266f10e08922780afeb664fd61dc
11. MD5 ("hello world") = 5eb63bbbe01eeed093cb22bb8f5acdc3
12. MD5 ("niahoa...") = 5d3e60365ff999e68a932da4619a129b
13. MD5 ("blockchain") = 5510a843bc1b7acb9507a5f71de51b98
14. MD5 ("随机发均出版后我给") = 18eb95457ec90f1c33fa5914579730d7
15. MD5 ("93002712") = 0abf0bd1dbb35366c56b26d157686f0f
16. MD5 ("13811031123") = 940ca3847eec1e99e716975bc7096c8d

As mentioned earlier, the MD5 output is 128 bits, and the values above are shown in hexadecimal. For example, the first byte of the first output is "c7" and the first byte of the second output is "9a"; in binary they are "11000111" and "10011010". Written this way, the 16 HASH outputs above are (the ellipsis stands for the omitted remaining bits):

HASH01  c7………:11000111……………………
HASH02  9a………:10011010……………………
HASH03  b1………:10110001……………………
HASH04  24………:00100100……………………
HASH05  64………:01100100……………………
HASH06  80………:10000000……………………
HASH07  cd………:11001101……………………
HASH08  7c………:01111100……………………
HASH09  1c………:00011100……………………
HASH10  ae………:10101110……………………
HASH11  5e………:01011110……………………
HASH12  5d………:01011101……………………
HASH13  55………:01010101……………………
HASH14  18………:00011000……………………
HASH15  0a………:00001010……………………
HASH16  94………:10010100……………………

You can count how many of these hashes have their first 1, 2, 3, and 4 bits equal to 0. Do the proportions come close to the probabilities above?
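If counting by eye is tedious, the tally can be sketched in a few lines. The 16 hex strings are copied from the list above; leading_zero_bits is a helper name made up for this example.

```python
# The 16 MD5 outputs from the experiment above, in order.
HASHES = [
    "c78a3b60314d11fc9e739aef407989f5", "9aee0690f6002392b2c6fc0d2224adb2",
    "b19bf0928fc649d99b1cdf02748ae88e", "24cc429a2636ac1b9092cf3f681bba09",
    "649e6048a32c09299e1c952347ccac7e", "8015b497b3a9bd6ca7ba1213a731b1a1",
    "cd08de7f0f219d8e13437c65974f9773", "7cd9a24efce05bbceb01c9020d904294",
    "1cb1797fb57add01523fbd6e86ca2b73", "ae49266f10e08922780afeb664fd61dc",
    "5eb63bbbe01eeed093cb22bb8f5acdc3", "5d3e60365ff999e68a932da4619a129b",
    "5510a843bc1b7acb9507a5f71de51b98", "18eb95457ec90f1c33fa5914579730d7",
    "0abf0bd1dbb35366c56b26d157686f0f", "940ca3847eec1e99e716975bc7096c8d",
]

def leading_zero_bits(hex_digest: str) -> int:
    """Count the 0 bits before the first 1 bit of a hex-encoded hash."""
    return len(hex_digest) * 4 - int(hex_digest, 16).bit_length()

# For k = 1..4, count the hashes whose first k bits are all 0.
counts = {k: sum(1 for h in HASHES if leading_zero_bits(h) >= k)
          for k in (1, 2, 3, 4)}
print(counts)  # {1: 9, 2: 4, 3: 3, 4: 1}
```

The expected counts for 16 samples are 8, 4, 2 and 1, so this small sample lands close.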

Of course, my favorite is the one whose first 4 bits are all 0, HASH15. It is the best-looking of the 16 hashes.

One like it turns up about once in every 16 HASHes, that is, once in 2^4 attempts.

So, how many attempts does it take, on average, to get a HASH whose first 20 bits are all 0?

2^20 times.

2^20 is 2 to the 20th power, which is 1,048,576! More than a million! So if you want to find it on your computer, it will take a while~
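Here is a rough sketch of such a search, scaled down to 12 leading zero bits so that it finishes in a blink (about 2^12 = 4096 attempts on average). It simply appends a counter to a fixed prefix and hashes until the result is good-looking enough. The prefix "hello" and the function names are arbitrary choices for this illustration.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the 0 bits before the first 1 bit of a raw digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()

def find_good_looking(prefix: str, zero_bits: int):
    """Try prefix+0, prefix+1, ... until the MD5 starts with enough 0 bits."""
    for nonce in count():
        candidate = f"{prefix}{nonce}".encode("utf-8")
        digest = hashlib.md5(candidate).digest()
        if leading_zero_bits(digest) >= zero_bits:
            return nonce, digest.hex()

nonce, digest_hex = find_good_looking("hello", 12)
print(f"found after {nonce + 1} attempts: MD5(hello{nonce}) = {digest_hex}")
```

Raising zero_bits to 20 makes the same loop run about 256 times longer on average.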

If the first 20 bits are 0, the first 5 hexadecimal digits are 0, so the HASH looks something like this:

00000fc7f1d91e9053995f707a90971d

Isn't it pretty?

At least it looks better than any HASH we have seen so far.

Of course, it's not the most beautiful. The HASH that is unparalleled in the universe is all 0s.

This all-zero HASH has probably never been computed. I wonder whether anyone will stumble on it by luck before mankind goes extinct.

In short, by good-looking I mean having many 0s in front.

The more 0s, the better.

What are the characteristics of a good HASH algorithm?

Note that there are many HASH algorithms, such as MD5, SHA-1 and the SHA-2 family, which includes the SHA-256 that BTC uses alongside RIPEMD-160.

A good one has at least two features:

1. For any two different inputs, it should produce different outputs. (This is called collision resistance.)

2. It is easy to compute forward, but very hard to work back from the output to the input; you can only guess by brute force. (This is called being irreversible, or one-way.)

First, look at the first feature: collision resistance

A good HASH algorithm should produce different outputs for any two different inputs. Strictly speaking, since the output has a fixed length and the input does not, collisions must exist; the requirement is that nobody can actually find one.

Facts have shown that MD5 and SHA-1 can no longer be considered good HASH algorithms, because Academician Wang Xiaoyun and others have been able to find two different inputs that produce the same HASH output. See related articles 1 and 2 for details.

How to visualize this feature?

First of all, the output of the HASH algorithm must be long enough. If it is too short, there will be repetitions. For example, suppose a HASH algorithm has only 1 bit of output, so the output is either 0 or 1. Won't you find different inputs with the same output after running it two or three times? If the output is only two bits, there are only 4 possible outputs: 00, 01, 10, 11. Won't you find different inputs with the same output after running it four or five times?

So MD5 is as long as 128 bits, and SHA-256 is as long as 256 bits!
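To see why length matters, here is a sketch that pretends MD5 only had 1, 2 or 16 output bits by throwing the rest away, and then hashes "0", "1", "2", ... until two inputs collide. The truncation is purely an illustration device, and the function names are made up; with n bits there are only 2^n possible outputs, so a collision is guaranteed within 2^n + 1 inputs.

```python
import hashlib

def truncated_md5(data: bytes, bits: int) -> int:
    """Keep only the first `bits` bits of the 128-bit MD5 digest."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - bits)

def find_collision(bits: int):
    """Hash b"0", b"1", b"2", ... until two inputs share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        data = str(counter).encode("ascii")
        short = truncated_md5(data, bits)
        if short in seen:
            return seen[short], data, short
        seen[short] = data
        counter += 1

for bits in (1, 2, 16):
    first, second, short = find_collision(bits)
    print(f"{bits:2d} bits: {first!r} and {second!r} collide on {short:0{bits}b}")
```

Each extra output bit doubles the number of possible outputs, which is why 128 or 256 bits make an accidental collision so unlikely.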

Secondly, it is necessary to ensure that the HASH value is very random and evenly falls on the entire output space. And if the input is a little different, the output will be completely different.

In this way, a HASH works like an ID: it is a different "fingerprint" for each input. For example, if you hash 10 million different files and each file gets a unique HASH value, you can use that value as the file's ID. Each ID refers to a unique file. If the same ID is encountered again, it means the same file has been encountered.

Once more, keep in mind that the output space is very large.

For an algorithm like SHA-256, which has 256 output bits, there are 2^256 possible values, roughly 10^77. How big is that?

Someone has estimated the number of grains of sand on Earth (all of it, not just the beaches) at roughly 7*10^21, less than 10^22.

The number of planets in the entire universe is roughly no more than 7*10^22, less than 10^23.

Give the algorithm two different inputs, and the probability that it produces the same output is much smaller than in this example:

You randomly choose a grain of sand on some planet, and another person, completely independently of you, also randomly chooses a grain of sand on some planet, and you both choose the same grain! And then, by coincidence, you both also choose the same atom in that grain!
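Python's integers have no size limit, so the comparison can be checked directly. This is only back-of-the-envelope arithmetic using the article's own estimates, and it assumes, purely for illustration, that every planet holds as much sand as Earth.

```python
outputs = 2 ** 256                  # number of possible 256-bit HASH values
print(outputs)
print(len(str(outputs)), "decimal digits")  # 78 digits, i.e. about 1.16 * 10**77

sand_per_planet = 7 * 10 ** 21      # the article's estimate for Earth
planets = 7 * 10 ** 22              # the article's estimate for the universe
grains_in_universe = sand_per_planet * planets

# How many HASH values there are for every single grain of sand.
print(outputs // grains_in_universe)
```

Even after two people agree on the same grain out of all of them, there is still a factor of more than 10^32 left over to cover the choice of atom.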

Now look at the second feature: irreversible

A good HASH algorithm requires that the forward calculation is fast, but the reverse is almost impossible.

For example, I now have a BTC private key, which looks like:

5HvrDrdQ9EpJTcJHXuctU9vUjydzuZ1????????????????DCHa

For secrecy, I replaced 16 characters with ?.

The MD5 value of this private key is: 630a0cec43d49095027b224ea0f2b317

So, do all the hackers in the world together have the ability to recover my private key by cracking this MD5?

The answer is: with current capabilities, no.

All a hacker can do is keep trying every possible combination of the 16 characters hidden behind the ? marks, calculating the MD5 of each, in the hope of one day hitting the same MD5 value.

But this kind of brute-force guessing takes a long time. (As a rough estimate, with current technology, if hackers from all over the world united and had as much computing power as global BTC mining, it would take at least 5 years.)
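Here is a toy version of that guessing game, shrunk so that it actually finishes: only 3 characters are hidden and they come from a 26-letter alphabet. The secret "hash-cat-key" and all names in the code are hypothetical. The last line prints the size of the real search space, assuming the 16 hidden characters are drawn from the 58-character Base58 alphabet that this style of BTC private key uses.

```python
import hashlib
from itertools import product

TOY_ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # toy alphabet, not Base58

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def brute_force(masked: str, target_md5: str, alphabet: str = TOY_ALPHABET):
    """Try every filling of the '?' positions until the MD5 matches."""
    parts = masked.split("?")
    for combo in product(alphabet, repeat=len(parts) - 1):
        candidate = parts[0] + "".join(c + p for c, p in zip(combo, parts[1:]))
        if md5_hex(candidate) == target_md5:
            return candidate
    return None

secret = "hash-cat-key"                      # hypothetical secret
target = md5_hex(secret)                     # what the attacker gets to see
recovered = brute_force("hash-???-key", target)

print(recovered)
print(len(TOY_ALPHABET) ** 3, "candidates at most for the toy")
print(58 ** 16, "candidates for 16 hidden Base58 characters")
```

Three hidden letters mean at most 17,576 guesses; sixteen hidden Base58 characters mean more than 10^28.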

So what is mining doing?

At 18:09:30 on June 13, 2021, BTC miners dug a HASH:

00000000000000000009813c8a3b95e3a75d878419547b7fe4dd71f9dc71da72

Look how beautiful this HASH is! How many zeros are in front of it!

The above is represented in hexadecimal, so each 0 is actually 0000 in binary, so the HASH above is preceded by 19*4=76 binary 0s!

This HASH is the HASH of a block (strictly speaking, the HASH of the block header).
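To make "the HASH of the block header" concrete, here is a sketch that rebuilds an 80-byte BTC block header and hashes it the way BTC does: SHA-256 applied twice, with the result displayed in reversed byte order. The article does not give the header fields of the June 2021 block, so the sketch uses the widely published header fields of the very first BTC block instead; treat those values, and the function names, as assumptions of this example. It then counts the leading 0 bits of both that HASH and the one quoted above.

```python
import hashlib
import struct

def leading_zero_bits(hex_digest: str) -> int:
    """Count the 0 bits before the first 1 bit of a hex-encoded hash."""
    return len(hex_digest) * 4 - int(hex_digest, 16).bit_length()

def block_header_hash(version, prev_hash_hex, merkle_root_hex,
                      timestamp, bits, nonce) -> str:
    """Double SHA-256 of the 80-byte header, shown in the usual byte order."""
    header = struct.pack(
        "<I32s32sIII",                        # all fields are little-endian
        version,
        bytes.fromhex(prev_hash_hex)[::-1],   # hashes are stored byte-reversed
        bytes.fromhex(merkle_root_hex)[::-1],
        timestamp, bits, nonce,
    )
    assert len(header) == 80
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()

# Header fields of the first BTC block (assumed values, see the note above).
first_block_hash = block_header_hash(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)
article_hash = "00000000000000000009813c8a3b95e3a75d878419547b7fe4dd71f9dc71da72"

print(first_block_hash, leading_zero_bits(first_block_hash))
print(article_hash, leading_zero_bits(article_hash))  # 76 leading zero bits
```

The nonce is the field miners keep changing; every other field stays put while they search.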

Lately, one this pretty turns up about every 10 minutes.

How many attempts did the miners need to produce such a beautiful HASH?

I don't know exactly. But on average a HASH like this appears only once in 2^76 attempts, so you can imagine how many they made.

Why do they work so hard to dig this up?

For the BTC reward that comes with the block.

The miner who mined this block (possibly many people mining together) received 6.54164549 BTC, of which 6.25 was the system reward and the rest was transaction fees.

OK, I roughly understand HASH now. But what is a block?

Now let me tell you how the ancestor of all blockchains, BTC, works.

If you don't understand it the first time, read it again, and if you still don't understand, read it aloud a third time.

By then it should generally make sense.

1. A number of nodes on the Internet (nodes are just computers!) run the BTC software at the same time. The software is open source, and anyone can download and run it (the nodes mentioned in this article are full nodes, of which there are currently about 1,000 in the world).

2. When someone wants to make a BTC transfer (that is, a transaction), a node broadcasts the transfer information on the Internet, and the transaction quickly spreads to every node in the network. Each node checks whether the transaction it receives is valid (for example, whether the sender really has that much money to transfer), and discards it if it is not.

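A transfer is something like: Zhang San sends 1 BTC to Li Si, Wang Wu sends 0.1 BTC to Zhao Liu, and so on. That's roughly the idea; each transfer is also called a transaction.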

3. At regular intervals (10 minutes on average), one node in the network is the first to package a block, which contains the transaction data from that period, and this block is broadcast to the entire network. All nodes receive the block (about 1 MB in size).

3a: Packaging comes with a condition: the HASH of the block's header must be good-looking.

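3b: A block appears only once every 10 minutes on average because of the mining difficulty. The difficulty is adjusted dynamically, once every two weeks, so that the whole network mines one block every 10 minutes on average.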

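How is the difficulty adjusted in a completely decentralized network? Like this: every 2016 blocks, all nodes adjust the difficulty on their own (it is written into the code). The new difficulty is obtained by comparing the time the most recent 2016 blocks took with 20160 minutes (two weeks). If they took less than 20160 minutes, the difficulty is raised a little; if they took longer, it is lowered a little. Difficulty can be understood simply as how many 0s are required at the front of the HASH.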

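3c: In every 10-minute window, many transactions take place around the world, all waiting to be packaged (in the end, only transactions packaged into a block are recognized). Each miner trying to package a block puts the not-yet-packaged transactions it has received into a block (because of the block size limit, a block holds a little over 2,000 transactions on average), then fills in the nonce field of the block and computes the HASH again and again, hoping to find a good-looking HASH (one that meets the difficulty counts as good-looking). Finding one means the packaging has succeeded (that is, the mining has succeeded), and the miner broadcasts the block right away.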

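3d: Each time a broadcast block arrives, every node checks whether it meets the requirements for a block (at the very least, the HASH has to be good-looking!). If it does, the node saves the block and starts trying to produce the next one (that is, it keeps mining), packaging the transactions it has received but that are not yet in any block into the next block.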
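The difficulty adjustment described in 3b can be sketched as a single proportional rule. This is a simplified model, not BTC's actual code: the real software adjusts a 256-bit target number with integer arithmetic rather than a floating-point "difficulty". The factor-of-4 limit per adjustment is part of BTC's rules but is not mentioned in this article, and the function name retarget is made up for the example.

```python
TARGET_MINUTES = 2016 * 10  # 2016 blocks at 10 minutes each = 20160 minutes

def retarget(old_difficulty: float, actual_minutes: float) -> float:
    """Scale the difficulty so the next 2016 blocks take about two weeks."""
    # Limit a single adjustment to a factor of 4 in either direction.
    clamped = min(max(actual_minutes, TARGET_MINUTES / 4), TARGET_MINUTES * 4)
    return old_difficulty * TARGET_MINUTES / clamped

print(retarget(100.0, 20160))  # on schedule: stays at 100.0
print(retarget(100.0, 10080))  # blocks came twice as fast: doubles to 200.0
print(retarget(100.0, 40320))  # blocks came half as fast: halves to 50.0
```

Because every node applies the same rule to the same block timestamps, they all arrive at the same new difficulty without anyone coordinating them.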

4. All nodes race to package blocks, because whoever packages one correctly is rewarded with BTC.

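4a: For each block successfully produced, the packager is rewarded with a number of BTC. At the very beginning the reward was 50; it halves every 210,000 blocks (about 4 years), so it later became 25, then 12.5, and is now 6.25. By 2140 there will be no coins left to mine.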

4b: In fact, besides the system reward, miners also receive the transaction fees in the block.
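The halving schedule in 4a fits in a few lines. This sketch counts in the smallest BTC unit (100,000,000 per BTC) so that halving is an exact integer shift; the function and constant names are chosen for this example. The last lines reuse the 6.54164549 BTC payout quoted earlier to split it into reward and fees.

```python
COIN = 100_000_000          # smallest units per BTC
HALVING_INTERVAL = 210_000  # blocks between halvings

def block_subsidy(height: int) -> int:
    """System reward (in smallest units) for the block at this height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:      # the reward has long since shifted down to 0
        return 0
    return (50 * COIN) >> halvings

for height in (0, 210_000, 420_000, 630_000):
    print(height, block_subsidy(height) / COIN)   # 50.0, 25.0, 12.5, 6.25

# Everything that will ever be issued: each era's reward times 210,000 blocks.
total = sum(block_subsidy(era * HALVING_INTERVAL) * HALVING_INTERVAL
            for era in range(64))
print(total / COIN)         # just under 21 million BTC

# The block quoted earlier paid 6.54164549 BTC in total.
fees = 654_164_549 - block_subsidy(630_000)
print(fees / COIN)          # the part that came from transaction fees
```

Once the shift reaches zero there is no system reward left, and miners live on fees alone.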

5. After a node receives the block and verifies it successfully, it accepts the block and appends it to its own saved copy of the blockchain.

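5a: Because every node keeps synchronizing with the others, every node (full node) stores the data from the first block to the current one (nearly 700,000 blocks have been produced so far).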

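5b: A block's header contains the merkle root of all the transactions in the block body (which is itself a kind of HASH computation). If a transaction in the block body is tampered with, recomputing the merkle root and comparing it with the one in the header reveals the tampering.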

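5c: A block's header also contains the HASH of the previous block's header (the block's HASH for short). This makes it possible to verify that the previous block is complete and correct (with any error in the data, the HASH would no longer be good-looking). You can trace back from the latest block all the way to the genesis block (the first block) to make sure no data has been altered.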

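5d: Across the whole blockchain, the slightest tampering with any block makes that block's HASH no longer good-looking, and the HASHes of all the blocks after it no longer good-looking either, so anyone with sharp eyes will know the data is wrong. And every good-looking HASH was painstakingly dug out by miners around the world, running many mining machines and paying many electricity bills. A person or organization trying to tamper with a block does not have that much power.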
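Points 5b to 5d can be tried out on a toy chain. The sketch below is heavily simplified and every name in it is made up for illustration: the merkle root uses single SHA-256 over plain text transactions, the header is just a small JSON object, and there is no mining, so it only shows the linking and the tamper detection, not the cost of redoing the work.

```python
import copy
import hashlib
import json

FIRST_PREV = "00" * 32  # placeholder "previous hash" for the first block

def merkle_root(transactions) -> str:
    """Pairwise-hash the transaction hashes until one root is left."""
    level = [hashlib.sha256(tx.encode("utf-8")).digest() for tx in transactions]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:          # odd count: pair the last hash with itself
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

def header_hash(header: dict) -> str:
    return hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()

def make_block(prev_hash: str, transactions) -> dict:
    header = {"prev_hash": prev_hash, "merkle_root": merkle_root(transactions)}
    return {"header": header, "transactions": list(transactions)}

def verify_chain(chain) -> bool:
    """Check every merkle root and every link back to the first block."""
    expected_prev = FIRST_PREV
    for block in chain:
        header = block["header"]
        if header["prev_hash"] != expected_prev:
            return False
        if header["merkle_root"] != merkle_root(block["transactions"]):
            return False
        expected_prev = header_hash(header)
    return True

chain, prev = [], FIRST_PREV
for txs in (["Zhang San -> Li Si: 1 BTC"],
            ["Wang Wu -> Zhao Liu: 0.1 BTC", "Li Si -> Wang Wu: 0.5 BTC"],
            ["Zhao Liu -> Zhang San: 0.2 BTC"]):
    block = make_block(prev, txs)
    chain.append(block)
    prev = header_hash(block["header"])

tampered = copy.deepcopy(chain)                 # change one old transaction
tampered[0]["transactions"][0] = "Zhang San -> Li Si: 100 BTC"
patched = copy.deepcopy(tampered)               # also fix up its merkle root
patched[0]["header"]["merkle_root"] = merkle_root(patched[0]["transactions"])

print(verify_chain(chain), verify_chain(tampered), verify_chain(patched))
# True False False
```

Patching the merkle root hides the change inside the first block, but it changes that block's HASH, so the link stored in the second block no longer matches. In real BTC the forger would then have to re-mine that block and every block after it.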

Epilogue

So, do you now understand HASH, mining, and why the blockchain is so tamper-proof?

Let me briefly summarize:

HASH is easy to compute but hard to reverse; if the input changes a little, the output changes a lot; with a good HASH algorithm, different inputs give different outputs for all practical purposes, so there is no need to worry about collisions; a HASH value can be treated as an ID; and by comparing HASHes, you can tell whether the input has been changed by someone.

The purpose of mining is to find a good-looking HASH for a block, that is, a HASH whose first bits are all 0; no matter how many nodes are mining, the difficulty adjusts automatically and simply, so that it takes about 10 minutes to find a good-looking HASH; each block's good-looking HASH is placed in the header of the next block; each full node records the data of all blocks and can easily verify that no block has been changed; and if someone wants to change the data in a historical block, it will take an enormous amount of effort!

But that's only half of it. There are still things you don't understand, and they are beyond what this article set out to describe.

If you are interested in other things, please leave a message.

Text | Wei Jianfan


  1. https://www.zhihu.com/question/19743262/answer/289095984 

  2. https://www.zhihu.com/question/56234281/answer/148349930


Origin blog.csdn.net/vigor2323/article/details/117888051