I have recently been studying zero-knowledge proofs. Because the subject is large and difficult, I plan to write a series of summaries that follow my own learning route. This is the first article. It introduces some important concepts and ideas behind zero-knowledge proofs to build an intuitive understanding, and then explains a classic and concise zero-knowledge proof protocol: the Schnorr protocol.

This article covers four main topics. First, we introduce zero-knowledge proofs in general; then we use the map three-coloring problem as an example to illustrate the idea; and then we analyze the interactive Schnorr protocol and the non-interactive Schnorr protocol. A final section shows how the Schnorr protocol is used for digital signatures.

1. An overview of zero-knowledge proofs

·Introduction

Zero-knowledge proofs were proposed by S. Goldwasser, S. Micali and C. Rackoff in the early 1980s. In a zero-knowledge proof, the prover convinces the verifier that a statement is true without giving the verifier any other useful information. A zero-knowledge proof is essentially a protocol: a series of steps that two or more parties take together to complete a task.

·Concepts

"P" stands for "Prover": the participant who proves that a proposition is true without revealing any information about the underlying secret.

"V" stands for "Verifier": the other participant, who checks whether the proposition put forward by the prover and the corresponding proof are correct.

"Commitment phase (Commit)": The prover makes a commitment related to the proposition and waits for the verifier to challenge and verify it.

"Challenge phase (Challenge)": The verifier chooses a random number to challenge the commitment.

"Response phase (Response)": The prover computes a response from the random number received and the commitment already made, and returns it to the verifier.

"Verification phase (Verify)": The verifier checks whether the response is correct. If it is wrong, the proof fails. If it is correct, the verifier can issue another challenge, repeating until the probability that the prover is honest reaches a level the verifier accepts, at which point the proof succeeds.

·Properties

In zero-knowledge proofs, three properties need to be satisfied:

Soundness (sometimes translated as "correctness"). Nobody can impersonate P and make the proof succeed. In other words, if P does not actually know the "knowledge", then whatever P does, V will be convinced only with negligible probability.

Completeness. If both P and V are honest and every step of the proof is computed correctly, the proof succeeds. That is, if P does know the "knowledge", V will accept with overwhelming probability.

Zero knowledge. After the proof has run, V learns only that "P has this knowledge" and obtains no information about the knowledge itself.

·Applications

Data privacy protection: In privacy scenarios, a zero-knowledge proof can show that an asset transfer on the blockchain is valid without revealing details such as the receiver, the sender, or the balances involved. Another example is buying insurance: the insurance company needs to know whether I have a certain disease, but I do not want it to see my whole medical record, so I prove to the company that I do not have the disease in question without disclosing anything else.

Computational compression and blockchain scaling: In current blockchain architectures the same computations are repeated many times, such as signature verification, transaction validity checks, and smart contract execution. These computations can be compressed with zero-knowledge proof technology. For example, zk-SNARK-based scaling schemes for Ethereum are reported to improve performance by dozens of times.

End-to-end communication encryption: Users can exchange messages without the message history being fully exposed on the server. At the same time, a message can carry zero-knowledge proofs that the server requires, for example about its source and destination.

Identity authentication: The user can prove to a website that he owns a private key without the website learning the key. The website confirms the user's identity by verifying this zero-knowledge proof.

Decentralized storage: Servers can prove to users that their data is stored properly without revealing the content of the data.

2. Example: Map Three Coloring Problem

·Map three coloring problem

Three-coloring problem: Suppose there is a map in which some pairs of cities are connected by roads. The question is whether each city can be assigned one of three fixed colors so that no two cities joined by a road have the same color. (In graph terms, the cities are vertices and the roads are edges.)

Let's design an interactive protocol: Alice is the "prover" and Bob is the "verifier"

Alice has a three-coloring for a particular map and wants to prove to Bob that she has it without revealing any information about the coloring itself.

1. Commitment Phase

In the commitment phase, Alice first applies a "transformation" to her colored graph by permuting the colors, for example changing every blue to green, every green to orange, and every orange to blue. This gives Alice a new valid coloring. She then covers each vertex of the new graph with a piece of paper and shows the covered graph to Bob.

2. Challenge Phase

Next comes the challenge phase. Bob wants to test whether Alice really knows the answer, but he is not allowed to uncover all the vertices. He may only pick one edge at random and ask Alice to uncover the two vertices it joins, so that he can check that their colors are different.

3. Response to Challenge Phase

Then comes the response phase. Suppose Bob picks a particular edge. Alice uncovers the two vertices Bob specified as her response to the challenge. Bob checks them, and if the two colors are different he considers this check passed. But Bob has seen only a small part of the graph, so a single challenge is not enough to convince him. On the other hand, if Bob were allowed many challenges against the same covered graph, he could learn Alice's entire coloring: in the extreme case he checks the two endpoints of every edge and reconstructs the coloring completely.

4. Repeat the process

Therefore the three phases above are repeated many times, and in each commitment phase Alice applies a fresh random permutation of the colors. As a result, each time Bob verifies, he learns only whether the two adjacent vertices he chose have different colors. With enough repetitions, Bob believes with high probability that Alice has a valid coloring. But every local view Bob gets is the result of a new transformation by Alice, so no matter how many rounds he sees, he cannot piece together a complete three-coloring. In this process Bob obtains a lot of "information" but no real "knowledge".
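The repeated rounds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-city map, the helper names (commit, run_round, escape_bound) and the color names are made up for the example, and the paper covers are modelled simply by Bob reading only the two vertices he asks for. The bound in escape_bound follows from the fact that a coloring with at least one bad edge is caught with probability at least 1/|E| in each round.

```python
import random

COLORS = ["blue", "green", "orange"]

def commit(coloring, rng):
    """Alice's commitment: relabel the colors with a fresh random permutation."""
    perm = dict(zip(COLORS, rng.sample(COLORS, 3)))
    return {city: perm[color] for city, color in coloring.items()}

def run_round(coloring, edges, rng, challenge=None):
    """One commit / challenge / response / verify round; returns Bob's verdict."""
    hidden = commit(coloring, rng)                 # commitment phase
    u, v = challenge if challenge is not None else rng.choice(edges)  # challenge
    return hidden[u] != hidden[v]                  # response + verification

def escape_bound(num_edges, rounds):
    """Upper bound on the chance that a coloring with a bad edge passes every round."""
    return (1 - 1 / num_edges) ** rounds

# Hypothetical map: four cities, four roads.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
good = {"A": "blue", "B": "green", "C": "orange", "D": "blue"}
bad = dict(good, D="orange")                       # C and D now clash

rng = random.Random(0)
print(all(run_round(good, edges, rng) for _ in range(100)))   # honest Alice
print(run_round(bad, edges, rng, challenge=("C", "D")))       # cheater caught on the bad edge
print(escape_bound(len(edges), 50))
```

Because the permutation is redrawn every round, the pair of colors Bob sees on an honest edge is just a random pair of distinct colors, which is why the views cannot be stitched together.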

This example provides an intuitive understanding of zero-knowledge proofs. Next, we introduce a concise and versatile zero-knowledge proof system - the Schnorr protocol.

3. Interactive Schnorr Protocol

The Schnorr protocol is a zero-knowledge proof mechanism based on the discrete logarithm problem. The prover claims to know a secret value x, and by running the Schnorr protocol can prove knowledge of x to the verifier without revealing x. It can be used, for example, to prove that you hold a private key without disclosing the key itself.

The original Schnorr mechanism was an interactive mechanism. The techniques involved in Schnorr are the properties of hash functions and the discrete logarithm problem of elliptic curves.

(The discrete logarithm problem of the elliptic curve means that, knowing the elliptic curve E and point G, and randomly selecting an integer d, it is easy to calculate another point on the elliptic curve Q=d*G, but given Q and G to calculate d is very difficult.)

Suppose Alice has a secret number a. We take this number as the private key sk and "map" it to the point a*G in the elliptic curve group, written aG for short. We call this point the public key PK.

The Schnorr protocol makes full use of the one-way mapping between finite fields and cyclic groups to implement a concise zero-knowledge proof protocol in which Alice proves to Bob that she holds the private key sk corresponding to PK. So how does the proof work?

The flow of the interactive Schnorr protocol is divided into three steps:

Step 1: To ensure zero knowledge, Alice first generates a random number r. This random number protects the private key from being extracted by Bob. It is mapped to the point r*G in the elliptic curve group, denoted R, and R is sent to Bob.

Step 2: Bob provides a random number as the challenge; call it c.

Step 3: Alice computes z = r + a*c from the challenge c and sends z to Bob, who checks it with the formula: z*G ?= R + c*PK

Since z = r + c*sk = r + c*a, multiplying both sides of the equation by the generator G gives: z*G = r*G + c*(a*G) = R + c*PK. This check convinces Bob that Alice does hold the private key sk, yet Bob cannot obtain the value of sk, so the process is zero-knowledge and interactive.

Because of the discrete logarithm problem on the elliptic curve, it is computationally infeasible to solve R = r*G for r given R and G, so r stays private, and r in turn hides a inside z = r + a*c.
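The three steps can be traced with a small, self-contained Python sketch. It assumes the secp256k1 curve (the article does not name a curve), Python 3.8+ for modular inverses via pow(x, -1, P), and two illustrative helpers, add and mul, written here only for the example. It is not constant-time and is not meant for production use.

```python
import secrets

# secp256k1 domain parameters (an assumed choice; the article names no curve).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, pt):
    """Scalar multiplication k*pt by double-and-add."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

# Key pair: sk = a, PK = a*G.
a = secrets.randbelow(N - 1) + 1
PK = mul(a, G)

# Step 1 (commit): Alice picks a fresh random r and sends R = r*G.
r = secrets.randbelow(N - 1) + 1
R = mul(r, G)

# Step 2 (challenge): Bob replies with a random c.
c = secrets.randbelow(N - 1) + 1

# Step 3 (response): Alice sends z = r + a*c (mod the group order N).
z = (r + a * c) % N

# Verification: Bob checks z*G ?= R + c*PK.
accepted = mul(z, G) == add(R, mul(c, PK))
print(accepted)
```

The scalars are reduced modulo the group order N, which the formulas in the text leave implicit.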

However, the whole process must run over a private channel between the prover and the verifier. Because the protocol is interactive, a proof is only convincing to the verifier who took part in the interaction. Other verifiers who did not take part cannot tell whether the prover and that verifier colluded to fake the transcript. In addition, if two verifiers collude and exchange the values they received, and Alice answered two different challenges for the same commitment R, the private key can be derived. Therefore the interactive protocol cannot be publicly verified.
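To see how exchanged values can expose the key, here is a minimal sketch of the case where the same r was answered under two different challenges. Subtracting z1 = r + a*c1 and z2 = r + a*c2 cancels r, so a = (z1 - z2) / (c1 - c2) modulo the group order. Only scalar arithmetic is needed; the modulus N below is the secp256k1 group order, used as an assumed example.

```python
import secrets

# Group order of secp256k1, used here only as an example modulus (assumption).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

a = secrets.randbelow(N - 1) + 1      # Alice's private key
r = secrets.randbelow(N - 1) + 1      # a nonce that is (wrongly) used twice

# Two verifiers send different challenges for the same commitment R = r*G.
c1 = secrets.randbelow(N - 1) + 1
c2 = secrets.randbelow(N - 1) + 1
while c2 == c1:
    c2 = secrets.randbelow(N - 1) + 1

z1 = (r + a * c1) % N
z2 = (r + a * c2) % N

# Colluding verifiers combine their transcripts: z1 - z2 = a*(c1 - c2) mod N.
recovered = (z1 - z2) * pow((c1 - c2) % N, -1, N) % N
print(recovered == a)
```

This is why r must be fresh and secret for every run of the protocol.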

Going further: why does the verifier need to reply with a random number c? The answer is that it prevents Alice from cheating.

If Bob does not reply with a c, the protocol becomes a one-way, single-message interaction in which Bob simply checks z*G ?= PK + R. Because of the discrete logarithm problem on elliptic curves, it is infeasible to recover a from PK = a*G given PK and G, so the privacy of a is still guaranteed.

But this scheme has a problem. R and z are both produced by Alice herself, and she knows that Bob will add PK and R and compare the result with z*G. So without knowing a she can construct R = r*G - PK and z = r. Bob's check then becomes: z*G ?= PK + R ==> r*G ?= PK + r*G - PK. This always holds, so the scheme fails the soundness (correctness) property: someone who does not know the private key can still pass verification.
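The same trick works for any challenge that Alice can predict, not only for the missing-challenge case. As a short worked derivation (using the article's symbols, with z chosen freely by the cheater):

```latex
\begin{aligned}
&\text{Alice knows } c \text{ in advance, picks any } z, \text{ and sets } R = z \cdot G - c \cdot PK. \\
&\text{Bob's check: } R + c \cdot PK = (z \cdot G - c \cdot PK) + c \cdot PK = z \cdot G.
\end{aligned}
```

Setting c = 1 and z = r gives exactly the forgery above. The check passes without a ever being used, which is why c must be unpredictable at the moment R is fixed.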

Therefore, the risk of private-key leakage, together with the lack of public verifiability, means the interactive Schnorr protocol cannot be used in a public setting.

This problem can be solved by turning the original interactive protocol into a non-interactive protocol!

Let's see how to turn the three-step Schnorr protocol into a single step.

4. Non-interactive Schnorr Protocol

Recall the second step of the interactive Schnorr protocol, in which Bob supplies a random challenge c. We can instead let Alice compute the challenge herself with the following formula, which removes the second step of the protocol:

c = Hash(PK, R)

where R is the elliptic curve point that Alice sends to Bob, and PK is the public key.

This formula achieves two purposes:

First, Alice cannot predict c before she generates the commitment R, even though c is ultimately computed by Alice herself.

Second, c is the output of a hash function, so it is approximately uniformly distributed over a range of integers and can serve as the random challenge.

The hash function is "one-way", so although c is computed by Alice, she cannot cheat by choosing a convenient c: as soon as Alice generates R, c is effectively fixed.

In this way, the three-step Schnorr protocol is combined into a single step using the hash function. Alice could send (R, c, z) directly, but because Bob has Alice's public key PK he can compute c himself, so Alice only needs to send (R, z). Bob then verifies z*G ?= R + c*PK.

Alice: selects r uniformly at random and computes, in order, R = r*G, c = Hash(PK, R), z = r + c*sk

Alice: outputs the proof (R, z)

Bob (or any verifier): computes c = Hash(PK, R)

Bob (or any verifier): verifies z*G ?= R + c*PK
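The flow above can be sketched as a pair of functions. As before, this is an illustration under assumptions rather than a reference implementation: the curve is secp256k1, the hash is SHA-256 over the big-endian point coordinates (the article does not specify an encoding), and the names add, mul, challenge, prove and verify are made up for the example. Python 3.8+ is assumed.

```python
import hashlib
import secrets

# secp256k1 domain parameters (an assumed choice; the article names no curve).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, pt):
    """Scalar multiplication k*pt by double-and-add."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

def challenge(PK, R):
    """c = Hash(PK, R); SHA-256 and this byte encoding are assumptions."""
    data = b"".join(v.to_bytes(32, "big") for v in (*PK, *R))
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def prove(sk):
    """Alice: pick r, compute R = r*G, c = Hash(PK, R), z = r + c*sk."""
    r = secrets.randbelow(N - 1) + 1
    R = mul(r, G)
    c = challenge(mul(sk, G), R)
    return R, (r + c * sk) % N

def verify(PK, proof):
    """Bob (or anyone): recompute c and check z*G == R + c*PK."""
    R, z = proof
    return mul(z, G) == add(R, mul(challenge(PK, R), PK))

sk = secrets.randbelow(N - 1) + 1
PK = mul(sk, G)
proof = prove(sk)
print(verify(PK, proof))
```

Since verify needs only PK and the proof, anyone can run it, which is the public verifiability the interactive version lacked.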

5. Schnorr for digital signatures

The Schnorr protocol can be used for digital signatures.

First, to ensure that an attacker cannot forge signatures at will, the scheme relies on two security assumptions: the hardness of the discrete logarithm problem, and the second-preimage resistance (collision resistance) of the hash function.

Digital signatures have two goals:

First, the receiver wants to confirm that the message has not been tampered with in transit;

Second, the receiver wants to confirm the identity of the sender, which can be understood as showing that the sender holds a private key and that this private key is bound to the message.

The identity of the sender must be proven first, and this is exactly what the Schnorr protocol does: it proves to the counterparty the statement "I own the private key". The proof is zero-knowledge and reveals nothing about the private key. Computing the challenge as c = Hash(m, R), where m is the message, then binds the sender's proof to the message.

This gives the Schnorr signature scheme. There is also an optimization: what Alice sends to Bob is not (R, z) but (c, z), because R can be recomputed from c and z as R = z*G - c*PK. Bob then accepts if Hash(m, R) equals c.

To see why this is an optimization, assume a finite field of size very close to 2^256 is used, so that z is 256 bits and the order of the elliptic curve group is also close to 2^256. The best known generic attacks on the discrete logarithm take about the square root of the group order, that is 2^128 steps, so a 256-bit elliptic curve group offers only 128 bits of security. The challenge c therefore needs only 128 bits. Sending c (128 bits) is thus more space-efficient than sending R, which needs at least 256 bits. The signature (c, z) takes 128 + 256 = 384 bits, whereas an ECDSA signature takes 2 * 256 = 512 bits, so the Schnorr signature saves 1/4 of the space.
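A compact sketch of the (c, z) variant follows. The assumptions are the same as for the earlier sketches in spirit: secp256k1 as the curve, SHA-256 truncated to 128 bits as the hash, a made-up byte encoding for Hash(m, R), and illustrative function names (add, mul, neg, challenge, sign, verify). Deployed Schnorr signature standards differ in details such as what is hashed, so this should be read as a toy model of the idea in the text, not as any particular standard.

```python
import hashlib
import secrets

# secp256k1 domain parameters (an assumed choice; the article names no curve).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, pt):
    """Scalar multiplication k*pt by double-and-add."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

def neg(pt):
    """Negate a point: (x, y) -> (x, -y)."""
    return None if pt is None else (pt[0], -pt[1] % P)

def challenge(message, R):
    """c = Hash(m, R), truncated to 128 bits (hash and encoding are assumptions)."""
    data = message + R[0].to_bytes(32, "big") + R[1].to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")

def sign(sk, message):
    """Return the short signature (c, z) instead of (R, z)."""
    r = secrets.randbelow(N - 1) + 1
    c = challenge(message, mul(r, G))
    return c, (r + c * sk) % N

def verify(PK, message, sig):
    """Recover R = z*G - c*PK, then check that Hash(m, R) equals c."""
    c, z = sig
    R = add(mul(z, G), neg(mul(c, PK)))
    return R is not None and challenge(message, R) == c

sk = secrets.randbelow(N - 1) + 1
PK = mul(sk, G)
sig = sign(sk, b"pay Bob 1 coin")
print(verify(PK, b"pay Bob 1 coin", sig))
print(sig[0].bit_length() <= 128, sig[1].bit_length() <= 256)
```

The two printed size checks correspond to the 128 + 256 = 384-bit signature discussed above.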

In-depth zero-knowledge proof (1): Schnorr protocol