Why should we pay attention to the benchmarks of each ZKP solution?

1 Introduction

Recently, there has been considerable debate among researchers and engineers about who has the best proof system.


However, before getting into the details of any benchmark, there need to be clearer criteria for what makes a system more performant or useful from an engineering standpoint. In addition, performance and applicability sometimes depend on the application scenario, as Zac Williamson pointed out:

  • SNARKs may be more advantageous in client-side proofs.

Currently, there are three major public and widely discussed strategies for performance:

  • 1) Folding schemes
  • 2) Lookup singularity
  • 3) STARKs with small fields

Over time, these ideas may coalesce. At the same time, there needs to be a way to analyze their actual potential:

  • These different strategies and proof systems can be analyzed using back-of-the-envelope calculations, but:

    • Back-of-the-envelope calculations are only estimates of the total number of operations and should always be treated with caution.
    • They may be useful in assessing whether one system or algorithm is better than another, but they cannot be used as the final measure of performance.

    This is similar to asymptotic complexity: some algorithms can be argued to be optimal from a complexity perspective, yet have no practical use case (such as the famous galactic algorithms).

  • Furthermore, in engineering, problems are multidimensional, with many interactions between different parts:

    • There are limitations in memory, data communications, hardware acceleration, code maintainability, economics, etc.
    • For example, memory access patterns may cause programs with fewer instructions to run slower if not adapted to caching algorithms, data prefetching, and other memory optimizations.
    • The complexity increases when algorithm and GPU parallelism must also be considered, and increases even further when computations are distributed among multiple machines.
    • In some cases, an efficient algorithm that can only run on a single machine may perform worse than other algorithms that are less efficient and can be distributed across multiple devices.
    • Again, this is very similar to what Zac mentioned: depending on the actual application scenario, there may be different algorithm-selection criteria.
    • In software, most of the time, multiple solutions to a problem are selected and used depending on the scenario, or even mixed when needed.
    • Thinking that there is one grand solution to all problems, optimal in all situations, risks underestimating the complexity of the real application world.
    • There are claims about operation counts that do not take into account limitations imposed by the hardware, or that count operations in a special field family that is not compatible with the chosen elliptic curve type. For example, commonly used pairing-friendly elliptic curves are defined over primes that do not admit the same fast operations as Mersenne primes or "MiniGoldilocks" primes.
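To make the point about field choice concrete, here is a minimal Python sketch (illustrative only) of why Mersenne primes admit cheap modular reduction: for p = 2^31 - 1, reduction needs only shifts and additions, whereas the large primes underlying pairing-friendly curves require full multi-word division or Montgomery arithmetic.

```python
P = (1 << 31) - 1  # the Mersenne prime 2^31 - 1

def mersenne_reduce(x: int) -> int:
    """Reduce x mod 2^31 - 1 using only shifts and adds.

    Since 2^31 is congruent to 1 (mod P), writing x = hi * 2^31 + lo
    gives x = hi + lo (mod P); no division instruction is needed.
    """
    while x >> 31:
        x = (x & P) + (x >> 31)
    return 0 if x == P else x

# Matches ordinary modular reduction on a product of two field elements:
a, b = 2_000_000_011, 1_999_999_999
assert mersenne_reduce(a * b) == (a * b) % P
```

Operation counts derived for such a field do not transfer to, say, the BN254 or BLS12-381 base fields, where reduction is substantially more expensive.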
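As an illustration of what a back-of-the-envelope estimate looks like (with made-up but plausible cost constants, not measured figures), one can compare the rough number of group operations in a naive multi-scalar multiplication against Pippenger's bucket method:

```python
import math

def naive_msm_ops(n: int, bits: int = 256) -> int:
    # Double-and-add per scalar: ~bits doublings + ~bits/2 additions each.
    return n * (bits + bits // 2)

def pippenger_msm_ops(n: int, bits: int = 256) -> int:
    # Classic rough estimate with window size c close to log2(n):
    # per window, ~n bucket additions plus ~2^c additions to fold buckets.
    c = max(1, int(math.log2(n)))
    windows = math.ceil(bits / c)
    return windows * (n + (1 << c))

for n in (1 << 10, 1 << 20):
    print(n, naive_msm_ops(n), pippenger_msm_ops(n))
```

Such an estimate can suggest that one approach dominates another asymptotically, but, as argued above, it says nothing about cache behavior, parallelism, or hardware limits.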

Similarly, Justin Thaler also pointed out the complexity in real engineering systems. Thaler asked why Starkware continues to use a fairly large finite field even though it offers no advantages over smaller fields. The reason is simple:

  • SHARP was developed ahead of many improvements and has been in production use for many years.
  • More importantly, for production-ready software, we need more than just a prover.
    • We need languages, compilers, virtual machines, developer tools, and blockchain sequencers.
    • There is a lot of work to be done, and on a production system of enormous value, it may be foolhardy to rush to improve the prover with every possible upgrade.
    • There's a lot of engineering work involved in getting from a great idea in a paper to an actual production-ready system, and along the way we always find more difficulties that weren't initially considered, or were difficult to foresee.

The key is to conduct critical analysis through evaluation and a good understanding of the potential solutions. There have been claims that STARKs use over 100 GB of RAM for small programs, without it being clear what the baseline of comparison is or how many GB the alternative would use. It is important to leverage open-source software and tools developed by others, check that they work as specified, and corroborate the numbers.

Nova and Lasso bring interesting ideas that can lead to new solutions for other proof systems. Folding schemes such as Nova can help solve many problems related to SNARKs based on Plonkish or R1CS arithmetization.
In the case of the Cairo Prover, the strategy is to compress the constraints. The Cairo AIR contains constraints for all instructions of a Turing-complete virtual machine:

  • The number of constraints does not vary with the size of the computation.
  • The execution trace grows linearly with the length of the program's execution.
  • The execution trace is then interpolated, and the constraints are enforced via quotient polynomials.
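As a toy illustration (not the actual Cairo AIR), the following sketch shows the key property: the trace grows linearly with the number of steps, while the constraint set stays fixed. Here a single transition constraint checks every row of a Fibonacci trace over a small prime field.

```python
P = 97  # a small prime field, purely illustrative

def fibonacci_trace(steps: int) -> list[int]:
    """Execution trace: one column whose length grows linearly with `steps`."""
    trace = [1, 1]
    for _ in range(steps - 2):
        trace.append((trace[-1] + trace[-2]) % P)
    return trace

def transition_constraint(t: list[int], i: int) -> int:
    """The single, fixed constraint: t[i+2] - t[i+1] - t[i] = 0 (mod P)."""
    return (t[i + 2] - t[i + 1] - t[i]) % P

trace = fibonacci_trace(64)
# One constraint, enforced at every row: the trace, not the number of
# constraints, grows with the computation.
assert all(transition_constraint(trace, i) == 0
           for i in range(len(trace) - 2))
```

In a real prover the trace column is interpolated as a polynomial, and the constraint is enforced by checking that the constraint polynomial is divisible by the vanishing polynomial of the trace domain, i.e., by taking the quotient.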

Therefore, the relevant metric for Cairo Prover is:

  • The number of steps in the program, not the number of constraints. Common computations or transactions (such as ERC-20 contracts) should be measured fairly.
  • Also be cautious about treating the speed of a single task as the only thing that matters.
  • A clean code base, ease of maintenance and updates, robustness, security, memory usage and auditability are also factors to consider.

What Celer Network did in the benchmark test was:

  • Trying to make a fair comparison between different proof systems, using SHA-256 as an example circuit.

That said, always keep in mind that it can become tempting for a project or a specific team to over-optimize their codebase for a specific benchmark. Celer pointed out:

  • It’s difficult to compare Nova because, “It’s important to realize that Nova is not directly comparable to other frameworks in terms of time and computation. This uniqueness stems from the incremental computation that Nova provides. Simply put, decomposing the entire computation into finer-grained steps naturally reduces memory consumption, although this may result in an increase in computation time.”

At the same time, some proof systems are not fully optimized, which could change the trend. The memory vs. speed trade-off may be convenient for some use cases but not others.

Another point worth noting is that some people tend to impose constraints that do not exist in practice, or to generalize a strategy used by one company to all other possible implementations. For example, if A uses Poseidon as a hash function, then it is assumed that B, C, and D should also use Poseidon, even though this may not be suitable for B, C, and D's specific applications.
In a recursive setting, a SNARK proof can be used to verify a STARK proof, which has many use cases.
Of course, if there is a tree of recursive verification proofs, one can use a faster hash function (such as Blake2) at the leaf nodes and then, at the second layer, prove that the Blake2-based proofs were verified, using Poseidon or another hash; this causes no inconvenience.
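The layered choice of hash functions can be sketched as a toy Merkle-style recursion tree. This is not a proof system; `hashlib.sha3_256` merely stands in for a SNARK-friendly hash such as Poseidon, which is not in the Python standard library.

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    # Fast conventional hash at the leaves (cheap to compute natively).
    return hashlib.blake2b(data, digest_size=32).digest()

def inner_hash(left: bytes, right: bytes) -> bytes:
    # Stand-in for a SNARK-friendly hash (e.g. Poseidon) at inner layers;
    # sha3_256 is used here only because Poseidon is not in the stdlib.
    return hashlib.sha3_256(left + right).digest()

def fold(leaves: list[bytes]) -> bytes:
    """Fold leaf digests up a binary tree, switching hash at inner layers.

    Assumes the number of leaves is a power of two, for simplicity.
    """
    layer = [leaf_hash(x) for x in leaves]
    while len(layer) > 1:
        layer = [inner_hash(layer[i], layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return layer[0]

root = fold([b"proof-0", b"proof-1", b"proof-2", b"proof-3"])
assert len(root) == 32
```

The design point is simply that each layer of the tree is free to pick the hash that is cheapest for its own job: native speed at the leaves, arithmetization friendliness inside the recursion.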

There should be clear benchmarks for the code used in production. Of course, some new technologies or ideas may be promising and we should explore them, but one should never be too hasty in jumping to the next boat, especially when users' assets or privacy are at stake. The Lambda Class team will implement different proof systems in the Lambdaworks library so that anyone can easily run the benchmarks and check which one works best for them. In addition, if any implementation can be further optimized, anyone can submit a PR to improve it. The Lambda Class team is not maximalist about any proof system; what it wants is for the technology to succeed and to develop applications based on it. If a particular system works better, the team will learn it and use it.

The Lambda Class team believes:

  • Debating and holding different views is important to come up with new ideas and improvements from which people can benefit.
  • Having open source code, not just papers, that can be used to tweak, analyze, and play around with proof systems is critical to being able to make comparisons.
  • Starkware has just open-sourced the battle-tested Stone prover, which will help improve and compare the various strategies.
  • ZPrize proposes open source optimizations for common problems in zero-knowledge proofs. This gives one the opportunity to explore different strategies and arrive at the algorithm that works best in practice.

References

[1] Lambda Class, September 2023 blog post: "Don't trust, verify or why you should care about benchmarks"
