a16z: Exploring the efficient and secure path of zkVM

Author: Justin Thaler Source: a16z Translation: Shan Oppa, Bitchain Vision

The purpose of zero-knowledge virtual machines (zkVMs) is to "popularize SNARKs," letting even people without SNARK expertise prove that they ran a program correctly on a given input (or witness). Their core advantage is developer experience, but today's zkVMs still face enormous challenges in both security and performance. If zkVMs are to deliver on their promise, designers must overcome these obstacles. This article lays out the likely stages of zkVM development; the whole process may take years to complete, and you should not believe anyone who claims it can be done quickly.

Challenges

On the security side, zkVMs are highly complex software projects that remain riddled with vulnerabilities.

On the performance side, proving that a program executed correctly can be hundreds of thousands of times slower than running it natively, which makes most real-world deployments infeasible for now.

Still, many voices in the blockchain industry promote the idea that zkVMs can be deployed immediately, and some projects are already paying high computational costs to generate zero-knowledge proofs of on-chain activity. But because zkVMs still contain many vulnerabilities, this practice is really just an expensive disguise: it makes a system look like it is protected by SNARKs when in fact it either depends on permission controls or, worse, is exposed to attack.

The reality is that we are still years away from a truly secure and efficient zkVM. This article proposes a series of concrete, phased goals to help track real progress in zkVMs, deflate the hype, and focus the community on genuine technical breakthroughs.

Security development stages

Background

SNARK-based zkVMs usually contain two core components:

1. A polynomial interactive oracle proof (PIOP): an interactive proof framework for proving statements about polynomials (or about constraints derived from polynomials).

2. A polynomial commitment scheme (PCS): ensures that the prover cannot forge polynomial evaluation results without being detected.

A zkVM encodes valid execution traces as a constraint system that ensures the virtual machine's registers and memory are updated correctly, and then uses a SNARK to prove that these constraints are satisfied.
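To make this concrete, here is a minimal toy sketch in Rust of how one step of an execution trace becomes an algebraic constraint. The field, opcode set, and trace layout are invented for exposition; no real zkVM is this simple.

```rust
// Toy illustration (not a real SNARK): one step of a VM's execution trace
// expressed as an algebraic constraint over a prime field.

const P: u128 = 2305843009213693951; // 2^61 - 1, a Mersenne prime

fn add_p(a: u128, b: u128) -> u128 { (a + b) % P }
fn sub_p(a: u128, b: u128) -> u128 { (a + P - b) % P }
fn mul_p(a: u128, b: u128) -> u128 { (a * b) % P }

/// One row of the execution trace: selector flags plus register values.
struct Row {
    is_add: u128, // 1 if this step executes ADD, else 0
    is_mul: u128, // 1 if this step executes MUL, else 0
    a: u128,      // first operand register
    b: u128,      // second operand register
    out: u128,    // destination register in the next row
}

/// The transition constraint the prover must satisfy on every row:
///   is_add * (a + b - out) + is_mul * (a * b - out) == 0  (mod P)
/// Selector flags turn exactly one instruction's equation on per step.
fn transition_constraint(r: &Row) -> u128 {
    let add_err = sub_p(add_p(r.a, r.b), r.out);
    let mul_err = sub_p(mul_p(r.a, r.b), r.out);
    add_p(mul_p(r.is_add, add_err), mul_p(r.is_mul, mul_err))
}

fn main() {
    let honest = Row { is_add: 1, is_mul: 0, a: 3, b: 4, out: 7 };
    let forged = Row { is_add: 1, is_mul: 0, a: 3, b: 4, out: 8 };
    assert_eq!(transition_constraint(&honest), 0);
    assert_ne!(transition_constraint(&forged), 0);
    println!("constraint holds on the honest row, fails on the forged one");
}
```

A real zkVM has many such constraints covering every instruction, plus consistency checks for memory and control flow; the PIOP and PCS then let the prover convince the verifier that all of them vanish over the whole trace without the verifier re-executing anything.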

In a system this complex, the only way to ensure a zkVM is free of vulnerabilities is formal verification. Below are the stages of zkVM security: the first phase concerns the correctness of the protocol, while the second and third phases concern the correctness of its implementation.

Security Phase 1: The correct protocol

  1. A formal verification proof of the PIOP's soundness;

  2. A formal verification proof that the PCS is binding under some cryptographic assumption or idealized model;

  3. If Fiat-Shamir is used, a formal verification proof that the succinct argument obtained by combining the PIOP and the PCS is secure in the random oracle model (augmented with other cryptographic assumptions as needed);

  4. A formal verification proof that the constraint system the PIOP is applied to is equivalent to the semantics of the VM;

  5. A formal verification proof that all of these pieces "glue together" into a single secure SNARK for running any program specified by the VM's bytecode. If the protocol intends to provide zero knowledge, this property must also be formally verified, ensuring that no sensitive information about the witness is revealed.

If the zkVM uses recursion, then every PIOP, commitment scheme, and constraint system involved anywhere in the recursion must be verified; otherwise this sub-phase cannot be considered complete.

Security Phase 2: Correct Verifier Implementation

This phase requires formal verification of the zkVM verifier's actual implementation (in Rust, Solidity, etc.), ensuring it matches the protocol verified in Phase 1. Completing this phase means the zkVM's implementation is consistent with its theoretical design, rather than being only a secure protocol on paper or an inefficient specification written in a language like Lean.

Why focus only on the verifier and not the prover? There are two main reasons. First, ensuring the verifier is correct guarantees the soundness of the zkVM's proof system (i.e., the verifier cannot be tricked into accepting a proof of a false statement). Second, a zkVM's verifier implementation is more than an order of magnitude simpler than its prover implementation, so the verifier's correctness is easier to guarantee in the near term.

Security Phase 3: The correct prover implementation

This phase requires formal verification of the zkVM prover's actual implementation, ensuring that it correctly generates proofs for the proof system verified in Phases 1 and 2. The goal here is completeness: any system using the zkVM can never get stuck, unable to prove a legitimate statement. If the zkVM is meant to have the zero-knowledge property, formal verification must ensure that proofs reveal no information about the witness.

Estimated timetable

Phase 1 progress: we can look forward to some progress next year (ZKLib, for example, is one such effort). But no zkVM will fully meet the requirements of Phase 1 for at least two years.

Phases 2 and 3: these can advance in parallel with parts of Phase 1. For example, some teams have shown that a Plonk verifier implementation matches the protocol in the paper (even though the protocol itself may not be fully verified). Still, I do not expect any zkVM to reach Phase 3 in under four years, and perhaps longer.

Key Notes: Fiat-Shamir Security vs. Verified Bytecode

A major complication is that there are still unresolved research questions about the security of the Fiat-Shamir transformation. All three security phases treat Fiat-Shamir and random oracles as absolutely secure, but in reality the paradigm as a whole may harbor vulnerabilities, because there is a gap between the idealized random oracle model and the hash functions used in practice.
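For intuition, here is a minimal sketch of the transformation itself, in Rust using the `sha2` crate (the transcript layout and labels are invented for exposition). The verifier's random challenges are replaced by hashes of the transcript so far, so the argument's security rests on the assumption that the concrete hash behaves like a random oracle:

```rust
// Minimal Fiat-Shamir sketch: verifier challenges derived by hashing
// the transcript of prover messages. The open research question is
// whether a concrete hash like SHA-256 can be exploited where a true
// random oracle could not.

use sha2::{Digest, Sha256};

/// Running transcript of all prover messages, in order.
struct Transcript {
    hasher: Sha256,
}

impl Transcript {
    fn new(protocol_label: &[u8]) -> Self {
        let mut hasher = Sha256::new();
        hasher.update(protocol_label); // domain separation
        Transcript { hasher }
    }

    /// Absorb a prover message (e.g., a polynomial commitment).
    fn absorb(&mut self, message: &[u8]) {
        self.hasher.update(message);
    }

    /// Derive a challenge deterministically from everything absorbed so
    /// far. In the interactive protocol this would be fresh verifier
    /// randomness; here it is only as good as the hash.
    fn challenge(&mut self) -> [u8; 32] {
        let digest = self.hasher.clone().finalize();
        self.hasher.update(&digest); // ratchet so later challenges differ
        digest.into()
    }
}

fn main() {
    let mut t = Transcript::new(b"example-piop-v1");
    t.absorb(b"commitment to trace polynomial");
    let c1 = t.challenge();
    t.absorb(b"commitment to quotient polynomial");
    let c2 = t.challenge();
    assert_ne!(c1, c2);
    println!("challenge 1 prefix: {:02x?}", &c1[..4]);
}
```

Because the challenge is a deterministic function of the transcript, a malicious prover can grind over its own messages to influence the challenges it receives; the known attacks referenced below exploit exactly this kind of gap between the model and the instantiation.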

In the worst case, a system that has reached Security Phase 2 could later be found completely insecure because of Fiat-Shamir-related issues. This deserves close attention and continued research. We may need to modify the Fiat-Shamir transformation itself to defend better against such vulnerabilities.

Systems that do not use recursion are theoretically safer, because some known attacks involve circuits similar to those used in recursive proofs. But the risk remains an unresolved, fundamental issue.

Another caveat: even if a zkVM proves that some computation (specified by bytecode) was executed correctly, the proof is of very limited value if the bytecode itself is flawed. The practicality of zkVMs therefore depends heavily on how formally verified bytecode is produced, a challenge that is enormous and beyond the scope of this article.

About quantum safety

Quantum computers will not pose a serious threat for at least five years (probably longer), whereas software vulnerabilities are a matter of life and death today. The current priority should therefore be the security and performance goals proposed in this article. If non-quantum-safe SNARKs can meet these goals faster, we should prefer them, and switch once quantum-resistant SNARKs catch up or there are signs that genuinely threatening quantum computers are imminent.

Concrete security levels

100 bits of classical security is the minimum standard for any SNARK used to protect valuable assets (and some systems still fail to meet even this low bar). Even so, it should not be considered acceptable: standard cryptographic practice calls for 128-bit security or more. If SNARK performance is truly up to standard, we should not trade security for performance.

Performance phases

Current situation

Currently, a zkVM prover's computational overhead is roughly 1,000,000 times native execution. In other words, if a program takes X CPU cycles to run natively, generating a proof of its correct execution takes roughly X × 1,000,000 CPU cycles. This was true a year ago and remains true today (despite some misconceptions).
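A back-of-envelope calculation makes that factor tangible (all figures are illustrative, not a benchmark of any particular zkVM):

```rust
// Back-of-envelope: what a ~1,000,000x proving overhead means in
// wall-clock time. Figures are illustrative only.

fn main() {
    let native_hz: f64 = 3e9;          // ~3 GHz modern core
    let native_seconds: f64 = 1.0;     // a program that runs 1 s natively
    let overhead: f64 = 1_000_000.0;   // prover cycles per native cycle

    let native_cycles = native_hz * native_seconds;
    let prover_cycles = native_cycles * overhead;
    let prover_seconds = prover_cycles / native_hz; // same-speed core

    println!("native: {:.0e} cycles ({} s)", native_cycles, native_seconds);
    println!(
        "proving: {:.0e} cycles (~{:.1} days single-threaded)",
        prover_cycles,
        prover_seconds / 86_400.0
    );
    // One second of native work becomes roughly 11.6 days of
    // single-threaded proving at a 1,000,000x overhead.
}
```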

Some popular statements in the industry today can be misleading, such as:

1.“The cost of generating proof for the entire Ethereum mainnet is less than $1 million per year.”

2.“We have almost achieved real-time proof generation for Ethereum blocks, using only dozens of GPUs.”

3.“Our latest zkVM is 1000 times faster than previous generations.”

However, these claims can be misleading without context:

• Being 1,000 times faster than an older zkVM can still leave the prover very slow; it says more about how bad things were than about how good they are now.

• Ethereum mainnet computation may grow 10x in the future, which would leave current zkVM performance far behind demand.

• So-called “almost real-time” proof generation is still too slow for many blockchain applications (for example, Optimism’s block time is 2 seconds, much faster than Ethereum’s 12 seconds).

• “Dozens of GPUs running 24/7” does not provide adequate liveness guarantees.

• These proof-generation times are usually for proofs larger than 1 MB, which is too big for many applications.

• “Costing less than $1 million per year” holds only because an Ethereum full node performs only about $25 worth of computation per year.

For application scenarios outside of blockchain, this overhead is plainly too high. No amount of parallelism or engineering optimization can compensate for such an enormous computational overhead.

The baseline goal we should set is a performance overhead of no more than 100,000 times native execution. Even that is only a first step: truly large-scale mainstream applications may require reducing the overhead to 10,000 times native execution or less.

Performance measurement

SNARK performance has three main components:

1. The inherent efficiency of the underlying proof system.

2. Application-specific optimizations (e.g., precompiles).

3.Engineering and hardware acceleration(e.g., GPU, FPGA, or multi-core CPU).

While (2) and (3) are critical for actual deployment, they apply to any proof system, so improvements there do not necessarily reflect lower fundamental overhead. For example, adding GPU acceleration and precompiles to a zkEVM can easily yield a 50x speedup over a CPU-only implementation, which can make an inherently less efficient system look better than one that simply has not received the same optimization.

This article therefore focuses on measuring the fundamental performance of SNARKs without dedicated hardware or precompiles. That differs from current benchmarking approaches, which typically fold all three factors into a single headline number. It is like judging diamonds by how long they were polished rather than by their inherent clarity.

The goal is to isolate the inherent overhead of general-purpose proof systems, lower the barrier to entry for techniques that have not yet been explored, and help the community cut through distractions and focus on real progress in proof system design.

The performance phases

Below are the milestones for the five performance phases I propose. First, prover overhead on CPUs must fall dramatically; only then can hardware be used to cut it further. Memory usage must improve as well.

In every phase, developers should not have to tune their code to the zkVM's performance characteristics. Developer experience is the core advantage of zkVMs; sacrificing DevEx to hit performance benchmarks defeats the point of the benchmarks and the original intent of zkVMs.

These metrics focus mainly on prover costs. However, if the verifier is allowed unbounded costs (i.e., unlimited proof size or verification time), any prover metric can be met trivially. So for the following stages to be meaningful, maximum proof sizes and maximum verification times must also be fixed.

Phase 1 Requirements: "Reasonable, non-trivial verification costs"

Proof size: Must be smaller than the witness size.

Verification time: verifying the proof must be no slower than natively executing the program (i.e., no slower than simply performing the computation directly).

These are minimal succinctness requirements: they ensure that proof size and verification time are no worse than sending the witness directly to the verifier and having it check the computation itself.
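Expressed as code, the check a benchmark harness might perform is simply the conjunction of the two requirements above (the struct and its fields are hypothetical measurements, not any zkVM's real API):

```rust
// Sketch of the "non-trivial verification costs" check. The fields are
// hypothetical measurements a benchmark harness would collect; the two
// predicates mirror the requirements stated above.

struct Measurement {
    witness_bytes: u64,     // size of the witness
    proof_bytes: u64,       // size of the produced proof
    native_exec_nanos: u64, // time to just run the program
    verify_nanos: u64,      // time to verify the proof
}

fn meets_phase1(m: &Measurement) -> bool {
    let proof_smaller_than_witness = m.proof_bytes < m.witness_bytes;
    let verify_no_slower_than_native = m.verify_nanos <= m.native_exec_nanos;
    proof_smaller_than_witness && verify_no_slower_than_native
}

fn main() {
    let m = Measurement {
        witness_bytes: 1 << 20,   // 1 MiB witness
        proof_bytes: 200 << 10,   // 200 KiB proof
        native_exec_nanos: 5_000_000,
        verify_nanos: 2_000_000,
    };
    println!("phase 1 verifier costs met: {}", meets_phase1(&m));
}
```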

Stages 2 and above

Maximum proof size: 256 KB.

Maximum verification time: 16 milliseconds.

These caps are deliberately loose to accommodate novel fast proving technologies, even if they come with higher verification costs. At the same time, they exclude proofs so expensive that few projects would be willing to use them on a blockchain.

Speed Stage 1

Single-threaded proving must be no more than 100,000 times slower than native execution (across a range of applications, not just Ethereum block proving) and must not rely on precompiles.

Specifically, assuming a RISC-V processor in a modern laptop runs at about 3 billion cycles per second, reaching Stage 1 means the laptop can generate proofs (single-threaded) at a rate of 30,000 RISC-V cycles per second (3 billion divided by the 100,000x overhead).

Verifier costs must meet the "reasonable, non-trivial verification costs" standard defined earlier.

Speed Stage 2

Single-threaded proving must be no more than 10,000 times slower than native execution.

Alternatively, since some promising SNARK approaches (especially binary-field SNARKs) are bottlenecked on current CPUs and GPUs, this phase can instead be met with FPGAs (or even ASICs), using the following calculation (also sketched in code at the end of this subsection):

1. Count the number of RISC-V cores a single FPGA can emulate at native speed.

2. Count the number of FPGAs required to emulate and prove RISC-V execution in (near) real time.

3. If the number in (2) is no more than 10,000 times the number in (1), Stage 2 is satisfied.

Proof size: Maximum 256 KB.

Verification time: Maximum 16 milliseconds on a standard CPU.
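The FPGA criterion above reduces to a one-line comparison (all inputs are hypothetical):

```rust
// The alternative FPGA criterion for Speed Stage 2 as a calculation:
// the number of FPGAs needed to prove RISC-V execution in near real
// time must be at most 10,000x the number of native-speed RISC-V cores
// a single FPGA can emulate. Example figures are invented.

fn fpga_stage2_met(cores_emulated_per_fpga: u64, fpgas_needed_to_prove: u64) -> bool {
    fpgas_needed_to_prove <= 10_000 * cores_emulated_per_fpga
}

fn main() {
    // e.g., one FPGA emulates 2 RISC-V cores at native speed, and a
    // cluster of 15,000 FPGAs keeps up with proving their execution.
    println!("{}", fpga_stage2_met(2, 15_000)); // true: 15,000 <= 20,000
}
```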

Speed Stage 3

Building on Speed Stage 2, achieve a proving overhead of under 1,000x (again across a range of applications), using only automatically synthesized, formally verified precompiles. In essence: dynamically customize each program's instruction set to accelerate proof generation, while preserving ease of use and formal verification. (On why precompiles are a double-edged sword, and why "hand-written" precompiles are not a sustainable approach, see the next section.)

Memory Stage 1

Achieve Speed Stage 1 with less than 2 GB of memory, while also satisfying the zero-knowledge requirement. This stage is crucial for mobile devices and browsers, and it opens the door to a large class of client-side zkVM use cases, for example smartphone-based location privacy and identity credentials. If proof generation requires more than 1-2 GB of memory, most mobile devices cannot run it.

Two important notes:

1. Even for large computations (requiring trillions of CPU cycles natively), the proof system must maintain the 2 GB memory cap, or its applicability will be limited.

2. If proving is extremely slow, staying within a 2 GB memory cap is easy. So for Memory Stage 1 to be meaningful, Speed Stage 1 must be reached within the 2 GB limit.

Memory Stage 2

Achieve Speed Stage 1 with less than 200 MB of memory (a 10x improvement over Memory Stage 1).

Why push down to 200 MB? Consider a non-blockchain scenario: when you visit an HTTPS website, certificates for authentication and encryption are downloaded. If websites instead sent zk proofs about those certificates, a large site might need to generate millions of proofs per second. If each proof required 2 GB of memory, the aggregate compute demand would be at the petabyte level, which is clearly infeasible. Further reducing memory usage is therefore crucial for non-blockchain applications.
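A rough calculation shows why the cap matters at this scale (figures illustrative; it assumes each proof takes about a second, so roughly a million proofs are in flight at once):

```rust
// Back-of-envelope for the HTTPS scenario: aggregate prover memory at
// web scale. Assumes ~1 second per proof, so proofs_per_second is also
// the number of proofs held in memory concurrently.

fn main() {
    let proofs_per_second: u64 = 1_000_000; // "millions of proofs" per s
    let gib = 1u64 << 30;

    for (label, mem_per_proof) in [("2 GiB", 2 * gib), ("200 MiB", 200 << 20)] {
        let total = proofs_per_second as u128 * mem_per_proof as u128;
        println!(
            "{} per proof -> ~{:.2} PiB of concurrent prover memory",
            label,
            total as f64 / (1u64 << 50) as f64
        );
    }
    // 2 GiB/proof at ~1M concurrent proofs is ~1.9 PiB of RAM;
    // 200 MiB/proof cuts that to ~0.19 PiB. Both are enormous, which is
    // why memory per proof must keep falling for such uses.
}
```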

Precompiles: The last mile, or a crutch?

Precompiles are SNARK constraint systems optimized for specific functions (such as hashing or elliptic-curve signatures). In Ethereum, precompiles can reduce the overhead of Merkle hashing and signature verification, but over-reliance on them does not genuinely improve a SNARK's core efficiency.

Precompilation issues

1. Still too slow: even with hash and signature precompiles, zkVMs retain the core proof system's inefficiency, both inside and outside blockchain settings.

2. Security vulnerabilities: hand-written precompiles that are not formally verified almost certainly contain bugs, any of which could trigger a catastrophic security failure.

3. Poor developer experience: today many zkVMs require developers to hand-write constraint systems, a workflow reminiscent of 1960s programming that seriously degrades the development experience.

4. Misleading benchmarks: if benchmarks rely on specific precompiled optimizations, they can mislead people into optimizing hand-crafted constraint systems rather than improving SNARK design itself.

5. I/O overhead and no RAM access: while precompiles can improve performance on crypto-heavy workloads, they may provide no meaningful speedup on more diverse workloads, because they incur significant overhead when passing inputs and outputs and cannot access RAM.

Even in a blockchain setting, as soon as you go beyond a single L1 like Ethereum (for example, to build a set of cross-chain bridges), you face different hash functions and signature schemes. Continually adding precompiles to cope is neither scalable nor safe: it poses a huge security risk.

I do believe precompiles will remain crucial in the long run, but only once they are automatically synthesized and formally verified. That way we can keep the developer-experience advantage of zkVMs while avoiding catastrophic security risks. This view is reflected in Speed Stage 3.

Expected timetable

I expect a few zkVMs to reach Speed Stage 1 and Memory Stage 1 later this year. I think we can achieve Speed Stage 2 within the next two years, but it is not clear whether that goal can be reached without new research ideas.

I expect the remaining stages (Speed Stage 3 and Memory Stage 2) to take years to achieve.

Although this article lists zkVM security and performance stages separately, the two are not entirely independent. As vulnerabilities in zkVMs continue to be discovered, I expect that fixing some of them will inevitably cause significant performance drops. So until a zkVM reaches Security Phase 2, its performance results should be treated as tentative.

zkVMs have great potential to make zero-knowledge proofs truly mainstream, but they are still in their early days: full of security challenges and severe performance bottlenecks. Market hype and marketing make it hard to measure real progress. With clear security and performance milestones, I hope to offer a roadmap that clears away the fog. We will get there eventually, but it will take time and sustained effort in research and engineering.
