Where Zero-Knowledge Machine Learning (zkML) Fits into the Bigger Picture of AI
Diving deep into the role of privacy-preserving solutions in modern-day AI (so hopefully it won't take over the world)...
The recent advent of large language models (LLMs) like ChatGPT and text-to-image models like Stable Diffusion has demonstrated the remarkable potential of artificial intelligence (AI), yet it has also surfaced urgent ethical challenges. As these models rapidly scale by training on expansive internet data, one critical concern is privacy. For example, LLMs accumulate vast reserves of sensitive information, with research showing that private data can be extracted from models like GPT. This underscores the need for privacy-enhancing technologies to mitigate the unintended consequences of unchecked AI.
In this primer, I introduce an emerging ecosystem, zero-knowledge machine learning (zkML), as one promising technique to reconcile the benefits of ML with data privacy (leveraging zero-knowledge proofs to evaluate ML models without revealing underlying data). After an overview of machine learning and some of its key underlying principles, I highlight the inherent risks of the technology, where cryptography fits in and how it works under the hood, key use cases, the current stage of the ecosystem, and how to invest in the space.
While early stage, ZKML could act as a horizontal middleware that enables privacy-preserving ML across all industries, particularly areas like finance, healthcare, and web3. As AI proliferates, our collective contribution to ensure accountability among agents and models will prove essential to advance AI responsibly. Let’s dive in…
TL;DR
In the epoch of AI, zkML (zero-knowledge machine learning) is a privacy-preserving solution that uses cryptography to provide accountability for users
The primary value-adds are: 1) a model consumer can keep their inputs private from a model provider, and 2) a model consumer can verify that generated inferences do in fact come from the specified model (verify & audit)
The current GTM remains largely in crypto (specifically DeFi), but the grand vision is use cases in traditional MLaaS settings and security
zkML remains in its earliest stages of development, with even the more mature projects still exploring PMF; zk will have to find a niche within the broader context of privacy-preserving solutions
Biggest challenges come from compute cost/accessibility, scalability, and security
Although the market potential is huge, value capture, PMF, and technical limitations will be the biggest question facing investors
See last section for my thoughts on the role verifiable ML will play in the future and an analogy
Lastly, zkML is continuing to grow rapidly & is an exciting opportunity to explore further
If you just want the main course, jump HERE for my conclusions & key recommendations
Thanks to many members from the ZKML telegram community, with special thanks to my friends Daniel Shorr from Modulus Labs for feedback and Ryan Barney from my time with Pantera Capital this summer for the initial research idea. Please subscribe!
Machine Learning For Beginners
At a high level, machine learning is a subfield of AI that refers to algorithms that can improve and learn from experience automatically without explicit programming. The rapid advancement of machine learning techniques holds significant promise in addressing complex challenges across various domains by leveraging data-driven insights and predictions to improve decision-making and optimize outcomes. Modern machine learning has become highly empirical and data-driven, with deep neural networks (NN) trained on large datasets at the forefront. As these models become more sophisticated, they are poised to revolutionize numerous industries, transforming the way we live, work, and interact with technology.
Here’s a quick visual of a NN process:
Taken From SevenX Research.
Magic in the Black Box
ML models typically have three primary components:
Training Data: A set of input data that is used to train a machine learning algorithm to make predictions or classify new data. Training data can take many forms, such as images, text, audio, numerical data, or a combination of these.
Model Architecture: The overall structure or design of a machine learning model. It defines the types and number of layers, activation functions, and connections between nodes or neurons. The choice of architecture is case dependent.
Model Parameters: Values or weights that the model learns during the training process to make predictions. These values are adjusted iteratively through an optimization algorithm to minimize the error between predicted and actual outcomes.
Models are produced and deployed in two phases:
Training Phase: During training (pre-training, fine-tuning), the model is exposed to a labeled dataset and adjusts its parameters to minimize the error between predicted and actual outcomes. The training process typically involves several iterations or epochs, and the accuracy of the model is evaluated on a separate validation set.
Inference Phase: The inference phase is when a trained machine learning model is used to make predictions on new, unseen data. The model takes in input data and applies the learned parameters to generate an output, such as a classification or regression prediction. (A minimal code sketch of both phases follows this list.)
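To make the three components and two phases concrete, here is a minimal PyTorch sketch. The tiny architecture, synthetic data, and hyperparameters are illustrative assumptions on my part, not taken from any project discussed here:

```python
# Minimal sketch: training a toy network, then running inference on new data.
import torch
import torch.nn as nn

# Model architecture: layer types, sizes, and activation functions
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

# Training data: synthetic inputs and labels stand in for a real dataset
X = torch.randn(256, 4)
y = (X.sum(dim=1) > 0).long()

# Training phase: model parameters (weights) are adjusted to minimize error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# Inference phase: learned parameters are applied to new, unseen inputs
model.eval()
with torch.no_grad():
    new_input = torch.randn(1, 4)
    prediction = model(new_input).argmax(dim=1)
    print(prediction.item())
```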
Here’s an analogy from a friend: All throughout K-12th grade, you were gaining general knowledge across many subjects like math, science, history, and language arts. This is like the pre-training phase for a machine learning model, where it learns general patterns from a diverse dataset. Then when you went to college, you chose a major and took more specialized classes focused just on that subject. This is like the fine-tuning stage, where the model becomes an expert in a narrow domain. After graduation, you started your career and used your specialized knowledge from college to do your job. This is like the inference phase, where the trained ML model applies what it learned to generate predictions on new, real-world data. Just as your early broad education provided a foundation before specializing in college, pre-training gives models basic abilities before fine-tuning tailors them to a specific task. And like you using your college learning in your career, the inference stage is when models use their trained knowledge to perform useful work.
Key Problems with Machine Learning
Among others, here are five of the biggest problems we currently face in ML:
Accuracy: Hallucinations, over/underfitting, and concept drift are hard to mitigate. Trade-offs between efficiency and accuracy must be made.
Alignment: Biased data (even when unknown) leads models to inherit and amplify problematic biases. Models can discriminate against certain demographic groups, raising ethical concerns.
Privacy concerns: Data leaks or model manipulation pose security risks for sensitive user data. Use of private data raises ethical issues around informed consent and anonymity.
Lack of transparency & explainability: Opaque inner workings make models inscrutable black boxes. The absence of visible logic and reasoning means trust gaps with any output.
Barriers to Accessibility & Compute: Even with open-source resources like Hugging Face/PyTorch/TensorFlow, computing & data costs limit most users' access to the state-of-the-art models they desire.
While these five problem areas pose significant challenges, promising solutions are emerging for each, from technical advances to ethical frameworks. Although these challenges are intertwined, I will largely focus this piece on mitigating the privacy concern, touching partly on tangential topics including accountability, transparency, and equitable use of such technology.
An Emerging Solution: zkML
What is Zero-Knowledge (ZK)?
The zk part of zkML stands for zero-knowledge proofs (ZKPs), a cryptographic tool that addresses two critical challenges in web3: scalability and privacy. A zero-knowledge (ZK) proof is a cryptographic protocol in which one party, the prover, can prove to another party, the verifier, that a given statement is true, without revealing any additional information beyond the fact that the statement is true. Here is an analogy for ZKPs.
ZK brings two main building blocks (primitives) to the table: the ability to create proofs of computational integrity for a given set of computations, where the proof is significantly easier to verify than performing the computation itself (succinctness), and the option to hide parts of that computation while preserving computational correctness (zero-knowledge).
Generating zero-knowledge proofs is very computationally intensive, many times as expensive as the original computation. This means that there are some computations for which it is infeasible to compute zero-knowledge proofs because the time it'd take to create them on the best hardware available makes them impractical. However, advancements in the field of cryptography, hardware, and distributed systems in recent years have allowed zero-knowledge proofs to become feasible for ever more intensive computations. If you’d like to learn more about how ZKPs work, please see work from my friend Michael Blau at a16z here.
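To ground the prover/verifier roles, here is a minimal classical example: the Schnorr identification protocol, in which a prover shows knowledge of a secret exponent without revealing it. It is interactive and not succinct, so it only illustrates the "prove without revealing" idea; the toy parameters are my own assumption, and zkML systems instead rely on succinct proof systems such as SNARKs/STARKs.

```python
# Toy Schnorr-style proof of knowledge: prover knows x with y = g^x (mod p)
# and convinces the verifier without revealing x. Parameters are toy-sized
# for illustration only; real systems use large, standardized groups.
import secrets

p = 2_147_483_647            # a prime modulus (2^31 - 1, toy-sized)
g = 7                        # illustrative base
x = secrets.randbelow(p - 1)     # prover's secret
y = pow(g, x, p)                 # public statement: "I know x with g^x = y"

# Prover: commit to a random nonce
r = secrets.randbelow(p - 1)
t = pow(g, r, p)

# Verifier: issue a random challenge
c = secrets.randbelow(p - 1)

# Prover: respond using the secret, without sending it
s = (r + c * x) % (p - 1)

# Verifier: a single cheap check against the commitment and the statement
assert pow(g, s, p) == (t * pow(y, c, p)) % p
print("proof accepted")
```

The check mirrors the definition above: the verifier is convinced the statement is true, while the secret x never leaves the prover.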
zkML & How It Works
Now we get to put it all together. zkML offers a promising approach to resolving the privacy problem within machine learning through the use of cryptographic proofs. As a preface, zkML today focuses primarily on the inference stage of ML models rather than the training phase, largely due to the computational complexity of verifying training in-circuit.
Although definitions of zkML vary and often encompass many things beyond inference & verifiable compute, that is what I will focus on in this section. At a high level, there are four verified inference scenarios. Current zkML solutions are primarily tailored towards the scenarios in green below:
Private Input, Public Model: A Model Consumer (MC) may wish to keep their inputs private from a Model Provider (MP). For example, an MC may wish to prove the result of a credit-scoring model to a lender without disclosing their personal financial information. This could be done using a pre-commitment scheme and running the model locally.
Public Input, Private Model: A common issue with ML-as-a-Service is that the MP may wish to hide their parameters or weights to protect their IP, while the MC wants to verify that the generated inferences do in fact come from the specified model in an adversarial setting. Think of it this way: an MP has an incentive to run a lighter model to save on costs when serving inferences to an MC. Using an on-chain commitment to the model weights, an MC can audit a private model at any time. Here's an example:
Taken from Daniel Kang
Private Input, Private Model: This scenario arises when the data being used for inference is highly sensitive or confidential, and the model itself is hidden to protect IP. An example of this might include auditing a healthcare model using private patient information. Compositional techniques in zk or use of multi-party computation (MPC) or variations of Fully Homomorphic Encryption (FHE) can be used in this scenario. Refer to Zama for more information.
Public Input, Public Model: When all aspects of the model can be public, zkML solves for a different use case: compression and verification of off-chain computation for an on-chain environment. For larger models, it is more cost-effective for a verifier to check a succinct zk proof of an inference than to re-run the model itself.
In decentralized applications, zkML enables running computationally intensive machine learning models off-chain while still allowing on-chain smart contracts to leverage the outputs. The core innovation is the creation of a proof that says "trust me, this model produced this output for a given input" without revealing anything else about the model or data. This proof can then be efficiently verified on-chain. Through this, zkML provides a privacy-preserving technique that establishes trust in model behavior and supplies the technical means to incorporate advanced machine learning into decentralized environments.
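As a rough illustration of the "public input, private model" flow, the sketch below commits to a toy model's weights with a hash and then serves an inference. The model, weights, and function names are hypothetical, and the proof a real proving system would generate is only represented by the final commitment check:

```python
# Simplified sketch: provider commits to weights up front, consumer can later
# audit that served outputs correspond to the committed model. A zk proof
# system would enforce this in-circuit; here only the commitment is concrete.
import hashlib
import json

def commit_to_weights(weights: dict) -> str:
    # Publish this hash (e.g. on-chain) before serving any inferences
    encoded = json.dumps(weights, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

# Model provider: a toy linear "model" with its private weights
weights = {"w": [0.4, -1.2, 0.7], "b": 0.1}
commitment = commit_to_weights(weights)

def model(x, weights):
    return sum(wi * xi for wi, xi in zip(weights["w"], x)) + weights["b"]

# Serving an inference on a public input
public_input = [1.0, 2.0, 3.0]
output = model(public_input, weights)

# Consumer-side audit (placeholder): a zk proof would attest that the output
# was produced by weights matching `commitment`, without revealing them.
assert commit_to_weights(weights) == commitment
print(f"output = {output:.3f}, weight commitment = {commitment[:16]}...")
```

In practice the commitment would live on-chain and the circuit would tie the served output to weights matching it, so the provider cannot quietly swap in a cheaper model.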
Why Now & Value-Add
The explosive growth of AI models even just year-to-date heralds a new era of pervasive technology. As adoption begins to spread to the enterprise level and to individuals (personalization models), the need for privacy-preserving solutions becomes paramount. LLMs ingest unfathomable amounts of data, raising concerns around security and consent. Meanwhile, as AI permeates high-stakes domains like finance, healthcare, and law, verifying model behavior without compromising sensitive inputs is essential.
Fortunately, zero-knowledge machine learning (zkML) has emerged as a timely solution. Through capabilities like model authenticity & integrity, decentralized applications, and proof of personhood, zkML seeks to deliver privacy, security, and trust. As AI capabilities grow more powerful and far-reaching, it is crucial that society have technical means to preserve individual rights.
Potential Use Cases & State of zkML
Now let's get into the good part…
Emerging Use Cases
Traditional ML: easily the biggest potential here. This is what could bring greater enterprise adoption of GenAI. New models built with trust and transparency will arise. Applying ZK to LLMs (especially at the fine-tune/inference stages) would give a better experience to LLM users and model owners (search/chatbot on crypto rails). Decentralized marketplaces for buying and selling models or running competitions can verify accuracy while keeping model details confidential (decentralized Kaggle). For gen AI, prompt creators can sell their IP without full disclosure while still demonstrating outputs (MLaaS).
Security: the general sense is that ML's benefit to blockchain is introducing intelligent smart contracts across all verticals, not just security. Community-agreed-upon anomaly detection models trained on chain data could provide automated fraud monitoring and contract pausing. This prevents reliance on slow governance or centralized intervention. Imagine autonomous Layer 1s being able to stop themselves in a timely manner in case of emergency. Smart contracts can also interact with AI models directly and use them as triggers for actions and verification.
Identity: Replacing private keys with biometric authentication for a more seamless user experience (see Worldcoin). Imagine a crypto wallet which can now use facial recognition as a security factor. Anonymous profiling of user activities and contributions enables fair distribution of airdrops and rewards without compromising privacy.
DeFi (current main GTM): automated protocols can leverage data-driven ML while remaining transparent and tamper-proof. Verifiable off-chain ML oracles allow prediction markets and parametric insurance protocols to leverage advanced models without exposing sensitive data. Investment managers can demonstrate adherence to algorithmic trading strategies without revealing profitable model details.
DeSo, Creator Economy, Gaming: Enables community-driven filtering and moderation without centralized censorship. Personalized advertising while safeguarding your data as well. In-game economic balancing and NPC coordination become verifiable. Users can receive personalized on-chain recommendations securely trained on private data.
Of these use cases, the one I am most interested in is traditional MLaaS deployed at the enterprise level. The industries I see most prominently needing such solutions are finance, healthcare, law, defense, and distributed (decentralized) systems, including applications leveraging blockchain.
Current Landscape
Most existing projects within the ecosystem focus on niche proofs-of-concept, though adoption is accelerating. Even the more developed projects are still exploring PMF, and so zkML remains a frontier technology. That said, although a bit outdated, below is a snapshot (not all-inclusive) of what is being built (note that "zkML" often refers to and includes adjacent projects). A couple of stealth projects (TBA) I've seen in the past couple of weeks are part of the growing ecosystem as well.
Taken from zkML Github. I am working on an updated market map with a few individuals, reach out via telegram @justinchen1 if you’d like to contribute.
Of these, the most well-known projects are detailed below. Within each sector, the projects listed take different avenues toward a similar solution.
AI On-chain
Modulus Labs: Pioneer in the space, targeting traditional ML through efficient on-chain inference (tamper proof AI). Check out their groundbreaking paper on zkML results. TL;DR: Ran model with 18M parameters in 50 seconds, running on AWS (64 cores, 192 GB RAM) using the plonky2 proving system.
“The grand vision of zkML is an idea of faithful AI. Think of a USDA Organics label but for AI, where tamper proof AI solutions give a “health check” for each model, just like nutrition facts (but in this case bias + alignment)” – Daniel Shorr, Modulus
GizaTech: Leveraging Starkware’s ZKP infra to enable private and verifiable ML and deployment
Axiom: Ethereum’s custom circuit zk coprocessor using zk-SNARKs to allow confidential on-chain data analysis and verifiable off-chain computation
RISC Zero: A developer of a general-purpose zkVM for creating verifiable applications
Delendum: zkVM leveraging plonky3 proving system for blockchain infrastructure, private computing, and zk apps
DeCompute (stay tuned 👀 )
Gensyn: A decentralized compute network offering privacy-preserving distributed training of machine learning models
FedML: Project & framework bringing together federated learning and zk tech to enable confidential & trusted ML in a decentralized environment
Akash: The first appchain on Cosmos, Akash is a decentralized open-source cloud computing marketplace that enables anyone to buy and sell cloud compute resources. Although no zk components yet, $AKT’s roadmap includes implementing a p2p zkKYC.
Some Notes:
EZKL is simply a library focused on making machine learning frameworks compatible with zero-knowledge cryptography to support confidential model deployment
Zama provides FHE (different from ZK) tooling for ML & blockchain. Pioneer in encrypted AI privacy & even further out on frontier tech scale
As of early September, Worldcoin boasts about 2.3M World ID sign-ups from 120 countries (most developing) since launching a little less than two months ago
TensorPlonk, a new proving system coined as a "GPU" for zkML that can deliver up to 1000x speedups for certain classes of models, has been introduced in the past couple of weeks
Of these, although it is too early to tell, I am most keen on Axiom, Delendum, Akash, Gensyn, and Zama. Modulus, too, is a pioneer I would definitely stay up to date with. I arrived at these conclusions through [basic] analysis mainly of team, problem approach, community sentiment/feedback, and execution, among other factors. Some decisions came down to whether I believed in general-purpose vs. specific-purpose approaches (though they will co-exist).
Outlook
Path To Adoption & Market Sequencing
While there is a solid amount of open-source resources out there, much more needs to be done. Let's take a look at how far we are from large-scale adoption of zkML (and how we get there).
Noted by SevenX Research, Modulus Labs' paper has given us some data and insights into the feasibility of zkML applications by testing Worldcoin (with strict precision and memory requirements) and AI Arena (with cost-effectiveness and time requirements):
If Worldcoin used zkML, the prover's memory consumption would overwhelm any commercially available mobile hardware. If AI Arena's tournament used zkML, zkCNNs would increase the time and cost by roughly 100x (0.6s vs. the original 0.008s). Basically, ML compute and zk proving are each already intensive on their own, and combining them compounds the overhead (think 1 + 1 = 100) rather than merely adding it, so proof time and prover memory usage are not yet feasible when applying the tech directly (given their resources, zkML projects can pretty much only run a two-digit number of proofs a day as of now). FHE is ballparked to be 2x more computationally intensive than zk.
“Unfortunately, ZKML is currently too slow for practical applications. For example, proving the Twitter model currently takes 6 hours to prove for a single example using ezkl. Verifying the tweets published in one second (~6,000) would cost ~$88,704 on cloud compute hardware. Enter TensorPlonk, a “GPU” for ZKML. We’ve developed a new proving system to enable high-performance proving for a wide range of ML models. Using TensorPlonk, the proving cost for [the Tweet] example above would be ~$30 compared to ~$88,704.” – Daniel Kang, ZK Researcher
Although cutting-edge ML is improving far faster than ZK, new ZK proving systems and hardware/ASIC advancements hold great potential (the performance of ZK systems has been growing at roughly "Moore's Law"-like paces). New proving systems come out almost yearly, and we can expect prover performance's rapid growth to continue for a while. I suspect a 2-3 year time frame until we reach adequate hardware to feasibly run what's necessary at a moderate scale. By then, the majority of enterprises will have adopted some sort of vertical LLM, meaning integration/middleware solutions could present great opportunities. For now, though we might have a hard time using blockchain + ZK to verify that the information ChatGPT feeds me is trustworthy, we might be able to fit some smaller and older ML models into ZK circuits.
Software innovations are just as vital for real-world deployment. On the cryptography side, advances & optimizations in protocols leveraging STARKs/SNARKs will enable proofs for increasingly complex ML models. Libraries and compilers that lower the barrier to converting ML frameworks into verifiable algorithms will need maturation. Tooling for efficiently generating proofs and integrating them with smart contracts or other applications remains limited. Finally, on the ML side, techniques like knowledge distillation and model compression will allow larger neural networks to be compressed into forms viable for existing ZK proof systems. Trading some accuracy for efficiency is necessary in the near term. The path forward requires co-design of algorithms, software stacks, and hardware acceleration to make ZKML scalable and usable.
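To illustrate one of the compression techniques mentioned above, here is a minimal knowledge distillation sketch: a small "student" network is trained to match the soft outputs of a larger "teacher", so the cheaper-to-prove student approximates the teacher's behavior. Network sizes, the temperature, and the data are illustrative assumptions, not parameters from any zkML project:

```python
# Minimal knowledge distillation sketch (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 3))
student = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 3))

X = torch.randn(512, 8)            # unlabeled data suffices for distillation
T = 2.0                            # softening temperature
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(100):
    with torch.no_grad():
        teacher_logits = teacher(X)
    student_logits = student(X)
    # KL divergence between softened teacher and student distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    opt.zero_grad()
    loss.backward()
    opt.step()
```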
Challenges & Limitations
Beyond a challenging road ahead to PMF and competition from traditional tech companies, key core challenges remain:
1) Precision: Quantizing models into fixed-point numbers must be carefully balanced against the accuracy trade-off (see the sketch after this list)
2) Scalability: Circuit sizes quickly become unmanageable for large neural networks. Efficient proofs for matrix multiplication operations common in ML must be further optimized
3) Security: Risks around adversarial attacks during training and potential model stealing through reverse engineering of weights remain
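To make the precision challenge in (1) concrete, here is a minimal sketch of fixed-point quantization; the scale factor and weights are illustrative assumptions, not values prescribed by any zkML framework:

```python
# Floating-point weights are mapped to fixed-point integers (zk circuits need
# field/integer arithmetic), and some accuracy is lost on the round trip.
weights = [0.12345678, -1.5, 0.000921, 3.14159]
SCALE = 2 ** 16  # fixed-point scale: 16 fractional bits

quantized = [round(w * SCALE) for w in weights]    # integers usable in-circuit
dequantized = [q / SCALE for q in quantized]       # what the model effectively "sees"

for w, q, d in zip(weights, quantized, dequantized):
    print(f"{w:+.8f} -> {q:+d} -> {d:+.8f} (error {abs(w - d):.2e})")
```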
While solutions like new model architectures, layered proving systems, and federated learning help, work is still needed to push zkML towards production readiness across problem domains. The community continues to actively research solutions but core limitations around overhead and attack risks mean applications will need to be chosen wisely where privacy and security benefits outweigh costs.
Potential & zkML’s Role in a Broader Landscape
According to Fortune, the global artificial intelligence market was valued at $428 billion in 2022 and is projected to grow from $515 billion in 2023 to over $2 trillion by 2030. For context, even assuming the majority of users (or enterprises) don't care about privacy, prefer cost-effective choices, or choose alternate privacy solutions, just a 1% share of that 2030 figure would equate to $20 billion. All to say, the long-term opportunity (TAM) is immense, but value capture along the chain is another question (explored later), and I therefore suspect the SOM might not be as high as most people originally expected.
Alongside ZK sit other privacy techniques, including secure multi-party computation (MPC), differential privacy/federated learning (more relevant to training), and trusted execution environments (TEEs; hardware-based, with higher computational efficiency but weaker formal privacy guarantees). Although ZK will have to fit into an ecosystem alongside all of these, where trade-offs and differing use cases call for separate techniques, many of them are actually complementary and can be used in conjunction with one another.
Crypto x AI Future
Although seemingly opposite in principle, the convergence of crypto and AI offers many potential synergies. On one hand, crypto can enable new incentive structures, value capture models, intelligent private smart contracts (which can now interact with AI models), infra rails for AI agents, and privacy-preserving capabilities for various facets of AI like data (RLHF & collection), compute (DePIN), and machine learning. Check out my friend Catrina's (Portal VC) thesis on Why AI Needs Web3.
Meanwhile, AI can augment crypto through LLMs/copilots for writing code/debugging and adversarial training, personalized social graphs/content-filtering algos, MEV/defensive tooling, and verification/PoH, all of which enhance security, productivity, and scalability across applications of cryptography and blockchain. As these technologies continue advancing, we'll likely see the rise of highly personalized and private AI services powered by cryptographic trust minimization, as well as more robust and usable cryptographic tools built or amplified using AI.
The interplay between cryptography and AI has the potential to solve challenging problems around incentives, bias, and transparency in AI while also making cryptographic tools easier to work with safely at scale. As both fields continue maturing, the collaborations between crypto and AI are poised to grow even deeper.
Some Thoughts & How To Capitalize
Investments should be evaluated at the highest level through a lens of potential, conviction, risk-reward, value capture and diversification. Beyond feasibility as we’ve discussed, I believe the biggest questions of zkML come from PMF & value capture.
I took some time to think deeply about where to position ourselves as investors in the space. As of now, zkML is a non-mission-critical step. While I do see a big use case for MLaaS, the industry has yet to prove itself outside of crypto, and that is where I believe zkML must penetrate in order to even slightly reach its potential. A big concern I have with most of the pure-play zkML narrative is the question of where value accrual and capture will happen. Timeline is also an important question, given technical limitations and competing privacy techniques (although these can be complementary). I don't have a definitive answer to that. The narrative is verifiable ML, and my initial read a couple of months ago was that most projects in the space are more of a public good. However, having had more and more conversations recently, I am starting to see a stronger value capture strategy emerge: enterprises that hold more personalized & sensitive data (complementarity with something like FHE) and need 100% model/training authenticity for important business insights or decisions have the potential to become the ICP.
One thing I always ask is "what is a similar comparison to something we have today?" Asking that of zkML, I see a scenario where verifiable ML operates at hyperscale and becomes mission-critical through an oversight/auditing role (imagine a rating agency like S&P Global/Moody's for financial products, the Big 4, or some regulatory utility) or on a more "___-as-a-service" basis, acting similar to a B2B Equifax/Credit Karma (data aggregation with insights, scoring/health checks, and reporting for some standardization). Other adjacent areas of zkML will serve separate roles. Realistically it will land somewhere in between, with my assumption leaning slightly towards the latter.
It depends on which facet of the ecosystem you look at, but I still have a bit of trouble uncovering a clear, robust business model that is intuitive to me, although that may change as we find stronger PMF. That said, I am paying particular attention to zk computation/integration enhancers (middleware/coprocessors), decentralized training (distributed networks & deployment), ML projects leveraging crypto as an incentivization mechanism (e.g. dataset pipelines/RLHF, royalty-esque designs for open-source resources), and FHE niches, since they offer a clearer opportunity for value capture.
Adjacently, in traditional ML, where attention is currently focused on fine-tuning, I am excited for attention to shift to end-to-end LLMOps (especially data generation (e.g. synthetics) & pipelines + deployment at the enterprise level) and model optimization tools (think MosaicML at Databricks, offering solutions like pruning/quantization, or LoRA from Microsoft).
Of course, as always, there are a ton of emerging projects, business models & ideas on the horizon, and I saw a number of them at SBC a couple weeks ago. That said, I still deem the space exciting as an investor who is playing the long-term game. If you have questions or would like to chat, please reach out to me at justinkojimchen@utexas.edu or 0xJuicetin on Twitter.
Learn more at the zkML Github or join the zkML Community & zkML Open-Source Community to be involved.
References
[1] https://github.com/zkml-community/awesome-zkml
[2] https://mirror.xyz/sevenxventures.eth/3USbrj7kcK7lyq_7upA4iyWV5pWMII7KrM40z5zpEXo
[3] https://a16zcrypto.com/posts/article/checks-and-balances-machine-learning-and-zero-knowledge-proofs/
[4] https://worldcoin.org/blog/engineering/intro-to-zkml
[5] https://medium.com/@danieldkang/trustless-verification-of-machine-learning-6f648fd8ba88
[6] https://hackmd.io/@cathie/zkml
[7] https://mirror.xyz/1kx.eth/q0s9RCH43JCDq8Z2w2Zo6S5SYcFt9ZQaRITzR4G7a_k
[8] https://messari.io/report/growing-synergies-in-ai-and-crypto?referrer=all-research
[9] https://medium.com/@danieldkang/tensorplonk-a-gpu-for-zkml-delivering-1-000x-speedups-d1ab0ad27e1c