GPU-ACCELERATED ZKML

STWO-ML
Verifiable AI Prover

Prove any ML model's inference is correct with GPU-parallelized Circle STARKs. From Qwen3-14B to production models — cryptographic proof in seconds.

Circle STARKs
H200 GPU
On-Chain Verified
Proving Time: 37.64s (Qwen3-14B)
Verification: 206ms (on-chain)
Speedup: 98x (vs baseline STWO)
Trace Reduction: 1,700x (5120x5120 matrices)
Tests Passing: 292 (full coverage)
ARCHITECTURE

How It Works

A four-layer pipeline from confidential GPU inference to permanent on-chain proof

01

Confidential GPU Layer

TEE-attested H200 GPUs execute model inference in a secure enclave. Model weights never leave the trusted execution environment.

H200 GPU · TEE Attestation · Enclave
02

ML Proof Generation

Sumcheck protocol over the M31 field verifies every matrix multiplication. GPU-parallel round reduction with CUDA kernels at 256 threads per block.

Sumcheck · M31 Field · CUDA Kernels
03

Recursive Compression

Raw proofs compress from 17 MB down to ~1 KB via recursive Circle STARKs. FRI commitments over circle groups yield constant-size verification.

Circle STARKs · FRI · 17 MB to 1 KB
04

On-Chain Settlement

Cairo verifier on Starknet validates the recursive proof. Permanent, immutable proof-of-inference stored on-chain for any model.

Cairo Verifier · Starknet · Immutable

Technical Deep Dive

Circle STARKs, GPU kernels, and tiled proving at production scale

Sumcheck Protocol

Verifies matrix multiplications via multilinear extensions. Each A*B product is reduced, round by round, to a check on a low-degree univariate polynomial, giving O(n) verifier cost for O(n^2) computation.

prove_matmul_sumcheck(A, B, C) → rounds: Vec<(QM31, QM31, QM31)> → final_claim: QM31
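
The round structure above can be illustrated with a minimal sketch. It is not the STWO-ML implementation: it works over M31 directly rather than QM31, it handles only the inner-product claim that a matmul sumcheck boils down to once row and column are fixed, and it takes the verifier challenges as an explicit argument where a real prover would derive them from a transcript. The function names prove_inner_product and verify_inner_product are hypothetical.

```rust
const P: u64 = (1 << 31) - 1; // Mersenne-31 prime
const INV2: u64 = 1 << 30; // 2 * 2^30 = 2^31 = 1 (mod P)

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Evaluate the degree-2 polynomial through (0, s0), (1, s1), (2, s2) at r
/// using Lagrange interpolation.
fn eval_quadratic(s: (u64, u64, u64), r: u64) -> u64 {
    let (r1, r2) = (sub(r, 1), sub(r, 2));
    let t0 = mul(mul(s.0, INV2), mul(r1, r2));
    let t1 = mul(s.1, mul(r, r2));
    let t2 = mul(mul(s.2, INV2), mul(r, r1));
    add(sub(t0, t1), t2)
}

/// Prover for the claim sum_x a(x) * b(x), where a and b are evaluation
/// tables of length 2^n. Returns one (s0, s1, s2) triple per round and the
/// final claim a(r) * b(r). All inputs are assumed to be reduced mod P.
fn prove_inner_product(
    mut a: Vec<u64>,
    mut b: Vec<u64>,
    challenges: &[u64],
) -> (Vec<(u64, u64, u64)>, u64) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), 1 << challenges.len());
    let mut rounds = Vec::new();
    for &r in challenges {
        let half = a.len() / 2;
        let (mut s0, mut s1, mut s2) = (0, 0, 0);
        for i in 0..half {
            let (a0, a1, b0, b1) = (a[i], a[i + half], b[i], b[i + half]);
            s0 = add(s0, mul(a0, b0));
            s1 = add(s1, mul(a1, b1));
            // A line through f(0), f(1) takes the value 2*f(1) - f(0) at t = 2.
            let a2 = sub(add(a1, a1), a0);
            let b2 = sub(add(b1, b1), b0);
            s2 = add(s2, mul(a2, b2));
        }
        rounds.push((s0, s1, s2));
        // Fix the current variable to r, halving both tables.
        for i in 0..half {
            a[i] = add(a[i], mul(r, sub(a[i + half], a[i])));
            b[i] = add(b[i], mul(r, sub(b[i + half], b[i])));
        }
        a.truncate(half);
        b.truncate(half);
    }
    (rounds, mul(a[0], b[0]))
}

/// Verifier: checks s(0) + s(1) against the running claim each round.
/// A full protocol would also check final_claim against committed
/// evaluations of A and B; that step is omitted in this sketch.
fn verify_inner_product(
    claim: u64,
    rounds: &[(u64, u64, u64)],
    challenges: &[u64],
    final_claim: u64,
) -> bool {
    if rounds.len() != challenges.len() {
        return false;
    }
    let mut current = claim;
    for (&s, &r) in rounds.iter().zip(challenges) {
        if add(s.0, s.1) != current {
            return false;
        }
        current = eval_quadratic(s, r);
    }
    current == final_claim
}
```

For tables of length 2^n the verifier performs n constant-size round checks, while the prover touches every table entry.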

Circle STARKs

Arithmetic over the Mersenne-31 field (2^31 - 1). FRI commitment scheme operates on circle groups, enabling efficient polynomial evaluation and composition.

M31 → CM31 (complex) → QM31 (degree-4 secure field) where j² = 2 + i
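
As an illustration of the tower above, the following sketch builds M31, CM31, and QM31 multiplication from the two defining relations i² = -1 and j² = 2 + i. The type and method names are chosen for this example and are not taken from the STWO codebase; a production field would use faster reductions than the plain `%` shown here.

```rust
/// Mersenne-31 prime: p = 2^31 - 1.
const P: u64 = (1 << 31) - 1;

/// Element of the base field M31, stored as a canonical value in [0, P).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct M31(u32);

impl M31 {
    fn new(x: u64) -> Self { M31((x % P) as u32) }
    fn add(self, o: Self) -> Self { M31::new(self.0 as u64 + o.0 as u64) }
    fn sub(self, o: Self) -> Self { M31::new(self.0 as u64 + P - o.0 as u64) }
    fn mul(self, o: Self) -> Self { M31::new(self.0 as u64 * o.0 as u64) }
}

/// CM31 = M31[i] / (i^2 + 1), written a + b*i.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CM31(M31, M31);

impl CM31 {
    fn add(self, o: Self) -> Self { CM31(self.0.add(o.0), self.1.add(o.1)) }
    fn mul(self, o: Self) -> Self {
        // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        CM31(
            self.0.mul(o.0).sub(self.1.mul(o.1)),
            self.0.mul(o.1).add(self.1.mul(o.0)),
        )
    }
}

/// QM31 = CM31[j] / (j^2 - (2 + i)), written x + y*j.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct QM31(CM31, CM31);

impl QM31 {
    fn mul(self, o: Self) -> Self {
        // (x + yj)(z + wj) = (xz + (2 + i)*yw) + (xw + yz)j
        let r = CM31(M31::new(2), M31::new(1));
        QM31(
            self.0.mul(o.0).add(r.mul(self.1.mul(o.1))),
            self.0.mul(o.1).add(self.1.mul(o.0)),
        )
    }
}
```

Squaring the element j (zero constant part, unit j part) returns 2 + i, which is a quick sanity check that the extension is wired up as described.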

GPU Kernels

Custom CUDA kernels for sumcheck round reduction. 256 threads per block with shared-memory tree reduce. Auto-dispatches to GPU when k >= 16,384.

sumcheck_round_kernel<<<blocks, 256>>> → s0, s1, s2 parallel reduction → shared memory tree reduce
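
The reduction pattern can be sketched on the CPU. The code below is a Rust simulation, not the CUDA kernel itself: each 256-element chunk stands in for one thread block, the fixed-size array stands in for shared memory, and the inner loop stands in for the threads active at each stride. The constant and function names are assumptions made for this example; only the block size of 256 and the 16,384 threshold come from the description above.

```rust
const P: u64 = (1 << 31) - 1; // Mersenne-31 prime
const THREADS_PER_BLOCK: usize = 256;
const GPU_DISPATCH_THRESHOLD: usize = 16_384;

/// Simulate one thread block: load up to 256 values into "shared memory",
/// then halve the number of active "threads" at every step.
fn block_tree_reduce(chunk: &[u64]) -> u64 {
    let mut shared = [0u64; THREADS_PER_BLOCK];
    shared[..chunk.len()].copy_from_slice(chunk);
    let mut stride = THREADS_PER_BLOCK / 2;
    while stride > 0 {
        // On a GPU, these iterations run in parallel, followed by a barrier.
        for tid in 0..stride {
            shared[tid] = (shared[tid] + shared[tid + stride]) % P;
        }
        stride /= 2;
    }
    shared[0]
}

/// Sum all values mod P by reducing each block and combining block results.
/// Inputs are assumed to be already reduced mod P.
fn reduce(values: &[u64]) -> u64 {
    values
        .chunks(THREADS_PER_BLOCK)
        .map(block_tree_reduce)
        .fold(0, |acc, x| (acc + x) % P)
}

/// Dispatch rule from the text: use the GPU path once k reaches the threshold.
fn use_gpu(k: usize) -> bool {
    k >= GPU_DISPATCH_THRESHOLD
}
```

A tree reduce over 256 values finishes in 8 steps instead of 255 sequential additions, which is why the pattern suits the three per-round sums s0, s1, s2.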

Tiled MatMul

Proves 5120x5120 matrices by chunking the k-dimension. Each tile produces a sub-proof, composed into a standard format. 4GB memory budget with auto-dispatch.

prove_tiled_matmul(A, B, tile_size) → compose_tiled_proof(sub_proofs) → standard MatMulProof
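
Chunking the k-dimension works because a matrix product is a sum of partial products over disjoint k-ranges. The sketch below shows only that arithmetic identity over M31; it does not generate or compose sub-proofs, and the names matmul_k_range and tiled_matmul are hypothetical rather than part of the STWO-ML API.

```rust
const P: u64 = (1 << 31) - 1; // Mersenne-31 prime
type Matrix = Vec<Vec<u64>>;

/// Partial product over one tile: A[:, k0..k1] * B[k0..k1, :], reduced mod P.
/// Entries are assumed to be already reduced mod P.
fn matmul_k_range(a: &Matrix, b: &Matrix, k0: usize, k1: usize) -> Matrix {
    let (m, n) = (a.len(), b[0].len());
    let mut c = vec![vec![0u64; n]; m];
    for i in 0..m {
        for j in 0..n {
            for k in k0..k1 {
                c[i][j] = (c[i][j] + a[i][k] * b[k][j]) % P;
            }
        }
    }
    c
}

/// Split the k-dimension into tiles and accumulate the partial products.
/// In a tiled prover, each partial product would carry its own sub-proof.
fn tiled_matmul(a: &Matrix, b: &Matrix, tile_size: usize) -> Matrix {
    assert!(tile_size > 0);
    let k_dim = b.len();
    let (m, n) = (a.len(), b[0].len());
    let mut c = vec![vec![0u64; n]; m];
    let mut k0 = 0;
    while k0 < k_dim {
        let k1 = (k0 + tile_size).min(k_dim);
        let partial = matmul_k_range(a, b, k0, k1);
        for i in 0..m {
            for j in 0..n {
                c[i][j] = (c[i][j] + partial[i][j]) % P;
            }
        }
        k0 = k1;
    }
    c
}
```

Because the tiles only need to agree on the final sum, the tile size can be chosen to fit a memory budget without changing the result.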

Supported Operations

MatMul, Add, Mul, LayerNorm, Softmax, Attention, Embedding, Conv2D, Quantize
LIVE DEMO

Proving Pipeline

Watch a 14-billion parameter model go from inference to on-chain verification

Qwen3-14B Proving Pipeline (simulated)
Model Input: 14B params
Layer Processing: 4 matmuls
Proof Generation: 17 MB
Recursive Compress: ~1 KB
On-Chain Verified: Starknet
Rust + CUDA · Circle STARKs · M31 Field
17 MB → ~1 KB compression

How We Compare

The only ZKML prover with GPU acceleration, recursive proofs, and on-chain verification

Prover           GPU   Model Size         Proving Time   On-Chain   Recursive
STWO-ML (ours)   Yes   Qwen3-14B          37.64s         Yes        Yes
zkLLM            No    13B (limited)      ~15min         No         No
LuminAIR         No    Small models       ~5min          Yes        No
ICICLE-STWO      Yes   General circuits   Varies         No         No
ON-CHAIN

Deployed on Starknet

Live verification contracts on Starknet Sepolia — inspect and verify proofs on-chain

StweMlVerifier v3

ML model verification with SAGE payment and trusted submitter control

0x04f8c5377d94baa15291832dc3821c2fc235a95f0823f86add32f828ea965a15

StweMlStarkVerifier

Recursive STARK proof verification on Starknet

0x005928ac548dc2719ef1b34869db2b61c2a55a4b148012fad742262a8d674fba

Elo Cairo Verifier

Generic on-chain sumcheck verifier — pure cryptographic verification

0x0531182369ea82331ac39854faab986ba61907c2f88aa75120636a427ff8569e
GET STARTED

Prove Your AI

Cryptographic proof that your model computed correctly — from inference to on-chain settlement.