Zero to Agent Hackathon · Vercel × DeepMind · March 2026

AgentQED

Write a math proof in plain English, voice, or a photo of handwritten work. Gemini 3.1 Pro translates it to Lean 4 and a real theorem prover decides if it's right. Compiler-verified, not vibes-verified.

Concept: Ganesh Sankar (UC Berkeley). Built with Sankar Subbayya (accurateai.org).

What it does

A natural-language proof can be wrong in ways that still read fluently. A skipped case. A quantifier flipped. An induction step that quietly assumes what it set out to prove. A language model will tell you it looks fine. AgentQED asks a theorem prover instead.

You give it a proof. The agent translates the proof into Lean 4, runs the actual Lean compiler in an isolated sandbox, and reports back. If the compiler accepts the proof, it is verified by the same tool used to formalize the polynomial Freiman-Ruzsa conjecture. If it rejects the proof, the agent reads the compiler's error, edits the Lean, and tries again. Up to twelve times.
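
The translate, compile, and retry cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `LoopDeps`, `translate`, `compile`, and `verifyProof` are hypothetical names, and the cap is read here as one initial attempt plus twelve retries.

```typescript
// Hypothetical shapes; the real project may structure these differently.
type CompileResult = { ok: true } | { ok: false; stderr: string };

interface LoopDeps {
  // Asks the model for Lean 4 source, given the proof and optional compiler feedback.
  translate: (proof: string, feedback?: string) => Promise<string>;
  // Runs the Lean compiler on the generated source inside the sandbox.
  compile: (leanSource: string) => Promise<CompileResult>;
}

type LoopOutcome =
  | { verified: true; lean: string; attempts: number }
  | { verified: false; lastError: string; attempts: number };

// Assumption: "up to twelve" means twelve retries after the first attempt.
const MAX_RETRIES = 12;

async function verifyProof(proof: string, deps: LoopDeps): Promise<LoopOutcome> {
  let feedback: string | undefined;
  const maxAttempts = 1 + MAX_RETRIES;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // The first call has no feedback; later calls carry the last compiler error.
    const lean = await deps.translate(proof, feedback);
    const result = await deps.compile(lean);
    if (result.ok) {
      return { verified: true, lean, attempts: attempt };
    }
    feedback = result.stderr;
  }
  return { verified: false, lastError: feedback ?? "", attempts: maxAttempts };
}
```

Passing the model call and the sandbox call in as functions keeps the loop itself easy to test with fakes, which is the only reason for the `LoopDeps` indirection in this sketch.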

Plain English · LaTeX · Voice · Photo of handwriting · Multi-page PDF

A worked example

You type:

Prove that for all natural numbers n, the sum 0 + 1 + ⋯ + n equals n(n+1)/2.

Gemini writes this, and the sandbox compiles it on the first try:

def sumTo : Nat → Nat
  | 0     => 0
  | n + 1 => (n + 1) + sumTo n

theorem sum_formula (n : Nat) : 2 * sumTo n = n * (n + 1) := by
  induction n with
  | zero => rfl
  | succ k ih =>
    simp [sumTo, Nat.mul_add, Nat.add_mul]
    omega

You did not need to know what omega does to get the verified result. (It is Lean's decision procedure for linear arithmetic over integers and natural numbers.) The Key Insight and proof structure are shown first; the Lean code is collapsed by default. Understanding comes before syntax.
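
For readers who want the arithmetic behind the succ case, here is the inductive step written out by hand, with S(n) standing for sumTo n. Note that the Lean statement is phrased as 2 · S(n) = n(n+1) rather than with a division, presumably because division on natural numbers truncates in Lean.

```latex
\begin{aligned}
2\,S(k+1) &= 2\bigl((k+1) + S(k)\bigr) && \text{definition of } S \\
          &= 2(k+1) + 2\,S(k) \\
          &= 2(k+1) + k(k+1) && \text{induction hypothesis} \\
          &= (k+1)(k+2).
\end{aligned}
```

The last line is exactly n(n+1) evaluated at n = k+1, which is what the theorem asks for.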

Architecture

Gemini 3.1 Pro for translation. Handles photographs of handwritten math and does not get stuck inside its own previous suggestion when fed a compiler error.
Vercel Sandbox running Lean 4 in an isolated Firecracker microVM. Each request gets a fresh filesystem, so no shared state can corrupt the next user's run. Cold starts take about six seconds; warm starts are near-instant.
Self-correcting agent loop. On compiler rejection, the stderr is filtered to keep the goal state and the failing line, then prepended to the next prompt. Up to twelve retries. This single feedback-shaping change took the success rate on the sample proof set from about half to all of it.
Next.js + Vercel AI SDK for streaming UI. You watch the agent talk to itself in real time; the wait feels like progress, not a loading spinner.
Multi-modal input via Web Speech API (voice), file upload (images and PDFs), and direct text. A vision pass on images and per-page extraction on PDFs feed into the same translation step.
Progressive disclosure UI. Key Insight at the top, proof structure tree, then collapsed sections for Lean code, step-by-step breakdown, mathematical insight, and Lean tactic notes.
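
The feedback-shaping step in the agent loop (keep the goal state and the failing line, drop the rest) might look something like the sketch below. It assumes Lean's usual `file:line:col: error: message` diagnostic headers; `shapeFeedback` and the output layout are hypothetical, and the project's real filter may keep or drop different things.

```typescript
// Matches Lean-style diagnostic headers such as "Main.lean:12:4: error: unsolved goals".
const HEADER = /^(.+?):(\d+):(\d+): (error|warning|info): (.*)$/;

// Hypothetical helper: reduces raw compiler output to the error message,
// the offending source line, and the goal state that follows the header.
function shapeFeedback(compilerOutput: string, leanSource: string): string {
  const sourceLines = leanSource.split("\n");
  const kept: string[] = [];
  let insideError = false;

  for (const line of compilerOutput.split("\n")) {
    const m = HEADER.exec(line);
    if (m) {
      // A new diagnostic starts; only errors are kept, warnings and info are dropped.
      insideError = m[4] === "error";
      if (insideError) {
        const lineNo = Number(m[2]);
        const failing = sourceLines[lineNo - 1];
        kept.push(`error at line ${lineNo}: ${m[5]}`);
        if (failing !== undefined) kept.push(`  source: ${failing.trim()}`);
      }
    } else if (insideError && line.trim() !== "") {
      // Continuation lines of an error carry the hypotheses and the goal.
      kept.push(`  ${line}`);
    }
  }
  return kept.join("\n");
}
```

The returned string is what would be prepended to the next prompt; keeping it short is the point, since the model only needs the goal and the line that failed.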

Why not have the LLM grade the proof

Because language models are confident liars about math. A model will tell you your proof of Fermat's Last Theorem is correct. It will say that in a complete sentence. It will keep saying it.

The Lean compiler does not have a personality. If your proof has a gap, the term it expects to typecheck does not typecheck, and the error points at the line. That error is what the agent reads to figure out what to fix. Everything else is plumbing to get you there without learning Lean first.

Try it

Live app

Hosted on Vercel. Six sample proofs on the landing page: modus ponens is the fastest, and the sum formula is the most satisfying.

Source on GitHub

Next.js + AI SDK + Vercel Sandbox. The Lean runner, agent loop, and system prompt are all in src/.

Bring your own proof

Photograph something you scribbled on paper, drop it into the upload box, and watch the agent translate and verify. The PDF path works for multi-page proofs too.

Credits

Concept & idea: Ganesh Sankar · UC Berkeley

Engineering: Sankar Subbayya · AccurateAI

Built at Zero to Agent Hackathon · Shack15 SF · March 2026

Verified does not mean intended. A proof can typecheck and still prove the wrong theorem if the Lean statement does not say what you meant. Always read the statement the agent produced, not just the green checkmark.
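
A small hand-written illustration of that gap (not agent output; the theorem names are hypothetical). Both statements below should typecheck, but only the first says that every natural number is less than its successor. The second flips the quantifier and proves something much weaker.

```lean
-- Intended claim: every natural number is below its successor.
theorem lt_succ_all : ∀ n : Nat, n < n + 1 := by
  intro n
  omega

-- Flipped quantifier: only claims that some such number exists.
-- This also compiles, yet it is a much weaker theorem.
theorem lt_succ_some : ∃ n : Nat, n < n + 1 :=
  ⟨0, by omega⟩
```

A green checkmark on the second theorem would be correct and still not what was asked.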