HyperTree Proof Search for Neural Theorem Proving

We propose an online training procedure for a transformer-based automated theorem prover. Our approach leverages a new search algorithm, HyperTree Proof Search (HTPS), inspired by the recent success of AlphaZero. Our model learns from previous proof searches through online training, allowing it to generalize to domains far from the training distribution. We report detailed ablations of our pipeline's main components by studying performance on three environments of increasing complexity. In particular, we show that with HTPS alone, a model trained on annotated proofs manages to prove 65.4% of a held-out set of Metamath theorems, significantly outperforming the previous state of the art of 56.5% by GPT-f. Online training on these unproved theorems increases accuracy to 82.6%. With a similar computational budget, we improve the state of the art on the Lean-based miniF2F-curriculum dataset from 31% to 42% proving accuracy.
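The abstract names the hypertree structure without describing it. As a rough illustration only, the sketch below assumes the common reading of a proof hypertree: each tactic applied to a goal yields a set of subgoals that must all be proved, and a goal is proved as soon as any one of its tactics succeeds. The `Goal` class and `is_proved` function are hypothetical names introduced for this example. They are not the authors' implementation, and the sketch leaves out the policy and critic models and the search itself.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """One statement to prove (hypothetical structure for illustration)."""
    name: str
    # Each tactic is a hyperedge: applying it replaces this goal with a list
    # of subgoals that must ALL be proved. An empty list means the tactic
    # closes the goal outright. No tactics means the goal is unexpanded.
    tactics: list[list["Goal"]] = field(default_factory=list)


def is_proved(goal: Goal) -> bool:
    """A goal is proved if at least one tactic has every subgoal proved.

    Assumes the structure is acyclic (a hypertree), so recursion terminates.
    """
    return any(
        all(is_proved(sub) for sub in subgoals)
        for subgoals in goal.tactics
    )


# Toy example: the root has two candidate tactics.
closed_a = Goal("a", tactics=[[]])   # closed by a tactic with no subgoals
open_b = Goal("b")                   # never expanded, so not proved
closed_c = Goal("c", tactics=[[]])

root = Goal("root", tactics=[[closed_a, open_b], [closed_c]])
stuck = Goal("stuck", tactics=[[closed_a, open_b]])

root_proved = is_proved(root)    # second tactic succeeds
stuck_proved = is_proved(stuck)  # only tactic needs the unproved goal "b"
```

In this toy case the first tactic of `root` fails because `b` is unproved, but the second tactic reduces `root` to `c`, which is closed, so `root` counts as proved. `stuck` has no alternative tactic and remains unproved.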

Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Automated Theorem Proving | Metamath set.mm | Evariste | Pass@32 | 72.4 | #1
Automated Theorem Proving | miniF2F-curriculum | GPT-f | Pass@64 | 30.6 | #4
Automated Theorem Proving | miniF2F-curriculum | Evariste | Pass@64 | 32.1 | #3
Automated Theorem Proving | miniF2F-curriculum | Evariste-7d | Pass@64 | 42.5 | #1
Automated Theorem Proving | miniF2F-curriculum | Evariste-1d | Pass@64 | 33.6 | #2
Automated Theorem Proving | miniF2F-test | Evariste | Pass@64 | 41 | #1
Automated Theorem Proving | miniF2F-test | Evariste-7d | Pass@64 | 40.6 | #2
Automated Theorem Proving | miniF2F-test | Evariste-1d | Pass@64 | 38.9 | #3
Automated Theorem Proving | miniF2F-test | GPT-f | Pass@64 | 36.6 | #4
Automated Theorem Proving | miniF2F-valid | Evariste-7d | Pass@64 | 47.5 | #2
Automated Theorem Proving | miniF2F-valid | Evariste-1d | Pass@64 | 46.7 | #4
Automated Theorem Proving | miniF2F-valid | GPT-f | Pass@64 | 47.3 | #3
Automated Theorem Proving | miniF2F-valid | Evariste | Pass@64 | 58.6 | #1

Methods