Automated Theorem Proving

70 papers with code • 10 benchmarks • 8 datasets

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems
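
As a minimal illustration of the task statement above, the following Lean 4 sketch treats two hypotheses as the knowledge base and the proposition `q` as the conjecture. The theorem name `goal_from_facts` and the hypothesis names `hp` and `hpq` are made up for this example and do not come from any of the benchmarks listed on this page.

```lean
-- Knowledge base: the known facts `hp : p` and `hpq : p → q`.
-- Conjecture (target theorem): `q`.
theorem goal_from_facts (p q : Prop) (hp : p) (hpq : p → q) : q := by
  apply hpq   -- reduce the goal `q` to the goal `p`
  exact hp    -- close the remaining goal with the known fact `hp`
```

An automated prover would be expected to find the two proof steps itself; here they are written by hand only to show what a finished, machine-checkable proof looks like.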

Benchmarks

These leaderboards are used to track progress in Automated Theorem Proving.

Dataset	Best Model
miniF2F-test	Thor + expert iteration on autoformalised theorems
miniF2F-valid	Lean GPT-f
HolStep (Conditional)	MPNN-DagLSTM
HOList benchmark	4-hop GNN, sub-expression sharing
HolStep (Unconditional)	FormulaNet
Metamath set.mm	Evariste
miniF2F-curriculum	Evariste-7d
CompCert	Proverbot9001
CoqGym	ASTactic
LeanDojo Benchmark	ReProver

Libraries

Use these libraries to find Automated Theorem Proving models and implementations.

eleutherai/gpt-neox • 2 papers • 6,574 stars

Latest papers with no code

Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

no code yet • 10 Apr 2024

Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e., proof steps) to search through proof states.

Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

no code yet • 9 Apr 2024

In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong.

Proceedings 12th International Workshop on Theorem proving components for Educational software

no code yet • 4 Apr 2024

The ThEdu series pursues the smooth transition from an intuitive way of doing mathematics at secondary school to a more formal approach to the subject in STEM education, while favouring software support for this transition by exploiting the power of theorem-proving technologies.

Multi-Task Learning with Multi-Task Optimization

no code yet • 24 Mar 2024

Multi-task learning solves multiple correlated tasks.

Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code

no code yet • 19 Mar 2024

In the realm of formal theorem proving, the Coq proof assistant stands out for its rigorous approach to verifying mathematical assertions and software correctness.

BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving

no code yet • 6 Mar 2024

We also provide a qualitative analysis, illustrating that improved performance is associated with more semantically-aware embeddings.

Learning Guided Automated Reasoning: A Brief Survey

no code yet • 6 Mar 2024

Automated theorem provers and formal proof assistants are general reasoning systems that are in theory capable of proving arbitrarily hard theorems, thus solving arbitrary problems reducible to mathematics and logical reasoning.

A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic

no code yet • 28 Feb 2024

In this paper, we introduce a novel framework for analyzing the complexity of a question answer based on the natural deduction calculus as presented in Prawitz (1965).

EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages

no code yet • 12 Feb 2024

This paper introduces EvoGPT-f: a novel evolutionary framework for the first systematic quantitative analysis of the differential machine learnability of five formal math corpora (Lean 3, Lean 4, Coq, HOL 4, HOL Light) using four tokenization methods (character, word-level, Byte Pair Encoding and StarCoder tokenizer).

"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors

no code yet • 6 Feb 2024

Large-scale generative models are shown to be useful for sampling meaningful candidate solutions, yet they often overlook task constraints and user preferences.
