InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems

Oct 28, 2024

Zijian Wu*

Suozhi Huang*

Zhejian Zhou

Huaiyuan Ying

Jiayu Wang

Dahua Lin

Kai Chen



Abstract

We propose using the large-scale LEAN problem dataset Lean-Workbook for expert iteration, with more than 20,000 CPU days of compute. During expert iteration, we found log-linear trends relating the number of solved problems to proof length and CPU usage. We train a critic model to select relatively easy problems for the policy models to attempt and to guide the model toward deeper proofs. InternLM2.5-StepProver achieves open-source state-of-the-art results on the MiniF2F, Lean-Workbook-Plus, ProofNet, and Putnam benchmarks. Specifically, it achieves a pass rate of 65.9% on MiniF2F-test and proves (or disproves) 17.0% of the problems in Lean-Workbook-Plus, a significant improvement over the 9.5% of problems proved when Lean-Workbook-Plus was released.

Type

Preprint

Publication

Tech Report

Last updated on Oct 28, 2024

Large Language Models, Formal Math, Theorem Proving
