no code implementations • 20 Dec 2024 • Kaiyu Yang, Gabriel Poesia, Jingxuan He, Wenda Li, Kristin Lauter, Swarat Chaudhuri, Dawn Song
AI for Mathematics (AI4Math) is not only intriguing intellectually but also crucial for AI-driven discovery in science, engineering, and beyond.
no code implementations • 19 Dec 2024 • Simon Frieder, Jonas Bayer, Katherine M. Collins, Julius Berner, Jacob Loader, András Juhász, Fabian Ruehle, Sean Welleck, Gabriel Poesia, Ryan-Rhys Griffiths, Adrian Weller, Anirudh Goyal, Thomas Lukasiewicz, Timothy Gowers
The suite of datasets commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) exhibits several shortcomings.
no code implementations • 5 Nov 2024 • Gabriel Poesia, Chloe Loughridge, Nada Amin
Since this data-driven approach is hindered by the lack of large-scale training data, we propose a method for the open-ended synthesis of new Dafny programs: a flexible pipeline in which LLMs formulate high-level ideas, implement them, and incrementally propose changes to existing programs, which Dafny validates.
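A minimal sketch of the kind of generate-and-verify loop described above, in Python. The helpers `llm_propose_idea`, `llm_implement`, and `llm_edit_program` are hypothetical stand-ins for the LLM steps; only the call to the `dafny verify` command-line tool refers to real software, and even that invocation is an assumption about the local Dafny installation.

```python
# Sketch of an open-ended LLM + verifier synthesis loop (illustrative only).
# llm_propose_idea, llm_implement, llm_edit_program are hypothetical callables.
import random
import subprocess
import tempfile

def dafny_verifies(program: str) -> bool:
    """Return True if Dafny accepts the program (assumes `dafny verify` on PATH)."""
    with tempfile.NamedTemporaryFile("w", suffix=".dfy", delete=False) as f:
        f.write(program)
        path = f.name
    result = subprocess.run(["dafny", "verify", path], capture_output=True)
    return result.returncode == 0

def synthesis_loop(llm_propose_idea, llm_implement, llm_edit_program, steps=100):
    corpus = []  # verified Dafny programs accumulated so far
    for _ in range(steps):
        if corpus and random.random() < 0.5:
            # Incrementally propose a change to an existing verified program.
            candidate = llm_edit_program(random.choice(corpus))
        else:
            # Formulate a high-level idea, then implement it as a new program.
            candidate = llm_implement(llm_propose_idea())
        if dafny_verifies(candidate):
            corpus.append(candidate)
    return corpus
```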
no code implementations • 9 Aug 2024 • Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia, Davide Ghilardi, Anna Goldie, Federico Bianchi, Dan Jurafsky, Christopher D. Manning
We demonstrate h4rm3l's efficacy by synthesizing a dataset of 2656 successful novel jailbreak attacks targeting 6 state-of-the-art open-source and proprietary LLMs, and by benchmarking those models against a subset of these synthesized attacks.
1 code implementation • 1 Jul 2024 • Shubhra Mishra, Gabriel Poesia, Belinda Mo, Noah D. Goodman
Mathematical problem solving is an important skill for Large Language Models (LLMs), both as a capability in its own right and as a proxy for a range of reasoning abilities.
2 code implementations • 30 Jun 2024 • Gabriel Poesia, David Broman, Nick Haber, Noah D. Goodman
We propose novel methods for hindsight relabeling on proof search trees to significantly improve the agent's sample efficiency in both tasks.
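The general idea of hindsight relabeling on a search tree can be sketched briefly: even when a search fails to prove the intended goal, every state it did reach can be relabeled as a goal that was provably achieved, yielding extra training examples. The sketch below is an illustration of that general idea under assumed data structures, not the paper's exact algorithm.

```python
# Sketch of hindsight relabeling over a proof search tree (illustrative only).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    state: str                       # proof state reached at this node
    action: Optional[str] = None     # tactic/action that produced it
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

def hindsight_examples(root: Node):
    """Relabel every reached state as a goal, turning failed searches
    into successful training trajectories."""
    examples, stack = [], [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node is root:
            continue
        # Walk back to the root to recover the action sequence.
        path, cur = [], node
        while cur.parent is not None:
            path.append(cur.action)
            cur = cur.parent
        path.reverse()
        # Training example: from root.state, the relabeled goal is reachable via `path`.
        examples.append({"start": root.state, "goal": node.state, "actions": path})
    return examples
```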
1 code implementation • 12 Jun 2024 • Zhening Li, Gabriel Poesia, Armando Solar-Lezama
Skills are temporal abstractions intended to improve reinforcement learning (RL) performance through hierarchical RL.
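For readers unfamiliar with the term, a skill in the options-style sense bundles an initiation condition, a low-level policy, and a termination condition, and is executed as a single temporally extended action. The sketch below is a generic illustration under an assumed `env.step(action) -> (state, reward, done)` interface, not this paper's formulation.

```python
# Generic options-style skill (illustrative; env interface is assumed).
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Skill:
    can_start: Callable[[Any], bool]    # initiation set: where the skill may begin
    policy: Callable[[Any], Any]        # low-level action chosen in each state
    should_stop: Callable[[Any], bool]  # termination condition

def run_skill(skill: Skill, env, state, max_steps=50):
    """Execute the skill as one temporally extended action."""
    total_reward = 0.0
    for _ in range(max_steps):
        if skill.should_stop(state):
            break
        state, reward, done = env.step(skill.policy(state))
        total_reward += reward
        if done:
            break
    return state, total_reward
```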
1 code implementation • 11 Sep 2023 • Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, Noah D. Goodman
Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which robustly generalize to novel scenarios.
1 code implementation • 6 Jun 2023 • Gabriel Poesia, Kanishk Gandhi, Eric Zelikman, Noah D. Goodman
In experiments on the PrOntoQA, ProofWriter, and Syllogism Validity datasets, LogicGuide significantly improves the performance of GPT-3, GPT-3.5 Turbo, and LLaMA (accuracy gains of up to 35%), while drastically reducing content effects -- the interference between unwanted prior assumptions and reasoning that both humans and language models suffer from.
1 code implementation • 16 Apr 2023 • Joy He-Yueya, Gabriel Poesia, Rose E. Wang, Noah D. Goodman
Automatically generating high-quality step-by-step solutions to math word problems has many applications in education.
1 code implementation • 20 Dec 2022 • Eric Zelikman, Qian Huang, Gabriel Poesia, Noah D. Goodman, Nick Haber
Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs.
1 code implementation • 29 Nov 2022 • Gabriel Poesia, Noah D. Goodman
We explore this idea in a case study on 5 sections of beginning algebra on the Khan Academy platform.
1 code implementation • 16 Nov 2022 • Zhening Li, Gabriel Poesia, Omar Costilla-Reyes, Noah Goodman, Armando Solar-Lezama
Humans tame the complexity of mathematical reasoning by developing hierarchies of abstractions.
2 code implementations • ICLR 2022 • Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, Sumit Gulwani
Then, Synchromesh feeds the examples to a pre-trained language model and samples programs using Constrained Semantic Decoding (CSD): a general framework for constraining the output to a set of valid programs in the target language.
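Constrained decoding of this flavor can be sketched concisely: at each step, candidate tokens are considered in order of model score and only a token that keeps the partial program completable into a valid one is accepted. The helpers `lm_next_token_scores`, `is_viable_prefix`, and `is_valid_program` below are hypothetical stand-ins; this is a greedy illustration of the idea, not the CSD implementation itself.

```python
# Sketch of validity-constrained greedy decoding (illustrative only).
def constrained_decode(lm_next_token_scores, is_viable_prefix, is_valid_program,
                       vocab, max_len=256, eos="<eos>"):
    prefix = ""
    for _ in range(max_len):
        scores = lm_next_token_scores(prefix)  # token -> model score
        for token in sorted(vocab, key=lambda t: scores[t], reverse=True):
            if token == eos:
                if is_valid_program(prefix):
                    return prefix          # complete, valid program
                continue                   # cannot stop yet; try the next token
            if is_viable_prefix(prefix + token):
                prefix += token            # keep the best viable continuation
                break
        else:
            break                          # no viable continuation exists
    return prefix
```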
no code implementations • EMNLP 2021 • Julia White, Gabriel Poesia, Robert Hawkins, Dorsa Sadigh, Noah Goodman
An overarching goal of natural language processing is to enable machines to communicate seamlessly with humans.
2 code implementations • NeurIPS 2021 • Gabriel Poesia, WenXin Dong, Noah Goodman
Our results suggest new directions for reinforcement learning in symbolic domains, as well as applications to mathematics education.