Search Results for author: Kanishk Gandhi

Found 19 papers, 10 papers with code

Scaling up the think-aloud method

1 code implementation • 29 May 2025 • Daniel Wurgaft, Ben Prystawski, Kanishk Gandhi, Cedegao E. Zhang, Joshua B. Tenenbaum, Noah D. Goodman

The think-aloud method, where participants voice their thoughts as they solve a task, is a valuable source of rich data about human reasoning processes.

Mathematical Reasoning

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

1 code implementation • 3 Mar 2025 • Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, Noah D. Goodman

In systematic experimentation with controlled behavioral datasets, we find that priming Llama with examples containing these reasoning behaviors enables substantial improvements during RL, matching or exceeding Qwen's performance.

Reinforcement Learning (RL)

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

1 code implementation • 24 Feb 2025 • Alon Albalak, Duy Phung, Nathan Lile, Rafael Rafailov, Kanishk Gandhi, Louis Castricato, Anikait Singh, Chase Blagden, Violet Xiang, Dakota Mahan, Nick Haber

However, existing open math datasets either contain a small collection of high-quality, human-written problems or a large corpus of machine-generated problems of uncertain quality, forcing researchers to choose between quality and quantity.

GSM8K Math +2

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

no code implementations • 8 Jan 2025 • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn

We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required to arrive at a particular CoT.

Synthetic Data Generation

BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery

1 code implementation • 2 Jan 2025 • Kanishk Gandhi, Michael Y. Li, Lyle Goodyear, Louise Li, Aditi Bhaskar, Mohammed Zaman, Noah D. Goodman

To quantitatively evaluate a scientific agent's ability to collect informative experimental data, we compute the expected information gain (EIG), an information-theoretic quantity which measures how much an experiment reduces uncertainty about the parameters of a generative model.

Benchmarking Experimental Design +2
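
The expected information gain mentioned above has a standard discrete form: EIG is the entropy of the prior-predictive distribution over outcomes minus the expected entropy of the outcome distribution under each parameter setting. A minimal sketch of that computation (not BoxingGym's actual code; the coin-flip example and function names here are illustrative assumptions):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_information_gain(prior, likelihood):
    """EIG of a discrete experiment.

    prior:      shape (n_params,), prior over model parameters
    likelihood: shape (n_params, n_outcomes), P(outcome | param)

    EIG = H[sum_theta P(theta) P(y|theta)] - E_theta[H[P(y|theta)]]
    """
    marginal = prior @ likelihood  # prior-predictive P(outcome)
    cond_ent = np.array([entropy(row) for row in likelihood])
    return entropy(marginal) - np.sum(prior * cond_ent)

# Toy example: one coin flip distinguishing a fair coin from a 0.9-biased one.
prior = np.array([0.5, 0.5])
likelihood = np.array([[0.5, 0.5],   # fair coin: P(heads), P(tails)
                       [0.9, 0.1]])  # biased coin
eig = expected_information_gain(prior, likelihood)  # positive: the flip is informative
```

An uninformative experiment (identical likelihood rows) yields EIG = 0, which is why maximizing EIG steers an agent toward experiments that discriminate between hypotheses.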

Psychometric Alignment: Capturing Human Knowledge Distributions via Language Models

1 code implementation • 22 Jul 2024 • Joy He-Yueya, Wanjing Anya Ma, Kanishk Gandhi, Benjamin W. Domingue, Emma Brunskill, Noah D. Goodman

We demonstrate that our metric can capture important variations in populations that traditional metrics, like differences in accuracy, fail to capture.

Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models

1 code implementation • 17 Apr 2024 • Jan-Philipp Fränken, Kanishk Gandhi, Tori Qiu, Ayesha Khawaja, Noah D. Goodman, Tobias Gerstenberg

We collected moral permissibility and intention judgments from human participants for a subset of our items and compared these judgments to those from two language models (GPT-4 and Claude-2) across eight conditions.

Decision Making Language Modelling +1

Stream of Search (SoS): Learning to Search in Language

1 code implementation • 1 Apr 2024 • Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman

In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS).

Language Modelling
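
The core idea of serializing search into a flat string can be illustrated with a toy depth-first search whose exploration and backtracking steps are emitted as tokens. This is only a sketch under assumed names; the paper's actual task (the Countdown game) and token vocabulary differ:

```python
def dfs(state, goal, moves, trace):
    """Depth-first search over integers, logging every step into `trace`."""
    trace.append(f"explore {state}")
    if state == goal:
        trace.append(f"solved {state}")
        return True
    if state > goal:  # overshot: record the dead end and back up
        trace.append(f"backtrack {state}")
        return False
    for m in moves:
        if dfs(state + m, goal, moves, trace):
            return True
    trace.append(f"backtrack {state}")
    return False

def stream_of_search(start, goal, moves):
    """Flatten the whole search process, dead ends included, into one string."""
    trace = []
    dfs(start, goal, moves, trace)
    return " | ".join(trace)

sos = stream_of_search(0, 5, moves=[3, 2])
# The trace records the failed branch (0 -> 3 -> 6) before the solution (0 -> 3 -> 5).
```

Training a language model on such traces, rather than only on final solutions, is what lets it internalize the search procedure itself, including recovering from mistakes.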

Social Contract AI: Aligning AI Assistants with Implicit Group Norms

1 code implementation • 26 Oct 2023 • Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, Noah D. Goodman

We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions.

A Fourier Neural Operator Approach for Modelling Exciton-Polariton Condensate Systems

no code implementations • 27 Sep 2023 • YuAn Wang, Surya T. Sathujoda, Krzysztof Sawicki, Kanishk Gandhi, Angelica I Aviles-Rivero, Pavlos G. Lagoudakis

A plethora of next-generation all-optical devices based on exciton-polaritons have been proposed in recent years, including prototypes of transistors, switches, analogue quantum simulators and others.

Understanding Social Reasoning in Language Models with Language Models

no code implementations • NeurIPS 2023 • Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs which consists of 25 controls and 5,000 model-written evaluations.

Certified Deductive Reasoning with Language Models

1 code implementation • 6 Jun 2023 • Gabriel Poesia, Kanishk Gandhi, Eric Zelikman, Noah D. Goodman

In experiments on PrOntoQA, ProofWriter and Syllogism Validity datasets, LogicGuide significantly improves the performance of GPT-3, GPT-3.5 Turbo and LLaMA (accuracy gains up to 35%), while drastically reducing content effects -- the interference between unwanted prior assumptions and reasoning, which humans and language models suffer from.

Logical Reasoning

Strategic Reasoning with Language Models

no code implementations • 30 May 2023 • Kanishk Gandhi, Dorsa Sadigh, Noah D. Goodman

Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining.

Eliciting Compatible Demonstrations for Multi-Human Imitation Learning

no code implementations • 14 Oct 2022 • Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, Dorsa Sadigh

Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation.

Imitation Learning Robot Manipulation

Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others

no code implementations • NeurIPS 2021 • Kanishk Gandhi, Gala Stojnic, Brenden M. Lake, Moira R. Dillon

To achieve human-like common sense about everyday life, machine learning systems must understand and reason about the goals, preferences, and actions of other agents in the environment.

Common Sense Reasoning
