no code implementations • 20 Jun 2025 • Lance Ying, Ryan Truong, Katherine M. Collins, Cedegao E. Zhang, Megan Wei, Tyler Brooke-Wilson, Tan Zhi-Xuan, Lionel Wong, Joshua B. Tenenbaum
Drawing real-world social inferences usually requires taking into account information from multiple modalities.
no code implementations • 17 Jun 2025 • Lance Ying, Daniel Xu, Alicia Zhang, Katherine M. Collins, Max H. Siegel, Joshua B. Tenenbaum
People regularly make inferences about objects in the world that they cannot see by flexibly integrating information from multiple sources: auditory and visual cues, language, and their prior beliefs and knowledge about the scene.
no code implementations • 28 May 2025 • Ilia Sucholutsky, Katherine M. Collins, Nori Jacoby, Bill D. Thompson, Robert D. Hawkins
LLMs are already transforming the study of individual cognition, but their application to studying collective cognition has been underexplored.
no code implementations • 22 May 2025 • Kerem Oktar, Katherine M. Collins, Jose Hernandez-Orallo, Diane Coyle, Stephen Cave, Adrian Weller, Ilia Sucholutsky
Artificial Intelligence (AI) systems have historically been used as tools that execute narrowly defined tasks.
no code implementations • 17 Mar 2025 • Umang Bhatt, Sanyam Kapoor, Mihir Upadhyay, Ilia Sucholutsky, Francesco Quinzan, Katherine M. Collins, Adrian Weller, Andrew Gordon Wilson, Muhammad Bilal Zafar
Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration.
no code implementations • 9 Mar 2025 • Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, Katherine M. Collins, Yael Moros-Daval, Seraphina Zhang, Qinlin Zhao, Yitian Huang, Luning Sun, Jonathan E. Prunty, Zongqian Li, Pablo Sánchez-García, Kexin Jiang Chen, Pablo A. M. Casares, Jiyun Zu, John Burden, Behzad Mehrbakhsh, David Stillwell, Manuel Cebrian, Jindong Wang, Peter Henderson, Sherry Tongshuang Wu, Patrick C. Kyllonen, Lucy Cheke, Xing Xie, José Hernández-Orallo
In this paper, we introduce general scales for AI evaluation that can explain what common AI benchmarks really measure, extract ability profiles of AI systems, and predict their performance for new task instances, in- and out-of-distribution.
no code implementations • 16 Jan 2025 • Katherine M. Collins, Umang Bhatt, Ilia Sucholutsky
We consider strategies that can be undertaken by various stakeholders involved in a single human-AI interaction: human, AI model builder, and society or regulators around the interaction.
no code implementations • 19 Dec 2024 • Simon Frieder, Jonas Bayer, Katherine M. Collins, Julius Berner, Jacob Loader, András Juhász, Fabian Ruehle, Sean Welleck, Gabriel Poesia, Ryan-Rhys Griffiths, Adrian Weller, Anirudh Goyal, Thomas Lukasiewicz, Timothy Gowers
The suite of datasets commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) exhibits several shortcomings.
no code implementations • 15 Aug 2024 • Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf
While LLMs exhibit impressive skills in general program synthesis and analysis, symbolic graphics programs offer a new layer of evaluation: they allow us to test an LLM's ability to answer semantic questions, at different levels of granularity, about the images or 3D geometries these programs produce, without a vision encoder.
no code implementations • 19 Jul 2024 • Cedegao E. Zhang, Katherine M. Collins, Lionel Wong, Mauricio Barba, Adrian Weller, Joshua B. Tenenbaum
People can evaluate features of problems and their potential solutions well before they can effectively solve them.
no code implementations • 24 Jun 2024 • Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, Krishnamurthy Dj Dvijotham
Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established.
no code implementations • 24 Jun 2024 • Katherine M. Collins, Valerie Chen, Ilia Sucholutsky, Hannah Rose Kirk, Malak Sadek, Holli Sargeant, Ameet Talwalkar, Adrian Weller, Umang Bhatt
Through a user study with real humans, we observe shifts in user behavior when a friction is imposed on LLM use in a multi-topic question-answering task, chosen as representative of how people may use LLMs (e.g., in education and information retrieval).
no code implementations • 6 Jun 2024 • Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby, Weiyang Liu, Theodore R. Sumers, Michalis Korakakis, Umang Bhatt, Mark Ho, Joshua B. Tenenbaum, Brad Love, Zachary A. Pardos, Adrian Weller, Thomas L. Griffiths
A good teacher should not only be knowledgeable but should also be able to communicate in a way that the student understands, sharing the student's representation of the world.
1 code implementation • 8 Mar 2024 • Kartik Chandra, Katherine M. Collins, Will Crichton, Tony Chen, Tzu-Mao Li, Adrian Weller, Rachit Nigam, Joshua Tenenbaum, Jonathan Ragan-Kelley
Often, a good explanation for a program's unexpected behavior is a bug in the programmer's code.
1 code implementation • 13 Dec 2023 • Vihari Piratla, Juyeon Heo, Katherine M. Collins, Sukriti Singh, Adrian Weller
We believe the improved quality of uncertainty-aware concept explanations makes them a strong candidate for more reliable model interpretation.
no code implementations • 19 Oct 2023 • Cedegao E. Zhang, Katherine M. Collins, Adrian Weller, Joshua B. Tenenbaum
Mathematics is one of the most powerful conceptual systems developed and used by the human species.
1 code implementation • 18 Oct 2023 • Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Christopher J. Cueva, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nathan Cloos, Nikolaus Kriegeskorte, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, Thomas Unterthiner, Andrew K. Lampinen, Klaus-Robert Müller, Mariya Toneva, Thomas L. Griffiths
These questions pertaining to the study of representational alignment are at the heart of some of the most promising research areas in contemporary cognitive science, neuroscience, and machine learning.
1 code implementation • NeurIPS 2023 • Mateo Espinosa Zarlenga, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Zohreh Shams, Mateja Jamnik
To address this, we propose Intervention-aware Concept Embedding models (IntCEMs), a novel architecture and training paradigm based on concept bottleneck models (CBMs) that improves a model's receptiveness to test-time interventions.
no code implementations • 28 Jul 2023 • Matthew Barker, Emma Kallina, Dhananjay Ashok, Katherine M. Collins, Ashley Casovan, Adrian Weller, Ameet Talwalkar, Valerie Chen, Umang Bhatt
We propose FeedbackLogs, addenda to existing documentation of ML pipelines, to track the input of multiple stakeholders.
no code implementations • 25 Jun 2023 • Lance Ying, Katherine M. Collins, Megan Wei, Cedegao E. Zhang, Tan Zhi-Xuan, Adrian Weller, Joshua B. Tenenbaum, Lionel Wong
To test our model, we design and run a human experiment on a linguistic goal inference task.
no code implementations • 14 Jun 2023 • Matthew Barker, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Umang Bhatt
Concept-based models perform prediction using a set of concepts that are interpretable to stakeholders.
1 code implementation • 2 Jun 2023 • Katherine M. Collins, Albert Q. Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B. Tenenbaum, William Hart, Timothy Gowers, Wenda Li, Adrian Weller, Mateja Jamnik
There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants.
no code implementations • 13 Apr 2023 • Umang Bhatt, Valerie Chen, Katherine M. Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, Ameet Talwalkar
Modiste leverages stochastic contextual bandit techniques to personalize a decision support policy for each decision-maker and supports extensions to the multi-objective setting to account for auxiliary objectives like the cost of support.
no code implementations • 22 Mar 2023 • Katherine M. Collins, Matthew Barker, Mateo Espinosa Zarlenga, Naveen Raman, Umang Bhatt, Mateja Jamnik, Ilia Sucholutsky, Adrian Weller, Krishnamurthy Dvijotham
We study how existing concept-based models deal with uncertain interventions from humans using two novel datasets: UMNIST, a visual dataset with controlled simulated uncertainty based on the MNIST dataset, and CUB-S, a relabeling of the popular CUB concept dataset with rich, densely-annotated soft labels from humans.
no code implementations • 2 Nov 2022 • Ilia Sucholutsky, Ruairidh M. Battleday, Katherine M. Collins, Raja Marjieh, Joshua C. Peterson, Pulkit Singh, Umang Bhatt, Nori Jacoby, Adrian Weller, Thomas L. Griffiths
Supervised learning typically focuses on learning transferable representations from training examples annotated by humans.
1 code implementation • 2 Nov 2022 • Katherine M. Collins, Umang Bhatt, Weiyang Liu, Vihari Piratla, Ilia Sucholutsky, Bradley Love, Adrian Weller
We focus on the synthetic data used in mixup: a powerful regularizer shown to improve model robustness, generalization, and calibration.
1 code implementation • 2 Jul 2022 • Katherine M. Collins, Umang Bhatt, Adrian Weller
Our elicitation methodology therefore shows nuanced promise in enabling practitioners to achieve improved model performance and reliability with fewer annotators, and it serves as a guide for future dataset curators on the benefits of eliciting richer information, such as categorical uncertainty, from individual annotators.
1 code implementation • 11 May 2022 • Katherine M. Collins, Catherine Wong, Jiahai Feng, Megan Wei, Joshua B. Tenenbaum
We first contribute a new challenge benchmark for comparing humans and distributional large language models (LLMs).
no code implementations • NeurIPS 2021 • Yilun Du, Katherine M. Collins, Joshua B. Tenenbaum, Vincent Sitzmann
We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains in a modality-independent manner.
no code implementations • ICLR 2022 • Tuan Anh Le, Katherine M. Collins, Luke Hewitt, Kevin Ellis, N. Siddharth, Samuel J. Gershman, Joshua B. Tenenbaum
We build on a recent approach, Memoised Wake-Sleep (MWS), which alleviates part of the problem by memoising discrete variables, and extend it to handle continuous variables in a principled and effective way by learning a separate recognition model used for importance-sampling-based approximate inference and marginalization.