no code implementations • 11 Feb 2025 • John Hewitt, Robert Geirhos, Been Kim
This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines.
1 code implementation • 9 Dec 2024 • Meera Hahn, Wenjun Zeng, Nithish Kannen, Rich Galt, Kartikeya Badola, Been Kim, Zi Wang
To address this, we propose a design for proactive T2I agents equipped with an interface to (1) actively ask clarification questions when uncertain, and (2) present their understanding of user intent as an understandable belief graph that a user can edit.
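As a purely hypothetical illustration of the two interface pieces this design calls for (the data structure, field names, and uncertainty rule below are assumptions made for the sketch, not the paper's implementation): a belief graph the user can edit directly, and a rule for when the agent should ask a clarification question instead of generating.

```python
# Hypothetical sketch of an editable belief graph and an "ask when uncertain" rule.
from dataclasses import dataclass, field

@dataclass
class Belief:
    attribute: str          # e.g., "subject", "style", "lighting"
    value: str              # the agent's current guess, e.g., "watercolor"
    confidence: float       # agent's estimate that the guess matches user intent

@dataclass
class BeliefGraph:
    beliefs: dict[str, Belief] = field(default_factory=dict)

    def edit(self, attribute: str, value: str) -> None:
        """Let the user correct a belief directly instead of re-prompting."""
        self.beliefs[attribute] = Belief(attribute, value, confidence=1.0)

def next_action(graph: BeliefGraph, threshold: float = 0.6) -> str:
    """Ask a clarifying question about the least certain belief; otherwise generate."""
    uncertain = [b for b in graph.beliefs.values() if b.confidence < threshold]
    if uncertain:
        worst = min(uncertain, key=lambda b: b.confidence)
        return f"Clarify: what {worst.attribute} do you want? (current guess: {worst.value})"
    return "generate_image"
```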
no code implementations • 25 Oct 2023 • Lisa Schut, Nenad Tomasev, Tom McGrath, Demis Hassabis, Ulrich Paquet, Been Kim
Artificial Intelligence (AI) systems have made remarkable progress, attaining super-human performance across various domains.
1 code implementation • 18 Oct 2023 • Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Christopher J. Cueva, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nathan Cloos, Nikolaus Kriegeskorte, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, Thomas Unterthiner, Andrew K. Lampinen, Klaus-Robert Müller, Mariya Toneva, Thomas L. Griffiths
These questions pertaining to the study of representational alignment are at the heart of some of the most promising research areas in contemporary cognitive science, neuroscience, and machine learning.
1 code implementation • 7 Jun 2023 • Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim
Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, serving as a type of mechanistic interpretability.
no code implementations • 24 May 2023 • Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities.
1 code implementation • NeurIPS 2023 • Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun
This finding raises questions about how past work relies on Causal Tracing to select which model layers to edit.
1 code implementation • 22 Dec 2022 • Blair Bilodeau, Natasha Jaques, Pang Wei Koh, Been Kim
Despite a sea of interpretability methods that can produce plausible explanations, the field has also empirically seen many failure cases of such methods.
no code implementations • 13 Dec 2022 • Amir-Hossein Karimi, Krikamol Muandet, Simon Kornblith, Bernhard Schölkopf, Been Kim
Our work borrows tools from causal inference to systematically assay this relationship.
no code implementations • ICLR 2022 • Julius Adebayo, Michael Muelly, Hal Abelson, Been Kim
We investigate whether three types of post hoc model explanations (feature attribution, concept activation, and training point ranking) are effective for detecting a model's reliance on spurious signals in the training data.
no code implementations • 17 Jun 2022 • Shayegan Omidshafiei, Andrei Kapishnikov, Yannick Assogba, Lucas Dixon, Been Kim
Each year, expert-level performance is attained in increasingly complex multiagent domains, where notable examples include Go, Poker, and StarCraft II.
no code implementations • NeurIPS Workshop AI4Science 2021 • Evan D. Anderson, Ramsey Wilcox, Anuj Nayak, Christopher Zwilling, Pablo Robles-Granda, Been Kim, Lav R. Varshney, Aron K. Barbey
Investigating the proposed modeling framework's efficacy, we find that advanced connectome-based predictive modeling generates neuroscience predictions that account for a significantly greater proportion of variance in general intelligence scores than previously established methods, advancing our scientific understanding of the network architecture that underlies human intelligence.
no code implementations • 25 Feb 2022 • Chih-Kuan Yeh, Been Kim, Pradeep Ravikumar
We start by introducing concept explanations, including the class of Concept Activation Vectors (CAVs), which characterize concepts using vectors in appropriate spaces of neural activations; we then discuss properties of useful concepts and approaches to measuring the usefulness of concept vectors.
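As an illustration of the CAV idea described above, here is a minimal sketch of fitting a concept vector in a layer's activation space, assuming activations for concept examples and random counterexamples have already been collected (function and variable names are placeholders, not any specific library's API):

```python
# Minimal sketch: a CAV is the direction separating concept activations from
# random activations, obtained from a linear classifier in activation space.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Return a unit vector in activation space that points toward the concept."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)
```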
no code implementations • 11 Jan 2022 • Devleena Das, Been Kim, Sonia Chernova
Intelligent decision support (IDS) systems leverage artificial intelligence techniques to generate recommendations that guide human users through the decision making phases of a task.
no code implementations • 13 Dec 2021 • Leon Sixt, Evan Zheran Liu, Marie Pellat, James Wexler, Milad Hashemi, Been Kim, Martin Maas
Machine Learning has been successfully applied in systems applications such as memory prefetching and caching, where learned models have been shown to outperform heuristics.
no code implementations • 17 Nov 2021 • Thomas McGrath, Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, Vladimir Kramnik
In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess.
no code implementations • 16 Jun 2021 • Jessica Schrouff, Sebastien Baur, Shaobo Hou, Diana Mincu, Eric Loreaux, Ralph Blanes, James Wexler, Alan Karthikesalingam, Been Kim
While there are many methods focused on either one, few frameworks can provide both local and global explanations in a consistent manner.
1 code implementation • ICLR 2022 • Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff, Rosalind W. Picard
Explaining deep learning model inferences is a promising avenue for scientific understanding, improving safety, uncovering hidden biases, evaluating fairness, and beyond, as argued by many scholars.
1 code implementation • NeurIPS 2020 • Julius Adebayo, Michael Muelly, Ilaria Liccardi, Been Kim
For several explanation methods, we assess their ability to: detect spurious correlation artifacts (data contamination), diagnose mislabeled training examples (data contamination), differentiate between a (partially) re-initialized model and a trained one (model contamination), and detect out-of-distribution inputs (test-time contamination).
4 code implementations • ICML 2020 • Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis?
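A minimal sketch of the concept-bottleneck idea this describes: map inputs to human-interpretable concepts, predict the label only from those concepts, and allow a test-time intervention such as zeroing out "bone spur" (layer sizes and names here are illustrative, not the paper's configuration):

```python
# Sketch of a concept bottleneck model: input -> concepts -> label, with an
# optional intervention that overrides the predicted concepts at test time.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, n_features: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.to_concepts = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                         nn.Linear(64, n_concepts))
        self.to_label = nn.Linear(n_concepts, n_classes)

    def forward(self, x, concept_override=None):
        c = torch.sigmoid(self.to_concepts(x))   # predicted concept probabilities
        if concept_override is not None:         # test-time intervention,
            c = concept_override                  # e.g., set "bone spur" to 0
        return self.to_label(c), c
```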
2 code implementations • NeurIPS 2020 • Chih-Kuan Yeh, Been Kim, Sercan O. Arik, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar
Next, we propose a concept discovery method that aims to infer a complete set of concepts that are additionally encouraged to be interpretable, which addresses the limitations of existing methods on concept explanations.
no code implementations • 25 Sep 2019 • Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Pradeep Ravikumar, Tomas Pfister
Next, we propose a concept discovery method that considers two additional constraints to encourage the interpretability of the discovered concepts.
2 code implementations • 23 Jul 2019 • Mengjiao Yang, Been Kim
Despite active development, quantitative evaluation of feature attribution methods remains difficult due to the lack of ground truth: we do not know which input features are in fact important to a model.
3 code implementations • 22 Jul 2019 • Shalmali Joshi, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, Joydeep Ghosh
We then provide a mechanism to generate the smallest set of changes that will improve an individual's outcome.
no code implementations • 16 Jul 2019 • Yash Goyal, Amir Feder, Uri Shalit, Been Kim
To overcome this problem, we define the Causal Concept Effect (CaCE) as the causal effect of (the presence or absence of) a human-interpretable concept on a deep neural net's predictions.
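Written out as an equation (paraphrasing the definition above; the notation is chosen here, not copied from the paper), CaCE is the difference in the model's expected output under interventions that force the concept to be present versus absent:

```latex
\mathrm{CaCE}(C, f) \;=\; \mathbb{E}\!\left[f(x) \mid do(C = 1)\right] \;-\; \mathbb{E}\!\left[f(x) \mid do(C = 0)\right]
```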
1 code implementation • NeurIPS 2019 • Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg
Transformer architectures show significant promise for natural language processing.
1 code implementation • 4 Mar 2019 • Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer
The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity.
2 code implementations • NeurIPS 2019 • Amirata Ghorbani, James Wexler, James Zou, Been Kim
Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions.
no code implementations • 31 Jan 2019 • Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim, Sam Gershman, Finale Doshi-Velez
Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions.
no code implementations • 23 Oct 2018 • Rajiv Khanna, Been Kim, Joydeep Ghosh, Oluwasanmi Koyejo
Research in both machine learning and psychology suggests that salient examples can help humans to interpret learning models.
5 code implementations • NeurIPS 2018 • Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
We find that relying solely on visual assessment can be misleading.
2 code implementations • 8 Oct 2018 • Julius Adebayo, Justin Gilmer, Ian Goodfellow, Been Kim
Explaining the output of a complicated machine learning model like a deep neural network (DNN) is a central challenge in machine learning.
no code implementations • 3 Jul 2018 • Been Kim, Kush R. Varshney, Adrian Weller
This is the Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), which was held in Stockholm, Sweden, July 14, 2018.
3 code implementations • NeurIPS 2019 • Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim
We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks.
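One way such a measure can be operationalized, sketched here as a remove-and-retrain style protocol (an assumption about the general approach, not a verbatim reproduction of the paper's procedure): ablate the pixels an attribution method ranks most important, retrain from scratch, and track the drop in test accuracy. `attributions` and the retraining step are placeholders.

```python
# Sketch: replace the most "important" pixels with the per-image mean, then
# retrain and compare test accuracy against the unablated baseline.
import numpy as np

def remove_top_fraction(images: np.ndarray, attributions: np.ndarray, frac: float) -> np.ndarray:
    """Ablate the top-`frac` most important pixels of each image (per the attribution map)."""
    ablated = images.copy()
    for img, attr in zip(ablated, attributions):
        k = int(frac * img.size)
        idx = np.argsort(attr.ravel())[-k:]   # indices of the highest-ranked pixels
        img.reshape(-1)[idx] = img.mean()     # replace them with the image mean
    return ablated

# accuracy_drop = train_and_eval(images) - train_and_eval(remove_top_fraction(images, attributions, 0.3))
```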
no code implementations • 22 Jun 2018 • Shalmali Joshi, Oluwasanmi Koyejo, Been Kim, Joydeep Ghosh
This work proposes xGEMs or manifold guided exemplars, a framework to understand black-box classifier behavior by exploring the landscape of the underlying data manifold as data points cross decision boundaries.
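A schematic of the manifold-guided idea, assuming a pretrained encoder/decoder pair and a differentiable classifier; the objective below is illustrative rather than the paper's exact formulation:

```python
# Schematic: walk in the latent space of a generative model G until the
# classifier f assigns the target class, while staying close to the input.
# Shapes assumed: x is a batch of size 1; target_class is a LongTensor of shape [1].
import torch
import torch.nn.functional as F

def manifold_crossing(x, f, G, encode, target_class, steps=200, lr=0.05):
    z = encode(x).detach().clone().requires_grad_(True)   # start at the input's latent code
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x_hat = G(z)                                       # decode back onto the data manifold
        loss = F.cross_entropy(f(x_hat), target_class)     # push the prediction to the target class
        loss = loss + 0.1 * torch.norm(x_hat - x)          # stay close to the original input
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(z).detach()
```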
1 code implementation • NeurIPS 2018 • Heinrich Jiang, Been Kim, Melody Y. Guan, Maya Gupta
Knowing when a classifier's prediction can be trusted is useful in many applications and critical for safely using AI.
no code implementations • NeurIPS 2018 • Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez
We often desire our models to be interpretable as well as accurate.
no code implementations • 2 Feb 2018 • Menaka Narayanan, Emily Chen, Jeffrey He, Been Kim, Sam Gershman, Finale Doshi-Velez
Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions.
2 code implementations • ICLR 2018 • Been Kim, Justin Gilmer, Martin Wattenberg, Fernanda Viégas
In particular, this framework enables non-machine learning experts to express concepts of interest and test hypotheses using examples (e.g., a set of pictures that illustrate the concept).
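A hedged sketch of how such a hypothesis test can be scored, in the spirit of a TCAV score (the fraction of a class's examples whose class logit increases along the concept direction); `grad_logit_wrt_activations` is a placeholder for whatever computes the gradient of the class logit with respect to the chosen layer's activations, and `cav` is a concept vector as in the sketch above:

```python
# Sketch: score a concept's influence on class k as the fraction of inputs
# whose directional derivative along the CAV is positive.
import numpy as np

def tcav_score(inputs, class_k, cav, grad_logit_wrt_activations):
    positive = 0
    for x in inputs:
        g = grad_logit_wrt_activations(x, class_k)   # same shape as the layer's activations
        if np.dot(g.ravel(), cav) > 0:               # sign of the directional derivative
            positive += 1
    return positive / len(inputs)
```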
11 code implementations • ICML 2018 • Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state.
1 code implementation • ICLR 2018 • Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim
Saliency methods aim to explain the predictions of deep neural networks.
no code implementations • 8 Aug 2017 • Been Kim, Dmitry M. Malioutov, Kush R. Varshney, Adrian Weller
This is the Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), which was held in Sydney, Australia, August 10, 2017.
20 code implementations • 12 Jun 2017 • Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, Martin Wattenberg
Explaining the output of a deep network remains a challenge.
4 code implementations • ICLR 2018 • Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, Sven Dähne
We show that these methods do not produce the theoretically correct explanation for a linear model.
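A one-line version of the linear-model argument (paraphrased; the notation is chosen here rather than taken from the paper): suppose each input is a label-carrying signal plus a distractor shared across inputs. A linear model that recovers the label exactly must cancel the distractor, so its weight vector, which is precisely what gradient-based saliency returns for a linear model, is shaped by the distractor and need not point along the informative signal direction.

```latex
x = y\,s + d, \qquad \hat{y} = w^\top x, \qquad w^\top s = 1, \quad w^\top d = 0
```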
no code implementations • 28 Feb 2017 • Finale Doshi-Velez, Been Kim
As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs.
no code implementations • NeurIPS 2016 • Been Kim, Rajiv Khanna, Oluwasanmi O. Koyejo
Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions.
no code implementations • 28 Nov 2016 • Andrew Gordon Wilson, Been Kim, William Herlands
This is the Proceedings of the NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems, held in Barcelona, Spain, on December 9, 2016.
no code implementations • 8 Jul 2016 • Been Kim, Dmitry M. Malioutov, Kush R. Varshney
This is the Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), which was held in New York, NY, June 23, 2016.
no code implementations • NeurIPS 2015 • Been Kim, Julie A. Shah, Finale Doshi-Velez
We present the Mind the Gap Model (MGM), an approach for interpretable feature extraction and selection.
no code implementations • NeurIPS 2014 • Been Kim, Cynthia Rudin, Julie Shah
We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering.
no code implementations • 8 Jun 2013 • Been Kim, Cynthia Rudin
Most people participate in meetings almost every day, often multiple times a day.
no code implementations • 5 Jun 2013 • Been Kim, Caleb M. Chacha, Julie Shah
We present an algorithm that reduces this translation burden by inferring the final plan from a processed form of the human team's planning conversation.