1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener
The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.
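The reweighting idea in this entry can be sketched with a toy linear probe over candidate embeddings. Everything below — the embedding dimension, the probe weights `w`, and the inverse temperature `beta` — is hypothetical, standing in for quantities a trained model and probe would supply:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each candidate completion has a fixed-size
# embedding from the base model; a learned linear probe scores it.
d = 8                                  # embedding dimension (illustrative)
w = rng.normal(size=d)                 # probe weights, assumed already trained
embeddings = rng.normal(size=(5, d))   # embeddings of 5 candidate completions

scores = embeddings @ w                # linear value for each candidate
beta = 2.0                             # inverse temperature for reweighting
weights = np.exp(beta * scores)
weights /= weights.sum()               # softmax reweighting over candidates

best = int(np.argmax(weights))         # pick (or sample) a completion
```

The point is that the expensive model is queried once per candidate for embeddings, while the cheap linear probe does the ranking.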
1 code implementation • 3 Jan 2024 • Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea
While alignment algorithms are now commonly used to tune pre-trained language models toward a user's preferences, we lack explanations of the underlying mechanisms by which models become "aligned", which makes it difficult to explain phenomena like jailbreaks.

no code implementations • 23 Oct 2023 • Michael Terry, Chinmay Kulkarni, Martin Wattenberg, Lucas Dixon, Meredith Ringel Morris
AI alignment considers the overall problem of ensuring an AI produces desired outcomes, without undesirable side effects.
1 code implementation • 17 Sep 2023 • Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena Glassman
Evaluating outputs of large language models (LLMs) is challenging: it requires making, and making sense of, many responses.
1 code implementation • 2 Sep 2023 • Neel Nanda, Andrew Lee, Martin Wattenberg
How do sequence models represent their decision-making process?
no code implementations • 17 Aug 2023 • Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau
Linear relation representations may be obtained by constructing a first-order approximation to the LM from a single prompt, and they exist for a variety of factual, commonsense, and linguistic relations.
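The first-order approximation described here can be illustrated on a stand-in nonlinear map: treat the model's subject-to-object computation as approximately affine, o ≈ W s + b, with W the Jacobian at one point. The function `f`, the dimensions, and the perturbation scale below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.normal(size=(d, d))

def f(s):
    # stand-in for the LM's (nonlinear) subject -> object computation
    return np.tanh(A @ s)

s0 = rng.normal(size=d)        # subject representation from one prompt
o0 = f(s0)

# Jacobian of f at s0, column by column, via finite differences
eps = 1e-6
W = np.stack([(f(s0 + eps * np.eye(d)[i]) - o0) / eps for i in range(d)],
             axis=1)
b = o0 - W @ s0                # offset so that o ~ W s + b

s1 = s0 + 0.01 * rng.normal(size=d)   # a nearby subject representation
approx = W @ s1 + b                   # linear relation estimate
exact = f(s1)                         # close to approx for nearby subjects
```

The approximation is local: it is accurate for subjects near the prompt used to build it, which mirrors the single-prompt construction described above.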
1 code implementation • 9 Jun 2023 • Yida Chen, Fernanda Viégas, Martin Wattenberg
Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious.
1 code implementation • NeurIPS 2023 • Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark.
no code implementations • 4 May 2023 • Fernanda Viégas, Martin Wattenberg
We conjecture that, for many systems, the two most important models will be of the user and of the system itself.
no code implementations • 4 May 2023 • Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg
Transformer models are revolutionizing machine learning, but their inner workings remain mysterious.
1 code implementation • 24 Oct 2022 • Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
Language models show a surprising range of capabilities, but the source of their apparent competence is unclear.
1 code implementation • 21 Sep 2022 • Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, Christopher Olah
Neural networks often pack many unrelated concepts into a single neuron, a puzzling phenomenon known as 'polysemanticity' that makes interpretability much more challenging.
no code implementations • 15 Feb 2022 • Mohammadtaher Safarzadeh, Asad Khan, E. A. Huerta, Martin Wattenberg
We describe a case study of translational research, applying interpretability techniques developed for computer vision to machine learning models used to search for and find gravitational waves.
no code implementations • 14 Apr 2021 • Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg
We describe an "interpretability illusion" that arises when analyzing the BERT model.
1 code implementation • 9 Jul 2019 • James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viegas, Jimbo Wilson
A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs.
2 code implementations • NeurIPS 2019 • Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg
Transformer architectures show significant promise for natural language processing.
1 code implementation • 4 Mar 2019 • Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer
The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity.
no code implementations • 16 Jan 2019 • Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Ann Yuan, Nick Kreeger, Ping Yu, Kangyi Zhang, Shanqing Cai, Eric Nielsen, David Soergel, Stan Bileschi, Michael Terry, Charles Nicholson, Sandeep N. Gupta, Sarah Sirajuddin, D. Sculley, Rajat Monga, Greg Corrado, Fernanda B. Viégas, Martin Wattenberg
TensorFlow.js is a library for building and executing machine learning algorithms in JavaScript.
1 code implementation • 5 Sep 2018 • Minsuk Kahng, Nikhil Thorat, Duen Horng Chau, Fernanda Viégas, Martin Wattenberg
Recent success in deep learning has generated immense interest among practitioners and students, inspiring many to learn about this new technology.
2 code implementations • ICLR 2018 • Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow
We hypothesize that this counterintuitive behavior is a naturally occurring result of the high dimensional geometry of the data manifold.
2 code implementations • ICLR 2018 • Been Kim, Justin Gilmer, Martin Wattenberg, Fernanda Viégas
In particular, this framework enables non-machine learning experts to express concepts of interest and test hypotheses using examples (e.g., a set of pictures that illustrate the concept).
11 code implementations • ICML 2018 • Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state.
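Concept-based testing of this kind can be sketched in a few lines: gather activations for concept examples and for random counterexamples, derive a direction separating them, and check how often the class logit increases along that direction. This is a simplified sketch — it uses a difference of class means where a trained linear classifier's normal vector would normally serve, and `grad_logit_wrt_layer` is a hypothetical stand-in for a real model's gradient:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16

# Hypothetical layer activations: examples that illustrate a concept
# vs. random counterexamples.
concept_dir = rng.normal(size=d)
concept_acts = rng.normal(size=(50, d)) + concept_dir
random_acts = rng.normal(size=(50, d))

# Simplified concept vector: difference of class means (a trained
# linear classifier's normal vector would play this role).
cav = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
cav /= np.linalg.norm(cav)

def grad_logit_wrt_layer(act):
    # stand-in for the gradient of the class logit w.r.t. this layer;
    # a real model supplies this via backpropagation
    return 0.5 * act

# Score: fraction of inputs whose class logit increases along cav
inputs = rng.normal(size=(100, d)) + concept_dir
score = float(np.mean(
    [(grad_logit_wrt_layer(x) @ cav) > 0 for x in inputs]))
```

The score is a fraction in [0, 1]; values far from 0.5 suggest the concept direction systematically moves the class logit.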
no code implementations • 12 Aug 2017 • Daniel Smilkov, Shan Carter, D. Sculley, Fernanda B. Viégas, Martin Wattenberg
The recent successes of deep learning have led to a wave of interest from non-experts.
20 code implementations • 12 Jun 2017 • Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, Martin Wattenberg
Explaining the output of a deep network remains a challenge.
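One well-known response to this challenge is to average gradient saliency maps over noise-perturbed copies of the input, as in SmoothGrad. A minimal sketch, where `grad_saliency` is a hypothetical, deliberately noisy stand-in for the gradient a real network would supply:

```python
import numpy as np

rng = np.random.default_rng(3)

def grad_saliency(x):
    # stand-in for the gradient of a network's class score w.r.t. x;
    # deliberately high-frequency to mimic noisy raw gradient maps
    return np.sign(x) * np.cos(20 * x)

x = rng.uniform(-1, 1, size=32)   # stand-in "image"
sigma = 0.15                      # noise scale (hyperparameter)
n = 50                            # number of noisy samples

# Average saliency maps over noise-perturbed copies of the input
noisy = x + rng.normal(scale=sigma, size=(n, x.size))
smooth = np.mean([grad_saliency(z) for z in noisy], axis=0)
```

The averaged map has the same shape as the input; the noise scale and sample count trade off smoothing against fidelity to the raw gradient.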
no code implementations • 16 Nov 2016 • Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, Martin Wattenberg
Embeddings are ubiquitous in machine learning, appearing in recommender systems, NLP, and many other applications.
4 code implementations • TACL 2017 • Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean
In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation.
4 code implementations • 14 Mar 2016 • Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms.