Search Results for author: Martin Wattenberg

Found 27 papers, 18 papers with code

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

1 code implementation · 22 Feb 2024 · Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.

Code Generation · Language Modelling
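The reweighting idea described above can be sketched in a few lines of NumPy. The probe weights `w`, the dimensions, and the temperature `beta` below are illustrative placeholders, not the paper's trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: embeddings of k candidate completions from a base model.
# A Q-probe is a linear function w such that embeddings @ w approximates a
# reward; here w is random purely to illustrate the mechanics.
d, k = 16, 5                          # embedding dim, number of candidates
embeddings = rng.normal(size=(k, d))
w = rng.normal(size=d)                # learned linear probe (assumed trained)

scores = embeddings @ w               # predicted reward per candidate
beta = 2.0                            # temperature controlling reweighting strength
weights = np.exp(beta * (scores - scores.max()))
weights /= weights.sum()              # softmax over candidates

best = int(np.argmax(weights))        # or sample: rng.choice(k, p=weights)
```

Because the softmax is monotone in the scores, the highest-scoring candidate also receives the largest weight; lowering `beta` softens the reweighting toward the base model's own distribution.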

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

1 code implementation · 3 Jan 2024 · Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea

While alignment algorithms are now commonly used to tune pre-trained language models towards a user's preferences, we lack explanations of the underlying mechanisms by which models become "aligned", making it difficult to explain phenomena like jailbreaks.

Language Modelling

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

1 code implementation · 17 Sep 2023 · Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena Glassman

Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses.

Model Selection · Prompt Engineering +1

Linearity of Relation Decoding in Transformer Language Models

no code implementations · 17 Aug 2023 · Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau

Linear relation representations may be obtained by constructing a first-order approximation to the LM from a single prompt, and they exist for a variety of factual, commonsense, and linguistic relations.

Relation
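The first-order approximation mentioned above can be illustrated with a toy stand-in for the model's relation computation. The function `F`, the point `s0`, and the finite-difference Jacobian below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

# Sketch: approximate a nonlinear map F (subject representation -> object
# representation) by its first-order Taylor expansion at one point s0:
#   F(s) ≈ F(s0) + J (s - s0),  where J is the Jacobian of F at s0.
def F(s):
    # toy stand-in for the LM's relation computation
    return np.tanh(s) + 0.1 * s**2

s0 = np.array([0.2, -0.1, 0.4])
eps = 1e-5

# Estimate the Jacobian by central finite differences, one column per input dim.
J = np.zeros((3, 3))
for i in range(3):
    e = np.zeros(3)
    e[i] = eps
    J[:, i] = (F(s0 + e) - F(s0 - e)) / (2 * eps)

# The affine map (J, F(s0)) now decodes nearby subjects linearly.
s = s0 + 0.01 * np.array([1.0, -1.0, 0.5])
approx = F(s0) + J @ (s - s0)
```

For subjects close to `s0` the linear estimate tracks the true map up to second-order error, which is the sense in which a single-prompt first-order approximation can capture the relation.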

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

1 code implementation · 9 Jun 2023 · Yida Chen, Fernanda Viégas, Martin Wattenberg

Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious.

Denoising · Image Generation

The System Model and the User Model: Exploring AI Dashboard Design

no code implementations · 4 May 2023 · Fernanda Viégas, Martin Wattenberg

We conjecture that, for many systems, the two most important models will be of the user and of the system itself.

AttentionViz: A Global View of Transformer Attention

no code implementations · 4 May 2023 · Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg

Transformer models are revolutionizing machine learning, but their inner workings remain mysterious.

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

1 code implementation · 24 Oct 2022 · Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

Language models show a surprising range of capabilities, but the source of their apparent competence is unclear.

Toy Models of Superposition

1 code implementation · 21 Sep 2022 · Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, Christopher Olah

Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging.

Interpreting a Machine Learning Model for Detecting Gravitational Waves

no code implementations · 15 Feb 2022 · Mohammadtaher Safarzadeh, Asad Khan, E. A. Huerta, Martin Wattenberg

We describe a case study of translational research, applying interpretability techniques developed for computer vision to machine learning models used to search for and find gravitational waves.

BIG-bench Machine Learning · Interpretable Machine Learning

An Interpretability Illusion for BERT

no code implementations · 14 Apr 2021 · Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg

We describe an "interpretability illusion" that arises when analyzing the BERT model.

The What-If Tool: Interactive Probing of Machine Learning Models

1 code implementation · 9 Jul 2019 · James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viegas, Jimbo Wilson

A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs.

BIG-bench Machine Learning · Fairness

Neural Networks Trained on Natural Scenes Exhibit Gestalt Closure

1 code implementation · 4 Mar 2019 · Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer

The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity.

Image Classification

GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation

1 code implementation · 5 Sep 2018 · Minsuk Kahng, Nikhil Thorat, Duen Horng Chau, Fernanda Viégas, Martin Wattenberg

Recent success in deep learning has generated immense interest among practitioners and students, inspiring many to learn about this new technology.

Adversarial Spheres

2 code implementations · ICLR 2018 · Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow

We hypothesize that this counterintuitive behavior is a naturally occurring result of the high-dimensional geometry of the data manifold.

TCAV: Relative concept importance testing with Linear Concept Activation Vectors

2 code implementations · ICLR 2018 · Been Kim, Justin Gilmer, Martin Wattenberg, Fernanda Viégas

In particular, this framework enables non-machine learning experts to express concepts of interest and test hypotheses using examples (e.g., a set of pictures that illustrate the concept).

Medical Diagnosis
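The mechanics of a concept activation vector can be sketched in NumPy. As a simplifying assumption, a difference-of-means direction stands in for the trained linear classifier, and the gradients are random placeholders rather than real model gradients:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical activations from one layer of a network (dim d), for examples
# illustrating a concept vs. random counterexamples.
d = 8
concept_acts = rng.normal(loc=1.0, size=(50, d))
random_acts = rng.normal(loc=0.0, size=(50, d))

# CAV: the normal of a linear separator in activation space. Here a
# difference-of-means direction approximates the trained classifier.
cav = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
cav /= np.linalg.norm(cav)

# TCAV-style score: fraction of inputs whose class-logit gradient (w.r.t. the
# layer activations) has a positive directional derivative along the CAV.
grads = rng.normal(loc=0.3, size=(100, d))  # placeholder gradients
tcav_score = float((grads @ cav > 0).mean())
```

The score is the fraction of inputs for which nudging activations toward the concept direction increases the class logit, which is what makes the test readable to non-experts: concepts are specified entirely by example sets.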

Direct-Manipulation Visualization of Deep Networks

no code implementations · 12 Aug 2017 · Daniel Smilkov, Shan Carter, D. Sculley, Fernanda B. Viégas, Martin Wattenberg

The recent successes of deep learning have led to a wave of interest from non-experts.

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

4 code implementations · TACL 2017 · Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation.

Machine Translation · NMT +3
