Search Results for author: David Bau

Found 46 papers, 29 papers with code

Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

no code implementations 28 Jun 2024 Sheridan Feucht, David Atkinson, Byron Wallace, David Bau

In this work, we find that last token representations of named entities and multi-token words exhibit a pronounced "erasure" effect, where information about previous and current tokens is rapidly forgotten in early layers.

Customizing Text-to-Image Models with a Single Image Pair

no code implementations 2 May 2024 Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu

Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.

Locating and Editing Factual Associations in Mamba

1 code implementation 4 Apr 2024 Arnab Sen Sharma, David Atkinson, David Bau

We investigate the mechanisms of factual recall in the Mamba state space model.

Model Editing

Model Lakes

no code implementations 4 Mar 2024 Koyena Pal, David Bau, Renée J. Miller

And we discuss what principled data management techniques can be brought to bear on the study of large model management.


Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

no code implementations 22 Feb 2024 Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions, primarily the same circuit implements entity tracking.

Code Generation, Instruction Following

Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

1 code implementation 13 Feb 2024 Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction.

Chatbot, Language Modelling

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

1 code implementation 20 Nov 2023 Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau

We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.

Image Generation

Future Lens: Anticipating Subsequent Tokens from a Single Hidden State

no code implementations 8 Nov 2023 Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau

More concretely, in this paper we ask: Given a hidden (internal) representation of a single token at position $t$ in an input, can we reliably anticipate the tokens that will appear at positions $\geq t + 2$?

Function Vectors in Large Language Models

no code implementations 23 Oct 2023 Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau

Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV).

In-Context Learning

FIND: A Function Description Benchmark for Evaluating Interpretability Methods

1 code implementation NeurIPS 2023 Sarah Schwettmann, Tamar Rott Shaham, Joanna Materzynska, Neil Chowdhury, Shuang Li, Jacob Andreas, David Bau, Antonio Torralba

FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.

Unified Concept Editing in Diffusion Models

1 code implementation 25 Aug 2023 Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau

Text-to-image models suffer from various safety issues that may limit their suitability for deployment.

Linearity of Relation Decoding in Transformer Language Models

1 code implementation 17 Aug 2023 Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau

Linear relation representations may be obtained by constructing a first-order approximation to the LM from a single prompt, and they exist for a variety of factual, commonsense, and linguistic relations.


Multimodal Neurons in Pretrained Text-Only Transformers

no code implementations 3 Aug 2023 Sarah Schwettmann, Neil Chowdhury, Samuel Klein, David Bau, Antonio Torralba

Language models demonstrate remarkable capacity to generalize representations learned in one modality to downstream tasks in other modalities.

Image Captioning

Discovering Variable Binding Circuitry with Desiderata

no code implementations 7 Jul 2023 Xander Davies, Max Nadeau, Nikhil Prakash, Tamar Rott Shaham, David Bau

Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits.

Erasing Concepts from Diffusion Models

2 code implementations ICCV 2023 Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, David Bau

We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher.

Text-based Image Editing

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

2 code implementations 24 Oct 2022 Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

Language models show a surprising range of capabilities, but the source of their apparent competence is unclear.

Mass-Editing Memory in a Transformer

2 code implementations 13 Oct 2022 Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, David Bau

Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge.

Language Modelling

Content-Based Search for Deep Generative Models

1 code implementation 6 Oct 2022 Daohan Lu, Sheng-Yu Wang, Nupur Kumari, Rohan Agarwal, Mia Tang, David Bau, Jun-Yan Zhu

To address this need, we introduce the task of content-based model search: given a query and a large set of generative models, finding the models that best match the query.

Contrastive Learning, Image and Sketch based Model Retrieval

Rewriting Geometric Rules of a GAN

1 code implementation 28 Jul 2022 Sheng-Yu Wang, David Bau, Jun-Yan Zhu

Our method allows a user to create a model that synthesizes endless objects with defined geometric changes, enabling the creation of a new generative model without the burden of curating a large-scale dataset.

Local Relighting of Real Scenes

2 code implementations 6 Jul 2022 Audrey Cui, Ali Jahanian, Agata Lapedriza, Antonio Torralba, Shahin Mahdizadehaghdam, Rohit Kumar, David Bau

We introduce the task of local relighting, which changes a photograph of a scene by switching on and off the light sources that are visible within the image.

Image Relighting

Disentangling visual and written concepts in CLIP

no code implementations CVPR 2022 Joanna Materzynska, Antonio Torralba, David Bau

The CLIP network measures the similarity between natural text and images; in this work, we investigate the entanglement of the representation of word images and natural images in its image encoder.


Locating and Editing Factual Associations in GPT

2 code implementations 10 Feb 2022 Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov

To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME).

counterfactual, Model Editing

Natural Language Descriptions of Deep Visual Features

2 code implementations 26 Jan 2022 Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas

Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active.


Editing a classifier by rewriting its prediction rules

1 code implementation NeurIPS 2021 Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, Aleksander Madry

We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules.

Toward a Visual Concept Vocabulary for GAN Latent Space

1 code implementation ICCV 2021 Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba

A large body of recent work has identified transformations in the latent spaces of generative adversarial networks (GANs) that consistently and interpretably transform generated images.


Natural Language Descriptions of Deep Features

no code implementations ICLR 2022 Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas

Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active.


Sketch Your Own GAN

1 code implementation ICCV 2021 Sheng-Yu Wang, David Bau, Jun-Yan Zhu

In particular, we change the weights of an original GAN model according to user sketches.

Diversity, Image Generation

What makes fake images detectable? Understanding properties that generalize

1 code implementation ECCV 2020 Lucy Chai, David Bau, Ser-Nam Lim, Phillip Isola

The quality of image generation and manipulation is reaching impressive levels, making it increasingly difficult for a human to distinguish between what is real and what is fake.

Image Generation

Rewriting a Deep Generative Model

3 code implementations ECCV 2020 David Bau, Steven Liu, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba

To address the problem, we propose a formulation in which the desired rule is changed by manipulating a layer of a deep network as a linear associative memory.

Dissecting Pruned Neural Networks

no code implementations 29 Jun 2019 Jonathan Frankle, David Bau

Namely, we consider the effect of removing unnecessary structure on the number of hidden units that learn disentangled representations of human-recognizable concepts as identified by network dissection.

On the Units of GANs (Extended Abstract)

no code implementations 29 Jan 2019 David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba

We quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output.

Interpretable Basis Decomposition for Visual Explanation

1 code implementation ECCV 2018 Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba

Explanations of the decisions made by a deep neural network are important for human end-users to be able to understand and diagnose the trustworthiness of the system.

Revisiting the Importance of Individual Units in CNNs via Ablation

no code implementations 7 Jun 2018 Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba

We confirm that unit attributes such as class selectivity are poor predictors of impact on overall accuracy, as found previously in recent work (Morcos et al., 2018).

General Classification

Interpreting Deep Visual Representations via Network Dissection

2 code implementations 15 Nov 2017 Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba

In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations.

Network Dissection: Quantifying Interpretability of Deep Visual Representations

1 code implementation CVPR 2017 David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer.
