Dataset replication is a useful tool for assessing whether models have overfit to a specific validation set or the exact circumstances under which it was generated.
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
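As a minimal sketch of that setup (illustrative, not the construction from any specific attack), the snippet below stamps a small trigger patch onto a random fraction of training images and relabels them to an attacker-chosen class; `poison_fraction` and the patch geometry are assumptions.

```python
import numpy as np

def poison_dataset(images, labels, target_class, poison_fraction=0.05, seed=0):
    """Insert backdoor examples: stamp a small white trigger patch onto a
    random subset of training images and relabel them to `target_class`.

    images: float array of shape (N, H, W, C) with pixels in [0, 1]
    labels: int array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_fraction * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # 3x3 trigger patch in the bottom-right corner.
    images[idx, -3:, -3:, :] = 1.0
    labels[idx] = target_class
    return images, labels

# At test time, stamping the same patch onto any input should steer a model
# trained on the poisoned data toward `target_class`.
```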
For example, we are able to train an ImageNet ResNet-50 model to 75% accuracy in only 20 minutes on a single machine.
In particular, we introduce the notion of a baseline feed: the content that a user would see without filtering (e.g., on Twitter, this could be the chronological timeline).
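A toy sketch of the distinction, with an invented `Post` schema and scoring field: the baseline feed is a pure reverse-chronological sort, against which any ranked feed can be compared.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    timestamp: float             # seconds since epoch
    predicted_engagement: float  # platform's ranking score (illustrative)

def baseline_feed(posts):
    """Counterfactual baseline: reverse-chronological order, no filtering."""
    return sorted(posts, key=lambda p: p.timestamp, reverse=True)

def ranked_feed(posts):
    """The feed the user actually sees under engagement-based ranking."""
    return sorted(posts, key=lambda p: p.predicted_engagement, reverse=True)

# Comparing the two orderings quantifies how much the ranking algorithm
# amplifies or suppresses individual posts relative to the baseline.
```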
That is, computationally tractable methods can struggle to attribute model predictions accurately in non-convex settings (e.g., deep neural networks), while methods that are effective in such regimes require training thousands of models, making them impractical for large models or datasets.
In this work, we introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances from that input distribution that exhibit the desired shift.
We study the problem of (learning) algorithm comparison, where the goal is to find differences between models trained with two different learning algorithms.
It is commonly believed that, in transfer learning, including more pre-training data translates into better performance.
Using transfer learning to adapt a pre-trained "source model" to a downstream "target task" can dramatically increase performance with seemingly no downside.
Moreover, by combining our framework with off-the-shelf diffusion models, we can generate images that are especially challenging for the analyzed model, and thus can be used to perform synthetic data augmentation that helps remedy the model's failure modes.
The visual systems of primates are the gold standard of robust perception.
We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data.
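As one hedged illustration of how such an analysis could be instantiated: given the outcomes of many training runs on random subsets, fit a sparse linear model from subset-inclusion indicators to the trained model's output on a target example. The function name and the synthetic stand-in data below are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_datamodel(masks, margins, alpha=1e-4):
    """Fit a linear datamodel for one target example.

    masks:   (num_runs, num_train) binary matrix; masks[i, j] = 1 if training
             example j was included in the subset used for run i.
    margins: (num_runs,) model output (e.g., correct-class margin) on the
             target example for each run.

    Returns per-training-example weights; a positive weight means including
    that example tends to raise the target's margin.
    """
    model = Lasso(alpha=alpha, fit_intercept=True)
    model.fit(masks, margins)
    return model.coef_

# Illustrative usage with synthetic data standing in for real training runs:
rng = np.random.default_rng(0)
masks = rng.integers(0, 2, size=(500, 100)).astype(float)
true_w = np.zeros(100)
true_w[:5] = 1.0
margins = masks @ true_w + 0.1 * rng.standard_normal(500)
weights = fit_datamodel(masks, margins)
```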
We identify properties of universal adversarial perturbations (UAPs) that distinguish them from standard adversarial perturbations.
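For context, a minimal sketch of how an untargeted UAP can be computed, assuming a PyTorch `model` that maps image batches to logits and a `(images, labels)` loader; the hyperparameters are placeholders, and this is a generic recipe rather than the procedure from any particular paper.

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, loader, eps=8/255, step=1/255, epochs=5):
    """Compute a single L-inf-bounded perturbation that raises the loss on
    many inputs at once (an untargeted UAP), unlike per-example attacks.

    Assumes pixels lie in [0, 1]; clipping x + delta back into the valid
    pixel range is omitted for brevity.
    """
    model.eval()
    for p in model.parameters():  # attack the input, not the weights
        p.requires_grad_(False)
    delta = torch.zeros_like(next(iter(loader))[0][:1], requires_grad=True)
    for _ in range(epochs):
        for x, y in loader:
            F.cross_entropy(model(x + delta), y).backward()
            with torch.no_grad():
                delta += step * delta.grad.sign()  # ascend the loss
                delta.clamp_(-eps, eps)            # project to the eps-ball
            delta.grad.zero_()
    return delta.detach()
```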
We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules.
We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation.
We study a class of realistic computer vision settings wherein one can influence the design of the objects being recognized.
As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance.
We develop a methodology for assessing the robustness of models to subpopulation shift: specifically, their ability to generalize to novel data subpopulations that were not observed during training.
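A schematic of the evaluation split, under an assumed example schema with superclass and subpopulation annotations (this mirrors the general idea, not any benchmark's exact construction): held-out subpopulations yield test inputs that are novel even though their labels are not.

```python
def subpopulation_split(examples, heldout_subpops):
    """Split a dataset for a subpopulation-shift evaluation.

    Each example is a dict with keys 'x', 'superclass', and 'subpop'
    (illustrative schema). The model is trained on superclasses with some
    subpopulations removed and evaluated on exactly those removed
    subpopulations.
    """
    train = [e for e in examples if e['subpop'] not in heldout_subpops]
    test = [e for e in examples if e['subpop'] in heldout_subpops]
    return train, test
```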
Typically, better pre-trained models yield better transfer results, suggesting that initial accuracy is a key aspect of transfer learning performance.
We assess the tendency of state-of-the-art object recognition models to depend on signals from image backgrounds.
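One simple way such a dependence could be probed, assuming foreground masks are available (e.g., from bounding boxes or segmentation); the function and its signature are illustrative:

```python
import torch

@torch.no_grad()
def background_reliance(model, x, y, fg_mask):
    """Compare accuracy on original images vs background-only images.

    x: (N, C, H, W) images; fg_mask: (N, 1, H, W) binary foreground masks.
    A small gap between the two accuracies suggests the model leans
    heavily on background signal.
    """
    acc = lambda logits: (logits.argmax(1) == y).float().mean().item()
    bg_only = x * (1 - fg_mask)  # blank out the foreground
    return acc(model(x)), acc(model(bg_only))
```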
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO).
Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline.
We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for a standard human-in-the-loop measure of data quality.
We show that the basic classification framework alone can be used to tackle some of the most challenging tasks in image synthesis.
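A minimal sketch of the core mechanism, assuming a PyTorch classifier (this line of work relies on robustly trained models; the step count and learning rate here are placeholders): synthesize an image by gradient ascent on a target-class logit.

```python
import torch

def synthesize(model, target_class, shape=(1, 3, 224, 224), steps=200, lr=0.1):
    """Generate an image by maximizing a classifier's logit for
    `target_class`, starting from noise. With a robustly trained model,
    the resulting images tend to contain perceptible class features.
    """
    x = torch.rand(shape, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -model(x)[:, target_class].sum()  # ascend the target logit
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)  # keep pixels in the valid range
    return x.detach()
```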
In this work, we show that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks.
Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear.
Correctly evaluating defenses against adversarial examples has proven to be extremely difficult.
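A common minimum bar is accuracy under a strong white-box attack such as L-inf PGD; a standard sketch follows (hyperparameters are illustrative, and a thorough evaluation would go well beyond this single attack).

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=40):
    """Standard L-inf PGD with random start: report accuracy on the
    returned adversarial inputs, not just on clean inputs."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```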
We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development.
We show that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization.
This suggests that such use of a first-order approximation of the discriminator, a de facto standard in existing GAN training dynamics, may be one of the factors that makes GAN training so challenging in practice.
A fundamental, and still largely unanswered, question in the context of Generative Adversarial Networks (GANs) is whether GANs are actually able to capture the key characteristics of the datasets they are trained on.
The study of adversarial robustness has so far largely focused on perturbations bounded in p-norms.
While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice.
Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.