Search Results for author: Trevor Darrell

Found 312 papers, 170 papers with code

A ConvNet for the 2020s

45 code implementations CVPR 2022 Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Classification Domain Generalization +3

Caffe: Convolutional Architecture for Fast Feature Embedding

2 code implementations 20 Jun 2014 Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell

The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Clustering Dimensionality Reduction +1

Deep Layer Aggregation

6 code implementations CVPR 2018 Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell

We augment standard architectures with deeper aggregation to better fuse information across layers.

Image Classification

Learning to Segment Every Thing

3 code implementations CVPR 2018 Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick

Most methods for object instance segmentation require all training examples to be labeled with segmentation masks.

Instance Segmentation Segmentation +1

Context Encoders: Feature Learning by Inpainting

11 code implementations CVPR 2016 Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros

In order to succeed at this task, context encoders need to both understand the content of the entire image, as well as produce a plausible hypothesis for the missing part(s).

Adversarial Feature Learning

10 code implementations 31 May 2016 Jeff Donahue, Philipp Krähenbühl, Trevor Darrell

The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution.

Adversarial Discriminative Domain Adaptation

20 code implementations CVPR 2017 Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell

Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains.

General Classification Unsupervised Domain Adaptation +1

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

8 code implementations 6 Oct 2013 Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell

We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks.

Clustering Domain Adaptation +3

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

1 code implementation 3 Jan 2024 Evonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard

We present a framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction.

Quantization

Sequential Modeling Enables Scalable Learning for Large Vision Models

1 code implementation 1 Dec 2023 Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.

Rethinking the Value of Network Pruning

2 code implementations ICLR 2019 Zhuang Liu, Ming-Jie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell

Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm.

Network Pruning Neural Architecture Search

Frustratingly Simple Few-Shot Object Detection

4 code implementations ICML 2020 Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu

Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods.

Few-Shot Object Detection Meta-Learning +2

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

1 code implementation 28 Aug 2023 Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions.

Instance Segmentation Optical Flow Estimation +5

Large-Scale Study of Curiosity-Driven Learning

4 code implementations ICLR 2019 Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros

However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent.

Atari Games SNES Games

Neural Network Diffusion

1 code implementation 20 Feb 2024 Kai Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You

The autoencoder extracts latent representations of a subset of the trained network parameters.

Decoder

Modeling Relationships in Referential Expressions with Compositional Modular Networks

2 code implementations CVPR 2017 Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko

In this paper we instead present a modular deep architecture capable of analyzing referential expressions into their component parts, identifying entities and relationships mentioned in the input expression and grounding them all in the scene.

Visual Question Answering (VQA)

Joint Monocular 3D Vehicle Detection and Tracking

1 code implementation ICCV 2019 Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.

3D Object Detection 3D Pose Estimation +4

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

10 code implementations ICCV 2021 Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, Xiaolong Wang

The edge between these two lines of works has yet been underexplored, and the effectiveness of meta-learning in few-shot learning remains unclear.

Few-Shot Learning General Classification

Few-shot Object Detection via Feature Reweighting

4 code implementations ICCV 2019 Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell

The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples.

Few-Shot Learning Few-Shot Object Detection +3

Monocular Quasi-Dense 3D Object Tracking

1 code implementation 12 Mar 2021 Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun

Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios.

3D Object Tracking Autonomous Driving +3

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

1 code implementation 30 Mar 2023 Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

We propose PAIR Diffusion, a generic framework that can enable a diffusion model to control the structure and appearance properties of each object in the image.

Object

Multi-Content GAN for Few-Shot Font Style Transfer

6 code implementations CVPR 2018 Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, Trevor Darrell

In this work, we focus on the challenge of taking partial observations of highly-stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface.

Font Style Transfer

Neural Module Networks

1 code implementation CVPR 2016 Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein

Visual question answering is fundamentally compositional in nature: a question like "where is the dog?"

Visual Question Answering

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

4 code implementations CVPR 2020 Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, Trevor Darrell

Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving.

Autonomous Driving Domain Adaptation +8

Quasi-Dense Similarity Learning for Multiple Object Tracking

3 code implementations CVPR 2021 Jiangmiao Pang, Linlu Qiu, Xia Li, Haofeng Chen, Qi Li, Trevor Darrell, Fisher Yu

Compared to methods with similar detectors, it boosts almost 10 points of MOTA and significantly decreases the number of ID switches on BDD100K and Waymo datasets.

Contrastive Learning Metric Learning +4

K-LITE: Learning Transferable Visual Models with External Knowledge

2 code implementations 20 Apr 2022 Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.

Benchmarking Descriptive +4

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

1 code implementation 23 May 2023 Long Lian, Boyi Li, Adam Yala, Trevor Darrell

Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images according to prompts that require various capabilities, doubling the generation accuracy across four tasks on average.

Common Sense Reasoning Language Modelling +2

DETReg: Unsupervised Pretraining with Region Priors for Object Detection

1 code implementation CVPR 2022 Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture.

Few-Shot Learning Few-Shot Object Detection +6

Dropout Reduces Underfitting

1 code implementation 2 Mar 2023 Zhuang Liu, Zhiqiu Xu, Joseph Jin, Zhiqiang Shen, Trevor Darrell

Additionally, we explore a symmetric technique for regularizing overfitting models - late dropout, where dropout is not used in the early iterations and is only activated later in training.

Semi-supervised Domain Adaptation via Minimax Entropy

3 code implementations ICCV 2019 Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.

Domain Adaptation Semi-supervised Domain Adaptation

Visual Prompting via Image Inpainting

1 code implementation 1 Sep 2022 Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros

How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification?

Colorization Edge Detection +6

Learning Features by Watching Objects Move

1 code implementation CVPR 2017 Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature.

object-detection Object Detection +1

Adversarial Continual Learning

1 code implementation ECCV 2020 Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach

We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks.

Continual Learning Image Classification

SkipNet: Learning Dynamic Routing in Convolutional Networks

2 code implementations ECCV 2018 Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, Joseph E. Gonzalez

While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient.

Decision Making

PyTouch: A Machine Learning Library for Touch Processing

1 code implementation 26 May 2021 Mike Lambeta, Huazhe Xu, Jingwei Xu, Po-Wei Chou, Shaoxiong Wang, Trevor Darrell, Roberto Calandra

With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making.

BIG-bench Machine Learning Decision Making +1

End-to-end Learning of Driving Models from Large-scale Video Datasets

2 code implementations CVPR 2017 Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell

Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or a simulation environment.

Scene Segmentation

Grounding of Textual Phrases in Images by Reconstruction

3 code implementations 12 Nov 2015 Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, Bernt Schiele

We propose a novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly.

Language Modelling Natural Language Visual Grounding +2

Compact Bilinear Pooling

8 code implementations CVPR 2016 Yang Gao, Oscar Beijbom, Ning Zhang, Trevor Darrell

Bilinear models have been shown to achieve impressive performance on a wide range of visual tasks, such as semantic segmentation, fine-grained recognition and face recognition.

Face Recognition Few-Shot Learning +3

Variational Adversarial Active Learning

6 code implementations ICCV 2019 Samarth Sinha, Sayna Ebrahimi, Trevor Darrell

Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the task for which we are trying to acquire labeled data.

Active Learning Image Classification +1

When Do We Not Need Larger Vision Models?

1 code implementation 19 Mar 2024 Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell

Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models.

Depth Estimation

Hierarchical Discrete Distribution Decomposition for Match Density Estimation

2 code implementations CVPR 2019 Zhichao Yin, Trevor Darrell, Fisher Yu

Explicit representations of the global match distributions of pixel-wise correspondences between pairs of images are desirable for uncertainty estimation and downstream applications.

Density Estimation Optical Flow Estimation +2

On-target Adaptation

1 code implementation 2 Sep 2021 Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain.

Domain Adaptation

Masked Visual Pre-training for Motor Control

1 code implementation 11 Mar 2022 Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

Real-World Robot Learning with Masked Visual Pre-training

1 code implementation 6 Oct 2022 Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.

Localizing Moments in Video with Natural Language

2 code implementations ICCV 2017 Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

A key obstacle to training our MCN model is that current video datasets do not include pairs of localized video segments and referring expressions, or text descriptions which uniquely identify a corresponding moment.

Natural Language Queries

TOAST: Transfer Learning via Attention Steering

1 code implementation 24 May 2023 Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang

We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, selects task-relevant features in the output, and feeds those features back to the model to steer the attention to the task-specific features.

Fine-Grained Image Classification Instruction Following +2

Women also Snowboard: Overcoming Bias in Captioning Models

2 code implementations ECCV 2018 Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach

We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present.

Image Captioning

Top-Down Visual Attention from Analysis by Synthesis

1 code implementation CVPR 2023 Baifeng Shi, Trevor Darrell, Xin Wang

In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.

Retrieval Semantic Segmentation +1

Initializing Models with Larger Ones

1 code implementation 30 Nov 2023 Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu

Weight selection offers a new approach to leverage the power of pretrained models in resource-constrained settings, and we hope it can be a useful tool for training small models in the large-model era.

Knowledge Distillation

Few-Shot Segmentation Propagation with Guided Networks

1 code implementation 25 May 2018 Kate Rakelly, Evan Shelhamer, Trevor Darrell, Alexei A. Efros, Sergey Levine

Learning-based methods for visual segmentation have made progress on particular types of segmentation tasks, but are limited by the necessary supervision, the narrow definitions of fixed tasks, and the lack of control during inference for correcting errors.

Interactive Segmentation Segmentation +3

Clockwork Convnets for Video Semantic Segmentation

1 code implementation 11 Aug 2016 Evan Shelhamer, Kate Rakelly, Judy Hoffman, Trevor Darrell

Recent years have seen tremendous progress in still-image segmentation; however the naïve application of these state-of-the-art algorithms to every video frame requires considerable computation and ignores the temporal continuity inherent in video.

Image Segmentation Scheduling +4

Data-dependent Initializations of Convolutional Neural Networks

2 code implementations 21 Nov 2015 Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell

Convolutional Neural Networks spread through computer vision like a wildfire, impacting almost all visual tasks imaginable.

Image Classification object-detection +2

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation CVPR 2020 Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition Object

Unsupervised Universal Image Segmentation

1 code implementation 28 Dec 2023 Dantong Niu, Xudong Wang, Xinyang Han, Long Lian, Roei Herzig, Trevor Darrell

Several unsupervised image segmentation approaches have been proposed which eliminate the need for dense manually-annotated segmentation masks; current models separately handle either semantic segmentation (e.g., STEGO) or class-agnostic instance segmentation (e.g., CutLER), but not both (i.e., panoptic segmentation).

Image Segmentation Instance Segmentation +7

Speaker-Follower Models for Vision-and-Language Navigation

1 code implementation NeurIPS 2018 Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell

We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.

Data Augmentation Vision and Language Navigation

Natural Language Object Retrieval

1 code implementation CVPR 2016 Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we address the task of natural language object retrieval, to localize a target object within a given image based on a natural language query of the object.

Image Captioning Image Retrieval +4

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

1 code implementation CVPR 2021 Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo

We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation.

3D Hand Pose Estimation

Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

1 code implementation 23 Dec 2019 Richard Li, Allan Jabri, Trevor Darrell, Pulkit Agrawal

Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements.

Object reinforcement-learning +2

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

7 code implementations CVPR 2015 Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise.

Retrieval Video Recognition

Language-Conditioned Graph Networks for Relational Reasoning

1 code implementation ICCV 2019 Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko

E.g., conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can be easily consumed by a simple classifier for answer prediction.

Object Referring Expression Comprehension +2

Segmentation from Natural Language Expressions

4 code implementations 20 Mar 2016 Ronghang Hu, Marcus Rohrbach, Trevor Darrell

To produce pixelwise segmentation for the language expression, we propose an end-to-end trainable recurrent and convolutional network model that jointly learns to process visual and linguistic information.

Referring Expression Segmentation Segmentation +1

Region Similarity Representation Learning

1 code implementation ICCV 2021 Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell

We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.

Instance Segmentation Object +5

Contrastive Test-Time Adaptation

1 code implementation CVPR 2022 Dian Chen, Dequan Wang, Trevor Darrell, Sayna Ebrahimi

We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels.

Contrastive Learning Test-time Adaptation +1

Learning Instance Segmentation by Interaction

1 code implementation 21 Jun 2018 Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.

Instance Segmentation Segmentation +1

Unsupervised Domain Adaptation through Self-Supervision

3 code implementations 26 Sep 2019 Yu Sun, Eric Tzeng, Trevor Darrell, Alexei A. Efros

This paper addresses unsupervised domain adaptation, the setting where labeled training data is available on a source domain, but the goal is to have good performance on a target domain with only unlabeled data.

Unsupervised Domain Adaptation

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

1 code implementation CVPR 2016 Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell

Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet.

Image Captioning Novel Concepts +3

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

2 code implementations ACL 2022 Sanjay Subramanian, William Merrill, Trevor Darrell, Matt Gardner, Sameer Singh, Anna Rohrbach

Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.

Image Classification Referring Expression +1

Robust Object Detection via Instance-Level Temporal Cycle Confusion

1 code implementation ICCV 2021 Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.

Object object-detection +2

Deep Domain Confusion: Maximizing for Domain Invariance

7 code implementations 10 Dec 2014 Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, Trevor Darrell

Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias on a standard benchmark.

Domain Adaptation Model Selection +1

Explainable Neural Computation via Stack Neural Module Networks

1 code implementation ECCV 2018 Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko

In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.

Decision Making Question Answering +1

FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

3 code implementations 8 Dec 2016 Judy Hoffman, Dequan Wang, Fisher Yu, Trevor Darrell

In this paper, we introduce the first domain adaptive semantic segmentation method, proposing an unsupervised adversarial approach to pixel prediction problems.

Semantic Segmentation Synthetic-to-Real Translation

Sequence to Sequence -- Video to Text

4 code implementations 3 May 2015 Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip.

Caption Generation Language Modelling +1

Textual Explanations for Self-Driving Vehicles

2 code implementations ECCV 2018 Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata

Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments.

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

1 code implementation ICCV 2015 Deepak Pathak, Philipp Krähenbühl, Trevor Darrell

We propose Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space (i.e. predicted label distribution) of a CNN.

Image Segmentation Semantic Segmentation +2

Compositional GAN: Learning Image-Conditional Binary Composition

1 code implementation 19 Jul 2018 Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene.

Back to the Source: Diffusion-Driven Test-Time Adaptation

1 code implementation 7 Jul 2022 Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang

We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model.

Test-time Adaptation

Self-Supervised Pretraining Improves Self-Supervised Pretraining

1 code implementation 23 Mar 2021 Colorado J. Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, Trevor Darrell

Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.

Image Augmentation

Describing Differences in Image Sets with Natural Language

1 code implementation 5 Dec 2023 Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

To aid in this discovery process, we explore the task of automatically describing the differences between two sets of images, which we term Set Difference Captioning.

Language Modelling

Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

1 code implementation NeurIPS 2023 Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell

As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data.

Domain Generalization Image Augmentation

TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning

1 code implementation CVPR 2019 Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez

We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning.

Attribute Few-Shot Learning +1

Task-Aware Feature Generation for Zero-Shot Compositional Learning

1 code implementation 11 Jun 2019 Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez

In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts.

Novel Concepts Zero-Shot Learning

Auto-Tuned Sim-to-Real Transfer

1 code implementation 15 Apr 2021 Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak

Policies trained in simulation often fail when transferred to the real world due to the "reality gap", where the simulator is unable to accurately capture the dynamics and visual properties of the real world.

Generalized orderless pooling performs implicit salient matching

2 code implementations ICCV 2017 Marcel Simon, Yang Gao, Trevor Darrell, Joachim Denzler, Erik Rodner

In this paper, we generalize average and bilinear pooling to "alpha-pooling", allowing for learning the pooling strategy during training.

Multitask Vision-Language Prompt Tuning

1 code implementation 21 Nov 2022 Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell

Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show many target tasks can benefit each other from sharing prompt vectors and thus can be jointly learned via multitask prompt tuning.

Visual Prompt Tuning

Using Language to Extend to Unseen Domains

1 code implementation 18 Oct 2022 Lisa Dunlap, Clara Mohri, Devin Guillory, Han Zhang, Trevor Darrell, Joseph E. Gonzalez, Aditi Raghunathan, Anja Rohrbach

It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.

Domain Adaptation

Object Hallucination in Image Captioning

1 code implementation EMNLP 2018 Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene.

Hallucination Image Captioning +2

xT: Nested Tokenization for Larger Context in Large Images

1 code implementation 4 Mar 2024 Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam

Modern computer vision pipelines handle large images in one of two sub-optimal ways: down-sampling or cropping.

Early Convolutions Help Transformers See Better

1 code implementation NeurIPS 2021 Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.

Robust Change Captioning

1 code implementation ICCV 2019 Dong Huk Park, Trevor Darrell, Anna Rohrbach

We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.

Natural Language Visual Grounding

Semantic Bottleneck Scene Generation

2 code implementations 26 Nov 2019 Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic

For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts.

Conditional Image Generation Image-to-Image Translation +2

Object-Region Video Transformers

1 code implementation CVPR 2022 Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.

Action Detection Few-Shot action recognition +3

PANDA: Pose Aligned Networks for Deep Attribute Modeling

1 code implementation CVPR 2014 Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev

We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion.

Attribute Facial Attribute Classification +2

Localizing Moments in Video with Temporal Language

1 code implementation EMNLP 2018 Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

To benchmark whether our model, and other recent video localization models, can effectively reason about temporal language, we collect the novel TEMPOral reasoning in video and language (TEMPO) dataset.

Natural Language Queries Retrieval +1

Simultaneous Deep Transfer Across Domains and Tasks

1 code implementation ICCV 2015 Eric Tzeng, Judy Hoffman, Trevor Darrell, Kate Saenko

Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias.

Domain Adaptation

Adversarial Inference for Multi-Sentence Video Description

1 code implementation CVPR 2019 Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

Image Captioning Sentence +1

Exploring Simple and Transferable Recognition-Aware Image Processing

1 code implementation 21 Oct 2019 Zhuang Liu, Hung-Ju Wang, Tinghui Zhou, Zhiqiang Shen, Bingyi Kang, Evan Shelhamer, Trevor Darrell

Interestingly, the processing model's ability to enhance recognition quality can transfer when evaluated on models of different architectures, recognized categories, tasks and training datasets.

Image Retrieval Recommendation Systems

Visual Attention Emerges from Recurrent Sparse Reconstruction

1 code implementation 23 Apr 2022 Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang

We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity.

Learning Detection with Diverse Proposals

1 code implementation CVPR 2017 Samaneh Azadi, Jiashi Feng, Trevor Darrell

To predict a set of diverse and informative proposals with enriched representations, this paper introduces a differentiable Determinantal Point Process (DPP) layer that is able to augment the object detection architectures.

Object object-detection +1

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

1 code implementation 28 Apr 2022 Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach

We first enable abstention capabilities for several VQA models, and analyze both their coverage, the portion of questions answered, and risk, the error on that portion.

Question Answering Visual Question Answering

Compositional Video Synthesis with Action Graphs

1 code implementation 27 Jun 2020 Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

Our generative model for this task (AG2Vid) disentangles motion and appearance features, and by incorporating a scheduling mechanism for actions facilitates a timely and coordinated video generation.

Scheduling Video Generation +2

NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media

1 code implementation EMNLP 2021 Grace Luo, Trevor Darrell, Anna Rohrbach

Online misinformation is a prevalent societal issue, with adversaries relying on tools ranging from cheap fakes to sophisticated deep fakes.

Misinformation

Voxel-informed Language Grounding

2 code implementations ACL 2022 Rodolfo Corona, Shizhan Zhu, Dan Klein, Trevor Darrell

Natural language applied to natural 2D images describes a fundamentally 3D world.

Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding

2 code implementations 12 Nov 2023 Chancharik Mitra, Abrar Anwar, Rodolfo Corona, Dan Klein, Trevor Darrell, Jesse Thomason

When connecting objects and their language referents in an embodied 3D environment, it is important to note that: (1) an object can be better characterized by leveraging comparative information between itself and other objects, and (2) an object's appearance can vary with camera position.

Object Position

Discriminator Rejection Sampling

2 code implementations ICLR 2019 Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena

We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution.

Image Generation

Regularization Matters in Policy Optimization

2 code implementations 21 Oct 2019 Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.

Continuous Control Reinforcement Learning (RL)

Instance-Aware Predictive Navigation in Multi-Agent Environments

1 code implementation 14 Jan 2021 Jinkun Cao, Xin Wang, Trevor Darrell, Fisher Yu

To decide the action at each step, we seek the action sequence that can lead to safe future states based on the prediction module outputs by repeatedly sampling likely action sequences.

Refine and Represent: Region-to-Object Representation Learning

1 code implementation 25 Aug 2022 Akash Gokul, Konstantinos Kallidromitis, Shufan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed

Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives.

Object Representation Learning +4

Deep Object-Centric Representations for Generalizable Robot Learning

1 code implementation 14 Aug 2017 Coline Devin, Pieter Abbeel, Trevor Darrell, Sergey Levine

We devise an object-level attentional mechanism that can be used to determine relevant objects from a few trajectories or demonstrations, and then immediately incorporate those objects into a learned policy.

Object Reinforcement Learning (RL)

Teachable Reinforcement Learning via Advice Distillation

1 code implementation NeurIPS 2021 Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta

Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and extracts little information from each human intervention.

Imitation Learning reinforcement-learning +1

Compositional Chain-of-Thought Prompting for Large Multimodal Models

1 code implementation 27 Nov 2023 Chancharik Mitra, Brandon Huang, Trevor Darrell, Roei Herzig

The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks.

Language Modelling Large Language Model +1

Identity-Aware Multi-Sentence Video Description

1 code implementation ECCV 2020 Jae Sung Park, Trevor Darrell, Anna Rohrbach

This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.

Gender Prediction Sentence +1

Finding Visual Task Vectors

1 code implementation 8 Apr 2024 Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar

In this work, we analyze the activations of MAE-VQGAN, a recent Visual Prompting model, and find task vectors, activations that encode task-specific information.

Visual Prompting

Video Prediction via Example Guidance

1 code implementation ICML 2020 Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell

In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.

Video Prediction

G^3: Geolocation via Guidebook Grounding

1 code implementation 28 Nov 2022 Grace Luo, Giscard Biamby, Trevor Darrell, Daniel Fried, Anna Rohrbach

We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations and an associated textual guidebook for GeoGuessr, a popular interactive geolocation game.

Deep Spatial Autoencoders for Visuomotor Learning

1 code implementation 21 Sep 2015 Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel

Our method uses a deep spatial autoencoder to acquire a set of feature points that describe the environment for the current task, such as the positions of objects, and then learns a motion skill with these feature points using an efficient reinforcement learning method based on local linear models.

reinforcement-learning Reinforcement Learning (RL)

Compositional Plan Vectors

1 code implementation NeurIPS 2019 Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.

Imitation Learning

Captioning Images with Diverse Objects

1 code implementation CVPR 2017 Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko

We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets.

Object Object Recognition

Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation

1 code implementation NAACL 2022 Giscard Biamby, Grace Luo, Trevor Darrell, Anna Rohrbach

Detecting out-of-context media, such as "mis-captioned" images on Twitter, is a relevant problem, especially in domains of high public significance.

Misinformation

Spatio-Temporal Action Graph Networks

1 code implementation 4 Dec 2018 Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell

Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance.

Activity Recognition Autonomous Driving +3

Fully Convolutional Multi-Class Multiple Instance Learning

1 code implementation 22 Dec 2014 Deepak Pathak, Evan Shelhamer, Jonathan Long, Trevor Darrell

We propose a novel MIL formulation of multi-class semantic segmentation learning by a fully convolutional network.

Multiple Instance Learning Segmentation +1

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

1 code implementation 29 Jan 2022 Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.

counterfactual Decision Making +2

A Coefficient Makes SVRG Effective

1 code implementation 9 Nov 2023 Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu

Stochastic Variance Reduced Gradient (SVRG), introduced by Johnson & Zhang (2013), is a theoretically compelling optimization method.

Image Classification

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion

1 code implementation 21 Dec 2021 Shruti Agarwal, Liwen Hu, Evonne Ng, Trevor Darrell, Hao Li, Anna Rohrbach

In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques.

Misinformation

Recognizing Image Style

1 code implementation 15 Nov 2013 Sergey Karayev, Matthew Trentacoste, Helen Han, Aseem Agarwala, Trevor Darrell, Aaron Hertzmann, Holger Winnemoeller

The style of an image plays a significant role in how it is viewed, but style has received little attention in computer vision research.

Image Retrieval TAG

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation

1 code implementation 17 Oct 2019 Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell

The agent is first presented with previous experiences in the training environment, along with task description in the form of trajectory-level sparse rewards.

Continuous Control Model Predictive Control +2

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

1 code implementation ICLR 2021 Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.

Continuous Control

CLIP-It! Language-Guided Video Summarization

1 code implementation NeurIPS 2021 Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes.

Query-focused Summarization Video Summarization

Deep Mixture of Experts via Shallow Embedding

no code implementations 5 Jun 2018 Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez

Larger networks generally have greater representational power at the cost of increased computational complexity.

Few-Shot Learning Zero-Shot Learning

Fooling Vision and Language Models Despite Localization and Attention Mechanism

no code implementations CVPR 2018 Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, Dawn Song

Our work sheds new light on understanding adversarial attacks on vision systems which have a language component and shows that attention, bounding box localization, and compositional internal structures are vulnerable to adversarial attacks.

Dense Captioning Natural Language Understanding +2

Reinforcement Learning from Imperfect Demonstrations

no code implementations ICLR 2018 Yang Gao, Huazhe Xu, Ji Lin, Fisher Yu, Sergey Levine, Trevor Darrell

We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data.

reinforcement-learning Reinforcement Learning (RL)

Recasting Gradient-Based Meta-Learning as Hierarchical Bayes

no code implementations ICLR 2018 Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, Thomas Griffiths

Meta-learning allows an intelligent agent to leverage prior learning episodes as a basis for quickly improving performance on a novel task.

Meta-Learning

Grounding Visual Explanations (Extended Abstract)

no code implementations 17 Nov 2017 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image.

Attribute

Gradient-free Policy Architecture Search and Adaptation

no code implementations 16 Oct 2017 Sayna Ebrahimi, Anna Rohrbach, Trevor Darrell

We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks.

Autonomous Driving Neural Architecture Search

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

no code implementations 14 Dec 2016 Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions.

Decision Making Question Answering +2

Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

no code implementations 23 Nov 2015 Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Pieter Abbeel, Sergey Levine, Kate Saenko, Trevor Darrell

We propose a novel, more powerful combination of both distribution and pairwise image alignment, and remove the requirement for expensive annotation by using weakly aligned pairs of images in the source and target domains.

Domain Adaptation

Visual Discovery at Pinterest

no code implementations 15 Feb 2017 Andrew Zhai, Dmitry Kislyuk, Yushi Jing, Michael Feng, Eric Tzeng, Jeff Donahue, Yue Li Du, Trevor Darrell

Over the past three years Pinterest has experimented with several visual search and recommendation services, including Related Pins (2014), Similar Looks (2015), Flashlight (2016) and Lens (2017).

object-detection Object Detection

Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer

no code implementations 22 Sep 2016 Coline Devin, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Sergey Levine

Using deep reinforcement learning to train general purpose neural network policies alleviates some of the burden of manual representation engineering by using expressive policy classes, but exacerbates the challenge of data collection, since such methods tend to be less efficient than RL with low-dimensional, hand-designed representations.

reinforcement-learning Reinforcement Learning (RL) +2

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions

no code implementations 30 Aug 2016 Ronghang Hu, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell

Image segmentation from referring expressions is a joint vision and language modeling task, where the input is an image and a textual expression describing a particular region in the image; and the goal is to localize and segment the specific image region based on the given expression.

Image Captioning Image Segmentation +3

End-to-End Training of Deep Visuomotor Policies

no code implementations 2 Apr 2015 Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel

Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control.

Deep Learning for Tactile Understanding From Visual and Haptic Data

no code implementations 19 Nov 2015 Yang Gao, Lisa Anne Hendricks, Katherine J. Kuchenbecker, Trevor Darrell

Robots which interact with the physical world will benefit from a fine-grained tactile understanding of objects and surfaces.

Generating Visual Explanations

no code implementations 28 Mar 2016 Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell

Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself.

General Classification Sentence +1

Auxiliary Image Regularization for Deep CNNs with Noisy Labels

no code implementations 22 Nov 2015 Samaneh Azadi, Jiashi Feng, Stefanie Jegelka, Trevor Darrell

Precisely labeled data sets with a sufficient number of samples are very important for training deep convolutional neural networks (CNNs).

Image Classification

Fine-grained pose prediction, normalization, and recognition

no code implementations 22 Nov 2015 Ning Zhang, Evan Shelhamer, Yang Gao, Trevor Darrell

Pose variation and subtle differences in appearance are key challenges to fine-grained classification.

General Classification Pose Prediction

Mapping Images to Sentiment Adjective Noun Pairs with Factorized Neural Nets

no code implementations 21 Nov 2015 Takuya Narihira, Damian Borth, Stella X. Yu, Karl Ni, Trevor Darrell

We consider the visual sentiment task of mapping an image to an adjective noun pair (ANP) such as "cute baby".

Image Captioning

Spatial Semantic Regularisation for Large Scale Object Detection

no code implementations ICCV 2015 Damian Mrowca, Marcus Rohrbach, Judy Hoffman, Ronghang Hu, Kate Saenko, Trevor Darrell

Our approach proves to be especially useful in large scale settings with thousands of classes, where spatial and semantic interactions are very frequent and only weakly supervised detectors can be built due to a lack of bounding box annotations.

Clustering Object +2

Learning Compact Convolutional Neural Networks with Nested Dropout

no code implementations 22 Dec 2014 Chelsea Finn, Lisa Anne Hendricks, Trevor Darrell

Recently, nested dropout was proposed as a method for ordering representation units in autoencoders by their information content, without diminishing reconstruction cost.

Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning

no code implementations CVPR 2015 Judy Hoffman, Deepak Pathak, Trevor Darrell, Kate Saenko

We develop methods for detector learning which exploit joint training over both weak and strong labels and which transfer learned perceptual representations from strongly-labeled auxiliary tasks.

Multiple Instance Learning Representation Learning +1

Do Convnets Learn Correspondence?

no code implementations NeurIPS 2014 Jonathan Long, Ning Zhang, Trevor Darrell

Convolutional neural nets (convnets) trained from massive labeled datasets have substantially improved the state-of-the-art in image classification and object detection.

General Classification Image Classification +3

Part-based R-CNNs for Fine-grained Category Detection

no code implementations 15 Jul 2014 Ning Zhang, Jeff Donahue, Ross Girshick, Trevor Darrell

Semantic part localization can facilitate fine-grained categorization by explicitly isolating subtle appearance differences associated with specific object parts.

Fine-Grained Image Classification Object +2

Weakly-supervised Discovery of Visual Pattern Configurations

no code implementations NeurIPS 2014 Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, Trevor Darrell

The increasing prominence of weakly labeled data nurtures a growing demand for object detection methods that can cope with minimal supervision.

Object object-detection +1

Detection Bank: An Object Detection Based Video Representation for Multimedia Event Recognition

no code implementations 28 May 2014 Tim Althoff, Hyun Oh Song, Trevor Darrell

While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition.

Event Detection Object +5

On learning to localize objects with minimal supervision

no code implementations 5 Mar 2014 Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell

Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain.

Weakly Supervised Object Detection

Modeling Radiometric Uncertainty for Vision with Tone-mapped Color Images

no code implementations 27 Nov 2013 Ayan Chakrabarti, Ying Xiong, Baochen Sun, Trevor Darrell, Daniel Scharstein, Todd Zickler, Kate Saenko

To produce images that are suitable for display, tone-mapping is widely used in digital cameras to map linear color measurements into narrow gamuts with limited dynamic range.

Tone Mapping

One-Shot Adaptation of Supervised Deep Convolutional Models

no code implementations 21 Dec 2013 Judy Hoffman, Eric Tzeng, Jeff Donahue, Yangqing Jia, Kate Saenko, Trevor Darrell

In other words, are deep CNNs trained on large amounts of labeled data as susceptible to dataset bias as previous methods have been shown to be?

Domain Adaptation Image Classification
