Search Results for author: Trevor Darrell

Found 225 papers, 117 papers with code

A ConvNet for the 2020s

8 code implementations 10 Jan 2022 Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

 Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)

Domain Generalization Image Classification +2

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion

1 code implementation 21 Dec 2021 Shruti Agarwal, Liwen Hu, Evonne Ng, Trevor Darrell, Hao Li, Anna Rohrbach

In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques.

Misinformation

Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation

no code implementations 16 Dec 2021 Giscard Biamby, Grace Luo, Trevor Darrell, Anna Rohrbach

Detecting out-of-context media, such as "miscaptioned" images on Twitter, often requires detecting inconsistencies between the two modalities.

Misinformation

Learning to Detect Every Thing in an Open World

no code implementations 3 Dec 2021 Kuniaki Saito, Ping Hu, Trevor Darrell, Kate Saenko

Many open-world applications require the detection of novel objects, yet state-of-the-art object detection and instance segmentation networks do not excel at this task.

Data Augmentation Instance Segmentation +2

Teachable Reinforcement Learning via Advice Distillation

no code implementations NeurIPS 2021 Olivia Watkins, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Jacob Andreas

Training automated agents to perform complex behaviors in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and provides minimal information per human intervention.

Decision Making Imitation Learning

Object-Region Video Transformers

no code implementations 13 Oct 2021 Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.

Action Detection Action Recognition +1

Pyramid Mini-Batching for Optimal Transport

no code implementations 29 Sep 2021 Devin Guillory, Kuniaki Saito, Eric Tzeng, Yannik Pitcan, Kate Saenko, Trevor Darrell

Optimal transport theory provides a useful tool to measure the differences between two distributions.

Domain Adaptation

On-target Adaptation

1 code implementation 2 Sep 2021 Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain.

Domain Adaptation

Region-level Active Detector Learning

no code implementations 20 Aug 2021 Michael Laielli, Giscard Biamby, Dian Chen, Ritwik Gupta, Adam Loeffler, Phat Dat Nguyen, Ross Luo, Trevor Darrell, Sayna Ebrahimi

Active learning for object detection is conventionally achieved by applying techniques developed for classification in a way that aggregates individual detections into image-level selection criteria.

Active Learning Object Detection

Predicting with Confidence on Unseen Distributions

no code implementations ICCV 2021 Devin Guillory, Vaishaal Shankar, Sayna Ebrahimi, Trevor Darrell, Ludwig Schmidt

Our work connects techniques from domain adaptation and predictive uncertainty literature, and allows us to predict model accuracy on challenging unseen distributions without access to labeled data.

Domain Adaptation

CLIP-It! Language-Guided Video Summarization

no code implementations NeurIPS 2021 Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes.

Video Summarization

Early Convolutions Help Transformers See Better

1 code implementation NeurIPS 2021 Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.

DETReg: Unsupervised Pretraining with Region Priors for Object Detection

1 code implementation 8 Jun 2021 Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture.

Few-Shot Learning Few-Shot Object Detection +3

Towards Learning to Play Piano with Dexterous Hands and Touch

1 code implementation 3 Jun 2021 Huazhe Xu, Yuping Luo, Shaoxiong Wang, Trevor Darrell, Roberto Calandra

The virtuoso plays the piano with passion, poetry and extraordinary technical ability.

PyTouch: A Machine Learning Library for Touch Processing

1 code implementation 26 May 2021 Mike Lambeta, Huazhe Xu, Jingwei Xu, Po-Wei Chou, Shaoxiong Wang, Trevor Darrell, Roberto Calandra

With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making.

Decision Making Touch detection

Robust Object Detection via Instance-Level Temporal Cycle Confusion

1 code implementation ICCV 2021 Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.

Robust Object Detection

Auto-Tuned Sim-to-Real Transfer

1 code implementation 15 Apr 2021 Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak

Policies trained in simulation often fail when transferred to the real world due to the "reality gap", where the simulator is unable to accurately capture the dynamics and visual properties of the real world.

NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media

1 code implementation EMNLP 2021 Grace Luo, Trevor Darrell, Anna Rohrbach

Online misinformation is a prevalent societal issue, with adversaries relying on tools ranging from cheap fakes to sophisticated deep fakes.

Misinformation

Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

no code implementations 6 Apr 2021 Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.

Contrastive Learning Self-Supervised Learning +1

Confidence Adaptive Anytime Pixel-Level Recognition

no code implementations 1 Apr 2021 Zhuang Liu, Trevor Darrell, Evan Shelhamer

We redesign the exits to account for the depth and spatial resolution of the features for each exit.

Image Classification Pose Estimation +1

Region Similarity Representation Learning

1 code implementation ICCV 2021 Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell

We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.

Instance Segmentation Object Detection +3

Self-Supervised Pretraining Improves Self-Supervised Pretraining

1 code implementation 23 Mar 2021 Colorado J. Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, Trevor Darrell

Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.

Image Augmentation

Monocular Quasi-Dense 3D Object Tracking

1 code implementation 12 Mar 2021 Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun

Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios.

3D Object Tracking Autonomous Driving +2

Instance-Aware Predictive Navigation in Multi-Agent Environments

1 code implementation 14 Jan 2021 Jinkun Cao, Xin Wang, Trevor Darrell, Fisher Yu

To decide the action at each step, we seek the action sequence that can lead to safe future states based on the prediction module outputs by repeatedly sampling likely action sequences.

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

1 code implementation ICLR 2021 Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.

Continuous Control

Novelty Detection with Rotated Contrastive Predictive Coding

no code implementations 1 Jan 2021 Dong Huk Park, Trevor Darrell

To this end, reconstruction-based learning is often used in which the normality of an observation is expressed in how well it can be reconstructed.

Contrastive Learning

Unconditional Synthesis of Complex Scenes Using a Semantic Bottleneck

no code implementations 1 Jan 2021 Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic

Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes.

Image Generation

Contrastive Video Textures

no code implementations 1 Jan 2021 Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A Efros, Trevor Darrell

By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.

Contrastive Learning Video Generation

Minimax Active Learning

no code implementations 18 Dec 2020 Sayna Ebrahimi, William Gan, Dian Chen, Giscard Biamby, Kamyar Salahi, Michael Laielli, Shizhan Zhu, Trevor Darrell

Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.

Active Learning Image Classification +1

Temporal Action Detection with Multi-level Supervision

no code implementations ICCV 2021 Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection

Modular Networks for Compositional Instruction Following

no code implementations NAACL 2021 Rodolfo Corona, Daniel Fried, Coline Devin, Dan Klein, Trevor Darrell

In our approach, subgoal modules each carry out natural language instructions for a specific subgoal type.

Auxiliary Task Reweighting for Minimum-data Learning

no code implementations NeurIPS 2020 Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.

Domain Adaptation Multi-Label Classification

Reducing Class Collapse in Metric Learning with Easy Positive Sampling

no code implementations 28 Sep 2020 Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e.g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning

ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation

no code implementations 7 Sep 2020 Sicheng Zhao, Yezhen Wang, Bo Li, Bichen Wu, Yang Gao, Pengfei Xu, Trevor Darrell, Kurt Keutzer

They require prior knowledge of real-world statistics and ignore the pixel-level dropout noise gap and the spatial feature gap between different domains.

Autonomous Driving Domain Adaptation +2

Hierarchical Style-based Networks for Motion Synthesis

no code implementations ECCV 2020 Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Xiaolong Wang, Trevor Darrell

Generating diverse and natural human motion is one of the long-standing goals for creating intelligent characters in the animated world.

motion synthesis

What Should Not Be Contrastive in Contrastive Learning

no code implementations ICLR 2021 Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell

Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations.

Contrastive Learning

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

no code implementations CVPR 2021 Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo

We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation.

3D Hand Pose Estimation

Video Prediction via Example Guidance

1 code implementation ICML 2020 Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell

In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.

Video Prediction

Compositional Video Synthesis with Action Graphs

1 code implementation 27 Jun 2020 Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

Our generative model for this task (AG2Vid) disentangles motion and appearance features, and by incorporating a scheduling mechanism for actions facilitates a timely and coordinated video generation.

Video Generation Video Prediction +1

Quasi-Dense Similarity Learning for Multiple Object Tracking

1 code implementation CVPR 2021 Jiangmiao Pang, Linlu Qiu, Xia Li, Haofeng Chen, Qi Li, Trevor Darrell, Fisher Yu

Compared to methods with similar detectors, it improves MOTA by almost 10 points and significantly decreases the number of ID switches on the BDD100K and Waymo datasets.

Contrastive Learning Metric Learning +3

Rethinking preventing class-collapsing in metric learning with margin-based losses

no code implementations ICCV 2021 Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, margin-based losses tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning

ParkPredict: Motion and Intent Prediction of Vehicles in Parking Lots

no code implementations 21 Apr 2020 Xu Shen, Ivo Batkovic, Vijay Govindarajan, Paolo Falcone, Trevor Darrell, Francesco Borrelli

We investigate the problem of predicting driver behavior in parking lots, an environment which is less structured than typical road networks and features complex, interactive maneuvers in a compact space.

Contrastive Examples for Addressing the Tyranny of the Majority

no code implementations 14 Apr 2020 Viktoriia Sharmanska, Lisa Anne Hendricks, Trevor Darrell, Novi Quadrianto

Computer vision algorithms, e.g. for face recognition, favour groups of individuals that are better represented in the training data.

Face Recognition

Spatio-Temporal Action Detection with Multi-Object Interaction

no code implementations 1 Apr 2020 Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".

Action Detection Human Detection

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations 31 Mar 2020 Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection Activity Detection +1

Adversarial Continual Learning

1 code implementation ECCV 2020 Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach

We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks.

Continual Learning Image Classification

Frustratingly Simple Few-Shot Object Detection

2 code implementations ICML 2020 Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu

Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods.

Few-Shot Object Detection Meta-Learning

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

3 code implementations ICCV 2021 Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, Xiaolong Wang

The edge between these two lines of work has so far been underexplored, and the effectiveness of meta-learning in few-shot learning remains unclear.

Few-Shot Learning General Classification

Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

1 code implementation 23 Dec 2019 Richard Li, Allan Jabri, Trevor Darrell, Pulkit Agrawal

Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements.

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation CVPR 2020 Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition

Compositional Plan Vectors

1 code implementation NeurIPS 2019 Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.

Imitation Learning

Semantic Bottleneck Scene Generation

2 code implementations 26 Nov 2019 Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic

For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts.

Conditional Image Generation Image-to-Image Translation +1

Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

no code implementations 30 Oct 2019 Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.

Imitation Learning

Exploring Simple and Transferable Recognition-Aware Image Processing

1 code implementation 21 Oct 2019 Zhuang Liu, Hung-Ju Wang, Tinghui Zhou, Zhiqiang Shen, Bingyi Kang, Evan Shelhamer, Trevor Darrell

Interestingly, the processing model's ability to enhance recognition quality can transfer when evaluated on models of different architectures, recognized categories, tasks and training datasets.

Image Retrieval Recommendation Systems

Regularization Matters in Policy Optimization

2 code implementations 21 Oct 2019 Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.

Continuous Control

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation

1 code implementation 17 Oct 2019 Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell

The agent is first presented with previous experiences in the training environment, along with a task description in the form of trajectory-level sparse rewards.

Continuous Control Zero-Shot Learning

Unsupervised Domain Adaptation through Self-Supervision

3 code implementations 26 Sep 2019 Yu Sun, Eric Tzeng, Trevor Darrell, Alexei A. Efros

This paper addresses unsupervised domain adaptation, the setting where labeled training data is available on a source domain, but the goal is to have good performance on a target domain with only unlabeled data.

Unsupervised Domain Adaptation

Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization

no code implementations 25 Sep 2019 Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell

In this paper, we propose Scoring-Aggregating-Planning (SAP), a framework that can learn task-agnostic semantics and dynamics priors from interactions of arbitrary quality, together with the corresponding sparse rewards, and then plan on unseen tasks in a zero-shot condition.

Blurring Structure and Learning to Optimize and Adapt Receptive Fields

no code implementations 25 Sep 2019 Evan Shelhamer, Dequan Wang, Trevor Darrell

Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.

Semantic Segmentation

Composable Semi-parametric Modelling for Long-range Motion Generation

no code implementations 25 Sep 2019 Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell

Learning diverse and natural behaviors is one of the longstanding goals for creating intelligent characters in the animated world.

Weakly-Supervised Trajectory Segmentation for Learning Reusable Skills

no code implementations 25 Sep 2019 Parsa Mahmoudieh, Trevor Darrell, Deepak Pathak

Instead of direct manual supervision which is tedious and prone to bias, in this work, our goal is to extract reusable skills from a collection of human demonstrations collected directly for several end-tasks.

Multiple Instance Learning

Dynamic Scale Inference by Entropy Minimization

no code implementations 8 Aug 2019 Dequan Wang, Evan Shelhamer, Bruno Olshausen, Trevor Darrell

Given the variety of the visual world there is not one true scale for recognition: objects may appear at drastically different sizes across the visual field.

Semantic Segmentation

Task-Aware Feature Generation for Zero-Shot Compositional Learning

1 code implementation 11 Jun 2019 Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez

In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts.

Zero-Shot Learning

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation

no code implementations ACL 2019 Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko

The actual grounding can connect language to the environment through multiple modalities, e.g. "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route.

Vision and Language Navigation

Monocular Plan View Networks for Autonomous Driving

no code implementations 16 May 2019 Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl, Trevor Darrell

Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth.

3D Object Detection Autonomous Driving

Language-Conditioned Graph Networks for Relational Reasoning

no code implementations ICCV 2019 Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko

E.g., conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can be easily consumed by a simple classifier for answer prediction.

Referring Expression Comprehension Relational Reasoning +1

Meta-Learning to Guide Segmentation

no code implementations ICLR 2019 Kate Rakelly*, Evan Shelhamer*, Trevor Darrell, Alexei A. Efros, Sergey Levine

To explore generalization, we analyze guidance as a bridge between different levels of supervision to segment classes as the union of instances.

Meta-Learning

Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields

no code implementations 25 Apr 2019 Evan Shelhamer, Dequan Wang, Trevor Darrell

Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.

Semantic Segmentation

Semi-supervised Domain Adaptation via Minimax Entropy

2 code implementations ICCV 2019 Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.

Domain Adaptation

TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning

1 code implementation CVPR 2019 Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez

We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning.

Few-Shot Learning Zero-Shot Learning

Variational Adversarial Active Learning

6 code implementations ICCV 2019 Samarth Sinha, Sayna Ebrahimi, Trevor Darrell

Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the task for which we are trying to acquire labeled data.

Active Learning Image Classification +1

Compositional GAN (Extended Abstract): Learning Image-Conditional Binary Composition

no code implementations ICLR Workshop DeepGenStruct 2019 Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

Generative Adversarial Networks (GANs) can produce images of surprising complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene.

Cross-Linked Variational Autoencoders for Generalized Zero-Shot Learning

no code implementations ICLR Workshop LLD 2019 Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

While following the same direction, we also take artificial feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by aligned variational autoencoders, for the purpose of generating latent features to train a softmax classifier.

Few-Shot Learning Generalized Zero-Shot Learning

Robust Change Captioning

1 code implementation ICCV 2019 Dong Huk Park, Trevor Darrell, Anna Rohrbach

We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.

Natural Language Visual Grounding

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations 25 Dec 2018 Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection Activity Detection

Hierarchical Discrete Distribution Decomposition for Match Density Estimation

2 code implementations CVPR 2019 Zhichao Yin, Trevor Darrell, Fisher Yu

Explicit representations of the global match distributions of pixel-wise correspondences between pairs of images are desirable for uncertainty estimation and downstream applications.

Density Estimation Optical Flow Estimation +2

Adversarial Inference for Multi-Sentence Video Description

1 code implementation CVPR 2019 Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

Image Captioning Video Description

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

2 code implementations 5 Dec 2018 Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space.

Few-Shot Learning Generalized Zero-Shot Learning

Few-shot Object Detection via Feature Reweighting

4 code implementations ICCV 2019 Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell

The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples.

Few-Shot Learning Few-Shot Object Detection +1

Spatio-Temporal Action Graph Networks

1 code implementation 4 Dec 2018 Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell

Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance.

Activity Recognition Autonomous Driving +2

SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection

no code implementations 3 Dec 2018 Eric Tzeng, Kaylee Burns, Kate Saenko, Trevor Darrell

Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment.

Domain Adaptation Semantic Segmentation

Disentangling Propagation and Generation for Video Prediction

no code implementations ICCV 2019 Hang Gao, Huazhe Xu, Qi-Zhi Cai, Ruth Wang, Fisher Yu, Trevor Darrell

A dynamic scene has two types of elements: those that move fluidly and can be predicted from previous frames, and those which are disoccluded (exposed) and cannot be extrapolated.

Predict Future Video Frames

Joint Monocular 3D Vehicle Detection and Tracking

1 code implementation ICCV 2019 Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.

3D Object Detection 3D Pose Estimation +4

Deep Object-Centric Policies for Autonomous Driving

no code implementations 13 Nov 2018 Dequan Wang, Coline Devin, Qi-Zhi Cai, Fisher Yu, Trevor Darrell

While learning visuomotor skills in an end-to-end manner is appealing, deep neural networks are often uninterpretable and fail in surprising ways.

Autonomous Driving

Discriminator Rejection Sampling

1 code implementation ICLR 2019 Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena

We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution.

Image Generation

Rethinking the Value of Network Pruning

2 code implementations ICLR 2019 Zhuang Liu, Ming-Jie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell

Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm.

Network Pruning Neural Architecture Search

Uncertainty-guided Lifelong Learning in Bayesian Networks

no code implementations 27 Sep 2018 Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach

Sequential learning of tasks arriving in a continuous stream is a complex problem and becomes more challenging when the model has a fixed capacity.

Continual Learning

Object Hallucination in Image Captioning

1 code implementation EMNLP 2018 Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene.

Image Captioning

Localizing Moments in Video with Temporal Language

1 code implementation EMNLP 2018 Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

To benchmark whether our model, and other recent video localization models, can effectively reason about temporal language, we collect the novel TEMPOral reasoning in video and language (TEMPO) dataset.

Video Understanding

Large-Scale Study of Curiosity-Driven Learning

4 code implementations ICLR 2019 Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros

However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent.

Atari Games SNES Games

Textual Explanations for Self-Driving Vehicles

1 code implementation ECCV 2018 Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata

Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments.

Grounding Visual Explanations

no code implementations ECCV 2018 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.

General Classification

Explainable Neural Computation via Stack Neural Module Networks

1 code implementation ECCV 2018 Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko

In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.

Decision Making Question Answering +1

Compositional GAN: Learning Image-Conditional Binary Composition

1 code implementation 19 Jul 2018 Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene.

Generating Counterfactual Explanations with Natural Language

no code implementations 26 Jun 2018 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

We call such textual explanations counterfactual explanations, and propose an intuitive method to generate counterfactual explanations by inspecting which evidence in an input is missing, but might contribute to a different classification decision if present in the image.

Fine-Grained Image Classification General Classification

Learning Instance Segmentation by Interaction

1 code implementation 21 Jun 2018 Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.

Instance Segmentation Semantic Segmentation

Speaker-Follower Models for Vision-and-Language Navigation

1 code implementation NeurIPS 2018 Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell

We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.

Data Augmentation Vision and Language Navigation

Deep Mixture of Experts via Shallow Embedding

no code implementations 5 Jun 2018 Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez

Larger networks generally have greater representational power at the cost of increased computational complexity.

Few-Shot Learning Zero-Shot Learning

Few-Shot Segmentation Propagation with Guided Networks

1 code implementation 25 May 2018 Kate Rakelly, Evan Shelhamer, Trevor Darrell, Alexei A. Efros, Sergey Levine

Learning-based methods for visual segmentation have made progress on particular types of segmentation tasks, but are limited by the necessary supervision, the narrow definitions of fixed tasks, and the lack of control during inference for correcting errors.

Interactive Segmentation Semantic Segmentation +2

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

2 code implementations CVPR 2020 Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, Trevor Darrell

Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving.

Autonomous Driving Domain Adaptation +7

Women also Snowboard: Overcoming Bias in Captioning Models

1 code implementation ECCV 2018 Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach

We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present.

Image Captioning

Reinforcement Learning from Imperfect Demonstrations

no code implementations ICLR 2018 Yang Gao, Huazhe Xu, Ji Lin, Fisher Yu, Sergey Levine, Trevor Darrell

We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data.

Recasting Gradient-Based Meta-Learning as Hierarchical Bayes

no code implementations ICLR 2018 Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, Thomas Griffiths

Meta-learning allows an intelligent agent to leverage prior learning episodes as a basis for quickly improving performance on a novel task.

Meta-Learning

Multi-Content GAN for Few-Shot Font Style Transfer

6 code implementations CVPR 2018 Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, Trevor Darrell

In this work, we focus on the challenge of taking partial observations of highly-stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface.

Font Style Transfer

Learning to Segment Every Thing

3 code implementations CVPR 2018 Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick

Most methods for object instance segmentation require all training examples to be labeled with segmentation masks.

Instance Segmentation Semantic Segmentation

SkipNet: Learning Dynamic Routing in Convolutional Networks

2 code implementations ECCV 2018 Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, Joseph E. Gonzalez

While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient.

Decision Making

Grounding Visual Explanations (Extended Abstract)

no code implementations 17 Nov 2017 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image.

Gradient-free Policy Architecture Search and Adaptation

no code implementations 16 Oct 2017 Sayna Ebrahimi, Anna Rohrbach, Trevor Darrell

We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks.

Autonomous Driving Neural Architecture Search

Fooling Vision and Language Models Despite Localization and Attention Mechanism

no code implementations CVPR 2018 Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, Dawn Song

Our work sheds new light on understanding adversarial attacks on vision systems which have a language component and shows that attention, bounding box localization, and compositional internal structures are vulnerable to adversarial attacks.

Natural Language Understanding Question Answering +1

Deep Object-Centric Representations for Generalizable Robot Learning

1 code implementation 14 Aug 2017 Coline Devin, Pieter Abbeel, Trevor Darrell, Sergey Levine

We devise an object-level attentional mechanism that can be used to determine relevant objects from a few trajectories or demonstrations, and then immediately incorporate those objects into a learned policy.

Localizing Moments in Video with Natural Language

2 code implementations ICCV 2017 Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

A key obstacle to training our MCN model is that current video datasets do not include pairs of localized video segments and referring expressions, or text descriptions which uniquely identify a corresponding moment.

Deep Layer Aggregation

6 code implementations CVPR 2018 Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell

We augment standard architectures with deeper aggregation to better fuse information across layers.

Generalized orderless pooling performs implicit salient matching

2 code implementations ICCV 2017 Marcel Simon, Yang Gao, Trevor Darrell, Joachim Denzler, Erik Rodner

In this paper, we generalize average and bilinear pooling to "alpha-pooling", allowing for learning the pooling strategy during training.

Learning to Reason: End-to-End Module Networks for Visual Question Answering

1 code implementation ICCV 2017 Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

Visual Dialog Visual Question Answering

Learning Detection with Diverse Proposals

1 code implementation CVPR 2017 Samaneh Azadi, Jiashi Feng, Trevor Darrell

To predict a set of diverse and informative proposals with enriched representations, this paper introduces a differentiable Determinantal Point Process (DPP) layer that is able to augment the object detection architectures.

Object Detection

Adversarial Discriminative Domain Adaptation

17 code implementations CVPR 2017 Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell

Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains.

General Classification Unsupervised Domain Adaptation +1

Visual Discovery at Pinterest

no code implementations 15 Feb 2017 Andrew Zhai, Dmitry Kislyuk, Yushi Jing, Michael Feng, Eric Tzeng, Jeff Donahue, Yue Li Du, Trevor Darrell

Over the past three years Pinterest has experimented with several visual search and recommendation services, including Related Pins (2014), Similar Looks (2015), Flashlight (2016) and Lens (2017).

Object Detection

Learning Features by Watching Objects Move

1 code implementation CVPR 2017 Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature.

Object Detection Transfer Learning

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

no code implementations 14 Dec 2016 Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions.

Decision Making Question Answering +1

FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

4 code implementations 8 Dec 2016 Judy Hoffman, Dequan Wang, Fisher Yu, Trevor Darrell

In this paper, we introduce the first domain adaptive semantic segmentation method, proposing an unsupervised adversarial approach to pixel prediction problems.

Semantic Segmentation Synthetic-to-Real Translation

End-to-end Learning of Driving Models from Large-scale Video Datasets

2 code implementations CVPR 2017 Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell

Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or a simulation environment.

Scene Segmentation

Modeling Relationships in Referential Expressions with Compositional Modular Networks

2 code implementations CVPR 2017 Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko

In this paper we instead present a modular deep architecture capable of analyzing referential expressions into their component parts, identifying entities and relationships mentioned in the input expression and grounding them all in the scene.

Visual Question Answering

Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer

no code implementations 22 Sep 2016 Coline Devin, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Sergey Levine

Using deep reinforcement learning to train general purpose neural network policies alleviates some of the burden of manual representation engineering by using expressive policy classes, but exacerbates the challenge of data collection, since such methods tend to be less efficient than RL with low-dimensional, hand-designed representations.

Transfer Learning

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions

no code implementations 30 Aug 2016 Ronghang Hu, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell

Image segmentation from referring expressions is a joint vision and language modeling task, where the input is an image and a textual expression describing a particular region in the image; and the goal is to localize and segment the specific image region based on the given expression.

Image Captioning Language Modelling +1

Clockwork Convnets for Video Semantic Segmentation

1 code implementation 11 Aug 2016 Evan Shelhamer, Kate Rakelly, Judy Hoffman, Trevor Darrell

Recent years have seen tremendous progress in still-image segmentation; however the naïve application of these state-of-the-art algorithms to every video frame requires considerable computation and ignores the temporal continuity inherent in video.

Semantic Segmentation Video Recognition +1

Captioning Images with Diverse Objects

1 code implementation CVPR 2017 Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko

We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets.

Object Recognition

Learning With Side Information Through Modality Hallucination

no code implementations CVPR 2016 Judy Hoffman, Saurabh Gupta, Trevor Darrell

Thus, our method transfers information commonly extracted from depth training data to a network which can extract that information from the RGB counterpart.

Object Detection

Adversarial Feature Learning

10 code implementations 31 May 2016 Jeff Donahue, Philipp Krähenbühl, Trevor Darrell

The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution.

Context Encoders: Feature Learning by Inpainting

11 code implementations CVPR 2016 Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros

In order to succeed at this task, context encoders need to both understand the content of the entire image, as well as produce a plausible hypothesis for the missing part(s).

Generating Visual Explanations

no code implementations 28 Mar 2016 Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell

Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself.

General Classification

Segmentation from Natural Language Expressions

3 code implementations 20 Mar 2016 Ronghang Hu, Marcus Rohrbach, Trevor Darrell

To produce pixelwise segmentation for the language expression, we propose an end-to-end trainable recurrent and convolutional network model that jointly learns to process visual and linguistic information.

Referring Expression Segmentation Semantic Segmentation