Search Results for author: Trevor Darrell

Found 310 papers, 170 papers with code

Exposing the Limits of Video-Text Models through Contrast Sets

1 code implementation • NAACL 2022 • Jae Sung Park, Sheng Shen, Ali Farhadi, Trevor Darrell, Yejin Choi, Anna Rohrbach

We test the robustness of recent methods on the proposed automatic contrast sets, and compare them to additionally collected human-generated counterparts, to assess their effectiveness.

Language Modelling, Multiple-choice (+2 more)

EgoPet: Egomotion and Interaction Data from an Animal's Perspective

no code implementations • 15 Apr 2024 • Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell

Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems.

Finding Visual Task Vectors

1 code implementation • 8 Apr 2024 • Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar

In this work, we analyze the activations of MAE-VQGAN, a recent Visual Prompting model, and find task vectors, activations that encode task-specific information.

Visual Prompting

ALOHa: A New Measure for Hallucination in Captioning Models

no code implementations • 3 Apr 2024 • Suzanne Petryk, David M. Chan, Anish Kachinthaya, Haodi Zou, John Canny, Joseph E. Gonzalez, Trevor Darrell

Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene.

Hallucination, Object (+2 more)

TraveLER: A Multi-LMM Agent Framework for Video Question-Answering

no code implementations • 1 Apr 2024 • Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig

Specifically, we propose TraveLER, a model that can create a plan to "Traverse" through the video, ask questions about individual frames to "Locate" and store key information, and then "Evaluate" if there is enough information to answer the question.

Question Answering, Video Question Answering

When Do We Not Need Larger Vision Models?

1 code implementation • 19 Mar 2024 • Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell

Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S² can match or even exceed the advantage of larger models.

Depth Estimation

xT: Nested Tokenization for Larger Context in Large Images

1 code implementation • 4 Mar 2024 • Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam

Modern computer vision pipelines handle large images in one of two sub-optimal ways: down-sampling or cropping.

Humanoid Locomotion as Next Token Prediction

no code implementations • 29 Feb 2024 • Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik

We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language.

Humanoid Control

Neural Network Diffusion

1 code implementation • 20 Feb 2024 • Kai Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You

The autoencoder extracts latent representations of a subset of the trained network parameters.

InstanceDiffusion: Instance-level Control for Image Generation

1 code implementation • 5 Feb 2024 • Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra

Text-to-image diffusion models produce high quality images but do not offer control over individual instances in the image.

Ranked #3 on Conditional Text-to-Image Synthesis on COCO-MIG

Conditional Text-to-Image Synthesis, Image Generation (+2 more)

Rethinking Patch Dependence for Masked Autoencoders

1 code implementation • 25 Jan 2024 • Letian Fu, Long Lian, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg

In this work, we re-examine inter-patch dependencies in the decoding mechanism of masked autoencoders (MAE).

Instance Segmentation, Representation Learning (+1 more)

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

1 code implementation • 3 Jan 2024 • Evonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard

We present a framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction.

Quantization

Unsupervised Universal Image Segmentation

1 code implementation • 28 Dec 2023 • Dantong Niu, Xudong Wang, Xinyang Han, Long Lian, Roei Herzig, Trevor Darrell

Several unsupervised image segmentation approaches have been proposed which eliminate the need for dense manually-annotated segmentation masks; current models separately handle either semantic segmentation (e.g., STEGO) or class-agnostic instance segmentation (e.g., CutLER), but not both (i.e., panoptic segmentation).

Ranked #1 on Unsupervised Panoptic Segmentation on COCO val2017

Image Segmentation, Instance Segmentation (+7 more)

See, Say, and Segment: Teaching LMMs to Overcome False Premises

no code implementations • 13 Dec 2023 • Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell

Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-vocabulary language grounding and segmentation but can suffer under false premises when queries imply the existence of something that is not actually present in the image.

Describing Differences in Image Sets with Natural Language

1 code implementation • 5 Dec 2023 • Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

To aid in this discovery process, we explore the task of automatically describing the differences between two sets of images, which we term Set Difference Captioning.

Language Modelling

Recursive Visual Programming

no code implementations • 4 Dec 2023 • Jiaxin Ge, Sanjay Subramanian, Baifeng Shi, Roei Herzig, Trevor Darrell

Visual Programming (VP) has emerged as a powerful framework for Visual Question Answering (VQA).

Code Generation, Question Answering (+1 more)

Readout Guidance: Learning Control from Diffusion Features

no code implementations • 4 Dec 2023 • Grace Luo, Trevor Darrell, Oliver Wang, Dan B Goldman, Aleksander Holynski

We present Readout Guidance, a method for controlling text-to-image diffusion models with learned signals.

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

no code implementations • 4 Dec 2023 • Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang

Given a textual description of a visual task (e.g., "Left: input image, Right: foreground segmentation"), a few input-output visual examples, or both, the model in-context learns to solve it for a new test input.

Colorization, Foreground Segmentation (+3 more)

Sequential Modeling Enables Scalable Learning for Large Vision Models

1 code implementation • 1 Dec 2023 • Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.

Initializing Models with Larger Ones

1 code implementation • 30 Nov 2023 • Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu

Weight selection offers a new approach to leverage the power of pretrained models in resource-constrained settings, and we hope it can be a useful tool for training small models in the large-model era.

Knowledge Distillation

Object-based (yet Class-agnostic) Video Domain Adaptation

no code implementations • 29 Nov 2023 • Dantong Niu, Amir Bar, Roei Herzig, Trevor Darrell, Anna Rohrbach

Existing video-based action recognition systems typically require dense annotation and struggle in environments when there is significant distribution shift relative to the training data.

Action Recognition, Domain Adaptation (+1 more)

Compositional Chain-of-Thought Prompting for Large Multimodal Models

1 code implementation • 27 Nov 2023 • Chancharik Mitra, Brandon Huang, Trevor Darrell, Roei Herzig

The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks.

Ranked #30 on Visual Reasoning on Winoground

Language Modelling, Large Language Model (+1 more)

Self-correcting LLM-controlled Diffusion Models

no code implementations • 27 Nov 2023 • Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell

Steered by an LLM controller, SLD turns text-to-image generation into an iterative closed-loop process, ensuring correctness in the resulting image.

Attribute, Text-to-Image Generation

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

no code implementations • 21 Nov 2023 • Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li

Addressing the challenge of adapting pre-trained vision-language models for generating insightful explanations for visual reasoning tasks with limited annotations, we present ReVisE, a Recursive Visual Explanation algorithm.

Explanation Generation, Visual Question Answering (VQA) (+1 more)

Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding

2 code implementations • 12 Nov 2023 • Chancharik Mitra, Abrar Anwar, Rodolfo Corona, Dan Klein, Trevor Darrell, Jesse Thomason

When connecting objects and their language referents in an embodied 3D environment, it is important to note that: (1) an object can be better characterized by leveraging comparative information between itself and other objects, and (2) an object's appearance can vary with camera position.

Object, Position

A Coefficient Makes SVRG Effective

1 code implementation • 9 Nov 2023 • Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu

Stochastic Variance Reduced Gradient (SVRG), introduced by Johnson & Zhang (2013), is a theoretically compelling optimization method.

Image Classification

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

no code implementations • 2 Nov 2023 • Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell

Our benchmark results show that many models are vulnerable to the attack strategies in the Tensor Trust dataset.

Instruction Following

CLAIR: Evaluating Image Captions with Large Language Models

no code implementations • 19 Oct 2023 • David Chan, Suzanne Petryk, Joseph E. Gonzalez, Trevor Darrell, John Canny

The evaluation of machine-generated image captions poses an interesting yet persistent challenge.

Image Captioning, Language Modelling (+1 more)

LLM-grounded Video Diffusion Models

no code implementations • 29 Sep 2023 • Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li

We show that LLMs are able to understand complex spatiotemporal dynamics from text alone and generate layouts that align closely with both the prompts and the object motion patterns typically observed in the real world.

Language Modelling, Large Language Model (+1 more)

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations • 25 Sep 2023 • Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination, Image Captioning (+1 more)

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

1 code implementation • 28 Aug 2023 • Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions.

Instance Segmentation, Optical Flow Estimation (+5 more)

Can Language Models Learn to Listen?

no code implementations • ICCV 2023 • Evonne Ng, Sanjay Subramanian, Dan Klein, Angjoo Kanazawa, Trevor Darrell, Shiry Ginosar

We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.

Language Modelling, Large Language Model

Stochastic positional embeddings improve masked image modeling

no code implementations • 31 Jul 2023 • Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun

Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images.

Language Modelling, Masked Language Modeling (+3 more)

Hierarchical Open-vocabulary Universal Image Segmentation

1 code implementation • NeurIPS 2023 • Xudong Wang, Shufan Li, Konstantinos Kallidromitis, Yusuke Kato, Kazuki Kozuka, Trevor Darrell

Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions.

Ranked #1 on Image Segmentation on Pascal Panoptic Parts

Image Comprehension, Image Segmentation (+8 more)

Hyperbolic Active Learning for Semantic Segmentation under Domain Shift

no code implementations • 19 Jun 2023 • Luca Franco, Paolo Mandica, Konstantinos Kallidromitis, Devin Guillory, Yu-Teng Li, Trevor Darrell, Fabio Galasso

In HALO (Hyperbolic Active Learning Optimization), for the first time, we propose the use of epistemic uncertainty as a data acquisition strategy, following the intuition of selecting data points that are the least known.

Active Learning, Domain Adaptation (+2 more)

Robot Learning with Sensorimotor Pre-training

no code implementations • 16 Jun 2023 • Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, Jitendra Malik

We present a self-supervised sensorimotor pre-training approach for robotics.

Motion Planning

Neural Relighting with Subsurface Scattering by Learning the Radiance Transfer Gradient

no code implementations • 15 Jun 2023 • Shizhan Zhu, Shunsuke Saito, Aljaz Bozic, Carlos Aliaga, Trevor Darrell, Christoph Lassner

Reconstructing and relighting objects and scenes under varying lighting conditions is challenging: existing neural rendering methods often cannot handle the complex interactions between materials and light.

Neural Rendering

Modular Visual Question Answering via Code Generation

1 code implementation • 8 Jun 2023 • Sanjay Subramanian, Medhini Narasimhan, Kushal Khangaonkar, Kevin Yang, Arsha Nagrani, Cordelia Schmid, Andy Zeng, Trevor Darrell, Dan Klein

We present a framework that formulates visual question answering as modular code generation.

Code Generation, In-Context Learning (+2 more)

Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

1 code implementation • NeurIPS 2023 • Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell

As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data.

Domain Generalization, Image Augmentation

Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models

no code implementations • 24 May 2023 • Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou

Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnable parameters to Large Language Models (LLMs) without increasing inference cost.

Zero-shot Generalization

TOAST: Transfer Learning via Attention Steering

1 code implementation • 24 May 2023 • Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang

We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, selects task-relevant features in the output, and feeds those features back to the model to steer the attention to the task-specific features.

Fine-Grained Image Classification, Instruction Following (+2 more)

Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

no code implementations • NeurIPS 2023 • Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holynski, Trevor Darrell

We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks.

Semantic correspondence

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

1 code implementation • 23 May 2023 • Long Lian, Boyi Li, Adam Yala, Trevor Darrell

Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images according to prompts that require various capabilities, doubling the generation accuracy across four tasks on average.

Common Sense Reasoning, Language Modelling (+2 more)

Simple Token-Level Confidence Improves Caption Correctness

no code implementations • 11 May 2023 • Suzanne Petryk, Spencer Whitehead, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach

The ability to judge whether a caption correctly describes an image is a critical part of vision-language understanding.

Ranked #62 on Visual Reasoning on Winoground

Hallucination, Image Captioning (+2 more)

Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs

no code implementations • 10 May 2023 • Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson

For the visual side, we incorporate a special "SG Component" in the image transformer trained to predict SG information, while for the textual side, we utilize SGs to generate fine-grained captions that highlight different compositional aspects of the scene.

Ranked #24 on Visual Reasoning on Winoground

Scene Understanding, Visual Reasoning

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

1 code implementation • 30 Mar 2023 • Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

We propose PAIR Diffusion, a generic framework that can enable a diffusion model to control the structure and appearance properties of each object in the image.

Object

Top-Down Visual Attention from Analysis by Synthesis

1 code implementation • CVPR 2023 • Baifeng Shi, Trevor Darrell, Xin Wang

In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.

Retrieval, Semantic Segmentation (+1 more)

Learning and Verification of Task Structure in Instructional Videos

no code implementations • 23 Mar 2023 • Medhini Narasimhan, Licheng Yu, Sean Bell, Ning Zhang, Trevor Darrell

We introduce a new pre-trained video model, VideoTaskformer, focused on representing the semantics and structure of instructional videos.

Activity Recognition

Scaling Vision-Language Models with Sparse Mixture of Experts

no code implementations • 13 Mar 2023 • Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He

The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs).

Real-World Humanoid Locomotion with Reinforcement Learning

no code implementations • 6 Mar 2023 • Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist the elderly at home, and colonize new planets.

Reinforcement Learning

Dropout Reduces Underfitting

1 code implementation • 2 Mar 2023 • Zhuang Liu, Zhiqiu Xu, Joseph Jin, Zhiqiang Shen, Trevor Darrell

Additionally, we explore a symmetric technique for regularizing overfitting models: late dropout, where dropout is not used in the early iterations and is only activated later in training.
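
A minimal Python sketch of the late-dropout schedule described above. It assumes a single switch point: the rate is 0 before a hypothetical `start_step` and a fixed rate `p` afterwards. The names `late_dropout_rate`, `start_step`, and `dropout` are illustrative and are not the authors' code. The paper's actual switch point and rate are not given in this listing.

```python
import random
from typing import List, Sequence


def late_dropout_rate(step: int, start_step: int, p: float) -> float:
    """Dropout probability at `step` under a late-dropout schedule.

    Hypothetical helper: dropout is off (rate 0.0) during the early
    iterations and switched on at `start_step` with rate `p`.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 0.0 if step < start_step else p


def dropout(xs: Sequence[float], p: float, rng: random.Random) -> List[float]:
    """Standard inverted dropout: zero each value with probability p and
    rescale survivors by 1 / (1 - p) so the expected value is unchanged."""
    if p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    activations = [1.0, 2.0, 3.0, 4.0]
    for step in (0, 50, 100, 150):
        rate = late_dropout_rate(step, start_step=100, p=0.1)
        print(step, rate, dropout(activations, rate, rng))
```

Under the same assumptions, the symmetric schedule (dropout only in the early iterations) would simply invert the comparison in `late_dropout_rate`.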

Guiding Pretraining in Reinforcement Learning with Large Language Models

1 code implementation • 13 Feb 2023 • Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function.

Common Sense Reasoning, Language Modelling (+2 more)

Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption

no code implementations • CVPR 2023 • Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang

We update the target data instead, and project all test inputs toward the source domain with a generative diffusion model.

Test-time Adaptation

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

no code implementations • ICCV 2023 • Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor Darrell

Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales.

Representation Learning

Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

no code implementations • 20 Dec 2022 • Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein

Are multimodal inputs necessary for grammar induction?

Constituency Parsing

PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data

no code implementations • 8 Dec 2022 • Roei Herzig, Ofir Abramovich, Elad Ben-Avraham, Assaf Arbelle, Leonid Karlinsky, Ariel Shamir, Trevor Darrell, Amir Globerson

In this work, we propose an approach to leverage synthetic scene data for improving video understanding.

Action Recognition, Video Understanding

Shape-Guided Diffusion with Inside-Outside Attention

no code implementations • 1 Dec 2022 • Dong Huk Park, Grace Luo, Clayton Toste, Samaneh Azadi, Xihui Liu, Maka Karalashvili, Anna Rohrbach, Trevor Darrell

We introduce precise object silhouette as a new form of user control in text-to-image diffusion models, which we dub Shape-Guided Diffusion.

Object

G^3: Geolocation via Guidebook Grounding

1 code implementation • 28 Nov 2022 • Grace Luo, Giscard Biamby, Trevor Darrell, Daniel Fried, Anna Rohrbach

We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations and an associated textual guidebook for GeoGuessr, a popular interactive geolocation game.

Multitask Vision-Language Prompt Tuning

1 code implementation • 21 Nov 2022 • Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell

Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show many target tasks can benefit each other from sharing prompt vectors and thus can be jointly learned via multitask prompt tuning.

Visual Prompt Tuning

Using Language to Extend to Unseen Domains

1 code implementation • 18 Oct 2022 • Lisa Dunlap, Clara Mohri, Devin Guillory, Han Zhang, Trevor Darrell, Joseph E. Gonzalez, Aditi Raghunathan, Anna Rohrbach

It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.

Domain Adaptation

QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking

2 code implementations • 12 Oct 2022 • Tobias Fischer, Thomas E. Huang, Jiangmiao Pang, Linlu Qiu, Haofeng Chen, Trevor Darrell, Fisher Yu

In this paper, we present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning.

Ranked #4 on Multiple Object Tracking on BDD100K test

Contrastive Learning, Multiple Object Tracking (+1 more)

Real-World Robot Learning with Masked Visual Pre-training

1 code implementation • 6 Oct 2022 • Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.

Decentralized Vehicle Coordination: The Berkeley DeepDrive Drone Dataset

1 code implementation • 19 Sep 2022 • Fangyu Wu, Dequan Wang, Minjune Hwang, Chenhui Hao, Jiawei Lu, Jiamu Zhang, Christopher Chou, Trevor Darrell, Alexandre Bayen

Decentralized multiagent planning has been an important field of research in robotics.

Common Sense Reasoning, Navigate

Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers

no code implementations • 7 Sep 2022 • Kevin Miao, Akash Gokul, Raghav Singh, Suzanne Petryk, Joseph Gonzalez, Kurt Keutzer, Trevor Darrell, Colorado Reed

SPAN operates by regularizing attention masks from separate transformer heads to follow various priors over semantic regions.

Heart Segmentation, Representation Learning

Studying Bias in GANs through the Lens of Race

no code implementations • 6 Sep 2022 • Vongani H. Maluleke, Neerja Thakkar, Tim Brooks, Ethan Weber, Trevor Darrell, Alexei A. Efros, Angjoo Kanazawa, Devin Guillory

In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of their training datasets.

Visual Prompting via Image Inpainting

1 code implementation • 1 Sep 2022 • Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros

How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification?

Ranked #5 on Personalized Segmentation on PerSeg

Colorization, Edge Detection (+6 more)

Refine and Represent: Region-to-Object Representation Learning

1 code implementation • 25 Aug 2022 • Akash Gokul, Konstantinos Kallidromitis, Shufan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed

Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives.

Object, Representation Learning (+4 more)

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency

no code implementations • 14 Aug 2022 • Medhini Narasimhan, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid

In this work, we focus on summarizing instructional videos, an under-explored area of video summarization.

Video Summarization

Back to the Source: Diffusion-Driven Test-Time Adaptation

1 code implementation • 7 Jul 2022 • Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang

We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model.

Test-time Adaptation

Disentangled Action Recognition with Knowledge Bases

no code implementations • NAACL 2022 • Zhekun Luo, Shalini Ghosh, Devin Guillory, Keizo Kato, Trevor Darrell, Huijuan Xu

In this paper, we aim to improve the generalization ability of the compositional action recognition model to novel verbs or novel nouns that are unseen during training time, by leveraging the power of knowledge graphs.

Action Recognition, Knowledge Graphs

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

no code implementations • 15 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos.

Point-of-no-return (PNR) temporal localization, Temporal Localization

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

no code implementations • 13 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges.

Action Recognition, Video Understanding

Voxel-informed Language Grounding

2 code implementations • ACL 2022 • Rodolfo Corona, Shizhan Zhu, Dan Klein, Trevor Darrell

Natural language applied to natural 2D images describes a fundamentally 3D world.

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

1 code implementation • 28 Apr 2022 • Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach

We first enable abstention capabilities for several VQA models, and analyze both their coverage (the portion of questions answered) and risk (the error on that portion).

Question Answering, Visual Question Answering
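
A small Python sketch of the two metrics named above. It assumes abstention is decided by thresholding a per-question confidence score, which is a common selective-prediction setup; the paper's own abstention mechanisms may differ. `coverage_and_risk` and its arguments are hypothetical names, and defining risk as 0.0 when nothing is answered is an assumed convention.

```python
from typing import Sequence, Tuple


def coverage_and_risk(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Coverage = fraction of questions answered; risk = error rate on
    the answered questions. A question is answered when its confidence
    is at least `threshold`; otherwise the model abstains."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    answered = [c >= threshold for c in confidences]
    n_total = len(answered)
    n_answered = sum(answered)
    coverage = n_answered / n_total if n_total else 0.0
    # Assumed convention: risk is 0.0 when every question is abstained on.
    if n_answered == 0:
        return coverage, 0.0
    n_wrong = sum(1 for a, ok in zip(answered, correct) if a and not ok)
    return coverage, n_wrong / n_answered


cov, risk = coverage_and_risk([0.9, 0.8, 0.3, 0.6], [True, False, True, True], 0.5)
print(cov, risk)  # 0.75 0.3333333333333333
```

Raising the threshold lowers coverage. If the confidence score is informative, it also tends to lower risk.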

Visual Attention Emerges from Recurrent Sparse Reconstruction

1 code implementation • 23 Apr 2022 • Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang

We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity.

Contrastive Test-Time Adaptation

1 code implementation • CVPR 2022 • Dian Chen, Dequan Wang, Trevor Darrell, Sayna Ebrahimi

We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels.

Contrastive Learning, Test-time Adaptation (+1 more)

K-LITE: Learning Transferable Visual Models with External Knowledge

2 code implementations • 20 Apr 2022 • Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.

Benchmarking Descriptive +4

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion

no code implementations • CVPR 2022 • Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar

We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion.

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

2 code implementations • ACL 2022 • Sanjay Subramanian, William Merrill, Trevor Darrell, Matt Gardner, Sameer Singh, Anna Rohrbach

Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.

Image Classification Referring Expression +1

Teachable Reinforcement Learning via Advice Distillation

1 code implementation • NeurIPS 2021 • Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta

Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and extracts little information from each human intervention.

Imitation Learning reinforcement-learning +1

Masked Visual Pre-training for Motor Control

1 code implementation • 11 Mar 2022 • Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

On Guiding Visual Attention with Language Specification

no code implementations • CVPR 2022 • Suzanne Petryk, Lisa Dunlap, Keyan Nasseri, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach

To do this, we ground task-relevant words or phrases with attention maps from a pretrained large-scale model.

Classification Fairness

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

1 code implementation • 29 Jan 2022 • Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.

counterfactual Decision Making +2

A ConvNet for the 2020s

45 code implementations • CVPR 2022 • Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Ranked #1 on Classification on InDL

Classification Domain Generalization +3

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion

1 code implementation • 21 Dec 2021 • Shruti Agarwal, Liwen Hu, Evonne Ng, Trevor Darrell, Hao Li, Anna Rohrbach

In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques.

Misinformation

Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation

1 code implementation • NAACL 2022 • Giscard Biamby, Grace Luo, Trevor Darrell, Anna Rohrbach

Detecting out-of-context media, such as "mis-captioned" images on Twitter, is a relevant problem, especially in domains of high public significance.

Misinformation

More Control for Free! Image Synthesis with Semantic Diffusion Guidance

no code implementations • 10 Dec 2021 • Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.

Continuous Control Denoising +1

Learning to Detect Every Thing in an Open World

no code implementations • 3 Dec 2021 • Kuniaki Saito, Ping Hu, Trevor Darrell, Kate Saenko

LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO, as well as cross-dataset evaluation on UVO and Cityscapes.

Data Augmentation object-detection +3

Discovering Non-monotonic Autoregressive Orderings with Variational Inference

1 code implementation • 27 Oct 2021 • Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao

Permutations then serve as target generation orders for training an insertion-based Transformer language model.

Image Captioning Language Modelling +3

Object-Region Video Transformers

1 code implementation • CVPR 2022 • Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.

Ranked #6 on Action Recognition on Diving-48

Action Detection Few-Shot action recognition +3

Differentiable Gradient Sampling for Learning Implicit 3D Scene Reconstructions from a Single Image

no code implementations • ICLR 2022 • Shizhan Zhu, Sayna Ebrahimi, Angjoo Kanazawa, Trevor Darrell

Existing approaches for single object reconstruction impose supervision signals based on the loss of the signed distance value from all locations in a scene, posing difficulties when extending to real-world scenarios.

Indoor Scene Reconstruction Object Reconstruction +1

Pyramid Mini-Batching for Optimal Transport

no code implementations • 29 Sep 2021 • Devin Guillory, Kuniaki Saito, Eric Tzeng, Yannik Pitcan, Kate Saenko, Trevor Darrell

Optimal transport theory provides a useful tool to measure the differences between two distributions.

Domain Adaptation
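
As a small illustration of what an optimal-transport distance measures, the sketch below computes the Wasserstein-1 distance between two one-dimensional empirical distributions with equal sample counts, where the optimal plan simply matches sorted samples. This textbook special case is chosen for clarity; it is not the pyramid mini-batching scheme the paper proposes.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    In one dimension, the optimal transport plan pairs the i-th smallest
    value of xs with the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Each matched pair differs by exactly 1, so every unit of mass moves by 1
print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))  # 1.0
```

In higher dimensions no such sorting shortcut exists, which is part of why computing OT on full datasets is expensive and mini-batch approximations are used at all.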

Zero-Shot Reward Specification via Grounded Natural Language

no code implementations • 29 Sep 2021 • Parsa Mahmoudieh, Sayna Ebrahimi, Deepak Pathak, Trevor Darrell

Reward signals in reinforcement learning can be expensive to obtain in many tasks and often require direct access to the environment's state.

Reinforcement Learning (RL)

On-target Adaptation

1 code implementation • 2 Sep 2021 • Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain.

Domain Adaptation

Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density

2 code implementations • ICCV 2021 • Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, Kate Saenko

Unsupervised domain adaptation (UDA) methods can dramatically improve generalization on unlabeled target domains.

Image Classification Semantic Segmentation +1

Region-level Active Detector Learning

no code implementations • 20 Aug 2021 • Michael Laielli, Giscard Biamby, Dian Chen, Ritwik Gupta, Adam Loeffler, Phat Dat Nguyen, Ross Luo, Trevor Darrell, Sayna Ebrahimi

Active learning for object detection is conventionally achieved by applying techniques developed for classification in a way that aggregates individual detections into image-level selection criteria.

Active Learning Object +2

Predicting with Confidence on Unseen Distributions

no code implementations • ICCV 2021 • Devin Guillory, Vaishaal Shankar, Sayna Ebrahimi, Trevor Darrell, Ludwig Schmidt

Our work connects techniques from domain adaptation and predictive uncertainty literature, and allows us to predict model accuracy on challenging unseen distributions without access to labeled data.

Domain Adaptation

CLIP-It! Language-Guided Video Summarization

1 code implementation • NeurIPS 2021 • Medhini Narasimhan, Anna Rohrbach, Trevor Darrell

A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes.

Query-focused Summarization Video Summarization

Early Convolutions Help Transformers See Better

1 code implementation • NeurIPS 2021 • Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.
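
The shape arithmetic behind the two stems can be checked with the standard convolution output-size formula. The sketch assumes a 224×224 input, a 16×16 patchify stem, and four stride-two 3×3 convolutions with padding 1; these values are illustrative assumptions, not details taken from this listing.

```python
def conv_out(n, kernel, stride, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def patchify_stem(n, patch=16):
    # ViT "patchify" stem: a single conv whose kernel size equals its stride
    return conv_out(n, kernel=patch, stride=patch)


def conv_stem(n, num_layers=4):
    # Stacked stride-two 3x3 convs; padding=1 (assumed) halves n at each layer
    sizes = [n]
    for _ in range(num_layers):
        n = conv_out(n, kernel=3, stride=2, padding=1)
        sizes.append(n)
    return sizes


print(patchify_stem(224))  # 14
print(conv_stem(224))      # [224, 112, 56, 28, 14]
```

Both stems reach the same 14×14 token grid, so the transformer body that follows can stay unchanged; only the way the image is downsampled differs.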

DETReg: Unsupervised Pretraining with Region Priors for Object Detection

1 code implementation • CVPR 2022 • Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture.

Ranked #1 on Few-Shot Object Detection on COCO 2017

Few-Shot Learning Few-Shot Object Detection +6

Towards Learning to Play Piano with Dexterous Hands and Touch

1 code implementation • 3 Jun 2021 • Huazhe Xu, Yuping Luo, Shaoxiong Wang, Trevor Darrell, Roberto Calandra

The virtuoso plays the piano with passion, poetry and extraordinary technical ability.

Reinforcement Learning (RL)

PyTouch: A Machine Learning Library for Touch Processing

1 code implementation • 26 May 2021 • Mike Lambeta, Huazhe Xu, Jingwei Xu, Po-Wei Chou, Shaoxiong Wang, Trevor Darrell, Roberto Calandra

With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making.

BIG-bench Machine Learning Decision Making +1

Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks

2 code implementations • NeurIPS 2021 • Dequan Wang, An Ju, Evan Shelhamer, David Wagner, Trevor Darrell

Adversarial attacks optimize against models to defeat defenses.

Robust Object Detection via Instance-Level Temporal Cycle Confusion

1 code implementation • ICCV 2021 • Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.

Object object-detection +2

Auto-Tuned Sim-to-Real Transfer

1 code implementation • 15 Apr 2021 • Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak

Policies trained in simulation often fail when transferred to the real world due to the "reality gap", where the simulator is unable to accurately capture the dynamics and visual properties of the real world.

NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media

1 code implementation • EMNLP 2021 • Grace Luo, Trevor Darrell, Anna Rohrbach

Online misinformation is a prevalent societal issue, with adversaries relying on tools ranging from cheap fakes to sophisticated deep fakes.

Misinformation

Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

no code implementations • 6 Apr 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.

Contrastive Learning Self-Supervised Learning +1

Anytime Dense Prediction with Confidence Adaptivity

1 code implementation • ICLR 2022 • Zhuang Liu, Zhiqiu Xu, Hung-Ju Wang, Trevor Darrell, Evan Shelhamer

A cascade of "exits" is attached to the model to make multiple predictions.

Image Classification Pose Estimation +1

Prototypical Cross-domain Self-supervised Learning for Few-shot Unsupervised Domain Adaptation

1 code implementation • CVPR 2021 • Xiangyu Yue, Zangwei Zheng, Shanghang Zhang, Yang Gao, Trevor Darrell, Kurt Keutzer, Alberto Sangiovanni Vincentelli

In this paper, we propose an end-to-end Prototypical Cross-domain Self-Supervised Learning (PCS) framework for Few-shot Unsupervised Domain Adaptation (FUDA).

Ranked #6 on Semantic Segmentation on DensePASS

Contrastive Learning Self-Supervised Learning +2

Region Similarity Representation Learning

1 code implementation • ICCV 2021 • Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell

We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.

Instance Segmentation Object +5

Self-Supervised Pretraining Improves Self-Supervised Pretraining

1 code implementation • 23 Mar 2021 • Colorado J. Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, Trevor Darrell

Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.

Image Augmentation

Monocular Quasi-Dense 3D Object Tracking

1 code implementation • 12 Mar 2021 • Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun

Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios.

Ranked #7 on Multiple Object Tracking on KITTI Tracking test

3D Object Tracking Autonomous Driving +3

Instance-Aware Predictive Navigation in Multi-Agent Environments

1 code implementation • 14 Jan 2021 • Jinkun Cao, Xin Wang, Trevor Darrell, Fisher Yu

To decide the action at each step, we seek the action sequence that can lead to safe future states based on the prediction module outputs by repeatedly sampling likely action sequences.
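
A minimal random-shooting sketch of this kind of sampling-based action selection. The safety predicate and scoring function are hypothetical stand-ins for the paper's prediction module, and sequences here are sampled uniformly rather than from a learned distribution over likely actions.

```python
import random


def plan_by_sampling(is_safe, score, actions, horizon, num_samples=64, seed=0):
    """Sample action sequences, discard unsafe ones, return the best-scoring one.

    is_safe(seq) and score(seq) are hypothetical stand-ins for a learned
    module that predicts future states for a candidate action sequence.
    """
    rng = random.Random(seed)
    best_seq, best_score = None, float("-inf")
    for _ in range(num_samples):
        seq = [rng.choice(actions) for _ in range(horizon)]
        if not is_safe(seq):
            continue
        s = score(seq)
        if s > best_score:
            best_seq, best_score = seq, s
    return best_seq  # None if no sampled sequence was judged safe


# Toy usage: turning left is "unsafe"; going straight is preferred
plan = plan_by_sampling(
    is_safe=lambda seq: "left" not in seq,
    score=lambda seq: seq.count("straight"),
    actions=["left", "straight", "right"],
    horizon=3,
)
print(plan[0] if plan else "no safe plan")
```

In a receding-horizon loop, only the first action of the chosen sequence is executed before replanning at the next step.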

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

1 code implementation • ICLR 2021 • Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.

Continuous Control

Novelty Detection with Rotated Contrastive Predictive Coding

no code implementations • 1 Jan 2021 • Dong Huk Park, Trevor Darrell

To this end, reconstruction-based learning is often used in which the normality of an observation is expressed in how well it can be reconstructed.

Contrastive Learning Novelty Detection
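
The reconstruction-based scoring described in this snippet can be sketched as follows. The toy reconstruct function, which keeps only the first coordinate, is a hypothetical stand-in for a trained autoencoder, and the threshold is arbitrary.

```python
def reconstruction_error(x, reconstruct):
    """Mean squared error between an observation and its reconstruction."""
    x_hat = reconstruct(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def is_novel(x, reconstruct, threshold):
    # Observations the model reconstructs poorly are flagged as novel
    return reconstruction_error(x, reconstruct) > threshold


def toy_reconstruct(x):
    # Hypothetical "model" fitted to data that varies only along the first axis
    return [x[0]] + [0.0] * (len(x) - 1)


print(is_novel([1.0, 0.0, 0.0], toy_reconstruct, threshold=0.1))  # False
print(is_novel([1.0, 2.0, 0.0], toy_reconstruct, threshold=0.1))  # True
```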

Unconditional Synthesis of Complex Scenes Using a Semantic Bottleneck

no code implementations • 1 Jan 2021 • Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic

Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes.

Image Generation Segmentation

Discovering Autoregressive Orderings with Variational Inference

1 code implementation • ICLR 2021 • Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao

One strategy to recover this information is to decode both the content and location of tokens.

Code Generation Image Captioning +2

Contrastive Video Textures

no code implementations • 1 Jan 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A Efros, Trevor Darrell

By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.

Contrastive Learning Video Generation
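
The traversal step can be sketched as a random walk over a frame-to-frame transition matrix that keeps only each frame's most probable transitions. The hand-written matrix stands in for learned transition probabilities, and the top-k cutoff and function name are illustrative assumptions.

```python
import random


def sample_texture(transition, start, length, top_k=2, seed=0):
    """Random walk over frame indices that follows only high-probability transitions.

    transition[i][j] is a (possibly unnormalized) probability of jumping
    from frame i to frame j; only the top_k entries of each row are kept.
    """
    rng = random.Random(seed)
    path = [start]
    for _ in range(length - 1):
        row = transition[path[-1]]
        candidates = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:top_k]
        weights = [row[j] for j in candidates]
        path.append(rng.choices(candidates, weights=weights, k=1)[0])
    return path


# Hand-written 4-frame transition matrix (a stand-in for learned probabilities)
T = [
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.8, 0.2],
    [0.1, 0.0, 0.0, 0.9],
    [0.7, 0.3, 0.0, 0.0],
]
print(sample_texture(T, start=0, length=8))
```

Changing the seed yields a different, equally valid frame ordering, which is how a single source video can produce many distinct outputs.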

Minimax Active Learning

no code implementations • 18 Dec 2020 • Sayna Ebrahimi, William Gan, Dian Chen, Giscard Biamby, Kamyar Salahi, Michael Laielli, Shizhan Zhu, Trevor Darrell

Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.

Active Learning Clustering +2

Temporal Action Detection with Multi-level Supervision

no code implementations • ICCV 2021 • Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection Semi-Supervised Action Detection

Fighting Copycat Agents in Behavioral Cloning from Observation Histories

no code implementations • NeurIPS 2020 • Chuan Wen, Jierui Lin, Trevor Darrell, Dinesh Jayaraman, Yang Gao

Imitation learning trains policies to map from input observations to the actions that an expert would choose.

Imitation Learning

Modular Networks for Compositional Instruction Following

no code implementations • NAACL 2021 • Rodolfo Corona, Daniel Fried, Coline Devin, Dan Klein, Trevor Darrell

In our approach, subgoal modules each carry out natural language instructions for a specific subgoal type.

Instruction Following

Auxiliary Task Reweighting for Minimum-data Learning

no code implementations • NeurIPS 2020 • Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.

Domain Adaptation Multi-Label Classification

Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation

no code implementations • CVPR 2021 • Bo Li, Yezhen Wang, Shanghang Zhang, Dongsheng Li, Trevor Darrell, Kurt Keutzer, Han Zhao

First, we provide a finite sample bound for both classification and regression problems under Semi-DA.

regression Semi-supervised Domain Adaptation +2

Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

1 code implementation • ICLR 2021 • Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E. Gonzalez, Marcus Rohrbach, Trevor Darrell

The goal of continual learning (CL) is to learn a sequence of tasks without suffering from the phenomenon of catastrophic forgetting.

Continual Learning

Reducing Class Collapse in Metric Learning with Easy Positive Sampling

no code implementations • 28 Sep 2020 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e.g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning +1
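
A toy illustration of why such a loss can favor collapse: with the standard triplet loss, a positive drawn from a distant mode of the same class is penalized, whereas mapping it onto the anchor costs nothing. The 2-D points and the margin of 0.2 are arbitrary choices for illustration.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin), Euclidean d."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


anchor, negative = (0.0, 0.0), (2.0, 2.0)
far_mode_positive = (4.0, 0.0)   # same class, but a different mode
collapsed_positive = (0.0, 0.0)  # same class, mapped onto the anchor

print(triplet_loss(anchor, far_mode_positive, negative))   # positive loss
print(triplet_loss(anchor, collapsed_positive, negative))  # 0.0
```

Minimizing this loss pushes the far-mode positive toward the anchor; repeated across a class, that pressure tends to pull all of its modes onto one point.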

SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning

1 code implementation • CVPR 2021 • Colorado J Reed, Sean Metzger, Aravind Srinivas, Trevor Darrell, Kurt Keutzer

A common practice in unsupervised representation learning is to use labeled data to evaluate the quality of the learned representations.

Data Augmentation Representation Learning +1

ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation

no code implementations • 7 Sep 2020 • Sicheng Zhao, Yezhen Wang, Bo Li, Bichen Wu, Yang Gao, Pengfei Xu, Trevor Darrell, Kurt Keutzer

They require prior knowledge of real-world statistics and ignore the pixel-level dropout noise gap and the spatial feature gap between different domains.

Autonomous Driving Domain Adaptation +3

Hierarchical Style-based Networks for Motion Synthesis

no code implementations • ECCV 2020 • Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Xiaolong Wang, Trevor Darrell

Generating diverse and natural human motion is one of the long-standing goals for creating intelligent characters in the animated world.

Motion Synthesis

Identity-Aware Multi-Sentence Video Description

1 code implementation • ECCV 2020 • Jae Sung Park, Trevor Darrell, Anna Rohrbach

This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.

Gender Prediction Sentence +1

What Should Not Be Contrastive in Contrastive Learning

no code implementations • ICLR 2021 • Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell

Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations.

Contrastive Learning

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

1 code implementation • CVPR 2021 • Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo

We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation.

3D Hand Pose Estimation

Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation

no code implementations • ECCV 2020 • Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh

We also demonstrate that reducing the task of room navigation to point navigation improves the performance further.

Navigate

Video Prediction via Example Guidance

1 code implementation • ICML 2020 • Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell

In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.

Video Prediction

Compositional Video Synthesis with Action Graphs

1 code implementation • 27 Jun 2020 • Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

Our generative model for this task (AG2Vid) disentangles motion and appearance features and, by incorporating a scheduling mechanism for actions, facilitates timely and coordinated video generation.

Scheduling Video Generation +2

Tent: Fully Test-time Adaptation by Entropy Minimization

2 code implementations • ICLR 2021 • Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, Trevor Darrell

A model must adapt itself to generalize to new and different data during testing.

General Classification Image Classification +3
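
The objective named in the title, the Shannon entropy of the model's predictions, can be computed as below. This is only a sketch of the loss on plain Python lists; the adaptation step that minimizes it at test time (which parameters are updated, and how) is not shown.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits):
    """Shannon entropy H(p) = -sum_c p_c log p_c of the softmax prediction."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_entropy(batch_logits):
    # Test-time objective: average entropy over an unlabeled test batch
    return sum(prediction_entropy(l) for l in batch_logits) / len(batch_logits)


print(prediction_entropy([0.0, 0.0, 0.0]))   # uniform over 3 classes: log(3)
print(prediction_entropy([10.0, 0.0, 0.0]))  # confident prediction: near 0
```

No labels appear anywhere in this objective, which is what makes it usable on unlabeled test data.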

Quasi-Dense Similarity Learning for Multiple Object Tracking

3 code implementations • CVPR 2021 • Jiangmiao Pang, Linlu Qiu, Xia Li, Haofeng Chen, Qi Li, Trevor Darrell, Fisher Yu

Compared to methods with similar detectors, it improves MOTA by almost 10 points and significantly decreases the number of ID switches on the BDD100K and Waymo datasets.

Ranked #1 on One-Shot Object Detection on PASCAL VOC 2012 val

Contrastive Learning Metric Learning +4
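
For reference, MOTA (multiple object tracking accuracy, from the CLEAR MOT metrics) combines misses, false positives, and identity switches into one number, so reducing ID switches also raises it. The counts below are made up for illustration.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_ground_truth


# Hypothetical counts over 1000 ground-truth boxes in total
before = mota(false_negatives=200, false_positives=150, id_switches=50, num_ground_truth=1000)
after = mota(false_negatives=160, false_positives=130, id_switches=10, num_ground_truth=1000)
print(f"{before:.2f} -> {after:.2f}")  # 0.60 -> 0.70, a 10-point gain
```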

Rethinking preventing class-collapsing in metric learning with margin-based losses

no code implementations • ICCV 2021 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, margin-based losses tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning +1

ParkPredict: Motion and Intent Prediction of Vehicles in Parking Lots

no code implementations • 21 Apr 2020 • Xu Shen, Ivo Batkovic, Vijay Govindarajan, Paolo Falcone, Trevor Darrell, Francesco Borrelli

We investigate the problem of predicting driver behavior in parking lots, an environment which is less structured than typical road networks and features complex, interactive maneuvers in a compact space.

Contrastive Examples for Addressing the Tyranny of the Majority

no code implementations • 14 Apr 2020 • Viktoriia Sharmanska, Lisa Anne Hendricks, Trevor Darrell, Novi Quadrianto

Computer vision algorithms, e.g., for face recognition, favour groups of individuals that are better represented in the training data.

Face Recognition

Spatio-Temporal Action Detection with Multi-Object Interaction

no code implementations • 1 Apr 2020 • Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".

Action Detection Human Detection +2

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations • 31 Mar 2020 • Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection Activity Detection +1

Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning

1 code implementation • ECCV 2020 • Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu

Weakly-supervised action localization requires training a model to localize the action segments in the video given only video-level action labels.

Ranked #9 on Weakly Supervised Action Localization on THUMOS’14

Action Localization Multiple Instance Learning +2

Adversarial Continual Learning

1 code implementation • ECCV 2020 • Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach

We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks.

Continual Learning Image Classification

Frustratingly Simple Few-Shot Object Detection

5 code implementations • ICML 2020 • Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu

Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods.

Ranked #17 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Object Detection Meta-Learning +2

Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning

3 code implementations • 11 Mar 2020 • Zhiqiang Shen, Zechun Liu, Zhuang Liu, Marios Savvides, Trevor Darrell, Eric Xing

This drawback hinders the model from learning subtle variance and fine-grained information.

Representation Learning

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

10 code implementations • ICCV 2021 • Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, Xiaolong Wang

The boundary between these two lines of work remains underexplored, and the effectiveness of meta-learning in few-shot learning is still unclear.

Few-Shot Learning General Classification

Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

1 code implementation • 23 Dec 2019 • Richard Li, Allan Jabri, Trevor Darrell, Pulkit Agrawal

Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements.

Object reinforcement-learning +2

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation • CVPR 2020 • Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition Object

Learning Canonical Representations for Scene Graph to Image Generation

2 code implementations • ECCV 2020 • Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson

Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images.

Ranked #3 on Layout-to-Image Generation on Visual Genome 256x256

Layout-to-Image Generation Scene Generation

Compositional Plan Vectors

1 code implementation • NeurIPS 2019 • Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.

Imitation Learning

Semantic Bottleneck Scene Generation

2 code implementations • 26 Nov 2019 • Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic

For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts.

Ranked #1 on Image Generation on Cityscapes-5K 256x512

Conditional Image Generation Image-to-Image Translation +2

Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA

1 code implementation • CVPR 2020 • Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

Recent work has explored the TextVQA task that requires reading and understanding text in images to answer a question.

General Classification

Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

no code implementations • 30 Oct 2019 • Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

Imitation Learning

Regularization Matters in Policy Optimization

2 code implementations • 21 Oct 2019 • Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.

Continuous Control Reinforcement Learning (RL)

Exploring Simple and Transferable Recognition-Aware Image Processing

1 code implementation • 21 Oct 2019 • Zhuang Liu, Hung-Ju Wang, Tinghui Zhou, Zhiqiang Shen, Bingyi Kang, Evan Shelhamer, Trevor Darrell

Interestingly, the processing model's ability to enhance recognition quality can transfer when evaluated on models of different architectures, recognized categories, tasks and training datasets.

Image Retrieval Recommendation Systems

Paper
Code

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation

1 code implementation • 17 Oct 2019 • Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell

The agent is first presented with previous experiences in the training environment, along with task description in the form of trajectory-level sparse rewards.

Continuous Control Model Predictive Control +2

Paper
Code

Unsupervised Domain Adaptation through Self-Supervision

3 code implementations • 26 Sep 2019 • Yu Sun, Eric Tzeng, Trevor Darrell, Alexei A. Efros

This paper addresses unsupervised domain adaptation, the setting where labeled training data is available on a source domain, but the goal is to have good performance on a target domain with only unlabeled data.

Ranked #63 on Synthetic-to-Real Translation on GTAV-to-Cityscapes Labels

Unsupervised Domain Adaptation

Paper
Code

Blurring Structure and Learning to Optimize and Adapt Receptive Fields

no code implementations • 25 Sep 2019 • Evan Shelhamer, Dequan Wang, Trevor Darrell

Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.

Semantic Segmentation

Paper

Composable Semi-parametric Modelling for Long-range Motion Generation

no code implementations • 25 Sep 2019 • Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell

Learning diverse and natural behaviors is one of the longstanding goals for creating intelligent characters in the animated world.

Paper

Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization

no code implementations • 25 Sep 2019 • Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell

In this paper, we propose Scoring-Aggregating-Planning (SAP), a framework that can learn task-agnostic semantics and dynamics priors from arbitrary quality interactions as well as the corresponding sparse rewards and then plan on unseen tasks in zero-shot condition.

Zero-shot Generalization

Paper

Weakly-Supervised Trajectory Segmentation for Learning Reusable Skills

no code implementations • 25 Sep 2019 • Parsa Mahmoudieh, Trevor Darrell, Deepak Pathak

Instead of direct manual supervision which is tedious and prone to bias, in this work, our goal is to extract reusable skills from a collection of human demonstrations collected directly for several end-tasks.

Multiple Instance Learning Segmentation

Paper

Dynamic Scale Inference by Entropy Minimization

no code implementations • 8 Aug 2019 • Dequan Wang, Evan Shelhamer, Bruno Olshausen, Trevor Darrell

Given the variety of the visual world there is not one true scale for recognition: objects may appear at drastically different sizes across the visual field.

Semantic Segmentation

Paper
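
The idea in the title, choosing the scale at which predictions are most confident, can be illustrated with a minimal Python sketch. It assumes per-pixel class logits have already been computed at a few hypothetical candidate scales and keeps the scale with the lowest mean prediction entropy. This grid search is an illustrative stand-in, not the paper's inference procedure; the function names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last (class) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def mean_entropy(logits: np.ndarray) -> float:
    # Average Shannon entropy (in nats) of the per-pixel class distributions.
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())


def pick_scale(logits_by_scale: dict[float, np.ndarray]) -> float:
    # Hypothetical grid search: return the candidate scale whose
    # predictions have the lowest mean entropy (i.e. are most confident).
    return min(logits_by_scale, key=lambda s: mean_entropy(logits_by_scale[s]))
```

For example, `pick_scale({0.5: logits_a, 1.0: logits_b})` returns whichever key maps to the sharper prediction map. Entropy is used because it needs no labels, so it can be evaluated at test time.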

Task-Aware Feature Generation for Zero-Shot Compositional Learning

1 code implementation • 11 Jun 2019 • Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez

In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts.

Novel Concepts Zero-Shot Learning

Paper
Code

Uncertainty-guided Continual Learning with Bayesian Neural Networks

2 code implementations • ICLR 2020 • Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach

Continual learning aims to learn new tasks without forgetting previously learned ones.

Continual Learning

Paper
Code

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation

no code implementations • ACL 2019 • Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko

The actual grounding can connect language to the environment through multiple modalities, e.g. "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route.

Vision and Language Navigation

Paper

Monocular Plan View Networks for Autonomous Driving

no code implementations • 16 May 2019 • Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl, Trevor Darrell

Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth.

3D Object Detection Autonomous Driving +1

Paper

Language-Conditioned Graph Networks for Relational Reasoning

1 code implementation • ICCV 2019 • Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko

E.g., conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can be easily consumed by a simple classifier for answer prediction.

Ranked #3 on Referring Expression Comprehension on CLEVR-Ref+

Object Referring Expression Comprehension +2

Paper
Code

Accurate Visual Localization for Automotive Applications

1 code implementation • 1 May 2019 • Eli Brosh, Matan Friedmann, Ilan Kadar, Lev Yitzhak Lavy, Elad Levi, Shmuel Rippa, Yair Lempert, Bruno Fernandez-Ruiz, Roei Herzig, Trevor Darrell

We propose a hybrid coarse-to-fine approach that leverages visual and GPS location cues.

Retrieval Visual Localization

Paper
Code

Meta-Learning to Guide Segmentation

no code implementations • ICLR 2019 • Kate Rakelly*, Evan Shelhamer*, Trevor Darrell, Alexei A. Efros, Sergey Levine

To explore generalization, we analyze guidance as a bridge between different levels of supervision to segment classes as the union of instances.

Meta-Learning Segmentation

Paper

Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields

no code implementations • 25 Apr 2019 • Evan Shelhamer, Dequan Wang, Trevor Darrell

Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.

Semantic Segmentation

Paper

Semi-supervised Domain Adaptation via Minimax Entropy

3 code implementations • ICCV 2019 • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.

Domain Adaptation Semi-supervised Domain Adaptation

Paper
Code

TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning

1 code implementation • CVPR 2019 • Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez

We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning.

Ranked #1 on Few-Shot Image Classification on aPY - 0-Shot

Attribute Few-Shot Learning +1

Paper
Code

Variational Adversarial Active Learning

6 code implementations • ICCV 2019 • Samarth Sinha, Sayna Ebrahimi, Trevor Darrell

Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the task for which we are trying to acquire labeled data.

Active Learning Image Classification +1

Paper
Code
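
The task-agnostic selection step described above can be sketched in a few lines of Python. The sketch assumes an adversarial discriminator has already scored each unlabeled sample with `p_labeled`, its estimated probability that the sample comes from the labeled pool; the function name `vaal_select` and the scores are hypothetical. Because only these scores are used, no task model enters the selection.

```python
import numpy as np


def vaal_select(p_labeled: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of unlabeled samples to send for annotation.

    p_labeled[i] is assumed to be the discriminator's probability that
    unlabeled sample i belongs to the labeled pool.
    """
    # Lowest scores are the samples the discriminator finds least like
    # the labeled set; take the `budget` lowest (stable sort for ties).
    order = np.argsort(p_labeled, kind="stable")
    return order[:budget]
```

With scores `[0.9, 0.1, 0.5, 0.05]` and a budget of 2, samples 3 and 1 would be queried.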

Compositional GAN (Extended Abstract): Learning Image-Conditional Binary Composition

no code implementations • ICLR Workshop DeepGenStruct 2019 • Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

Generative Adversarial Networks (GANs) can produce images of surprising complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene.

Paper

Cross-Linked Variational Autoencoders for Generalized Zero-Shot Learning

no code implementations • ICLR Workshop LLD 2019 • Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

While following the same direction, we also take artificial feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by aligned variational autoencoders, for the purpose of generating latent features to train a softmax classifier.

Few-Shot Learning Generalized Zero-Shot Learning

Paper

Efficient Receptive Field Learning by Dynamic Gaussian Structure

no code implementations • ICLR Workshop LLD 2019 • Evan Shelhamer, Dequan Wang, Trevor Darrell

The visual world is vast and varied, but its variations divide into structured and unstructured factors.

Representation Learning

Paper

Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

1 code implementation • NeurIPS 2019 • Deepak Pathak, Chris Lu, Trevor Darrell, Phillip Isola, Alexei A. Efros

We evaluate the performance of these dynamic and modular agents in simulated environments.

Paper
Code

Robust Change Captioning

1 code implementation • ICCV 2019 • Dong Huk Park, Trevor Darrell, Anna Rohrbach

We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.

Natural Language Visual Grounding

Paper
Code

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection Activity Detection

Paper

Hierarchical Discrete Distribution Decomposition for Match Density Estimation

2 code implementations • CVPR 2019 • Zhichao Yin, Trevor Darrell, Fisher Yu

Explicit representations of the global match distributions of pixel-wise correspondences between pairs of images are desirable for uncertainty estimation and downstream applications.

Ranked #13 on Optical Flow Estimation on KITTI 2015 (train)

Density Estimation Optical Flow Estimation +2

Paper
Code

Adversarial Inference for Multi-Sentence Video Description

1 code implementation • CVPR 2019 • Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

Image Captioning Sentence +1

Paper
Code

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

2 code implementations • 5 Dec 2018 • Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space.

Ranked #2 on Generalized Few-Shot Learning on AwA2

Few-Shot Learning Generalized Few-Shot Learning +1

Paper
Code

Few-shot Object Detection via Feature Reweighting

4 code implementations • ICCV 2019 • Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell

The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples.

Ranked #21 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Learning Few-Shot Object Detection +3

Paper
Code
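
The feature reweighting named in the title can be pictured as channel-wise multiplication of shared meta features by one vector per class. The minimal Python sketch below assumes those class vectors are already given (how they are produced is not shown) and uses a hypothetical function name; it is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def reweight_features(meta_features: np.ndarray, class_vectors: np.ndarray) -> np.ndarray:
    """Produce one class-specific feature map per class.

    meta_features: (C, H, W) map from the shared feature extractor.
    class_vectors: (N, C), one reweighting coefficient per channel per class
                   (assumed to be given).
    Returns an (N, C, H, W) array.
    """
    # Broadcast (N, C, 1, 1) * (1, C, H, W): each class scales every channel
    # of the shared feature map by its own coefficient.
    return class_vectors[:, :, None, None] * meta_features[None, :, :, :]
```

Each of the N resulting maps would then be passed to a detection head for its class.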

Spatio-Temporal Action Graph Networks

1 code implementation • 4 Dec 2018 • Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell

Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance.

Activity Recognition Autonomous Driving +3

Paper
Code

SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection

no code implementations • 3 Dec 2018 • Eric Tzeng, Kaylee Burns, Kate Saenko, Trevor Darrell

Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment.

Domain Adaptation Pseudo Label +1

Paper

Disentangling Propagation and Generation for Video Prediction

1 code implementation • ICCV 2019 • Hang Gao, Huazhe Xu, Qi-Zhi Cai, Ruth Wang, Fisher Yu, Trevor Darrell

A dynamic scene has two types of elements: those that move fluidly and can be predicted from previous frames, and those which are disoccluded (exposed) and cannot be extrapolated.

Predict Future Video Frames

Paper
Code

Joint Monocular 3D Vehicle Detection and Tracking

1 code implementation • ICCV 2019 • Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.

Ranked #12 on Multiple Object Tracking on KITTI Tracking test

3D Object Detection 3D Pose Estimation +4

Paper
Code

Deep Object-Centric Policies for Autonomous Driving

no code implementations • 13 Nov 2018 • Dequan Wang, Coline Devin, Qi-Zhi Cai, Fisher Yu, Trevor Darrell

While learning visuomotor skills in an end-to-end manner is appealing, deep neural networks are often uninterpretable and fail in surprising ways.

Autonomous Driving Object

Paper

Modular Architecture for StarCraft II with Deep Reinforcement Learning

no code implementations • 8 Nov 2018 • Dennis Lee, Haoran Tang, Jeffrey O. Zhang, Huazhe Xu, Trevor Darrell, Pieter Abbeel

We present a novel modular architecture for StarCraft II AI.

reinforcement-learning Reinforcement Learning (RL) +2

Paper

Discriminator Rejection Sampling

1 code implementation • ICLR 2019 • Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena

We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution.

Image Generation

Paper
Code
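
The rejection-sampling idea above can be sketched in its idealized form. The sketch assumes the discriminator is close to optimal, so its pre-sigmoid output D(x) approximates log(p_data(x) / p_gen(x)); a sample is then accepted with probability exp(D(x) - D_max). Here D_max is taken from the batch itself, which is an assumption, and the practical corrections described in the paper are omitted. The function name is hypothetical.

```python
import numpy as np


def drs_accept(logits: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Idealized discriminator rejection sampling.

    logits: pre-sigmoid discriminator outputs D(x) for a batch of
    generator samples. Returns a boolean mask of accepted samples.
    """
    # For an optimal discriminator, exp(D(x)) is proportional to
    # p_data(x) / p_gen(x); dividing by its maximum gives a valid
    # acceptance probability in (0, 1].
    d_max = logits.max()
    accept_prob = np.exp(logits - d_max)
    # Accept each sample independently with its acceptance probability.
    return rng.uniform(size=logits.shape) < accept_prob
```

Usage would look like `kept = samples[drs_accept(logits, np.random.default_rng(0))]`. The sample with the highest logit is always kept, and samples the discriminator considers far less realistic are almost always rejected.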

Rethinking the Value of Network Pruning

2 code implementations • ICLR 2019 • Zhuang Liu, Ming-Jie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell

Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm.

Network Pruning Neural Architecture Search

Paper
Code
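
The comparison behind observation 3) can be sketched for a single convolution layer. The sketch assumes L1-norm filter ranking as the pruning criterion (one common choice, picked here for illustration). It builds the same pruned architecture either by inheriting the surviving weights or by re-initializing them for training from scratch (He-normal initialization, also an assumption). Function names are hypothetical and the training loop is omitted.

```python
import numpy as np


def l1_channel_mask(weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    # weight: (out_channels, in_channels, k, k). Rank output channels by
    # the L1 norm of their filters and keep the largest fraction.
    norms = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.argsort(norms)[::-1][:n_keep]
    mask = np.zeros(weight.shape[0], dtype=bool)
    mask[keep] = True
    return mask


def pruned_layer(weight: np.ndarray, mask: np.ndarray, inherit: bool,
                 rng: np.random.Generator) -> np.ndarray:
    # The pruned architecture (number of kept channels) is identical in both
    # branches; only the starting weights differ.
    if inherit:
        # Fine-tuning baseline: keep the "important" weights of the large model.
        return weight[mask].copy()
    shape = (int(mask.sum()),) + weight.shape[1:]
    fan_in = np.prod(weight.shape[1:])
    # Training from scratch: fresh random initialization of the same shape.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)
```

Training both variants for a comparable budget and comparing accuracy is the kind of experiment the observations above summarize.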

Uncertainty-guided Lifelong Learning in Bayesian Networks

no code implementations • 27 Sep 2018 • Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach

Sequential learning of tasks arriving in a continuous stream is a complex problem, and it becomes more challenging when the model has a fixed capacity.

Continual Learning

Paper

Object Hallucination in Image Captioning

1 code implementation • EMNLP 2018 • Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene.

Hallucination Image Captioning +2

Paper
Code

Localizing Moments in Video with Temporal Language

1 code implementation • EMNLP 2018 • Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

To benchmark whether our model, and other recent video localization models, can effectively reason about temporal language, we collect the novel TEMPOral reasoning in video and language (TEMPO) dataset.

Natural Language Queries Retrieval +1

Paper
Code

Large-Scale Study of Curiosity-Driven Learning

4 code implementations • ICLR 2019 • Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros

However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent.

Ranked #14 on Atari Games on Atari 2600 Montezuma's Revenge

Atari Games SNES Games

Paper
Code

Textual Explanations for Self-Driving Vehicles

2 code implementations • ECCV 2018 • Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata

Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments.

Paper
Code

Grounding Visual Explanations

no code implementations • ECCV 2018 • Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.

General Classification Sentence

Paper

Explainable Neural Computation via Stack Neural Module Networks

1 code implementation • ECCV 2018 • Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko

In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.

Ranked #14 on Referring Expression Comprehension on Talk2Car

Decision Making Question Answering +1

Paper
Code

Compositional GAN: Learning Image-Conditional Binary Composition

1 code implementation • 19 Jul 2018 • Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene.

Paper
Code
