1 code implementation • NAACL 2022 • Jae Sung Park, Sheng Shen, Ali Farhadi, Trevor Darrell, Yejin Choi, Anna Rohrbach
We test the robustness of recent methods on the proposed automatic contrast sets, and compare them to additionally collected human-generated counterparts, to assess their effectiveness.
1 code implementation • 25 May 2023 • Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell
We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains and augment the training data via language-guided image editing.
1 code implementation • 24 May 2023 • Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang
We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, while selecting the task-relevant elements in the output and feeding them back to the model to steer its attention to the task-specific features.
no code implementations • 24 May 2023 • Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou
In this paper, we introduce Flan-MoE, a set of Instruction-Finetuned Sparse Mixture-of-Expert (MoE) models.
no code implementations • 23 May 2023 • Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holynski, Trevor Darrell
We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks.
no code implementations • 23 May 2023 • Long Lian, Boyi Li, Adam Yala, Trevor Darrell
We validate the superiority of our design by demonstrating its ability to outperform the base diffusion model in accurately generating images according to prompts that necessitate both language and spatial reasoning.
no code implementations • 11 May 2023 • Suzanne Petryk, Spencer Whitehead, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach
The ability to judge whether a caption correctly describes an image is a critical part of vision-language understanding.
no code implementations • 10 May 2023 • Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.
1 code implementation • 30 Mar 2023 • Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi
Nevertheless, most of them lack fine-grained control over the properties of the different objects present in the image, i.e., object-level image editing.
1 code implementation • CVPR 2023 • Baifeng Shi, Trevor Darrell, Xin Wang
In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
no code implementations • 23 Mar 2023 • Medhini Narasimhan, Licheng Yu, Sean Bell, Ning Zhang, Trevor Darrell
We introduce a new pre-trained video model, VideoTaskformer, focused on representing the semantics and structure of instructional videos.
no code implementations • 13 Mar 2023 • Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He
The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs).
no code implementations • 6 Mar 2023 • Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath
We present a sim-to-real learning-based approach for real-world humanoid locomotion.
1 code implementation • 2 Mar 2023 • Zhuang Liu, Zhiqiu Xu, Joseph Jin, Zhiqiang Shen, Trevor Darrell
Additionally, we explore a symmetric technique for regularizing overfitting models - late dropout, where dropout is not used in the early iterations and is only activated later in training.
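The late dropout schedule described above can be sketched as a simple rate function (the switch-on point and rate here are illustrative assumptions, not the paper's settings):

```python
# Toy sketch of a "late dropout" schedule: dropout is disabled for the
# early part of training and only activated later, mirroring the paper's
# early dropout, which is active early and disabled later.

def late_dropout_rate(iteration: int, total_iters: int,
                      rate: float = 0.1, start_frac: float = 0.5) -> float:
    """Return the dropout rate to apply at `iteration`.

    Zero for the first `start_frac` of training, then `rate` afterwards.
    """
    if iteration < start_frac * total_iters:
        return 0.0
    return rate
```

In a real training loop this rate would be written into the model's dropout modules at each step; only the schedule itself is shown here.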
no code implementations • 13 Feb 2023 • Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function.
no code implementations • CVPR 2023 • Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang
We update the target data instead, and project all test inputs toward the source domain with a generative diffusion model.
no code implementations • 30 Dec 2022 • Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor Darrell
Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales.
no code implementations • 20 Dec 2022 • Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein
Are extralinguistic signals such as image pixels crucial for inducing constituency grammars?
no code implementations • 8 Dec 2022 • Roei Herzig, Ofir Abramovich, Elad Ben-Avraham, Assaf Arbelle, Leonid Karlinsky, Ariel Shamir, Trevor Darrell, Amir Globerson
We present a multi-task prompt learning approach for video transformers, where a shared video transformer backbone is enhanced by a small set of specialized parameters for each task.
no code implementations • 1 Dec 2022 • Dong Huk Park, Grace Luo, Clayton Toste, Samaneh Azadi, Xihui Liu, Maka Karalashvili, Anna Rohrbach, Trevor Darrell
When manipulating an object, existing text-to-image diffusion models often ignore the shape of the object and generate content that is incorrectly scaled, cut off, or replaced with background content.
1 code implementation • 28 Nov 2022 • Grace Luo, Giscard Biamby, Trevor Darrell, Daniel Fried, Anna Rohrbach
We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations and an associated textual guidebook for GeoGuessr, a popular interactive geolocation game.
1 code implementation • 21 Nov 2022 • Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell
Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show many target tasks can benefit each other from sharing prompt vectors and thus can be jointly learned via multitask prompt tuning.
1 code implementation • 18 Oct 2022 • Lisa Dunlap, Clara Mohri, Devin Guillory, Han Zhang, Trevor Darrell, Joseph E. Gonzalez, Aditi Raghunathan, Anna Rohrbach
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.
2 code implementations • 12 Oct 2022 • Tobias Fischer, Jiangmiao Pang, Thomas E. Huang, Linlu Qiu, Haofeng Chen, Trevor Darrell, Fisher Yu
In this paper, we present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning.
Ranked #4 on Multiple Object Tracking on BDD100K val
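The dense pairwise association above rests on a contrastive score over region embeddings; a toy version can be sketched as follows (vectors, dimensions, and temperature are made up for illustration, not the paper's exact loss):

```python
import math

# Hypothetical sketch of a contrastive matching score between an anchor
# region embedding, its positive (same object in the other frame), and
# negative regions (other objects).

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_score(anchor, positive, negatives, temperature=0.1):
    """Softmax probability that `anchor` matches `positive` rather than
    any of the `negatives` -- higher means a cleaner association."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return exps[0] / sum(exps)
```

Densely sampling hundreds of such regions per image pair, as the entry describes, simply multiplies the number of (anchor, positive, negatives) triples this score is computed over.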
1 code implementation • 6 Oct 2022 • Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and clearly demonstrate the benefits of scaling visual pre-training for robot learning.
1 code implementation • 19 Sep 2022 • Fangyu Wu, Dequan Wang, Minjune Hwang, Chenhui Hao, Jiawei Lu, Jiamu Zhang, Christopher Chou, Trevor Darrell, Alexandre Bayen
Decentralized multiagent planning has been an important field of research in robotics.
no code implementations • 7 Sep 2022 • Kevin Miao, Akash Gokul, Raghav Singh, Suzanne Petryk, Joseph Gonzalez, Kurt Keutzer, Trevor Darrell, Colorado Reed
SPAN operates by regularizing attention masks from separate transformer heads to follow various priors over semantic regions.
no code implementations • 6 Sep 2022 • Vongani H. Maluleke, Neerja Thakkar, Tim Brooks, Ethan Weber, Trevor Darrell, Alexei A. Efros, Angjoo Kanazawa, Devin Guillory
In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of their training datasets.
1 code implementation • 1 Sep 2022 • Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros
How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification?
Ranked #5 on Personalized Segmentation on PerSeg
1 code implementation • 25 Aug 2022 • Akash Gokul, Konstantinos Kallidromitis, Shufan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed
Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives.
no code implementations • 14 Aug 2022 • Medhini Narasimhan, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid
In this work, we focus on summarizing instructional videos, an under-explored area of video summarization.
1 code implementation • 7 Jul 2022 • Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang
We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model.
no code implementations • NAACL 2022 • Zhekun Luo, Shalini Ghosh, Devin Guillory, Keizo Kato, Trevor Darrell, Huijuan Xu
In this paper, we aim to improve the generalization ability of the compositional action recognition model to novel verbs or novel nouns that are unseen during training time, by leveraging the power of knowledge graphs.
no code implementations • 15 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson
First, as both images and videos contain structured information, we enrich a transformer model with a set of "object tokens" that can be used across images and videos.
no code implementations • 13 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson
We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges.
2 code implementations • ACL 2022 • Rodolfo Corona, Shizhan Zhu, Dan Klein, Trevor Darrell
Natural language applied to natural 2D images describes a fundamentally 3D world.
1 code implementation • 28 Apr 2022 • Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach
We first enable abstention capabilities for several VQA models, and analyze both their coverage (the portion of questions answered) and risk (the error on that portion).
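The coverage/risk trade-off in this entry can be computed directly; the following is a toy sketch (the function name and input format are made up for illustration):

```python
# Coverage is the fraction of questions the model chooses to answer;
# risk is the error rate on that answered portion. A model that abstains
# on hard questions trades coverage for lower risk.

def coverage_and_risk(answers):
    """`answers` is a list of (abstained: bool, correct: bool) pairs,
    one per question. Returns (coverage, risk)."""
    answered = [correct for abstained, correct in answers if not abstained]
    coverage = len(answered) / len(answers)
    risk = 0.0 if not answered else 1.0 - sum(answered) / len(answered)
    return coverage, risk
```

For example, answering 3 of 4 questions with 2 of the 3 correct gives coverage 0.75 and risk 1/3.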
1 code implementation • 23 Apr 2022 • Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang
We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity.
1 code implementation • CVPR 2022 • Dian Chen, Dequan Wang, Trevor Darrell, Sayna Ebrahimi
We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels.
2 code implementations • 20 Apr 2022 • Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao
We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.
no code implementations • CVPR 2022 • Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar
We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion.
1 code implementation • ACL 2022 • Sanjay Subramanian, William Merrill, Trevor Darrell, Matt Gardner, Sameer Singh, Anna Rohrbach
Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.
1 code implementation • NeurIPS 2021 • Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta
Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and extracts little information from each human intervention.
1 code implementation • 11 Mar 2022 • Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik
This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.
no code implementations • CVPR 2022 • Suzanne Petryk, Lisa Dunlap, Keyan Nasseri, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach
To do this, we ground task-relevant words or phrases with attention maps from a pretrained large-scale model.
1 code implementation • 29 Jan 2022 • Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko
In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.
41 code implementations • CVPR 2022 • Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
Ranked #1 on Classification on InDL
1 code implementation • 21 Dec 2021 • Shruti Agarwal, Liwen Hu, Evonne Ng, Trevor Darrell, Hao Li, Anna Rohrbach
In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques.
1 code implementation • NAACL 2022 • Giscard Biamby, Grace Luo, Trevor Darrell, Anna Rohrbach
Detecting out-of-context media, such as "mis-captioned" images on Twitter, is a relevant problem, especially in domains of high public significance.
no code implementations • 10 Dec 2021 • Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell
We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.
no code implementations • 3 Dec 2021 • Kuniaki Saito, Ping Hu, Trevor Darrell, Kate Saenko
LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO, as well as cross-dataset evaluation on UVO and Cityscapes.
1 code implementation • 27 Oct 2021 • Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao
Permutations then serve as target generation orders for training an insertion-based Transformer language model.
no code implementations • CVPR 2022 • Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.
Ranked #3 on Action Recognition on Diving-48
no code implementations • ICLR 2022 • Shizhan Zhu, Sayna Ebrahimi, Angjoo Kanazawa, Trevor Darrell
Existing approaches for single object reconstruction impose supervision signals based on the loss of the signed distance value from all locations in a scene, posing difficulties when extending to real-world scenarios.
no code implementations • 29 Sep 2021 • Parsa Mahmoudieh, Sayna Ebrahimi, Deepak Pathak, Trevor Darrell
Reward signals in reinforcement learning can be expensive to obtain in many tasks and often require access to direct state.
no code implementations • 29 Sep 2021 • Devin Guillory, Kuniaki Saito, Eric Tzeng, Yannik Pitcan, Kate Saenko, Trevor Darrell
Optimal transport theory provides a useful tool to measure the differences between two distributions.
1 code implementation • 2 Sep 2021 • Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell
Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain.
2 code implementations • ICCV 2021 • Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, Kate Saenko
Unsupervised domain adaptation (UDA) methods can dramatically improve generalization on unlabeled target domains.
no code implementations • 20 Aug 2021 • Michael Laielli, Giscard Biamby, Dian Chen, Ritwik Gupta, Adam Loeffler, Phat Dat Nguyen, Ross Luo, Trevor Darrell, Sayna Ebrahimi
Active learning for object detection is conventionally achieved by applying techniques developed for classification in a way that aggregates individual detections into image-level selection criteria.
no code implementations • ICCV 2021 • Devin Guillory, Vaishaal Shankar, Sayna Ebrahimi, Trevor Darrell, Ludwig Schmidt
Our work connects techniques from domain adaptation and predictive uncertainty literature, and allows us to predict model accuracy on challenging unseen distributions without access to labeled data.
no code implementations • NeurIPS 2021 • Medhini Narasimhan, Anna Rohrbach, Trevor Darrell
A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes.
1 code implementation • NeurIPS 2021 • Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick
To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3*3 convolutions.
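The downsampling arithmetic behind replacing the patchify stem can be checked directly (the layer count and padding here are assumptions for illustration; the paper's channel widths are not shown):

```python
# A ViT's 16x16 patchify stem downsamples the input 16x in one step.
# A stack of stride-two 3x3 convolutions reaches the same resolution
# gradually: each layer halves the spatial size.

def conv_out(size: int, kernel: int = 3, stride: int = 2, pad: int = 1) -> int:
    """Spatial output size of one convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def stacked_stride2_stem(size: int, n_layers: int = 4) -> int:
    """Resolution after n stacked stride-2 3x3 convs."""
    for _ in range(n_layers):
        size = conv_out(size)
    return size
```

Four stride-2 convs take a 224x224 image to 14x14, the same 16x total downsampling as a single 16x16 patchify stem (224 // 16 == 14), so the rest of the ViT sees token grids of the same size.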
1 code implementation • CVPR 2022 • Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture.
Ranked #1 on Few-Shot Object Detection on MS-COCO (10-shot)
1 code implementation • 3 Jun 2021 • Huazhe Xu, Yuping Luo, Shaoxiong Wang, Trevor Darrell, Roberto Calandra
The virtuoso plays the piano with passion, poetry and extraordinary technical ability.
1 code implementation • 26 May 2021 • Mike Lambeta, Huazhe Xu, Jingwei Xu, Po-Wei Chou, Shaoxiong Wang, Trevor Darrell, Roberto Calandra
With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making.
2 code implementations • NeurIPS 2021 • Dequan Wang, An Ju, Evan Shelhamer, David Wagner, Trevor Darrell
Adversarial attacks optimize against models to defeat defenses.
1 code implementation • ICCV 2021 • Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell
Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.
1 code implementation • 15 Apr 2021 • Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak
Policies trained in simulation often fail when transferred to the real world due to the 'reality gap', where the simulator is unable to accurately capture the dynamics and visual properties of the real world.
1 code implementation • EMNLP 2021 • Grace Luo, Trevor Darrell, Anna Rohrbach
Online misinformation is a prevalent societal issue, with adversaries relying on tools ranging from cheap fakes to sophisticated deep fakes.
no code implementations • 6 Apr 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell
We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.
1 code implementation • ICLR 2022 • Zhuang Liu, Zhiqiu Xu, Hung-Ju Wang, Trevor Darrell, Evan Shelhamer
A cascade of "exits" is attached to the model to make multiple predictions.
1 code implementation • CVPR 2021 • Xiangyu Yue, Zangwei Zheng, Shanghang Zhang, Yang Gao, Trevor Darrell, Kurt Keutzer, Alberto Sangiovanni Vincentelli
In this paper, we propose an end-to-end Prototypical Cross-domain Self-Supervised Learning (PCS) framework for Few-shot Unsupervised Domain Adaptation (FUDA).
Ranked #6 on Semantic Segmentation on DensePASS
1 code implementation • ICCV 2021 • Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell
We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.
1 code implementation • 23 Mar 2021 • Colorado J. Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, Trevor Darrell
Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
1 code implementation • 12 Mar 2021 • Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun
Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios.
Ranked #6 on Multiple Object Tracking on KITTI Tracking test
1 code implementation • 14 Jan 2021 • Jinkun Cao, Xin Wang, Trevor Darrell, Fisher Yu
To decide the action at each step, we seek the action sequence that can lead to safe future states based on the prediction module outputs by repeatedly sampling likely action sequences.
no code implementations • 1 Jan 2021 • Dong Huk Park, Trevor Darrell
To this end, reconstruction-based learning is often used in which the normality of an observation is expressed in how well it can be reconstructed.
1 code implementation • ICLR 2021 • Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell
In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.
1 code implementation • ICLR 2021 • Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao
One strategy to recover this information is to decode both the content and location of tokens.
no code implementations • 1 Jan 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A Efros, Trevor Darrell
By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.
no code implementations • 1 Jan 2021 • Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic
Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes.
no code implementations • 18 Dec 2020 • Sayna Ebrahimi, William Gan, Dian Chen, Giscard Biamby, Kamyar Salahi, Michael Laielli, Shizhan Zhu, Trevor Darrell
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
no code implementations • ICCV 2021 • Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu
We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet 1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.
no code implementations • NeurIPS 2020 • Chuan Wen, Jierui Lin, Trevor Darrell, Dinesh Jayaraman, Yang Gao
Imitation learning trains policies to map from input observations to the actions that an expert would choose.
no code implementations • NAACL 2021 • Rodolfo Corona, Daniel Fried, Coline Devin, Dan Klein, Trevor Darrell
In our approach, subgoal modules each carry out natural language instructions for a specific subgoal type.
no code implementations • NeurIPS 2020 • Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu
By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.
no code implementations • CVPR 2021 • Bo Li, Yezhen Wang, Shanghang Zhang, Dongsheng Li, Trevor Darrell, Kurt Keutzer, Han Zhao
First, we provide a finite sample bound for both classification and regression problems under Semi-DA.
1 code implementation • ICLR 2021 • Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E. Gonzalez, Marcus Rohrbach, Trevor Darrell
The goal of continual learning (CL) is to learn a sequence of tasks without suffering from the phenomenon of catastrophic forgetting.
no code implementations • 28 Sep 2020 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell
We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e.g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.
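A toy version of the triplet loss this entry refers to shows the mechanism: the hinge keeps pulling every positive toward the anchor until it is at least a margin closer than the negative (margin and distances here are illustrative):

```python
# Margin-based triplet loss: zero only once the anchor-positive distance
# is at least `margin` smaller than the anchor-negative distance. Because
# the gradient keeps shrinking the positive distance toward 0 for every
# positive, regardless of its mode, intra-class structure can collapse
# to a single point -- the failure described above.

def triplet_loss(d_anchor_pos: float, d_anchor_neg: float,
                 margin: float = 0.2) -> float:
    """Hinge loss over a (anchor, positive, negative) distance triple."""
    return max(0.0, d_anchor_pos - d_anchor_neg + margin)
```

With the margin already satisfied the loss vanishes; otherwise it grows linearly with the violation.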
1 code implementation • CVPR 2021 • Colorado J Reed, Sean Metzger, Aravind Srinivas, Trevor Darrell, Kurt Keutzer
A common practice in unsupervised representation learning is to use labeled data to evaluate the quality of the learned representations.
no code implementations • 7 Sep 2020 • Sicheng Zhao, Yezhen Wang, Bo Li, Bichen Wu, Yang Gao, Pengfei Xu, Trevor Darrell, Kurt Keutzer
They require prior knowledge of real-world statistics and ignore the pixel-level dropout noise gap and the spatial feature gap between different domains.
no code implementations • ECCV 2020 • Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Xiaolong Wang, Trevor Darrell
Generating diverse and natural human motion is one of the long-standing goals for creating intelligent characters in the animated world.
1 code implementation • ECCV 2020 • Jae Sung Park, Trevor Darrell, Anna Rohrbach
This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.
no code implementations • ICLR 2021 • Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell
Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations.
1 code implementation • CVPR 2021 • Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo
We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation.
no code implementations • ECCV 2020 • Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh
We also demonstrate that reducing the task of room navigation to point navigation improves the performance further.
1 code implementation • ICML 2020 • Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell
In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.
1 code implementation • 27 Jun 2020 • Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson
Our generative model for this task (AG2Vid) disentangles motion and appearance features, and by incorporating a scheduling mechanism for actions facilitates a timely and coordinated video generation.
2 code implementations • ICLR 2021 • Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, Trevor Darrell
A model must adapt itself to generalize to new and different data during testing.
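This entry is the Tent paper, which performs fully test-time adaptation by minimizing the entropy of the model's own predictions on test batches. A minimal sketch of that objective (the parameter-update step itself is omitted; only the quantity being minimized is shown):

```python
import math

# Entropy of the softmax prediction: lower entropy means a more
# confident prediction. Test-time entropy minimization updates the
# model so that this quantity decreases on incoming test data.

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def prediction_entropy(logits):
    """Shannon entropy of the softmax distribution over `logits`."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A confident prediction such as logits `[5, 0, 0]` has lower entropy than a uniform one, which peaks at `log(num_classes)`; the adaptation objective pushes test predictions toward the confident regime.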
2 code implementations • CVPR 2021 • Jiangmiao Pang, Linlu Qiu, Xia Li, Haofeng Chen, Qi Li, Trevor Darrell, Fisher Yu
Compared to methods with similar detectors, it boosts almost 10 points of MOTA and significantly decreases the number of ID switches on BDD100K and Waymo datasets.
Ranked #1 on One-Shot Object Detection on PASCAL VOC 2012 val
no code implementations • ICCV 2021 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell
We theoretically prove and empirically show that under reasonable noise assumptions, margin-based losses tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.
no code implementations • 21 Apr 2020 • Xu Shen, Ivo Batkovic, Vijay Govindarajan, Paolo Falcone, Trevor Darrell, Francesco Borrelli
We investigate the problem of predicting driver behavior in parking lots, an environment which is less structured than typical road networks and features complex, interactive maneuvers in a compact space.
no code implementations • 14 Apr 2020 • Viktoriia Sharmanska, Lisa Anne Hendricks, Trevor Darrell, Novi Quadrianto
Computer vision algorithms, e.g., for face recognition, favour groups of individuals that are better represented in the training data.
no code implementations • 1 Apr 2020 • Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell
Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".
no code implementations • 31 Mar 2020 • Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell
In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.
1 code implementation • ECCV 2020 • Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu
Weakly-supervised action localization requires training a model to localize the action segments in the video given only video level action label.
Ranked #9 on Weakly Supervised Action Localization on THUMOS’14
1 code implementation • ECCV 2020 • Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach
We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks.
5 code implementations • ICML 2020 • Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu
Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods.
Ranked #14 on Few-Shot Object Detection on MS-COCO (30-shot)
2 code implementations • 11 Mar 2020 • Zhiqiang Shen, Zechun Liu, Zhuang Liu, Marios Savvides, Trevor Darrell, Eric Xing
This drawback hinders the model from learning subtle variance and fine-grained information.
5 code implementations • ICCV 2021 • Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, Xiaolong Wang
The boundary between these two lines of work has yet to be explored, and the effectiveness of meta-learning in few-shot learning remains unclear.
1 code implementation • 23 Dec 2019 • Richard Li, Allan Jabri, Trevor Darrell, Pulkit Agrawal
Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements.
1 code implementation • CVPR 2020 • Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell
Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.
2 code implementations • ECCV 2020 • Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson
Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images.
Ranked #3 on Layout-to-Image Generation on Visual Genome 256x256
1 code implementation • NeurIPS 2019 • Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine
We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.
2 code implementations • 26 Nov 2019 • Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic
For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts.
Ranked #1 on Image Generation on Cityscapes-5K 256x512
1 code implementation • CVPR 2020 • Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach
Recent work has explored the TextVQA task that requires reading and understanding text in images to answer a question.
2 code implementations • 21 Oct 2019 • Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell
In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.
1 code implementation • 21 Oct 2019 • Zhuang Liu, Hung-Ju Wang, Tinghui Zhou, Zhiqiang Shen, Bingyi Kang, Evan Shelhamer, Trevor Darrell
Interestingly, the processing model's ability to enhance recognition quality can transfer when evaluated on models of different architectures, recognized categories, tasks and training datasets.
1 code implementation • 17 Oct 2019 • Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell
The agent is first presented with previous experiences in the training environment, along with task description in the form of trajectory-level sparse rewards.
3 code implementations • 26 Sep 2019 • Yu Sun, Eric Tzeng, Trevor Darrell, Alexei A. Efros
This paper addresses unsupervised domain adaptation, the setting where labeled training data is available on a source domain, but the goal is to have good performance on a target domain with only unlabeled data.
no code implementations • 25 Sep 2019 • Evan Shelhamer, Dequan Wang, Trevor Darrell
Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.
no code implementations • 25 Sep 2019 • Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Trevor Darrell
Learning diverse and natural behaviors is one of the longstanding goals for creating intelligent characters in the animated world.
no code implementations • 25 Sep 2019 • Parsa Mahmoudieh, Trevor Darrell, Deepak Pathak
Instead of direct manual supervision which is tedious and prone to bias, in this work, our goal is to extract reusable skills from a collection of human demonstrations collected directly for several end-tasks.
no code implementations • 25 Sep 2019 • Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell
In this paper, we propose Scoring-Aggregating-Planning (SAP), a framework that can learn task-agnostic semantics and dynamics priors from arbitrary quality interactions as well as the corresponding sparse rewards and then plan on unseen tasks in zero-shot condition.
no code implementations • 8 Aug 2019 • Dequan Wang, Evan Shelhamer, Bruno Olshausen, Trevor Darrell
Given the variety of the visual world there is not one true scale for recognition: objects may appear at drastically different sizes across the visual field.
1 code implementation • 11 Jun 2019 • Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez
In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts.
2 code implementations • ICLR 2020 • Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach
Continual learning aims to learn new tasks without forgetting previously learned ones.
no code implementations • ACL 2019 • Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko
The actual grounding can connect language to the environment through multiple modalities, e.g. "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route.
no code implementations • 16 May 2019 • Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl, Trevor Darrell
Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth.
1 code implementation • ICCV 2019 • Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko
E.g., conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can be easily consumed by a simple classifier for answer prediction.
Ranked #3 on Referring Expression Comprehension on CLEVR-Ref+
1 code implementation • 1 May 2019 • Eli Brosh, Matan Friedmann, Ilan Kadar, Lev Yitzhak Lavy, Elad Levi, Shmuel Rippa, Yair Lempert, Bruno Fernandez-Ruiz, Roei Herzig, Trevor Darrell
We propose a hybrid coarse-to-fine approach that leverages visual and GPS location cues.
no code implementations • ICLR 2019 • Kate Rakelly*, Evan Shelhamer*, Trevor Darrell, Alexei A. Efros, Sergey Levine
To explore generalization, we analyze guidance as a bridge between different levels of supervision to segment classes as the union of instances.
3 code implementations • ICCV 2019 • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko
Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.
1 code implementation • CVPR 2019 • Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez
We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning.
6 code implementations • ICCV 2019 • Samarth Sinha, Sayna Ebrahimi, Trevor Darrell
Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the task for which we are trying to acquire labeled data.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell
Generative Adversarial Networks (GANs) can produce images of surprising complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene.
no code implementations • ICLR Workshop LLD 2019 • Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata
While following the same direction, we also take artificial feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by aligned variational autoencoders, for the purpose of generating latent features to train a softmax classifier.
no code implementations • ICLR Workshop LLD 2019 • Evan Shelhamer, Dequan Wang, Trevor Darrell
The visual world is vast and varied, but its variations divide into structured and unstructured factors.
1 code implementation • NeurIPS 2019 • Deepak Pathak, Chris Lu, Trevor Darrell, Phillip Isola, Alexei A. Efros
We evaluate the performance of these dynamic and modular agents in simulated environments.
1 code implementation • ICCV 2019 • Dong Huk Park, Trevor Darrell, Anna Rohrbach
We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.
no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell
In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.
2 code implementations • CVPR 2019 • Zhichao Yin, Trevor Darrell, Fisher Yu
Explicit representations of the global match distributions of pixel-wise correspondences between pairs of images are desirable for uncertainty estimation and downstream applications.
Ranked #10 on Optical Flow Estimation on KITTI 2015 (train)
1 code implementation • CVPR 2019 • Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach
Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.
2 code implementations • 5 Dec 2018 • Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata
Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space.
4 code implementations • ICCV 2019 • Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell
The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples.
Ranked #18 on Few-Shot Object Detection on MS-COCO (30-shot)
1 code implementation • 4 Dec 2018 • Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell
Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance.
no code implementations • 3 Dec 2018 • Eric Tzeng, Kaylee Burns, Kate Saenko, Trevor Darrell
Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment.
1 code implementation • ICCV 2019 • Hang Gao, Huazhe Xu, Qi-Zhi Cai, Ruth Wang, Fisher Yu, Trevor Darrell
A dynamic scene has two types of elements: those that move fluidly and can be predicted from previous frames, and those which are disoccluded (exposed) and cannot be extrapolated.
1 code implementation • ICCV 2019 • Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu
The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.
Ranked #11 on Multiple Object Tracking on KITTI Tracking test
no code implementations • 13 Nov 2018 • Dequan Wang, Coline Devin, Qi-Zhi Cai, Fisher Yu, Trevor Darrell
While learning visuomotor skills in an end-to-end manner is appealing, deep neural networks are often uninterpretable and fail in surprising ways.
no code implementations • 8 Nov 2018 • Dennis Lee, Haoran Tang, Jeffrey O. Zhang, Huazhe Xu, Trevor Darrell, Pieter Abbeel
We present a novel modular architecture for StarCraft II AI.
1 code implementation • ICLR 2019 • Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena
We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution.
2 code implementations • ICLR 2019 • Zhuang Liu, Ming-Jie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell
Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm.
no code implementations • 27 Sep 2018 • Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach
Sequentially learning of tasks arriving in a continuous stream is a complex problem and becomes more challenging when the model has a fixed capacity.
1 code implementation • EMNLP 2018 • Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko
Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene.
1 code implementation • EMNLP 2018 • Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell
To benchmark whether our model, and other recent video localization models, can effectively reason about temporal language, we collect the novel TEMPOral reasoning in video and language (TEMPO) dataset.
4 code implementations • ICLR 2019 • Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros
However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent.
Ranked #11 on Atari Games on Atari 2600 Montezuma's Revenge
2 code implementations • ECCV 2018 • Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata
Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments.
no code implementations • ECCV 2018 • Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata
Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.
1 code implementation • ECCV 2018 • Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko
In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.
Ranked #13 on Referring Expression Comprehension on Talk2Car