Search Results for author: Kate Saenko

Found 174 papers, 90 papers with code

COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?

1 code implementation 5 May 2023 Arijit Ray, Filip Radenovic, Abhimanyu Dubey, Bryan A. Plummer, Ranjay Krishna, Kate Saenko

Compositional reasoning is a hallmark of human visual intelligence; yet despite the size of large vision-language models, they struggle to represent simple compositions by combining objects with their attributes.

CoLA Image Retrieval +1

ERM++: An Improved Baseline for Domain Generalization

no code implementations 4 Apr 2023 Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer

We call the resulting method ERM++, and show it significantly improves the performance of DG on five multi-source datasets by over 5% compared to standard ERM, and beats state-of-the-art despite being less computationally expensive.

Domain Generalization

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations 31 Mar 2023 Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

We transfer the knowledge from the pre-trained CLIP-ViTL/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences.

Image Classification

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

no code implementations CVPR 2023 Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.

Audio Source Separation Natural Language Queries

Mind the Backbone: Minimizing Backbone Distortion for Robust Object Detection

1 code implementation 26 Mar 2023 Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Rogerio Feris, Kate Saenko

We propose to use Relative Gradient Norm (RGN) as a way to measure the vulnerability of a backbone to feature distortion, and show that high RGN is indeed correlated with lower OOD performance.

object-detection Robust Object Detection

MaskSketch: Unpaired Structure-guided Masked Image Generation

2 code implementations CVPR 2023 Dina Bashkirova, Jose Lezama, Kihyuk Sohn, Kate Saenko, Irfan Essa

We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation.

Conditional Image Generation Image-to-Image Translation +2

Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

1 code implementation CVPR 2023 Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.

Composed Image Retrieval (CoIR) Retrieval

The SwaNNFlight System: On-the-Fly Sim-to-Real Adaptation via Anchored Learning

1 code implementation 17 Jan 2023 Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko, Renato Mancuso

And (iii) anchor critics to help stabilize the fine-tuning of agents during sim-to-real transfer, online learning from real data while retaining behavior optimized in simulation.

Reinforcement Learning (RL)

Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing

no code implementations 29 Nov 2022 Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff

One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations.

Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation

no code implementations 27 Nov 2022 Kaihong Wang, Donghyun Kim, Rogerio Feris, Kate Saenko, Margrit Betke

We propose to perform adaptation on attention maps with cross-domain attention layers that share features between the source and the target domains.

Semantic Segmentation Unsupervised Domain Adaptation

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

no code implementations 22 Nov 2022 Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori

For example, asking a model to generate a varying number of the same object to measure its ability to count or providing a text prompt with several objects that each have a different attribute to identify its ability to match objects and attributes correctly.

Text-to-Image Generation

Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

1 code implementation CVPR 2023 Maan Qraitem, Kate Saenko, Bryan A. Plummer

Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution per epoch without repeating samples.

FETA: Towards Specializing Foundation Models for Expert Task Applications

1 code implementation 8 Sep 2022 Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, Peter W. J. Staar, Rogerio Feris, Leonid Karlinsky

However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g., retrieval of car manuals technical illustrations from language queries), data for which is either unseen or belonging to a long-tail part of the data distribution of the huge datasets used for FM pre-training.

Domain Generalization Image Retrieval +6

NewsStories: Illustrating articles with visual summaries

1 code implementation 26 Jul 2022 Reuben Tan, Bryan A. Plummer, Kate Saenko, JP Lewis, Avneesh Sud, Thomas Leung

Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images.


DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations

1 code implementation 20 Jun 2022 Ximeng Sun, Ping Hu, Kate Saenko

Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications.

Prefix Conditioning Unifies Language and Label Supervision

no code implementations CVPR 2023 Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.

Classification Contrastive Learning +2

Temporal Relevance Analysis for Video Action Models

no code implementations 25 Apr 2022 Quanfu Fan, Donghyun Kim, Chun-Fu Chen, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal

In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature.

Action Recognition

Many-to-many Splatting for Efficient Video Frame Interpolation

1 code implementation CVPR 2022 Ping Hu, Simon Niklaus, Stan Sclaroff, Kate Saenko

Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant.

Motion Estimation Optical Flow Estimation +1

A Unified Framework for Domain Adaptive Pose Estimation

1 code implementation 1 Apr 2022 Donghyun Kim, Kaihong Wang, Kate Saenko, Margrit Betke, Stan Sclaroff

In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision.

2D Pose Estimation Animal Pose Estimation +2

A Broad Study of Pre-training for Domain Generalization and Adaptation

1 code implementation 22 Mar 2022 Donghyun Kim, Kaihong Wang, Stan Sclaroff, Kate Saenko

In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, namely: network architectures, size, pre-training loss, and datasets.

Domain Generalization

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

no code implementations 10 Feb 2022 Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi

We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents.

Visual Abductive Reasoning Visual Reasoning

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

1 code implementation 4 Feb 2022 Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.

Common Sense Reasoning Question Answering +1

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

1 code implementation 29 Jan 2022 Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.

Decision Making reinforcement-learning +1

Extending the WILDS Benchmark for Unsupervised Adaptation

1 code implementation ICLR 2022 Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang

Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well.

Unsupervised Domain Generalization by Learning a Bridge Across Domains

1 code implementation CVPR 2022 Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.

Domain Generalization Self-Supervised Learning

Learning to Detect Every Thing in an Open World

no code implementations 3 Dec 2021 Kuniaki Saito, Ping Hu, Trevor Darrell, Kate Saenko

LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO, as well as cross-dataset evaluation on UVO and Cityscapes.

Data Augmentation Instance Segmentation +3

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations NeurIPS 2021 Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

OpenMatch: Open-Set Semi-supervised Learning with Open-set Consistency Regularization

1 code implementation NeurIPS 2021 Kuniaki Saito, Donghyun Kim, Kate Saenko

OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.

Outlier Detection

Disentangled Unsupervised Image Translation via Restricted Information Flow

no code implementations 26 Nov 2021 Ben Usman, Dina Bashkirova, Kate Saenko

Unsupervised image-to-image translation methods aim to map images from one domain into plausible examples from another domain while preserving structures shared across two domains.

Translation Unsupervised Image-To-Image Translation

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations 20 Oct 2021 Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

Multi-Critic Actor Learning: Teaching RL Policies to Act with Style

no code implementations ICLR 2022 Siddharth Mysore, George Cheng, Yunqi Zhao, Kate Saenko, Meng Wu

MultiCriticAL is tested in the context of multi-style learning, a special case of MTRL where agents are trained to behave with different distinct behavior styles, and yields up to 45% performance gains over the single-critic baselines and even successfully learns behavior styles in cases where single-critic approaches may simply fail to learn.

Pyramid Mini-Batching for Optimal Transport

no code implementations 29 Sep 2021 Devin Guillory, Kuniaki Saito, Eric Tzeng, Yannik Pitcan, Kate Saenko, Trevor Darrell

Optimal transport theory provides a useful tool to measure the differences between two distributions.

Domain Adaptation

MixtureEnsembles: Leveraging Parameter Sharing for Efficient Ensembles

no code implementations 29 Sep 2021 Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

We improve on these methods with MixtureEnsembles, which learns to factorize ensemble members with shared parameters by constructing each layer with a linear combination of templates.

Dynamic Network Quantization for Efficient Video Inference

1 code implementation ICCV 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.

Quantization Video Recognition

MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision

1 code implementation CVPR 2022 Ben Usman, Andrea Tagliasacchi, Kate Saenko, Avneesh Sud

In the era of deep learning, human pose estimation from multiple cameras with unknown calibration has received little attention to date.

Weakly-supervised 3D Human Pose Estimation

ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes

1 code implementation CVPR 2022 Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, Kate Saenko

Recyclable waste detection poses a unique computer vision challenge as it requires detection of highly deformable and often translucent objects in cluttered scenes without the kind of context information usually present in human-centric datasets.

object-detection Object Detection +3

OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers

1 code implementation 28 May 2021 Kuniaki Saito, Donghyun Kim, Kate Saenko

OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.

Outlier Detection

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

1 code implementation ICCV 2021 Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.

Video Recognition

Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

1 code implementation 17 Apr 2021 Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction.

Common Sense Reasoning Question Answering

OVANet: One-vs-All Network for Universal Domain Adaptation

2 code implementations ICCV 2021 Kuniaki Saito, Kate Saenko

In this paper, we propose a method to learn the threshold using source samples and to adapt it to the target domain.

Universal Domain Adaptation

Evaluation of Correctness in Unsupervised Many-to-Many Image Translation

1 code implementation 29 Mar 2021 Dina Bashkirova, Ben Usman, Kate Saenko

Given an input image from a source domain and a guidance image from a target domain, unsupervised many-to-many image-to-image (UMMI2I) translation methods seek to generate a plausible example from the target domain that preserves domain-invariant information of the input source image and inherits the domain-specific information from the guidance image.


Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

no code implementations 2 Mar 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.

Image Classification Quantization +2

Honey, I Shrunk The Actor: A Case Study on Preserving Performance with Smaller Actors in Actor-Critic RL

no code implementations 23 Feb 2021 Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko

Actors and critics in actor-critic reinforcement learning algorithms are functionally separate, yet they often use the same network architectures.

Reinforcement Learning (RL)

VA-RED$^2$: Video Adaptive Redundancy Reduction

no code implementations ICLR 2021 Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both.

Semi-Supervised Action Recognition with Temporal Contrastive Learning

1 code implementation CVPR 2021 Ankit Singh, Omprakash Chakraborty, Ashutosh Varshney, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

We approach this problem by learning a two-pathway temporal contrastive model using unlabeled videos at two different speeds leveraging the fact that changing video speed does not change an action.

Action Recognition Contrastive Learning

Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency

1 code implementation 29 Jan 2021 Samarth Mishra, Kate Saenko, Venkatesh Saligrama

With our Pretraining and Consistency (PAC) approach, we achieve state of the art target accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods, across multiple datasets.

Unsupervised Domain Adaptation

CDS: Cross-Domain Self-Supervised Pre-Training

no code implementations ICCV 2021 Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.

Domain Adaptation Transfer Learning

Regularizing Action Policies for Smooth Control with Reinforcement Learning

no code implementations 11 Dec 2020 Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko

A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies.

reinforcement-learning Reinforcement Learning (RL)

Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation

no code implementations 6 Dec 2020 Aadarsh Sahoo, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

Partial domain adaptation which assumes that the unknown target label space is a subset of the source label space has attracted much attention in computer vision.

Partial Domain Adaptation

Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation

no code implementations NeurIPS 2020 Ping Hu, Stan Sclaroff, Kate Saenko

Recently, most ZSS methods focus on learning the visual-semantic correspondence to transfer knowledge from seen classes to unseen classes at the pixel level.

Semantic correspondence Semantic Segmentation

Temporal Action Detection with Multi-level Supervision

no code implementations ICCV 2021 Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection Semi-Supervised Action Detection

Auxiliary Task Reweighting for Minimum-data Learning

no code implementations NeurIPS 2020 Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.

Domain Adaptation Multi-Label Classification

Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News

1 code implementation EMNLP 2020 Reuben Tan, Bryan A. Plummer, Kate Saenko

In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.

Self-supervised Visual Attribute Learning for Fashion Compatibility

no code implementations 1 Aug 2020 Donghyun Kim, Kuniaki Saito, Samarth Mishra, Stan Sclaroff, Kate Saenko, Bryan A Plummer

Our approach consists of three self-supervised tasks designed to capture different concepts that are neglected in prior work that we can select from depending on the needs of our downstream tasks.

Object Recognition Retrieval +2

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

1 code implementation ECCV 2020 Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.

Action Recognition

Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation

1 code implementation ECCV 2020 Xingchao Peng, Yichen Li, Kate Saenko

Extensive experiments are conducted to demonstrate the power of our new datasets in benchmarking state-of-the-art multi-source domain adaptation methods, as well as the advantage of our proposed model.

Benchmarking Disentanglement +2

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

1 code implementation ECCV 2020 Kuniaki Saito, Kate Saenko, Ming-Yu Liu

Unsupervised image-to-image translation intends to learn a mapping of an image in a given domain to an analogous image in a different domain, without explicit supervision of the mapping.

Translation Unsupervised Image-To-Image Translation

Real-time Semantic Segmentation with Fast Attention

1 code implementation 7 Jul 2020 Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.

Real-Time Semantic Segmentation

Neural Parameter Allocation Search

1 code implementation ICLR 2022 Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko

We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.

Image Classification Phrase Grounding

Learning visual servo policies via planner cloning

no code implementations 24 May 2020 Ulrich Viereck, Kate Saenko, Robert Platt

Learning control policies for visual servoing in novel environments is an important problem.

Learning to Scale Multilingual Representations for Vision-Language Tasks

no code implementations ECCV 2020 Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer

Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added.

Language Modelling Machine Translation +2

Spatio-Temporal Action Detection with Multi-Object Interaction

no code implementations 1 Apr 2020 Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".

Action Detection Human Detection +1

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations 31 Mar 2020 Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection Activity Detection +1

Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment

1 code implementation NeurIPS 2020 Ben Usman, Avneesh Sud, Nick Dufour, Kate Saenko

We show that, under certain assumptions, this combination yields a deep neural likelihood-based minimization objective that attains a known lower bound upon convergence.

Domain Adaptation Translation +1

Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels

no code implementations 18 Mar 2020 Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.

Self-Supervised Learning Unsupervised Domain Adaptation

Universal Domain Adaptation through Self Supervision

1 code implementation NeurIPS 2020 Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko

While some methods address target settings with either partial or open-set categories, they assume that the particular setting is known a priori.

Partial Domain Adaptation Universal Domain Adaptation +1

A Broader Study of Cross-Domain Few-Shot Learning

2 code implementations ECCV 2020 Yunhui Guo, Noel C. Codella, Leonid Karlinsky, James V. Codella, John R. Smith, Kate Saenko, Tajana Rosing, Rogerio Feris

Extensive experiments on the proposed benchmark are performed to evaluate state-of-the-art meta-learning approaches, transfer learning approaches, and newer methods for cross-domain few-shot learning.

cross-domain few-shot learning Few-Shot Image Classification +1

Federated Adversarial Domain Adaptation

no code implementations ICLR 2020 Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko

In this work, we present a principled approach to the problem of federated domain adaptation, which aims to align the representations learned among the different nodes with the data distribution of the target node.

Disentanglement Domain Adaptation +4

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval

no code implementations 27 Sep 2019 Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.

Moment Retrieval Retrieval

Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment

no code implementations 25 Sep 2019 Shuhan Tan, Xingchao Peng, Kate Saenko

In this paper, we explore the task of Generalized Domain Adaptation (GDA): How to transfer knowledge across different domains in the presence of both covariate and label shift?

Domain Adaptation Transfer Learning


no code implementations 25 Sep 2019 Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.

Moment Retrieval Retrieval

MULE: Multimodal Universal Language Embedding

no code implementations 8 Sep 2019 Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.

Data Augmentation Machine Translation +1

Learning Similarity Conditions Without Explicit Supervision

1 code implementation ICCV 2019 Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer

Many real-world tasks require models to compare images along multiple similarity conditions (e.g., similarity in color, category or shape).

Adversarial Self-Defense for Cycle-Consistent GANs

1 code implementation NeurIPS 2019 Dina Bashkirova, Ben Usman, Kate Saenko

The goal of unsupervised image-to-image translation is to map images from one domain to another without the ground truth correspondence between the two domains.

Adversarial Attack Translation +1

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation

no code implementations ACL 2019 Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko

The actual grounding can connect language to the environment through multiple modalities, e.g., "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route.

Vision and Language Navigation

Language-Conditioned Graph Networks for Relational Reasoning

1 code implementation ICCV 2019 Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko

E.g., conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can be easily consumed by a simple classifier for answer prediction.

Referring Expression Comprehension Relational Reasoning +1

Domain Agnostic Learning with Disentangled Representations

1 code implementation 28 Apr 2019 Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko

Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains.

General Classification Image Classification +1

Semi-supervised Domain Adaptation via Minimax Entropy

3 code implementations ICCV 2019 Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.

Domain Adaptation

Cross-Domain Image Manipulation by Demonstration

no code implementations 28 Jan 2019 Ben Usman, Nick Dufour, Kate Saenko, Chris Bregler

In this work we propose a model that can manipulate individual visual attributes of objects in a real scene using examples of how respective attribute manipulations affect the output of a simulation.

Image Manipulation

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations 25 Dec 2018 Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection Activity Detection

TwoStreamVAN: Improving Motion Modeling in Video Generation

1 code implementation 3 Dec 2018 Ximeng Sun, Huijuan Xu, Kate Saenko

Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.

Video Generation

SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection

no code implementations 3 Dec 2018 Eric Tzeng, Kaylee Burns, Kate Saenko, Trevor Darrell

Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment.

Domain Adaptation Pseudo Label +1

Revisiting Image-Language Networks for Open-ended Phrase Detection

3 code implementations 17 Nov 2018 Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko

Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.

Object Detection +1

Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning

no code implementations CVPR 2018 Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, Kate Saenko

We present the Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on learning driver behavior in real-life environments.

Scene Understanding

Exploiting Environmental Variation to Improve Policy Robustness in Reinforcement Learning

no code implementations 27 Sep 2018 Siddharth Mysore, Robert Platt, Kate Saenko

We propose a novel method to exploit this observation to develop robust actor policies, by automatically developing a sampling curriculum over environment settings to use in training.

Reinforcement Learning (RL)

Object Hallucination in Image Captioning

1 code implementation EMNLP 2018 Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene.

Image Captioning

Adapting control policies from simulation to reality using a pairwise loss

no code implementations 27 Jul 2018 Ulrich Viereck, Xingchao Peng, Kate Saenko, Robert Platt

This paper proposes an approach to domain transfer based on a pairwise loss function that helps transfer control policies learned in simulation onto a real robot.

Explainable Neural Computation via Stack Neural Module Networks

1 code implementation ECCV 2018 Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko

In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.

Decision Making Question Answering +1

Syn2Real: A New Benchmark for Synthetic-to-Real Visual Domain Adaptation

no code implementations 26 Jun 2018 Xingchao Peng, Ben Usman, Kuniaki Saito, Neela Kaushik, Judy Hoffman, Kate Saenko

In this paper, we present a new large-scale benchmark called Syn2Real, which consists of a synthetic domain rendered from 3D object models and two real-image domains containing the same object categories.

Classification Domain Adaptation +4

RISE: Randomized Input Sampling for Explanation of Black-box Models

9 code implementations 19 Jun 2018 Vitali Petsiuk, Abir Das, Kate Saenko

We compare our approach to state-of-the-art importance extraction methods using both an automatic deletion/insertion metric and a pointing metric based on human-annotated object segments.

Decision Making

Unsupervised Video-to-Video Translation

1 code implementation ICLR 2019 Dina Bashkirova, Ben Usman, Kate Saenko

Unsupervised image-to-image translation is a recently proposed task of translating an image to a different style or domain given only unpaired image examples at training time.

Translation Unsupervised Image-To-Image Translation

Speaker-Follower Models for Vision-and-Language Navigation

1 code implementation NeurIPS 2018 Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell

We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.

Data Augmentation Vision and Language Navigation

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation 13 Apr 2018 Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.


Women also Snowboard: Overcoming Bias in Captioning Models

2 code implementations ECCV 2018 Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach

We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present.

Image Captioning

Joint Event Detection and Description in Continuous Video Streams

1 code implementation 28 Feb 2018 Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.

Dense Captioning Dense Video Captioning +2

Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

no code implementations 28 Jan 2018 Yancheng Bai, Huijuan Xu, Kate Saenko, Bernard Ghanem

In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection.

Action Detection Activity Detection

Learning Multi-Level Hierarchies with Hindsight

4 code implementations 4 Dec 2017 Andrew Levy, George Konidaris, Robert Platt, Kate Saenko

Hierarchical agents have the potential to solve sequential decision making tasks with greater sample efficiency than their non-hierarchical counterparts because hierarchical agents can break down tasks into sets of subtasks that only require short sequences of decisions.

Decision Making Hierarchical Reinforcement Learning

Adversarial Dropout Regularization

no code implementations ICLR 2018 Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko

However, a drawback of this approach is that the critic simply labels the generated features as in-domain or not, without considering the boundaries between classes.

General Classification Image Classification +2

VisDA: The Visual Domain Adaptation Challenge

2 code implementations 18 Oct 2017 Xingchao Peng, Ben Usman, Neela Kaushik, Judy Hoffman, Dequan Wang, Kate Saenko

We present the 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains.

General Classification Image Classification +3

Stable Distribution Alignment Using the Dual of the Adversarial Distance

no code implementations ICLR 2018 Ben Usman, Kate Saenko, Brian Kulis

Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions.

Domain Adaptation

Grasp Pose Detection in Point Clouds

1 code implementation 29 Jun 2017 Andreas ten Pas, Marcus Gualtieri, Kate Saenko, Robert Platt

Many grasp detection methods achieve grasp success rates (grasp successes as a fraction of the total number of grasp attempts) between 75% and 95% for novel objects presented in isolation or in light clutter.


Learning a visuomotor controller for real world robotic grasping using simulated depth images

no code implementations 14 Jun 2017 Ulrich Viereck, Andreas ten Pas, Kate Saenko, Robert Platt

This paper proposes an approach to learning a closed-loop controller for robotic grasping that dynamically guides the gripper to the object.

Robotic Grasping

Adversarial Discriminative Domain Adaptation

19 code implementations CVPR 2017 Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell

Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains.

General Classification Unsupervised Domain Adaptation +1

Synthetic to Real Adaptation with Generative Correlation Alignment Networks

no code implementations 19 Jan 2017 Xingchao Peng, Kate Saenko

Experimentally, we show training off-the-shelf classifiers on the newly generated data can significantly boost performance when testing on the real image domains (PASCAL VOC 2007 benchmark and Office dataset), improving upon several existing methods.

Domain Adaptation Object Recognition

Top-down Visual Saliency Guided by Captions

6 code implementations CVPR 2017 Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko

Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and therefore difficult to explain.

Video Captioning

Correlation Alignment for Unsupervised Domain Adaptation

4 code implementations 6 Dec 2016 Baochen Sun, Jiashi Feng, Kate Saenko

In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lower-dimensional subspaces.

Unsupervised Domain Adaptation

Modeling Relationships in Referential Expressions with Compositional Modular Networks

2 code implementations CVPR 2017 Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko

In this paper we instead present a modular deep architecture capable of analyzing referential expressions into their component parts, identifying entities and relationships mentioned in the input expression and grounding them all in the scene.

Visual Question Answering (VQA)

Combining Texture and Shape Cues for Object Recognition With Minimal Supervision

no code implementations 14 Sep 2016 Xingchao Peng, Kate Saenko

We present a novel approach to object classification and detection which requires minimal supervision and which combines visual texture cues and shape information learned from freely available unlabeled web search results.

General Classification Image Retrieval +1

Deep CORAL: Correlation Alignment for Deep Domain Adaptation

9 code implementations 6 Jul 2016 Baochen Sun, Kate Saenko

CORAL is a "frustratingly easy" unsupervised domain adaptation method that aligns the second-order statistics of the source and target distributions with a linear transformation.

Domain Generalization Unsupervised Domain Adaptation
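
The snippet above describes CORAL as aligning second-order statistics with a linear transformation. A minimal NumPy sketch of one way to do that follows, assuming the transformation is a whitening of the source features followed by re-coloring with the target covariance. The function names (`coral_align`, `_sym_matrix_power`) and the `eps` regularizer are illustrative assumptions, not taken from this listing or from the authors' code.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power
    through its eigendecomposition (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral_align(source, target, eps=1.0):
    """Linearly transform source features so that their covariance
    approximately matches the target covariance.

    source, target: arrays of shape (n_samples, n_features).
    eps: assumed regularizer added to each covariance diagonal so that
         the inverse square root is well defined.
    """
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    whiten = _sym_matrix_power(cov_s, -0.5)   # remove source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # impose target correlations
    return source @ whiten @ recolor
```

In this sketch only the covariances are matched; the feature means are left untouched, and a classifier would then be trained on the transformed source features with the source labels.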

Captioning Images with Diverse Objects

1 code implementation CVPR 2017 Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko

We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets.

Object Recognition

Fine-to-coarse Knowledge Transfer For Low-Res Image Classification

no code implementations 21 May 2016 Xingchao Peng, Judy Hoffman, Stella X. Yu, Kate Saenko

We address the difficult problem of distinguishing fine-grained object categories in low resolution images.

Classification General Classification +2

Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

3 code implementations EMNLP 2016 Subhashini Venugopalan, Lisa Anne Hendricks, Raymond Mooney, Kate Saenko

This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos.

Descriptive Language Modelling +1

High precision grasp pose detection in dense clutter

3 code implementations 4 Mar 2016 Marcus Gualtieri, Andreas ten Pas, Kate Saenko, Robert Platt

Our focus in this paper is on improving the second step by using depth sensor scans from large online datasets to train a convolutional neural network.


Sequence to Sequence - Video to Text

no code implementations ICCV 2015 Subhashini Venugopalan, Marcus Rohrbach, Jeffrey Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip.

Language Modelling

Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

no code implementations 23 Nov 2015 Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Pieter Abbeel, Sergey Levine, Kate Saenko, Trevor Darrell

We propose a novel, more powerful combination of both distribution and pairwise image alignment, and remove the requirement for expensive annotation by using weakly aligned pairs of images in the source and target domains.

Domain Adaptation

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

1 code implementation CVPR 2016 Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell

Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet.

Image Captioning Novel Concepts +2

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

1 code implementation 17 Nov 2015 Huijuan Xu, Kate Saenko

We propose a novel spatial attention architecture that aligns words with image patches in the first hop, and obtain improved results by adding a second attention hop which considers the whole question to choose visual evidence based on the results of the first hop.

Image Captioning Question Answering +1

Return of Frustratingly Easy Domain Adaptation

1 code implementation 17 Nov 2015 Baochen Sun, Jiashi Feng, Kate Saenko

Unlike human learning, machine learning often fails to handle changes between training (source) and test (target) input distributions.

BIG-bench Machine Learning Unsupervised Domain Adaptation

Natural Language Object Retrieval

1 code implementation CVPR 2016 Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we address the task of natural language object retrieval, to localize a target object within a given image based on a natural language query of the object.

Image Captioning Image Retrieval +3

Spatial Semantic Regularisation for Large Scale Object Detection

no code implementations ICCV 2015 Damian Mrowca, Marcus Rohrbach, Judy Hoffman, Ronghang Hu, Kate Saenko, Trevor Darrell

Our approach proves to be especially useful in large scale settings with thousands of classes, where spatial and semantic interactions are very frequent and only weakly supervised detectors can be built due to a lack of bounding box annotations.

Object Detection

Simultaneous Deep Transfer Across Domains and Tasks

1 code implementation ICCV 2015 Eric Tzeng, Judy Hoffman, Trevor Darrell, Kate Saenko

Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias.

Domain Adaptation

A Multi-scale Multiple Instance Video Description Network

no code implementations 21 May 2015 Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, Kate Saenko

Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (AlexNet, GoogLeNet) to extract a visual representation of the input video.

Image Segmentation Multiple Instance Learning +2

Sequence to Sequence -- Video to Text

4 code implementations 3 May 2015 Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip.

Language Modelling

What Do Deep CNNs Learn About Objects?

no code implementations 9 Apr 2015 Xingchao Peng, Baochen Sun, Karim Ali, Kate Saenko

Deep convolutional neural networks learn extremely powerful image representations, yet most of that power is hidden in the millions of deep-layer parameters.

Learning Deep Object Detectors from 3D Models

no code implementations ICCV 2015 Xingchao Peng, Baochen Sun, Karim Ali, Kate Saenko

Crowdsourced 3D CAD models are becoming easily accessible online, and can potentially generate an infinite number of training images for almost any object category. We show that augmenting the training data of contemporary Deep Convolutional Neural Net (DCNN) models with such synthetic data can be effective, especially when real training data is limited or not well matched to the target domain.

Deep Domain Confusion: Maximizing for Domain Invariance

7 code implementations 10 Dec 2014 Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, Trevor Darrell

Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias on a standard benchmark.

Domain Adaptation Model Selection

Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning

no code implementations CVPR 2015 Judy Hoffman, Deepak Pathak, Trevor Darrell, Kate Saenko

We develop methods for detector learning which exploit joint training over both weak and strong labels and which transfer learned perceptual representations from strongly-labeled auxiliary tasks.

Multiple Instance Learning Representation Learning +1