Search Results for author: Kate Saenko

Found 195 papers, 100 papers with code

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

1 code implementation 25 Jul 2024 Eunice Yiu, Maan Qraitem, Charlie Wong, Anisa Noor Majhi, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko

This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children.

Visual Analogies Visual Reasoning

Tell Me What's Next: Textual Foresight for Generic UI Representations

1 code implementation 12 Jun 2024 Andrea Burns, Kate Saenko, Bryan A. Plummer

Mobile app user interfaces (UIs) are rich with action, text, structure, and image content that can be utilized to learn generic UI representations for tasks like automating user commands, summarizing content, and evaluating the accessibility of user interfaces.

Representation Learning

SLANT: Spurious Logo ANalysis Toolkit

no code implementations 3 Jun 2024 Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

We uncover various seemingly harmless logos that VL models correlate 1) with negative human adjectives, 2) with the concept of 'harmlessness', causing models to misclassify harmful online content as harmless, and 3) with user-provided object concepts, causing lower recognition accuracy on ImageNet zero-shot classification.

Zero-Shot Learning

Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models

no code implementations 21 Apr 2024 Vitali Petsiuk, Kate Saenko

We use the compositional property of diffusion models, which allows multiple prompts to be leveraged in a single image generation.

ARC Image Generation +1

Koala: Key frame-conditioned long video-LLM

no code implementations CVPR 2024 Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.

Action Recognition Question Answering +2

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

1 code implementation 1 Feb 2024 Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

Furthermore, prior work's typographic attacks against CLIP randomly sample a misleading class from a predefined set of categories.

Descriptive

SynCDR: Training Cross Domain Retrieval Models with Synthetic Data

1 code implementation 31 Dec 2023 Samarth Mishra, Carlos D. Castillo, Hongcheng Wang, Kate Saenko, Venkatesh Saligrama

In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains.

Retrieval Translation

CLAMP: Contrastive LAnguage Model Prompt-tuning

no code implementations 4 Dec 2023 Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim

Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.

Contrastive Learning Image Captioning +5

Learning to Compose SuperWeights for Neural Parameter Allocation Search

1 code implementation 3 Dec 2023 Piotr Teterwak, Soren Nelson, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

To address this, we generate layer weights by learning to compose sets of SuperWeights, which represent a group of trainable parameters.

Lasagna: Layered Score Distillation for Disentangled Object Relighting

1 code implementation 30 Nov 2023 Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Jianming Zhang, Ranjay Krishna, Kate Saenko

Although generative editing methods now enable some forms of image editing, relighting is still beyond today's capabilities; existing methods struggle to keep other aspects of the image -- colors, shapes, and textures -- consistent after the edit.

Colorization Object +1

Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement

no code implementations 29 Oct 2023 Ping Hu, Simon Niklaus, Lu Zhang, Stan Sclaroff, Kate Saenko

In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently.

Computational Efficiency Motion Estimation +1

Socratis: Are large multimodal models emotionally aware?

no code implementations 31 Aug 2023 Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko

We further see that current captioning metrics based on large vision-language models also fail to correlate with human preferences.

Label Budget Allocation in Multi-Task Learning

no code implementations 24 Aug 2023 Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian

How should the label budget (i.e., the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance?

Multi-Task Learning

From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition

1 code implementation 8 Aug 2023 Maan Qraitem, Kate Saenko, Bryan A. Plummer

By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data and thus avoids the issue of bias toward the pair $(B, G)$.

DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

no code implementations 3 Aug 2023 Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko

Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations.

Multi-Label Image Recognition

Multiscale Video Pretraining for Long-Term Activity Forecasting

no code implementations 24 Jul 2023 Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani

To alleviate this issue, we propose Multiscale Video Pretraining (MVP), a novel self-supervised pretraining approach that learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales.

Action Anticipation Long Term Action Anticipation

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

no code implementations 30 Jun 2023 Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz

We hypothesize that this power to ignore out-of-context information (which we name patch selectivity), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion.

Data Augmentation Inductive Bias

COLA: A Benchmark for Compositional Text-to-image Retrieval

1 code implementation NeurIPS 2023 Arijit Ray, Filip Radenovic, Abhimanyu Dubey, Bryan A. Plummer, Ranjay Krishna, Kate Saenko

To solve Cola, a model must retrieve images with the correct configuration of attributes and objects and avoid choosing a distractor image with the same objects and attributes but in the wrong configuration.

Attribute CoLA +2

ERM++: An Improved Baseline for Domain Generalization

2 code implementations 4 Apr 2023 Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer

We also explore the relationship between DG performance and similarity to pre-training data, and find that similarity to pre-training data distributions is an important driver of performance, but that ERM++ with stronger initializations can deliver strong performance even on dissimilar datasets. Code is released at https://github.com/piotr-teterwak/erm_plusplus.

Domain Generalization

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations 31 Mar 2023 Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

We transfer the knowledge from the pre-trained CLIP-ViT-L/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences.

Image Classification

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

no code implementations CVPR 2023 Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.

Audio Source Separation Natural Language Queries

Mind the Backbone: Minimizing Backbone Distortion for Robust Object Detection

1 code implementation 26 Mar 2023 Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Rogerio Feris, Kate Saenko

We propose to use Relative Gradient Norm (RGN) as a way to measure the vulnerability of a backbone to feature distortion, and show that high RGN is indeed correlated with lower OOD performance.

object-detection Robust Object Detection

MaskSketch: Unpaired Structure-guided Masked Image Generation

2 code implementations CVPR 2023 Dina Bashkirova, Jose Lezama, Kihyuk Sohn, Kate Saenko, Irfan Essa

We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation.

Conditional Image Generation Diversity +3

Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

1 code implementation CVPR 2023 Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.

Attribute Retrieval +2

Anchored Learning for On-the-Fly Adaptation -- Extended Technical Report

1 code implementation 17 Jan 2023 Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko, Renato Mancuso

This study presents "anchor critics", a novel strategy for enhancing the robustness of reinforcement learning (RL) agents in crossing the sim-to-real gap.

Reinforcement Learning (RL)

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations ICCV 2023 Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences.

Image Classification

Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing

no code implementations 29 Nov 2022 Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff

One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations.

counterfactual Object

Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation

1 code implementation 27 Nov 2022 Kaihong Wang, Donghyun Kim, Rogerio Feris, Kate Saenko, Margrit Betke

We propose to perform adaptation on attention maps with cross-domain attention layers that share features between the source and the target domains.

Semantic Segmentation Unsupervised Domain Adaptation

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

no code implementations 22 Nov 2022 Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori

Examples include asking a model to generate a varying number of the same object to measure its ability to count, and providing a text prompt with several objects that each have a different attribute to test its ability to match objects and attributes correctly.

Attribute Text-to-Image Generation

Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

1 code implementation CVPR 2023 Maan Qraitem, Kate Saenko, Bryan A. Plummer

Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution per epoch without repeating samples.

FETA: Towards Specializing Foundation Models for Expert Task Applications

1 code implementation 8 Sep 2022 Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, Peter W. J. Staar, Rogerio Feris, Leonid Karlinsky

However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g., retrieval of technical illustrations from car manuals using language queries), whose data is either unseen or belongs to a long-tail part of the data distribution of the huge datasets used for FM pre-training.

Domain Generalization Image Retrieval +7

NewsStories: Illustrating articles with visual summaries

1 code implementation 26 Jul 2022 Reuben Tan, Bryan A. Plummer, Kate Saenko, JP Lewis, Avneesh Sud, Thomas Leung

Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images.

Retrieval

Prefix Conditioning Unifies Language and Label Supervision

no code implementations CVPR 2023 Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.

Classification Contrastive Learning +2

Temporal Relevance Analysis for Video Action Models

no code implementations 25 Apr 2022 Quanfu Fan, Donghyun Kim, Chun-Fu Chen, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal

In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature.

Action Recognition

Many-to-many Splatting for Efficient Video Frame Interpolation

1 code implementation CVPR 2022 Ping Hu, Simon Niklaus, Stan Sclaroff, Kate Saenko

Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant.

Motion Estimation Optical Flow Estimation +1

A Unified Framework for Domain Adaptive Pose Estimation

1 code implementation 1 Apr 2022 Donghyun Kim, Kaihong Wang, Kate Saenko, Margrit Betke, Stan Sclaroff

In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision.

2D Pose Estimation Animal Pose Estimation +2

A Broad Study of Pre-training for Domain Generalization and Adaptation

1 code implementation 22 Mar 2022 Donghyun Kim, Kaihong Wang, Stan Sclaroff, Kate Saenko

In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, namely: network architectures, size, pre-training loss, and datasets.

Domain Generalization

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

1 code implementation 10 Feb 2022 Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi

We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents.

Visual Abductive Reasoning Visual Reasoning

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

1 code implementation 4 Feb 2022 Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.

Common Sense Reasoning Question Answering +1

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

1 code implementation 29 Jan 2022 Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.

counterfactual Decision Making +3

Extending the WILDS Benchmark for Unsupervised Adaptation

1 code implementation ICLR 2022 Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang

Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well.

Unsupervised Domain Generalization by Learning a Bridge Across Domains

1 code implementation CVPR 2022 Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.

Domain Generalization Self-Supervised Learning

Learning to Detect Every Thing in an Open World

no code implementations 3 Dec 2021 Kuniaki Saito, Ping Hu, Trevor Darrell, Kate Saenko

LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO, as well as cross-dataset evaluation on UVO and Cityscapes.

Data Augmentation object-detection +3

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations NeurIPS 2021 Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

Disentangled Unsupervised Image Translation via Restricted Information Flow

no code implementations 26 Nov 2021 Ben Usman, Dina Bashkirova, Kate Saenko

Unsupervised image-to-image translation methods aim to map images from one domain into plausible examples from another domain while preserving structures shared across two domains.

Attribute Translation +1

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

no code implementations NeurIPS 2021 Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

Unsupervised domain adaptation, which aims to adapt models trained on a labeled source domain to a completely unlabeled target domain, has attracted much attention in recent years.

Contrastive Learning Unsupervised Domain Adaptation +1

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations 20 Oct 2021 Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

Multi-Critic Actor Learning: Teaching RL Policies to Act with Style

no code implementations ICLR 2022 Siddharth Mysore, George Cheng, Yunqi Zhao, Kate Saenko, Meng Wu

MultiCriticAL is tested in the context of multi-style learning, a special case of MTRL where agents are trained to behave with different distinct behavior styles, and yields up to 45% performance gains over the single-critic baselines and even successfully learns behavior styles in cases where single-critic approaches may simply fail to learn.

MixtureEnsembles: Leveraging Parameter Sharing for Efficient Ensembles

no code implementations 29 Sep 2021 Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

We improve on these methods with MixtureEnsembles, which learns to factorize ensemble members with shared parameters by constructing each layer with a linear combination of templates.
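The sentence above describes building each ensemble member's layer as a linear combination of shared templates. Below is a minimal NumPy sketch of that construction for a single fully connected layer, assuming one template bank per layer and one coefficient vector per member. The sizes, the names `templates`, `coeffs`, and `member_weights`, and the averaging of member outputs are illustrative assumptions, not the paper's code, and training of the templates and coefficients is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for one fully connected layer.
num_templates = 2   # k templates shared by all members
num_members = 8     # m ensemble members
out_features, in_features = 8, 16

# Shared parameters: one bank of k weight templates for this layer.
templates = rng.standard_normal((num_templates, out_features, in_features))

# Per-member parameters: only k mixing coefficients per member.
coeffs = rng.standard_normal((num_members, num_templates))


def member_weights(templates, coeffs):
    """Weight matrix of each member as a linear combination of the templates.

    Returns an array of shape (num_members, out_features, in_features).
    """
    return np.einsum("mk,koi->moi", coeffs, templates)


weights = member_weights(templates, coeffs)

# Forward pass of this layer for every member, then average the members.
x = rng.standard_normal(in_features)
member_outputs = weights @ x               # shape (num_members, out_features)
ensemble_out = member_outputs.mean(axis=0)

shared_params = templates.size + coeffs.size
independent_params = num_members * out_features * in_features
print(shared_params, "parameters vs", independent_params, "for independent members")
```

With these sizes the factorized layer stores 272 parameters instead of 1,024, because members differ only through their mixing coefficients.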

Pyramid Mini-Batching for Optimal Transport

no code implementations 29 Sep 2021 Devin Guillory, Kuniaki Saito, Eric Tzeng, Yannik Pitcan, Kate Saenko, Trevor Darrell

Optimal transport theory provides a useful tool to measure the differences between two distributions.

Domain Adaptation

Dynamic Network Quantization for Efficient Video Inference

1 code implementation ICCV 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.

Quantization Video Recognition

MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision

1 code implementation CVPR 2022 Ben Usman, Andrea Tagliasacchi, Kate Saenko, Avneesh Sud

In the era of deep learning, human pose estimation from multiple cameras with unknown calibration has received little attention to date.

Weakly-supervised 3D Human Pose Estimation

ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes

1 code implementation CVPR 2022 Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, Kate Saenko

Recyclable waste detection poses a unique computer vision challenge as it requires detection of highly deformable and often translucent objects in cluttered scenes without the kind of context information usually present in human-centric datasets.

Object object-detection +5

OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers

1 code implementation 28 May 2021 Kuniaki Saito, Donghyun Kim, Kate Saenko

OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.

Novelty Detection Outlier Detection

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

1 code implementation ICCV 2021 Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.

Video Recognition

Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

1 code implementation 17 Apr 2021 Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction.

Common Sense Reasoning Question Answering

OVANet: One-vs-All Network for Universal Domain Adaptation

2 code implementations ICCV 2021 Kuniaki Saito, Kate Saenko

In this paper, we propose a method to learn the threshold using source samples and to adapt it to the target domain.

Universal Domain Adaptation

Evaluation of Correctness in Unsupervised Many-to-Many Image Translation

1 code implementation 29 Mar 2021 Dina Bashkirova, Ben Usman, Kate Saenko

Given an input image from a source domain and a guidance image from a target domain, unsupervised many-to-many image-to-image (UMMI2I) translation methods seek to generate a plausible example from the target domain that preserves domain-invariant information of the input source image and inherits the domain-specific information from the guidance image.

Translation

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

no code implementations 2 Mar 2021 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.

Image Classification Quantization +2
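The dynamic block swapping step described above can be sketched in plain Python. The sketch assumes each student block is swapped independently with a fixed probability, and it uses toy scalar "blocks" with a 2-bit student and an 8-bit teacher. `make_block`, `swap_blocks`, `swap_prob`, and the bit widths are hypothetical, and the knowledge-transfer loss that would be computed on the hybrid network is left out.

```python
import random


def make_block(weight, bits):
    """Toy 'block': scale the input, clip to [-1, 1], and quantize to `bits` bits.

    Hypothetical stand-in for a network block run at a given precision.
    """
    levels = 2 ** bits - 1

    def block(x):
        y = max(-1.0, min(1.0, weight * x))
        # Map [-1, 1] onto `levels + 1` evenly spaced values.
        return round((y + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

    block.bits = bits
    return block


def swap_blocks(student_blocks, teacher_blocks, swap_prob, rng):
    """Replace each student block, independently with probability `swap_prob`,
    by the teacher block at the same position."""
    if len(student_blocks) != len(teacher_blocks):
        raise ValueError("student and teacher need the same number of blocks")
    return [
        teacher if rng.random() < swap_prob else student
        for student, teacher in zip(student_blocks, teacher_blocks)
    ]


def forward(blocks, x):
    # Run the blocks in sequence, like a plain feed-forward stack.
    for block in blocks:
        x = block(x)
    return x


weights = [0.9, 1.1, 0.8, 1.2]
student = [make_block(w, bits=2) for w in weights]  # lower precision
teacher = [make_block(w, bits=8) for w in weights]  # higher precision

hybrid = swap_blocks(student, teacher, swap_prob=0.5, rng=random.Random(0))
print([b.bits for b in hybrid], forward(hybrid, 0.3))
```

Because the student and teacher share the block layout, any mix of positions yields a valid network; `swap_prob` controls how much of the higher-precision path each training step sees.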

Honey, I Shrunk The Actor: A Case Study on Preserving Performance with Smaller Actors in Actor-Critic RL

no code implementations 23 Feb 2021 Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko

Actors and critics in actor-critic reinforcement learning algorithms are functionally separate, yet they often use the same network architectures.

Reinforcement Learning (RL)

VA-RED$^2$: Video Adaptive Redundancy Reduction

no code implementations ICLR 2021 Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both.

Semi-Supervised Action Recognition with Temporal Contrastive Learning

1 code implementation CVPR 2021 Ankit Singh, Omprakash Chakraborty, Ashutosh Varshney, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

We approach this problem by learning a two-pathway temporal contrastive model using unlabeled videos at two different speeds leveraging the fact that changing video speed does not change an action.

Action Recognition Contrastive Learning
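The observation that changing playback speed does not change the action can serve as a training signal. Sample the same video at two speeds and pull the two embeddings together. The sketch below pairs a frame sampler with a standard InfoNCE contrastive loss in NumPy. InfoNCE, the strides, the temperature, and all function names are assumptions for illustration, not necessarily the paper's exact objective, and the two pathways' encoders are replaced by fixed toy embeddings.

```python
import numpy as np


def sample_clip(num_frames, clip_len, stride, start=0):
    """Frame indices for a clip of `clip_len` frames taken every `stride` frames.

    stride=1 keeps the original speed; stride=2 plays the video twice as fast.
    """
    idx = start + stride * np.arange(clip_len)
    if idx[-1] >= num_frames:
        raise ValueError("video too short for this clip length and stride")
    return idx


def info_nce(z_a, z_b, temperature=0.1):
    """Row i of z_a and row i of z_b come from the same video (positive pair);
    every other row in the batch acts as a negative."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (batch, batch) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


normal_speed = sample_clip(num_frames=64, clip_len=8, stride=1)  # frames 0..7
double_speed = sample_clip(num_frames=64, clip_len=8, stride=2)  # frames 0, 2, ..., 14

# Toy embeddings for a batch of 4 videos, one row per video and pathway.
emb_normal = np.eye(4)
print(info_nce(emb_normal, np.eye(4)))        # low: each video matches itself
print(info_nce(emb_normal, np.eye(4)[::-1]))  # high: positives are mismatched
```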

Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency

1 code implementation 29 Jan 2021 Samarth Mishra, Kate Saenko, Venkatesh Saligrama

With our Pretraining and Consistency (PAC) approach, we achieve state-of-the-art target accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets.

Semi-supervised Domain Adaptation Unsupervised Domain Adaptation

CDS: Cross-Domain Self-Supervised Pre-Training

no code implementations ICCV 2021 Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.

Domain Adaptation Transfer Learning

Regularizing Action Policies for Smooth Control with Reinforcement Learning

no code implementations 11 Dec 2020 Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko

A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies.

Deep Reinforcement Learning reinforcement-learning +1

Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation

no code implementations 6 Dec 2020 Aadarsh Sahoo, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

Partial domain adaptation, which assumes that the unknown target label space is a subset of the source label space, has attracted much attention in computer vision.

Partial Domain Adaptation

Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation

no code implementations NeurIPS 2020 Ping Hu, Stan Sclaroff, Kate Saenko

Recently, most ZSS methods focus on learning the visual-semantic correspondence to transfer knowledge from seen classes to unseen classes at the pixel level.

Semantic correspondence Semantic Segmentation +1

Temporal Action Detection with Multi-level Supervision

no code implementations ICCV 2021 Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection Semi-Supervised Action Detection

Auxiliary Task Reweighting for Minimum-data Learning

no code implementations NeurIPS 2020 Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.

Domain Adaptation Multi-Label Classification

Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News

1 code implementation EMNLP 2020 Reuben Tan, Bryan A. Plummer, Kate Saenko

In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.

Self-supervised Visual Attribute Learning for Fashion Compatibility

no code implementations 1 Aug 2020 Donghyun Kim, Kuniaki Saito, Samarth Mishra, Stan Sclaroff, Kate Saenko, Bryan A. Plummer

Our approach consists of three self-supervised tasks, each designed to capture a different concept neglected in prior work; we can select among them depending on the needs of our downstream tasks.

Attribute Object Recognition +3

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

1 code implementation ECCV 2020 Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.

Action Recognition

Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation

1 code implementation ECCV 2020 Xingchao Peng, Yichen Li, Kate Saenko

Extensive experiments are conducted to demonstrate the power of our new datasets in benchmarking state-of-the-art multi-source domain adaptation methods, as well as the advantage of our proposed model.

Benchmarking Disentanglement +2

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

1 code implementation ECCV 2020 Kuniaki Saito, Kate Saenko, Ming-Yu Liu

Unsupervised image-to-image translation intends to learn a mapping of an image in a given domain to an analogous image in a different domain, without explicit supervision of the mapping.

Translation Unsupervised Image-To-Image Translation

Real-time Semantic Segmentation with Fast Attention

1 code implementation 7 Jul 2020 Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.

Real-Time Semantic Segmentation Segmentation
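The "changing the order of operations" trick works because matrix multiplication is associative once the softmax is removed. Below is a NumPy sketch. It assumes, as a simplification, that queries and keys are L2-normalized so their dot products are cosine similarities, and that the result is scaled by 1/n. The paper's exact normalization may differ, and the function names are hypothetical. Computing `k.T @ v` first costs O(n d^2) instead of O(n^2 d) and never materializes the n x n affinity matrix.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    # Row-wise L2 normalization, so q @ k.T holds cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def attention_quadratic(q, k, v):
    # n x n affinity matrix first: O(n^2 * d) time and O(n^2) memory.
    n = q.shape[0]
    return (q @ k.T) @ v / n


def attention_linear(q, k, v):
    # d x d summary first: O(n * d^2) time, no n x n matrix is ever formed.
    n = q.shape[0]
    return q @ (k.T @ v) / n


rng = np.random.default_rng(0)
n, d = 1024, 32  # n spatial positions (e.g. a 32x32 feature map), d channels
q = l2_normalize(rng.standard_normal((n, d)))
k = l2_normalize(rng.standard_normal((n, d)))
v = rng.standard_normal((n, d))

out_a = attention_quadratic(q, k, v)
out_b = attention_linear(q, k, v)
print(np.allclose(out_a, out_b))
```

For a feature map with many more positions than channels (n much larger than d), the reordered product is the cheaper one, which matters for real-time segmentation at high resolution.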

Neural Parameter Allocation Search

1 code implementation ICLR 2022 Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko

We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.

Image Classification Phrase Grounding

Learning visual servo policies via planner cloning

no code implementations 24 May 2020 Ulrich Viereck, Kate Saenko, Robert Platt

Learning control policies for visual servoing in novel environments is an important problem.

Learning to Scale Multilingual Representations for Vision-Language Tasks

no code implementations ECCV 2020 Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer

Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added.

Language Modelling Machine Translation +4

Spatio-Temporal Action Detection with Multi-Object Interaction

no code implementations 1 Apr 2020 Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".

Action Detection Human Detection +2

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations 31 Mar 2020 Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection Activity Detection +1

Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment

1 code implementation NeurIPS 2020 Ben Usman, Avneesh Sud, Nick Dufour, Kate Saenko

We show that, under certain assumptions, this combination yields a deep neural likelihood-based minimization objective that attains a known lower bound upon convergence.

Domain Adaptation Translation +1

Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels

no code implementations 18 Mar 2020 Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.

Self-Supervised Learning Unsupervised Domain Adaptation

Universal Domain Adaptation through Self Supervision

1 code implementation NeurIPS 2020 Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko

While some methods address target settings with either partial or open-set categories, they assume that the particular setting is known a priori.

Clustering Partial Domain Adaptation +2

A Broader Study of Cross-Domain Few-Shot Learning

2 code implementations ECCV 2020 Yunhui Guo, Noel C. Codella, Leonid Karlinsky, James V. Codella, John R. Smith, Kate Saenko, Tajana Rosing, Rogerio Feris

Extensive experiments on the proposed benchmark are performed to evaluate state-of-the-art meta-learning approaches, transfer learning approaches, and newer methods for cross-domain few-shot learning.

cross-domain few-shot learning Few-Shot Image Classification +1

Federated Adversarial Domain Adaptation

no code implementations ICLR 2020 Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko

In this work, we present a principled approach to the problem of federated domain adaptation, which aims to align the representations learned among the different nodes with the data distribution of the target node.

Disentanglement Domain Adaptation +4

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval

no code implementations 27 Sep 2019 Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.

Moment Retrieval Retrieval

Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment

no code implementations 25 Sep 2019 Shuhan Tan, Xingchao Peng, Kate Saenko

In this paper, we explore the task of Generalized Domain Adaptation (GDA): How to transfer knowledge across different domains in the presence of both covariate and label shift?

Domain Adaptation Transfer Learning

wMAN: Weakly-Supervised Moment Alignment Network for Text-Based Video Segment Retrieval

no code implementations 25 Sep 2019 Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.

Moment Retrieval Retrieval +1

MULE: Multimodal Universal Language Embedding

no code implementations 8 Sep 2019 Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.

Data Augmentation Machine Translation +3

Learning Similarity Conditions Without Explicit Supervision

1 code implementation ICCV 2019 Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer

Many real-world tasks require models to compare images along multiple similarity conditions (e.g., similarity in color, category, or shape).

Triplet

Adversarial Self-Defense for Cycle-Consistent GANs

1 code implementation NeurIPS 2019 Dina Bashkirova, Ben Usman, Kate Saenko

The goal of unsupervised image-to-image translation is to map images from one domain to another without the ground truth correspondence between the two domains.

Adversarial Attack Translation +1

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation

no code implementations ACL 2019 Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko

The actual grounding can connect language to the environment through multiple modalities, e.g., "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route.

Vision and Language Navigation

Language-Conditioned Graph Networks for Relational Reasoning

1 code implementation ICCV 2019 Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko

For example, conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can be easily consumed by a simple classifier for answer prediction.

Object Referring Expression Comprehension +2

Domain Agnostic Learning with Disentangled Representations

1 code implementation 28 Apr 2019 Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko

Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains.

General Classification Image Classification +1

Semi-supervised Domain Adaptation via Minimax Entropy

4 code implementations ICCV 2019 Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.

Domain Adaptation Semi-supervised Domain Adaptation

Cross-Domain Image Manipulation by Demonstration

no code implementations 28 Jan 2019 Ben Usman, Nick Dufour, Kate Saenko, Chris Bregler

In this work we propose a model that can manipulate individual visual attributes of objects in a real scene using examples of how respective attribute manipulations affect the output of a simulation.

Attribute Image Manipulation

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations 25 Dec 2018 Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection Activity Detection

TwoStreamVAN: Improving Motion Modeling in Video Generation

1 code implementation 3 Dec 2018 Ximeng Sun, Huijuan Xu, Kate Saenko

Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.

Motion Generation Video Generation

SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection

no code implementations 3 Dec 2018 Eric Tzeng, Kaylee Burns, Kate Saenko, Trevor Darrell

Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment.

Domain Adaptation Pseudo Label +1
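CycleGAN alignment learns a mapping G_AB from one domain to the other together with an inverse G_BA, and penalises any round trip that fails to return the input. The sketch below shows only that cycle-consistency term (CycleGAN's adversarial terms are omitted). Images are stood in for by plain lists of floats, and the generators passed in are hypothetical placeholders, not SPLAT's networks.

```python
from typing import Callable, Sequence

Vector = list[float]

def cycle_consistency_loss(
    xs_a: Sequence[Vector],
    xs_b: Sequence[Vector],
    g_ab: Callable[[Vector], Vector],
    g_ba: Callable[[Vector], Vector],
) -> float:
    """Mean L1 round-trip error in both directions: A->B->A and B->A->B."""
    def l1(u: Vector, v: Vector) -> float:
        # Mean absolute difference between a reconstruction and its input.
        return sum(abs(p - q) for p, q in zip(u, v)) / len(u)

    forward = [l1(g_ba(g_ab(x)), x) for x in xs_a]
    backward = [l1(g_ab(g_ba(y)), y) for y in xs_b]
    terms = forward + backward
    return sum(terms) / len(terms)
```

If g_ba exactly inverts g_ab (for example, doubling and halving every value), the loss is zero. Any mismatch between the two directions shows up as a positive penalty, which is what pushes the two learned transforms toward being inverses of each other.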

Revisiting Image-Language Networks for Open-ended Phrase Detection

3 code implementations 17 Nov 2018 Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko

Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.

object-detection Object Detection +1

Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning

no code implementations CVPR 2018 Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, Kate Saenko

We present the Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on learning driver behavior in real-life environments.

Scene Understanding

Exploiting Environmental Variation to Improve Policy Robustness in Reinforcement Learning

no code implementations 27 Sep 2018 Siddharth Mysore, Robert Platt, Kate Saenko

We propose a novel method to exploit this observation to develop robust actor policies, by automatically developing a sampling curriculum over environment settings to use in training.

reinforcement-learning Reinforcement Learning +1

Object Hallucination in Image Captioning

1 code implementation EMNLP 2018 Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene.

Hallucination Image Captioning +3
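One way to quantify such hallucination is to compare the objects a caption mentions with the objects annotated in the image. The paper introduces the CHAIR metric along these lines. The sketch below computes a simplified per-object rate (CHAIR_i) and a per-caption rate (CHAIR_s). It assumes object mentions have already been extracted and mapped to ground-truth category names, and it omits the synonym mapping the real metric relies on.

```python
from typing import Sequence

def chair_scores(
    caption_objects: Sequence[set[str]],
    image_objects: Sequence[set[str]],
) -> tuple[float, float]:
    """Simplified CHAIR-style hallucination rates.

    caption_objects[i]: objects mentioned in caption i
    image_objects[i]:   objects actually present in image i
    Returns (fraction of mentioned objects that are hallucinated,
             fraction of captions containing at least one hallucination).
    """
    mentioned = hallucinated = bad_captions = 0
    for said, present in zip(caption_objects, image_objects):
        fake = said - present  # mentioned but not in the image
        mentioned += len(said)
        hallucinated += len(fake)
        bad_captions += bool(fake)
    chair_i = hallucinated / mentioned if mentioned else 0.0
    chair_s = bad_captions / len(caption_objects) if caption_objects else 0.0
    return chair_i, chair_s
```

Lower is better for both numbers. The per-caption rate is the stricter view, because a single invented object makes the whole caption count as hallucinated.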

Adapting control policies from simulation to reality using a pairwise loss

no code implementations 27 Jul 2018 Ulrich Viereck, Xingchao Peng, Kate Saenko, Robert Platt

This paper proposes an approach to domain transfer based on a pairwise loss function that helps transfer control policies learned in simulation onto a real robot.

Explainable Neural Computation via Stack Neural Module Networks

1 code implementation ECCV 2018 Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko

In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.

Decision Making Question Answering +1

Syn2Real: A New Benchmark for Synthetic-to-Real Visual Domain Adaptation

no code implementations 26 Jun 2018 Xingchao Peng, Ben Usman, Kuniaki Saito, Neela Kaushik, Judy Hoffman, Kate Saenko

In this paper, we present a new large-scale benchmark called Syn2Real, which consists of a synthetic domain rendered from 3D object models and two real-image domains containing the same object categories.

Classification Diversity +6

RISE: Randomized Input Sampling for Explanation of Black-box Models

12 code implementations 19 Jun 2018 Vitali Petsiuk, Abir Das, Kate Saenko

We compare our approach to state-of-the-art importance extraction methods using both an automatic deletion/insertion metric and a pointing metric based on human-annotated object segments.

Explainable Artificial Intelligence (XAI) Feature Importance +5
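The deletion metric removes pixels in order of decreasing importance and tracks how quickly the model's class probability falls. A good explanation makes the probability fall fast, which gives a low area under the curve. Insertion is the mirror image: pixels are added to an empty canvas. The sketch below works on a flattened image. It assumes that removing a pixel means setting it to a constant `baseline`, and that `score_fn` is any black-box probability function. Both are illustrative choices, not necessarily the paper's exact protocol.

```python
from typing import Callable, Sequence

def deletion_auc(
    pixels: Sequence[float],
    saliency: Sequence[float],
    score_fn: Callable[[list[float]], float],
    step: int = 1,
    baseline: float = 0.0,
) -> float:
    """Area under the deletion curve (lower means a better explanation).

    Pixels are set to `baseline` in order of decreasing saliency, `step` at a
    time, and the black-box score is recorded after each removal. The curve is
    integrated with the trapezoid rule over an x-axis scaled to [0, 1].
    """
    image = list(pixels)
    order = sorted(range(len(image)), key=lambda i: saliency[i], reverse=True)
    scores = [score_fn(image)]
    for start in range(0, len(order), step):
        for i in order[start:start + step]:
            image[i] = baseline
        scores.append(score_fn(image))
    n = len(scores) - 1
    if n == 0:  # empty image: nothing to delete
        return scores[0]
    return sum((scores[j] + scores[j + 1]) / 2 for j in range(n)) / n
```

Take a toy model whose score is the mean of the first two pixels of `[1, 1, 0, 0]`. A saliency map that ranks those two pixels highest gets a deletion AUC of 0.25. One that ranks them last gets 0.75.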

Unsupervised Video-to-Video Translation

1 code implementation ICLR 2019 Dina Bashkirova, Ben Usman, Kate Saenko

Unsupervised image-to-image translation is a recently proposed task of translating an image to a different style or domain given only unpaired image examples at training time.

Translation Unsupervised Image-To-Image Translation

Speaker-Follower Models for Vision-and-Language Navigation

1 code implementation NeurIPS 2018 Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell

We use this speaker model (1) to synthesize new instructions for data augmentation and (2) to implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.

Data Augmentation Vision and Language Navigation
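Pragmatic reasoning of this kind can be read as reranking. The follower proposes candidate routes, and the speaker scores how likely each route is to produce the given instruction. The sketch below mixes the two log-probabilities linearly with a hypothetical `weight`. The `Candidate` type, its field names and the exact mixing rule are assumptions for illustration, not the paper's code.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    route: str               # hypothetical identifier for an action sequence
    follower_logprob: float  # log P_follower(route | instruction)
    speaker_logprob: float   # log P_speaker(instruction | route)

def pragmatic_rerank(candidates: list[Candidate], weight: float = 0.5) -> list[Candidate]:
    """Sort candidates best-first by a weighted mix of speaker and follower scores.

    weight = 0 trusts only the follower; weight = 1 trusts only the speaker.
    """
    def score(c: Candidate) -> float:
        return weight * c.speaker_logprob + (1 - weight) * c.follower_logprob
    return sorted(candidates, key=score, reverse=True)
```

The speaker term lets the system reject a route the follower likes but that would poorly explain the instruction. For example, a route that never reaches "the door" gets a low speaker score for "stop at the door".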

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation 13 Apr 2018 Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.

Retrieval Sentence

Women also Snowboard: Overcoming Bias in Captioning Models

2 code implementations ECCV 2018 Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach

We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present.

Image Captioning
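The "equal gender probability" requirement can be expressed as a penalty on the gap between the probability mass a captioner puts on female and on male words at the gender-word position. The sketch below assumes a dictionary of next-word probabilities and uses hypothetical word lists. In training, a term like this would be applied only to images whose gender evidence has been masked out.

```python
from collections.abc import Iterable, Mapping

def appearance_confusion(
    word_probs: Mapping[str, float],
    woman_words: Iterable[str],
    man_words: Iterable[str],
) -> float:
    """Absolute gap between total female-word and male-word probability.

    Zero when the model is equally unsure about gender, which is the desired
    behaviour when gender evidence is occluded.
    """
    p_woman = sum(word_probs.get(w, 0.0) for w in woman_words)
    p_man = sum(word_probs.get(w, 0.0) for w in man_words)
    return abs(p_woman - p_man)
```

A model that predicts "woman" and "man" with equal probability on a masked image incurs no penalty. One that leans toward either word is pushed back toward balance.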

Joint Event Detection and Description in Continuous Video Streams

1 code implementation 28 Feb 2018 Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.

Dense Captioning Dense Video Captioning +2

Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

no code implementations 28 Jan 2018 Yancheng Bai, Huijuan Xu, Kate Saenko, Bernard Ghanem

In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection.

Action Detection Activity Detection

Learning Multi-Level Hierarchies with Hindsight

4 code implementations 4 Dec 2017 Andrew Levy, George Konidaris, Robert Platt, Kate Saenko

Hierarchical agents have the potential to solve sequential decision making tasks with greater sample efficiency than their non-hierarchical counterparts because hierarchical agents can break down tasks into sets of subtasks that only require short sequences of decisions.

Decision Making Hierarchical Reinforcement Learning +2

Adversarial Dropout Regularization

no code implementations ICLR 2018 Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko

However, a drawback of this approach is that the critic simply labels the generated features as in-domain or not, without considering the boundaries between classes.

General Classification Image Classification +2

VisDA: The Visual Domain Adaptation Challenge

2 code implementations 18 Oct 2017 Xingchao Peng, Ben Usman, Neela Kaushik, Judy Hoffman, Dequan Wang, Kate Saenko

We present the 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains.

General Classification Image Classification +3

Stable Distribution Alignment Using the Dual of the Adversarial Distance

no code implementations ICLR 2018 Ben Usman, Kate Saenko, Brian Kulis

Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions.

Domain Adaptation

Grasp Pose Detection in Point Clouds

1 code implementation 29 Jun 2017 Andreas ten Pas, Marcus Gualtieri, Kate Saenko, Robert Platt

Many grasp detection methods achieve grasp success rates (grasp successes as a fraction of the total number of grasp attempts) between 75% and 95% for novel objects presented in isolation or in light clutter.

Robotics

Learning a visuomotor controller for real world robotic grasping using simulated depth images

no code implementations 14 Jun 2017 Ulrich Viereck, Andreas ten Pas, Kate Saenko, Robert Platt

This paper proposes an approach to learning a closed-loop controller for robotic grasping that dynamically guides the gripper to the object.

Robotic Grasping

Adversarial Discriminative Domain Adaptation

20 code implementations CVPR 2017 Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell

Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains.