Search Results for author: Kate Saenko

Found 187 papers, 95 papers with code

MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision

1 code implementation • CVPR 2022 • Ben Usman, Andrea Tagliasacchi, Kate Saenko, Avneesh Sud

In the era of deep learning, human pose estimation from multiple cameras with unknown calibration has received little attention to date.

Ranked #1 on 3D Human Pose Estimation on SkiPose

Weakly-supervised 3D Human Pose Estimation

32,735

Paper
Code

MaskSketch: Unpaired Structure-guided Masked Image Generation

2 code implementations • CVPR 2023 • Dina Bashkirova, Jose Lezama, Kihyuk Sohn, Kate Saenko, Irfan Essa

We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation.

Conditional Image Generation Image-to-Image Translation +2

27,729

Paper
Code

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

1 code implementation • ECCV 2020 • Kuniaki Saito, Kate Saenko, Ming-Yu Liu

Unsupervised image-to-image translation intends to learn a mapping of an image in a given domain to an analogous image in a different domain, without explicit supervision of the mapping.

Translation Unsupervised Image-To-Image Translation

3,937

Paper
Code

CyCADA: Cycle-Consistent Adversarial Domain Adaptation

3 code implementations • ICML 2018 • Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei A. Efros, Trevor Darrell

Domain adaptation is critical for success in new, unseen environments.

Ranked #1 on Unsupervised Image-To-Image Translation on SVHN-to-MNIST

Domain Adaptation Semantic Segmentation +2

3,124

Paper
Code

Adversarial Discriminative Domain Adaptation

20 code implementations • CVPR 2017 • Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell

Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains.

Ranked #3 on Unsupervised Image-To-Image Translation on SVHN-to-MNIST

General Classification Unsupervised Domain Adaptation +1

3,124

Paper
Code

Deep CORAL: Correlation Alignment for Deep Domain Adaptation

9 code implementations • 6 Jul 2016 • Baochen Sun, Kate Saenko

CORAL is a "frustratingly easy" unsupervised domain adaptation method that aligns the second-order statistics of the source and target distributions with a linear transformation.

Ranked #4 on Domain Generalization on NICO Animal

Domain Generalization Unsupervised Domain Adaptation

3,124

Paper
Code
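The "linear transformation" in the Deep CORAL snippet above can be illustrated with a small NumPy sketch of classic correlation alignment: whiten the source features with the source covariance, then re-color them with the target covariance. This is a minimal sketch under stated assumptions, not the authors' released code. The function names `coral_align` and `_sym_matrix_power`, and the `eps` regularizer added to each covariance, are choices made for this example.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.T


def coral_align(source, target, eps=1.0):
    """Align second-order statistics of `source` features to those of `target`.

    source: (n_s, d) array of source-domain features
    target: (n_t, d) array of target-domain features
    eps:    weight of the identity added to each covariance (assumed regularizer)
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    whiten = _sym_matrix_power(cov_s, -0.5)   # removes source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # imposes target correlations
    return source @ whiten @ recolor


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.3, 0.0],
                                                [0.0, 1.0, 0.5],
                                                [0.0, 0.0, 0.7]])
    tgt = rng.normal(size=(400, 3)) @ np.array([[0.5, 0.0, 0.0],
                                                [0.2, 1.5, 0.0],
                                                [0.1, 0.4, 3.0]])
    aligned = coral_align(src, tgt, eps=1e-8)
    print(np.round(np.cov(aligned, rowvar=False) - np.cov(tgt, rowvar=False), 4))
```

A classifier trained on the aligned source features can then be applied to the target features unchanged. Only covariances are matched here; means are left untouched, so features are usually standardized beforehand.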

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

1 code implementation • 5 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

Webpages have been a rich, scalable resource for vision-language and language only tasks.

Image Captioning

953

Paper
Code

WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

2 code implementations • 9 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

Webpages have been a rich resource for language and vision-language tasks.

Image Captioning

953

Paper
Code

Modeling Relationships in Referential Expressions with Compositional Modular Networks

2 code implementations • CVPR 2017 • Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko

In this paper we instead present a modular deep architecture capable of analyzing referential expressions into their component parts, identifying entities and relationships mentioned in the input expression and grounding them all in the scene.

Ranked #1 on Visual Question Answering (VQA) on Visual7W

Visual Question Answering (VQA)

745

Paper
Code

High precision grasp pose detection in dense clutter

3 code implementations • 4 Mar 2016 • Marcus Gualtieri, Andreas ten Pas, Kate Saenko, Robert Platt

Our focus in this paper is on improving the second step by using depth sensor scans from large online datasets to train a convolutional neural network.

Robotics

567

Paper
Code

Grasp Pose Detection in Point Clouds

1 code implementation • 29 Jun 2017 • Andreas ten Pas, Marcus Gualtieri, Kate Saenko, Robert Platt

Many grasp detection methods achieve grasp success rates (grasp successes as a fraction of the total number of grasp attempts) between 75% and 95% for novel objects presented in isolation or in light clutter.

Robotics

567

Paper
Code

RISE: Randomized Input Sampling for Explanation of Black-box Models

11 code implementations • 19 Jun 2018 • Vitali Petsiuk, Abir Das, Kate Saenko

We compare our approach to state-of-the-art importance extraction methods using both an automatic deletion/insertion metric and a pointing metric based on human-annotated object segments.

Decision Making

483

Paper
Code

Strong-Weak Distribution Alignment for Adaptive Object Detection

2 code implementations • CVPR 2019 • Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko

This motivates us to propose a novel method for detector adaptation based on strong local alignment and weak global alignment.

Ranked #2 on Unsupervised Domain Adaptation on SIM10K to BDD100K

Object object-detection +2

341

Paper
Code

Correlation Alignment for Unsupervised Domain Adaptation

4 code implementations • 6 Dec 2016 • Baochen Sun, Jiashi Feng, Kate Saenko

In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lower-dimensional subspaces.

Ranked #8 on Domain Adaptation on Office-Caltech

Unsupervised Domain Adaptation

330

Paper
Code

Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density

2 code implementations • ICCV 2021 • Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, Kate Saenko

Unsupervised domain adaptation (UDA) methods can dramatically improve generalization on unlabeled target domains.

Image Classification Semantic Segmentation +1

319

Paper
Code

Semi-supervised Domain Adaptation via Minimax Entropy

3 code implementations • ICCV 2019 • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.

Domain Adaptation Semi-supervised Domain Adaptation

288

Paper
Code

Learning Multi-Level Hierarchies with Hindsight

4 code implementations • 4 Dec 2017 • Andrew Levy, George Konidaris, Robert Platt, Kate Saenko

Hierarchical agents have the potential to solve sequential decision making tasks with greater sample efficiency than their non-hierarchical counterparts because hierarchical agents can break down tasks into sets of subtasks that only require short sequences of decisions.

Decision Making Hierarchical Reinforcement Learning

270

Paper
Code

Learning to Reason: End-to-End Module Networks for Visual Question Answering

1 code implementation • ICCV 2017 • Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

Ranked #43 on Visual Question Answering (VQA) on VQA v2 test-dev

Visual Dialog Visual Question Answering

270

Paper
Code

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

3 code implementations • ICCV 2017 • Huijuan Xu, Abir Das, Kate Saenko

We address the problem of activity detection in continuous, untrimmed video streams.

Ranked #1 on Action Recognition In Videos on THUMOS’14

Action Detection Action Recognition In Videos +2

254

Paper
Code

A Broader Study of Cross-Domain Few-Shot Learning

2 code implementations • ECCV 2020 • Yunhui Guo, Noel C. Codella, Leonid Karlinsky, James V. Codella, John R. Smith, Kate Saenko, Tajana Rosing, Rogerio Feris

Extensive experiments on the proposed benchmark are performed to evaluate state-of-the-art meta-learning approaches, transfer learning approaches, and newer methods for cross-domain few-shot learning.

Ranked #3 on Cross-Domain Few-Shot on Plantae

cross-domain few-shot learning Few-Shot Image Classification +1

217

Paper
Code

VisDA: The Visual Domain Adaptation Challenge

2 code implementations • 18 Oct 2017 • Xingchao Peng, Ben Usman, Neela Kaushik, Judy Hoffman, Dequan Wang, Kate Saenko

We present the 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains.

General Classification Image Classification +3

169

Paper
Code

Women also Snowboard: Overcoming Bias in Captioning Models

2 code implementations • ECCV 2018 • Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach

We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present.

Image Captioning

167

Paper
Code

Domain Agnostic Learning with Disentangled Representations

1 code implementation • 28 Apr 2019 • Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko

Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains.

Ranked #4 on Multi-target Domain Adaptation on DomainNet

General Classification Image Classification +1

140

Paper
Code

Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

1 code implementation • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.

Ranked #1 on Zero-shot Image Retrieval on ImageNet-R

Attribute Retrieval +2

136

Paper
Code

Speaker-Follower Models for Vision-and-Language Navigation

1 code implementation • NeurIPS 2018 • Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell

We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.

Data Augmentation Vision and Language Navigation

124

Paper
Code

Universal Domain Adaptation through Self Supervision

1 code implementation • NeurIPS 2020 • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko

While some methods address target settings with either partial or open-set categories, they assume that the particular setting is known a priori.

Clustering Partial Domain Adaptation +2

122

Paper
Code

Natural Language Object Retrieval

1 code implementation • CVPR 2016 • Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we address the task of natural language object retrieval, to localize a target object within a given image based on a natural language query of the object.

Ranked #12 on Referring Expression Comprehension on Talk2Car

Image Captioning Image Retrieval +4

112

Paper
Code

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning

2 code implementations • NeurIPS 2020 • Ximeng Sun, Rameswar Panda, Rogerio Feris, Kate Saenko

Multi-task learning is an open and challenging problem in computer vision.

Ranked #104 on Semantic Segmentation on NYU Depth v2

Multi-Task Learning Semantic Segmentation

109

Paper
Code

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

7 code implementations • CVPR 2015 • Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise.

Ranked #3 on Human Interaction Recognition on BIT

Retrieval Video Recognition

Paper
Code

Language-Conditioned Graph Networks for Relational Reasoning

1 code implementation • ICCV 2019 • Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko

E.g., conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can be easily consumed by a simple classifier for answer prediction.

Ranked #3 on Referring Expression Comprehension on CLEVR-Ref+

Object Referring Expression Comprehension +2

Paper
Code

OVANet: One-vs-All Network for Universal Domain Adaptation

2 code implementations • ICCV 2021 • Kuniaki Saito, Kate Saenko

In this paper, we propose a method to learn the threshold using source samples and to adapt it to the target domain.

Ranked #6 on Universal Domain Adaptation on DomainNet

Universal Domain Adaptation

Paper
Code

Many-to-many Splatting for Efficient Video Frame Interpolation

1 code implementation • CVPR 2022 • Ping Hu, Simon Niklaus, Stan Sclaroff, Kate Saenko

Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant.

Ranked #1 on Video Frame Interpolation on Xiph-4K (Crop)

Motion Estimation Optical Flow Estimation +1

Paper
Code

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

1 code implementation • CVPR 2016 • Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell

Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet.

Image Captioning Novel Concepts +3

Paper
Code

Explainable Neural Computation via Stack Neural Module Networks

1 code implementation • ECCV 2018 • Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko

In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.

Ranked #14 on Referring Expression Comprehension on Talk2Car

Decision Making Question Answering +1

Paper
Code

Deep Domain Confusion: Maximizing for Domain Invariance

7 code implementations • 10 Dec 2014 • Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, Trevor Darrell

Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias on a standard benchmark.

Ranked #6 on Domain Adaptation on Office-Caltech

Domain Adaptation Model Selection +1

Paper
Code

Sequence to Sequence -- Video to Text

4 code implementations • 3 May 2015 • Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip.

Caption Generation Language Modelling +1

Paper
Code

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

1 code implementation • HLT 2015 • Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko

Solving the visual symbol grounding problem has long been a goal of artificial intelligence.

Sentence Text Generation +1

Paper
Code

Black-box Explanation of Object Detectors via Saliency Maps

2 code implementations • CVPR 2021 • Vitali Petsiuk, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, Kate Saenko

We propose D-RISE, a method for generating visual explanations for the predictions of object detectors.

Object object-detection +1

Paper
Code

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

1 code implementation • ECCV 2020 • Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.

Action Recognition

Paper
Code

Extending the WILDS Benchmark for Unsupervised Adaptation

1 code implementation • ICLR 2022 • Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang

Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well.

Paper
Code

OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers

1 code implementation • 28 May 2021 • Kuniaki Saito, Donghyun Kim, Kate Saenko

OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.

Novelty Detection Outlier Detection

Paper
Code

OpenMatch: Open-Set Semi-supervised Learning with Open-set Consistency Regularization

1 code implementation • NeurIPS 2021 • Kuniaki Saito, Donghyun Kim, Kate Saenko

OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.

Ranked #2 on Semi-Supervised Image Classification on CIFAR-10, 400 Labels (OpenSet, 6/4)

Novelty Detection Outlier Detection +1

Paper
Code

Semi-Supervised Action Recognition with Temporal Contrastive Learning

1 code implementation • CVPR 2021 • Ankit Singh, Omprakash Chakraborty, Ashutosh Varshney, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

We approach this problem by learning a two-pathway temporal contrastive model using unlabeled videos at two different speeds leveraging the fact that changing video speed does not change an action.

Action Recognition Contrastive Learning

Paper
Code

Real-time Semantic Segmentation with Fast Attention

1 code implementation • 7 Jul 2020 • Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.

Ranked #32 on Semantic Segmentation on DensePASS

Real-Time Semantic Segmentation Segmentation

Paper
Code
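The "changing the order of operations" idea in the fast-attention snippet above can be illustrated with matrix associativity. If the softmax over the n x n affinity matrix is replaced by a normalization applied to queries and keys separately, then (Q K^T) V equals Q (K^T V), and the second form never materializes the n x n matrix. The sketch below is a minimal illustration under that assumption. The L2 normalization, the 1/n scaling, and all function names are choices made for this example and are not taken from the paper's code.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def attention_quadratic(q, k, v):
    """Softmax-free attention computed as (Q K^T) V: builds an (n, n) affinity matrix."""
    n = q.shape[0]
    affinity = q @ k.T            # shape (n, n), cost grows with n^2
    return (affinity @ v) / n


def attention_reordered(q, k, v):
    """Same result computed as Q (K^T V): builds only a (c, c_v) context matrix."""
    n = q.shape[0]
    context = k.T @ v             # shape (c, c_v), independent of n
    return (q @ context) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, c = 1024, 32               # n spatial positions, c channels
    q = l2_normalize(rng.normal(size=(n, c)))
    k = l2_normalize(rng.normal(size=(n, c)))
    v = rng.normal(size=(n, c))
    same = np.allclose(attention_quadratic(q, k, v), attention_reordered(q, k, v))
    print("outputs match:", same)
```

With n positions and c channels, the first form costs on the order of n^2 * c multiplications and the second on the order of n * c^2, which is the smaller of the two whenever n is much larger than c, as in dense prediction on high-resolution feature maps.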

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

1 code implementation • ICCV 2021 • Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.

Video Recognition

Paper
Code

LSDA: Large Scale Detection Through Adaptation

1 code implementation • NeurIPS 2014 • Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko

A major challenge in scaling object detection is the difficulty of obtaining labeled images for large numbers of categories.

Classification General Classification +2

Paper
Code

Fine-grained Angular Contrastive Learning with Coarse Labels

1 code implementation • CVPR 2021 • Guy Bukchin, Eli Schwartz, Kate Saenko, Ori Shahar, Rogerio Feris, Raja Giryes, Leonid Karlinsky

A very practical example of C2FS is when the target classes are sub-classes of the training classes.

Contrastive Learning Few-Shot Learning +1

Paper
Code

Learning Similarity Conditions Without Explicit Supervision

1 code implementation • ICCV 2019 • Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer

Many real-world tasks require models to compare images along multiple similarity conditions (e.g., similarity in color, category or shape).

Paper
Code

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation • 13 Apr 2018 • Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.

Retrieval Sentence

Paper
Code

VisDA-2021 Competition Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data

1 code implementation • 23 Jul 2021 • Dina Bashkirova, Dan Hendrycks, Donghyun Kim, Samarth Mishra, Kate Saenko, Kuniaki Saito, Piotr Teterwak, Ben Usman

Progress in machine learning is typically measured by training and testing a model on the same distribution of data, i.e., the same domain.

BIG-bench Machine Learning Universal Domain Adaptation +1

Paper
Code

DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations

1 code implementation • 20 Jun 2022 • Ximeng Sun, Ping Hu, Kate Saenko

Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications.

Ranked #2 on Multi-label Image Recognition with Partial Labels on PASCAL VOC 2007

Multi-label Image Recognition with Partial Labels

Paper
Code

Object Hallucination in Image Captioning

1 code implementation • EMNLP 2018 • Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene.

Hallucination Image Captioning +2

Paper
Code

Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

1 code implementation • 17 Apr 2021 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction.

Common Sense Reasoning Question Answering

Paper
Code

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

1 code implementation • 4 Feb 2022 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.

Common Sense Reasoning Question Answering +1

Paper
Code

ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes

1 code implementation • CVPR 2022 • Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, Kate Saenko

Recyclable waste detection poses a unique computer vision challenge as it requires detection of highly deformable and often translucent objects in cluttered scenes without the kind of context information usually present in human-centric datasets.

Object object-detection +5

Paper
Code

Revisiting Image-Language Networks for Open-ended Phrase Detection

3 code implementations • 17 Nov 2018 • Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko

Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.

object-detection Object Detection +1

Paper
Code

Active Domain Adaptation via Clustering Uncertainty-weighted Embeddings

1 code implementation • ICCV 2021 • Viraj Prabhu, Arjun Chandrasekaran, Kate Saenko, Judy Hoffman

Generalizing deep neural networks to new target domains is critical to their real-world utility.

Active Learning Clustering +2

Paper
Code

Simultaneous Deep Transfer Across Domains and Tasks

1 code implementation • ICCV 2015 • Eric Tzeng, Judy Hoffman, Trevor Darrell, Kate Saenko

Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias.

Domain Adaptation

Paper
Code

Unsupervised Domain Generalization by Learning a Bridge Across Domains

1 code implementation • CVPR 2022 • Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.

Domain Generalization Self-Supervised Learning

Paper
Code

Return of Frustratingly Easy Domain Adaptation

1 code implementation • 17 Nov 2015 • Baochen Sun, Jiashi Feng, Kate Saenko

Unlike human learning, machine learning often fails to handle changes between training (source) and test (target) input distributions.

Ranked #4 on Domain Adaptation on Synth Digits-to-SVHN

BIG-bench Machine Learning Unsupervised Domain Adaptation

Paper
Code

Separating Skills and Concepts for Novel Visual Question Answering

1 code implementation • CVPR 2021 • Spencer Whitehead, Hui Wu, Heng Ji, Rogerio Feris, Kate Saenko

Generalization to out-of-distribution data has been a problem for Visual Question Answering (VQA) models.

Attribute Contrastive Learning +2

Paper
Code

Joint Event Detection and Description in Continuous Video Streams

1 code implementation • 28 Feb 2018 • Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.

Dense Captioning Dense Video Captioning +2

Paper
Code

Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News

1 code implementation • EMNLP 2020 • Reuben Tan, Bryan A. Plummer, Kate Saenko

In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.

Paper
Code

NewsStories: Illustrating articles with visual summaries

1 code implementation • 26 Jul 2022 • Reuben Tan, Bryan A. Plummer, Kate Saenko, JP Lewis, Avneesh Sud, Thomas Leung

Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images.

Retrieval

Paper
Code

FETA: Towards Specializing Foundation Models for Expert Task Applications

1 code implementation • 8 Sep 2022 • Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, Peter W. J. Staar, Rogerio Feris, Leonid Karlinsky

However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g., retrieval of car manuals technical illustrations from language queries), data for which is either unseen or belonging to a long-tail part of the data distribution of the huge datasets used for FM pre-training.

Ranked #1 on Image-to-Text Retrieval on FETA Car-Manuals

Domain Generalization Image Retrieval +6

Paper
Code

VisDA 2022 Challenge: Domain Adaptation for Industrial Waste Sorting

1 code implementation • 26 Mar 2023 • Dina Bashkirova, Samarth Mishra, Diala Lteif, Piotr Teterwak, Donghyun Kim, Fadi Alladkani, James Akl, Berk Calli, Sarah Adel Bargal, Kate Saenko, Daehan Kim, Minseok Seo, YoungJin Jeon, Dong-Geol Choi, Shahaf Ettedgui, Raja Giryes, Shady Abu-Hussein, Binhui Xie, Shuang Li

To test the abilities of computer vision models on this task, we present the VisDA 2022 Challenge on Domain Adaptation for Industrial Waste Sorting.

Data Augmentation Domain Generalization +1

Paper
Code

A Unified Framework for Domain Adaptive Pose Estimation

1 code implementation • 1 Apr 2022 • Donghyun Kim, Kaihong Wang, Kate Saenko, Margrit Betke, Stan Sclaroff

In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision.

2D Pose Estimation Animal Pose Estimation +2

Paper
Code

Lasagna: Layered Score Distillation for Disentangled Object Relighting

1 code implementation • 30 Nov 2023 • Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Jianming Zhang, Ranjay Krishna, Kate Saenko

Although generative editing methods now enable some forms of image editing, relighting is still beyond today's capabilities; existing methods struggle to keep other aspects of the image -- colors, shapes, and textures -- consistent after the edit.

Colorization Object +1

Paper
Code

Top-down Visual Saliency Guided by Captions

6 code implementations • CVPR 2017 • Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko

Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and therefore difficult to explain.

Sentence Video Captioning

Paper
Code

A Broad Study of Pre-training for Domain Generalization and Adaptation

1 code implementation • 22 Mar 2022 • Donghyun Kim, Kaihong Wang, Stan Sclaroff, Kate Saenko

In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, namely: network architectures, size, pre-training loss, and datasets.

Domain Generalization

Paper
Code

Dynamic Network Quantization for Efficient Video Inference

1 code implementation • ICCV 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.

Quantization Video Recognition

Paper
Code

ERM++: An Improved Baseline for Domain Generalization

1 code implementation • 4 Apr 2023 • Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer

We also explore the relationship between DG performance and similarity to pre-training data, and find that similarity to pre-training data distributions is an important driver of performance, but that ERM++ with stronger initializations can deliver strong performance even on dissimilar datasets. Code is released at https://github.com/piotr-teterwak/erm_plusplus.

Domain Generalization

Paper
Code

Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

1 code implementation • CVPR 2023 • Maan Qraitem, Kate Saenko, Bryan A. Plummer

Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution per epoch without repeating samples.

Paper
Code

Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation

1 code implementation • 27 Nov 2022 • Kaihong Wang, Donghyun Kim, Rogerio Feris, Kate Saenko, Margrit Betke

We propose to perform adaptation on attention maps with cross-domain attention layers that share features between the source and the target domains.

Semantic Segmentation Unsupervised Domain Adaptation

Paper
Code

Adversarial Self-Defense for Cycle-Consistent GANs

1 code implementation • NeurIPS 2019 • Dina Bashkirova, Ben Usman, Kate Saenko

The goal of unsupervised image-to-image translation is to map images from one domain to another without the ground truth correspondence between the two domains.

Adversarial Attack Translation +1

Paper
Code

Unsupervised Video-to-Video Translation

1 code implementation • ICLR 2019 • Dina Bashkirova, Ben Usman, Kate Saenko

Unsupervised image-to-image translation is a recently proposed task of translating an image to a different style or domain given only unpaired image examples at training time.

Translation Unsupervised Image-To-Image Translation

Paper
Code

Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

3 code implementations • EMNLP 2016 • Subhashini Venugopalan, Lisa Anne Hendricks, Raymond Mooney, Kate Saenko

This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos.

Descriptive Language Modelling +1

Paper
Code

TwoStreamVAN: Improving Motion Modeling in Video Generation

1 code implementation • 3 Dec 2018 • Ximeng Sun, Huijuan Xu, Kate Saenko

Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.

Video Generation

Paper
Code

Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency

1 code implementation • 29 Jan 2021 • Samarth Mishra, Kate Saenko, Venkatesh Saligrama

With our Pretraining and Consistency (PAC) approach, we achieve state-of-the-art target accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets.

Semi-supervised Domain Adaptation Unsupervised Domain Adaptation

Paper
Code

Captioning Images with Diverse Objects

1 code implementation • CVPR 2017 • Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko

We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets.

Object Object Recognition

Paper
Code

Neural Parameter Allocation Search

1 code implementation • ICLR 2022 • Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko

We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.

Image Classification Phrase Grounding

Paper
Code

Moment Matching for Multi-Source Domain Adaptation

3 code implementations • ICCV 2019 • Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, Bo Wang

Conventional unsupervised domain adaptation (UDA) assumes that training data are sampled from a single domain.

Ranked #6 on Multi-Source Unsupervised Domain Adaptation on Office-31

Benchmarking Multi-Source Unsupervised Domain Adaptation +1

Paper
Code

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

1 code implementation • 29 Jan 2022 • Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test time.

counterfactual Decision Making +2

Paper
Code

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

1 code implementation • 17 Nov 2015 • Huijuan Xu, Kate Saenko

We propose a novel spatial attention architecture that aligns words with image patches in the first hop, and obtain improved results by adding a second attention hop which considers the whole question to choose visual evidence based on the results of the first hop.

Ranked #12 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 1.0 open ended

Image Captioning Question Answering +1

Paper
Code

Detector-Free Weakly Supervised Grounding by Separation

1 code implementation • ICCV 2021 • Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.

Ranked #1 on Phrase Grounding on Visual Genome

Phrase Grounding

Paper
Code

Mind the Backbone: Minimizing Backbone Distortion for Robust Object Detection

1 code implementation • 26 Mar 2023 • Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Rogerio Feris, Kate Saenko

We propose to use Relative Gradient Norm (RGN) as a way to measure the vulnerability of a backbone to feature distortion, and show that high RGN is indeed correlated with lower OOD performance.

object-detection Robust Object Detection

Paper
Code

Evaluation of Correctness in Unsupervised Many-to-Many Image Translation

1 code implementation • 29 Mar 2021 • Dina Bashkirova, Ben Usman, Kate Saenko

Given an input image from a source domain and a guidance image from a target domain, unsupervised many-to-many image-to-image (UMMI2I) translation methods seek to generate a plausible example from the target domain that preserves domain-invariant information of the input source image and inherits the domain-specific information from the guidance image.

Translation

Paper
Code

Why do These Match? Explaining the Behavior of Image Similarity Models

1 code implementation • ECCV 2020 • Bryan A. Plummer, Mariya I. Vasileva, Vitali Petsiuk, Kate Saenko, David Forsyth

Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings.

Attribute General Classification +3

Paper
Code

Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment

1 code implementation • NeurIPS 2020 • Ben Usman, Avneesh Sud, Nick Dufour, Kate Saenko

We show that, under certain assumptions, this combination yields a deep neural likelihood-based minimization objective that attains a known lower bound upon convergence.

Domain Adaptation Translation +1

Paper
Code

Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation

1 code implementation • ECCV 2020 • Xingchao Peng, Yichen Li, Kate Saenko

Extensive experiments are conducted to demonstrate the power of our new datasets in benchmarking state-of-the-art multi-source domain adaptation methods, as well as the advantage of our proposed model.

Benchmarking Disentanglement +2

Paper
Code

The SwaNNFlight System: On-the-Fly Sim-to-Real Adaptation via Anchored Learning

1 code implementation • 17 Jan 2023 • Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko, Renato Mancuso

And (iii) anchor critics to help stabilize the fine-tuning of agents during sim-to-real transfer, online learning from real data while retaining behavior optimized in simulation.

Reinforcement Learning (RL)

Paper
Code

Learning to Compose SuperWeights for Neural Parameter Allocation Search

1 code implementation • 3 Dec 2023 • Piotr Teterwak, Soren Nelson, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

To address this, we generate layer weights by learning to compose sets of SuperWeights, which represent a group of trainable parameters.

Paper
Code

Hierarchical Reinforcement Learning with Hindsight

no code implementations • ICLR 2019 • Andrew Levy, Robert Platt, Kate Saenko

Reinforcement Learning (RL) algorithms can suffer from poor sample efficiency when rewards are delayed and sparse.

Hierarchical Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Adversarial Dropout Regularization

no code implementations • ICLR 2018 • Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko

However, a drawback of this approach is that the critic simply labels the generated features as in-domain or not, without considering the boundaries between classes.

Ranked #2 on Synthetic-to-Real Translation on Syn2Real-C

General Classification Image Classification +2

Paper
Add Code

Stable Distribution Alignment Using the Dual of the Adversarial Distance

no code implementations • ICLR 2018 • Ben Usman, Kate Saenko, Brian Kulis

Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions.

Domain Adaptation

Paper
Add Code

Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

no code implementations • 28 Jan 2018 • Yancheng Bai, Huijuan Xu, Kate Saenko, Bernard Ghanem

In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection.

Action Detection Activity Detection

Paper
Add Code

Learning a visuomotor controller for real world robotic grasping using simulated depth images

no code implementations • 14 Jun 2017 • Ulrich Viereck, Andreas ten Pas, Kate Saenko, Robert Platt

This paper proposes an approach to learning a closed-loop controller for robotic grasping that dynamically guides the gripper to the object.

Robotic Grasping

Paper
Add Code

Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

no code implementations • 23 Nov 2015 • Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Pieter Abbeel, Sergey Levine, Kate Saenko, Trevor Darrell

We propose a novel, more powerful combination of both distribution and pairwise image alignment, and remove the requirement for expensive annotation by using weakly aligned pairs of images in the source and target domains.

Domain Adaptation

Paper
Add Code

Synthetic to Real Adaptation with Generative Correlation Alignment Networks

no code implementations • 19 Jan 2017 • Xingchao Peng, Kate Saenko

Experimentally, we show training off-the-shelf classifiers on the newly generated data can significantly boost performance when testing on the real image domains (PASCAL VOC 2007 benchmark and Office dataset), improving upon several existing methods.

Domain Adaptation Object Recognition

Paper
Add Code

Combining Texture and Shape Cues for Object Recognition With Minimal Supervision

no code implementations • 14 Sep 2016 • Xingchao Peng, Kate Saenko

We present a novel approach to object classification and detection which requires minimal supervision and which combines visual texture cues and shape information learned from freely available unlabeled web search results.

General Classification Image Retrieval +1

Paper
Add Code

Fine-to-coarse Knowledge Transfer For Low-Res Image Classification

no code implementations • 21 May 2016 • Xingchao Peng, Judy Hoffman, Stella X. Yu, Kate Saenko

We address the difficult problem of distinguishing fine-grained object categories in low resolution images.

Classification General Classification +2

Paper
Add Code

A Multi-scale Multiple Instance Video Description Network

no code implementations • 21 May 2015 • Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, Kate Saenko

Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (AlexNet, GoogLeNet) to extract a visual representation of the input video.

Image Segmentation Multiple Instance Learning +3

Paper
Add Code

Learning Deep Object Detectors from 3D Models

no code implementations • ICCV 2015 • Xingchao Peng, Baochen Sun, Karim Ali, Kate Saenko

Crowdsourced 3D CAD models are becoming easily accessible online, and can potentially generate an infinite number of training images for almost any object category. We show that augmenting the training data of contemporary Deep Convolutional Neural Net (DCNN) models with such synthetic data can be effective, especially when real training data is limited or not well matched to the target domain.

Object

Paper
Add Code

Spatial Semantic Regularisation for Large Scale Object Detection

no code implementations • ICCV 2015 • Damian Mrowca, Marcus Rohrbach, Judy Hoffman, Ronghang Hu, Kate Saenko, Trevor Darrell

Our approach proves to be especially useful in large scale settings with thousands of classes, where spatial and semantic interactions are very frequent and only weakly supervised detectors can be built due to a lack of bounding box annotations.

Clustering Object +2

Paper
Add Code

What Do Deep CNNs Learn About Objects?

no code implementations • 9 Apr 2015 • Xingchao Peng, Baochen Sun, Karim Ali, Kate Saenko

Deep convolutional neural networks learn extremely powerful image representations, yet most of that power is hidden in the millions of deep-layer parameters.

Paper
Add Code

Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning

no code implementations • CVPR 2015 • Judy Hoffman, Deepak Pathak, Trevor Darrell, Kate Saenko

We develop methods for detector learning which exploit joint training over both weak and strong labels and which transfer learned perceptual representations from strongly-labeled auxiliary tasks.

Multiple Instance Learning Representation Learning +1

Paper
Add Code

Modeling Radiometric Uncertainty for Vision with Tone-mapped Color Images

no code implementations • 27 Nov 2013 • Ayan Chakrabarti, Ying Xiong, Baochen Sun, Trevor Darrell, Daniel Scharstein, Todd Zickler, Kate Saenko

To produce images that are suitable for display, tone-mapping is widely used in digital cameras to map linear color measurements into narrow gamuts with limited dynamic range.

Tone Mapping

Paper
Add Code

One-Shot Adaptation of Supervised Deep Convolutional Models

no code implementations • 21 Dec 2013 • Judy Hoffman, Eric Tzeng, Jeff Donahue, Yangqing Jia, Kate Saenko, Trevor Darrell

In other words, are deep CNNs trained on large amounts of labeled data as susceptible to dataset bias as previous methods have been shown to be?

Domain Adaptation Image Classification

Paper
Add Code

Towards Adapting ImageNet to Reality: Scalable Domain Adaptation with Implicit Low-rank Transformations

no code implementations • 20 Aug 2013 • Erik Rodner, Judy Hoffman, Jeff Donahue, Trevor Darrell, Kate Saenko

Images seen during test time are often not from the same distribution as images used for learning.

Domain Adaptation Scene Understanding

Paper
Add Code

Efficient Learning of Domain-invariant Image Representations

no code implementations • 15 Jan 2013 • Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, Kate Saenko

We present an algorithm that learns representations which explicitly compensate for domain mismatch and which can be efficiently realized as linear classifiers.

Representation Learning

Paper
Add Code

Syn2Real: A New Benchmark for Synthetic-to-Real Visual Domain Adaptation

no code implementations • 26 Jun 2018 • Xingchao Peng, Ben Usman, Kuniaki Saito, Neela Kaushik, Judy Hoffman, Kate Saenko

In this paper, we present a new large-scale benchmark called Syn2Real, which consists of a synthetic domain rendered from 3D object models and two real-image domains containing the same object categories.

Classification Domain Adaptation +5

Paper
Add Code

Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract)

no code implementations • 2 Jul 2018 • Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach

Most machine learning methods are known to capture and exploit biases of the training data.

Image Captioning

Paper
Add Code

Adapting control policies from simulation to reality using a pairwise loss

no code implementations • 27 Jul 2018 • Ulrich Viereck, Xingchao Peng, Kate Saenko, Robert Platt

This paper proposes an approach to domain transfer based on a pairwise loss function that helps transfer control policies learned in simulation onto a real robot.

Paper
Add Code

Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning

no code implementations • CVPR 2018 • Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, Kate Saenko

We present the Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on learning driver behavior in real-life environments.

Scene Understanding

Paper
Add Code

SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection

no code implementations • 3 Dec 2018 • Eric Tzeng, Kaylee Burns, Kate Saenko, Trevor Darrell

Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment.

Domain Adaptation Pseudo Label +1

Paper
Add Code

Guided Zoom: Questioning Network Evidence for Fine-grained Classification

no code implementations • 6 Dec 2018 • Sarah Adel Bargal, Andrea Zunino, Vitali Petsiuk, Jianming Zhang, Kate Saenko, Vittorio Murino, Stan Sclaroff

We propose Guided Zoom, an approach that utilizes spatial grounding of a model's decision to make more informed predictions.

Classification General Classification

Paper
Add Code

MUTT: Metric Unit TesTing for Language Generation Tasks

no code implementations • ACL 2016 • William Boag, Renan Campos, Kate Saenko, Anna Rumshisky

Image Captioning Machine Translation +2

Paper
Add Code

Size Matters: Metric Visual Search Constraints from Monocular Metadata

no code implementations • NeurIPS 2010 • Mario Fritz, Kate Saenko, Trevor Darrell

Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor, the quantity of training data may be severely limited.

Paper
Add Code

Filtering Abstract Senses From Image Search Results

no code implementations • NeurIPS 2009 • Kate Saenko, Trevor Darrell

When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name, and then train a visual classifier from the search result.

Clustering Image Clustering +2

Paper
Add Code

Unsupervised Learning of Visual Sense Models for Polysemous Words

no code implementations • NeurIPS 2008 • Kate Saenko, Trevor Darrell

Polysemy is a problem for methods that exploit image search engines to build object category models.

Image Retrieval

Paper
Add Code

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection Activity Detection

Paper
Add Code

Semi-supervised Domain Adaptation with Instance Constraints

no code implementations • CVPR 2013 • Jeff Donahue, Judy Hoffman, Erik Rodner, Kate Saenko, Trevor Darrell

Most successful object classification and detection methods rely on classifiers trained on large labeled datasets.

Domain Adaptation General Classification +5

Paper
Add Code

Continuous Manifold Based Adaptation for Evolving Visual Domains

no code implementations • CVPR 2014 • Judy Hoffman, Trevor Darrell, Kate Saenko

The classic domain adaptation paradigm considers the world to be separated into stationary domains with clear boundaries between them.

Domain Adaptation

Paper
Add Code

Confidence-Rated Multiple Instance Boosting for Object Detection

no code implementations • CVPR 2014 • Karim Ali, Kate Saenko

In this paper, we propose a new MIL method for object detection that is capable of handling the noisier automatically obtained annotations.

Multiple Instance Learning Object +3

Paper
Add Code

Sequence to Sequence - Video to Text

no code implementations • ICCV 2015 • Subhashini Venugopalan, Marcus Rohrbach, Jeffrey Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip.

Caption Generation Language Modelling +1

Paper
Add Code

Cross-Domain Image Manipulation by Demonstration

no code implementations • 28 Jan 2019 • Ben Usman, Nick Dufour, Kate Saenko, Chris Bregler

In this work we propose a model that can manipulate individual visual attributes of objects in a real scene using examples of how respective attribute manipulations affect the output of a simulation.

Attribute Image Manipulation

Paper
Add Code

Generating Natural-Language Video Descriptions Using Text-Mined Knowledge

no code implementations • WS 2013 • Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond Mooney, Kate Saenko, Sergio Guadarrama

Language Modelling Text Generation

Paper
Add Code

Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild

no code implementations • COLING 2014 • Jesse Thomason, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Raymond Mooney

Language Modelling

Paper
Add Code

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation

no code implementations • ACL 2019 • Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko

The actual grounding can connect language to the environment through multiple modalities, e.g., "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route.

Vision and Language Navigation

Paper
Add Code

Two-Stream Region Convolutional 3D Network for Temporal Activity Detection

no code implementations • 5 Jun 2019 • Huijuan Xu, Abir Das, Kate Saenko

We address the problem of temporal activity detection in continuous, untrimmed video streams.

Ranked #4 on Action Recognition on THUMOS’14

Action Detection Action Recognition +4

Paper
Add Code

Weakly-supervised Compositional Feature Aggregation for Few-shot Recognition

no code implementations • 11 Jun 2019 • Ping Hu, Ximeng Sun, Kate Saenko, Stan Sclaroff

Learning from a few examples is a challenging task for machine learning.

Action Recognition Few-Shot Image Classification +1

Paper
Add Code

Language Features Matter: Effective Language Representations for Vision-Language Tasks

no code implementations • ICCV 2019 • Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

Shouldn't language and vision features be treated equally in vision-language (VL) tasks?

Image Captioning Language Modelling +6

Paper
Add Code

MULE: Multimodal Universal Language Embedding

no code implementations • 8 Sep 2019 • Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.

Data Augmentation Machine Translation +2

Paper
Add Code

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval

no code implementations • 27 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.

Moment Retrieval Retrieval

Paper
Add Code

Class-imbalanced Domain Adaptation: An Empirical Odyssey

no code implementations • 23 Oct 2019 • Shuhan Tan, Xingchao Peng, Kate Saenko

Unsupervised domain adaptation is a promising way to generalize deep models to novel domains.

Transfer Learning Unsupervised Domain Adaptation

Paper
Add Code

Federated Adversarial Domain Adaptation

no code implementations • ICLR 2020 • Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko

In this work, we present a principled approach to the problem of federated domain adaptation, which aims to align the representations learned among the different nodes with the data distribution of the target node.

Disentanglement Domain Adaptation +4

Paper
Add Code

Explainable Deep Classification Models for Domain Generalization

no code implementations • 13 Mar 2020 • Andrea Zunino, Sarah Adel Bargal, Riccardo Volpi, Mehrnoosh Sameki, Jianming Zhang, Stan Sclaroff, Vittorio Murino, Kate Saenko

Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision.

Classification Domain Generalization +1

Paper
Add Code

Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels

no code implementations • 18 Mar 2020 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.

Self-Supervised Learning Unsupervised Domain Adaptation

Paper
Add Code

Spatio-Temporal Action Detection with Multi-Object Interaction

no code implementations • 1 Apr 2020 • Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".

Action Detection Human Detection +2

Paper
Add Code

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations • 31 Mar 2020 • Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection Activity Detection +1

Paper
Add Code

Learning to Scale Multilingual Representations for Vision-Language Tasks

no code implementations • ECCV 2020 • Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer

Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added.

Language Modelling Machine Translation +3

Paper
Add Code

Learning visual servo policies via planner cloning

no code implementations • 24 May 2020 • Ulrich Viereck, Kate Saenko, Robert Platt

Learning control policies for visual servoing in novel environments is an important problem.

Paper
Add Code

Self-supervised Visual Attribute Learning for Fashion Compatibility

no code implementations • 1 Aug 2020 • Donghyun Kim, Kuniaki Saito, Samarth Mishra, Stan Sclaroff, Kate Saenko, Bryan A. Plummer

Our approach consists of three self-supervised tasks designed to capture different concepts that are neglected in prior work, which we can select from depending on the needs of our downstream tasks.

Attribute Object Recognition +3

Paper
Add Code

Auxiliary Task Reweighting for Minimum-data Learning

no code implementations • NeurIPS 2020 • Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.

Domain Adaptation Multi-Label Classification

Paper
Add Code

Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation

no code implementations • NeurIPS 2020 • Ping Hu, Stan Sclaroff, Kate Saenko

Recently, most ZSS methods focus on learning the visual-semantic correspondence to transfer knowledge from seen classes to unseen classes at the pixel level.

Semantic correspondence Semantic Segmentation +1

Paper
Add Code

Temporal Action Detection with Multi-level Supervision

no code implementations • ICCV 2021 • Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection Semi-Supervised Action Detection

Paper
Add Code

Learning from Lexical Perturbations for Consistent Visual Question Answering

1 code implementation • 26 Nov 2020 • Spencer Whitehead, Hui Wu, Yi Ren Fung, Heng Ji, Rogerio Feris, Kate Saenko

Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations.

Question Answering Visual Question Answering +1

Paper
Code

Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation

no code implementations • 6 Dec 2020 • Aadarsh Sahoo, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

Partial domain adaptation, which assumes that the unknown target label space is a subset of the source label space, has attracted much attention in computer vision.

Ranked #1 on Partial Domain Adaptation on Office-31

Partial Domain Adaptation

Paper
Add Code

Regularizing Action Policies for Smooth Control with Reinforcement Learning

no code implementations • 11 Dec 2020 • Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko

A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

no code implementations • ICLR 2021 • Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris

Temporal modelling is the key for efficient video action recognition.

Action Recognition Temporal Action Localization

Paper
Add Code

VA-RED$^2$: Video Adaptive Redundancy Reduction

no code implementations • ICLR 2021 • Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both.

Paper
Add Code

Honey, I Shrunk The Actor: A Case Study on Preserving Performance with Smaller Actors in Actor-Critic RL

no code implementations • 23 Feb 2021 • Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko

Actors and critics in actor-critic reinforcement learning algorithms are functionally separate, yet they often use the same network architectures.

Reinforcement Learning (RL)

Paper
Add Code

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

no code implementations • 2 Mar 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.

Image Classification Quantization +2

Paper
Add Code

Learning Cross-modal Contrastive Features for Video Domain Adaptation

no code implementations • ICCV 2021 • Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff, Kate Saenko, Manmohan Chandraker

Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition.

Action Recognition Contrastive Learning +2

CDS: Cross-Domain Self-Supervised Pre-Training

no code implementations • ICCV 2021 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.

Domain Adaptation Transfer Learning

MixtureEnsembles: Leveraging Parameter Sharing for Efficient Ensembles

no code implementations • 29 Sep 2021 • Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

We improve on these methods with MixtureEnsembles, which learns to factorize ensemble members with shared parameters by constructing each layer with a linear combination of templates.

Pyramid Mini-Batching for Optimal Transport

no code implementations • 29 Sep 2021 • Devin Guillory, Kuniaki Saito, Eric Tzeng, Yannik Pitcan, Kate Saenko, Trevor Darrell

Optimal transport theory provides a useful tool to measure the differences between two distributions.

Domain Adaptation

Multi-Critic Actor Learning: Teaching RL Policies to Act with Style

no code implementations • ICLR 2022 • Siddharth Mysore, George Cheng, Yunqi Zhao, Kate Saenko, Meng Wu

MultiCriticAL is tested in the context of multi-style learning, a special case of MTRL where agents are trained to behave with different distinct behavior styles, and yields up to 45% performance gains over the single-critic baselines and even successfully learns behavior styles in cases where single-critic approaches may simply fail to learn.

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations • 20 Oct 2021 • Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

no code implementations • NeurIPS 2021 • Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

Unsupervised domain adaptation, which aims to adapt models trained on a labeled source domain to a completely unlabeled target domain, has attracted much attention in recent years.

Ranked #2 on Unsupervised Domain Adaptation on UCF-HMDB

Contrastive Learning Unsupervised Domain Adaptation +1

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations • NeurIPS 2021 • Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

Exploiting Environmental Variation to Improve Policy Robustness in Reinforcement Learning

no code implementations • 27 Sep 2018 • Siddharth Mysore, Robert Platt, Kate Saenko

We propose a novel method to exploit this observation to develop robust actor policies, by automatically developing a sampling curriculum over environment settings to use in training.

reinforcement-learning Reinforcement Learning (RL)

wMAN: Weakly-Supervised Moment Alignment Network for Text-Based Video Segment Retrieval

no code implementations • 25 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.

Moment Retrieval Retrieval +1

Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment

no code implementations • 25 Sep 2019 • Shuhan Tan, Xingchao Peng, Kate Saenko

In this paper, we explore the task of Generalized Domain Adaptation (GDA): How to transfer knowledge across different domains in the presence of both covariate and label shift?

Domain Adaptation Transfer Learning

Disentangled Unsupervised Image Translation via Restricted Information Flow

no code implementations • 26 Nov 2021 • Ben Usman, Dina Bashkirova, Kate Saenko

Unsupervised image-to-image translation methods aim to map images from one domain into plausible examples from another domain while preserving structures shared across two domains.

Attribute Translation +1

Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data

no code implementations • 30 Nov 2021 • Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris

It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance.

Learning to Detect Every Thing in an Open World

no code implementations • 3 Dec 2021 • Kuniaki Saito, Ping Hu, Trevor Darrell, Kate Saenko

LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO, as well as cross-dataset evaluation on UVO and Cityscapes.

Data Augmentation object-detection +3

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

no code implementations • 10 Feb 2022 • Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi

We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents.

Visual Abductive Reasoning Visual Reasoning

Temporal Relevance Analysis for Video Action Models

no code implementations • 25 Apr 2022 • Quanfu Fan, Donghyun Kim, Chun-Fu Chen, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal

In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature.

Action Recognition

Prefix Conditioning Unifies Language and Label Supervision

no code implementations • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.

Classification Contrastive Learning +2

Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data

no code implementations • CVPR 2022 • Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu (Richard) Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris

It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance.

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

no code implementations • 22 Nov 2022 • Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori

For example, a task may ask a model to generate a varying number of the same object to measure its ability to count, or provide a text prompt with several objects that each have a different attribute to test its ability to match objects and attributes correctly.

Attribute Text-to-Image Generation

Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing

no code implementations • 29 Nov 2022 • Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff

One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations.

counterfactual Object

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

no code implementations • CVPR 2023 • Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.

Audio Source Separation Natural Language Queries

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • 31 Mar 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

We transfer the knowledge from the pre-trained CLIP-ViTL/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences.

Image Classification

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

no code implementations • 30 Jun 2023 • Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz

We hypothesize that this power to ignore out-of-context information (which we name $\textit{patch selectivity}$), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion.

Data Augmentation Inductive Bias

Multiscale Video Pretraining for Long-Term Activity Forecasting

no code implementations • 24 Jul 2023 • Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani

To alleviate this issue, we propose Multiscale Video Pretraining (MVP), a novel self-supervised pretraining approach that learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales.

Action Anticipation Long Term Action Anticipation

DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

no code implementations • 3 Aug 2023 • Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko

Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations.

From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Bias

no code implementations • 8 Aug 2023 • Maan Qraitem, Kate Saenko, Bryan A. Plummer

By training on real and synthetic data separately, FFR avoids the issue of bias toward signals from the pair $(B, G)$.

Label Budget Allocation in Multi-Task Learning

no code implementations • 24 Aug 2023 • Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian

How should the label budget (i.e., the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance?

Multi-Task Learning

Socratis: Are large multimodal models emotionally aware?

no code implementations • 31 Aug 2023 • Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko

We further see that current captioning metrics based on large vision-language models also fail to correlate with human preferences.

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • ICCV 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences.

Image Classification

Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement

no code implementations • 29 Oct 2023 • Ping Hu, Simon Niklaus, Lu Zhang, Stan Sclaroff, Kate Saenko

In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently.

Computational Efficiency Motion Estimation +1

CLAMP: Contrastive LAnguage Model Prompt-tuning

no code implementations • 4 Dec 2023 • Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim

Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.

Contrastive Learning Image Captioning +5

SynCDR: Training Cross Domain Retrieval Models with Synthetic Data

1 code implementation • 31 Dec 2023 • Samarth Mishra, Carlos D. Castillo, Hongcheng Wang, Kate Saenko, Venkatesh Saligrama

In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains.

Retrieval Translation

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

1 code implementation • 1 Feb 2024 • Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

Furthermore, prior work's typographic attacks against CLIP randomly sample a misleading class from a predefined set of categories.

Descriptive

Koala: Key frame-conditioned long video-LLM

no code implementations • 5 Apr 2024 • Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.

Action Recognition Question Answering +2
