no code implementations • 3 Dec 2024 • Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Bryan A. Plummer, Kate Saenko
We show that all evaluated DG methods struggle on DomainBed-OOP, while recent methods excel on DomainBed-IP.
1 code implementation • 25 Jul 2024 • Eunice Yiu, Maan Qraitem, Charlie Wong, Anisa Noor Majhi, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko
This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children.
1 code implementation • 12 Jun 2024 • Andrea Burns, Kate Saenko, Bryan A. Plummer
Mobile app user interfaces (UIs) are rich with action, text, structure, and image content that can be utilized to learn generic UI representations for tasks like automating user commands, summarizing content, and evaluating the accessibility of user interfaces.
no code implementations • 3 Jun 2024 • Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer
We uncover various seemingly harmless logos that VL models correlate (1) with negative human adjectives, (2) with the concept of 'harmlessness', causing models to misclassify harmful online content as harmless, and (3) with user-provided object concepts, causing lower recognition accuracy on ImageNet zero-shot classification.
no code implementations • 27 May 2024 • Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, Pietro Astolfi, Reyhane Askari Hemmat, Jun Chen, Kushal Tirumala, Rim Assouel, Mazda Moayeri, Arjang Talattof, Kamalika Chaudhuri, Zechun Liu, Xilun Chen, Quentin Garrido, Karen Ullrich, Aishwarya Agrawal, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
Then, we present and discuss approaches to evaluate VLMs.
no code implementations • 21 Apr 2024 • Vitali Petsiuk, Kate Saenko
We use the compositional property of diffusion models, which allows us to leverage multiple prompts in a single image generation.
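The compositional sampling idea lends itself to a short sketch. Below is a minimal, hypothetical illustration (the `denoise` callable and the embedding shapes are placeholders, not the paper's code) of combining several prompt-conditional noise predictions around an unconditional baseline at a single denoising step:

```python
import torch

def composed_eps(denoise, x_t, t, cond_embs, weights, uncond_emb):
    # eps = eps_uncond + sum_i w_i * (eps_cond_i - eps_uncond)
    eps_uncond = denoise(x_t, t, uncond_emb)
    eps = eps_uncond.clone()
    for emb, w in zip(cond_embs, weights):
        eps = eps + w * (denoise(x_t, t, emb) - eps_uncond)
    return eps

# toy stand-in denoiser; a real system would use a trained text-conditioned UNet
denoise = lambda x, t, emb: x * 0.0 + emb.mean()
x_t = torch.randn(1, 3, 8, 8)
embs = [torch.randn(16), torch.randn(16)]  # two prompt embeddings
eps = composed_eps(denoise, x_t, torch.tensor(10), embs, [1.5, 1.5], torch.zeros(16))
print(eps.shape)
```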
no code implementations • CVPR 2024 • Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko
Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.
1 code implementation • 1 Feb 2024 • Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer
Furthermore, the typographic attacks against CLIP in prior work randomly sample a misleading class from a predefined set of categories.
1 code implementation • 31 Dec 2023 • Samarth Mishra, Carlos D. Castillo, Hongcheng Wang, Kate Saenko, Venkatesh Saligrama
In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains.
no code implementations • 4 Dec 2023 • Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim
Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.
1 code implementation • 3 Dec 2023 • Piotr Teterwak, Soren Nelson, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer
To address this, we generate layer weights by learning to compose sets of SuperWeights, which represent a group of trainable parameters.
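As a rough illustration of composing layer weights from shared parameter groups, here is a hedged sketch (class and variable names are ours, and the actual SuperWeights grouping is more involved than this): every layer draws its weight matrix from the same bank of templates through its own learned mixing coefficients.

```python
import torch
import torch.nn as nn

class TemplateLinear(nn.Module):
    """Linear layer whose weight is a learned mixture of shared templates."""
    def __init__(self, templates, in_dim, out_dim):
        super().__init__()
        self.templates = templates                       # shared across layers
        self.coeffs = nn.Parameter(torch.randn(len(templates)) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_dim))
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, x):
        w = sum(c * t for c, t in zip(self.coeffs, self.templates))
        return x @ w[: self.in_dim, : self.out_dim] + self.bias

# one shared bank of templates, reused by every layer
bank = nn.ParameterList([nn.Parameter(torch.randn(64, 64) * 0.02) for _ in range(4)])
layers = nn.Sequential(TemplateLinear(bank, 64, 64), nn.ReLU(), TemplateLinear(bank, 64, 64))
print(layers(torch.randn(2, 64)).shape)
```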
1 code implementation • 30 Nov 2023 • Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Jianming Zhang, Ranjay Krishna, Kate Saenko
Although generative editing methods now enable some forms of image editing, relighting is still beyond today's capabilities; existing methods struggle to keep other aspects of the image -- colors, shapes, and textures -- consistent after the edit.
no code implementations • 29 Oct 2023 • Ping Hu, Simon Niklaus, Lu Zhang, Stan Sclaroff, Kate Saenko
In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently.
no code implementations • 31 Aug 2023 • Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko
We further see that current captioning metrics based on large vision-language models also fail to correlate with human preferences.
no code implementations • 24 Aug 2023 • Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian
How should the label budget (i.e., the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance?
1 code implementation • 8 Aug 2023 • Maan Qraitem, Kate Saenko, Bryan A. Plummer
By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data and thus avoids the issue of bias toward the pair $(B, G)$.
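The "train on the two sources separately" recipe can be pictured with a small sketch (the loaders, model, and epoch counts are placeholders, and the real FFR pipeline includes bias-aware details beyond this): the model first sees only synthetic data, then only real data, so no batch ever mixes the two.

```python
import torch
import torch.nn as nn

def train_epochs(model, loader, epochs, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# stage 1: pre-train on (balanced) synthetic images only
synthetic_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]  # placeholder data
train_epochs(model, synthetic_loader, epochs=1)

# stage 2: fine-tune on real images only; the two sources never share a batch
real_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]       # placeholder data
train_epochs(model, real_loader, epochs=1)
```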
no code implementations • 3 Aug 2023 • Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko
Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations.
no code implementations • 24 Jul 2023 • Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani
To alleviate this issue, we propose Multiscale Video Pretraining (MVP), a novel self-supervised pretraining approach that learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales.
no code implementations • 30 Jun 2023 • Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz
We hypothesize that this power to ignore out-of-context information (which we name $\textit{patch selectivity}$), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion.
2 code implementations • 9 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo
Webpages have been a rich resource for language and vision-language tasks.
1 code implementation • NeurIPS 2023 • Arijit Ray, Filip Radenovic, Abhimanyu Dubey, Bryan A. Plummer, Ranjay Krishna, Kate Saenko
To solve Cola, a model must retrieve images with the correct configuration of attributes and objects and avoid choosing a distractor image with the same objects and attributes but in the wrong configuration.
1 code implementation • 5 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo
Webpages have been a rich, scalable resource for vision-language and language only tasks.
2 code implementations • 4 Apr 2023 • Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer
We also explore the relationship between DG performance and similarity to pre-training data, and find that similarity to pre-training data distributions is an important driver of performance, but that ERM++ with stronger initializations can deliver strong performance even on dissimilar datasets. Code is released at https://github.com/piotr-teterwak/erm_plusplus.
no code implementations • 31 Mar 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia
We transfer the knowledge from the pre-trained CLIP ViT-L/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences.
no code implementations • CVPR 2023 • Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko
We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.
1 code implementation • 26 Mar 2023 • Dina Bashkirova, Samarth Mishra, Diala Lteif, Piotr Teterwak, Donghyun Kim, Fadi Alladkani, James Akl, Berk Calli, Sarah Adel Bargal, Kate Saenko, Daehan Kim, Minseok Seo, YoungJin Jeon, Dong-Geol Choi, Shahaf Ettedgui, Raja Giryes, Shady Abu-Hussein, Binhui Xie, Shuang Li
To test the abilities of computer vision models on this task, we present the VisDA 2022 Challenge on Domain Adaptation for Industrial Waste Sorting.
1 code implementation • 26 Mar 2023 • Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Rogerio Feris, Kate Saenko
We propose to use Relative Gradient Norm (RGN) as a way to measure the vulnerability of a backbone to feature distortion, and show that high RGN is indeed correlated with lower OOD performance.
2 code implementations • CVPR 2023 • Dina Bashkirova, Jose Lezama, Kihyuk Sohn, Kate Saenko, Irfan Essa
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation.
1 code implementation • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister
Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.
Ranked #1 on Zero-shot Image Retrieval on ImageNet-R
1 code implementation • 17 Jan 2023 • Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko, Renato Mancuso
This study presents "anchor critics", a novel strategy for enhancing the robustness of reinforcement learning (RL) agents in crossing the sim-to-real gap.
no code implementations • ICCV 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia
In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences.
no code implementations • 29 Nov 2022 • Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff
One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations.
1 code implementation • 27 Nov 2022 • Kaihong Wang, Donghyun Kim, Rogerio Feris, Kate Saenko, Margrit Betke
We propose to perform adaptation on attention maps with cross-domain attention layers that share features between the source and the target domains.
no code implementations • 22 Nov 2022 • Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori
For example, we ask a model to generate a varying number of the same object to measure its ability to count, or provide a text prompt with several objects that each have a different attribute to test its ability to match objects and attributes correctly.
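To make this kind of probe concrete, here is a tiny, hypothetical prompt generator in the spirit the sentence describes (the vocabularies and templates are our own illustration, not the benchmark's actual prompt set): one set of prompts varies object counts, the other pairs objects with distinct attributes.

```python
import itertools

objects = ["cube", "sphere", "cone"]
colors = ["red", "blue", "green"]
numbers = ["one", "two", "three", "four"]

# counting prompts: a varying number of the same object
counting_prompts = [f"a photo of {n} {obj}s" for n, obj in itertools.product(numbers, objects)]

# attribute-binding prompts: several objects, each with a different attribute
binding_prompts = [
    f"a {c1} {o1} next to a {c2} {o2}"
    for (c1, o1), (c2, o2) in itertools.combinations(itertools.product(colors, objects), 2)
    if o1 != o2 and c1 != c2
]

print(counting_prompts[:3])
print(binding_prompts[:3])
```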
1 code implementation • CVPR 2023 • Maan Qraitem, Kate Saenko, Bryan A. Plummer
Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution per epoch without repeating samples.
no code implementations • Frontiers of Computer Science 2022 • Shoumik Sovan Majumdar, Shubhangi Jain, Isidora Chara Tourni, Arsenii Mustafin, Diala Lteif, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal
Deep learning models perform remarkably well for the same task under the assumption that data is always coming from the same distribution.
1 code implementation • 8 Sep 2022 • Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, Peter W. J. Staar, Rogerio Feris, Leonid Karlinsky
However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g., retrieval of car-manual technical illustrations from language queries), whose data is either unseen or belongs to the long tail of the data distribution of the huge datasets used for FM pre-training.
Ranked #1 on Image-to-Text Retrieval on FETA Car-Manuals
1 code implementation • 26 Jul 2022 • Reuben Tan, Bryan A. Plummer, Kate Saenko, JP Lewis, Avneesh Sud, Thomas Leung
Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images.
1 code implementation • 20 Jun 2022 • Ximeng Sun, Ping Hu, Kate Saenko
Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications.
no code implementations • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister
In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
no code implementations • 25 Apr 2022 • Quanfu Fan, Donghyun Kim, Chun-Fu Chen, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal
In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature.
1 code implementation • CVPR 2022 • Ping Hu, Simon Niklaus, Stan Sclaroff, Kate Saenko
Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant.
Ranked #1 on Video Frame Interpolation on Xiph-4K (Crop)
1 code implementation • 1 Apr 2022 • Donghyun Kim, Kaihong Wang, Kate Saenko, Margrit Betke, Stan Sclaroff
In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision.
1 code implementation • 22 Mar 2022 • Donghyun Kim, Kaihong Wang, Stan Sclaroff, Kate Saenko
In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, namely: network architectures, size, pre-training loss, and datasets.
1 code implementation • 10 Feb 2022 • Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi
We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents.
1 code implementation • 4 Feb 2022 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer
To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.
1 code implementation • 29 Jan 2022 • Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko
In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.
no code implementations • CVPR 2022 • Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu (Richard) Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris
It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance.
1 code implementation • ICLR 2022 • Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang
Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well.
1 code implementation • CVPR 2022 • Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky
The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.
no code implementations • 3 Dec 2021 • Kuniaki Saito, Ping Hu, Trevor Darrell, Kate Saenko
LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO, as well as cross-dataset evaluation on UVO and Cityscapes.
no code implementations • NeurIPS 2021 • Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell
Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.
1 code implementation • NeurIPS 2021 • Kuniaki Saito, Donghyun Kim, Kate Saenko
OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
no code implementations • 30 Nov 2021 • Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris
It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance.
no code implementations • 26 Nov 2021 • Ben Usman, Dina Bashkirova, Kate Saenko
Unsupervised image-to-image translation methods aim to map images from one domain into plausible examples from another domain while preserving structures shared across two domains.
no code implementations • NeurIPS 2021 • Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das
Unsupervised domain adaptation which aims to adapt models trained on a labeled source domain to a completely unlabeled target domain has attracted much attention in recent years.
Ranked #2 on Unsupervised Domain Adaptation on UCF-HMDB
no code implementations • 20 Oct 2021 • Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell
Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.
no code implementations • ICLR 2022 • Siddharth Mysore, George Cheng, Yunqi Zhao, Kate Saenko, Meng Wu
MultiCriticAL is tested in the context of multi-style learning, a special case of MTRL where agents are trained to behave with distinct behavior styles. It yields up to 45% performance gains over single-critic baselines and successfully learns behavior styles in cases where single-critic approaches simply fail to learn.
no code implementations • 29 Sep 2021 • Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer
We improve on these methods with MixtureEnsembles, which learns to factorize ensemble members with shared parameters by constructing each layer with a linear combination of templates.
no code implementations • 29 Sep 2021 • Devin Guillory, Kuniaki Saito, Eric Tzeng, Yannik Pitcan, Kate Saenko, Trevor Darrell
Optimal transport theory provides a useful tool to measure the differences between two distributions.
no code implementations • ICCV 2021 • Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff, Kate Saenko, Manmohan Chandraker
Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition.
2 code implementations • ICCV 2021 • Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, Kate Saenko
Unsupervised domain adaptation (UDA) methods can dramatically improve generalization on unlabeled target domains.
1 code implementation • ICCV 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko
Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.
1 code implementation • CVPR 2022 • Ben Usman, Andrea Tagliasacchi, Kate Saenko, Avneesh Sud
In the era of deep learning, human pose estimation from multiple cameras with unknown calibration has received little attention to date.
Ranked #1 on 3D Human Pose Estimation on SkiPose
1 code implementation • 23 Jul 2021 • Dina Bashkirova, Dan Hendrycks, Donghyun Kim, Samarth Mishra, Kate Saenko, Kuniaki Saito, Piotr Teterwak, Ben Usman
Progress in machine learning is typically measured by training and testing a model on the same distribution of data, i.e., the same domain.
2 code implementations • CVPR 2021 • Spencer Whitehead, Hui Wu, Heng Ji, Rogerio Feris, Kate Saenko
Generalization to out-of-distribution data has been a problem for Visual Question Answering (VQA) models.
1 code implementation • CVPR 2022 • Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, Kate Saenko
Recyclable waste detection poses a unique computer vision challenge as it requires detection of highly deformable and often translucent objects in cluttered scenes without the kind of context information usually present in human-centric datasets.
1 code implementation • 28 May 2021 • Kuniaki Saito, Donghyun Kim, Kate Saenko
OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
1 code implementation • ICCV 2021 • Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris
Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.
1 code implementation • ICCV 2021 • Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky
In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.
Ranked #1 on Phrase Grounding on Visual Genome
1 code implementation • 17 Apr 2021 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer
In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction.
2 code implementations • ICCV 2021 • Kuniaki Saito, Kate Saenko
In this paper, we propose a method to learn the threshold using source samples and to adapt it to the target domain.
Ranked #6 on Universal Domain Adaptation on DomainNet
1 code implementation • 29 Mar 2021 • Dina Bashkirova, Ben Usman, Kate Saenko
Given an input image from a source domain and a guidance image from a target domain, unsupervised many-to-many image-to-image (UMMI2I) translation methods seek to generate a plausible example from the target domain that preserves domain-invariant information of the input source image and inherits the domain-specific information from the guidance image.
no code implementations • 2 Mar 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko
Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
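A minimal sketch of the swapping idea (module names, the swap probability, and the use of ordinary float blocks in place of true low-precision ones are all our simplifications): during training, each student block is randomly replaced by its frozen, higher-precision teacher counterpart in the forward pass.

```python
import random
import torch
import torch.nn as nn

class BlockSwapNet(nn.Module):
    """During training, each student block may be swapped for the matching teacher block."""
    def __init__(self, student_blocks, teacher_blocks, swap_prob=0.3):
        super().__init__()
        self.student = nn.ModuleList(student_blocks)
        self.teacher = nn.ModuleList(teacher_blocks)
        for p in self.teacher.parameters():   # teacher stays frozen
            p.requires_grad_(False)
        self.swap_prob = swap_prob

    def forward(self, x):
        for s_blk, t_blk in zip(self.student, self.teacher):
            use_teacher = self.training and random.random() < self.swap_prob
            x = t_blk(x) if use_teacher else s_blk(x)
        return x

blocks = lambda: [nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(3)]
net = BlockSwapNet(blocks(), blocks())
print(net(torch.randn(4, 16)).shape)
```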
no code implementations • 23 Feb 2021 • Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko
Actors and critics in actor-critic reinforcement learning algorithms are functionally separate, yet they often use the same network architectures.
no code implementations • ICLR 2021 • Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris
An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both.
no code implementations • ICLR 2021 • Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris
Temporal modelling is the key for efficient video action recognition.
1 code implementation • CVPR 2021 • Ankit Singh, Omprakash Chakraborty, Ashutosh Varshney, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das
We approach this problem by learning a two-pathway temporal contrastive model using unlabeled videos at two different speeds leveraging the fact that changing video speed does not change an action.
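The core of a two-speed contrastive objective can be sketched in a few lines (the encoders and the way clips are sampled at 1x vs. 4x speed are placeholders here): features of the same unlabeled clip played at different speeds are treated as positives, and other clips in the batch as negatives.

```python
import torch
import torch.nn.functional as F

def speed_contrastive_loss(z_slow, z_fast, temperature=0.1):
    """InfoNCE: clip i at slow speed should match clip i at fast speed."""
    z_slow = F.normalize(z_slow, dim=1)
    z_fast = F.normalize(z_fast, dim=1)
    logits = z_slow @ z_fast.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_slow.size(0))
    return F.cross_entropy(logits, targets)

# placeholder features from two pathways over the same unlabeled clips
z_slow = torch.randn(8, 128)   # e.g. frames sampled at 1x speed
z_fast = torch.randn(8, 128)   # e.g. frames sampled at 4x speed
print(speed_contrastive_loss(z_slow, z_fast).item())
```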
1 code implementation • 29 Jan 2021 • Samarth Mishra, Kate Saenko, Venkatesh Saligrama
With our Pretraining and Consistency (PAC) approach, we achieve state of the art target accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods, across multiple datasets.
no code implementations • ICCV 2021 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko
We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.
no code implementations • 11 Dec 2020 • Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko
A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies.
1 code implementation • CVPR 2021 • Guy Bukchin, Eli Schwartz, Kate Saenko, Ori Shahar, Rogerio Feris, Raja Giryes, Leonid Karlinsky
A very practical example of C2FS is when the target classes are sub-classes of the training classes.
no code implementations • 6 Dec 2020 • Aadarsh Sahoo, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das
Partial domain adaptation which assumes that the unknown target label space is a subset of the source label space has attracted much attention in computer vision.
Ranked #1 on Partial Domain Adaptation on Office-31
no code implementations • NeurIPS 2020 • Ping Hu, Stan Sclaroff, Kate Saenko
Recently, most ZSS methods focus on learning the visual-semantic correspondence to transfer knowledge from seen classes to unseen classes at the pixel level.
1 code implementation • 26 Nov 2020 • Spencer Whitehead, Hui Wu, Yi Ren Fung, Heng Ji, Rogerio Feris, Kate Saenko
Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations.
no code implementations • ICCV 2021 • Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu
We extensively benchmark against the baselines for SSAD and OSAD on the data splits we created from THUMOS14 and ActivityNet 1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.
no code implementations • NeurIPS 2020 • Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu
By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.
1 code implementation • ICCV 2021 • Viraj Prabhu, Arjun Chandrasekaran, Kate Saenko, Judy Hoffman
Generalizing deep neural networks to new target domains is critical to their real-world utility.
1 code implementation • EMNLP 2020 • Reuben Tan, Bryan A. Plummer, Kate Saenko
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which can serve as a first line of defense and a useful reference for future work on defending against machine-generated disinformation.
no code implementations • 1 Aug 2020 • Donghyun Kim, Kuniaki Saito, Samarth Mishra, Stan Sclaroff, Kate Saenko, Bryan A Plummer
Our approach consists of three self-supervised tasks designed to capture different concepts that are neglected in prior work that we can select from depending on the needs of our downstream tasks.
1 code implementation • ECCV 2020 • Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris
Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.
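A toy sketch of the decision step (the policy architecture, resolution set, and Gumbel-softmax relaxation are our illustrative choices; the actual method is trained jointly with the recognition model): a lightweight policy glances at the frame and picks the resolution at which the recognition backbone will process it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

resolutions = [224, 112, 84]

class ResolutionPolicy(nn.Module):
    def __init__(self, n_choices):
        super().__init__()
        self.net = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, n_choices))

    def forward(self, frame):
        logits = self.net(frame)                              # cheap glance at the frame
        return F.gumbel_softmax(logits, tau=1.0, hard=True)   # one-hot resolution choice

policy = ResolutionPolicy(len(resolutions))
frame = torch.randn(1, 3, 224, 224)
choice = policy(frame)
res = resolutions[int(choice.argmax())]
frame_resized = F.interpolate(frame, size=(res, res), mode="bilinear", align_corners=False)
print(res, frame_resized.shape)
# a recognition backbone would then process `frame_resized`; cheaper resolutions save compute
```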
1 code implementation • ECCV 2020 • Xingchao Peng, Yichen Li, Kate Saenko
Extensive experiments are conducted to demonstrate the power of our new datasets in benchmarking state-of-the-art multi-source domain adaptation methods, as well as the advantage of our proposed model.
1 code implementation • ECCV 2020 • Kuniaki Saito, Kate Saenko, Ming-Yu Liu
Unsupervised image-to-image translation intends to learn a mapping of an image in a given domain to an analogous image in a different domain, without explicit supervision of the mapping.
1 code implementation • 7 Jul 2020 • Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.
Ranked #32 on Semantic Segmentation on DensePASS
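The "change the order of operations" trick can be illustrated as follows; this is a generic linear-attention sketch under the assumption that softmax is replaced by L2 normalization of queries and keys (which is what makes the reassociation valid), not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def fast_spatial_attention(q, k, v):
    """Linear-cost attention: associate as Q @ (K^T V) instead of (Q K^T) @ V.

    q, k, v: (B, N, C) with N spatial positions. Normalizing q and k replaces
    the softmax, so the quadratic (Q K^T) matrix never has to be formed.
    """
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    context = k.transpose(1, 2) @ v          # (B, C, C), cost independent of N^2
    return q @ context / q.size(1)           # (B, N, C), scaled by number of positions

b, n, c = 2, 64 * 64, 32
q, k, v = (torch.randn(b, n, c) for _ in range(3))
print(fast_spatial_attention(q, k, v).shape)
```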
1 code implementation • ICLR 2022 • Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko
We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
2 code implementations • CVPR 2021 • Vitali Petsiuk, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, Kate Saenko
We propose D-RISE, a method for generating visual explanations for the predictions of object detectors.
no code implementations • 24 May 2020 • Ulrich Viereck, Kate Saenko, Robert Platt
Learning control policies for visual servoing in novel environments is an important problem.
no code implementations • ECCV 2020 • Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer
Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added.
no code implementations • 1 Apr 2020 • Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell
Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".
no code implementations • 31 Mar 2020 • Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell
In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.
1 code implementation • NeurIPS 2020 • Ben Usman, Avneesh Sud, Nick Dufour, Kate Saenko
We show that, under certain assumptions, this combination yields a deep neural likelihood-based minimization objective that attains a known lower bound upon convergence.
no code implementations • 18 Mar 2020 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko
We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.
no code implementations • 13 Mar 2020 • Andrea Zunino, Sarah Adel Bargal, Riccardo Volpi, Mehrnoosh Sameki, Jianming Zhang, Stan Sclaroff, Vittorio Murino, Kate Saenko
Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision.
1 code implementation • NeurIPS 2020 • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko
While some methods address target settings with either partial or open-set categories, they assume that the particular setting is known a priori.
2 code implementations • ECCV 2020 • Yunhui Guo, Noel C. Codella, Leonid Karlinsky, James V. Codella, John R. Smith, Kate Saenko, Tajana Rosing, Rogerio Feris
Extensive experiments on the proposed benchmark are performed to evaluate state-of-the-art meta-learning approaches, transfer learning approaches, and newer methods for cross-domain few-shot learning.
Ranked #3 on Cross-Domain Few-Shot on Plantae
2 code implementations • NeurIPS 2020 • Ximeng Sun, Rameswar Panda, Rogerio Feris, Kate Saenko
Multi-task learning is an open and challenging problem in computer vision.
Ranked #112 on Semantic Segmentation on NYU Depth v2
no code implementations • ICLR 2020 • Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko
In this work, we present a principled approach to the problem of federated domain adaptation, which aims to align the representations learned among the different nodes with the data distribution of the target node.
no code implementations • 23 Oct 2019 • Shuhan Tan, Xingchao Peng, Kate Saenko
Unsupervised domain adaptation is a promising way to generalize deep models to novel domains.
no code implementations • 27 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer
However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.
no code implementations • 25 Sep 2019 • Shuhan Tan, Xingchao Peng, Kate Saenko
In this paper, we explore the task of Generalized Domain Adaptation (GDA): How to transfer knowledge across different domains in the presence of both covariate and label shift?
no code implementations • 25 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer
Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.
no code implementations • 8 Sep 2019 • Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer
In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.
1 code implementation • ICCV 2019 • Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer
Many real-world tasks require models to compare images along multiple similarity conditions (e.g., similarity in color, category, or shape).
no code implementations • ICCV 2019 • Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer
Shouldn't language and vision features be treated equally in vision-language (VL) tasks?
1 code implementation • NeurIPS 2019 • Dina Bashkirova, Ben Usman, Kate Saenko
The goal of unsupervised image-to-image translation is to map images from one domain to another without the ground truth correspondence between the two domains.
no code implementations • 11 Jun 2019 • Ping Hu, Ximeng Sun, Kate Saenko, Stan Sclaroff
Learning from a few examples is a challenging task for machine learning.
no code implementations • 5 Jun 2019 • Huijuan Xu, Abir Das, Kate Saenko
We address the problem of temporal activity detection in continuous, untrimmed video streams.
Ranked #4 on Action Recognition on THUMOS’14
no code implementations • ACL 2019 • Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko
The actual grounding can connect language to the environment through multiple modalities, e.g., "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route.
1 code implementation • ECCV 2020 • Bryan A. Plummer, Mariya I. Vasileva, Vitali Petsiuk, Kate Saenko, David Forsyth
Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings.
1 code implementation • ICCV 2019 • Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko
For example, conditioning on the "on" relationship to the plate, the object "mug" gathers messages from the object "plate" to update its representation to "mug on the plate", which can then be easily consumed by a simple classifier for answer prediction.
Ranked #3 on Referring Expression Comprehension on CLEVR-Ref+
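A toy version of relation-conditioned message passing (the module, shapes, and the two-node "plate"/"mug" graph are purely illustrative): the "mug" node aggregates a message from the "plate" node that is computed jointly with the embedding of the "on" relation, then updates its own representation.

```python
import torch
import torch.nn as nn

class RelationMessagePassing(nn.Module):
    """One round of message passing where messages are conditioned on a relation embedding."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)     # combines neighbor feature + relation
        self.update = nn.Linear(2 * dim, dim)  # combines node feature + aggregated message

    def forward(self, nodes, edges, rel_embs):
        # nodes: (N, D); edges: list of (src, dst); rel_embs: (len(edges), D)
        agg = torch.zeros_like(nodes)
        for (src, dst), rel in zip(edges, rel_embs):
            agg[dst] = agg[dst] + self.msg(torch.cat([nodes[src], rel]))
        return torch.relu(self.update(torch.cat([nodes, agg], dim=1)))

dim = 16
nodes = torch.randn(2, dim)                # node 0: "plate", node 1: "mug"
rel_on = torch.randn(1, dim)               # embedding of the "on" relation
mp = RelationMessagePassing(dim)
updated = mp(nodes, edges=[(0, 1)], rel_embs=rel_on)  # "mug" now encodes "mug on the plate"
print(updated.shape)
```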
1 code implementation • 28 Apr 2019 • Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko
Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains.
Ranked #4 on Multi-target Domain Adaptation on DomainNet
4 code implementations • ICCV 2019 • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko
Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.
no code implementations • 28 Jan 2019 • Ben Usman, Nick Dufour, Kate Saenko, Chris Bregler
In this work we propose a model that can manipulate individual visual attributes of objects in a real scene using examples of how respective attribute manipulations affect the output of a simulation.
no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell
In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.
2 code implementations • CVPR 2019 • Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko
This motivates us to propose a novel method for detector adaptation based on strong local alignment and weak global alignment.
Ranked #2 on Unsupervised Domain Adaptation on SIM10K to BDD100K
no code implementations • 6 Dec 2018 • Sarah Adel Bargal, Andrea Zunino, Vitali Petsiuk, Jianming Zhang, Kate Saenko, Vittorio Murino, Stan Sclaroff
We propose Guided Zoom, an approach that utilizes spatial grounding of a model's decision to make more informed predictions.
3 code implementations • ICCV 2019 • Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, Bo Wang
Conventional unsupervised domain adaptation (UDA) assumes that training data are sampled from a single domain.
1 code implementation • 3 Dec 2018 • Ximeng Sun, Huijuan Xu, Kate Saenko
Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.
no code implementations • 3 Dec 2018 • Eric Tzeng, Kaylee Burns, Kate Saenko, Trevor Darrell
Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment.
3 code implementations • 17 Nov 2018 • Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko
Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.
no code implementations • CVPR 2018 • Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, Kate Saenko
We present the Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on learning driver behavior in real-life environments.
no code implementations • 27 Sep 2018 • Siddharth Mysore, Robert Platt, Kate Saenko
We propose a novel method to exploit this observation to develop robust actor policies, by automatically developing a sampling curriculum over environment settings to use in training.
1 code implementation • EMNLP 2018 • Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko
Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene.
no code implementations • 27 Jul 2018 • Ulrich Viereck, Xingchao Peng, Kate Saenko, Robert Platt
This paper proposes an approach to domain transfer based on a pairwise loss function that helps transfer control policies learned in simulation onto a real robot.
1 code implementation • ECCV 2018 • Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko
In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction.
Ranked #14 on Referring Expression Comprehension on Talk2Car
no code implementations • 2 Jul 2018 • Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach
Most machine learning methods are known to capture and exploit biases of the training data.
no code implementations • 26 Jun 2018 • Xingchao Peng, Ben Usman, Kuniaki Saito, Neela Kaushik, Judy Hoffman, Kate Saenko
In this paper, we present a new large-scale benchmark called Syn2Real, which consists of a synthetic domain rendered from 3D object models and two real-image domains containing the same object categories.
12 code implementations • 19 Jun 2018 • Vitali Petsiuk, Abir Das, Kate Saenko
We compare our approach to state-of-the-art importance extraction methods using both an automatic deletion/insertion metric and a pointing metric based on human-annotated object segments.
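The deletion metric mentioned above is simple enough to sketch (the toy classifier, masking value, and step size below are our assumptions, not the paper's exact protocol): pixels are removed in order of decreasing saliency while the target-class score is tracked, and a faster drop means the explanation pointed at pixels the model actually relied on.

```python
import numpy as np

def deletion_curve(predict, image, saliency, target_class, steps=50):
    """Remove the most-salient pixels first and record the target-class score."""
    h, w = saliency.shape
    order = np.argsort(-saliency.ravel())            # most salient first
    img = image.copy()
    scores = [predict(img)[target_class]]
    per_step = max(1, (h * w) // steps)
    for i in range(0, h * w, per_step):
        ys, xs = np.unravel_index(order[i:i + per_step], (h, w))
        img[ys, xs] = 0.0                            # "delete" pixels by zeroing them
        scores.append(predict(img)[target_class])
    return np.array(scores)

# toy stand-in classifier: class-0 probability proportional to mean intensity
predict = lambda im: np.array([im.mean(), 1.0 - im.mean()])
image = np.random.rand(16, 16)
saliency = image.copy()                              # pretend bright pixels are salient
curve = deletion_curve(predict, image, saliency, target_class=0)
print(curve.mean())   # mean score over the curve approximates the deletion AUC (lower is better)
```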
1 code implementation • ICLR 2019 • Dina Bashkirova, Ben Usman, Kate Saenko
Unsupervised image-to-image translation is a recently proposed task of translating an image to a different style or domain given only unpaired image examples at training time.
1 code implementation • NeurIPS 2018 • Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell
We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.
no code implementations • ICLR 2019 • Andrew Levy, Robert Platt, Kate Saenko
Reinforcement Learning (RL) algorithms can suffer from poor sample efficiency when rewards are delayed and sparse.
1 code implementation • 13 Apr 2018 • Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko
To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.
2 code implementations • ECCV 2018 • Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach
We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present.
1 code implementation • 28 Feb 2018 • Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko
In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.
no code implementations • 28 Jan 2018 • Yancheng Bai, Huijuan Xu, Kate Saenko, Bernard Ghanem
In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection.
4 code implementations • 4 Dec 2017 • Andrew Levy, George Konidaris, Robert Platt, Kate Saenko
Hierarchical agents have the potential to solve sequential decision making tasks with greater sample efficiency than their non-hierarchical counterparts because hierarchical agents can break down tasks into sets of subtasks that only require short sequences of decisions.
3 code implementations • ICML 2018 • Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei A. Efros, Trevor Darrell
Domain adaptation is critical for success in new, unseen environments.
no code implementations • ICLR 2018 • Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko
However, a drawback of this approach is that the critic simply labels the generated features as in-domain or not, without considering the boundaries between classes.
Ranked #2 on Synthetic-to-Real Translation on Syn2Real-C
2 code implementations • 18 Oct 2017 • Xingchao Peng, Ben Usman, Neela Kaushik, Judy Hoffman, Dequan Wang, Kate Saenko
We present the 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains.
no code implementations • ICLR 2018 • Ben Usman, Kate Saenko, Brian Kulis
Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions.
1 code implementation • 29 Jun 2017 • Andreas ten Pas, Marcus Gualtieri, Kate Saenko, Robert Platt
Many grasp detection methods achieve grasp success rates (grasp successes as a fraction of the total number of grasp attempts) between 75% and 95% for novel objects presented in isolation or in light clutter.
no code implementations • 14 Jun 2017 • Ulrich Viereck, Andreas ten Pas, Kate Saenko, Robert Platt
This paper proposes an approach to learning a closed-loop controller for robotic grasping that dynamically guides the gripper to the object.
1 code implementation • ICCV 2017 • Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.
Ranked #44 on Visual Question Answering (VQA) on VQA v2 test-dev
3 code implementations • ICCV 2017 • Huijuan Xu, Abir Das, Kate Saenko
We address the problem of activity detection in continuous, untrimmed video streams.
Ranked #1 on Action Recognition In Videos on THUMOS’14
20 code implementations • CVPR 2017 • Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell
Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains.
Ranked #3 on Unsupervised Image-To-Image Translation on S