no code implementations • 8 Jan 2025 • Nannan Li, Kevin J. Shih, Bryan A. Plummer
We introduce a garment extraction model that generates (human, synthetic garment) pairs from a single image of a clothed individual.
no code implementations • 13 Dec 2024 • Piotr Teterwak, Kate Saenko, Bryan A. Plummer, Ser-Nam Lim
The learned embedding is input to the MLP, which generates the adapter parameters.
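The mechanism described above can be sketched as a small hypernetwork: a learned embedding feeds an MLP whose output is reshaped into a linear adapter's weights. This is a minimal illustrative sketch, not the paper's implementation; all dimensions and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): a learned task embedding
# is mapped by a 2-layer MLP to the flat parameters of a small linear
# adapter, which is then applied to the input features.
EMBED, HIDDEN, D_IN, D_OUT = 8, 32, 4, 4

embedding = rng.normal(size=EMBED)             # learned embedding
W1 = rng.normal(size=(HIDDEN, EMBED)) * 0.1    # hypernetwork MLP, layer 1
W2 = rng.normal(size=(D_IN * D_OUT + D_OUT, HIDDEN)) * 0.1  # layer 2

def generate_adapter(emb):
    """MLP maps the embedding to flat adapter parameters (weight + bias)."""
    h = np.maximum(W1 @ emb, 0.0)              # ReLU hidden layer
    flat = W2 @ h                              # flat parameter vector
    W = flat[: D_IN * D_OUT].reshape(D_OUT, D_IN)
    b = flat[D_IN * D_OUT:]
    return W, b

W_adapter, b_adapter = generate_adapter(embedding)
x = rng.normal(size=D_IN)
y = W_adapter @ x + b_adapter                  # generated adapter applied to features
print(y.shape)
```

In a real system the hypernetwork and embedding would be trained end-to-end with the backbone frozen; here they are random placeholders.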
no code implementations • 10 Dec 2024 • Arijit Ray, Jiafei Duan, Reuben Tan, Dina Bashkirova, Rose Hendrix, Kiana Ehsani, Aniruddha Kembhavi, Bryan A. Plummer, Ranjay Krishna, Kuo-Hao Zeng, Kate Saenko
We find that even MLMs that perform relatively well on static questions struggle to accurately answer dynamic spatial questions.
no code implementations • 3 Dec 2024 • Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Bryan A. Plummer, Kate Saenko
We show that all evaluated DG methods struggle on DomainBed-OOP, while recent methods excel on DomainBed-IP.
no code implementations • 25 Nov 2024 • Nazia Tasnim, Bryan A. Plummer
Incremental learning aims to adapt to new sets of categories over time with minimal computational overhead.
no code implementations • 4 Aug 2024 • Aoming Liu, Zhong Li, Zhang Chen, Nannan Li, Yi Xu, Bryan A. Plummer
In experiments on Planar, 360°, and Full Spherical Panoramas, PanoFree demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning.
1 code implementation • 12 Jun 2024 • Andrea Burns, Kate Saenko, Bryan A. Plummer
Mobile app user interfaces (UIs) are rich with action, text, structure, and image content that can be utilized to learn generic UI representations for tasks like automating user commands, summarizing content, and evaluating the accessibility of user interfaces.
no code implementations • 3 Jun 2024 • Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer
We uncover various seemingly harmless logos that VL models correlate 1) with negative human adjectives, 2) with the concept of 'harmlessness', causing models to misclassify harmful online content as harmless, and 3) with user-provided object concepts, causing lower recognition accuracy on ImageNet zero-shot classification.
1 code implementation • 26 May 2024 • Chau Pham, Bryan A. Plummer
Multi-Channel Imaging (MCI) contains an array of challenges for encoding useful feature representations not present in traditional images.
no code implementations • CVPR 2024 • Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko
Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.
1 code implementation • 19 Feb 2024 • Zhongping Zhang, Wenda Qin, Bryan A. Plummer
A key challenge in our MGT localization task is that short spans of text, e.g., a single sentence, provide little information indicating whether they are machine generated.
1 code implementation • 1 Feb 2024 • Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer
Furthermore, prior work's Typographic attacks against CLIP randomly sample a misleading class from a predefined set of categories.
1 code implementation • CVPR 2024 • Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin
In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings.
no code implementations • 4 Dec 2023 • Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim
Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.
1 code implementation • 3 Dec 2023 • Piotr Teterwak, Soren Nelson, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer
To address this, we generate layer weights by learning to compose sets of SuperWeights, which represent a group of trainable parameters.
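The idea of composing layer weights from shared building blocks can be sketched as follows: each layer's weight matrix is a learned linear combination of a shared bank of templates. This is an illustrative sketch only; the sizes, names, and random coefficients are assumptions, not the paper's SuperWeights implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a shared bank of weight templates, and per-layer
# mixing coefficients. Each layer's weights are a linear combination of
# the same templates, so parameters are shared across layers.
N_TEMPLATES, ROWS, COLS, N_LAYERS = 3, 4, 4, 2
templates = rng.normal(size=(N_TEMPLATES, ROWS, COLS))  # shared trainable bank
coeffs = rng.normal(size=(N_LAYERS, N_TEMPLATES))       # per-layer coefficients

# Contract each layer's coefficient vector against the template bank.
layer_weights = [np.tensordot(coeffs[i], templates, axes=1)
                 for i in range(N_LAYERS)]
print(layer_weights[0].shape)
```

Only the coefficients differ per layer, so the total parameter count grows slowly as layers are added, which is the appeal of template-based weight generation.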
1 code implementation • 30 Nov 2023 • Siqi Wang, Chau Pham, Bryan A. Plummer
In this work, we explore the integration of these two approaches, proposing an interconnected structure with three crucial blocks: noise modeling, source knowledge identification, and enhanced noise detection using noise source-knowledge-integration methods.
1 code implementation • 7 Nov 2023 • Chau Pham, Piotr Teterwak, Soren Nelson, Bryan A. Plummer
Newly grown layer weights are generated by using a new linear combination of existing templates for a layer.
2 code implementations • NeurIPS 2023 • Zitong Chen, Chau Pham, Siqi Wang, Michael Doron, Nikita Moshkov, Bryan A. Plummer, Juan C. Caicedo
In this paper, we present a benchmark for investigating channel-adaptive models in microscopy imaging, which consists of 1) a dataset of varied-channel single-cell images, and 2) a biologically relevant evaluation framework.
no code implementations • 10 Oct 2023 • Chau Pham, Boyi Liu, Yingxiang Yang, Zhengyu Chen, Tianyi Liu, Jianbo Yuan, Bryan A. Plummer, Zhaoran Wang, Hongxia Yang
Although natural language is an obvious choice for communication due to LLM's language understanding capability, the token sampling step needed when generating natural language poses a potential risk of information loss, as it uses only one token to represent the model's belief across the entire vocabulary.
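The information-loss argument above can be made concrete with a toy example: the model's full next-token distribution carries measurable information, while the single sampled token collapses it to one symbol. The vocabulary and probabilities below are illustrative, not from the paper.

```python
import math

# Illustrative only: a model's "belief" over a tiny vocabulary versus the
# single token actually emitted. Sampling collapses the full distribution
# to one symbol, discarding the probability mass on every alternative.
vocab = ["yes", "no", "maybe"]
belief = [0.5, 0.3, 0.2]           # full next-token distribution

# Shannon entropy of the belief: how many bits the distribution carries.
entropy = -sum(p * math.log2(p) for p in belief)
print(f"bits carried by the distribution: {entropy:.2f}")

# The receiver of a sampled message sees only a one-hot choice, e.g.
# "yes" -> [1, 0, 0]; the 0.3 and 0.2 alternatives are lost.
sampled = [1, 0, 0]
```

This is why communicating richer representations (e.g. the logits themselves) between agents can, in principle, preserve information that token sampling discards.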
no code implementations • 31 Aug 2023 • Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko
We further see that current captioning metrics based on large vision-language models also fail to correlate with human preferences.
1 code implementation • 8 Aug 2023 • Maan Qraitem, Kate Saenko, Bryan A. Plummer
By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data and thus avoids the issue of bias toward the pair $(B, G)$.
no code implementations • 24 Jul 2023 • Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani
To alleviate this issue, we propose Multiscale Video Pretraining (MVP), a novel self-supervised pretraining approach that learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales.
1 code implementation • 20 Jun 2023 • Siqi Wang, Bryan A. Plummer
Critically, we show that LNL methods fail to generalize on some real-world datasets, even when adapted to integrate noise source knowledge, highlighting the importance of directly exploring LNL+K.
no code implementations • 27 May 2023 • Zhongping Zhang, Jian Zheng, Jacob Zhiyuan Fang, Bryan A. Plummer
Using the input image as a control could mitigate these issues, but since these models are trained via reconstruction, a model can simply hide information about the original image when encoding it to perfectly reconstruct the image without learning the editing task.
2 code implementations • 9 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo
Webpages have been a rich resource for language and vision-language tasks.
1 code implementation • NeurIPS 2023 • Arijit Ray, Filip Radenovic, Abhimanyu Dubey, Bryan A. Plummer, Ranjay Krishna, Kate Saenko
To solve Cola, a model must retrieve images with the correct configuration of attributes and objects and avoid choosing a distractor image with the same objects and attributes but in the wrong configuration.
1 code implementation • 5 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo
Webpages have been a rich, scalable resource for vision-language and language only tasks.
2 code implementations • 4 Apr 2023 • Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer
Overall, ERM++ challenges the need for more complex DG methods by providing a stronger, more reliable baseline that maintains simplicity and ease of use.
no code implementations • CVPR 2023 • Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko
We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.
no code implementations • 22 Nov 2022 • Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori
For example, we ask a model to generate a varying number of the same object to measure its ability to count, or provide a text prompt with several objects that each have a different attribute to test its ability to match objects and attributes correctly.
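Probes like those described above are easy to construct programmatically. A toy version follows; the object and attribute names are illustrative, not the benchmark's actual prompt set.

```python
# Counting probe: the same object with a varying count.
counts = [2, 3, 4]
count_prompts = [f"a photo of {n} red apples" for n in counts]

# Attribute-binding probe: several objects, each with a distinct
# attribute, to test whether the model binds them correctly.
pairs = [("red", "cube"), ("blue", "sphere")]
binding_prompt = "a photo of " + " and ".join(f"a {c} {o}" for c, o in pairs)

print(count_prompts[0])   # a photo of 2 red apples
print(binding_prompt)     # a photo of a red cube and a blue sphere
```

Each generated image would then be scored (e.g. by an object detector) for whether the requested count and attribute bindings actually appear.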
1 code implementation • ICCV 2023 • Nannan Li, Kevin J. Shih, Bryan A. Plummer
Then we reconstruct the input image by sampling from the permuted textures for patch-level disentanglement.
1 code implementation • CVPR 2023 • Maan Qraitem, Kate Saenko, Bryan A. Plummer
Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution per epoch without repeating samples.
1 code implementation • 26 Jul 2022 • Reuben Tan, Bryan A. Plummer, Kate Saenko, JP Lewis, Avneesh Sud, Thomas Leung
Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images.
1 code implementation • 13 Jul 2022 • Nannan Li, Bryan A. Plummer
Thus, the source attribute information can often be hidden in the disentangled features, leading to unwanted image editing effects.
1 code implementation • 24 Mar 2022 • Zhongping Zhang, Yiwen Gu, Bryan A. Plummer, Xin Miao, Jiayi Liu, Huayan Wang
We evaluate our method on the MovieNet and Condensed Movies datasets, achieving an approximately 6-9% improvement in mean Average Precision (mAP) over the baselines.
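For reference, mean Average Precision, the metric quoted above, averages per-query precision over the ranks of relevant items. A minimal sketch on toy binary-relevance rankings (not the paper's data):

```python
def average_precision(ranked_relevance):
    """AP for one ranked list of 0/1 relevance labels:
    mean of precision@k taken at each relevant rank k."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

# Two toy queries; mAP is the mean of their APs.
queries = [[1, 0, 1, 0], [0, 1, 1, 0]]
mAP = sum(average_precision(q) for q in queries) / len(queries)
print(round(mAP, 3))  # 0.708
```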
1 code implementation • 24 Mar 2022 • Zhongping Zhang, Huiwen He, Bryan A. Plummer, Zhenyu Liao, Huayan Wang
Unlike object detection methods based solely on object category, our method can accurately recognize the target object by comprehending the objects and their semantic relationships within a complex scene.
1 code implementation • 4 Feb 2022 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer
To study VLN with unknown command feasibility, we introduce a new dataset, Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.
1 code implementation • 11 Dec 2021 • Zhongping Zhang, Yiwen Gu, Bryan A. Plummer
Article comprehension is an important challenge in natural language processing with many applications such as article generation or image-to-article retrieval.
no code implementations • 6 Dec 2021 • Siqi Wang, Manyuan Lu, Nikita Moshkov, Juan C. Caicedo, Bryan A. Plummer
Analyzing the morphology of cells in microscopy images can provide insights into the mechanism of compounds or the function of genes.
no code implementations • 6 Dec 2021 • Maan Qraitem, Bryan A. Plummer
Phrase detection requires methods to identify if a phrase is relevant to an image and localize it, if applicable.
no code implementations • 20 Oct 2021 • Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell
Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.
no code implementations • 29 Sep 2021 • Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer
We improve on these methods with MixtureEnsembles, which learns to factorize ensemble members with shared parameters by constructing each layer with a linear combination of templates.
1 code implementation • 17 Apr 2021 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer
In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction.
no code implementations • ICCV 2021 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko
We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.
1 code implementation • EMNLP 2020 • Reuben Tan, Bryan A. Plummer, Kate Saenko
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which can serve as a first line of defense and a useful reference for future work in defending against machine-generated disinformation.
1 code implementation • ICLR 2022 • Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko
We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
no code implementations • ECCV 2020 • Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer
Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added.
no code implementations • 18 Mar 2020 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko
We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.
no code implementations • 18 Feb 2020 • Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni
We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the lightweight shallow network runs on non-keyframes at a high frame rate.
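The frame-routing logic of such a slow-fast scheme can be sketched as follows. The two networks are stand-in functions here, and the keyframe stride is an assumption, not the paper's setting.

```python
KEYFRAME_EVERY = 4  # assumed sampling stride (illustrative)

def slow_net(frame):
    """Stand-in for the expensive network run on sparse keyframes."""
    return ("slow", frame)

def fast_net(frame):
    """Stand-in for the lightweight network run on non-keyframes."""
    return ("fast", frame)

frames = list(range(10))
outputs = [slow_net(f) if f % KEYFRAME_EVERY == 0 else fast_net(f)
           for f in frames]

n_slow = sum(1 for tag, _ in outputs if tag == "slow")
print(n_slow)  # 3 keyframes: frames 0, 4, 8
```

The savings come from running the heavy model on only 1 in every `KEYFRAME_EVERY` frames while the cheap model keeps up with the full frame rate.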
no code implementations • 27 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer
However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.
no code implementations • 25 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer
Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.
no code implementations • 8 Sep 2019 • Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer
In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.
1 code implementation • ICCV 2019 • Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer
Many real-world tasks require models to compare images along multiple similarity conditions (e.g., similarity in color, category, or shape).
no code implementations • ICCV 2019 • Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer
Shouldn't language and vision features be treated equally in vision-language (VL) tasks?
1 code implementation • ECCV 2020 • Bryan A. Plummer, Mariya I. Vasileva, Vitali Petsiuk, Kate Saenko, David Forsyth
Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings.
3 code implementations • 17 Nov 2018 • Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko
Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.
no code implementations • 24 Sep 2018 • Bryan A. Plummer, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu
In this paper, we introduce an attribute-based interactive image search which can leverage human-in-the-loop feedback to iteratively refine image search results.
1 code implementation • 13 Apr 2018 • Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko
To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.
2 code implementations • ECCV 2018 • Mariya I. Vasileva, Bryan A. Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, David Forsyth
Outfits in online fashion data are composed of items of many different types (e.g., top, bottom, shoes) that share some stylistic relationship with one another.
1 code implementation • ECCV 2018 • Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik
This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model.
no code implementations • CVPR 2017 • Bryan A. Plummer, Matthew Brown, Svetlana Lazebnik
This paper addresses video summarization, or the problem of distilling a raw video into a shorter form while still capturing the original story.
1 code implementation • ICCV 2017 • Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik
This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues.
2 code implementations • ICCV 2015 • Bryan A. Plummer, Li-Wei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, Svetlana Lazebnik
The Flickr30k dataset has become a standard benchmark for sentence-based image description.
Ranked #15 on Phrase Grounding on Flickr30k Entities Test