Search Results for author: Bryan A. Plummer

Found 53 papers, 30 papers with code

Koala: Key frame-conditioned long video-LLM

no code implementations 5 Apr 2024 Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.

Action Recognition Question Answering +2

Machine-generated Text Localization

1 code implementation 19 Feb 2024 Zhongping Zhang, Wenda Qin, Bryan A. Plummer

Machine-Generated Text (MGT) detection aims to identify a piece of text as machine or human written.

Binary Classification Misinformation +1

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

1 code implementation 1 Feb 2024 Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

Furthermore, prior work's typographic attacks against CLIP randomly sample a misleading class from a predefined set of categories.

Descriptive

UniHuman: A Unified Model for Editing Human Images in the Wild

1 code implementation 22 Dec 2023 Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin

In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings.

2k

CLAMP: Contrastive LAnguage Model Prompt-tuning

no code implementations 4 Dec 2023 Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim

Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.

Contrastive Learning Image Captioning +5

Learning to Compose SuperWeights for Neural Parameter Allocation Search

1 code implementation 3 Dec 2023 Piotr Teterwak, Soren Nelson, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

To address this, we generate layer weights by learning to compose sets of SuperWeights, which represent a group of trainable parameters.

A Unified Framework for Connecting Noise Modeling to Boost Noise Detection

1 code implementation 30 Nov 2023 Siqi Wang, Chau Pham, Bryan A. Plummer

In this work, we explore the integration of these two approaches, proposing an interconnected structure with three crucial blocks: noise modeling, source knowledge identification, and enhanced noise detection using noise source-knowledge-integration methods.

Learning with noisy labels

MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters

1 code implementation 7 Nov 2023 Chau Pham, Piotr Teterwak, Soren Nelson, Bryan A. Plummer

Newly grown layer weights are generated by using a new linear combination of existing templates for a layer.

CHAMMI: A benchmark for channel-adaptive models in microscopy imaging

2 code implementations NeurIPS 2023 Zitong Chen, Chau Pham, Siqi Wang, Michael Doron, Nikita Moshkov, Bryan A. Plummer, Juan C. Caicedo

In this paper, we present a benchmark for investigating channel-adaptive models in microscopy imaging, which consists of 1) a dataset of varied-channel single-cell images, and 2) a biologically relevant evaluation framework.

Let Models Speak Ciphers: Multiagent Debate through Embeddings

no code implementations 10 Oct 2023 Chau Pham, Boyi Liu, Yingxiang Yang, Zhengyu Chen, Tianyi Liu, Jianbo Yuan, Bryan A. Plummer, Zhaoran Wang, Hongxia Yang

Although natural language is an obvious choice for communication due to LLMs' language understanding capability, the token sampling step needed when generating natural language poses a potential risk of information loss, as it uses only one token to represent the model's belief across the entire vocabulary.

Socratis: Are large multimodal models emotionally aware?

no code implementations 31 Aug 2023 Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko

We further see that current captioning metrics based on large vision-language models also fail to correlate with human preferences.

From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Bias

no code implementations 8 Aug 2023 Maan Qraitem, Kate Saenko, Bryan A. Plummer

By training on real and synthetic data separately, FFR avoids the issue of bias toward signals from the pair $(B, G)$.

Multiscale Video Pretraining for Long-Term Activity Forecasting

no code implementations 24 Jul 2023 Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani

To alleviate this issue, we propose Multiscale Video Pretraining (MVP), a novel self-supervised pretraining approach that learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales.

Action Anticipation Long Term Action Anticipation

LNL+K: Learning with Noisy Labels and Noise Source Distribution Knowledge

1 code implementation 20 Jun 2023 Siqi Wang, Bryan A. Plummer

Learning with noisy labels (LNL) is challenging as the model tends to memorize noisy labels, which can lead to overfitting.

Learning with noisy labels

Text-to-image Editing by Image Information Removal

no code implementations 27 May 2023 Zhongping Zhang, Jian Zheng, Jacob Zhiyuan Fang, Bryan A. Plummer

Using the input image as a control could mitigate these issues, but since these models are trained via reconstruction, a model can simply hide information about the original image when encoding it to perfectly reconstruct the image without learning the editing task.

Image Generation Image Reconstruction

ERM++: An Improved Baseline for Domain Generalization

1 code implementation 4 Apr 2023 Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer

We also explore the relationship between DG performance and similarity to pre-training data, and find that similarity to pre-training data distributions is an important driver of performance, but that ERM++ with stronger initializations can deliver strong performance even on dissimilar datasets. Code is released at https://github.com/piotr-teterwak/erm_plusplus.

Domain Generalization

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

no code implementations CVPR 2023 Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.

Audio Source Separation Natural Language Queries

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

no code implementations 22 Nov 2022 Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori

For example, asking a model to generate a varying number of the same object to measure its ability to count or providing a text prompt with several objects that each have a different attribute to identify its ability to match objects and attributes correctly.

Attribute Text-to-Image Generation

Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

1 code implementation CVPR 2023 Maan Qraitem, Kate Saenko, Bryan A. Plummer

Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution per epoch without repeating samples.

NewsStories: Illustrating articles with visual summaries

1 code implementation 26 Jul 2022 Reuben Tan, Bryan A. Plummer, Kate Saenko, JP Lewis, Avneesh Sud, Thomas Leung

Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images.

Retrieval

Supervised Attribute Information Removal and Reconstruction for Image Manipulation

1 code implementation 13 Jul 2022 Nannan Li, Bryan A. Plummer

Thus, the source attribute information can often be hidden in the disentangled features, leading to unwanted image editing effects.

Attribute Image Manipulation +1

Complex Scene Image Editing by Scene Graph Comprehension

1 code implementation 24 Mar 2022 Zhongping Zhang, Huiwen He, Bryan A. Plummer, Zhenyu Liao, Huayan Wang

Unlike object detection methods based solely on object category, our method can accurately recognize the target object by comprehending the objects and their semantic relationships within a complex scene.

Image Inpainting Image Manipulation +4

Movie Genre Classification by Language Augmentation and Shot Sampling

1 code implementation 24 Mar 2022 Zhongping Zhang, Yiwen Gu, Bryan A. Plummer, Xin Miao, Jiayi Liu, Huayan Wang

We evaluate our method on the MovieNet and Condensed Movies datasets, achieving an improvement of approximately 6-9% in mean Average Precision (mAP) over the baselines.

Action Recognition Boundary Detection +6

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

1 code implementation 4 Feb 2022 Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.

Common Sense Reasoning Question Answering +1

Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval

1 code implementation 11 Dec 2021 Zhongping Zhang, Yiwen Gu, Bryan A. Plummer

Article comprehension is an important challenge in natural language processing with many applications such as article generation or image-to-article retrieval.

News Generation Retrieval

Anchoring to Exemplars for Training Mixture-of-Expert Cell Embeddings

no code implementations 6 Dec 2021 Siqi Wang, Manyuan Lu, Nikita Moshkov, Juan C. Caicedo, Bryan A. Plummer

Analyzing the morphology of cells in microscopy images can provide insights into the mechanism of compounds or the function of genes.

Drug Discovery

From Coarse to Fine-grained Concept based Discrimination for Phrase Detection

no code implementations 6 Dec 2021 Maan Qraitem, Bryan A. Plummer

Phrase detection requires methods to identify if a phrase is relevant to an image and localize it, if applicable.

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations 20 Oct 2021 Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

MixtureEnsembles: Leveraging Parameter Sharing for Efficient Ensembles

no code implementations 29 Sep 2021 Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

We improve on these methods with MixtureEnsembles, which learns to factorize ensemble members with shared parameters by constructing each layer with a linear combination of templates.

Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

1 code implementation 17 Apr 2021 Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction.

Common Sense Reasoning Question Answering

CDS: Cross-Domain Self-Supervised Pre-Training

no code implementations ICCV 2021 Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.

Domain Adaptation Transfer Learning

Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News

1 code implementation EMNLP 2020 Reuben Tan, Bryan A. Plummer, Kate Saenko

In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.

Neural Parameter Allocation Search

1 code implementation ICLR 2022 Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko

We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.

Image Classification Phrase Grounding

Learning to Scale Multilingual Representations for Vision-Language Tasks

no code implementations ECCV 2020 Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer

Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added.

Language Modelling Machine Translation +3

Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels

no code implementations 18 Mar 2020 Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.

Self-Supervised Learning Unsupervised Domain Adaptation

MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention

no code implementations 18 Feb 2020 Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni

We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the light-weight shallow network runs on non-keyframes at a high frame rate.

Multi-Task Learning

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval

no code implementations 27 Sep 2019 Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.

Moment Retrieval Retrieval

wMAN: Weakly-supervised Moment Alignment Network for Text-based Video Segment Retrieval

no code implementations 25 Sep 2019 Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.

Moment Retrieval Retrieval +1

MULE: Multimodal Universal Language Embedding

no code implementations 8 Sep 2019 Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.

Data Augmentation Machine Translation +2

Learning Similarity Conditions Without Explicit Supervision

1 code implementation ICCV 2019 Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer

Many real-world tasks require models to compare images along multiple similarity conditions (e.g., similarity in color, category or shape).

Revisiting Image-Language Networks for Open-ended Phrase Detection

3 code implementations 17 Nov 2018 Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko

Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.

object-detection Object Detection +1

Give me a hint! Navigating Image Databases using Human-in-the-loop Feedback

no code implementations 24 Sep 2018 Bryan A. Plummer, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu

In this paper, we introduce an attribute-based interactive image search which can leverage human-in-the-loop feedback to iteratively refine image search results.

Attribute Image Retrieval

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation 13 Apr 2018 Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.

Retrieval Sentence

Learning Type-Aware Embeddings for Fashion Compatibility

2 code implementations ECCV 2018 Mariya I. Vasileva, Bryan A. Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, David Forsyth

Outfits in online fashion data are composed of items of many different types (e.g., top, bottom, shoes) that share some stylistic relationship with one another.

Vocal Bursts Type Prediction

Conditional Image-Text Embedding Networks

1 code implementation ECCV 2018 Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model.

Phrase Grounding

Enhancing Video Summarization via Vision-Language Embedding

no code implementations CVPR 2017 Bryan A. Plummer, Matthew Brown, Svetlana Lazebnik

This paper addresses video summarization, or the problem of distilling a raw video into a shorter form while still capturing the original story.

Video Summarization
