Search Results for author: Bryan A. Plummer

Found 53 papers, 30 papers with code

Koala: Key frame-conditioned long video-LLM

no code implementations • 5 Apr 2024 • Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.

Action Recognition Question Answering +2

Paper

Machine-generated Text Localization

1 code implementation • 19 Feb 2024 • Zhongping Zhang, Wenda Qin, Bryan A. Plummer

Machine-Generated Text (MGT) detection aims to identify a piece of text as machine or human written.

Binary Classification Misinformation +1

Paper
Code

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

1 code implementation • 1 Feb 2024 • Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

Furthermore, prior work's typographic attacks against CLIP randomly sample a misleading class from a predefined set of categories.

Descriptive

Paper
Code

UniHuman: A Unified Model for Editing Human Images in the Wild

1 code implementation • 22 Dec 2023 • Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin

In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings.

Paper
Code

CLAMP: Contrastive LAnguage Model Prompt-tuning

no code implementations • 4 Dec 2023 • Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim

Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.

Contrastive Learning Image Captioning +5

Paper

Learning to Compose SuperWeights for Neural Parameter Allocation Search

1 code implementation • 3 Dec 2023 • Piotr Teterwak, Soren Nelson, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

To address this, we generate layer weights by learning to compose sets of SuperWeights, which represent a group of trainable parameters.

Paper
Code

A Unified Framework for Connecting Noise Modeling to Boost Noise Detection

1 code implementation • 30 Nov 2023 • Siqi Wang, Chau Pham, Bryan A. Plummer

In this work, we explore the integration of these two approaches, proposing an interconnected structure with three crucial blocks: noise modeling, source knowledge identification, and enhanced noise detection using noise source-knowledge-integration methods.

Learning with noisy labels

Paper
Code

MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters

1 code implementation • 7 Nov 2023 • Chau Pham, Piotr Teterwak, Soren Nelson, Bryan A. Plummer

Newly grown layer weights are generated by using a new linear combination of existing templates for a layer.

Paper
Code

CHAMMI: A benchmark for channel-adaptive models in microscopy imaging

2 code implementations • NeurIPS 2023 • Zitong Chen, Chau Pham, Siqi Wang, Michael Doron, Nikita Moshkov, Bryan A. Plummer, Juan C. Caicedo

In this paper, we present a benchmark for investigating channel-adaptive models in microscopy imaging, which consists of 1) a dataset of varied-channel single-cell images, and 2) a biologically relevant evaluation framework.

Paper
Code

Let Models Speak Ciphers: Multiagent Debate through Embeddings

no code implementations • 10 Oct 2023 • Chau Pham, Boyi Liu, Yingxiang Yang, Zhengyu Chen, Tianyi Liu, Jianbo Yuan, Bryan A. Plummer, Zhaoran Wang, Hongxia Yang

Although natural language is an obvious choice for communication due to LLMs' language understanding capability, the token sampling step needed when generating natural language poses a potential risk of information loss, as it uses only one token to represent the model's belief across the entire vocabulary.

Paper

Socratis: Are large multimodal models emotionally aware?

no code implementations • 31 Aug 2023 • Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko

We further see that current captioning metrics based on large vision-language models also fail to correlate with human preferences.

Paper

From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Bias

no code implementations • 8 Aug 2023 • Maan Qraitem, Kate Saenko, Bryan A. Plummer

By training on real and synthetic data separately, FFR avoids the issue of bias toward signals from the pair $(B, G)$.

Paper

Multiscale Video Pretraining for Long-Term Activity Forecasting

no code implementations • 24 Jul 2023 • Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani

To alleviate this issue, we propose Multiscale Video Pretraining (MVP), a novel self-supervised pretraining approach that learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales.

Action Anticipation Long Term Action Anticipation

Paper

LNL+K: Learning with Noisy Labels and Noise Source Distribution Knowledge

1 code implementation • 20 Jun 2023 • Siqi Wang, Bryan A. Plummer

Learning with noisy labels (LNL) is challenging as the model tends to memorize noisy labels, which can lead to overfitting.

Learning with noisy labels

Paper
Code

Text-to-image Editing by Image Information Removal

no code implementations • 27 May 2023 • Zhongping Zhang, Jian Zheng, Jacob Zhiyuan Fang, Bryan A. Plummer

Using the input image as a control could mitigate these issues, but since these models are trained via reconstruction, a model can simply hide information about the original image when encoding it to perfectly reconstruct the image without learning the editing task.

Image Generation Image Reconstruction

Paper

WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

2 code implementations • 9 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

Webpages have been a rich resource for language and vision-language tasks.

Image Captioning

Paper
Code

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

1 code implementation • 5 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

Webpages have been a rich, scalable resource for vision-language and language-only tasks.

Image Captioning

Paper
Code

ERM++: An Improved Baseline for Domain Generalization

1 code implementation • 4 Apr 2023 • Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer

We also explore the relationship between DG performance and similarity to pre-training data, and find that similarity to pre-training data distributions is an important driver of performance, but that ERM++ with stronger initializations can deliver strong performance even on dissimilar datasets. Code is released at https://github.com/piotr-teterwak/erm_plusplus.

Domain Generalization

Paper
Code

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

no code implementations • CVPR 2023 • Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.

Audio Source Separation Natural Language Queries

Paper

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

no code implementations • 22 Nov 2022 • Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori

For example, asking a model to generate a varying number of the same object to measure its ability to count or providing a text prompt with several objects that each have a different attribute to identify its ability to match objects and attributes correctly.

Attribute Text-to-Image Generation

Paper

Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures

1 code implementation • ICCV 2023 • Nannan Li, Kevin J. Shih, Bryan A. Plummer

Then we reconstruct the input image by sampling from the permuted textures for patch-level disentanglement.

Disentanglement Pose Transfer +1

Paper
Code

Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

1 code implementation • CVPR 2023 • Maan Qraitem, Kate Saenko, Bryan A. Plummer

Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution per epoch without repeating samples.

Paper
Code

NewsStories: Illustrating articles with visual summaries

1 code implementation • 26 Jul 2022 • Reuben Tan, Bryan A. Plummer, Kate Saenko, JP Lewis, Avneesh Sud, Thomas Leung

Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images.

Retrieval

Paper
Code

Supervised Attribute Information Removal and Reconstruction for Image Manipulation

1 code implementation • 13 Jul 2022 • Nannan Li, Bryan A. Plummer

Thus, the source attribute information can often be hidden in the disentangled features, leading to unwanted image editing effects.

Attribute Image Manipulation +1

Paper
Code

Complex Scene Image Editing by Scene Graph Comprehension

1 code implementation • 24 Mar 2022 • Zhongping Zhang, Huiwen He, Bryan A. Plummer, Zhenyu Liao, Huayan Wang

Unlike object detection methods based solely on object category, our method can accurately recognize the target object by comprehending the objects and their semantic relationships within a complex scene.

Image Inpainting Image Manipulation +4

Paper
Code

Movie Genre Classification by Language Augmentation and Shot Sampling

1 code implementation • 24 Mar 2022 • Zhongping Zhang, Yiwen Gu, Bryan A. Plummer, Xin Miao, Jiayi Liu, Huayan Wang

We evaluate our method on the MovieNet and Condensed Movies datasets, achieving an approximately 6-9% improvement in mean Average Precision (mAP) over the baselines.

Action Recognition Boundary Detection +6

Paper
Code

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

1 code implementation • 4 Feb 2022 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.

Common Sense Reasoning Question Answering +1

Paper
Code

Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval

1 code implementation • 11 Dec 2021 • Zhongping Zhang, Yiwen Gu, Bryan A. Plummer

Article comprehension is an important challenge in natural language processing with many applications such as article generation or image-to-article retrieval.

News Generation Retrieval

Paper
Code

Anchoring to Exemplars for Training Mixture-of-Expert Cell Embeddings

no code implementations • 6 Dec 2021 • Siqi Wang, Manyuan Lu, Nikita Moshkov, Juan C. Caicedo, Bryan A. Plummer

Analyzing the morphology of cells in microscopy images can provide insights into the mechanism of compounds or the function of genes.

Drug Discovery

Paper

From Coarse to Fine-grained Concept based Discrimination for Phrase Detection

no code implementations • 6 Dec 2021 • Maan Qraitem, Bryan A. Plummer

Phrase detection requires methods to identify if a phrase is relevant to an image and localize it, if applicable.

Paper

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations • 20 Oct 2021 • Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

Paper

MixtureEnsembles: Leveraging Parameter Sharing for Efficient Ensembles

no code implementations • 29 Sep 2021 • Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

We improve on these methods with MixtureEnsembles, which learns to factorize ensemble members with shared parameters by constructing each layer with a linear combination of templates.

Paper

Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

1 code implementation • 17 Apr 2021 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction.

Common Sense Reasoning Question Answering

Paper
Code

CDS: Cross-Domain Self-Supervised Pre-Training

no code implementations • ICCV 2021 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.

Domain Adaptation Transfer Learning

Paper

Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News

1 code implementation • EMNLP 2020 • Reuben Tan, Bryan A. Plummer, Kate Saenko

In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.

Paper
Code

Neural Parameter Allocation Search

1 code implementation • ICLR 2022 • Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko

We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.

Image Classification Phrase Grounding

Paper
Code

Learning to Scale Multilingual Representations for Vision-Language Tasks

no code implementations • ECCV 2020 • Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer

Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added.

Language Modelling Machine Translation +3

Paper

Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels

no code implementations • 18 Mar 2020 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.

Self-Supervised Learning Unsupervised Domain Adaptation

Paper

MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention

no code implementations • 18 Feb 2020 • Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni

We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the light-weight shallow network runs on non-keyframes at a high frame rate.

Multi-Task Learning

Paper

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval

no code implementations • 27 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.

Moment Retrieval Retrieval

Paper

wMAN: Weakly-supervised Moment Alignment Network for Text-based Video Segment Retrieval

no code implementations • 25 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.

Moment Retrieval Retrieval +1

Paper

MULE: Multimodal Universal Language Embedding

no code implementations • 8 Sep 2019 • Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.

Data Augmentation Machine Translation +2

Paper

Learning Similarity Conditions Without Explicit Supervision

1 code implementation • ICCV 2019 • Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer

Many real-world tasks require models to compare images along multiple similarity conditions (e.g., similarity in color, category or shape).

Paper
Code

Language Features Matter: Effective Language Representations for Vision-Language Tasks

no code implementations • ICCV 2019 • Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

Shouldn't language and vision features be treated equally in vision-language (VL) tasks?

Image Captioning Language Modelling +6

Paper

Why do These Match? Explaining the Behavior of Image Similarity Models

1 code implementation • ECCV 2020 • Bryan A. Plummer, Mariya I. Vasileva, Vitali Petsiuk, Kate Saenko, David Forsyth

Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings.

Attribute General Classification +3

Paper
Code

Revisiting Image-Language Networks for Open-ended Phrase Detection

3 code implementations • 17 Nov 2018 • Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko

Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.

Object Detection +1

Paper
Code

Give me a hint! Navigating Image Databases using Human-in-the-loop Feedback

no code implementations • 24 Sep 2018 • Bryan A. Plummer, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu

In this paper, we introduce an attribute-based interactive image search which can leverage human-in-the-loop feedback to iteratively refine image search results.

Attribute Image Retrieval

Paper

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation • 13 Apr 2018 • Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.

Retrieval Sentence

Paper
Code

Learning Type-Aware Embeddings for Fashion Compatibility

2 code implementations • ECCV 2018 • Mariya I. Vasileva, Bryan A. Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, David Forsyth

Outfits in online fashion data are composed of items of many different types (e.g., top, bottom, shoes) that share some stylistic relationship with one another.

Vocal Bursts Type Prediction

Paper
Code

Conditional Image-Text Embedding Networks

1 code implementation • ECCV 2018 • Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model.

Phrase Grounding

Paper
Code

Enhancing Video Summarization via Vision-Language Embedding

no code implementations • CVPR 2017 • Bryan A. Plummer, Matthew Brown, Svetlana Lazebnik

This paper addresses video summarization, or the problem of distilling a raw video into a shorter form while still capturing the original story.

Video Summarization

Paper

Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues

1 code implementation • ICCV 2017 • Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik

This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues.

Attribute Position +2

Paper
Code

Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

2 code implementations • ICCV 2015 • Bryan A. Plummer, Li-Wei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, Svetlana Lazebnik

The Flickr30k dataset has become a standard benchmark for sentence-based image description.

Ranked #17 on Image Retrieval on Flickr30K 1K test

Retrieval Sentence

Paper
Code
