Search Results for author: Brian Chen

Found 18 papers, 9 papers with code

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

1 code implementation • 8 Dec 2021 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

Multi-modal learning from video data has seen increased attention recently as it allows to train semantically meaningful embeddings without human annotation enabling tasks like zero-shot retrieval and classification.

Action Localization Retrieval +2

Paper
Code

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

1 code implementation • CVPR 2022 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio S. Feris, David Harwath, James Glass, Hilde Kuehne

In this work, we present a multi-modal, modality agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a fused representation in a joined multi-modal embedding space.

Action Localization Retrieval +2

Paper
Code

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

1 code implementation • 16 Jun 2020 • Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass

Further, we propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper
Code

Cascaded Multilingual Audio-Visual Learning from Videos

1 code implementation • 8 Nov 2021 • Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

In this paper, we explore self-supervised audio-visual models that learn from instructional videos.

audio-visual learning Retrieval

Paper
Code

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System

1 code implementation • NAACL 2021 • Haoyang Wen, Ying Lin, Tuan Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi Fung, Piyush Mishra, Qing Lyu, D{\'\i}dac Sur{\'\i}s, Brian Chen, Susan Windisch Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Heng Ji

We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video).

coreference-resolution Event Extraction +1

Paper
Code

Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

1 code implementation • CVPR 2019 • Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang

Following dedicated non-linear mappings for visual features at each level, word, and sentence embeddings, we obtain multiple instantiations of our common semantic space in which comparisons between any target text and the visual content is performed with cosine similarity.

Ranked #1 on Phrase Grounding on ReferIt

Language Modelling Phrase Grounding +2

Paper
Code

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

1 code implementation • ICCV 2021 • Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities.

Ranked #4 on Long Video Retrieval (Background Removed) on YouCook2

Clustering Contrastive Learning +6

Paper
Code

EgoTV: Egocentric Task Verification from Natural Language Task Descriptions

1 code implementation • ICCV 2023 • Rishi Hazra, Brian Chen, Akshara Rai, Nitin Kamra, Ruta Desai

The goal in EgoTV is to verify the execution of tasks from egocentric videos based on the natural language description of these tasks.

Paper
Code

Weakly-Supervised Temporal Article Grounding

1 code implementation • 22 Oct 2022 • Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang

Specifically, given an article and a relevant video, WSAG aims to localize all ``groundable'' sentences to the video, and these sentences are possibly at different semantic scales.

Natural Language Queries Sentence +1

Paper
Code

General Partial Label Learning via Dual Bipartite Graph Autoencoder

no code implementations • 5 Jan 2020 • Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang

Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level -- a label set partially labels an instance -- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed -- instances in a group may be partially linked to the label set from another group.

Ranked #1 on Partial Label Learning on MPII Movie Description

Partial Label Learning

Paper
Add Code

GAIA: A Fine-grained Multimedia Knowledge Extraction System

no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare Voss, Daniel Napierski, Marjorie Freedman

We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology.

Paper
Add Code

Meta Variational Monte Carlo

no code implementations • 20 Nov 2020 • Tianchen Zhao, James Stokes, Oliver Knitter, Brian Chen, Shravan Veerapaneni

An identification is found between meta-learning and the problem of determining the ground state of a randomly generated Hamiltonian drawn from a known ensemble.

Meta-Learning Variational Monte Carlo

Paper
Add Code

Joint Multimedia Event Extraction from Video and Article

no code implementations • Findings (EMNLP) 2021 • Brian Chen, Xudong Lin, Christopher Thomas, Manling Li, Shoya Yoshida, Lovish Chum, Heng Ji, Shih-Fu Chang

We introduce the new task of Video MultiMedia Event Extraction (Video M2E2) and propose two novel components to build the first system towards this task.

coreference-resolution Event Coreference Resolution +1

Paper
Add Code

PreViTS: Contrastive Pretraining with Video Tracking Supervision

no code implementations • 1 Dec 2021 • Brian Chen, Ramprasaath R. Selvaraju, Shih-Fu Chang, Juan Carlos Niebles, Nikhil Naik

In this work, we propose PreViTS, an SSL framework that utilizes an unsupervised tracking signal for selecting clips containing the same object, which helps better utilize temporal transformations of objects.

Action Classification Self-Supervised Learning +1

Paper
Add Code

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations • 1 Dec 2021 • Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

Paper
Add Code

Numerical and geometrical aspects of flow-based variational quantum Monte Carlo

no code implementations • 28 Mar 2022 • James Stokes, Brian Chen, Shravan Veerapaneni

This article aims to summarize recent and ongoing efforts to simulate continuous-variable quantum systems using flow-based variational quantum Monte Carlo techniques, focusing for pedagogical purposes on the example of bosons in the field amplitude (quadrature) basis.

Paper
Add Code

Interpretable Graph Convolutional Network of Multi-Modality Brain Imaging for Alzheimer's Disease Diagnosis

no code implementations • 27 Apr 2022 • Houliang Zhou, Lifang He, Yu Zhang, Li Shen, Brian Chen

Identification of brain regions related to the specific neurological disorders are of great importance for biomarker and diagnostic studies.

Paper
Add Code

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

no code implementations • 29 Mar 2023 • Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

Spatio-temporal grounding describes the task of localizing events in space and time, e. g., in video data, based on verbal descriptions only.

Representation Learning Spatio-Temporal Video Grounding

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.