Search Results for author: Shoubin Yu

Found 5 papers, 5 papers with code

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

1 code implementation · 8 Feb 2024 · Shoubin Yu, Jaehong Yoon, Mohit Bansal

Furthermore, we propose a fusion module designed to compress multimodal queries, maintaining computational efficiency in the LLM while combining additional modalities.

Tasks: Computational Efficiency · Optical Flow Estimation · +2
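The fusion idea described above (compressing the query tokens from several modalities into a fixed budget before they reach the LLM) can be illustrated with a toy sketch. This is a hypothetical simplification using mean-pooling, not the CREMA architecture; the token counts, dimensions, and the `fuse_and_compress` function are invented for illustration.

```python
import numpy as np

def fuse_and_compress(modality_queries, num_out):
    """Toy sketch: concatenate per-modality query tokens, then
    compress to a fixed token budget by mean-pooling groups.
    (Illustrative only; the real module is a learned fusion.)"""
    tokens = np.concatenate(modality_queries, axis=0)   # (total_tokens, dim)
    groups = np.array_split(tokens, num_out, axis=0)    # num_out groups
    return np.stack([g.mean(axis=0) for g in groups])   # (num_out, dim)

# Three modalities (e.g. video, optical flow, depth), 32 query tokens each:
video_q = np.random.randn(32, 768)
flow_q  = np.random.randn(32, 768)
depth_q = np.random.randn(32, 768)

fused = fuse_and_compress([video_q, flow_q, depth_q], num_out=32)
print(fused.shape)  # (32, 768) -- same budget as one modality
```

The point of the compression step is that the LLM's input length stays constant as modalities are added, which is how computational efficiency is maintained.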

A Simple LLM Framework for Long-Range Video Question-Answering

1 code implementation · 28 Dec 2023 · Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

Furthermore, we show that a specialized prompt that asks the LLM first to summarize the noisy short-term visual captions and then answer a given input question leads to a significant LVQA performance boost.

Tasks: Large Language Model · Long-range modeling · +2
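The specialized prompt described above (summarize the noisy short-term captions first, then answer) can be sketched as a simple template builder. The wording and the `build_lvqa_prompt` helper are illustrative assumptions, not the paper's actual prompt.

```python
def build_lvqa_prompt(captions, question):
    """Sketch of a summarize-then-answer prompt in the spirit of the
    paper: the LLM is asked to first condense noisy short-term
    captions, then answer the question from its own summary."""
    caption_block = "\n".join(f"- {c}" for c in captions)
    return (
        "Here are noisy short-term captions of a long video:\n"
        f"{caption_block}\n\n"
        "First, summarize the captions into a coherent description of "
        "the video. Then, using only your summary, answer:\n"
        f"Question: {question}"
    )

prompt = build_lvqa_prompt(
    ["a person opens a fridge", "a person pours milk into a glass"],
    "What did the person drink?",
)
print(prompt)
```

The two-stage instruction forces the model to denoise the captions before reasoning over them, which is the mechanism the abstract credits for the LVQA performance boost.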

Self-Chained Image-Language Model for Video Localization and Question Answering

1 code implementation · NeurIPS 2023 · Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal

The SeViLA framework consists of two modules, a Localizer and an Answerer, both parameter-efficiently fine-tuned from BLIP-2.

Ranked #3 on Zero-Shot Video Question Answer on IntentQA (using extra training data)

Tasks: Language Modelling · Representation Learning · +2
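The Localizer/Answerer chaining can be sketched as a generic two-stage pipeline: score frames for relevance to the question, keep the top-k keyframes, then answer on those. The stand-in callables below are placeholders, not BLIP-2 modules; `sevila_style_pipeline` and its parameters are invented for illustration.

```python
def sevila_style_pipeline(frames, question, localizer, answerer, k=4):
    """Toy chain loosely following the Localizer/Answerer split:
    the localizer scores each frame's relevance to the question,
    the top-k frames are kept, and the answerer sees only those."""
    scores = [localizer(f, question) for f in frames]
    top = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:k]
    keyframes = [frames[i] for i in sorted(top)]  # restore temporal order
    return answerer(keyframes, question)

# Stand-in models for illustration.
frames = list(range(10))
loc = lambda f, q: 1.0 if f % 3 == 0 else 0.1   # pretend frames 0,3,6,9 matter
ans = lambda ks, q: f"answer from frames {ks}"

print(sevila_style_pipeline(frames, "what happens?", loc, ans))
# answer from frames [0, 3, 6, 9]
```

The "self-chained" aspect in the real system is that both stages are adapted from the same image-language model, so localization and answering share a backbone.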

Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection

1 code implementation · 7 Dec 2021 · Shoubin Yu, Zhongyin Zhao, Haoshu Fang, Andong Deng, Haisheng Su, Dongliang Wang, Weihao Gan, Cewu Lu, Wei Wu

Unlike pixel-based anomaly detection methods, pose-based methods use highly structured skeleton data, which reduces the computational burden and avoids the negative impact of background noise.

Tasks: Anomaly Detection In Surveillance Videos · Optical Flow Estimation · +1
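The "explicit distribution modeling" idea above can be sketched in a minimal form: fit a distribution to skeleton features from normal clips, then score test clips by how unlikely they are under it. A diagonal Gaussian is an assumption made here for simplicity, not the paper's actual model; the feature layout (17 joints × (x, y)) is also illustrative.

```python
import numpy as np

def fit_pose_distribution(normal_feats):
    """Fit a diagonal Gaussian to skeleton features extracted from
    normal (non-anomalous) clips. Simplified stand-in for the
    paper's distribution modeling."""
    mu = normal_feats.mean(axis=0)
    var = normal_feats.var(axis=0) + 1e-6  # avoid division by zero
    return mu, var

def anomaly_score(feat, mu, var):
    # Negative log-likelihood under the fitted Gaussian (up to a constant):
    # irregular poses get higher scores.
    return float(0.5 * np.sum((feat - mu) ** 2 / var + np.log(var)))

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 34))  # e.g. 17 joints x (x, y)
mu, var = fit_pose_distribution(normal)

# A pose shifted far from the learned regularity scores higher:
print(anomaly_score(normal[0], mu, var) < anomaly_score(normal[0] + 5.0, mu, var))
# True
```

Because the features are low-dimensional skeleton coordinates rather than raw pixels, fitting and scoring are cheap, which matches the computational advantage the abstract claims for pose-based methods.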

STAR: A Benchmark for Situated Reasoning in Real-World Videos

1 code implementation · NeurIPS 2021 · Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

This paper introduces Situated Reasoning in Real-World Videos (STAR), a new benchmark that evaluates situated reasoning ability via situation abstraction and logic-grounded question answering over real-world videos.

Tasks: Logical Reasoning · Question Answering
