Search Results for author: Shoubin Yu

Found 5 papers, 5 papers with code

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

1 code implementation · 8 Feb 2024 · Shoubin Yu, Jaehong Yoon, Mohit Bansal

Furthermore, we propose a fusion module designed to compress multimodal queries, maintaining computational efficiency in the LLM while combining additional modalities.

Tasks: Computational Efficiency · Optical Flow Estimation · +2
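The fusion idea described above (compressing the query tokens from several modalities into a fixed budget before they reach the LLM) can be illustrated with a toy sketch. This is a hypothetical simplification using mean-pooling, not the CREMA architecture; the token counts, dimensions, and the `fuse_and_compress` function are invented for illustration.

```python
import numpy as np

def fuse_and_compress(modality_queries, num_out):
    """Toy sketch: concatenate per-modality query tokens, then
    compress to a fixed token budget by mean-pooling groups.
    (Illustrative only; the real module is a learned fusion.)"""
    tokens = np.concatenate(modality_queries, axis=0)   # (total_tokens, dim)
    groups = np.array_split(tokens, num_out, axis=0)    # num_out groups
    return np.stack([g.mean(axis=0) for g in groups])   # (num_out, dim)

# Three modalities (e.g. video, optical flow, depth), 32 query tokens each:
video_q = np.random.randn(32, 768)
flow_q  = np.random.randn(32, 768)
depth_q = np.random.randn(32, 768)

fused = fuse_and_compress([video_q, flow_q, depth_q], num_out=32)
print(fused.shape)  # (32, 768) -- same budget as one modality
```

The point of the compression step is that the LLM's input length stays constant as modalities are added, which is how computational efficiency is maintained.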

A Simple LLM Framework for Long-Range Video Question-Answering

1 code implementation · 28 Dec 2023 · Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

Furthermore, we show that a specialized prompt that asks the LLM first to summarize the noisy short-term visual captions and then answer a given input question leads to a significant LVQA performance boost.

Tasks: Large Language Model · Long-range modeling · +2
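The specialized prompt described above (summarize the noisy short-term captions first, then answer) can be sketched as a simple template builder. The wording and the `build_lvqa_prompt` helper are illustrative assumptions, not the paper's actual prompt.

```python
def build_lvqa_prompt(captions, question):
    """Sketch of a summarize-then-answer prompt in the spirit of the
    paper: the LLM is asked to first condense noisy short-term
    captions, then answer the question from its own summary."""
    caption_block = "\n".join(f"- {c}" for c in captions)
    return (
        "Here are noisy short-term captions of a long video:\n"
        f"{caption_block}\n\n"
        "First, summarize the captions into a coherent description of "
        "the video. Then, using only your summary, answer:\n"
        f"Question: {question}"
    )

prompt = build_lvqa_prompt(
    ["a person opens a fridge", "a person pours milk into a glass"],
    "What did the person drink?",
)
print(prompt)
```

The two-stage instruction forces the model to denoise the captions before reasoning over them, which is the mechanism the abstract credits for the LVQA performance boost.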

Self-Chained Image-Language Model for Video Localization and Question Answering

1 code implementation · NeurIPS 2023 · Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal

The SeViLA framework consists of two modules, a Localizer and an Answerer, both parameter-efficiently fine-tuned from BLIP-2.

Ranked #3 on Zero-Shot Video Question Answer on IntentQA (using extra training data)

Tasks: Language Modelling · Representation Learning · +2
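The Localizer/Answerer chaining can be sketched as a generic two-stage pipeline: score frames for relevance to the question, keep the top-k keyframes, then answer on those. The stand-in callables below are placeholders, not BLIP-2 modules; `sevila_style_pipeline` and its parameters are invented for illustration.

```python
def sevila_style_pipeline(frames, question, localizer, answerer, k=4):
    """Toy chain loosely following the Localizer/Answerer split:
    the localizer scores each frame's relevance to the question,
    the top-k frames are kept, and the answerer sees only those."""
    scores = [localizer(f, question) for f in frames]
    top = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:k]
    keyframes = [frames[i] for i in sorted(top)]  # restore temporal order
    return answerer(keyframes, question)

# Stand-in models for illustration.
frames = list(range(10))
loc = lambda f, q: 1.0 if f % 3 == 0 else 0.1   # pretend frames 0,3,6,9 matter
ans = lambda ks, q: f"answer from frames {ks}"

print(sevila_style_pipeline(frames, "what happens?", loc, ans))
# answer from frames [0, 3, 6, 9]
```

The "self-chained" aspect in the real system is that both stages are adapted from the same image-language model, so localization and answering share a backbone.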

Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection

1 code implementation · 7 Dec 2021 · Shoubin Yu, Zhongyin Zhao, Haoshu Fang, Andong Deng, Haisheng Su, Dongliang Wang, Weihao Gan, Cewu Lu, Wei Wu

Unlike pixel-based anomaly detection methods, pose-based methods use highly structured skeleton data, which reduces the computational burden and avoids the negative impact of background noise.

Tasks: Anomaly Detection In Surveillance Videos · Optical Flow Estimation · +1
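The "explicit distribution modeling" idea above can be sketched in a minimal form: fit a distribution to skeleton features from normal clips, then score test clips by how unlikely they are under it. A diagonal Gaussian is an assumption made here for simplicity, not the paper's actual model; the feature layout (17 joints × (x, y)) is also illustrative.

```python
import numpy as np

def fit_pose_distribution(normal_feats):
    """Fit a diagonal Gaussian to skeleton features extracted from
    normal (non-anomalous) clips. Simplified stand-in for the
    paper's distribution modeling."""
    mu = normal_feats.mean(axis=0)
    var = normal_feats.var(axis=0) + 1e-6  # avoid division by zero
    return mu, var

def anomaly_score(feat, mu, var):
    # Negative log-likelihood under the fitted Gaussian (up to a constant):
    # irregular poses get higher scores.
    return float(0.5 * np.sum((feat - mu) ** 2 / var + np.log(var)))

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 34))  # e.g. 17 joints x (x, y)
mu, var = fit_pose_distribution(normal)

# A pose shifted far from the learned regularity scores higher:
print(anomaly_score(normal[0], mu, var) < anomaly_score(normal[0] + 5.0, mu, var))
# True
```

Because the features are low-dimensional skeleton coordinates rather than raw pixels, fitting and scoring are cheap, which matches the computational advantage the abstract claims for pose-based methods.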

STAR: A Benchmark for Situated Reasoning in Real-World Videos

1 code implementation · NeurIPS 2021 · Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

This paper introduces Situated Reasoning in Real-World Videos (STAR), a new benchmark that evaluates situated reasoning ability via situation abstraction and logic-grounded question answering over real-world videos.

Tasks: Logical Reasoning · Question Answering
