Search Results for author: Xiaohan Nie

Found 7 papers, 1 paper with code

M-LLM Based Video Frame Selection for Efficient Video Understanding

no code implementations • 27 Feb 2025 • Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, Raffay Hamid, Bing Yin, Trishul Chilimbi

Empirical results show that the proposed M-LLM video frame selector improves the performance of various downstream video Large Language Models (video-LLMs) across medium-context (ActivityNet, NExT-QA) and long-context (EgoSchema, LongVideoBench) video question answering benchmarks.

EgoSchema Language Modeling +6
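
As a rough illustration of the idea, here is a minimal sketch of question-conditioned frame selection: score each candidate frame against the query and keep a small top-k subset in temporal order for the downstream video-LLM. The scoring function `score_fn` and the budget `k` are hypothetical placeholders, not the paper's actual selector.

```python
# Minimal sketch of M-LLM-style frame selection (placeholders, not the
# paper's method): score frames against the question, keep the top k.
from typing import Callable, List, Sequence

def select_frames(
    frames: Sequence,                           # decoded video frames
    question: str,                              # the VQA query
    score_fn: Callable[[object, str], float],   # hypothetical M-LLM scorer
    k: int = 8,                                 # frame budget (illustrative)
) -> List[int]:
    """Return indices of the k frames most relevant to the question."""
    scores = [score_fn(f, question) for f in frames]
    # Keep the top-k frames, then restore temporal order before feeding
    # them to the downstream video-LLM.
    top_k = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top_k)
```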

Multimodal Instruction Tuning with Hybrid State Space Models

no code implementations • 13 Nov 2024 • Jianing Zhou, Han Li, Shuai Zhang, Ning Xie, Ruijie Wang, Xiaohan Nie, Sheng Liu, Lingyun Wang

Remarkably, our model improves inference efficiency for high-resolution images and high-frame-rate videos by about 4 times compared to current models, with efficiency gains growing as image resolution or video frame count increases.

Mamba State Space Models
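
A minimal sketch of the hybrid idea: interleave linear-time state-space-style scans with standard attention layers, so long, high-frame-rate inputs avoid attention's quadratic cost in most layers. The toy diagonal recurrence, layer counts, and dimensions below are illustrative assumptions, not the paper's architecture.

```python
# Toy hybrid stack (illustrative, not the paper's model): alternate an
# O(T) diagonal-recurrence block with standard multi-head attention.
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Diagonal linear recurrence: h_t = a * h_{t-1} + b * x_t, per channel."""
    def __init__(self, dim: int):
        super().__init__()
        self.log_a = nn.Parameter(torch.zeros(dim))  # per-channel decay
        self.b = nn.Parameter(torch.ones(dim))
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        a = torch.sigmoid(self.log_a)        # keep the recurrence stable
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):           # O(T) scan instead of O(T^2)
            h = a * h + self.b * x[:, t]
            outs.append(h)
        return self.out(torch.stack(outs, dim=1))

class HybridStack(nn.Module):
    """Alternate SSM-style blocks (cheap on long sequences) with attention."""
    def __init__(self, dim: int = 256, n_heads: int = 4, n_pairs: int = 2):
        super().__init__()
        self.layers = nn.ModuleList()
        for _ in range(n_pairs):
            self.layers.append(ToySSMBlock(dim))
            self.layers.append(nn.MultiheadAttention(dim, n_heads, batch_first=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if isinstance(layer, nn.MultiheadAttention):
                attn_out, _ = layer(x, x, x)
                x = x + attn_out              # residual around attention
            else:
                x = x + layer(x)              # residual around the scan
        return x
```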

Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows

1 code implementation • CVPR 2022 • Sheng Liu, Xiaohan Nie, Raffay Hamid

We demonstrate that our approach: (a) significantly improves the quality of 3-D reconstruction for our small-parallax setting, (b) does not cause any degradation for data with large-parallax, and (c) maintains the generalizability and scalability of geometry-based sparse SfM.
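
One way to see why depth helps in small-parallax settings: triangulating near-parallel rays is ill-conditioned, whereas back-projecting a keypoint with a monocular depth estimate gives a usable initial 3-D point. The sketch below shows standard pinhole back-projection; the intrinsics and depth value are made-up inputs, and this is not the paper's full pipeline.

```python
# Depth-guided point initialization sketch (assumed inputs, not the
# paper's pipeline): lift a 2-D keypoint to 3-D with a depth estimate.
import numpy as np

def backproject(kp_xy: np.ndarray, depth: float, K: np.ndarray) -> np.ndarray:
    """Lift a 2-D keypoint (pixels) to a 3-D camera-frame point using depth."""
    x, y = kp_xy
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(x - cx) * depth / fx, (y - cy) * depth / fy, depth])

# Example: a made-up pinhole camera and one keypoint.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
point_3d = backproject(np.array([1000.0, 500.0]), depth=4.2, K=K)
print(point_3d)  # initial 3-D estimate, to be refined by bundle adjustment
```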

Shot Contrastive Self-Supervised Learning for Scene Boundary Detection

no code implementations • CVPR 2021 • Shixing Chen, Xiaohan Nie, David Fan, Dongqing Zhang, Vimal Bhat, Raffay Hamid

To assess the effectiveness of ShotCoL on novel applications of scene boundary detection, we take on the problem of finding timestamps in movies and TV episodes where video-ads can be inserted while offering a minimally disruptive viewing experience.

Boundary Detection Contrastive Learning +1
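
For intuition, a minimal sketch of a shot-contrastive objective in the spirit of ShotCoL: each query shot takes its most similar neighboring shot as the positive and the other shots in the batch as negatives, trained with an InfoNCE-style loss. The embedding shapes and temperature are placeholder assumptions, not the paper's exact setup.

```python
# Shot-contrastive loss sketch (placeholder setup, not the paper's exact
# training recipe): nearest neighboring shot as positive, InfoNCE objective.
import torch
import torch.nn.functional as F

def shot_contrastive_loss(query: torch.Tensor,
                          neighbors: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """query: (B, D) shot embeddings; neighbors: (B, N, D) nearby-shot embeddings."""
    q = F.normalize(query, dim=-1)
    n = F.normalize(neighbors, dim=-1)
    # Pick the most similar neighboring shot as the positive key.
    sims = torch.einsum("bd,bnd->bn", q, n)                    # (B, N)
    pos = n[torch.arange(q.size(0)), sims.argmax(dim=1)]       # (B, D)
    # Each query matches its own positive; other rows act as negatives.
    logits = q @ pos.t() / temperature                         # (B, B)
    labels = torch.arange(q.size(0))                           # diagonal targets
    return F.cross_entropy(logits, labels)
```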

Cross-view Action Modeling, Learning and Recognition

no code implementations • CVPR 2014 • Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song-Chun Zhu

We present a novel multiview spatio-temporal AND-OR graph (MST-AOG) representation for cross-view action recognition, i.e., recognition performed on video from an unknown and unseen view.

Action Recognition Temporal Action Localization
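
A toy sketch of AND-OR graph scoring, the compositional mechanism behind an MST-AOG: AND nodes compose parts (sum over children) while OR nodes select among alternatives such as different views (max over children). The node structure and scores below are illustrative, not the paper's model.

```python
# Toy AND-OR graph scoring (illustrative structure, not the MST-AOG itself):
# AND = sum of children (composition), OR = max of children (selection).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                  # "AND", "OR", or "LEAF"
    score: float = 0.0         # used only by leaves (appearance/pose evidence)
    children: List["Node"] = field(default_factory=list)

def parse_score(node: Node) -> float:
    if node.kind == "LEAF":
        return node.score
    child_scores = [parse_score(c) for c in node.children]
    return max(child_scores) if node.kind == "OR" else sum(child_scores)

# Example: an action (AND of two parts), each part an OR over two views.
graph = Node("AND", children=[
    Node("OR", children=[Node("LEAF", 0.8), Node("LEAF", 0.3)]),
    Node("OR", children=[Node("LEAF", 0.5), Node("LEAF", 0.9)]),
])
print(parse_score(graph))  # 0.8 + 0.9: the best view is chosen per part
```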
