no code implementations • 27 Feb 2025 • Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, Raffay Hamid, Bing Yin, Trishul Chilimbi
Empirical results show that the proposed M-LLM video frame selector improves the performance of various downstream video Large Language Models (video-LLMs) across medium-context (ActivityNet, NExT-QA) and long-context (EgoSchema, LongVideoBench) video question answering benchmarks.
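For illustration only, a minimal sketch of question-conditioned frame selection: the `scorer` callable stands in for an M-LLM relevance scorer, and the top-k selection policy is an assumption, not the paper's actual selector design.

```python
# Hypothetical sketch of question-conditioned frame selection; the scorer
# interface and top-k policy are assumptions, not the paper's method.
from typing import Callable, List, Sequence


def select_frames(
    frames: Sequence[object],
    question: str,
    scorer: Callable[[object, str], float],
    k: int = 8,
) -> List[object]:
    """Score every frame against the question with an M-LLM-style scorer
    and keep the k highest-scoring frames in their original temporal order."""
    scored = [(scorer(frame, question), i) for i, frame in enumerate(frames)]
    top = sorted(scored, reverse=True)[:k]   # highest relevance first
    keep = sorted(i for _, i in top)         # restore temporal order
    return [frames[i] for i in keep]
```

The selected frames would then be passed to the downstream video-LLM in place of uniformly sampled ones.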
no code implementations • 13 Nov 2024 • Jianing Zhou, Han Li, Shuai Zhang, Ning Xie, Ruijie Wang, Xiaohan Nie, Sheng Liu, Lingyun Wang
Remarkably, our model improves inference efficiency for high-resolution images and high-frame-rate videos by about 4 times compared to current models, with the efficiency gains growing as image resolution or the number of video frames increases.
1 code implementation • CVPR 2022 • Sheng Liu, Xiaohan Nie, Raffay Hamid
We demonstrate that our approach: (a) significantly improves the quality of 3-D reconstruction for our small-parallax setting, (b) does not cause any degradation for data with large parallax, and (c) maintains the generalizability and scalability of geometry-based sparse SfM.
no code implementations • CVPR 2023 • Shixing Chen, Chun-Hao Liu, Xiang Hao, Xiaohan Nie, Maxim Arap, Raffay Hamid
However, labeling individual scenes is a time-consuming process.
no code implementations • CVPR 2021 • Shixing Chen, Xiaohan Nie, David Fan, Dongqing Zhang, Vimal Bhat, Raffay Hamid
To assess the effectiveness of ShotCoL on novel applications of scene boundary detection, we take on the problem of finding timestamps in movies and TV episodes where video-ads can be inserted while offering a minimally disruptive viewing experience.
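As a rough illustration of this downstream use, the sketch below flags shot boundaries whose adjacent shots are dissimilar in a learned embedding space (e.g., from a contrastively pretrained shot encoder such as ShotCoL) as candidate ad-insertion timestamps; the cosine-similarity scoring and threshold are assumptions, not the paper's actual procedure.

```python
# Illustrative sketch only: given per-shot embeddings and shot end times,
# flag dissimilar adjacent shots as candidate ad-insertion points.
# The similarity measure and threshold are assumed, not from the paper.
import numpy as np


def candidate_ad_points(shot_embs: np.ndarray,
                        shot_end_times: np.ndarray,
                        threshold: float = 0.3) -> np.ndarray:
    """Return timestamps of shot boundaries whose neighboring shots have
    low cosine similarity, i.e. likely scene breaks that can host an ad."""
    a, b = shot_embs[:-1], shot_embs[1:]
    sims = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return shot_end_times[:-1][sims < threshold]
```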
no code implementations • WACV 2021 • Xiaohan Nie, Shixing Chen, Raffay Hamid
We propose a novel framework to register sports-fields as they appear in broadcast sports videos.
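For context, field registration is commonly posed as fitting a homography between detected field landmarks and a field template. The snippet below is a generic OpenCV recipe, not the paper's framework, and the landmark correspondences are invented for illustration.

```python
# Minimal sketch of homography-based field registration, assuming landmark
# correspondences are already detected. Coordinates below are made up.
import cv2
import numpy as np

# (x, y) of detected landmarks in the broadcast frame (assumed values)
frame_pts = np.array([[120, 300], [640, 280], [400, 500],
                      [900, 480], [520, 360]], dtype=np.float32)
# Matching (x, y) of the same landmarks on the field template, in meters
field_pts = np.array([[0, 0], [52.5, 0], [26.25, 34],
                      [68, 30], [40, 20]], dtype=np.float32)

# Robustly fit the frame-to-field homography with RANSAC
H, inliers = cv2.findHomography(frame_pts, field_pts, cv2.RANSAC, 3.0)

# Map an arbitrary image point (e.g. a player's feet) to field coordinates
player = np.array([[[500.0, 400.0]]], dtype=np.float32)
print(cv2.perspectiveTransform(player, H))
```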
no code implementations • CVPR 2014 • Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song-Chun Zhu
We present a novel multiview spatio-temporal AND-OR graph (MST-AOG) representation for cross-view action recognition, i.e., recognition is performed on videos from an unknown and unseen view.
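To convey the intuition behind such a representation, here is a toy sketch of AND-OR graph inference; the node structure and scoring rule are invented for illustration and are not the MST-AOG formulation.

```python
# Toy AND-OR graph scoring: AND nodes compose parts, OR nodes pick the
# best alternative. Structure and scores are illustrative only.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                              # "AND", "OR", or "LEAF"
    children: List["Node"] = field(default_factory=list)
    score: float = 0.0                     # leaf evidence (e.g. a part detector)


def parse_score(node: Node) -> float:
    """AND nodes sum their parts; OR nodes take the best alternative
    (e.g. the best-matching view); leaves return detector evidence."""
    if node.kind == "LEAF":
        return node.score
    child_scores = [parse_score(c) for c in node.children]
    return sum(child_scores) if node.kind == "AND" else max(child_scores)
```

In this picture, an OR node over candidate views selects the best-scoring view-specific AND composition of parts, which is the basic mechanism that lets such graphs handle recognition from unseen views.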