1 code implementation • 27 Nov 2024 • Alejandro Pardo, Fabio Pizzati, Tong Zhang, Alexander Pondaven, Philip Torr, Juan Camilo Perez, Bernard Ghanem
Match-cuts are powerful cinematic tools that create seamless transitions between scenes, delivering strong visual and metaphorical connections.
no code implementations • 19 Nov 2024 • Alejandro Pardo, Jui-Hsien Wang, Bernard Ghanem, Josef Sivic, Bryan Russell, Fabian Caba Heilbron
The objective of this work is to manipulate visual timelines (e.g., a video) through natural language instructions, making complex timeline editing tasks accessible to non-expert or potentially even disabled users.
no code implementations • 27 May 2024 • Juan C. Pérez, Alejandro Pardo, Mattia Soldan, Hani Itani, Juan Leon-Alcazar, Bernard Ghanem
These results suggest that Compressed-Language Models (CLMs) can understand the semantics of compressed data when directly operating on the byte streams of files produced by Compressed File Formats (CFFs).
no code implementations • 23 Apr 2024 • Merey Ramazanova, Alejandro Pardo, Bernard Ghanem, Motasem Alfarra
Understanding videos that contain multiple modalities is crucial, especially in egocentric videos, where combining various sensory inputs significantly improves tasks like action recognition and moment localization.
no code implementations • CVPR 2024 • Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem
Movie trailers are an essential tool for promoting films and attracting audiences.
no code implementations • 21 Jan 2024 • Merey Ramazanova, Alejandro Pardo, Humam Alwassel, Bernard Ghanem
Multimodal video understanding is crucial for analyzing egocentric videos, where integrating multiple sensory signals significantly enhances action recognition and moment localization.
1 code implementation • 10 Apr 2023 • Motasem Alfarra, Hani Itani, Alejandro Pardo, Shyma Alhuwaider, Merey Ramazanova, Juan C. Pérez, Zhipeng Cai, Matthias Müller, Bernard Ghanem
To address this issue, we propose a more realistic evaluation protocol for test-time adaptation (TTA) methods, where data is received in an online fashion from a constant-speed data stream, thereby accounting for the method's adaptation speed.
1 code implementation • CVPR 2022 • Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem
The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques.
Ranked #4 on Natural Language Moment Retrieval on MAD
1 code implementation • 12 Sep 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
Advances in automatic cut-type recognition can unleash new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, and machine-assisted video editing.
1 code implementation • ICCV 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise.
no code implementations • 10 Apr 2019 • Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem
We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation.
1 code implementation • 30 Mar 2019 • Alejandro Pardo, Humam Alwassel, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem
RefineLoc shows results competitive with the state of the art in weakly-supervised temporal localization.
Temporal Localization • Weakly Supervised Action Localization