1 code implementation • 27 Feb 2025 • Shuming Liu, Chen Zhao, Fatimah Zohra, Mattia Soldan, Alejandro Pardo, Mengmeng Xu, Lama Alssum, Merey Ramazanova, Juan León Alcázar, Anthony Cioppa, Silvio Giancola, Carlos Hinojosa, Bernard Ghanem
Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos.
no code implementations • 27 May 2024 • Juan C. Pérez, Alejandro Pardo, Mattia Soldan, Hani Itani, Juan Leon-Alcazar, Bernard Ghanem
These results suggest that CLMs can understand the semantics of compressed data when directly operating on the byte streams of files produced by CFFs.
no code implementations • CVPR 2024 • Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem
Movie trailers are an essential tool for promoting films and attracting audiences.
1 code implementation • 6 Apr 2023 • Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem
To alleviate the boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective.
Ranked #1 on Video Grounding on MAD
1 code implementation • ICCV 2023 • Wayner Barrios, Mattia Soldan, Alberto Mario Ceballos-Arroyo, Fabian Caba Heilbron, Bernard Ghanem
In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows.
Ranked #3 on Natural Language Moment Retrieval on MAD
1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou
In this report, we propose a video-language pretraining (VLP) based solution, EgoVLP, for four Ego4D challenge tasks: Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR).
2 code implementations • 3 Jun 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou
Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention.
1 code implementation • CVPR 2022 • Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem
The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques.
Ranked #4 on Natural Language Moment Retrieval on MAD
1 code implementation • 19 Nov 2020 • Mattia Soldan, Mengmeng Xu, Sisi Qu, Jesper Tegner, Bernard Ghanem
Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query.
Ranked #1 on Natural Language Moment Retrieval on DiDeMo
1 code implementation • 19 Nov 2019 • Riccardo Bonetto, Mattia Soldan, Alberto Lanaro, Simone Milani, Michele Rossi
Smartphones and wearable devices are fast-growing technologies that, in conjunction with advances in wireless sensor hardware, are enabling ubiquitous sensing applications.
2 code implementations • 30 Jul 2019 • Victor Escorcia, Mattia Soldan, Josef Sivic, Bernard Ghanem, Bryan Russell
We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.