no code implementations • 19 Nov 2024 • Alejandro Pardo, Jui-Hsien Wang, Bernard Ghanem, Josef Sivic, Bryan Russell, Fabian Caba Heilbron
The objective of this work is to manipulate visual timelines (e.g., a video) through natural language instructions, making complex timeline editing tasks accessible to non-expert or even disabled users.
no code implementations • 2 Sep 2024 • Ishan Rajendrakumar Dave, Fabian Caba Heilbron, Mubarak Shah, Simon Jenni
Temporal video alignment aims to synchronize the key events like object interactions or action phase transitions in two videos.
no code implementations • 6 May 2024 • Jiacheng Cheng, Hijung Valentina Shin, Nuno Vasconcelos, Bryan Russell, Fabian Caba Heilbron
In this work, we consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
no code implementations • CVPR 2024 • Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging.
no code implementations • CVPR 2024 • Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung
Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem.
no code implementations • CVPR 2024 • Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem
Movie trailers are an essential tool for promoting films and attracting audiences.
no code implementations • ICCV 2023 • Dawit Mureja Argaw, Joon-Young Lee, Markus Woodson, In So Kweon, Fabian Caba Heilbron
While great progress has been attained, there is still a need for a pretrained multimodal model that can perform well in the ever-growing set of movie understanding tasks the community has been establishing.
1 code implementation • CVPR 2023 • Chun-Hsiao Yeh, Bryan Russell, Josef Sivic, Fabian Caba Heilbron, Simon Jenni
Large-scale vision-language models (VLM) have shown impressive results for language-guided search applications.
1 code implementation • ICCV 2023 • Wayner Barrios, Mattia Soldan, Alberto Mario Ceballos-Arroyo, Fabian Caba Heilbron, Bernard Ghanem
In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows.
Ranked #3 on Natural Language Moment Retrieval on MAD
no code implementations • CVPR 2023 • Andrés Villa, Juan León Alcázar, Motasem Alfarra, Kumail Alhamoud, Julio Hurtado, Fabian Caba Heilbron, Alvaro Soto, Bernard Ghanem
In this paper, we address the problem of continual learning for video data.
no code implementations • 22 Nov 2022 • David Chuan-En Lin, Fabian Caba Heilbron, Joon-Young Lee, Oliver Wang, Nikolas Martelaro
This paper investigates the challenge of extracting highlight moments from videos.
no code implementations • 22 Nov 2022 • David Chuan-En Lin, Fabian Caba Heilbron, Joon-Young Lee, Oliver Wang, Nikolas Martelaro
Video editing is a creative and complex endeavor, and we believe there is potential to reimagine the video editing interface to better support its creative and exploratory nature.
1 code implementation • 20 Jul 2022 • Dawit Mureja Argaw, Fabian Caba Heilbron, Joon-Young Lee, Markus Woodson, In So Kweon
Machine learning is transforming the video editing industry.
no code implementations • 11 May 2022 • Simon Jenni, Markus Woodson, Fabian Caba Heilbron
Furthermore, we propose an optimization for video re-timing that enables precise control over the target duration and performs more robustly on longer videos than prior methods.
2 code implementations • 24 Mar 2022 • Santiago Castro, Fabian Caba Heilbron
Large-scale pretrained image-text models have shown incredible zero-shot performance in a handful of tasks, including video ones such as action recognition and text-to-video retrieval.
no code implementations • 10 Feb 2022 • Merey Ramazanova, Victor Escorcia, Fabian Caba Heilbron, Chen Zhao, Bernard Ghanem
We validate our approach on two large-scale datasets, EPIC-Kitchens and HOMAGE.
no code implementations • CVPR 2022 • Andrés Villa, Kumail Alhamoud, Juan León Alcázar, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem
We perform in-depth evaluations of existing CL methods in vCLIMB, and observe two unique challenges in video data.
1 code implementation • CVPR 2022 • Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem
The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques.
Ranked #4 on Natural Language Moment Retrieval on MAD
1 code implementation • 12 Sep 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
Advances in automatic cut-type recognition can unlock new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, and machine-assisted video editing.
1 code implementation • ICCV 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise.
no code implementations • 25 Jul 2021 • Yu Xiong, Fabian Caba Heilbron, Dahua Lin
To meet the demands of non-experts, we present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots.
1 code implementation • 3 Jun 2021 • Juan Leon Alcazar, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem, Fabian Caba Heilbron
To showcase the potential of our new dataset, we propose an audiovisual baseline and benchmark for person retrieval.
1 code implementation • ICCV 2021 • Juan León-Alcázar, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem
Active speaker detection requires a solid integration of multi-modal cues.
1 code implementation • 7 Jul 2020 • Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.
Ranked #32 on Semantic Segmentation on DensePASS
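The "changing the order of operations" idea behind the fast spatial attention described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names are hypothetical, and row-wise L2 normalization is assumed in place of softmax so that the attention product stays associative.

```python
import numpy as np

def naive_attention(q, k, v):
    # Standard order: (Q K^T) V forms an N x N affinity matrix,
    # costing O(N^2 * d) time and O(N^2) memory.
    affinity = q @ k.T        # (N, N)
    return affinity @ v       # (N, d)

def fast_attention(q, k, v):
    # Reordered: Q (K^T V) costs O(N * d^2) and never materializes
    # the N x N matrix -- a large saving when N >> d.
    return q @ (k.T @ v)      # (N, d)

def l2_normalize(x):
    # Assumed stand-in for softmax: L2 normalization keeps the
    # product associative, which a softmax over rows would not.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 1024, 64               # n spatial positions, d channels
q = l2_normalize(rng.standard_normal((n, d)))
k = l2_normalize(rng.standard_normal((n, d)))
v = rng.standard_normal((n, d))

# Matrix multiplication is associative, so both orders agree numerically.
out = fast_attention(q, k, v)
assert np.allclose(naive_attention(q, k, v), out)
```

Because only the evaluation order changes, the output is identical up to floating-point error while the dominant cost drops from quadratic to linear in the number of spatial positions.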
1 code implementation • CVPR 2020 • Juan Leon Alcazar, Fabian Caba Heilbron, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem
Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker.
1 code implementation • CVPR 2020 • Ping Hu, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Stan Sclaroff, Federico Perazzi
We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation.
Ranked #2 on Video Semantic Segmentation on Cityscapes val
1 code implementation • 26 Mar 2020 • Marcos Baptista Rios, Roberto J. López-Sastre, Fabian Caba Heilbron, Jan van Gemert, F. Javier Acevedo-Rodríguez, S. Maldonado-Bascón
Our results confirm the problems of the previous evaluation protocols and suggest that an IA-based protocol is better suited to the online scenario.
1 code implementation • 22 Mar 2020 • Marcos Baptista Rios, Roberto J. López-Sastre, Fabian Caba Heilbron, Jan van Gemert, Francisco Javier Acevedo-Rodríguez, Saturnino Maldonado-Bascón
The problem of Online Human Behaviour Recognition in untrimmed videos, aka Online Action Detection (OAD), needs to be revisited.
1 code implementation • 30 Mar 2019 • Alejandro Pardo, Humam Alwassel, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem
RefineLoc shows competitive results with the state-of-the-art in weakly-supervised temporal localization.
no code implementations • ECCV 2018 • Fabian Caba Heilbron, Joon-Young Lee, Hailin Jin, Bernard Ghanem
In this paper, we introduce a novel active learning framework for temporal localization that aims to mitigate this data dependency issue.
no code implementations • 11 Aug 2018 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao
The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research.
1 code implementation • ECCV 2018 • Humam Alwassel, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem
Despite the recent progress in video understanding and the continuous rate of improvement in temporal action localization throughout the years, it is still unclear how far (or close?) we are to solving the problem.
no code implementations • 22 Oct 2017 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Krishna, Victor Escorcia, Kenji Hata, Shyamal Buch
Summary of the ActivityNet Large Scale Activity Recognition Challenge 2017: results and challenge participant papers.
no code implementations • CVPR 2017 • Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem
Despite the recent advances in large-scale video analysis, action detection remains one of the most challenging unsolved problems in computer vision.
1 code implementation • ECCV 2018 • Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem
To address this need, we propose the new problem of action spotting in video, which we define as finding a specific action in a video while observing a small portion of that video.
no code implementations • CVPR 2016 • Fabian Caba Heilbron, Juan Carlos Niebles, Bernard Ghanem
In many large-scale video analysis scenarios, one is interested in localizing and recognizing human activities that occur in short temporal intervals within long untrimmed videos.
1 code implementation • CVPR 2015 • Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, Juan Carlos Niebles
In spite of many dataset efforts for human action recognition, current computer vision algorithms are still severely limited in terms of the variability and complexity of the actions that they can recognize.
no code implementations • CVPR 2015 • Bernard Ghanem, Ali Thabet, Juan Carlos Niebles, Fabian Caba Heilbron
This paper proposes a new framework for estimating the Manhattan Frame (MF) of an indoor scene from a single RGB-D image.