Search Results for author: Fabian Caba Heilbron

Found 35 papers, 18 papers with code

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

no code implementations • 5 Apr 2024 • Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging.

Text-to-Image Generation

Scaling Up Video Summarization Pretraining with Large Language Models

no code implementations • 4 Apr 2024 • Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung

Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem.

Video Alignment · Video Summarization

Long-range Multimodal Pretraining for Movie Understanding

no code implementations • ICCV 2023 • Dawit Mureja Argaw, Joon-Young Lee, Markus Woodson, In So Kweon, Fabian Caba Heilbron

While great progress has been attained, there is still a need for a pretrained multimodal model that can perform well in the ever-growing set of movie understanding tasks the community has been establishing.

Video-ReTime: Learning Temporally Varying Speediness for Time Remapping

no code implementations • 11 May 2022 • Simon Jenni, Markus Woodson, Fabian Caba Heilbron

Furthermore, we propose an optimization for video re-timing that enables precise control over the target duration and performs more robustly on longer videos than prior methods.

Action Recognition
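
The abstract above mentions controlling the target duration of a re-timed video from temporally varying speediness. As a minimal sketch of that idea (not the paper's actual optimization), the snippet below resamples frame indices from assumed per-frame speed-up factors so the output hits an exact frame count; `retime_frames` and its inputs are illustrative assumptions.

```python
import numpy as np

def retime_frames(speed, target_frames):
    """Toy re-timing: subsample "speedable" segments more aggressively
    while hitting an exact output length.

    speed: per-source-frame speed-up factor (>1 = content tolerates faster
           playback), assumed to come from some speediness predictor.
    """
    speed = np.asarray(speed, dtype=float)
    dwell = 1.0 / speed                           # output time spent per source frame
    dwell *= target_frames / dwell.sum()          # rescale to the requested duration
    timeline = np.cumsum(dwell)                   # cumulative output time per source frame
    midpoints = np.arange(target_frames) + 0.5    # one sample per output frame
    return np.searchsorted(timeline, midpoints)   # indices into the source video

# Example: 300 source frames, second half twice as "speedable", 200-frame output.
speed = np.concatenate([np.ones(150), 2 * np.ones(150)])
frame_indices = retime_frames(speed, target_frames=200)
```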

FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

2 code implementations • 24 Mar 2022 • Santiago Castro, Fabian Caba Heilbron

Large-scale pretrained image-text models have shown incredible zero-shot performance in a handful of tasks, including video ones such as action recognition and text-to-video retrieval.

Action Recognition · Retrieval · +4
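
FitCLIP is about refining such pretrained image-text models; the sketch below only illustrates the zero-shot video baseline the abstract alludes to (frame embeddings averaged into a video embedding and compared against text prompts). It assumes the openai/CLIP package and PIL frames, and the prompt template is an arbitrary choice; it is not the paper's refinement procedure.

```python
import torch
import clip  # assumes the openai/CLIP package (pip install git+https://github.com/openai/CLIP.git)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def zero_shot_action(frames, class_names):
    """frames: list of PIL images sampled from the video."""
    with torch.no_grad():
        # Encode frames and average them into a clip-level video embedding.
        images = torch.stack([preprocess(f) for f in frames]).to(device)
        video_emb = model.encode_image(images).mean(dim=0, keepdim=True)
        video_emb /= video_emb.norm(dim=-1, keepdim=True)
        # Encode one text prompt per class name.
        prompts = clip.tokenize([f"a video of a person {c}" for c in class_names]).to(device)
        text_emb = model.encode_text(prompts)
        text_emb /= text_emb.norm(dim=-1, keepdim=True)
        # Cosine similarity between video and class prompts -> predicted class.
        scores = (video_emb @ text_emb.T).squeeze(0)
    return class_names[scores.argmax().item()]
```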

MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

1 code implementation • 12 Sep 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

Advances in automatic cut-type recognition can unleash new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, and machine-assisted video editing.

Video Editing · Vocal Bursts Type Prediction

Learning to Cut by Watching Movies

1 code implementation • ICCV 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise.

Contrastive Learning · Video Editing

Transcript to Video: Efficient Clip Sequencing from Texts

no code implementations • 25 Jul 2021 • Yu Xiong, Fabian Caba Heilbron, Dahua Lin

To meet the demands of non-experts, we present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots.

Retrieval
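
The retrieval step such a transcript-driven pipeline rests on can be pictured as nearest-neighbour matching between sentence and shot embeddings. The function below is a hedged illustration that assumes those embeddings are precomputed by some text/video encoder; it omits the sequencing and style components the framework actually learns.

```python
import numpy as np

def sequence_shots(sentence_embs, shot_embs, allow_repeats=False):
    """Pick one shot per transcript sentence by cosine similarity.

    sentence_embs: (num_sentences, d) embeddings, one per transcript sentence.
    shot_embs:     (num_shots, d) embeddings of the candidate shots.
    Returns shot indices in transcript order (greedy, no learned sequencing).
    """
    s = sentence_embs / np.linalg.norm(sentence_embs, axis=1, keepdims=True)
    v = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    sim = s @ v.T                                   # (num_sentences, num_shots)
    order, used = [], set()
    for row in sim:
        ranked = np.argsort(-row)                   # best-matching shots first
        pick = int(next(i for i in ranked if allow_repeats or int(i) not in used))
        used.add(pick)
        order.append(pick)
    return order
```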

Real-time Semantic Segmentation with Fast Attention

1 code implementation • 7 Jul 2020 • Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.

Real-Time Semantic Segmentation · Segmentation
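
The "changing the order of operations" idea can be sketched as follows: if the softmax affinities are replaced with L2-normalized (cosine-similarity) ones, the key-value product can be computed first and the quadratic n×n attention map never has to be materialized. The snippet is a single-head, unbatched illustration of that reordering, not the exact module from the paper.

```python
import torch
import torch.nn.functional as F

def fast_attention(q, k, v):
    """Attention with L2-normalized queries/keys, so the two matrix products
    can be reordered: q @ (k^T @ v) costs O(n * d^2) instead of O(n^2 * d).

    q, k, v: (n, d) tensors, n = number of spatial positions.
    """
    n = q.shape[0]
    q = F.normalize(q, dim=-1)          # cosine-similarity affinities instead of softmax
    k = F.normalize(k, dim=-1)
    context = k.transpose(0, 1) @ v     # (d, d) -- independent of n
    return (q @ context) / n            # (n, d)

# For comparison, the usual self-attention order of operations:
def standard_attention(q, k, v):
    attn = torch.softmax(q @ k.transpose(0, 1), dim=-1)   # (n, n) -- quadratic in n
    return attn @ v
```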

The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

no code implementations • 11 Aug 2018 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao

The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research.

Activity Recognition

Diagnosing Error in Temporal Action Detectors

1 code implementation • ECCV 2018 • Humam Alwassel, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem

Despite the recent progress in video understanding and the continuous rate of improvement in temporal action localization throughout the years, it is still unclear how far (or close?) current detection methods are from achieving their goal.

Temporal Action Localization · Video Understanding

ActivityNet Challenge 2017 Summary

no code implementations • 22 Oct 2017 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Krishna, Victor Escorcia, Kenji Hata, Shyamal Buch

The ActivityNet Large Scale Activity Recognition Challenge 2017 Summary: results and challenge participants' papers.

Activity Recognition

SCC: Semantic Context Cascade for Efficient Action Detection

no code implementations • CVPR 2017 • Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem

Despite the recent advances in large-scale video analysis, action detection remains one of the most challenging unsolved problems in computer vision.

Action Detection

Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization

1 code implementation • ECCV 2018 • Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem

To address this need, we propose the new problem of action spotting in video, which we define as finding a specific action in a video while observing a small portion of that video.

Action Spotting · Temporal Action Localization
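
The problem statement above (find a specific action while observing only a small portion of the video) can be pictured as a budgeted search loop. The toy sketch below assumes a hypothetical per-frame scoring function and a hypothetical model that proposes where to look next, so it illustrates the setting rather than the paper's learned search policy.

```python
def spot_action(score_frame_fn, predict_next_fn, num_frames, budget=20):
    """Toy spotting loop: observe at most `budget` frames, letting a
    (hypothetical) model suggest where to look next, and stop once a
    frame scores high enough for the target action.

    score_frame_fn(t)      -> action confidence at frame t (assumed given)
    predict_next_fn(hist)  -> next frame index to observe, given (t, score) history
    """
    t, history = num_frames // 2, []
    for _ in range(budget):
        score = score_frame_fn(t)
        history.append((t, score))
        if score > 0.9:                  # arbitrary confidence threshold: action spotted
            return t
        t = predict_next_fn(history)     # jump instead of scanning frame by frame
    # Fall back to the best frame seen within the observation budget.
    return max(history, key=lambda h: h[1])[0]
```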

Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos

no code implementations • CVPR 2016 • Fabian Caba Heilbron, Juan Carlos Niebles, Bernard Ghanem

In many large-scale video analysis scenarios, one is interested in localizing and recognizing human activities that occur in short temporal intervals within long untrimmed videos.

Action Detection · Action Recognition · +2
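
The paper's contribution is fast proposal generation; as context, the snippet below sketches the generic sliding-window baseline such methods improve on, scoring candidate intervals with an assumed per-frame actionness signal. The window lengths, stride, and scoring rule are all placeholder choices, not the paper's method.

```python
import numpy as np

def temporal_proposals(actionness, window_lengths=(16, 32, 64, 128), stride=8, top_k=50):
    """Generic sliding-window baseline for temporal action proposals.

    actionness: per-frame score in [0, 1] from any frame-level classifier (assumed given).
    Returns up to `top_k` (start, end, score) candidates ranked by mean actionness.
    """
    n = len(actionness)
    candidates = []
    for w in window_lengths:
        for start in range(0, max(n - w, 0) + 1, stride):
            end = start + w
            score = float(np.mean(actionness[start:end]))
            candidates.append((start, end, score))
    candidates.sort(key=lambda c: -c[2])
    return candidates[:top_k]
```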

Robust Manhattan Frame Estimation From a Single RGB-D Image

no code implementations • CVPR 2015 • Bernard Ghanem, Ali Thabet, Juan Carlos Niebles, Fabian Caba Heilbron

This paper proposes a new framework for estimating the Manhattan Frame (MF) of an indoor scene from a single RGB-D image.

ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding

1 code implementation • CVPR 2015 • Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, Juan Carlos Niebles

In spite of many dataset efforts for human action recognition, current computer vision algorithms are still severely limited in terms of the variability and complexity of the actions that they can recognize.

Action Detection · Action Recognition · +4
