Search Results for author: Fabian Caba Heilbron

Found 35 papers, 18 papers with code

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

no code implementations • 5 Apr 2024 • Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging.

Text-to-Image Generation

Scaling Up Video Summarization Pretraining with Large Language Models

no code implementations • 4 Apr 2024 • Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung

Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem.

Video Alignment · Video Summarization

Long-range Multimodal Pretraining for Movie Understanding

no code implementations • ICCV 2023 • Dawit Mureja Argaw, Joon-Young Lee, Markus Woodson, In So Kweon, Fabian Caba Heilbron

While great progress has been attained, there is still a need for a pretrained multimodal model that can perform well in the ever-growing set of movie understanding tasks the community has been establishing.

Video-ReTime: Learning Temporally Varying Speediness for Time Remapping

no code implementations • 11 May 2022 • Simon Jenni, Markus Woodson, Fabian Caba Heilbron

Furthermore, we propose an optimization for video re-timing that enables precise control over the target duration and performs more robustly on longer videos than prior methods.

Action Recognition
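
The abstract above mentions controlling the target duration of a re-timed video from temporally varying speediness. As a minimal sketch of that idea (not the paper's actual optimization), the snippet below resamples frame indices from assumed per-frame speed-up factors so the output hits an exact frame count; `retime_frames` and its inputs are illustrative assumptions.

```python
import numpy as np

def retime_frames(speed, target_frames):
    """Toy re-timing: subsample "speedable" segments more aggressively
    while hitting an exact output length.

    speed: per-source-frame speed-up factor (>1 = content tolerates faster
           playback), assumed to come from some speediness predictor.
    """
    speed = np.asarray(speed, dtype=float)
    dwell = 1.0 / speed                           # output time spent per source frame
    dwell *= target_frames / dwell.sum()          # rescale to the requested duration
    timeline = np.cumsum(dwell)                   # cumulative output time per source frame
    midpoints = np.arange(target_frames) + 0.5    # one sample per output frame
    return np.searchsorted(timeline, midpoints)   # indices into the source video

# Example: 300 source frames, second half twice as "speedable", 200-frame output.
speed = np.concatenate([np.ones(150), 2 * np.ones(150)])
frame_indices = retime_frames(speed, target_frames=200)
```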

FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

2 code implementations • 24 Mar 2022 • Santiago Castro, Fabian Caba Heilbron

Large-scale pretrained image-text models have shown incredible zero-shot performance in a handful of tasks, including video ones such as action recognition and text-to-video retrieval.

Action Recognition · Retrieval · +4
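
FitCLIP is about refining such pretrained image-text models; the sketch below only illustrates the zero-shot video baseline the abstract alludes to (frame embeddings averaged into a video embedding and compared against text prompts). It assumes the openai/CLIP package and PIL frames, and the prompt template is an arbitrary choice; it is not the paper's refinement procedure.

```python
import torch
import clip  # assumes the openai/CLIP package (pip install git+https://github.com/openai/CLIP.git)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def zero_shot_action(frames, class_names):
    """frames: list of PIL images sampled from the video."""
    with torch.no_grad():
        # Encode frames and average them into a clip-level video embedding.
        images = torch.stack([preprocess(f) for f in frames]).to(device)
        video_emb = model.encode_image(images).mean(dim=0, keepdim=True)
        video_emb /= video_emb.norm(dim=-1, keepdim=True)
        # Encode one text prompt per class name.
        prompts = clip.tokenize([f"a video of a person {c}" for c in class_names]).to(device)
        text_emb = model.encode_text(prompts)
        text_emb /= text_emb.norm(dim=-1, keepdim=True)
        # Cosine similarity between video and class prompts -> predicted class.
        scores = (video_emb @ text_emb.T).squeeze(0)
    return class_names[scores.argmax().item()]
```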

MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

1 code implementation • 12 Sep 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

Advances in automatic cut-type recognition can unleash new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, and machine-assisted video editing.

Video Editing · Vocal Bursts Type Prediction

Learning to Cut by Watching Movies

1 code implementation • ICCV 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise.

Contrastive Learning · Video Editing

Transcript to Video: Efficient Clip Sequencing from Texts

no code implementations • 25 Jul 2021 • Yu Xiong, Fabian Caba Heilbron, Dahua Lin

To meet the demands of non-experts, we present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots.

Retrieval
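
The retrieval step such a transcript-driven pipeline rests on can be pictured as nearest-neighbour matching between sentence and shot embeddings. The function below is a hedged illustration that assumes those embeddings are precomputed by some text/video encoder; it omits the sequencing and style components the framework actually learns.

```python
import numpy as np

def sequence_shots(sentence_embs, shot_embs, allow_repeats=False):
    """Pick one shot per transcript sentence by cosine similarity.

    sentence_embs: (num_sentences, d) embeddings, one per transcript sentence.
    shot_embs:     (num_shots, d) embeddings of the candidate shots.
    Returns shot indices in transcript order (greedy, no learned sequencing).
    """
    s = sentence_embs / np.linalg.norm(sentence_embs, axis=1, keepdims=True)
    v = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    sim = s @ v.T                                   # (num_sentences, num_shots)
    order, used = [], set()
    for row in sim:
        ranked = np.argsort(-row)                   # best-matching shots first
        pick = int(next(i for i in ranked if allow_repeats or int(i) not in used))
        used.add(pick)
        order.append(pick)
    return order
```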

Real-time Semantic Segmentation with Fast Attention

1 code implementation • 7 Jul 2020 • Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.

Real-Time Semantic Segmentation · Segmentation
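
The "changing the order of operations" idea can be sketched as follows: if the softmax affinities are replaced with L2-normalized (cosine-similarity) ones, the key-value product can be computed first and the quadratic n×n attention map never has to be materialized. The snippet is a single-head, unbatched illustration of that reordering, not the exact module from the paper.

```python
import torch
import torch.nn.functional as F

def fast_attention(q, k, v):
    """Attention with L2-normalized queries/keys, so the two matrix products
    can be reordered: q @ (k^T @ v) costs O(n * d^2) instead of O(n^2 * d).

    q, k, v: (n, d) tensors, n = number of spatial positions.
    """
    n = q.shape[0]
    q = F.normalize(q, dim=-1)          # cosine-similarity affinities instead of softmax
    k = F.normalize(k, dim=-1)
    context = k.transpose(0, 1) @ v     # (d, d) -- independent of n
    return (q @ context) / n            # (n, d)

# For comparison, the usual self-attention order of operations:
def standard_attention(q, k, v):
    attn = torch.softmax(q @ k.transpose(0, 1), dim=-1)   # (n, n) -- quadratic in n
    return attn @ v
```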

The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

no code implementations • 11 Aug 2018 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao

The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research.

Activity Recognition

Diagnosing Error in Temporal Action Detectors

1 code implementation • ECCV 2018 • Humam Alwassel, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem

Despite the recent progress in video understanding and the continuous rate of improvement in temporal action localization throughout the years, it is still unclear how far (or close?) current detection methods are from achieving their goal.

Temporal Action Localization · Video Understanding

ActivityNet Challenge 2017 Summary

no code implementations • 22 Oct 2017 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Krishna, Victor Escorcia, Kenji Hata, Shyamal Buch

The ActivityNet Large Scale Activity Recognition Challenge 2017 Summary: results and challenge participants' papers.

Activity Recognition

SCC: Semantic Context Cascade for Efficient Action Detection

no code implementations • CVPR 2017 • Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem

Despite the recent advances in large-scale video analysis, action detection remains one of the most challenging unsolved problems in computer vision.

Action Detection

Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization

1 code implementation • ECCV 2018 • Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem

To address this need, we propose the new problem of action spotting in video, which we define as finding a specific action in a video while observing a small portion of that video.

Action Spotting · Temporal Action Localization
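
The problem statement above (find a specific action while observing only a small portion of the video) can be pictured as a budgeted search loop. The toy sketch below assumes a hypothetical per-frame scoring function and a hypothetical model that proposes where to look next, so it illustrates the setting rather than the paper's learned search policy.

```python
def spot_action(score_frame_fn, predict_next_fn, num_frames, budget=20):
    """Toy spotting loop: observe at most `budget` frames, letting a
    (hypothetical) model suggest where to look next, and stop once a
    frame scores high enough for the target action.

    score_frame_fn(t)      -> action confidence at frame t (assumed given)
    predict_next_fn(hist)  -> next frame index to observe, given (t, score) history
    """
    t, history = num_frames // 2, []
    for _ in range(budget):
        score = score_frame_fn(t)
        history.append((t, score))
        if score > 0.9:                  # arbitrary confidence threshold: action spotted
            return t
        t = predict_next_fn(history)     # jump instead of scanning frame by frame
    # Fall back to the best frame seen within the observation budget.
    return max(history, key=lambda h: h[1])[0]
```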

Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos

no code implementations • CVPR 2016 • Fabian Caba Heilbron, Juan Carlos Niebles, Bernard Ghanem

In many large-scale video analysis scenarios, one is interested in localizing and recognizing human activities that occur in short temporal intervals within long untrimmed videos.

Action Detection · Action Recognition · +2
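
The paper's contribution is fast proposal generation; as context, the snippet below sketches the generic sliding-window baseline such methods improve on, scoring candidate intervals with an assumed per-frame actionness signal. The window lengths, stride, and scoring rule are all placeholder choices, not the paper's method.

```python
import numpy as np

def temporal_proposals(actionness, window_lengths=(16, 32, 64, 128), stride=8, top_k=50):
    """Generic sliding-window baseline for temporal action proposals.

    actionness: per-frame score in [0, 1] from any frame-level classifier (assumed given).
    Returns up to `top_k` (start, end, score) candidates ranked by mean actionness.
    """
    n = len(actionness)
    candidates = []
    for w in window_lengths:
        for start in range(0, max(n - w, 0) + 1, stride):
            end = start + w
            score = float(np.mean(actionness[start:end]))
            candidates.append((start, end, score))
    candidates.sort(key=lambda c: -c[2])
    return candidates[:top_k]
```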

Robust Manhattan Frame Estimation From a Single RGB-D Image

no code implementations • CVPR 2015 • Bernard Ghanem, Ali Thabet, Juan Carlos Niebles, Fabian Caba Heilbron

This paper proposes a new framework for estimating the Manhattan Frame (MF) of an indoor scene from a single RGB-D image.

ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding

1 code implementation • CVPR 2015 • Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, Juan Carlos Niebles

In spite of many dataset efforts for human action recognition, current computer vision algorithms are still severely limited in terms of the variability and complexity of the actions that they can recognize.

Action Detection · Action Recognition · +4
