Search Results for author: Lluis Castrejon

Found 13 papers, 4 papers with code

HAMMR: HierArchical MultiModal React agents for generic VQA

no code implementations8 Apr 2024 Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.

Optical Character Recognition (OCR) Question Answering +1

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

1 code implementation ICCV 2023 Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13. 0% accuracy on our dataset.

Question Answering Retrieval +1

INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision

no code implementations29 Sep 2021 Lluis Castrejon, Nicolas Ballas, Aaron Courville

Each object representation defines a localized neural radiance field that is used to generate 2D views of the scene through a differentiable rendering process.

Object Video Object Tracking +1

Hierarchical Video Generation for Complex Data

no code implementations4 Jun 2021 Lluis Castrejon, Nicolas Ballas, Aaron Courville

Inspired by this we propose a hierarchical model for video generation which follows a coarse to fine approach.

Video Generation

SSW-GAN: Scalable Stage-wise Training of Video GANs

no code implementations1 Jan 2021 Lluis Castrejon, Nicolas Ballas, Aaron Courville

Current state-of-the-art generative models for videos have high computational requirements that impede high resolution generations beyond a few frames.

Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations

2 code implementations18 Jun 2020 Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat

We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training.

Contrastive Learning

Improved Conditional VRNNs for Video Prediction

1 code implementation ICCV 2019 Lluis Castrejon, Nicolas Ballas, Aaron Courville

To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models.

Video Generation Video Prediction

MovieGraphs: Towards Understanding Human-Centric Situations from Videos

no code implementations CVPR 2018 Paul Vicol, Makarand Tapaswi, Lluis Castrejon, Sanja Fidler

Towards this goal, we introduce a novel dataset called MovieGraphs which provides detailed, graph-based annotations of social situations depicted in movie clips.

Common Sense Reasoning

Annotating Object Instances with a Polygon-RNN

2 code implementations CVPR 2017 Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler

We show that our approach speeds up the annotation process by a factor of 4. 7 across all classes in Cityscapes, while achieving 78. 4% agreement in IoU with original ground-truth, matching the typical agreement between human annotators.

Object Segmentation +1

Cross-Modal Scene Networks

no code implementations27 Oct 2016 Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.

Retrieval

Learning Aligned Cross-Modal Representations from Weakly Aligned Data

no code implementations CVPR 2016 Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.

Retrieval

Cannot find the paper you are looking for? You can Submit a new open access paper.