Search Results for author: Cristian Rodriguez-Opazo

Found 11 papers, 7 papers with code

LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach

no code implementations • 19 Dec 2021 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hiroya Takamura, Qi Wu

We propose LocFormer, a Transformer-based model for video grounding that operates with a constant memory footprint regardless of the video length, i.e., the number of frames (see the sketch after this entry).

Inductive Bias • Video Grounding
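
The constant memory footprint in the entry above comes from feeding the Transformer a fixed budget of sampled features rather than every frame. Below is a minimal PyTorch sketch of that idea, assuming a uniform sampling policy; the 256-token budget and the `sample_fixed_budget` helper are illustrative choices, not LocFormer's actual sampling mechanism.

```python
import torch

def sample_fixed_budget(features: torch.Tensor, budget: int = 256) -> torch.Tensor:
    """Uniformly sample a fixed number of frame features so that the
    Transformer input length (and hence memory) is constant for any video.
    NOTE: hypothetical helper; the paper's sampling policy may differ."""
    num_frames = features.size(0)
    if num_frames <= budget:
        return features
    # Evenly spaced frame indices spanning the whole video.
    idx = torch.linspace(0, num_frames - 1, steps=budget).long()
    return features[idx]

# Toy usage: a 10,000-frame video reduced to a 256-token sequence.
video = torch.randn(10_000, 512)          # (num_frames, feature_dim)
tokens = sample_fixed_budget(video)       # (256, 512), size fixed a priori
layer = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
out = encoder(tokens.unsqueeze(0))        # cost independent of video length
print(out.shape)                          # torch.Size([1, 256, 512])
```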

Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models

3 code implementations • ICCV 2021 • Zheyuan Liu, Cristian Rodriguez-Opazo, Damien Teney, Stephen Gould

We demonstrate that with a relatively simple architecture, CIRPLANT outperforms existing methods on open-domain images while matching state-of-the-art accuracy on existing narrower datasets such as fashion.

Composed Image Retrieval (CoIR) • Retrieval +1
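
In composed image retrieval, the query is a reference image plus a modification sentence, and the system ranks candidate images against the pair. A minimal PyTorch sketch of that query type follows; the `ComposedRetriever` class, the concatenate-and-MLP fusion, and the random stand-in embeddings are hypothetical, not CIRPLANT's pre-trained vision-and-language architecture.

```python
import torch
import torch.nn.functional as F

class ComposedRetriever(torch.nn.Module):
    """Fuse a reference-image embedding with a modification-text embedding,
    then rank candidate images by cosine similarity to the fused query.
    NOTE: illustrative baseline, not the paper's model."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.fuse = torch.nn.Sequential(
            torch.nn.Linear(2 * dim, dim),
            torch.nn.ReLU(),
            torch.nn.Linear(dim, dim),
        )

    def forward(self, ref_img, mod_text, candidates):
        # Concatenate image and text embeddings into one modified query.
        query = F.normalize(self.fuse(torch.cat([ref_img, mod_text], dim=-1)), dim=-1)
        cands = F.normalize(candidates, dim=-1)
        return query @ cands.T            # one similarity score per candidate

# Toy usage with random vectors standing in for pre-trained embeddings.
model = ComposedRetriever()
scores = model(torch.randn(1, 512), torch.randn(1, 512), torch.randn(100, 512))
print(int(scores.argmax()))               # index of the best-matching image
```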

Language and Visual Entity Relationship Graph for Agent Navigation

1 code implementation • NeurIPS 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould

From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.

Dynamic Time Warping • Navigate +2
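
A rough sketch of the idea in the entry above, that scene, object, and directional features should be weighted by their relevance to the instruction, follows in PyTorch. The `relation_weighted_observation` helper and the plain dot-product attention are illustrative assumptions, far simpler than the paper's language-conditioned relationship graph.

```python
import torch
import torch.nn.functional as F

def relation_weighted_observation(text: torch.Tensor, scene: torch.Tensor,
                                  objects: torch.Tensor,
                                  direction: torch.Tensor) -> torch.Tensor:
    """Weight scene, object, and directional features by their relevance
    to the instruction embedding, then fuse them into one observation.
    NOTE: hypothetical stand-in for the paper's relationship graph."""
    nodes = torch.stack([scene, objects, direction])  # (3, dim) graph nodes
    attn = F.softmax(nodes @ text, dim=0)             # relevance to the text
    return (attn.unsqueeze(-1) * nodes).sum(dim=0)    # fused (dim,) feature

dim = 128
fused = relation_weighted_observation(torch.randn(dim), torch.randn(dim),
                                      torch.randn(dim), torch.randn(dim))
print(fused.shape)                                    # torch.Size([128])
```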

Sub-Instruction Aware Vision-and-Language Navigation

1 code implementation • EMNLP 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould

Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.

Navigate • Vision and Language Navigation
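
One way to make an agent sub-instruction aware, per the title of the entry above, is to track which sub-goal of the instruction is currently active. A minimal Python sketch follows; the regex splitter and the `SubInstructionTracker` class are hypothetical stand-ins, whereas the paper pairs sub-instructions with their path segments in a fine-grained dataset.

```python
import re

def split_sub_instructions(instruction: str) -> list:
    """Naive splitter: cut a long instruction at sentence boundaries and
    'then' connectives. NOTE: heuristic stand-in; the paper's dataset
    provides explicit sub-instruction annotations instead."""
    parts = re.split(r"(?:\.\s+|,?\s+then\s+)", instruction.strip(". "))
    return [p.strip() for p in parts if p.strip()]

class SubInstructionTracker:
    """Expose one sub-instruction at a time; shift forward once the agent
    decides the current sub-goal has been completed."""

    def __init__(self, instruction: str):
        self.subs = split_sub_instructions(instruction)
        self.pos = 0

    def current(self) -> str:
        return self.subs[self.pos]

    def mark_done(self) -> None:
        self.pos = min(self.pos + 1, len(self.subs) - 1)

tracker = SubInstructionTracker(
    "Walk past the sofa, then turn left at the stairs. Stop at the door.")
print(tracker.current())   # Walk past the sofa
tracker.mark_done()
print(tracker.current())   # turn left at the stairs
```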

Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention

1 code implementation • 20 Aug 2019 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Fatemeh Sadat Saleh, Hongdong Li, Stephen Gould

Given an untrimmed video and a sentence as the query, the goal is to determine the start and end of the visual moment in the video that corresponds to the query sentence (see the sketch after this entry).

Sentence
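
Proposal-free here means predicting start and end positions directly over frames rather than scoring candidate segments. A minimal PyTorch sketch follows; the `StartEndHead` module and its simple concatenation fusion are illustrative assumptions standing in for the paper's guided-attention mechanism.

```python
import torch

class StartEndHead(torch.nn.Module):
    """Fuse each frame with the query embedding, then predict per-frame
    start and end logits with two linear heads (no segment proposals).
    NOTE: simplified sketch; the paper uses guided attention instead."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.start_head = torch.nn.Linear(2 * dim, 1)
        self.end_head = torch.nn.Linear(2 * dim, 1)

    def forward(self, frames: torch.Tensor, query: torch.Tensor):
        fused = torch.cat([frames, query.expand_as(frames)], dim=-1)
        start_logits = self.start_head(fused).squeeze(-1)  # (num_frames,)
        end_logits = self.end_head(fused).squeeze(-1)      # (num_frames,)
        start = int(start_logits.argmax())
        end = start + int(end_logits[start:].argmax())     # enforce end >= start
        return start, end

head = StartEndHead()
frames = torch.randn(200, 256)   # per-frame visual features
query = torch.randn(256)         # sentence embedding of the text query
print(head(frames, query))       # predicted (start, end) frame indices
```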
