no code implementations • 22 Dec 2023 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Ehsan Abbasnejad, Hamed Damirchi, Ignacio M. Jara, Felipe Bravo-Marquez, Anton Van Den Hengel
Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning.
no code implementations • 19 Dec 2021 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hiroya Takamura, Qi Wu
We propose LocFormer, a Transformer-based model for video grounding which operates at a constant memory footprint regardless of the video length, i.e., the number of frames.
3 code implementations • ICCV 2021 • Zheyuan Liu, Cristian Rodriguez-Opazo, Damien Teney, Stephen Gould
We demonstrate that with a relatively simple architecture, CIRPLANT outperforms existing methods on open-domain images, while matching state-of-the-art accuracy on the existing narrow datasets, such as fashion.
Ranked #10 on Image Retrieval on CIRR
no code implementations • CVPR 2021 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
1 code implementation • 26 Nov 2020 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
Ranked #7 on Visual Navigation on R2R
1 code implementation • NeurIPS 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.
1 code implementation • 13 Oct 2020 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould
This paper studies the task of temporal moment localization in a long untrimmed video using a natural language query.
1 code implementation • 1 Jul 2020 • Yizhak Ben-Shabat, Xin Yu, Fatemeh Sadat Saleh, Dylan Campbell, Cristian Rodriguez-Opazo, Hongdong Li, Stephen Gould
The availability of a large labeled dataset is a key requirement for applying deep learning methods to solve various computer vision tasks.
no code implementations • WS 2020 • Edison Marrese-Taylor, Cristian Rodriguez-Opazo, Jorge A. Balazs, Stephen Gould, Yutaka Matsuo
Despite the recent advances in opinion mining for written reviews, few works have tackled the problem on other sources of reviews.
1 code implementation • EMNLP 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould
Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.
1 code implementation • 20 Aug 2019 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Fatemeh Sadat Saleh, Hongdong Li, Stephen Gould
Given an untrimmed video and a sentence as the query, the goal is to determine the start and end of the relevant visual moment in the video that corresponds to the query sentence.