1 code implementation • CVPR 2022 • Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu
Pre-training has been adopted in a few recent works on Vision-and-Language Navigation (VLN).
Ranked #4 on Visual Navigation on R2R
1 code implementation • ICCV 2023 • Yanyuan Qiao, Yuankai Qi, Zheng Yu, Jing Liu, Qi Wu
Nevertheless, this poses more challenges than other VLN tasks, since it requires agents to infer a navigation plan based only on a short instruction.
1 code implementation • ICCV 2023 • Yanyuan Qiao, Zheng Yu, Qi Wu
The performance of Vision-and-Language Navigation (VLN) tasks has improved rapidly in recent years thanks to the use of large pre-trained vision-and-language models.
no code implementations • 19 Jul 2020 • Yanyuan Qiao, Chaorui Deng, Qi Wu
In this survey, we first examine the state of the art by comparing modern approaches to the problem.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2023 • Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu
To address these problems, we present a history-enhanced and order-aware pre-training with a complementary fine-tuning paradigm (HOP+) for VLN.
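For intuition only, here is a minimal sketch of what an order-aware pretext task can look like: shuffle the steps of a navigation trajectory and train a small head to recover each step's original position. The names make_order_example and OrderHead, and all shapes, are illustrative assumptions, not the actual HOP+ objective or code.

import torch
import torch.nn as nn

def make_order_example(step_feats):
    # step_feats: (T, D) tensor, one row of features per trajectory step.
    # Returns the shuffled features and, for each shuffled position,
    # the original index it came from (the prediction target).
    T = step_feats.size(0)
    perm = torch.randperm(T)
    return step_feats[perm], perm

class OrderHead(nn.Module):
    # A linear classifier that predicts each shuffled step's original
    # position in the trajectory (a class in 0..max_steps-1).
    def __init__(self, dim, max_steps):
        super().__init__()
        self.classifier = nn.Linear(dim, max_steps)

    def forward(self, shuffled):
        return self.classifier(shuffled)  # (T, max_steps) logits

# Toy usage: one trajectory of 5 steps with 16-dim step features.
feats = torch.randn(5, 16)
shuffled, labels = make_order_example(feats)
head = OrderHead(dim=16, max_steps=5)
loss = nn.functional.cross_entropy(head(shuffled), labels)
loss.backward()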
no code implementations • 30 Oct 2023 • Xiangyu Shi, Yanyuan Qiao, Qi Wu, Lingqiao Liu, Feras Dayoub
Effective object detection on mobile robots is challenging because they are deployed in diverse and unfamiliar environments.
no code implementations • 20 Mar 2024 • Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu
Extensive experiments on diverse multimodal benchmarks show the competitive performance of the proposed VL-Mamba and demonstrate the great potential of applying state space models to multimodal learning tasks (a minimal sketch of the underlying state-space recurrence follows this entry).
Ranked #54 on Visual Question Answering on MM-Vet
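As context for the entry above, the sketch below implements only the generic discrete linear state-space recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t that Mamba-style models build on. It is a toy under assumed names and shapes (ssm_scan and the toy dimensions are illustrative), not the VL-Mamba architecture, which among other things makes the SSM parameters input-dependent ("selective").

import numpy as np

def ssm_scan(x, A, B, C):
    # Run a discrete linear state-space model over a sequence.
    # x: (T, d_in) input sequence, e.g. a stream of token embeddings.
    # A: (d_state, d_state) state transition matrix.
    # B: (d_state, d_in) input projection.
    # C: (d_out, d_state) output projection.
    # Returns y: (T, d_out).
    T = x.shape[0]
    h = np.zeros(A.shape[0])
    y = np.empty((T, C.shape[0]))
    for t in range(T):
        h = A @ h + B @ x[t]   # state update: linear in the previous state
        y[t] = C @ h           # readout at step t
    return y

# Toy usage: 8 steps of 4-dim inputs, 6-dim hidden state, 3-dim outputs.
rng = np.random.default_rng(0)
y = ssm_scan(rng.normal(size=(8, 4)),
             A=0.9 * np.eye(6),
             B=rng.normal(size=(6, 4)),
             C=rng.normal(size=(3, 6)))
print(y.shape)  # (8, 3)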