Search Results for author: Jiwen Zhang

Found 5 papers, 3 papers with code

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

1 code implementation • 2 Apr 2024 • Mengfei Du, Binhao Wu, Jiwen Zhang, Zhihao Fan, Zejun Li, Ruipu Luo, Xuanjing Huang, Zhongyu Wei

For task completion, the agent needs to align and integrate various navigation modalities, including instruction, observation and navigation history.

Contrastive Learning, Decision Making, +2

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

1 code implementation • 5 Mar 2024 • Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang

To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which takes as input the description of the previous actions, the current screen, and, more importantly, the action thinking about which actions should be performed and the outcomes of the chosen action.

Language Modelling, Large Language Model

Curriculum Learning for Vision-and-Language Navigation

no code implementations • NeurIPS 2021 • Jiwen Zhang, Zhongyu Wei, Jianqing Fan, Jiajie Peng

Vision-and-Language Navigation (VLN) is a task where an agent navigates in an embodied indoor environment under human instructions.

Vision and Language Navigation
