no code implementations • 16 Jul 2018 • Guiliang Liu, Oliver Schulte, Wang Zhu, Qingcan Li
A Linear Model U-Tree (LMUT) is learned using a novel on-line algorithm that is well-suited for an active play setting, where the mimic learner observes an ongoing interaction between the neural net and the environment.
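A minimal sketch of that active-play mimic-learning loop, with a toy Q-network and environment standing in for the real ones and a scikit-learn decision tree standing in for the LMUT; every name below is an illustrative assumption, not the paper's code:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def q_network(state):
    """Stand-in for the trained deep Q-network (one Q-value per action)."""
    return np.array([state.sum(), -state.sum()])

def env_step(state, action):
    """Stand-in environment transition (drift depends on the chosen action)."""
    drift = 0.05 if action == 0 else -0.05
    return np.clip(state + drift + rng.normal(0.0, 0.05, size=state.shape), -1.0, 1.0)

# Active play: the mimic learner observes the net acting in the environment
# and records (state, Q-value) pairs as they stream in.
states, targets = [], []
state = rng.uniform(-1.0, 1.0, size=4)
for _ in range(500):
    q = q_network(state)
    action = int(np.argmax(q))
    states.append(state.copy())
    targets.append(q[action])
    state = env_step(state, action)

# Fit a transparent mimic model on the observed interaction data
# (the paper grows an LMUT on-line instead of refitting a tree in batch).
mimic = DecisionTreeRegressor(max_depth=5).fit(np.array(states), np.array(targets))
print("mimic fidelity (train R^2):", mimic.score(np.array(states), np.array(targets)))
```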
1 code implementation • ACL 2020 • Wang Zhu, Hexiang Hu, Jiacheng Chen, Zhiwei Deng, Vihan Jain, Eugene Ie, Fei Sha
To this end, we propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially.
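A hedged sketch of that decompose-then-execute idea: `decompose` is a naive heuristic stand-in for BabyWalk's learned segmentation, and `follow` is a hypothetical single-step policy.

```python
import re

def decompose(instruction):
    """Split a long instruction into shorter BabySteps (a naive heuristic
    stand-in for the paper's learned segmentation)."""
    parts = re.split(r",|\. |; | then | and then ", instruction)
    return [p.strip() for p in parts if p.strip()]

def follow(step, position):
    """Hypothetical single-step policy: returns the new position after
    executing one short sub-instruction."""
    return position + 1  # placeholder dynamics

instruction = ("Walk past the couch, turn left at the kitchen, "
               "then stop next to the refrigerator.")
position = 0
for step in decompose(instruction):
    position = follow(step, position)   # complete BabySteps sequentially
    print(f"after '{step}': position={position}")
```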
1 code implementation • 9 Nov 2021 • Wang Zhu, Peter Shaw, Tal Linzen, Fei Sha
Neural network models often generalize poorly to mismatched domains or distributions.
no code implementations • CVPR 2023 • Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason
We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time.
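As a rough illustration of the persistent-environment protocol (assumed details, not the benchmark's actual API), the agent below keeps its memory across the episodes of a tour instead of being reset per instruction:

```python
class Agent:
    """Toy iterative-VLN agent: memory persists across episodes in a scene."""

    def __init__(self):
        self.memory = []  # accumulated scene experience, never reset mid-tour

    def navigate(self, instruction):
        self.memory.append(instruction)
        return f"path for '{instruction}' (after {len(self.memory)} episodes here)"

tour = ["go to the kitchen", "return to the bedroom", "find the office"]
agent = Agent()  # NOT re-created between episodes, unlike episodic VLN
for instruction in tour:
    print(agent.navigate(instruction))
```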
1 code implementation • 16 Oct 2022 • Yejia Liu, Wang Zhu, Shaolei Ren
To provide an approximate solution to this problem in the online continual learning setting, we further propose the Global Pseudo-task Simulation (GPS), which mimics future catastrophic forgetting of the current task by permutation.
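A small sketch of the permutation idea as the abstract states it, using a linear classifier as a stand-in learner; the pseudo-task construction and the forgetting measurement are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def pseudo_task(x, perm):
    """Simulate a plausible future task by permuting the input features of
    the current task (the permutation idea from the abstract; details here
    are illustrative)."""
    return x[:, perm]

# Toy current task: the label depends on feature 0 only.
x = rng.normal(size=(200, 16))
y = (x[:, 0] > 0).astype(int)

clf = SGDClassifier(random_state=0)
for _ in range(20):
    clf.partial_fit(x, y, classes=[0, 1])
acc_before = clf.score(x, y)

# Continue training on the simulated future task, then re-test on the
# current task: the accuracy drop proxies the catastrophic forgetting the
# current task would suffer from a future task.
perm = rng.permutation(x.shape[1])
for _ in range(20):
    clf.partial_fit(pseudo_task(x, perm), y)
print(f"current-task accuracy: {acc_before:.2f} -> {clf.score(x, y):.2f}")
```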
no code implementations • 26 Oct 2022 • Wang Zhu, Jesse Thomason, Robin Jia
For vision-and-language reasoning tasks, both fully connectionist, end-to-end methods and hybrid, neuro-symbolic methods have achieved high in-distribution performance.
no code implementations • 24 May 2023 • Wang Zhu, Jesse Thomason, Robin Jia
We train a language model (LM) to robustly answer multistep questions by generating and answering sub-questions.
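A hedged sketch of the generate-and-answer loop; `generate_subquestions` and `answer` are hypothetical stand-ins for calls to the trained LM, with canned outputs so the sketch runs end to end.

```python
def generate_subquestions(question):
    """Hypothetical: the LM proposes simpler sub-questions for a multistep question."""
    return ["Who directed Inception?",
            "What year was that director born?"]

def answer(question, context):
    """Hypothetical: the LM answers a single-hop question, given earlier
    (sub-question, answer) pairs as context."""
    canned = {"Who directed Inception?": "Christopher Nolan",
              "What year was that director born?": "1970"}
    return canned.get(question, "unknown")

question = "In what year was the director of Inception born?"
context = []
for sub_q in generate_subquestions(question):
    sub_a = answer(sub_q, context)
    context.append((sub_q, sub_a))   # later sub-questions can use earlier answers
print("final answer:", context[-1][1])
```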
no code implementations • 16 Nov 2023 • Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova
Pre-processing tools, such as optical character recognition (OCR), can map document image inputs to textual tokens; large language models (LLMs) can then reason over the extracted text.
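A minimal sketch of that two-stage pipeline, assuming the real `pytesseract` OCR wrapper (which needs the Tesseract binary installed) and a hypothetical `llm_answer` stand-in for the LLM call:

```python
from PIL import Image, ImageDraw
import pytesseract  # real OCR wrapper; requires the Tesseract binary

def llm_answer(prompt):
    """Hypothetical stand-in for a call to a large language model."""
    return f"[LLM would reason over: {prompt[:60]}...]"

# Render a tiny synthetic "document image" so the sketch is self-contained;
# a real pipeline would load a scanned page instead.
img = Image.new("RGB", (400, 60), "white")
ImageDraw.Draw(img).text((10, 20), "Total due: 42 USD", fill="black")

# Step 1: OCR maps the document image to textual tokens.
text = pytesseract.image_to_string(img)

# Step 2: an LLM reasons over the extracted text.
print(llm_answer(f"Question: what is the total due?\nDocument text:\n{text}"))
```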
no code implementations • 28 Nov 2023 • Wang Zhu, Ishika Singh, Yuan Huang, Robin Jia, Jesse Thomason
Data augmentation via back-translation is common when pretraining Vision-and-Language Navigation (VLN) models, even though the generated instructions are noisy.
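A toy sketch of back-translation augmentation in this setting: `speaker` is a hypothetical stand-in for the trained speaker model that generates (possibly noisy) instructions for unlabeled trajectories.

```python
import random

random.seed(0)

def speaker(trajectory):
    """Hypothetical stand-in for the trained speaker model that
    back-translates a trajectory into a (possibly noisy) instruction."""
    templates = ["go to the {}", "walk toward the {} and stop"]
    return random.choice(templates).format(trajectory[-1])

# Unlabeled trajectories sampled from the environment (toy examples).
trajectories = [["hallway", "kitchen"], ["bedroom", "bathroom"]]

# Back-translation augmentation: pair each trajectory with a generated
# instruction and add it to the pretraining pool alongside human data.
augmented = [(speaker(t), t) for t in trajectories]
for instr, traj in augmented:
    print(f"'{instr}'  <-  {traj}")
```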
no code implementations • 29 Jan 2024 • Wang Zhu, Doudou Zhang, Baichao Long, Jianli Xiao
Long-term traffic prediction has always been a challenging task due to the dynamic temporal dependencies and complex spatial dependencies of traffic data.