1 code implementation • 17 Nov 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
In this work, we propose a language-conditioned transformer model for grounding 3D objects and their spatial relations.
2 code implementations • 11 Sep 2022 • Pierre-Louis Guhur, ShiZhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid
In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions.
Ranked #2 on Robot Manipulation on RLBench (Succ. Rate metric, 10 tasks, 100 demos/task)
1 code implementation • 24 Aug 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
Our resulting HM3D-AutoVLN dataset is an order of magnitude larger than existing VLN datasets in terms of navigation environments and instructions.
Ranked #1 on Visual Navigation on SOON Test
1 code implementation • CVPR 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
To balance the complexity of large action space reasoning and fine-grained language grounding, we dynamically combine a fine-scale encoding over local observations and a coarse-scale encoding on a global map via graph transformers.
Ranked #4 on Visual Navigation on SOON Test
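The dual-scale idea described above can be sketched roughly as follows. This is an illustrative assumption-laden toy, not the authors' implementation: it stands in plain single-head attention for the paper's graph transformers, and the function names, shapes, and the scalar `gate` (dynamically predicted in the real model) are all hypothetical.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention (stand-in for a graph transformer layer)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def dual_scale_step(instr, local_feats, map_feats, gate):
    """Fuse a fine-scale encoding of local observations with a
    coarse-scale encoding of the global map (names/shapes are illustrative).

    instr:       (d,) pooled instruction embedding
    local_feats: (n_local, d) features of currently observed viewpoints
    map_feats:   (n_map, d) features of all visited/frontier map nodes
    gate:        scalar in [0, 1]; dynamically predicted in the real model
    """
    fine = attention(instr[None, :], local_feats, local_feats)   # (1, d)
    coarse = attention(instr[None, :], map_feats, map_feats)     # (1, d)
    fused = gate * fine + (1.0 - gate) * coarse
    # Score every global map node as a candidate next action.
    return (map_feats @ fused.T).ravel()                         # (n_map,)

rng = np.random.default_rng(0)
d = 8
scores = dual_scale_step(rng.normal(size=d),
                         rng.normal(size=(4, d)),
                         rng.normal(size=(10, d)),
                         gate=0.5)
print(scores.shape)
```

Scoring over global map nodes (rather than only adjacent viewpoints) is what lets the coarse scale propose long-range actions while the fine scale keeps grounding precise.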
1 code implementation • NeurIPS 2021 • ShiZhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev
Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow instructions and navigate in real scenes.
Ranked #3 on Vision and Language Navigation on RxR
2 code implementations • ICCV 2021 • Pierre-Louis Guhur, Makarand Tapaswi, ShiZhe Chen, Ivan Laptev, Cordelia Schmid
Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.
Ranked #3 on Vision and Language Navigation on VLN Challenge