1 code implementation • 17 Nov 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
In this work, we propose a language-conditioned transformer model for grounding 3D objects and their spatial relations.
2 code implementations • 11 Sep 2022 • Pierre-Louis Guhur, ShiZhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid
In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions.
Ranked #2 on Robot Manipulation on RLBench (Succ. Rate metric, 10 tasks, 100 demos/task)
1 code implementation • 24 Aug 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
Our resulting HM3D-AutoVLN dataset is an order of magnitude larger than existing VLN datasets in terms of navigation environments and instructions.
Ranked #1 on Visual Navigation on SOON Test
1 code implementation • CVPR 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
To balance the complexity of large action space reasoning and fine-grained language grounding, we dynamically combine a fine-scale encoding over local observations and a coarse-scale encoding on a global map via graph transformers.
Ranked #4 on Visual Navigation on SOON Test
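The dual-scale idea described above can be sketched roughly as follows. This is an illustrative assumption-laden toy, not the authors' implementation: it stands in plain single-head attention for the paper's graph transformers, and the function names, shapes, and the scalar `gate` (dynamically predicted in the real model) are all hypothetical.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention (stand-in for a graph transformer layer)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def dual_scale_step(instr, local_feats, map_feats, gate):
    """Fuse a fine-scale encoding of local observations with a
    coarse-scale encoding of the global map (names/shapes are illustrative).

    instr:       (d,) pooled instruction embedding
    local_feats: (n_local, d) features of currently observed viewpoints
    map_feats:   (n_map, d) features of all visited/frontier map nodes
    gate:        scalar in [0, 1]; dynamically predicted in the real model
    """
    fine = attention(instr[None, :], local_feats, local_feats)   # (1, d)
    coarse = attention(instr[None, :], map_feats, map_feats)     # (1, d)
    fused = gate * fine + (1.0 - gate) * coarse
    # Score every global map node as a candidate next action.
    return (map_feats @ fused.T).ravel()                         # (n_map,)

rng = np.random.default_rng(0)
d = 8
scores = dual_scale_step(rng.normal(size=d),
                         rng.normal(size=(4, d)),
                         rng.normal(size=(10, d)),
                         gate=0.5)
print(scores.shape)
```

Scoring over global map nodes (rather than only adjacent viewpoints) is what lets the coarse scale propose long-range actions while the fine scale keeps grounding precise.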
1 code implementation • NeurIPS 2021 • ShiZhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev
Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow instructions and navigate in real scenes.
Ranked #3 on Vision and Language Navigation on RxR
2 code implementations • ICCV 2021 • Pierre-Louis Guhur, Makarand Tapaswi, ShiZhe Chen, Ivan Laptev, Cordelia Schmid
Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.
Ranked #3 on Vision and Language Navigation on VLN Challenge