no code implementations • 5 Feb 2024 • William Chen, Oier Mees, Aviral Kumar, Sergey Levine
We find that our policies trained on embeddings extracted from general-purpose VLMs outperform equivalent policies trained on generic, non-promptable image embeddings.
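As a rough illustration of this setup, the sketch below feeds a prompt-conditioned VLM embedding into a small policy head. `vlm_embed` is a hypothetical stand-in for any promptable VLM, and all dimensions are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class PromptedVLMPolicy(nn.Module):
    """Minimal sketch: a policy head on top of frozen, prompt-conditioned
    VLM embeddings. Not the paper's exact architecture."""
    def __init__(self, embed_dim=512, action_dim=7):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, vlm_embedding):
        return self.head(vlm_embedding)

def vlm_embed(image, prompt):
    # Placeholder: in practice, encode the image together with a
    # task-relevant prompt and pool the VLM's hidden states into
    # one fixed-size vector.
    return torch.randn(image.shape[0], 512)

obs = torch.rand(1, 3, 224, 224)                    # one RGB observation
z = vlm_embed(obs, "a robot opening a drawer")      # promptable embedding
action = PromptedVLMPolicy()(z)                     # -> (1, 7) action vector
```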
no code implementations • 13 Mar 2023 • Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard
While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments.
1 code implementation • 11 Oct 2022 • Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard
Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions).
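A minimal sketch of this idea, using an off-the-shelf CLIP model from Hugging Face `transformers` as the visual-language model: a language query is scored against candidate observations, and the best-matching frame localizes the goal. The model name and the blank stand-in frames are illustrative.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Score a navigation query against candidate observations; the
# best-matching frame grounds the language goal in the map.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frames = [Image.new("RGB", (224, 224)) for _ in range(4)]  # stand-in observations
inputs = processor(text=["the kitchen sink"], images=frames,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    sims = model(**inputs).logits_per_text   # (1, num_frames) similarity scores
best = sims.argmax(dim=-1).item()            # index of the best-matching view
```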
1 code implementation • 4 Oct 2022 • Oier Mees, Jessica Borja-Diaz, Wolfram Burgard
Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills.
Ranked #1 on Avg. sequence length on CALVIN
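A hedged sketch of how an LLM can ground instructions to skills, as described above: the robot's skill library is listed in a prompt and the model is asked to decompose an instruction into skill calls. `call_llm` is a placeholder for any completion API, and the skill names are hypothetical.

```python
# Decompose a natural-language instruction into calls from a fixed
# skill library by prompting an LLM. Illustrative only.
SKILLS = ["open_drawer", "pick(object)", "place(location)", "push(object)"]

def plan(instruction: str, call_llm) -> list[str]:
    prompt = (
        "You control a robot with these skills: " + ", ".join(SKILLS)
        + "\nDecompose the instruction into one skill per line.\n"
        + f"Instruction: {instruction}\nPlan:"
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

# Canned response standing in for a real LLM call:
fake_llm = lambda p: "open_drawer\npick(block)\nplace(drawer)"
print(plan("put the block in the drawer", fake_llm))
```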
1 code implementation • 19 Sep 2022 • Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, Wolfram Burgard
Concretely, we combine a low-level policy that learns latent skills via imitation learning with a high-level policy, learned via offline reinforcement learning, that chains these latent behavior priors.
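The sketch below illustrates this two-level decomposition, assuming illustrative dimensions and architectures rather than the paper's exact ones: a high-level policy selects a latent skill, and a low-level policy decodes it into actions.

```python
import torch
import torch.nn as nn

class LowLevelPolicy(nn.Module):
    """Decodes a latent skill z and the current state into an action;
    trained with imitation learning."""
    def __init__(self, state_dim=32, latent_dim=16, action_dim=7):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + latent_dim, 128),
                                 nn.ReLU(), nn.Linear(128, action_dim))

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

class HighLevelPolicy(nn.Module):
    """Picks the next latent skill from the state; trained with offline
    RL so that skills chain toward long-horizon goals."""
    def __init__(self, state_dim=32, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128),
                                 nn.ReLU(), nn.Linear(128, latent_dim))

    def forward(self, state):
        return self.net(state)

state = torch.randn(1, 32)
z = HighLevelPolicy()(state)          # choose a latent behavior prior
action = LowLevelPolicy()(state, z)   # decode it into a motor command
```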
2 code implementations • 13 Apr 2022 • Oier Mees, Lukas Hermann, Wolfram Burgard
We have open-sourced our implementation to facilitate future research in learning to perform long sequences of complex manipulation skills specified with natural language.
1 code implementation • 1 Mar 2022 • Jessica Borja-Diaz, Oier Mees, Gabriel Kalweit, Lukas Hermann, Joschka Boedecker, Wolfram Burgard
Robots operating in human-centered environments should have the ability to understand how objects function: what can be done with each object, where this interaction may occur, and how the object is used to achieve a goal.
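One common way to operationalize this (a sketch under assumed shapes, not necessarily the paper's architecture) is to predict per-pixel affordance heatmaps with a small fully convolutional network, one channel per affordance class:

```python
import torch
import torch.nn as nn

class AffordanceNet(nn.Module):
    """Maps an image to per-pixel "where can I interact" heatmaps.
    Layer sizes are illustrative."""
    def __init__(self, num_affordances=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_affordances, 1),
        )

    def forward(self, img):
        return self.net(img)  # (B, num_affordances, H, W) logits

heatmaps = AffordanceNet()(torch.rand(1, 3, 96, 96))
pixel = heatmaps[0, 0].flatten().argmax()  # most likely interaction point
```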
1 code implementation • 6 Dec 2021 • Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard
We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that the benchmark leaves significant room for developing innovative agents that learn to relate human language to their world models.
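The sketch below shows the gist of a multi-context imitation policy, assuming illustrative embedding sizes: language and goal-image contexts are projected into one shared latent goal space, so a single policy can follow either.

```python
import torch
import torch.nn as nn

class MultiContextPolicy(nn.Module):
    """Goal-conditioned policy that accepts either a language embedding
    or a goal-image embedding. Dimensions are illustrative."""
    def __init__(self, state_dim=32, lang_dim=384, img_dim=512,
                 goal_dim=32, action_dim=7):
        super().__init__()
        self.lang_proj = nn.Linear(lang_dim, goal_dim)
        self.img_proj = nn.Linear(img_dim, goal_dim)
        self.policy = nn.Sequential(nn.Linear(state_dim + goal_dim, 128),
                                    nn.ReLU(), nn.Linear(128, action_dim))

    def forward(self, state, lang=None, goal_img=None):
        goal = self.lang_proj(lang) if lang is not None else self.img_proj(goal_img)
        return self.policy(torch.cat([state, goal], dim=-1))

policy = MultiContextPolicy()
a1 = policy(torch.randn(1, 32), lang=torch.randn(1, 384))      # language goal
a2 = policy(torch.randn(1, 32), goal_img=torch.randn(1, 512))  # image goal
```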
2 code implementations • 16 Feb 2021 • Oier Mees, Wolfram Burgard
Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction.
no code implementations • 2 Aug 2020 • Iman Nematollahi, Oier Mees, Lukas Hermann, Wolfram Burgard
A key challenge for an agent learning to interact with the world is to reason about physical properties of objects and to foresee their dynamics under the effect of applied forces.
2 code implementations • 23 Jan 2020 • Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard
One particular requirement for such robots is that they are able to understand spatial relations and to place objects in accordance with the relations expressed by their user.
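One possible (purely illustrative) formulation: score candidate placement positions conditioned on the requested relation and pick the highest-scoring one. All names and dimensions below are assumptions for the sketch, not the paper's method.

```python
import torch
import torch.nn as nn

relations = ["left of", "right of", "on top of", "inside"]

class PlacementScorer(nn.Module):
    """Scores (relation, candidate position) pairs; illustrative only."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.rel_embed = nn.Embedding(len(relations), feat_dim)
        self.score = nn.Sequential(nn.Linear(feat_dim + 2, 64), nn.ReLU(),
                                   nn.Linear(64, 1))

    def forward(self, rel_idx, candidates):
        rel = self.rel_embed(rel_idx).expand(candidates.shape[0], -1)
        return self.score(torch.cat([rel, candidates], dim=-1)).squeeze(-1)

cands = torch.rand(100, 2)   # candidate (x, y) placement positions
scores = PlacementScorer()(torch.tensor([relations.index("right of")]), cands)
best_xy = cands[scores.argmax()]
```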
1 code implementation • 21 Oct 2019 • Oier Mees, Markus Merklinger, Gabriel Kalweit, Wolfram Burgard
Our method learns a general skill embedding independently from the task context by using an adversarial loss.
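A minimal sketch of such an adversarial objective, using a gradient-reversal layer (one standard realization, not necessarily the paper's exact formulation): a discriminator predicts the task from the embedding, and the reversed gradients push the encoder to discard task information.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad):
        return -grad  # reversed gradients flow into the encoder

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
task_discriminator = nn.Linear(16, 10)   # 10 hypothetical task labels

x = torch.randn(8, 64)                   # batch of trajectory features
z = encoder(x)                           # skill embedding
task_logits = task_discriminator(GradReverse.apply(z))
adv_loss = nn.functional.cross_entropy(task_logits, torch.randint(0, 10, (8,)))
adv_loss.backward()                      # encoder is trained to hide the task
```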
1 code implementation • 17 Oct 2019 • Oier Mees, Maxim Tatarchenko, Thomas Brox, Wolfram Burgard
We present a convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image.
Tasks: 3D Object Reconstruction From A Single Image · 3D Reconstruction · +3
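The sketch below captures the joint-prediction structure described above, assuming a coarse voxel output and a two-angle viewpoint head; resolutions and head designs are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class ShapeAndViewpointNet(nn.Module):
    """Shared image encoder feeding a shape decoder (coarse voxel
    occupancy logits) and a viewpoint regressor. Illustrative sizes."""
    def __init__(self, voxels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.shape_head = nn.Linear(32, voxels ** 3)  # occupancy logits
        self.view_head = nn.Linear(32, 2)             # azimuth, elevation

    def forward(self, img):
        feat = self.encoder(img)
        return self.shape_head(feat), self.view_head(feat)

img = torch.rand(1, 3, 128, 128)
shape_logits, viewpoint = ShapeAndViewpointNet()(img)
```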
1 code implementation • 18 Jul 2017 • Oier Mees, Andreas Eitel, Wolfram Burgard
Object detection is an essential task for autonomous robots operating in dynamic and changing environments.
1 code implementation • 6 Mar 2017 • Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard
Human-centered environments are rich with a wide variety of spatial relations between everyday objects.