no code implementations • 7 Jun 2025 • Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard
Experiments in simulation and real-world settings demonstrate that our multimodal spatial language maps enable zero-shot spatial and multimodal goal navigation and improve recall by 50% in ambiguous scenarios.
no code implementations • 16 Jan 2025 • Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, Sergey Levine
However, such models require us to choose a tokenization of our continuous action signals, which determines how the discrete symbols predicted by the model map to continuous robot actions.
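As a hedged illustration of one such tokenization (a minimal sketch, not the scheme proposed in this paper), the snippet below discretizes each continuous action dimension into uniform bins; the action bounds and bin count are assumed values.

```python
import numpy as np

# Assumed, illustrative bounds and vocabulary size; not taken from the paper.
ACTION_LOW, ACTION_HIGH, NUM_BINS = -1.0, 1.0, 256

def tokenize(action: np.ndarray) -> np.ndarray:
    """Map each continuous action dimension to a discrete bin index."""
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    norm = (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)   # -> [0, 1]
    return np.minimum((norm * NUM_BINS).astype(np.int64), NUM_BINS - 1)

def detokenize(tokens: np.ndarray) -> np.ndarray:
    """Map bin indices back to the centers of their bins in action space."""
    centers = (tokens.astype(np.float64) + 0.5) / NUM_BINS
    return centers * (ACTION_HIGH - ACTION_LOW) + ACTION_LOW

action = np.array([0.13, -0.72, 0.05])
tokens = tokenize(action)
print(tokens, detokenize(tokens))  # bin indices and their approximate reconstruction
```

The coarseness of such a binning is exactly the kind of design decision the sentence above refers to: too few bins lose precision, while too many make the discrete prediction problem harder.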
no code implementations • 8 Jan 2025 • Joshua Jones, Oier Mees, Carmelo Sferrazza, Kyle Stachowicz, Pieter Abbeel, Sergey Levine
However, state-of-the-art generalist robot policies are typically trained on large datasets to predict robot actions solely from visual and proprioceptive observations.
no code implementations • 26 Oct 2024 • Kyle B. Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, Benjamin Burchfiel
Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems.
no code implementations • 23 Oct 2024 • Nils Blank, Moritz Reuss, Marcel Rühle, Ömer Erdinç Yağmurlu, Fabian Wenzel, Oier Mees, Rudolf Lioutikov
A central challenge towards developing robots that can relate human language to their perception and actions is the scarcity of natural language annotations in diverse robot datasets.
no code implementations • 17 Oct 2024 • Mitsuhiko Nakamoto, Oier Mees, Aviral Kumar, Sergey Levine
Large, general-purpose robotic policies trained on diverse demonstration datasets have been shown to be remarkably effective both for controlling a variety of robots in a range of different scenes, and for acquiring broad repertoires of manipulation skills.
no code implementations • 14 Oct 2024 • Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine
By combining the results of our investigation with our improved model components, we present a novel architecture that significantly outperforms the state of the art in solving long-horizon (1500+ time-step) dexterous tasks on a bi-manual ALOHA robot.
no code implementations • 29 Aug 2024 • Vivek Myers, Bill Chunyuan Zheng, Oier Mees, Sergey Levine, Kuan Fang
Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions.
no code implementations • 21 Aug 2024 • Ria Doshi, Homer Walke, Oier Mees, Sudeep Dasari, Sergey Levine
By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets, which in turn can lead to better generalization and robustness.
1 code implementation • 30 Jul 2024 • Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, Sergey Levine
Intelligent instruction-following robots capable of improving from autonomously collected experience have the potential to transform robot learning: instead of collecting costly teleoperated demonstration data, large-scale deployment of fleets of robots can quickly collect larger quantities of autonomous data that can collectively improve their performance.
no code implementations • 11 Jul 2024 • Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine
Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability.
no code implementations • 20 May 2024 • Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine
In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces.
Ranked #3 on Robot Manipulation on SimplerEnv-Widow X (using extra training data)
1 code implementation • 9 May 2024 • Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao
We then employ these approaches to create SIMPLER, a collection of simulated environments for manipulation policy evaluation on common real robot setups.
no code implementations • 5 Feb 2024 • William Chen, Oier Mees, Aviral Kumar, Sergey Levine
We find that our policies trained on embeddings from off-the-shelf, general-purpose VLMs outperform equivalent policies trained on generic, non-promptable image embeddings.
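As context for that comparison, a minimal sketch of the general recipe of feeding a policy head with embeddings from an off-the-shelf VLM could look as follows; the CLIP checkpoint, concatenation-based fusion, and head sizes are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Small policy head on top of frozen VLM features (sizes are assumptions).
policy_head = nn.Sequential(
    nn.Linear(512 + 512, 256), nn.ReLU(), nn.Linear(256, 7)  # e.g. 7-DoF action
)

def act(image: Image.Image, instruction: str) -> torch.Tensor:
    inputs = processor(text=[instruction], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():  # keep the pretrained VLM frozen
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    return policy_head(torch.cat([img_emb, txt_emb], dim=-1))

print(act(Image.new("RGB", (224, 224)), "pick up the sponge").shape)  # [1, 7]
```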
no code implementations • 13 Mar 2023 • Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard
While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments.
1 code implementation • 11 Oct 2022 • Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard
Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions).
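A minimal sketch of that grounding idea, under the assumption that each map cell already stores a visual-language feature from such a pretrained model and that the text query is embedded into the same space, could look as follows (both inputs are stand-ins, not the paper's interface):

```python
import numpy as np

def locate(feature_map: np.ndarray, text_embedding: np.ndarray) -> tuple:
    """Return the (row, col) of the map cell that best matches the text query."""
    cells = feature_map / np.linalg.norm(feature_map, axis=-1, keepdims=True)
    query = text_embedding / np.linalg.norm(text_embedding)
    similarity = cells @ query                      # (H, W) cosine similarities
    return np.unravel_index(np.argmax(similarity), similarity.shape)

# Usage with dummy data: a 64x64 map of 512-d features and a 512-d text query.
feature_map = np.random.randn(64, 64, 512)
text_embedding = np.random.randn(512)
print(locate(feature_map, text_embedding))
```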
1 code implementation • 4 Oct 2022 • Oier Mees, Jessica Borja-Diaz, Wolfram Burgard
Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills.
Ranked #1 on Avg. sequence length on CALVIN
1 code implementation • 19 Sep 2022 • Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, Wolfram Burgard
Concretely, we combine a low-level policy that learns latent skills via imitation learning with a high-level policy, learned via offline reinforcement learning, that chains these latent behavior priors.
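A minimal sketch of this two-level decomposition, with assumed dimensions, network shapes, and class name, could look as follows:

```python
import torch
import torch.nn as nn

OBS_DIM, GOAL_DIM, SKILL_DIM, ACT_DIM = 32, 32, 16, 7  # illustrative sizes

class LatentSkillAgent(nn.Module):
    def __init__(self):
        super().__init__()
        # High-level policy (trained with offline RL): selects a latent skill.
        self.high_level = nn.Sequential(
            nn.Linear(OBS_DIM + GOAL_DIM, 128), nn.ReLU(), nn.Linear(128, SKILL_DIM))
        # Low-level policy (trained with imitation): decodes the skill into actions.
        self.low_level = nn.Sequential(
            nn.Linear(OBS_DIM + SKILL_DIM, 128), nn.ReLU(), nn.Linear(128, ACT_DIM))

    def forward(self, obs: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        skill = self.high_level(torch.cat([obs, goal], dim=-1))  # latent behavior prior
        return self.low_level(torch.cat([obs, skill], dim=-1))   # motor command

agent = LatentSkillAgent()
print(agent(torch.zeros(1, OBS_DIM), torch.zeros(1, GOAL_DIM)).shape)  # [1, 7]
```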
2 code implementations • 13 Apr 2022 • Oier Mees, Lukas Hermann, Wolfram Burgard
We have open-sourced our implementation to facilitate future research in learning to perform many complex manipulation skills in a row, specified with natural language.
1 code implementation • 1 Mar 2022 • Jessica Borja-Diaz, Oier Mees, Gabriel Kalweit, Lukas Hermann, Joschka Boedecker, Wolfram Burgard
Robots operating in human-centered environments should have the ability to understand how objects function: what can be done with each object, where this interaction may occur, and how the object is used to achieve a goal.
1 code implementation • 6 Dec 2021 • Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard
We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that the benchmark leaves significant room for developing innovative agents that learn to relate human language to their world models.
2 code implementations • 16 Feb 2021 • Oier Mees, Wolfram Burgard
Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction.
no code implementations • 2 Aug 2020 • Iman Nematollahi, Oier Mees, Lukas Hermann, Wolfram Burgard
A key challenge for an agent learning to interact with the world is to reason about physical properties of objects and to foresee their dynamics under the effect of applied forces.
2 code implementations • 23 Jan 2020 • Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard
One particular requirement for such robots is that they are able to understand spatial relations and can place objects in accordance with the spatial relations expressed by their user.
1 code implementation • 21 Oct 2019 • Oier Mees, Markus Merklinger, Gabriel Kalweit, Wolfram Burgard
Our method learns a general skill embedding independently of the task context by using an adversarial loss.
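One standard way to realize such an adversarial objective (shown here as a hedged sketch, not the paper's exact formulation) is a gradient-reversal layer: a discriminator tries to predict the task context from the skill embedding, while the reversed gradient pushes the encoder to discard that information. All dimensions and networks below are illustrative.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output  # flip the gradient flowing back into the encoder

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
task_discriminator = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()

trajectory_features = torch.randn(8, 64)    # dummy batch of trajectory features
task_labels = torch.randint(0, 10, (8,))    # dummy task-context labels

embedding = encoder(trajectory_features)
adv_loss = criterion(task_discriminator(GradReverse.apply(embedding)), task_labels)
adv_loss.backward()  # discriminator learns the task; encoder is pushed to hide it
```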
1 code implementation • 17 Oct 2019 • Oier Mees, Maxim Tatarchenko, Thomas Brox, Wolfram Burgard
We present a convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image.
3D Object Reconstruction From A Single Image • 3D Reconstruction • +3
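As a rough sketch of such a joint prediction network (with assumed layer sizes, not the architecture from the paper), a shared convolutional encoder can feed two heads, one for a voxelized shape and one for viewpoint angles:

```python
import torch
import torch.nn as nn

class ShapeAndViewpointNet(nn.Module):
    def __init__(self, voxel_res: int = 32):
        super().__init__()
        self.voxel_res = voxel_res
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, 512), nn.ReLU())
        self.shape_head = nn.Linear(512, voxel_res ** 3)  # voxel occupancy logits
        self.view_head = nn.Linear(512, 2)                # e.g. azimuth, elevation

    def forward(self, image: torch.Tensor):
        features = self.encoder(image)
        voxels = self.shape_head(features).view(-1, *(3 * [self.voxel_res]))
        return voxels, self.view_head(features)

voxels, viewpoint = ShapeAndViewpointNet()(torch.randn(1, 3, 128, 128))
print(voxels.shape, viewpoint.shape)  # [1, 32, 32, 32] and [1, 2]
```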
1 code implementation • 18 Jul 2017 • Oier Mees, Andreas Eitel, Wolfram Burgard
Object detection is an essential task for autonomous robots operating in dynamic and changing environments.
1 code implementation • 6 Mar 2017 • Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard
Human-centered environments are rich with a wide variety of spatial relations between everyday objects.