Search Results for author: Oier Mees

Found 28 papers, 14 papers with code

Multimodal Spatial Language Maps for Robot Navigation and Manipulation

no code implementations · 7 Jun 2025 · Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

Experiments in simulation and real-world settings demonstrate that our multimodal spatial language maps enable zero-shot spatial and multimodal goal navigation and improve recall by 50% in ambiguous scenarios.

3D Reconstruction · Robot Navigation

FAST: Efficient Action Tokenization for Vision-Language-Action Models

no code implementations · 16 Jan 2025 · Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, Sergey Levine

However, such models require us to choose a tokenization of our continuous action signals, which determines how the discrete symbols predicted by the model map to continuous robot actions.

Vision-Language-Action
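
To make the tokenization choice in the excerpt above concrete, here is a minimal sketch of the naive per-dimension binning scheme that sentence alludes to (uniform bins over a clipped action range). It is illustrative only, not the compression-based tokenizer FAST itself proposes; the action dimensionality, range, and bin count are made-up assumptions.

```python
import numpy as np

def tokenize_actions(actions, low, high, n_bins=256):
    """Map continuous actions in [low, high] to discrete bin indices."""
    actions = np.clip(actions, low, high)
    # Normalize each dimension to [0, 1], then quantize into n_bins tokens.
    norm = (actions - low) / (high - low)
    return np.minimum((norm * n_bins).astype(int), n_bins - 1)

def detokenize_actions(tokens, low, high, n_bins=256):
    """Invert the binning: map token indices back to bin-center actions."""
    norm = (tokens + 0.5) / n_bins
    return low + norm * (high - low)

# Example: a hypothetical 7-DoF action normalized to [-1, 1] per dimension.
rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, size=7)
t = tokenize_actions(a, low=-1.0, high=1.0)
a_rec = detokenize_actions(t, low=-1.0, high=1.0)
print(np.abs(a - a_rec).max())  # quantization error <= half a bin width
```

The round-trip error above is exactly what the tokenization choice trades off: coarser bins shorten token sequences but lose action precision.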

Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding

no code implementations · 8 Jan 2025 · Joshua Jones, Oier Mees, Carmelo Sferrazza, Kyle Stachowicz, Pieter Abbeel, Sergey Levine

However, state-of-the-art generalist robot policies are typically trained on large datasets to predict robot actions solely from visual and proprioceptive observations.

Robot Manipulation · Text Generation +1

GHIL-Glue: Hierarchical Control with Filtered Subgoal Images

no code implementations · 26 Oct 2024 · Kyle B. Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, Benjamin Burchfiel

Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems.

Imitation Learning · Video Prediction +1

Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models

no code implementations · 23 Oct 2024 · Nils Blank, Moritz Reuss, Marcel Rühle, Ömer Erdinç Yağmurlu, Fabian Wenzel, Oier Mees, Rudolf Lioutikov

A central challenge towards developing robots that can relate human language to their perception and actions is the scarcity of natural language annotations in diverse robot datasets.

Diversity

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

no code implementations · 17 Oct 2024 · Mitsuhiko Nakamoto, Oier Mees, Aviral Kumar, Sergey Levine

Large, general-purpose robotic policies trained on diverse demonstration datasets have been shown to be remarkably effective both for controlling a variety of robots in a range of different scenes, and for acquiring broad repertoires of manipulation skills.

Offline RL · Re-Ranking

The Ingredients for Robotic Diffusion Transformers

no code implementations · 14 Oct 2024 · Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine

By combining the results of our investigation with our improved model components, we present a novel architecture that significantly outperforms the state of the art in solving long-horizon (1500+ time-steps) dexterous tasks on a bi-manual ALOHA robot.

Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation

no code implementations · 29 Aug 2024 · Vivek Myers, Bill Chunyuan Zheng, Oier Mees, Sergey Levine, Kuan Fang

Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions.

Robot Manipulation

Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation

no code implementations · 21 Aug 2024 · Ria Doshi, Homer Walke, Oier Mees, Sudeep Dasari, Sergey Levine

By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets, which in turn can lead to better generalization and robustness.

Autonomous Improvement of Instruction Following Skills via Foundation Models

1 code implementation · 30 Jul 2024 · Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, Sergey Levine

Intelligent instruction-following robots capable of improving from autonomously collected experience have the potential to transform robot learning: instead of collecting costly teleoperated demonstration data, large-scale deployment of fleets of robots can quickly collect larger quantities of autonomous data that can collectively improve their performance.

Image Generation · Instruction Following

Robotic Control via Embodied Chain-of-Thought Reasoning

no code implementations · 11 Jul 2024 · Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine

Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability.

Vision-Language-Action

Octo: An Open-Source Generalist Robot Policy

no code implementations · 20 May 2024 · Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces.

Ranked #3 on Robot Manipulation on SimplerEnv-WidowX (using extra training data)

Robot Manipulation

Vision-Language Models Provide Promptable Representations for Reinforcement Learning

no code implementations · 5 Feb 2024 · William Chen, Oier Mees, Aviral Kumar, Sergey Levine

We find that our policies trained on embeddings from off-the-shelf, general-purpose VLMs outperform equivalent policies trained on generic, non-promptable image embeddings.

Common Sense Reasoning · Instruction Following +6
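
As a rough illustration of feeding VLM embeddings to a policy, here is a hedged sketch that pairs off-the-shelf CLIP image features with the embedding of a task prompt and passes the concatenation to a small policy head. The paper extracts promptable representations from the VLM itself, so this only approximates the contrast with generic image embeddings; the model name, dimensions, prompt, and policy head are all assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

obs = Image.new("RGB", (224, 224))  # stand-in for a camera observation
prompt = "Is there a red block on the table?"  # hypothetical task prompt

inputs = processor(text=[prompt], images=obs, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
# Concatenate image and prompt embeddings as the policy input (1, 1024).
features = torch.cat([out.image_embeds, out.text_embeds], dim=-1)

policy_head = torch.nn.Sequential(
    torch.nn.Linear(features.shape[-1], 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 7),  # e.g., a 7-DoF continuous action mean
)
action = policy_head(features)
```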

Audio Visual Language Maps for Robot Navigation

no code implementations · 13 Mar 2023 · Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments.

Navigate · Robot Navigation

Visual Language Maps for Robot Navigation

1 code implementation · 11 Oct 2022 · Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions).

3D Reconstruction · Image Captioning +1
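
The map-query step behind this idea can be sketched in a few lines: assume per-cell visual-language features have already been fused into a top-down grid (VLMaps fuses pixel-aligned VLM features; random placeholders stand in for them here), then locate a language query by cosine similarity. A minimal sketch, not the VLMaps implementation.

```python
import numpy as np

H, W, D = 64, 64, 512  # illustrative map size and feature dimension
rng = np.random.default_rng(0)
grid_features = rng.normal(size=(H, W, D))  # per-cell visual features
grid_features /= np.linalg.norm(grid_features, axis=-1, keepdims=True)

def locate(text_embedding):
    """Return the map cell whose feature best matches a language query."""
    q = text_embedding / np.linalg.norm(text_embedding)
    scores = grid_features @ q  # cosine similarity per cell
    return np.unravel_index(np.argmax(scores), (H, W))

query = rng.normal(size=D)  # stand-in for a VLM text embedding of e.g. "sofa"
print(locate(query))        # (row, col) of the best-matching cell
```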

Grounding Language with Visual Affordances over Unstructured Data

1 code implementation · 4 Oct 2022 · Oier Mees, Jessica Borja-Diaz, Wolfram Burgard

Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills.

Avg. Sequence Length · Success Rate (5 task-horizon)

Latent Plans for Task-Agnostic Offline Reinforcement Learning

1 code implementation · 19 Sep 2022 · Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, Wolfram Burgard

Concretely, we combine a low-level policy that learns latent skills via imitation learning and a high-level policy learned from offline reinforcement learning for skill-chaining the latent behavior priors.

Imitation Learning · Reinforcement Learning +2
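
A minimal sketch of the hierarchy described in this excerpt: a high-level policy (trained with offline RL in the paper) selects a latent plan from the current state, and a low-level policy (imitation-learned) decodes the state and latent into actions over a fixed horizon. Network sizes, dimensions, and the horizon are illustrative assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, LATENT_DIM, ACTION_DIM, HORIZON = 32, 16, 7, 10  # made up

high_level = nn.Sequential(  # offline-RL-trained skill selector in the paper
    nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
low_level = nn.Sequential(   # imitation-learned skill decoder in the paper
    nn.Linear(STATE_DIM + LATENT_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM))

def act(state):
    z = high_level(state)  # choose a latent skill for the next horizon
    actions = []
    for _ in range(HORIZON):
        a = low_level(torch.cat([state, z], dim=-1))
        actions.append(a)
        # state = env.step(a)  # environment step omitted in this sketch
    return actions

state = torch.zeros(1, STATE_DIM)
print(len(act(state)))  # HORIZON low-level actions per high-level decision
```

Chaining skills then amounts to re-querying the high-level policy every HORIZON steps, which is the skill-chaining role the excerpt assigns to offline RL.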

What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data

2 code implementations · 13 Apr 2022 · Oier Mees, Lukas Hermann, Wolfram Burgard

We have open-sourced our implementation to facilitate future research in learning to perform many complex manipulation skills in a row specified with natural language.

Imitation Learning · Robot Manipulation

Affordance Learning from Play for Sample-Efficient Policy Learning

1 code implementation · 1 Mar 2022 · Jessica Borja-Diaz, Oier Mees, Gabriel Kalweit, Lukas Hermann, Joschka Boedecker, Wolfram Burgard

Robots operating in human-centered environments should have the ability to understand how objects function: what can be done with each object, where this interaction may occur, and how the object is used to achieve a goal.

Deep Reinforcement Learning · Motion Planning +2

CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

1 code implementation · 6 Dec 2021 · Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard

We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that there is significant room for developing innovative agents that learn to relate human language to their world models with this benchmark.

Continuous Control · Imitation Learning +3

Composing Pick-and-Place Tasks By Grounding Language

2 code implementations · 16 Feb 2021 · Oier Mees, Wolfram Burgard

Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction.

Natural Language Visual Grounding · Robotic Grasping +1

Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction

no code implementations · 2 Aug 2020 · Iman Nematollahi, Oier Mees, Lukas Hermann, Wolfram Burgard

A key challenge for an agent learning to interact with the world is to reason about physical properties of objects and to foresee their dynamics under the effect of applied forces.

Object · Optical Flow Estimation +1

Learning Object Placements For Relational Instructions by Hallucinating Scene Representations

2 code implementations · 23 Jan 2020 · Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard

One particular requirement for such robots is that they are able to understand spatial relations and can place objects in accordance with the spatial relations expressed by their user.

Auxiliary Learning · Robotic Grasping +2

Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments

1 code implementation · 18 Jul 2017 · Oier Mees, Andreas Eitel, Wolfram Burgard

Object detection is an essential task for autonomous robots operating in dynamic and changing environments.

Object Detection
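
The adaptive fusion this paper is named for can be sketched as a small mixture-of-experts: per-modality detectors produce class scores, and a gating network weights them per input, so the fusion can adapt (e.g., favor depth when RGB degrades in the dark). This is a schematic sketch with made-up feature shapes, not the paper's CNN architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Weight per-modality expert scores with an input-dependent gate."""
    def __init__(self, feat_dim=128, n_classes=5):  # illustrative sizes
        super().__init__()
        self.rgb_expert = nn.Linear(feat_dim, n_classes)
        self.depth_expert = nn.Linear(feat_dim, n_classes)
        self.gate = nn.Sequential(
            nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=-1))

    def forward(self, rgb_feat, depth_feat):
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=-1))  # (B, 2)
        scores = torch.stack(
            [self.rgb_expert(rgb_feat), self.depth_expert(depth_feat)], dim=1)
        return (w.unsqueeze(-1) * scores).sum(dim=1)  # adaptively fused scores

model = GatedFusion()
out = model(torch.randn(4, 128), torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 5])
```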

Metric Learning for Generalizing Spatial Relations to New Objects

1 code implementation · 6 Mar 2017 · Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard

Human-centered environments are rich with a wide variety of spatial relations between everyday objects.

Metric Learning
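
A minimal metric-learning sketch in the spirit of this paper: embed scenes so that pairs exhibiting the same spatial relation end up closer than pairs that do not, via a standard triplet loss. The encoder, feature dimensions, and data below are placeholder assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

# Small encoder mapping a scene feature vector into the learned metric space.
embed = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = embed(torch.randn(8, 64))    # scenes with some spatial relation
positive = embed(torch.randn(8, 64))  # different objects, same relation
negative = embed(torch.randn(8, 64))  # a different spatial relation

loss = triplet(anchor, positive, negative)
loss.backward()  # gradients shape the learned distance metric
```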
