no code implementations • 26 May 2024 • Allen Nie, Ching-An Cheng, Andrey Kolobov, Adith Swaminathan
We study the potential of using large language models (LLMs) as interactive optimizers for solving maximization problems in a text space using natural language and numerical feedback.
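A minimal sketch of such a propose-and-score loop is below; the prompt format and the `query_llm` stub are illustrative assumptions, not the paper's protocol.

```python
def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; plug in your own client here."""
    raise NotImplementedError

def llm_maximize(score_fn, seed: str, steps: int = 10) -> str:
    """Ask the LLM for better candidates, feeding back numerical scores."""
    history = [(seed, score_fn(seed))]
    for _ in range(steps):
        trace = "\n".join(f"candidate: {c!r} -> score: {s:.3f}"
                          for c, s in history)
        prompt = ("You are maximizing a numeric score over text inputs.\n"
                  f"Past attempts:\n{trace}\n"
                  "Reply with a single improved candidate:")
        candidate = query_llm(prompt).strip()
        history.append((candidate, score_fn(candidate)))
    return max(history, key=lambda p: p[1])[0]
```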
1 code implementation • 16 Feb 2024 • Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov
To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains.
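The core BPE step — repeatedly merging the most frequent adjacent token pair into a new composite token — can be sketched as follows, assuming actions have already been discretized into integer tokens (e.g., by clustering); the token granularity and merge budget are illustrative choices, not the paper's settings.

```python
from collections import Counter

def bpe_merges(sequences: list[list[int]], num_merges: int):
    """Repeatedly merge the most frequent adjacent token pair into a new token."""
    next_token = max(t for seq in sequences for t in seq) + 1
    merges = {}
    for _ in range(num_merges):
        pairs = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges[(a, b)] = next_token
        new_seqs = []
        for seq in sequences:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                    out.append(next_token); i += 2
                else:
                    out.append(seq[i]); i += 1
            new_seqs.append(out)
        sequences = new_seqs
        next_token += 1
    return merges, sequences

# Toy usage: tokens could come from, e.g., k-means over continuous actions.
demo = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 3]]
merges, compressed = bpe_merges(demo, num_merges=2)
print(merges, compressed)   # merged pairs become variable-length "skills"
```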
no code implementations • 18 Jan 2024 • Florian Achermann, Thomas Stastny, Bogdan Danciu, Andrey Kolobov, Jen Jen Chung, Roland Siegwart, Nicholas Lawrance
Real-time high-resolution wind predictions are beneficial for various applications including safe manned and unmanned aviation.
1 code implementation • 11 Dec 2023 • Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.
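As a toy illustration of the interaction pattern such a benchmark evaluates — the environment replies only with language feedback, never a scalar reward — consider this self-contained guessing game (an analogy, not LLF-Bench's actual API):

```python
class ToyLLFEnv:
    """Toy stand-in for a learning-from-language-feedback task: guess a
    hidden integer; the environment replies only with verbal hints."""
    def __init__(self, target: int = 7):
        self.target = target
    def reset(self) -> str:
        return "Guess an integer between 0 and 9."
    def step(self, action: int):
        if action == self.target:
            return "Correct!", True
        hint = "higher" if action < self.target else "lower"
        return f"Try a {hint} number.", False

env = ToyLLFEnv()
print(env.reset())
lo, hi, done = 0, 9, False
while not done:
    guess = (lo + hi) // 2
    feedback, done = env.step(guess)
    print(f"guess={guess}: {feedback}")
    if "higher" in feedback:      # the agent must parse the language hint
        lo = guess + 1
    elif "lower" in feedback:
        hi = guess - 1
```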
no code implementations • 26 Oct 2023 • Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng
A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future.
no code implementations • 30 Jun 2023 • Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine
Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction describes.
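A rough sketch of that alignment idea, with random linear maps standing in for the encoders and an InfoNCE-style contrastive loss over a batch:

```python
import numpy as np

rng = np.random.default_rng(0)
B, D_IMG, D_TXT, D = 8, 32, 16, 8          # batch and embedding sizes

W_img = rng.normal(size=(D_IMG, D))        # image-delta encoder (stand-in)
W_txt = rng.normal(size=(D_TXT, D))        # language encoder (stand-in)

start, goal = rng.normal(size=(B, D_IMG)), rng.normal(size=(B, D_IMG))
text = rng.normal(size=(B, D_TXT))

z_delta = (goal - start) @ W_img           # encode the desired change, not the goal
z_text = text @ W_txt
z_delta /= np.linalg.norm(z_delta, axis=1, keepdims=True)
z_text /= np.linalg.norm(z_text, axis=1, keepdims=True)

logits = z_text @ z_delta.T                # similarity of every text/delta pair
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.diag(log_probs).mean()          # matched pairs lie on the diagonal
print(f"contrastive loss: {loss:.3f}")
```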
no code implementations • 1 Jun 2023 • Sinong Geng, Aldo Pacchiano, Andrey Kolobov, Ching-An Cheng
We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping.
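One way to read the blending idea — fold a heuristic value into the reward and shrink the effective discount — is sketched below on an offline dataset; the exact form is our paraphrase of the idea, not verbatim HUBL.

```python
def blend(transitions, h, gamma: float, lam: float):
    """transitions: list of (s, a, r, s'); h: dict mapping state -> heuristic value."""
    relabeled = []
    for s, a, r, s_next in transitions:
        r_tilde = r + (1 - lam) * gamma * h[s_next]   # fold heuristic into reward
        gamma_tilde = lam * gamma                      # shrink effective horizon
        relabeled.append((s, a, r_tilde, s_next, gamma_tilde))
    return relabeled

# Toy usage with two states and a made-up heuristic (e.g., Monte-Carlo returns).
data = [("s0", "a", 1.0, "s1"), ("s1", "a", 0.0, "s0")]
print(blend(data, h={"s0": 0.5, "s1": 2.0}, gamma=0.99, lam=0.9))
```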
no code implementations • 15 Mar 2023 • Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov
A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations.
1 code implementation • 15 Aug 2022 • Nolan Wagener, Andrey Kolobov, Felipe Vieira Frujeri, Ricky Loynd, Ching-An Cheng, Matthew Hausknecht
We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control, and we show that the learned low-level component can be reused to efficiently learn downstream high-level tasks.
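A stylized sketch of that reuse pattern: a new high-level policy emits a skill latent, and the frozen pretrained low-level policy decodes (observation, latent) into an action. Shapes and the linear "networks" are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
OBS, LATENT, ACT = 10, 4, 3

W_low = rng.normal(size=(OBS + LATENT, ACT))   # frozen, pretrained low level
W_high = rng.normal(size=(OBS, LATENT))        # trainable high level

def act(obs: np.ndarray) -> np.ndarray:
    latent = np.tanh(obs @ W_high)                          # high level picks a skill
    return np.tanh(np.concatenate([obs, latent]) @ W_low)   # low level executes it

print(act(rng.normal(size=OBS)))
```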
1 code implementation • 19 Mar 2022 • R Devon Hjelm, Bogdan Mazoure, Florian Golemo, Samira Ebrahimi Kahou, Pedro Braga, Felipe Frujeri, Mihai Jalobeanu, Andrey Kolobov
A broad challenge of research on generalization for sequential decision-making tasks in interactive environments is designing benchmarks that clearly mark progress.
no code implementations • NeurIPS 2021 • Ching-An Cheng, Andrey Kolobov, Adith Swaminathan
On the theoretical side, we characterize properties of a good heuristic and its impact on RL acceleration.
1 code implementation • ICLR 2022 • Bogdan Mazoure, Ahmed M. Ahmed, Patrick MacAlpine, R Devon Hjelm, Andrey Kolobov
A highly desirable property of a reinforcement learning (RL) agent -- and a major difficulty for deep RL approaches -- is the ability to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
no code implementations • 29 Mar 2021 • Sharada Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Gražvydas Šemetulskis, João Schapke, Jonas Kubilius, Jurgis Pašukonis, Linas Klimas, Matthew Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe
We present the design of a centralized benchmark for reinforcement learning that can help measure sample efficiency and generalization by performing end-to-end evaluation of the training and rollout phases of thousands of user-submitted code bases in a scalable way.
1 code implementation • Conference on Robot Learning (CoRL) 2020 • Florian Achermann, Andrey Kolobov, Debadeepta Dey, Timo Hinzmann, Jen Jen Chung, Roland Siegwart, Nicholas Lawrance
This model is then deployed for fast and accurate online interest point detection.
no code implementations • NeurIPS 2020 • Ching-An Cheng, Andrey Kolobov, Alekh Agarwal
In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles.
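Computing such a baseline is straightforward given per-oracle value estimates; the tabular values below are made up for illustration.

```python
import numpy as np

V_oracles = np.array([                 # rows: oracle policies, columns: states
    [1.0, 0.2, 0.5],
    [0.4, 0.9, 0.1],
    [0.3, 0.3, 0.8],
])
baseline = V_oracles.max(axis=0)       # state-wise maximum over the oracles
print(baseline)                        # -> [1.0, 0.9, 0.8]
```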
1 code implementation • NeurIPS 2020 • Matteo Turchetta, Andrey Kolobov, Shital Shah, Andreas Krause, Alekh Agarwal
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
1 code implementation • ICML 2020 • Andrey Kolobov, Sébastien Bubeck, Julian Zimmert
Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated.
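A toy simulation of the relaxed setting — every arm generates payoffs whether or not it is played, and the agent only observes the payoffs of arms it plays — with made-up rates and a naive round-robin schedule:

```python
import random

random.seed(0)
rates = [0.1, 0.5, 0.9]                 # per-arm payoff probabilities (illustrative)
T, collected = 30, 0.0
for t in range(T):
    payoffs = [float(random.random() < p) for p in rates]  # generated by all arms
    arm = t % len(rates)                # naive round-robin play schedule
    collected += payoffs[arm]           # only the played arm's payoff is observed
print(f"collected {collected} of {T} rounds")
```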
1 code implementation • NeurIPS 2019 • Andrey Kolobov, Yuval Peres, Cheng Lu, Eric J. Horvitz
From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages).
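A simple baseline for this problem allocates a fixed request budget in proportion to each source's change rate and importance; this proportional rule is illustrative, not the paper's optimal policy.

```python
change_rate = {"page_a": 5.0, "page_b": 1.0, "page_c": 0.2}   # changes/day (made up)
importance = {"page_a": 1.0, "page_b": 3.0, "page_c": 1.0}

budget = 10.0                                  # total crawls per day
score = {k: change_rate[k] * importance[k] for k in change_rate}
total = sum(score.values())
crawl_rate = {k: budget * v / total for k, v in score.items()}
print(crawl_rate)                              # crawls/day allotted to each page
```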
1 code implementation • 24 May 2018 • Iain Guilliard, Richard Rogahn, Jim Piavis, Andrey Kolobov
Small uninhabited aerial vehicles (sUAVs) commonly rely on active propulsion to stay airborne, which limits flight time and range.
Robotics • Systems and Control
no code implementations • 3 May 2015 • Christopher H. Lin, Andrey Kolobov, Ece Kamar, Eric Horvitz
Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking.
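A toy rendering of the underlying trade-off: each additional "thinking" step improves the value of the action eventually taken but incurs a deliberation cost, and the metareasoner stops when the marginal gain no longer covers that cost (numbers are illustrative).

```python
cost_per_step = 0.1
value_after = [0.0, 0.5, 0.8, 0.9, 0.95]   # value of acting after k thinking steps

best_k = max(range(len(value_after)),
             key=lambda k: value_after[k] - k * cost_per_step)
print(best_k, value_after[best_k] - best_k * cost_per_step)   # -> 2 0.6
```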