no code implementations • 4 Apr 2024 • Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie
In this paper, we introduce Direct Nash Optimization (DNO), a provable and scalable algorithm that marries the simplicity and stability of contrastive learning with the theoretical generality of optimizing general preferences.
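To make the contrastive side concrete, here is a minimal sketch of the DPO-style contrastive loss that this family of methods builds on; the iterative pair construction and general-preference win rates described in the paper are omitted, and all names below are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_preference_loss(logp_chosen, logp_rejected,
                                ref_logp_chosen, ref_logp_rejected,
                                beta=0.1):
    """DPO-style loss on a batch of (chosen, rejected) response pairs.

    Each argument is a tensor of summed token log-probs for a response,
    under the policy being trained or a frozen reference policy.
    """
    # Log-ratio of policy to reference, for each side of the pair.
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    # Push the margin between chosen and rejected up, scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random log-probabilities.
lp_c, lp_r = torch.randn(8), torch.randn(8)
ref_c, ref_r = torch.randn(8), torch.randn(8)
print(contrastive_preference_loss(lp_c, lp_r, ref_c, ref_r))
```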
1 code implementation • 16 Feb 2024 • Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov
To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains.
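To make the analogy concrete, below is a toy byte-pair-encoding loop over sequences of discretized actions: the most frequent adjacent pair is repeatedly merged into a new compound "skill" token. This is a generic BPE sketch, not the paper's implementation.

```python
from collections import Counter

def bpe_merge(seqs, num_merges=3):
    """Greedy BPE: repeatedly merge the most frequent adjacent token pair.

    `seqs` is a list of token lists (e.g. discretized action indices);
    each merge introduces a tuple token standing for a longer 'skill'.
    """
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append((a, b))  # new skill token
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            merged.append(out)
        seqs = merged
    return seqs

print(bpe_merge([[0, 1, 0, 1, 2], [0, 1, 2, 0, 1]]))
```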
no code implementations • 11 Dec 2023 • Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.
no code implementations • 26 Oct 2023 • Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng
A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future.
no code implementations • 30 Jun 2023 • Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine
Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to.
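A minimal sketch of the alignment idea: embed the instruction and the (start, goal) image pair, and train contrastively so language matches the *change* between images rather than the goal alone. The encoders, dimensions, and temperature below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def alignment_loss(lang_emb, start_emb, goal_emb, temperature=0.07):
    """Contrastive loss aligning language to the start->goal change.

    lang_emb: (B, D) instruction embeddings.
    start_emb, goal_emb: (B, D) image embeddings.
    """
    change = F.normalize(goal_emb - start_emb, dim=-1)  # desired change
    lang = F.normalize(lang_emb, dim=-1)
    logits = lang @ change.T / temperature   # (B, B) similarity matrix
    labels = torch.arange(len(lang))         # i-th text matches i-th pair
    return F.cross_entropy(logits, labels)

B, D = 4, 32
print(alignment_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)))
```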
no code implementations • 1 Jun 2023 • Sinong Geng, Aldo Pacchiano, Andrey Kolobov, Ching-An Cheng
We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping.
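A sketch of heuristic blending via relabeled rewards and discounts: each transition's reward absorbs a weighted share of a heuristic value of the next state, and the effective discount shrinks so the downstream offline RL algorithm bootstraps less aggressively. The fixed blending weight here is a simplification of the paper's scheme.

```python
import numpy as np

def hubl_relabel(rewards, next_values, gamma=0.99, lam=0.5):
    """Blend bootstrapped values with a heuristic by relabeling the data.

    rewards: per-transition rewards; next_values: heuristic h(s') of the
    next states. Returns the relabeled rewards and shrunken discount for
    use by any value-bootstrapping offline RL algorithm.
    """
    new_rewards = rewards + lam * gamma * next_values
    new_gamma = (1.0 - lam) * gamma
    return new_rewards, new_gamma

r = np.array([0.0, 1.0, 0.0])
h = np.array([0.5, 0.2, 0.9])   # heuristic values of next states
print(hubl_relabel(r, h))
```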
1 code implementation • 30 Mar 2023 • Anqi Li, Byron Boots, Ching-An Cheng
We study a new paradigm for sequential decision making, called offline policy learning from observations (PLfO).
no code implementations • 15 Mar 2023 • Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov
A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations.
no code implementations • NeurIPS 2023 • Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.
no code implementations • 6 Jan 2023 • Hoai-An Nguyen, Ching-An Cheng
Reinforcement learning (RL) has so far seen limited real-world application.
no code implementations • 8 Nov 2022 • Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage.
1 code implementation • 15 Aug 2022 • Nolan Wagener, Andrey Kolobov, Felipe Vieira Frujeri, Ricky Loynd, Ching-An Cheng, Matthew Hausknecht
We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control, and we show that the learned low-level component can be reused to efficiently learn downstream high-level tasks.
1 code implementation • 13 Jul 2022 • Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan
Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes comes from exogenous variables outside the control of the decision-maker.
no code implementations • 1 Jun 2022 • Sanae Amani, Lin F. Yang, Ching-An Cheng
We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks.
3 code implementations • 5 Feb 2022 • Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.
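A minimal sketch of the kind of critic objective relative pessimism leads to: the critic stays Bellman-consistent on the data while adversarially minimizing the policy's advantage over the data actions. This is a simplified illustration, not ATAC's exact training procedure; all names are placeholders.

```python
import torch

def atac_critic_loss(q, q_target, policy, batch, beta=1.0, gamma=0.99):
    """Critic loss sketch: relative pessimism plus Bellman consistency.

    `q`, `q_target`, and `policy` are callables; `batch` holds
    (state, action, reward, next_state, done) tensors.
    """
    s, a, r, s2, done = batch
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_target(s2, policy(s2))
    bellman = ((q(s, a) - target) ** 2).mean()          # data consistency
    pessimism = (q(s, policy(s)) - q(s, a)).mean()      # relative pessimism
    return pessimism + beta * bellman

# Toy check with linear critics on 1-D states/actions.
q = q_tgt = lambda s, a: (s * a).sum(-1)
pol = torch.tanh
s, a = torch.randn(8, 1), torch.randn(8, 1)
batch = (s, a, torch.randn(8), torch.randn(8, 1), torch.zeros(8))
print(atac_critic_loss(q, q_tgt, pol, batch))
```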
1 code implementation • 16 Jun 2021 • Nolan Wagener, Byron Boots, Ching-An Cheng
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs.
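The intervention gate itself is simple to state; below is a hedged sketch of the idea, where `advantage_fn` stands in for a learned or hand-designed safety advantage function and the threshold is illustrative.

```python
import numpy as np

def safe_action(state, agent_action, backup_action, advantage_fn, eta=0.0):
    """Advantage-based intervention gate (a sketch of the SAILR idea).

    If the safety advantage of the proposed action exceeds a threshold
    (i.e. it is predicted to worsen safety relative to the backup
    policy), the backup action is executed instead.
    """
    if advantage_fn(state, agent_action) > eta:
        return backup_action, True    # intervened
    return agent_action, False

# Toy usage: intervene when the action points toward an unsafe region.
adv = lambda s, a: float(np.dot(s, a))   # hypothetical safety advantage
print(safe_action(np.array([1.0, 0.0]), np.array([0.5, 0.0]),
                  np.array([-0.5, 0.0]), adv))
```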
no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.
no code implementations • NeurIPS 2021 • Ching-An Cheng, Andrey Kolobov, Adith Swaminathan
On the theoretical side, we characterize properties of a good heuristic and its impact on RL acceleration.
no code implementations • 24 Mar 2021 • Andrea Zanette, Ching-An Cheng, Alekh Agarwal
Policy optimization methods are popular reinforcement learning algorithms because their incremental and on-policy nature makes them more stable than their value-based counterparts.
no code implementations • 10 Mar 2021 • Anqi Li, Ching-An Cheng, M. Asif Rana, Man Xie, Karl Van Wyk, Nathan Ratliff, Byron Boots
Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge, and the ability to transfer policies between robots.
no code implementations • 6 Jul 2020 • Xinyan Yan, Byron Boots, Ching-An Cheng
Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions; if the online learner achieves no regret, the agent provably learns an expert-like policy.
no code implementations • NeurIPS 2020 • Ching-An Cheng, Andrey Kolobov, Alekh Agarwal
In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles.
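The baseline is exactly what it sounds like; a tiny sketch, with toy oracle value functions standing in for learned ones:

```python
def max_aggregated_baseline(value_fns):
    """State-wise maximum over oracle value functions.

    Returns a baseline f(s) = max_k V_k(s): at each state, lean on
    whichever oracle promises the highest value there.
    """
    def baseline(state):
        return max(v(state) for v in value_fns)
    return baseline

# Two toy oracles, each good in a different part of the state space.
v1 = lambda s: -abs(s - 1.0)
v2 = lambda s: -abs(s + 1.0)
f = max_aggregated_baseline([v1, v2])
print(f(0.9), f(-0.9))   # picks the better oracle at each state
```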
1 code implementation • NeurIPS 2020 • Amir Rahimi, Amirreza Shaban, Ching-An Cheng, Richard Hartley, Byron Boots
A common approach is to learn a post-hoc calibration function that transforms the output of the original network into calibrated confidence scores while maintaining the network's accuracy.
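The paper studies a richer class of calibration maps; as a baseline illustration of what post-hoc calibration means, here is standard temperature scaling, which rescales logits by a single learned scalar and so never changes the argmax (accuracy is preserved). This is the textbook baseline, not the paper's method.

```python
import torch

def fit_temperature(logits, labels, iters=200, lr=0.01):
    """Fit a single temperature on held-out logits by minimizing NLL.

    Dividing logits by a scalar T > 0 never changes the argmax, so the
    network's accuracy is untouched while confidences are recalibrated.
    """
    log_t = torch.zeros(1, requires_grad=True)  # T = exp(log_t) stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

logits = torch.randn(100, 10) * 3.0      # over-confident toy logits
labels = torch.randint(0, 10, (100,))
print(fit_temperature(logits, labels))
```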
no code implementations • 3 Dec 2019 • Jonathan Lee, Ching-An Cheng, Ken Goldberg, Byron Boots
We prove that there is a fundamental equivalence between achieving sublinear dynamic regret in continuous online learning (COL) and solving certain equilibrium problems (EPs), and we present a reduction from dynamic regret to both static regret and the convergence rate of the associated EP.
no code implementations • 14 Nov 2019 • Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees.
no code implementations • 7 Oct 2019 • Mustafa Mukadam, Ching-An Cheng, Dieter Fox, Byron Boots, Nathan Ratliff
RMPfusion supplements RMPflow with weight functions that can hierarchically reshape the Lyapunov functions of the subtask RMPs according to the current configuration of the robot and environment.
no code implementations • 8 Aug 2019 • Ching-An Cheng, Xinyan Yan, Byron Boots
This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods.
no code implementations • 24 Feb 2019 • Nolan Wagener, Ching-An Cheng, Jacob Sacks, Byron Boots
In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature.
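A much-simplified sketch of that connection: rather than solving the horizon-H problem to optimality each step, treat MPC as an online learner that takes one descent step on the current control sequence, executes the first control, and shifts the sequence to warm-start the next round. The plain gradient step below stands in for the mirror-descent updates analyzed in the paper.

```python
import numpy as np

def mpc_online_step(controls, grad_fn, lr=0.1):
    """One round of MPC viewed as online learning (a simplified sketch)."""
    controls = controls - lr * grad_fn(controls)   # single online update
    u0 = controls[0]                               # executed control
    shifted = np.roll(controls, -1, axis=0)        # warm-start next round
    shifted[-1] = 0.0
    return u0, shifted

# Toy quadratic cost on a horizon of 5 scalar controls.
grad = lambda u: 2.0 * (u - 1.0)
u_seq = np.zeros(5)
for _ in range(3):
    u0, u_seq = mpc_online_step(u_seq, grad)
    print(u0, u_seq)
```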
no code implementations • 19 Feb 2019 • Ching-An Cheng, Jonathan Lee, Ken Goldberg, Byron Boots
Furthermore, we show for COL a reduction from dynamic regret to both static regret and convergence in the associated EP, allowing us to analyze the dynamic regret of many existing algorithms.
1 code implementation • 16 Nov 2018 • Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff
We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs).
2 code implementations • 25 Oct 2018 • Amirreza Shaban, Ching-An Cheng, Nathan Hatch, Byron Boots
Bilevel optimization has recently been revisited for designing and analyzing algorithms in hyperparameter tuning and meta-learning tasks.
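A minimal sketch of truncated back-propagation through the inner loop: only the last few inner gradient steps are unrolled with a graph, and the outer (validation-style) loss is differentiated w.r.t. the hyperparameter. The toy ridge-regression setup and all names are illustrative assumptions, not the paper's experiments.

```python
import torch

def hyper_grad(lmbda, inner_steps=20, truncate=5, lr=0.1):
    """Hypergradient of a validation loss w.r.t. a regularization weight,
    back-propagating only through the last `truncate` inner steps."""
    torch.manual_seed(0)
    x = torch.randn(50, 3)
    y = x @ torch.tensor([1.0, -2.0, 0.5])
    inner = lambda w: ((x @ w - y) ** 2).mean() + lmbda * (w ** 2).sum()
    outer = lambda w: ((x @ w - y) ** 2).mean()   # validation-style loss

    w = torch.zeros(3, requires_grad=True)
    for t in range(inner_steps):
        tail = t >= inner_steps - truncate
        if not tail:
            w = w.detach().requires_grad_(True)   # drop history before the tail
        g, = torch.autograd.grad(inner(w), w, create_graph=tail)
        w = w - lr * g
    return torch.autograd.grad(outer(w), lmbda)[0]

lmbda = torch.tensor(0.1, requires_grad=True)
print(hyper_grad(lmbda))
```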
1 code implementation • 15 Oct 2018 • Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots
We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning.
1 code implementation • NeurIPS 2018 • Hugh Salimbeni, Ching-An Cheng, Byron Boots, Marc Deisenroth
It adopts an orthogonal basis in the mean function to model the residuals that cannot be learned by the standard coupled approach.
no code implementations • 12 Jun 2018 • Ching-An Cheng, Xinyan Yan, Evangelos A. Theodorou, Byron Boots
When the model oracle is learned online, these algorithms can provably accelerate the best known convergence rate up to an order.
no code implementations • 26 May 2018 • Ching-An Cheng, Xinyan Yan, Nolan Wagener, Byron Boots
We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch.
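The switching mechanism is easy to picture; a tiny sketch, where the uniform draw of the switching iteration K is a stand-in for the distribution analyzed in the paper:

```python
import numpy as np

def loki_schedule(num_iters, k_max, rng=np.random.default_rng(0)):
    """Randomized switch from imitation to policy gradient (a sketch).

    Sample a switching iteration K, imitate the (possibly suboptimal)
    expert for the first K iterations, then run policy-gradient updates.
    """
    k = int(rng.integers(1, k_max + 1))
    return ["imitate" if t < k else "policy_gradient" for t in range(num_iters)]

print(loki_schedule(num_iters=10, k_max=5))
```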
no code implementations • 22 Jan 2018 • Ching-An Cheng, Byron Boots
Value aggregation is a general framework for solving imitation learning problems.
no code implementations • NeurIPS 2017 • Ching-An Cheng, Byron Boots
Furthermore, it yields a variational inference problem that can be solved by stochastic gradient ascent with time and space complexity that is only linear in the number of mean function parameters, regardless of the choice of kernels, likelihoods, and inducing points.
no code implementations • 21 Sep 2017 • Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, Evangelos Theodorou, Byron Boots
We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors.
no code implementations • NeurIPS 2016 • Ching-An Cheng, Byron Boots
Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference.
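For reference, here is a sketch of one standard sparse GPR approximation (subset of regressors with inducing inputs), which replaces the full O(n^3) solve with an O(n m^2) one for m << n basis functions; the kernel and hyperparameters are illustrative.

```python
import numpy as np

def sparse_gp_predict(x_train, y_train, x_test, z, lengthscale=1.0, noise=0.1):
    """Subset-of-regressors sparse GP predictive mean with inducing inputs z.

    Approximates the full GP with a small basis of kernel functions
    k(., z_i) centered at the m inducing inputs.
    """
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)
    Kzz = k(z, z)
    Kzx = k(z, x_train)
    Ksz = k(x_test, z)
    A = noise ** 2 * Kzz + Kzx @ Kzx.T          # m x m system, not n x n
    return Ksz @ np.linalg.solve(A, Kzx @ y_train)

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.1 * rng.normal(size=200)
z = np.linspace(-3, 3, 10)                      # 10 inducing points
print(sparse_gp_predict(x, y, np.array([0.0, 1.5]), z))
```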