no code implementations • 4 Apr 2024 • Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie
In this paper, we introduce Direct Nash Optimization (DNO), a provable and scalable algorithm that marries the simplicity and stability of contrastive learning with the theoretical generality of optimizing general preferences.
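To make the contrastive side concrete, here is a minimal sketch of the DPO-style contrastive loss that this family of methods builds on; the iterative pair construction and general-preference win rates described in the paper are omitted, and all names below are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_preference_loss(logp_chosen, logp_rejected,
                                ref_logp_chosen, ref_logp_rejected,
                                beta=0.1):
    """DPO-style loss on a batch of (chosen, rejected) response pairs.

    Each argument is a tensor of summed token log-probs for a response,
    under the policy being trained or a frozen reference policy.
    """
    # Log-ratio of policy to reference, for each side of the pair.
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    # Push the margin between chosen and rejected up, scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random log-probabilities.
lp_c, lp_r = torch.randn(8), torch.randn(8)
ref_c, ref_r = torch.randn(8), torch.randn(8)
print(contrastive_preference_loss(lp_c, lp_r, ref_c, ref_r))
```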
1 code implementation • 16 Feb 2024 • Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov
To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains.
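To make the analogy concrete, below is a toy byte-pair-encoding loop over sequences of discretized actions: the most frequent adjacent pair is repeatedly merged into a new compound "skill" token. This is a generic BPE sketch, not the paper's implementation.

```python
from collections import Counter

def bpe_merge(seqs, num_merges=3):
    """Greedy BPE: repeatedly merge the most frequent adjacent token pair.

    `seqs` is a list of token lists (e.g. discretized action indices);
    each merge introduces a tuple token standing for a longer 'skill'.
    """
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append((a, b))  # new skill token
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            merged.append(out)
        seqs = merged
    return seqs

print(bpe_merge([[0, 1, 0, 1, 2], [0, 1, 2, 0, 1]]))
```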
no code implementations • 11 Dec 2023 • Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.
no code implementations • 26 Oct 2023 • Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng
A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future.
no code implementations • 30 Jun 2023 • Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine
Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to.
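A minimal sketch of the alignment idea: embed the instruction and the (start, goal) image pair, and train contrastively so language matches the *change* between images rather than the goal alone. The encoders, dimensions, and temperature below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def alignment_loss(lang_emb, start_emb, goal_emb, temperature=0.07):
    """Contrastive loss aligning language to the start->goal change.

    lang_emb: (B, D) instruction embeddings.
    start_emb, goal_emb: (B, D) image embeddings.
    """
    change = F.normalize(goal_emb - start_emb, dim=-1)  # desired change
    lang = F.normalize(lang_emb, dim=-1)
    logits = lang @ change.T / temperature   # (B, B) similarity matrix
    labels = torch.arange(len(lang))         # i-th text matches i-th pair
    return F.cross_entropy(logits, labels)

B, D = 4, 32
print(alignment_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)))
```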
no code implementations • 1 Jun 2023 • Sinong Geng, Aldo Pacchiano, Andrey Kolobov, Ching-An Cheng
We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping.
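A sketch of heuristic blending via relabeled rewards and discounts: each transition's reward absorbs a weighted share of a heuristic value of the next state, and the effective discount shrinks so the downstream offline RL algorithm bootstraps less aggressively. The fixed blending weight here is a simplification of the paper's scheme.

```python
import numpy as np

def hubl_relabel(rewards, next_values, gamma=0.99, lam=0.5):
    """Blend bootstrapped values with a heuristic by relabeling the data.

    rewards: per-transition rewards; next_values: heuristic h(s') of the
    next states. Returns the relabeled rewards and shrunken discount for
    use by any value-bootstrapping offline RL algorithm.
    """
    new_rewards = rewards + lam * gamma * next_values
    new_gamma = (1.0 - lam) * gamma
    return new_rewards, new_gamma

r = np.array([0.0, 1.0, 0.0])
h = np.array([0.5, 0.2, 0.9])   # heuristic values of next states
print(hubl_relabel(r, h))
```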
1 code implementation • 30 Mar 2023 • Anqi Li, Byron Boots, Ching-An Cheng
We study a new paradigm for sequential decision making, called offline policy learning from observations (PLfO).
no code implementations • 15 Mar 2023 • Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov
A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations.
no code implementations • NeurIPS 2023 • Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.
no code implementations • 6 Jan 2023 • Hoai-An Nguyen, Ching-An Cheng
Reinforcement learning (RL) has so far seen limited real-world application.
no code implementations • 8 Nov 2022 • Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage.
1 code implementation • 15 Aug 2022 • Nolan Wagener, Andrey Kolobov, Felipe Vieira Frujeri, Ricky Loynd, Ching-An Cheng, Matthew Hausknecht
We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control, and we show that the learned low-level component can be reused to efficiently learn downstream high-level tasks.
1 code implementation • 13 Jul 2022 • Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan
Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes comes from exogenous variables outside the control of the decision-maker.
no code implementations • 1 Jun 2022 • Sanae Amani, Lin F. Yang, Ching-An Cheng
We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks.
3 code implementations • 5 Feb 2022 • Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.
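A minimal sketch of the kind of critic objective relative pessimism leads to: the critic stays Bellman-consistent on the data while adversarially minimizing the policy's advantage over the data actions. This is a simplified illustration, not ATAC's exact training procedure; all names are placeholders.

```python
import torch

def atac_critic_loss(q, q_target, policy, batch, beta=1.0, gamma=0.99):
    """Critic loss sketch: relative pessimism plus Bellman consistency.

    `q`, `q_target`, and `policy` are callables; `batch` holds
    (state, action, reward, next_state, done) tensors.
    """
    s, a, r, s2, done = batch
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_target(s2, policy(s2))
    bellman = ((q(s, a) - target) ** 2).mean()          # data consistency
    pessimism = (q(s, policy(s)) - q(s, a)).mean()      # relative pessimism
    return pessimism + beta * bellman

# Toy check with linear critics on 1-D states/actions.
q = q_tgt = lambda s, a: (s * a).sum(-1)
pol = torch.tanh
s, a = torch.randn(8, 1), torch.randn(8, 1)
batch = (s, a, torch.randn(8), torch.randn(8, 1), torch.zeros(8))
print(atac_critic_loss(q, q_tgt, pol, batch))
```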
1 code implementation • 16 Jun 2021 • Nolan Wagener, Byron Boots, Ching-An Cheng
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs.
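The intervention gate itself is simple to state; below is a hedged sketch of the idea, where `advantage_fn` stands in for a learned or hand-designed safety advantage function and the threshold is illustrative.

```python
import numpy as np

def safe_action(state, agent_action, backup_action, advantage_fn, eta=0.0):
    """Advantage-based intervention gate (a sketch of the SAILR idea).

    If the safety advantage of the proposed action exceeds a threshold
    (i.e. it is predicted to worsen safety relative to the backup
    policy), the backup action is executed instead.
    """
    if advantage_fn(state, agent_action) > eta:
        return backup_action, True    # intervened
    return agent_action, False

# Toy usage: intervene when the action points toward an unsafe region.
adv = lambda s, a: float(np.dot(s, a))   # hypothetical safety advantage
print(safe_action(np.array([1.0, 0.0]), np.array([0.5, 0.0]),
                  np.array([-0.5, 0.0]), adv))
```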
no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.
no code implementations • NeurIPS 2021 • Ching-An Cheng, Andrey Kolobov, Adith Swaminathan
On the theoretical side, we characterize properties of a good heuristic and its impact on RL acceleration.
no code implementations • 24 Mar 2021 • Andrea Zanette, Ching-An Cheng, Alekh Agarwal
Policy optimization methods are popular reinforcement learning algorithms because their incremental and on-policy nature makes them more stable than their value-based counterparts.
no code implementations • 10 Mar 2021 • Anqi Li, Ching-An Cheng, M. Asif Rana, Man Xie, Karl Van Wyk, Nathan Ratliff, Byron Boots
Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge, and the ability to transfer policies between robots.
no code implementations • 6 Jul 2020 • Xinyan Yan, Byron Boots, Ching-An Cheng
Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions; if the online learner achieves no regret, the agent provably learns an expert-like policy.
no code implementations • NeurIPS 2020 • Ching-An Cheng, Andrey Kolobov, Alekh Agarwal
In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles.
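The baseline is exactly what it sounds like; a tiny sketch, with toy oracle value functions standing in for learned ones:

```python
def max_aggregated_baseline(value_fns):
    """State-wise maximum over oracle value functions.

    Returns a baseline f(s) = max_k V_k(s): at each state, lean on
    whichever oracle promises the highest value there.
    """
    def baseline(state):
        return max(v(state) for v in value_fns)
    return baseline

# Two toy oracles, each good in a different part of the state space.
v1 = lambda s: -abs(s - 1.0)
v2 = lambda s: -abs(s + 1.0)
f = max_aggregated_baseline([v1, v2])
print(f(0.9), f(-0.9))   # picks the better oracle at each state
```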
1 code implementation • NeurIPS 2020 • Amir Rahimi, Amirreza Shaban, Ching-An Cheng, Richard Hartley, Byron Boots
A common approach is to learn a post-hoc calibration function that transforms the output of the original network into calibrated confidence scores while maintaining the network's accuracy.
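The paper studies a richer class of calibration maps; as a baseline illustration of what post-hoc calibration means, here is standard temperature scaling, which rescales logits by a single learned scalar and so never changes the argmax (accuracy is preserved). This is the textbook baseline, not the paper's method.

```python
import torch

def fit_temperature(logits, labels, iters=200, lr=0.01):
    """Fit a single temperature on held-out logits by minimizing NLL.

    Dividing logits by a scalar T > 0 never changes the argmax, so the
    network's accuracy is untouched while confidences are recalibrated.
    """
    log_t = torch.zeros(1, requires_grad=True)  # T = exp(log_t) stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

logits = torch.randn(100, 10) * 3.0      # over-confident toy logits
labels = torch.randint(0, 10, (100,))
print(fit_temperature(logits, labels))
```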
no code implementations • 3 Dec 2019 • Jonathan Lee, Ching-An Cheng, Ken Goldberg, Byron Boots
We prove that there is a fundamental equivalence between achieving sublinear dynamic regret in continuous online learning (COL) and solving certain equilibrium problems (EPs), and we present a reduction from dynamic regret to both static regret and the convergence rate of the associated EP.
no code implementations • 14 Nov 2019 • Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees.
no code implementations • 7 Oct 2019 • Mustafa Mukadam, Ching-An Cheng, Dieter Fox, Byron Boots, Nathan Ratliff
RMPfusion supplements RMPflow with weight functions that can hierarchically reshape the Lyapunov functions of the subtask RMPs according to the current configuration of the robot and environment.
no code implementations • 8 Aug 2019 • Ching-An Cheng, Xinyan Yan, Byron Boots
This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods.
no code implementations • 24 Feb 2019 • Nolan Wagener, Ching-An Cheng, Jacob Sacks, Byron Boots
In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature.
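A much-simplified sketch of that connection: rather than solving the horizon-H problem to optimality each step, treat MPC as an online learner that takes one descent step on the current control sequence, executes the first control, and shifts the sequence to warm-start the next round. The plain gradient step below stands in for the mirror-descent updates analyzed in the paper.

```python
import numpy as np

def mpc_online_step(controls, grad_fn, lr=0.1):
    """One round of MPC viewed as online learning (a simplified sketch)."""
    controls = controls - lr * grad_fn(controls)   # single online update
    u0 = controls[0]                               # executed control
    shifted = np.roll(controls, -1, axis=0)        # warm-start next round
    shifted[-1] = 0.0
    return u0, shifted

# Toy quadratic cost on a horizon of 5 scalar controls.
grad = lambda u: 2.0 * (u - 1.0)
u_seq = np.zeros(5)
for _ in range(3):
    u0, u_seq = mpc_online_step(u_seq, grad)
    print(u0, u_seq)
```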
no code implementations • 19 Feb 2019 • Ching-An Cheng, Jonathan Lee, Ken Goldberg, Byron Boots
Furthermore, we show for COL a reduction from dynamic regret to both static regret and convergence in the associated EP, allowing us to analyze the dynamic regret of many existing algorithms.
1 code implementation • 16 Nov 2018 • Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff
We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs).
2 code implementations • 25 Oct 2018 • Amirreza Shaban, Ching-An Cheng, Nathan Hatch, Byron Boots
Bilevel optimization has recently been revisited for designing and analyzing algorithms in hyperparameter tuning and meta-learning tasks.
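A minimal sketch of truncated back-propagation through the inner loop: only the last few inner gradient steps are unrolled with a graph, and the outer (validation-style) loss is differentiated w.r.t. the hyperparameter. The toy ridge-regression setup and all names are illustrative assumptions, not the paper's experiments.

```python
import torch

def hyper_grad(lmbda, inner_steps=20, truncate=5, lr=0.1):
    """Hypergradient of a validation loss w.r.t. a regularization weight,
    back-propagating only through the last `truncate` inner steps."""
    torch.manual_seed(0)
    x = torch.randn(50, 3)
    y = x @ torch.tensor([1.0, -2.0, 0.5])
    inner = lambda w: ((x @ w - y) ** 2).mean() + lmbda * (w ** 2).sum()
    outer = lambda w: ((x @ w - y) ** 2).mean()   # validation-style loss

    w = torch.zeros(3, requires_grad=True)
    for t in range(inner_steps):
        tail = t >= inner_steps - truncate
        if not tail:
            w = w.detach().requires_grad_(True)   # drop history before the tail
        g, = torch.autograd.grad(inner(w), w, create_graph=tail)
        w = w - lr * g
    return torch.autograd.grad(outer(w), lmbda)[0]

lmbda = torch.tensor(0.1, requires_grad=True)
print(hyper_grad(lmbda))
```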
1 code implementation • 15 Oct 2018 • Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots
We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning.
1 code implementation • NeurIPS 2018 • Hugh Salimbeni, Ching-An Cheng, Byron Boots, Marc Deisenroth
It adopts an orthogonal basis in the mean function to model the residuals that cannot be learned by the standard coupled approach.
no code implementations • 12 Jun 2018 • Ching-An Cheng, Xinyan Yan, Evangelos A. Theodorou, Byron Boots
When the model oracle is learned online, these algorithms can provably accelerate the best known convergence rate up to an order.
no code implementations • 26 May 2018 • Ching-An Cheng, Xinyan Yan, Nolan Wagener, Byron Boots
We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch.
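The switching mechanism is easy to picture; a tiny sketch, where the uniform draw of the switching iteration K is a stand-in for the distribution analyzed in the paper:

```python
import numpy as np

def loki_schedule(num_iters, k_max, rng=np.random.default_rng(0)):
    """Randomized switch from imitation to policy gradient (a sketch).

    Sample a switching iteration K, imitate the (possibly suboptimal)
    expert for the first K iterations, then run policy-gradient updates.
    """
    k = int(rng.integers(1, k_max + 1))
    return ["imitate" if t < k else "policy_gradient" for t in range(num_iters)]

print(loki_schedule(num_iters=10, k_max=5))
```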
no code implementations • 22 Jan 2018 • Ching-An Cheng, Byron Boots
Value aggregation is a general framework for solving imitation learning problems.
no code implementations • NeurIPS 2017 • Ching-An Cheng, Byron Boots
Furthermore, it yields a variational inference problem that can be solved by stochastic gradient ascent with time and space complexity that is only linear in the number of mean function parameters, regardless of the choice of kernels, likelihoods, and inducing points.
no code implementations • 21 Sep 2017 • Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, Evangelos Theodorou, Byron Boots
We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors.
no code implementations • NeurIPS 2016 • Ching-An Cheng, Byron Boots
Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference.
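For reference, here is a sketch of one standard sparse GPR approximation (subset of regressors with inducing inputs), which replaces the full O(n^3) solve with an O(n m^2) one for m << n basis functions; the kernel and hyperparameters are illustrative.

```python
import numpy as np

def sparse_gp_predict(x_train, y_train, x_test, z, lengthscale=1.0, noise=0.1):
    """Subset-of-regressors sparse GP predictive mean with inducing inputs z.

    Approximates the full GP with a small basis of kernel functions
    k(., z_i) centered at the m inducing inputs.
    """
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)
    Kzz = k(z, z)
    Kzx = k(z, x_train)
    Ksz = k(x_test, z)
    A = noise ** 2 * Kzz + Kzx @ Kzx.T          # m x m system, not n x n
    return Ksz @ np.linalg.solve(A, Kzx @ y_train)

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.1 * rng.normal(size=200)
z = np.linspace(-3, 3, 10)                      # 10 inducing points
print(sparse_gp_predict(x, y, np.array([0.0, 1.5]), z))
```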