Search Results for author: Ching-An Cheng

Found 39 papers, 11 papers with code

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

no code implementations 4 Apr 2024 Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie

In this paper, we introduce Direct Nash Optimization (DNO), a provable and scalable algorithm that marries the simplicity and stability of contrastive learning with the theoretical generality of optimizing general preferences.

Contrastive Learning
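
As a rough illustration of the "contrastive learning from preferences" ingredient mentioned above, the sketch below computes a generic DPO-style pairwise preference loss. It is not the DNO algorithm itself; the beta parameter and the log-ratio parameterization are assumptions made purely for illustration.

    # Illustrative contrastive loss on (chosen, rejected) preference pairs.
    # NOT the DNO algorithm; beta and the log-ratio form are assumptions.
    import torch
    import torch.nn.functional as F

    def pairwise_preference_loss(logp_chosen, logp_rejected,
                                 ref_logp_chosen, ref_logp_rejected, beta=0.1):
        # Margin between policy and reference log-likelihood ratios.
        margin = beta * ((logp_chosen - ref_logp_chosen)
                         - (logp_rejected - ref_logp_rejected))
        # Logistic loss pushes the policy to prefer the "chosen" response.
        return -F.logsigmoid(margin).mean()

    # Toy usage with made-up log-probabilities for three preference pairs.
    loss = pairwise_preference_loss(torch.tensor([-5.0, -4.2, -6.1]),
                                    torch.tensor([-5.5, -4.0, -7.3]),
                                    torch.tensor([-5.2, -4.3, -6.4]),
                                    torch.tensor([-5.3, -4.1, -7.0]))
    print(loss.item())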

PRISE: Learning Temporal Action Abstractions as a Sequence Compression Problem

1 code implementation 16 Feb 2024 Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov

To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains.

Continuous Control Few-Shot Imitation Learning +2
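
The sketch below shows plain byte-pair encoding applied to a discretized action stream, to make the "BPE for temporal action abstractions" idea above concrete. The token alphabet and merge count are made up, and this is not the PRISE implementation.

    # Toy byte-pair encoding over discretized actions: repeatedly merge the most
    # frequent adjacent pair into a longer "skill" token. Illustrative only.
    from collections import Counter

    def bpe_merge(tokens, num_merges=3):
        tokens = list(tokens)
        for _ in range(num_merges):
            pairs = Counter(zip(tokens, tokens[1:]))
            if not pairs:
                break
            (a, b), _ = pairs.most_common(1)[0]
            merged, i = [], 0
            while i < len(tokens):
                if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                    merged.append(a + b)   # new multi-step "skill" token
                    i += 2
                else:
                    merged.append(tokens[i])
                    i += 1
            tokens = merged
        return tokens

    # Discretized primitive actions for one trajectory (hypothetical alphabet).
    print(bpe_merge(["u", "u", "r", "u", "u", "r", "d"]))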

LLF-Bench: Benchmark for Interactive Learning from Language Feedback

no code implementations 11 Dec 2023 Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan

We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.

Information Retrieval OpenAI Gym
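
The toy loop below illustrates the interaction pattern the benchmark targets: the environment returns natural-language feedback that the agent must interpret to improve. The environment and feedback strings are invented for illustration and are not the LLF-Bench API.

    # Minimal toy of interactive learning from language feedback; not LLF-Bench.
    import random

    class ToyLanguageFeedbackEnv:
        def __init__(self, target=7):
            self.target = target

        def reset(self):
            return "Guess an integer between 0 and 9."

        def step(self, action):
            done = action == self.target
            feedback = ("Correct!" if done
                        else "Too low, try a larger number." if action < self.target
                        else "Too high, try a smaller number.")
            return feedback, done

    env = ToyLanguageFeedbackEnv()
    print(env.reset())
    guess, done = random.randint(0, 9), False
    while not done:
        feedback, done = env.step(guess)
        print(guess, "->", feedback)
        guess += 1 if "larger" in feedback else -1 if "smaller" in feedback else 0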

Interactive Robot Learning from Verbal Correction

no code implementations 26 Oct 2023 Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng

A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future.

Language Modelling Large Language Model

Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

no code implementations 30 Jun 2023 Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine

Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to.

Instruction Following
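
A minimal sketch of the alignment idea described above: a language embedding is matched to the difference between goal and start image embeddings with an InfoNCE-style loss. The encoders, temperature, and loss form are illustrative assumptions, not the paper's architecture.

    # Align instructions with the change between start and goal image embeddings.
    # Illustrative contrastive loss; not the paper's actual model.
    import torch
    import torch.nn.functional as F

    def alignment_loss(lang_emb, start_emb, goal_emb, temperature=0.1):
        change = F.normalize(goal_emb - start_emb, dim=-1)   # desired visual change
        lang = F.normalize(lang_emb, dim=-1)
        logits = lang @ change.t() / temperature             # batch x batch similarities
        labels = torch.arange(lang.size(0))                  # i-th instruction matches i-th change
        return F.cross_entropy(logits, labels)

    # Toy batch of 4 examples with 16-dimensional embeddings.
    loss = alignment_loss(torch.randn(4, 16), torch.randn(4, 16), torch.randn(4, 16))
    print(loss.item())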

Improving Offline RL by Blending Heuristics

no code implementations 1 Jun 2023 Sinong Geng, Aldo Pacchiano, Andrey Kolobov, Ching-An Cheng

We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping.

D4RL Offline RL
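
A minimal sketch of blending a heuristic value into a bootstrapped target, the general mechanism the abstract refers to. The convex combination and the lambda parameter below are assumptions for illustration; see the paper for HUBL's actual relabeling.

    # Blend a heuristic value estimate (e.g., a Monte-Carlo return) into the
    # bootstrapped Bellman target used by an offline RL algorithm.
    def blended_target(reward, next_value, heuristic, gamma=0.99, lam=0.3):
        bootstrap = reward + gamma * next_value        # standard Bellman backup
        return (1.0 - lam) * bootstrap + lam * heuristic

    print(blended_target(reward=1.0, next_value=10.0, heuristic=12.5))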

MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations

1 code implementation 30 Mar 2023 Anqi Li, Byron Boots, Ching-An Cheng

We study a new paradigm for sequential decision making, called offline policy learning from observations (PLfO).

Imitation Learning Offline RL +2

PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

no code implementations 15 Mar 2023 Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov

A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations.

Representation Learning

Adversarial Model for Offline Reinforcement Learning

no code implementations NeurIPS 2023 Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng

We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.

reinforcement-learning Reinforcement Learning (RL)

ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data

no code implementations 8 Nov 2022 Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng

We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage.

Offline RL

MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control

1 code implementation 15 Aug 2022 Nolan Wagener, Andrey Kolobov, Felipe Vieira Frujeri, Ricky Loynd, Ching-An Cheng, Matthew Hausknecht

We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control, and show that the learned low-level component can be re-used to efficiently learn downstream high-level tasks.

Humanoid Control

Hindsight Learning for MDPs with Exogenous Inputs

1 code implementation 13 Jul 2022 Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan

Many resource management problems require sequential decision-making under uncertainty, where the only uncertainties affecting the decision outcomes are exogenous variables outside the control of the decision-maker.

counterfactual Decision Making +3

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

no code implementations 1 Jun 2022 Sanae Amani, Lin F. Yang, Ching-An Cheng

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks.

4k reinforcement-learning +1

Adversarially Trained Actor Critic for Offline Reinforcement Learning

3 code implementations 5 Feb 2022 Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.

Continuous Control D4RL +3
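
The sketch below spells out one way to read "relative pessimism" as a two-player objective: the critic is trained to keep the learner's actions looking no better than the dataset's actions while staying Bellman-consistent, and the actor maximizes that critic. The weighting beta and this exact decomposition are illustrative, not the released ATAC code.

    # Sketch of an adversarially trained critic and actor. Illustrative only.
    import torch

    def critic_loss(q, q_target, policy, batch, gamma=0.99, beta=1.0):
        s, a, r, s2 = batch["obs"], batch["act"], batch["rew"], batch["next_obs"]
        a_pi = policy(s)                                  # learner's action
        pessimism = (q(s, a_pi) - q(s, a)).mean()         # learner vs. data actions
        with torch.no_grad():
            td_target = r + gamma * q_target(s2, policy(s2))
        bellman = ((q(s, a) - td_target) ** 2).mean()
        return pessimism + beta * bellman

    def actor_loss(q, policy, batch):
        s = batch["obs"]
        return -q(s, policy(s)).mean()                    # maximize the pessimistic critic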

Safe Reinforcement Learning Using Advantage-Based Intervention

1 code implementation 16 Jun 2021 Nolan Wagener, Byron Boots, Ching-An Cheng

We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs.

reinforcement-learning Reinforcement Learning (RL) +1
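
A minimal sketch of the advantage-based intervention gate described above: the learner's action runs only if a safety advantage estimate clears a threshold; otherwise a backup policy takes over. The threshold eta and all names are illustrative placeholders.

    # Advantage-based intervention around an arbitrary learner action.
    def intervene(state, proposed_action, safety_adv, backup_policy, eta=0.0):
        # safety_adv(s, a): estimated advantage of a w.r.t. the backup policy's safety value.
        if safety_adv(state, proposed_action) >= eta:
            return proposed_action, False      # no intervention
        return backup_policy(state), True      # intervened: execute the safe fallback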

Bellman-consistent Pessimism for Offline Reinforcement Learning

no code implementations NeurIPS 2021 Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

no code implementations 24 Mar 2021 Andrea Zanette, Ching-An Cheng, Alekh Agarwal

Policy optimization methods are popular reinforcement learning algorithms because their incremental and on-policy nature makes them more stable than their value-based counterparts.

reinforcement-learning Reinforcement Learning (RL)

RMP2: A Structured Composable Policy Class for Robot Learning

no code implementations 10 Mar 2021 Anqi Li, Ching-An Cheng, M. Asif Rana, Man Xie, Karl Van Wyk, Nathan Ratliff, Byron Boots

Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge, and the ability to transfer policies between robots.

Computational Efficiency

Explaining Fast Improvement in Online Imitation Learning

no code implementations 6 Jul 2020 Xinyan Yan, Byron Boots, Ching-An Cheng

Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions, and if the online learning has no regret, the agent can provably learn an expert-like policy.

Imitation Learning Structured Prediction

Policy Improvement via Imitation of Multiple Oracles

no code implementations NeurIPS 2020 Ching-An Cheng, Andrey Kolobov, Alekh Agarwal

In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles.

Imitation Learning
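
The baseline described above is simple to state in code: take the state-wise maximum over the oracle value functions and measure improvement against it. Names and shapes below are illustrative.

    # State-wise maximum over oracle values as a baseline for policy improvement.
    def max_oracle_baseline(state, oracle_values):
        # oracle_values: list of callables V_k(state) -> float
        return max(v(state) for v in oracle_values)

    def advantage_vs_best_oracle(state, action, q_estimate, oracle_values):
        return q_estimate(state, action) - max_oracle_baseline(state, oracle_values)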

Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks

1 code implementation NeurIPS 2020 Amir Rahimi, Amirreza Shaban, Ching-An Cheng, Richard Hartley, Byron Boots

A common approach is to learn a post-hoc calibration function that transforms the output of the original network into calibrated confidence scores while maintaining the network's accuracy.
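
For reference, the simplest post-hoc calibration function of the kind described above is temperature scaling, which rescales logits without changing the predicted class. It is shown here only as a baseline example; the paper studies a strictly more expressive intra order-preserving family.

    # Temperature scaling: an accuracy-preserving post-hoc calibrator (baseline).
    import torch
    import torch.nn.functional as F

    def fit_temperature(logits, labels, steps=200, lr=0.01):
        log_t = torch.zeros(1, requires_grad=True)        # optimize log-temperature
        opt = torch.optim.Adam([log_t], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = F.cross_entropy(logits / log_t.exp(), labels)
            loss.backward()
            opt.step()
        return log_t.exp().item()

    # Toy validation set: 100 examples, 5 classes.
    logits, labels = torch.randn(100, 5) * 3, torch.randint(0, 5, (100,))
    print("fitted temperature:", fit_temperature(logits, labels))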

Continuous Online Learning and New Insights to Online Imitation Learning

no code implementations 3 Dec 2019 Jonathan Lee, Ching-An Cheng, Ken Goldberg, Byron Boots

We prove that there is a fundamental equivalence between achieving sublinear dynamic regret in COL and solving certain equilibrium problems (EPs), and we present a reduction from dynamic regret to both static regret and convergence rate of the associated EP.

Imitation Learning

A Reduction from Reinforcement Learning to No-Regret Online Learning

no code implementations 14 Nov 2019 Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees.

reinforcement-learning Reinforcement Learning (RL)

Riemannian Motion Policy Fusion through Learnable Lyapunov Function Reshaping

no code implementations 7 Oct 2019 Mustafa Mukadam, Ching-An Cheng, Dieter Fox, Byron Boots, Nathan Ratliff

RMPfusion supplements RMPflow with weight functions that can hierarchically reshape the Lyapunov functions of the subtask RMPs according to the current configuration of the robot and environment.

Imitation Learning

Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods

no code implementations 8 Aug 2019 Ching-An Cheng, Xinyan Yan, Byron Boots

This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods.

Policy Gradient Methods
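
A minimal sketch of the variance-reduction mechanism: subtracting a state-dependent baseline (a control variate) from the return leaves the score-function gradient unbiased while lowering its variance. This illustrates the general idea only, not the trajectory-wise control variates proposed in the paper.

    # Baseline-subtracted policy-gradient loss for one trajectory.
    import torch

    def policy_gradient_loss(log_probs, returns, baselines):
        # log_probs, returns, baselines: tensors of shape [T].
        advantages = returns - baselines          # control variate: subtract baseline
        return -(log_probs * advantages.detach()).sum()

    loss = policy_gradient_loss(torch.log(torch.rand(5)),
                                torch.tensor([5.0, 4.0, 3.0, 2.0, 1.0]),
                                torch.tensor([4.5, 3.8, 3.1, 2.2, 0.9]))
    print(loss.item())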

An Online Learning Approach to Model Predictive Control

no code implementations 24 Feb 2019 Nolan Wagener, Ching-An Cheng, Jacob Sacks, Byron Boots

In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature.

Decision Making Model Predictive Control
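
A minimal sketch of the MPC-as-online-learning view: each round, the planned control sequence is improved by one gradient step on the latest rollout cost, its first action is executed, and the plan is shifted forward. The toy dynamics, cost, and finite-difference gradient are placeholders, not the paper's algorithm.

    # Receding-horizon control with a single online-learning update per step.
    import numpy as np

    def rollout_cost(plan, x0, dynamics, cost):
        x, total = x0, 0.0
        for u in plan:
            total += cost(x, u)
            x = dynamics(x, u)
        return total

    def mpc_step(plan, x0, dynamics, cost, lr=0.1, eps=1e-4):
        grad = np.zeros_like(plan)
        for i in range(len(plan)):                 # finite-difference gradient of the plan cost
            d = np.zeros_like(plan)
            d[i] = eps
            grad[i] = (rollout_cost(plan + d, x0, dynamics, cost)
                       - rollout_cost(plan - d, x0, dynamics, cost)) / (2 * eps)
        plan = plan - lr * grad                    # one online-learning update
        action, plan = plan[0], np.append(plan[1:], 0.0)   # execute first action, shift plan
        return action, plan

    dynamics = lambda x, u: 0.9 * x + u            # toy scalar dynamics
    cost = lambda x, u: x ** 2 + 0.1 * u ** 2      # toy quadratic cost
    action, plan = mpc_step(np.zeros(5), x0=1.0, dynamics=dynamics, cost=cost)
    print(action)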

Online Learning with Continuous Variations: Dynamic Regret and Reductions

no code implementations 19 Feb 2019 Ching-An Cheng, Jonathan Lee, Ken Goldberg, Byron Boots

Furthermore, we show for COL a reduction from dynamic regret to both static regret and convergence in the associated EP, allowing us to analyze the dynamic regret of many existing algorithms.

RMPflow: A Computational Graph for Automatic Motion Policy Generation

1 code implementation 16 Nov 2018 Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff

We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs).

Robotics Systems and Control

Truncated Back-propagation for Bilevel Optimization

2 code implementations 25 Oct 2018 Amirreza Shaban, Ching-An Cheng, Nathan Hatch, Byron Boots

Bilevel optimization has been recently revisited for designing and analyzing algorithms in hyperparameter tuning and meta learning tasks.

Bilevel Optimization Meta-Learning
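
A minimal sketch of truncated back-propagation through an unrolled inner optimization: the inner problem is run for T gradient steps, but the outer (validation) loss is differentiated only through the last K of them to get an approximate hypergradient. The toy inner and outer objectives and the choice of K are illustrative.

    # Truncated hypergradient via partially unrolled inner SGD.
    import torch

    def truncated_hypergrad(hyper, w0, inner_loss, outer_loss, T=20, K=5, lr=0.1):
        w = w0.clone()
        for t in range(T):
            create = t >= T - K                            # keep the graph only for the last K steps
            g = torch.autograd.grad(inner_loss(w, hyper), w, create_graph=create)[0]
            w = w - lr * g
            if not create:
                w = w.detach().requires_grad_(True)        # truncate the history
        return torch.autograd.grad(outer_loss(w), hyper)[0]

    hyper = torch.tensor(0.5, requires_grad=True)          # e.g., a regularization weight
    w0 = torch.tensor(2.0, requires_grad=True)
    inner = lambda w, lam: (w - 3.0) ** 2 + lam * w ** 2   # toy training loss
    outer = lambda w: (w - 1.0) ** 2                       # toy validation loss
    print(truncated_hypergrad(hyper, w0, inner, outer))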

Predictor-Corrector Policy Optimization

1 code implementation 15 Oct 2018 Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots

We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning.

Imitation Learning

Orthogonally Decoupled Variational Gaussian Processes

1 code implementation NeurIPS 2018 Hugh Salimbeni, Ching-An Cheng, Byron Boots, Marc Deisenroth

It adopts an orthogonal basis in the mean function to model the residues that cannot be learned by the standard coupled approach.

Gaussian Processes Variational Inference

Accelerating Imitation Learning with Predictive Models

no code implementations 12 Jun 2018 Ching-An Cheng, Xinyan Yan, Evangelos A. Theodorou, Byron Boots

When the model oracle is learned online, these algorithms can provably accelerate the best known convergence rate up to an order.

Imitation Learning

Fast Policy Learning through Imitation and Reinforcement

no code implementations 26 May 2018 Ching-An Cheng, Xinyan Yan, Nolan Wagener, Byron Boots

We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch.

Imitation Learning Reinforcement Learning (RL)
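
A minimal sketch of the schedule the abstract describes: imitate the expert for a randomly chosen number of initial iterations, then switch to reinforcement learning. The uniform switch-time distribution and the update stubs are illustrative, not the paper's exact choices.

    # Imitation-then-RL training with a randomized switching time.
    import random

    def train(policy, imitation_update, rl_update, num_iters=1000, max_switch=200):
        switch_iter = random.randint(1, max_switch)        # randomized switching time
        for it in range(num_iters):
            if it < switch_iter:
                imitation_update(policy)                    # mimic the (suboptimal) expert
            else:
                rl_update(policy)                           # improve beyond the expert with RL
        return policy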

Convergence of Value Aggregation for Imitation Learning

no code implementations 22 Jan 2018 Ching-An Cheng, Byron Boots

Value aggregation is a general framework for solving imitation learning problems.

Imitation Learning

Variational Inference for Gaussian Process Models with Linear Complexity

no code implementations NeurIPS 2017 Ching-An Cheng, Byron Boots

Furthermore, it yields a variational inference problem that can be solved by stochastic gradient ascent with time and space complexity that is only linear in the number of mean function parameters, regardless of the choice of kernels, likelihoods, and inducing points.

Variational Inference

Imitation Learning for Agile Autonomous Driving

no code implementations 21 Sep 2017 Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, Evangelos Theodorou, Byron Boots

We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors.

Robotics

Incremental Variational Sparse Gaussian Process Regression

no code implementations NeurIPS 2016 Ching-An Cheng, Byron Boots

Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference.

GPR Incremental Learning +1
