Search Results for author: Tengyang Xie

Found 24 papers, 6 papers with code

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

1 code implementation • 22 Apr 2024 • Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar

Our main finding is that, in general, approaches that use on-policy sampling or attempt to push down the likelihood on certain responses (i.e., employ a "negative gradient") outperform offline and maximum likelihood objectives.

Contrastive Learning • Reinforcement Learning (RL)

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

no code implementations • 4 Apr 2024 • Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie

In this paper, we introduce Direct Nash Optimization (DNO), a provable and scalable algorithm that marries the simplicity and stability of contrastive learning with theoretical generality from optimizing general preferences.

Contrastive Learning

Towards Principled Representation Learning from Videos for Reinforcement Learning

1 code implementation • 20 Mar 2024 • Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford

We study two types of settings: one where there is iid noise in the observation, and a more challenging setting where there is also the presence of exogenous noise, which is non-iid noise that is temporally correlated, such as the motion of people or cars in the background.

Contrastive Learning • reinforcement-learning • +1

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

1 code implementation • 20 Feb 2024 • Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee

We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.

counterfactual • Data Augmentation • +2

Harnessing Density Ratios for Online Reinforcement Learning

no code implementations • 18 Jan 2024 • Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie

The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of the possibility for a unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other.

Offline RL • reinforcement-learning

Adversarial Model for Offline Reinforcement Learning

no code implementations • NeurIPS 2023 • Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng

We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.

reinforcement-learning • Reinforcement Learning (RL)

ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data

no code implementations • 8 Nov 2022 • Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng

We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage.

Offline RL

The Role of Coverage in Online Reinforcement Learning

no code implementations • 9 Oct 2022 • Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade

Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.

Efficient Exploration • Offline RL • +2

Interaction-Grounded Learning with Action-inclusive Feedback

no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.

Brain Computer Interface

Adversarially Trained Actor Critic for Offline Reinforcement Learning

3 code implementations • 5 Feb 2022 • Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.

Continuous Control • D4RL • +3

Bellman-consistent Pessimism for Offline Reinforcement Learning

no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.

reinforcement-learning • Reinforcement Learning (RL)

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

no code implementations • NeurIPS 2021 • Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai

This offline result is the first that matches the sample complexity lower bound in this setting, and resolves a recent open question in offline RL.

Offline RL • Open-Ended Question Answering • +2

Interaction-Grounded Learning

no code implementations • 9 Jun 2021 • Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

no code implementations • 5 Feb 2021 • Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie

We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.

Off-policy evaluation • reinforcement-learning

A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting

no code implementations • 2 Nov 2020 • Philip Amortila, Nan Jiang, Tengyang Xie

Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good feature coverage in the finite-horizon case.

reinforcement-learning • Reinforcement Learning (RL)

Batch Value-function Approximation with Only Realizability

1 code implementation • 11 Aug 2020 • Tengyang Xie, Nan Jiang

We make progress in a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class.

Model Selection • Reinforcement Learning (RL)

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison

no code implementations • 9 Mar 2020 • Tengyang Xie, Nan Jiang

We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning.

reinforcement-learning • Reinforcement Learning (RL)

Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

no code implementations • NeurIPS 2019 • Tengyang Xie, Yifei Ma, Yu-Xiang Wang

To solve this problem, we consider a marginalized importance sampling (MIS) estimator that recursively estimates the state marginal distribution for the target policy at every step.

Off-policy evaluation • reinforcement-learning

Provably Efficient Q-Learning with Low Switching Cost

no code implementations • NeurIPS 2019 • Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization.

Q-Learning

Privacy Preserving Off-Policy Evaluation

no code implementations • 1 Feb 2019 • Tengyang Xie, Philip S. Thomas, Gerome Miklau

Many reinforcement learning applications involve the use of data that is sensitive, such as medical records of patients or financial information.

Off-policy evaluation • Privacy Preserving • +1

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

no code implementations • NeurIPS 2018 • Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yin-Lam Chow, Daoming Lyu, Daesub Yoon

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare.

Autonomous Driving • Management
