Search Results for author: Yecheng Jason Ma

Found 13 papers, 9 papers with code

Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models

no code implementations • 20 Apr 2024 • Junyao Shi, Jianing Qian, Yecheng Jason Ma, Dinesh Jayaraman

There have recently been large advances both in pre-training visual representations for robotic control and in segmenting unknown-category objects in general images.

Eureka: Human-Level Reward Design via Coding Large Language Models

1 code implementation • 19 Oct 2023 • Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar

The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating.

Decision Making • In-Context Learning • +1
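
The excerpt above frames reward design as a text-only search loop: a coding LLM proposes reward functions, policies trained with them produce feedback, and that feedback conditions the next round of generation. A minimal sketch of such a loop, with the LLM call and the policy-training step passed in as caller-supplied functions (assumptions for illustration, not Eureka's released interface):

    def reward_design_loop(task_description, generate_reward_code, train_and_evaluate,
                           num_iterations=5, samples_per_iter=4):
        # generate_reward_code(task, feedback) -> source string for a candidate reward function
        # train_and_evaluate(code) -> (task_score, textual training statistics)
        # Both are caller-supplied stand-ins for the LLM and the RL training backend.
        best_code, best_score, feedback = None, float("-inf"), ""
        for _ in range(num_iterations):
            candidates = [generate_reward_code(task_description, feedback)
                          for _ in range(samples_per_iter)]
            for code in candidates:
                score, stats = train_and_evaluate(code)
                if score > best_score:
                    # Keep the best reward so far and feed its training statistics back
                    # to the LLM as plain text; no gradients flow to the model.
                    best_code, best_score, feedback = code, score, stats
        return best_code, best_score

Because the only signal returned to the LLM is text, human preferences can be folded into the same feedback string, which is the gradient-free, in-context RLHF use the excerpt mentions.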

Universal Visual Decomposer: Long-Horizon Manipulation Made Easy

no code implementations • 12 Oct 2023 • Zichen Zhang, Yunshuang Li, Osbert Bastani, Abhishek Gupta, Dinesh Jayaraman, Yecheng Jason Ma, Luca Weihs

Learning long-horizon manipulation tasks, however, is a long-standing challenge and demands decomposing the overarching task into several manageable subtasks to facilitate policy learning and generalization to unseen tasks.

reinforcement-learning

LIV: Language-Image Representations and Rewards for Robotic Control

1 code implementation • 1 Jun 2023 • Yecheng Jason Ma, William Liang, Vaidehi Som, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman

We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations.

Contrastive Learning • Imitation Learning
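
One concrete way a shared vision-language embedding of the kind described above can serve as a reward (a simplified illustration, not the LIV training objective itself) is to score each frame by its similarity to the embedded language goal:

    import numpy as np

    def language_conditioned_reward(frame_embeddings, goal_text_embedding):
        # frame_embeddings: (T, d) image embeddings; goal_text_embedding: (d,) text embedding.
        # Both are assumed to come from the same jointly trained vision-language encoder.
        frames = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
        goal = goal_text_embedding / np.linalg.norm(goal_text_embedding)
        return frames @ goal  # per-frame cosine similarity, used as a dense reward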

TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

no code implementations • 22 May 2023 • Yecheng Jason Ma, Kausik Sivakumar, Jason Yan, Osbert Bastani, Dinesh Jayaraman

Standard model-based reinforcement learning (MBRL) approaches fit a transition model of the environment to all past experience, but this wastes model capacity on data that is irrelevant for policy improvement.

Model-based Reinforcement Learning • reinforcement-learning
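
The fix the excerpt points at, steering model fitting toward policy-relevant transitions, can be sketched as a weighted maximum-likelihood loss. In TOM the weights are transition occupancy ratios obtained from a dual objective; in the sketch below they are simply passed in:

    import torch

    def weighted_model_loss(dynamics_model, states, actions, next_states, weights):
        # dynamics_model(states, actions) is assumed to return a torch.distributions
        # object whose log_prob yields one value per transition in the batch.
        log_prob = dynamics_model(states, actions).log_prob(next_states)
        # Scale each transition's contribution by its estimated relevance to the
        # current policy instead of fitting all past experience uniformly.
        return -(weights * log_prob).mean()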

VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

1 code implementation • 30 Sep 2022 • Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang

Given the inherent cost and scarcity of in-domain, task-specific robot data, learning from large, diverse, offline human videos has emerged as a promising path towards acquiring a generally useful visual representation for control; however, how these human videos can be used for general-purpose reward learning remains an open question.

Offline RL • Open-Ended Question Answering • +2
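
A generic illustration of how a pre-trained visual embedding can address the reward-learning question raised above (VIP defines its reward through goal-embedding distances in a similar spirit, but this is not a re-implementation of the paper) is to reward each step by how much it shrinks the embedding distance to a goal image:

    import numpy as np

    def embedding_distance_reward(trajectory_embeddings, goal_embedding):
        # trajectory_embeddings: (T, d) per-frame embeddings; goal_embedding: (d,).
        dists = np.linalg.norm(trajectory_embeddings - goal_embedding, axis=1)
        # Positive reward whenever a step moves the observation closer to the goal image.
        return dists[:-1] - dists[1:]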

How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via $f$-Advantage Regression

1 code implementation • 7 Jun 2022 • Yecheng Jason Ma, Jason Yan, Dinesh Jayaraman, Osbert Bastani

Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill learning in the form of reaching diverse goals from purely offline datasets.

regression • reinforcement-learning • +1

Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

2 code implementations • 4 Feb 2022 • Yecheng Jason Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani

We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning (IL) algorithm derived via state-occupancy matching.

Imitation Learning • Reinforcement Learning (RL)
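
The state-occupancy-matching idea in the excerpt is commonly instantiated by first training a discriminator between expert states and offline states; its log-odds then serve as a state-only reward. The sketch below covers only that discriminator step, not the full SMODICE derivation:

    import torch

    def state_occupancy_reward(discriminator, states, eps=1e-6):
        # discriminator(states) is assumed to output the probability that each state
        # was visited by the expert rather than drawn from the offline dataset.
        d = discriminator(states).clamp(eps, 1 - eps)
        # log d/(1-d) estimates the log-ratio of expert to offline state occupancies.
        return torch.log(d) - torch.log(1 - d)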

State Relevance for Off-Policy Evaluation

1 code implementation • 13 Sep 2021 • Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, Finale Doshi-Velez

Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions.

Off-policy evaluation
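
For concreteness, the plain trajectory-wise importance sampling estimator the excerpt refers to (the baseline this paper builds on, not its state-relevance method) reweights each trajectory's return by a product of per-step action-probability ratios:

    import numpy as np

    def importance_sampling_ope(trajectories, behavior_prob, eval_prob, gamma=0.99):
        # trajectories: iterable of [(state, action, reward), ...] lists collected under
        # the behavior policy; behavior_prob / eval_prob give per-step action probabilities.
        estimates = []
        for traj in trajectories:
            ratio, ret = 1.0, 0.0
            for t, (s, a, r) in enumerate(traj):
                ratio *= eval_prob(s, a) / behavior_prob(s, a)
                ret += (gamma ** t) * r
            estimates.append(ratio * ret)
        return float(np.mean(estimates))  # unbiased but typically high-variance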

Conservative Offline Distributional Reinforcement Learning

1 code implementation • NeurIPS 2021 • Yecheng Jason Ma, Dinesh Jayaraman, Osbert Bastani

We prove that CODAC learns a conservative return distribution -- in particular, for finite MDPs, CODAC converges to a uniform lower bound on the quantiles of the return distribution; our proof relies on a novel analysis of the distributional Bellman operator.

D4RL • Distributional Reinforcement Learning • +4
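
For reference, the distributional Bellman operator mentioned in the excerpt acts on return distributions rather than expected values; its standard form is below (CODAC's conservative penalty on the return quantiles is layered on top of this operator):

    % Standard distributional Bellman operator for a policy \pi; equality holds in distribution.
    \[
      (\mathcal{T}^{\pi} Z)(s,a) \overset{D}{=} R(s,a) + \gamma\, Z(S', A'),
      \qquad S' \sim P(\cdot \mid s, a), \quad A' \sim \pi(\cdot \mid S')
    \]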

Likelihood-Based Diverse Sampling for Trajectory Forecasting

1 code implementation • ICCV 2021 • Yecheng Jason Ma, Jeevana Priya Inala, Dinesh Jayaraman, Osbert Bastani

We propose Likelihood-Based Diverse Sampling (LDS), a method for improving the quality and the diversity of trajectory samples from a pre-trained flow model.

Trajectory Forecasting
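
A generic rendition of the quality-versus-diversity trade-off described above (not the paper's exact formulation) optimizes a small set of latent codes so that their decoded trajectories are both plausible under the pre-trained flow and spread apart:

    import torch

    def likelihood_diversity_objective(flow_decode, flow_log_prob, latents, diversity_weight=1.0):
        # flow_decode maps K latent codes to trajectories; flow_log_prob scores trajectories
        # under the pre-trained flow. Both are caller-supplied stand-ins for the flow model.
        trajs = flow_decode(latents)                       # (K, horizon, dim)
        quality = flow_log_prob(trajs).mean()              # sample plausibility under the flow
        flat = trajs.reshape(trajs.shape[0], -1)
        diversity = torch.cdist(flat, flat).mean()         # average pairwise distance
        return quality + diversity_weight * diversity      # maximize w.r.t. the latents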
