Search Results for author: Jiaxuan Gao

Found 13 papers, 8 papers with code

How Far Are We from Optimal Reasoning Efficiency?

1 code implementation • 8 Jun 2025 • Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu

To reduce the efficiency gap, we propose REO-RL, a class of Reinforcement Learning algorithms that minimizes REG by targeting a sparse set of token budgets.

16k, Benchmarking, +1

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

1 code implementation • 30 May 2025 • Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, Tongkai Yang, Binhang Yuan, Yi Wu

Most existing large-scale RL systems for LLMs are synchronous, alternating generation and training in a batch setting where rollouts in each training batch are generated by the same model.

Math, Reinforcement Learning (RL)

Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

no code implementations • 15 May 2025 • Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu

Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, are capable of learning diverse skills from demonstration data due to their high representation power.

Autonomous Driving, Denoising, +1

Industrial-Grade Sensor Simulation via Gaussian Splatting: A Modular Framework for Scalable Editing and Full-Stack Validation

no code implementations • 14 Mar 2025 • Xianming Zeng, Sicong Du, Qifeng Chen, Lizhe Liu, Haoyu Shu, Jiaxuan Gao, Jiarun Liu, Jiulong Xu, Jianyun Xu, Mingxia Chen, Yiru Zhao, Peng Chen, Yapeng Xue, Chunming Zhao, Sheng Yang, Qiang Li

Then in practice, we refactor three crucial components through GS, to leverage its explicit scene representation and real-time rendering: (1) choosing the 2D neural Gaussian representation for physics-compliant scene and sensor modeling, (2) proposing a scene editing pipeline to leverage Gaussian primitives library for data augmentation, and (3) coupling a controllable diffusion model for scene expansion and harmonization.

Autonomous Driving, Data Augmentation, +2

Large Language Models are In-context Preference Learners

no code implementations • 22 Oct 2024 • Chao Yu, Qixin Tan, Hong Lu, Jiaxuan Gao, Xinting Yang, Yu Wang, Yi Wu, Eugene Vinitsky

Preference-based reinforcement learning is an effective way to handle tasks where rewards are hard to specify but can be exceedingly inefficient as preference learning is often tabula rasa.

In-Context Learning, reinforcement-learning, +1

On Designing Effective RL Reward at Training Time for LLM Reasoning

no code implementations • 19 Oct 2024 • Jiaxuan Gao, Shusheng Xu, Wenjie Ye, Weilin Liu, Chuyi He, Wei Fu, Zhiyu Mei, Guangju Wang, Yi Wu

In this work, we evaluate popular reward models for RL training, including the Outcome-supervised Reward Model (ORM) and the Process-supervised Reward Model (PRM), and train a collection of LLMs for math problems using RL by combining these learned rewards with success rewards.

GSM8K, Math

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

1 code implementation • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).

Code Generation

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

1 code implementation • 23 Dec 2023 • Jijia Liu, Chao Yu, Jiaxuan Gao, Yuqing Xie, Qingmin Liao, Yi Wu, Yu Wang

AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination.

Code Generation

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

2 code implementations • 29 Jun 2023 • Zhiyu Mei, Wei Fu, Jiaxuan Gao, Guangju Wang, Huanchen Zhang, Yi Wu

Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLlyScalableRL, which allows efficient and massively parallelized training and easy development of customized algorithms.

reinforcement-learning, Reinforcement Learning, +1

Learning Efficient Multi-Agent Cooperative Visual Exploration

no code implementations • 12 Oct 2021 • Chao Yu, Xinyi Yang, Jiaxuan Gao, Huazhong Yang, Yu Wang, Yi Wu

In this paper, we extend the state-of-the-art single-agent visual navigation method, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based planning module, Multi-agent Spatial Planner (MSP). MSP leverages a transformer-based architecture, Spatial-TeamFormer, which effectively captures spatial relations and intra-agent interactions via hierarchical spatial self-attentions.

Reinforcement Learning (RL), Visual Navigation

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

19 code implementations • 2 Mar 2021 • Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu

This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems.

Multi-agent Reinforcement Learning, reinforcement-learning, +4
