Search Results for author: Zheng Wu

Found 12 papers, 1 paper with code

Flaming-hot Initiation with Regular Execution Sampling for Large Language Models

no code implementations 28 Oct 2024 Weizhe Chen, Zhicheng Zhang, Guanlin Liu, Renjie Zheng, Wenlei Shi, Chen Dun, Zheng Wu, Xing Jin, Lin Yan

Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains.

Diversity Math

Process Supervision-Guided Policy Optimization for Code Generation

no code implementations 23 Oct 2024 Ning Dai, Zheng Wu, Renjie Zheng, Ziyun Wei, Wenlei Shi, Xing Jin, Guanlin Liu, Chen Dun, Liang Huang, Lin Yan

Reinforcement Learning (RL) with unit test feedback has enhanced large language models' (LLMs) code generation, but relies on sparse rewards provided only after complete code evaluation, limiting learning efficiency and incremental improvements.

Code Generation Reinforcement Learning (RL)

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization

no code implementations 11 Oct 2024 Guanlin Liu, Kaixuan Ji, Renjie Zheng, Zheng Wu, Chen Dun, Quanquan Gu, Lin Yan

Reinforcement Learning (RL) plays a crucial role in aligning large language models (LLMs) with human preferences and improving their ability to perform complex tasks.

GSM8K Language Modelling +5

Expressing Diverse Human Driving Behavior with Probabilistic Rewards and Online Inference

no code implementations 20 Aug 2020 Liting Sun, Zheng Wu, Hengbo Ma, Masayoshi Tomizuka

In human-robot interaction (HRI) systems, such as autonomous vehicles, understanding and representing human behavior are important.

Autonomous Vehicles Diversity

Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving

no code implementations 22 Jun 2020 Zheng Wu, Liting Sun, Wei Zhan, Chenyu Yang, Masayoshi Tomizuka

Different from existing IRL algorithms, by introducing an efficient continuous-domain trajectory sampler, the proposed algorithm can directly learn the reward functions in the continuous domain while considering the uncertainties in demonstrated trajectories from human drivers.

Autonomous Driving reinforcement-learning +2

Learning to Describe Scenes with Programs

no code implementations ICLR 2019 Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

We are able to understand the higher-level, abstract regularities within the scene such as symmetry and repetition.

Class Probability Estimation via Differential Geometric Regularization

no code implementations 4 Mar 2015 Qinxun Bai, Steven Rosenberg, Zheng Wu, Stan Sclaroff

We study the problem of supervised learning for both binary and multiclass classification from a unified geometric perspective.

Classification General Classification
