no code implementations • 28 Oct 2024 • Weizhe Chen, Zhicheng Zhang, Guanlin Liu, Renjie Zheng, Wenlei Shi, Chen Dun, Zheng Wu, Xing Jin, Lin Yan
Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains.
no code implementations • 23 Oct 2024 • Ning Dai, Zheng Wu, Renjie Zheng, Ziyun Wei, Wenlei Shi, Xing Jin, Guanlin Liu, Chen Dun, Liang Huang, Lin Yan
Reinforcement Learning (RL) with unit test feedback has enhanced code generation by large language models (LLMs), but it relies on sparse rewards provided only after complete code evaluation, limiting learning efficiency and incremental improvements.
no code implementations • 11 Oct 2024 • Guanlin Liu, Kaixuan Ji, Renjie Zheng, Zheng Wu, Chen Dun, Quanquan Gu, Lin Yan
Reinforcement Learning (RL) plays a crucial role in aligning large language models (LLMs) with human preferences and improving their ability to perform complex tasks.
1 code implementation • 6 Dec 2023 • Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu
Reinforcement learning (RL) is a versatile framework for optimizing long-term goals.
no code implementations • 3 Dec 2022 • Yanjiang Guo, Jingyue Gao, Zheng Wu, Chengming Shi, Jianyu Chen
In this paper, we consider the case where the target task differs from, but is similar to, that of the expert.
no code implementations • 1 Oct 2022 • Zheng Wu, Yichen Xie, Wenzhao Lian, Changhao Wang, Yanjiang Guo, Jianyu Chen, Stefan Schaal, Masayoshi Tomizuka
Experimental results demonstrate that our proposed method achieves policy generalization to unseen compositional tasks in a zero-shot manner.
no code implementations • 19 Jul 2022 • Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure
For end-to-end automatic speech recognition (ASR) tasks, the absence of human-annotated labels, along with the need for privacy-preserving policies for model building, poses a daunting challenge.
no code implementations • 20 Aug 2020 • Liting Sun, Zheng Wu, Hengbo Ma, Masayoshi Tomizuka
In human-robot interaction (HRI) systems, such as autonomous vehicles, understanding and representing human behavior are important.
no code implementations • 22 Jun 2020 • Zheng Wu, Liting Sun, Wei Zhan, Chenyu Yang, Masayoshi Tomizuka
Different from existing IRL algorithms, by introducing an efficient continuous-domain trajectory sampler, the proposed algorithm can directly learn the reward functions in the continuous domain while considering the uncertainties in demonstrated trajectories from human drivers.
no code implementations • ICLR 2019 • Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
We are able to understand the higher-level, abstract regularities within the scene such as symmetry and repetition.
no code implementations • 1 Feb 2018 • Zheng Wu, Ruiheng Chang, Jiaxu Ma, Cewu Lu, Chi-Keung Tang
We propose a novel approach for instance segmentation given an image of homogeneous object cluster (HOC).
no code implementations • 4 Mar 2015 • Qinxun Bai, Steven Rosenberg, Zheng Wu, Stan Sclaroff
We study the problem of supervised learning for both binary and multiclass classification from a unified geometric perspective.