2 code implementations • 6 May 2025 • Andrew Zhao, Yiran Wu, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards.
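A minimal sketch of the outcome-based reward loop that RLVR denotes, assuming a hypothetical `policy` object with `sample` and `update` methods and a binary exact-match verifier; this illustrates the paradigm only and is not the paper's implementation:

```python
# Illustrative RLVR-style update (sketch, not the paper's code).
def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Outcome-based reward: 1.0 if the final answer verifies, else 0.0."""
    return 1.0 if completion.strip() == gold_answer.strip() else 0.0

def rlvr_step(policy, prompt: str, gold_answer: str, num_samples: int = 8):
    # Sample several rollouts for the same prompt.
    completions = [policy.sample(prompt) for _ in range(num_samples)]
    rewards = [verifiable_reward(c, gold_answer) for c in completions]
    # Reinforce rollouts in proportion to their baseline-subtracted reward
    # (the hypothetical policy.update applies the policy-gradient step).
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    policy.update(prompt, completions, advantages)
```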
no code implementations • 18 Apr 2025 • Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang
Overall, our findings suggest that current RLVR methods have not yet realized the potential of RL to elicit truly novel reasoning abilities in LLMs.
no code implementations • 1 Mar 2025 • Rui Lu, Yang Yue, Andrew Zhao, Simon Du, Gao Huang
Our work fills this gap by extending the analysis to unknown non-linear representations, giving a comprehensive account of the mechanism in both online and transfer learning settings.
no code implementations • 22 Nov 2024 • Luhang Sun, Varsha Pendyala, Yun-Shiuan Chuang, Shanglin Yang, Jonathan Feldman, Andrew Zhao, Munmun De Choudhury, Sijia Yang, Dhavan Shah
This paper leverages large language models (LLMs) to experimentally determine optimal strategies for scaling up social media content annotation for stance detection on HPV vaccine-related tweets.
no code implementations • 29 Oct 2024 • Andrew Zhao
We consider two models of control over the time evolution: the first has access to time reversal ($t < 0$), enabling an algorithm that outputs an $\epsilon$-accurate classical description of $H$ after querying its dynamics for a total of $\widetilde{\mathcal{O}}(m/\epsilon)$ evolution time.
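Restated in display form for readability; $\widetilde{\mathcal{O}}$ conventionally suppresses polylogarithmic factors, and reading $m$ as the number of terms in $H$ is an assumption from context, not stated in this excerpt:

```latex
% With time reversal (t < 0) available, the total evolution time queried
% to reach an \epsilon-accurate classical description of H is
T_{\mathrm{total}} \;=\; \widetilde{\mathcal{O}}\!\left(\frac{m}{\epsilon}\right).
```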
1 code implementation • 21 Oct 2024 • Matthieu Lin, Jenny Sheng, Andrew Zhao, Shenzhi Wang, Yang Yue, Victor Shea Jay Huang, Huan Liu, Jun Liu, Gao Huang, Yong-Jin Liu
In this survey, we refer to this paradigm as training of scaffolded LMs with language supervision.
1 code implementation • 11 Jul 2024 • Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang
Furthermore, models modified through SFT and RLHF may deviate from the pretrained models, potentially leading to a degradation in foundational LLM capabilities.
no code implementations • 25 Jun 2024 • Yiqiao Jin, Andrew Zhao, Yeon-Chang Lee, Meng Ye, Ajay Divakaran, Srijan Kumar
Our work not only addresses the ongoing challenges in visualizing and analyzing DTDG models but also establishes a foundational framework for future investigations into dynamic graph representation and analysis across various disciplines.
1 code implementation • 29 May 2024 • Andrew Zhao, Quentin Xu, Matthieu Lin, Shenzhi Wang, Yong-Jin Liu, Zilong Zheng, Gao Huang
Recent advances in large language model assistants have made them indispensable, raising significant concerns over managing their safety.
1 code implementation • 15 Apr 2024 • Jenny Sheng, Matthieu Lin, Andrew Zhao, Kevin Pruvost, Yu-Hui Wen, Yangguang Li, Gao Huang, Yong-Jin Liu
This paper explores preference learning in text-to-motion generation.
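For context on what preference learning typically optimizes, a minimal Bradley-Terry pairwise loss, assuming a hypothetical `reward_model(prompt, motion)` that returns a scalar score; this is a standard formulation, not necessarily the objective used in this paper:

```python
import torch.nn.functional as F

def bradley_terry_loss(reward_model, prompt, motion_preferred, motion_rejected):
    # Score both motions for the same text prompt (hypothetical model API).
    r_w = reward_model(prompt, motion_preferred)   # preferred sample
    r_l = reward_model(prompt, motion_rejected)    # rejected sample
    # Maximize the log-probability that the preferred motion outranks
    # the rejected one under the Bradley-Terry model.
    return -F.logsigmoid(r_w - r_l).mean()
```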
no code implementations • 16 Nov 2023 • Andrew Zhao, Erle Zhu, Rui Lu, Matthieu Lin, Yong-Jin Liu, Gao Huang
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark for model-free methods, recording an 86% IQM and a 16% Optimality Gap.
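For readers unfamiliar with the metrics quoted above, a short sketch of how Interquartile Mean (IQM) and Optimality Gap are conventionally computed in the rliable evaluation protocol; the normalized target of 1.0 is that convention's default, not a value taken from this paper:

```python
import numpy as np

def iqm(scores: np.ndarray) -> float:
    """Interquartile mean: mean of the middle 50% of normalized scores,
    more robust to outlier runs than the plain mean."""
    q25, q75 = np.percentile(scores, [25, 75])
    middle = scores[(scores >= q25) & (scores <= q75)]
    return float(middle.mean())

def optimality_gap(scores: np.ndarray, target: float = 1.0) -> float:
    """Mean shortfall below the target score; lower is better,
    and 0 means every run reaches the target."""
    return float(np.mean(np.maximum(target - scores, 0.0)))
```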
no code implementations • 2 Oct 2023 • Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang
This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments.
1 code implementation • 20 Aug 2023 • Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, Gao Huang
Research interest in applying large language models (LLMs) to decision-making tasks has surged recently, driven by the extensive world knowledge embedded in these models.
1 code implementation • 13 Oct 2022 • Andrew Zhao, Matthieu Gaetan Lin, Yangguang Li, Yong-Jin Liu, Gao Huang
However, both strategies rely on a strong assumption: the entropy of the environment's dynamics is either high or low.
no code implementations • 31 May 2022 • Rui Lu, Andrew Zhao, Simon S. Du, Gao Huang
While multitask representation learning has become a popular approach in reinforcement learning (RL) to boost sample efficiency, the theoretical understanding of why and how it works is still limited.
no code implementations • 21 Oct 2015 • Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli
We propose a new method, based on Sparse Distributed Memory (Kanerva Networks), for studying dependency relations between different syntactic parameters in the Principles and Parameters model of Syntax.
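A compact sketch of the Sparse Distributed Memory (Kanerva network) read/write mechanism named above, over binary vectors with illustrative sizes; the paper's application to syntactic parameters builds on this primitive but is not reproduced here:

```python
import numpy as np

class SparseDistributedMemory:
    """Minimal Kanerva SDM over {0,1}^n: a write updates integer counters
    at every hard location within a Hamming radius of the address; a read
    sums those counters and thresholds back to a binary vector."""

    def __init__(self, n_bits=256, n_locations=1000, radius=112, seed=0):
        rng = np.random.default_rng(seed)
        self.addresses = rng.integers(0, 2, size=(n_locations, n_bits))
        self.counters = np.zeros((n_locations, n_bits), dtype=np.int64)
        self.radius = radius

    def _active(self, address):
        # Hard locations within the Hamming radius of the query address.
        dists = np.count_nonzero(self.addresses != address, axis=1)
        return dists <= self.radius

    def write(self, address, data):
        # data is a 0/1 array; increment counters at 1-bits, decrement at 0-bits.
        self.counters[self._active(address)] += 2 * data - 1

    def read(self, address):
        # Sum counters over active locations and threshold at zero.
        summed = self.counters[self._active(address)].sum(axis=0)
        return (summed > 0).astype(np.int64)
```

The radius (here 112 bits for 256-bit addresses) is the usual SDM design knob: it controls what fraction of hard locations participates in each read and write.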