Search Results for author: Andrew Zhao

Found 16 papers, 7 papers with code

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

2 code implementations • 6 May 2025 • Andrew Zhao, Yiran Wu, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards.

Mathematical Reasoning

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

no code implementations • 18 Apr 2025 • Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang

Overall, our findings suggest that current RLVR methods have not yet realized the potential of RL to elicit truly novel reasoning abilities in LLMs.

Math • Visual Reasoning

Towards Understanding the Benefit of Multitask Representation Learning in Decision Process

no code implementations • 1 Mar 2025 • Rui Lu, Yang Yue, Andrew Zhao, Simon Du, Gao Huang

Our work tries to fill the gap by extending the analysis to unknown non-linear representations, giving a comprehensive analysis of its mechanism in online and transfer learning settings.

Multi-Armed Bandits • Reinforcement Learning (RL) +2

Learning the structure of any Hamiltonian from minimal assumptions

no code implementations • 29 Oct 2024 • Andrew Zhao

We consider two models of control over the time evolution: the first has access to time reversal ($t < 0$), enabling an algorithm that outputs an $\epsilon$-accurate classical description of $H$ after querying its dynamics for a total of $\widetilde{\mathcal{O}}(m/\epsilon)$ evolution time.

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

1 code implementation • 11 Jul 2024 • Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang

Furthermore, models modified through SFT and RLHF may deviate from the pretrained models, potentially leading to a degradation in foundational LLM capabilities.

Common Sense Reasoning • Question Answering

Empowering Interdisciplinary Insights with Dynamic Graph Embedding Trajectories

no code implementations • 25 Jun 2024 • Yiqiao Jin, Andrew Zhao, Yeon-Chang Lee, Meng Ye, Ajay Divakaran, Srijan Kumar

Our work not only addresses the ongoing challenges in visualizing and analyzing DTDG models but also establishes a foundational framework for future investigations into dynamic graph representation and analysis across various disciplines.

Dynamic graph embedding • Epidemiology

DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints

1 code implementation • 29 May 2024 • Andrew Zhao, Quentin Xu, Matthieu Lin, Shenzhi Wang, Yong-Jin Liu, Zilong Zheng, Gao Huang

Recent advances in large language model assistants have made them indispensable, raising significant concerns over managing their safety.

Diversity • Language Modeling +3

Augmenting Unsupervised Reinforcement Learning with Self-Reference

no code implementations • 16 Nov 2023 • Andrew Zhao, Erle Zhu, Rui Lu, Matthieu Lin, Yong-Jin Liu, Gao Huang

Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark for model-free methods, recording an 86% IQM and a 16% Optimality Gap.
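As a rough illustration of the headline metric (not the paper's own code), the Interquartile Mean aggregates scores across runs by averaging only the middle 50%, which makes it less sensitive to outlier runs than a plain mean. A minimal sketch, assuming scores are already normalized to [0, 1]:

```python
import numpy as np

def interquartile_mean(scores):
    """Mean of the middle 50% of scores (bottom and top quartiles dropped)."""
    s = np.sort(np.asarray(scores, dtype=float))
    n = len(s)
    lo, hi = n // 4, n - n // 4  # integer-quartile cut; exact when n is divisible by 4
    return float(s[lo:hi].mean())

# Hypothetical per-run normalized scores for one method:
runs = [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(interquartile_mean(runs))  # averages the middle four scores, ≈ 0.65
```

The extreme runs (0.2 and 1.0 here) are excluded, so a single lucky or failed seed moves the aggregate far less than it would a plain mean.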

Attribute • reinforcement-learning +2

ExpeL: LLM Agents Are Experiential Learners

1 code implementation • 20 Aug 2023 • Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, Gao Huang

The recent surge in research interest in applying large language models (LLMs) to decision-making tasks has flourished by leveraging the extensive world knowledge embedded in LLMs.

Decision Making • Transfer Learning +1

A Mixture of Surprises for Unsupervised Reinforcement Learning

1 code implementation • 13 Oct 2022 • Andrew Zhao, Matthieu Gaetan Lin, Yangguang Li, Yong-Jin Liu, Gao Huang

However, both strategies rely on a strong assumption: the entropy of the environment's dynamics is either high or low.

reinforcement-learning • Reinforcement Learning +2

Provable General Function Class Representation Learning in Multitask Bandits and MDPs

no code implementations • 31 May 2022 • Rui Lu, Andrew Zhao, Simon S. Du, Gao Huang

While multitask representation learning has become a popular approach in reinforcement learning (RL) to boost the sample efficiency, the theoretical understanding of why and how it works is still limited.

Multi-Armed Bandits • Reinforcement Learning (RL) +1

Prevalence and recoverability of syntactic parameters in sparse distributed memories

no code implementations • 21 Oct 2015 • Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli

We propose a new method, based on Sparse Distributed Memory (Kanerva Networks), for studying dependency relations between different syntactic parameters in the Principles and Parameters model of Syntax.

Relation
