Search Results for author: Baihe Huang

Found 16 papers, 0 papers with code

Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

no code implementations • 28 Jun 2024 • Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee

Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning.

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

no code implementations • 7 May 2024 • Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell

Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on "A is B", an LLM fails to directly conclude "B is A" during inference, which is known as the "reversal curse" (Berglund et al., 2023).

Logical Reasoning

Towards Optimal Statistical Watermarking

no code implementations • 13 Dec 2023 • Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.

On Representation Complexity of Model-based and Model-free Reinforcement Learning

no code implementations • 3 Oct 2023 • Hanlin Zhu, Baihe Huang, Stuart Russell

To the best of our knowledge, this work is the first to study the circuit complexity of RL, which also provides a rigorous framework for future research.

Reinforcement Learning (RL)

Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms

no code implementations • NeurIPS 2023 • Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee

We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity.

Evaluating and Incentivizing Diverse Data Contributions in Collaborative Learning

no code implementations • 8 Jun 2023 • Baihe Huang, Sai Praneeth Karimireddy, Michael I. Jordan

This creates a tension between the principal (the FL platform designer) who cares about global performance and the agents (the data collectors) who care about local performance.

Diversity, Federated Learning

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

no code implementations • 9 Feb 2022 • Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).

Offline RL, Reinforcement Learning

InstaHide’s Sample Complexity When Mixing Two Private Images

no code implementations • 29 Sep 2021 • Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo

Inspired by the InstaHide challenge [Huang, Song, Li and Arora'20], [Chen, Song and Zhuo'20] recently provided a mathematical formulation of the InstaHide attack problem under a Gaussian image distribution.

Vocal Bursts Valence Prediction

Towards General Function Approximation in Zero-Sum Markov Games

no code implementations • ICLR 2022 • Baihe Huang, Jason D. Lee, Zhaoran Wang, Zhuoran Yang

In the coordinated setting where both players are controlled by the agent, we propose a model-based algorithm and a model-free algorithm.

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

no code implementations • NeurIPS 2021 • Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

While the theory of RL has traditionally focused on linear function approximation (or eluder dimension) approaches, little is known about nonlinear RL with neural net approximations of the Q functions.

Reinforcement Learning (RL)

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

no code implementations • NeurIPS 2021 • Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem.

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

no code implementations • 24 May 2021 • Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi

These can often be accounted for via regularized RL, which augments the target value function with a structure-promoting regularizer.

Reinforcement Learning (RL)

FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis

no code implementations • 11 May 2021 • Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang

Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are not even updating in the gradient direction.

Federated Learning

Solving SDP Faster: A Robust IPM Framework and Efficient Implementation

no code implementations • 20 Jan 2021 • Baihe Huang, Shunhua Jiang, Zhao Song, Runzhou Tao

This paper introduces a new robust interior point method analysis for semidefinite programming (SDP).

Optimization and Control, Data Structures and Algorithms

InstaHide's Sample Complexity When Mixing Two Private Images

no code implementations • 24 Nov 2020 • Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo

On the current InstaHide challenge setup, where each InstaHide image is a mixture of two private images, we present a new algorithm to recover all the private images with a provable guarantee and optimal sample complexity.

Vocal Bursts Valence Prediction
