no code implementations • 28 Jun 2024 • Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee
Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning.
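The zeroth-order feedback setting this abstract describes can be illustrated with a standard two-point gradient estimator. The sketch below is a generic textbook construction, not the algorithm from the paper; all function names, step sizes, and constants are illustrative.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.

    Queries f at x + mu*u and x - mu*u for a random unit direction u,
    then scales the symmetric difference back along u. In expectation
    this recovers the gradient (exactly so for quadratics, where the
    symmetric difference is exact).
    """
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    d = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)
    return x.size * d * u  # dimension factor makes the estimate unbiased

# Toy usage: minimize f(x) = ||x||^2 using only function evaluations.
f = lambda x: float(x @ x)
x = np.ones(5)
rng = np.random.default_rng(0)
for _ in range(2000):
    x = x - 0.01 * zo_gradient(f, x, rng=rng)
```

Only two function queries are made per step, which is exactly the stochastic zeroth-order access model: no gradients are ever observed.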
7 May 2024 • Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell
Auto-regressive large language models (LLMs) show impressive capabilities on many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on ''A is B'', an LLM fails to directly conclude ''B is A'' during inference, a phenomenon known as the ''reversal curse'' (Berglund et al., 2023).
20 Mar 2024 • Charles Lu, Baihe Huang, Sai Praneeth Karimireddy, Praneeth Vepakomma, Michael Jordan, Ramesh Raskar

Acquiring high-quality training data is essential for current machine learning models.
13 Dec 2023 • Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao
Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, which allows non-trivial trade-offs between the Type I error and the Type II error.
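The coupling idea can be made concrete with a toy scheme in the spirit of seeded green-list watermarks. This is a generic sketch, not the construction analyzed in the paper, and every name, vocabulary size, and threshold below is hypothetical: a PRG keyed by the previous token splits the vocabulary, the generator only emits "green" tokens, and the detector's rejection region is a threshold on the green fraction, which trades Type I error (flagging clean text) against Type II error (missing watermarked text).

```python
import hashlib
import random

VOCAB = list(range(1000))

def green_set(prev_token, frac=0.5):
    # PRG keyed by the previous token deterministically splits the vocabulary.
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    return set(random.Random(seed).sample(VOCAB, int(frac * len(VOCAB))))

def generate_watermarked(length, seed=0):
    # Toy "model": sample uniformly, but only from the current green set.
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    for _ in range(length - 1):
        tokens.append(rng.choice(sorted(green_set(tokens[-1]))))
    return tokens

def is_watermarked(tokens, threshold=0.8):
    # Rejection region: flag text whose green fraction exceeds the threshold.
    hits = sum(t in green_set(p) for p, t in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1) >= threshold

watermarked = generate_watermarked(50)
rng = random.Random(1)
clean = [rng.choice(VOCAB) for _ in range(50)]
```

Clean text lands in the green set about half the time, so raising the threshold lowers the Type I error while raising the Type II error for lightly watermarked text.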
3 Oct 2023 • Hanlin Zhu, Baihe Huang, Stuart Russell
To the best of our knowledge, this work is the first to study the circuit complexity of RL, which also provides a rigorous framework for future research.
NeurIPS 2023 • Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee
We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity.
8 Jun 2023 • Baihe Huang, Sai Praneeth Karimireddy, Michael I. Jordan
This creates a tension between the principal (the FL platform designer) who cares about global performance and the agents (the data collectors) who care about local performance.
9 Feb 2022 • Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).
29 Sep 2021 • Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo
Inspired by the InstaHide challenge [Huang, Song, Li and Arora'20], [Chen, Song and Zhuo'20] recently provided a mathematical formulation of the InstaHide attack problem under a Gaussian image distribution.
ICLR 2022 • Baihe Huang, Jason D. Lee, Zhaoran Wang, Zhuoran Yang
In the coordinated setting where both players are controlled by the agent, we propose a model-based algorithm and a model-free algorithm.
NeurIPS 2021 • Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang
While the theory of RL has traditionally focused on linear function approximation (or eluder dimension) approaches, little is known about nonlinear RL with neural net approximations of the Q functions.
NeurIPS 2021 • Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang
This work considers a large family of bandit problems in which the unknown underlying reward function is non-concave, including low-rank generalized linear bandit problems and the two-layer neural network bandit problem with polynomial activations.
24 May 2021 • Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi
These can often be accounted for via regularized RL, which augments the target value function with a structure-promoting regularizer.
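Regularized RL of the kind this abstract mentions can be sketched with a generic entropy regularizer, the most common structure-promoting choice: the entropy term turns the hard max in the Bellman backup into a log-sum-exp. The two-state MDP below is made up for illustration; its transition and reward numbers are hypothetical and this is not the paper's algorithm.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[s, a, s'] transitions, R[s, a] rewards.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [1.0, 0.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 0.5]])
gamma, tau = 0.9, 0.1  # discount factor, entropy temperature

# Entropy-regularized ("soft") value iteration:
#   V(s) = tau * log sum_a exp( (R(s,a) + gamma * E_{s'}[V(s')]) / tau )
V = np.zeros(2)
for _ in range(500):
    Q = R + gamma * (P @ V)                     # Q[s, a]
    V = tau * np.logaddexp.reduce(Q / tau, axis=1)

# The regularized optimal policy is the softmax of Q at temperature tau.
pi = np.exp((Q - V[:, None]) / tau)
```

As tau shrinks toward zero, the log-sum-exp approaches the unregularized max and pi approaches a greedy policy; larger tau promotes more uniform (higher-entropy) policies.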
11 May 2021 • Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang
Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we optimize is non-smooth and non-convex, and second, we are not even updating in the gradient direction.
20 Jan 2021 • Baihe Huang, Shunhua Jiang, Zhao Song, Runzhou Tao
This paper introduces a new robust interior point method analysis for semidefinite programming (SDP).
Optimization and Control • Data Structures and Algorithms
24 Nov 2020 • Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo
On the current InstaHide challenge setup, where each InstaHide image is a mixture of two private images, we present a new algorithm to recover all the private images with a provable guarantee and optimal sample complexity.