1 code implementation • 7 Mar 2024 • Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica
To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences.
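Platforms of this kind rank models from pairwise human votes using an Elo-style rating scheme. As a minimal illustrative sketch (the function name, K-factor, and model names here are our assumptions, not details from the paper), one battle updates two ratings like this:

```python
def elo_update(r_a, r_b, winner, k=32):
    """Update two Elo ratings after one pairwise battle.

    winner: 'a', 'b', or 'tie'.
    """
    # Expected score of A under the logistic (Elo) model.
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    # A gains what B loses; k controls the step size.
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

# Two models start at 1000; model_x wins one battle and gains 16 points.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
ratings["model_x"], ratings["model_y"] = elo_update(
    ratings["model_x"], ratings["model_y"], winner="a")
```

Repeating this update over many crowdsourced battles yields a leaderboard ordering of the models.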
no code implementations • 20 Feb 2024 • Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner
Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny.
no code implementations • 2 Feb 2024 • Hanlin Zhu, Banghua Zhu, Jiantao Jiao
In this paper, we aim to improve the inference efficiency of LLMs through prompt caching, i.e., if the current prompt can be answered with the response to a previous prompt, one can reuse that previous response directly without calling the LLM.
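A minimal sketch of the exact-match flavor of this idea (a real system would also match semantically similar prompts; the class and the `call_llm` backend below are hypothetical illustrations, not the paper's implementation):

```python
class PromptCache:
    """Reuse a previous response when an equivalent prompt recurs."""

    def __init__(self, llm):
        self.llm = llm    # callable: prompt -> response
        self.cache = {}   # normalized prompt -> cached response

    def answer(self, prompt):
        key = prompt.strip().lower()   # crude normalization for exact matching
        if key in self.cache:          # cache hit: skip the LLM call entirely
            return self.cache[key]
        response = self.llm(prompt)    # cache miss: call the model and store
        self.cache[key] = response
        return response

calls = []
def call_llm(prompt):                  # hypothetical LLM backend, records calls
    calls.append(prompt)
    return f"response to: {prompt}"

pc = PromptCache(call_llm)
pc.answer("What is RLHF?")
pc.answer("what is rlhf?")             # served from cache; no second LLM call
```

The saving comes from every cache hit eliminating one full model invocation; the hard part, which the paper addresses, is deciding when two prompts can safely share a response.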
no code implementations • 29 Jan 2024 • Banghua Zhu, Michael I. Jordan, Jiantao Jiao
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values.
1 code implementation • 31 Dec 2023 • Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica
High-demand LLM inference services (e.g., ChatGPT and Bard) support a wide range of requests, from short chat conversations to long document reading.
no code implementations • 13 Dec 2023 • Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao
Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.
1 code implementation • 13 Dec 2023 • Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks.
1 code implementation • 6 Nov 2023 • Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
no code implementations • 11 Oct 2023 • Qingyue Zhao, Banghua Zhu
The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to $|\mathcal{S}||\mathcal{A}|/n$.
no code implementations • 11 Oct 2023 • Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer
Large Language Models (LLMs) have demonstrated remarkable impact across a wide spectrum of natural language processing tasks.
no code implementations • 30 Sep 2023 • Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.
no code implementations • 18 Sep 2023 • Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka, Chen Tang, Wei Zhan
GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.
no code implementations • 7 Sep 2023 • Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang
We consider the problem of computing a function of $n$ variables using noisy queries, where each query is incorrect with some fixed and known probability $p \in (0, 1/2)$.
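The classical baseline in this noisy-query setting is repetition with majority vote: querying the same variable many times drives the per-variable error down exponentially in the number of repetitions. A small illustrative sketch (the noise model and helper names are our assumptions, not the paper's construction):

```python
import random

def noisy_query(bit, p, rng):
    """Return the true bit, flipped with probability p < 1/2."""
    return bit ^ (rng.random() < p)

def majority_estimate(bit, p, repeats, rng):
    """Estimate one bit by majority vote over repeated noisy queries."""
    ones = sum(noisy_query(bit, p, rng) for _ in range(repeats))
    return int(ones > repeats / 2)

rng = random.Random(0)
# With enough repetitions the estimate is correct with high probability.
est = majority_estimate(1, p=0.3, repeats=201, rng=rng)
```

The interesting question, which this line of work studies, is how few total queries suffice to compute a function of all $n$ variables with a target error $\delta$, which naive per-variable repetition does not answer optimally.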
no code implementations • 21 Jun 2023 • Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang
However, the upper and lower bounds do not match in terms of the dependence on $\delta$ and $p$.
1 code implementation • 4 Jun 2023 • Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao
Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences.
1 code implementation • 3 Jun 2023 • Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao
Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings.
1 code implementation • 1 Jun 2023 • Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao
Self-training is an important technique for solving semi-supervised learning problems.
no code implementations • 19 May 2023 • Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, Michael I. Jordan
In this paper, we initiate the study of online learning in the creator economy, modeling it as a three-party game among users, the platform, and content creators; the platform interacts with content creators under a principal-agent model, using contracts to encourage better content.
no code implementations • 27 Jan 2023 • Geng Zhao, Banghua Zhu, Jiantao Jiao, Michael I. Jordan
We analyze the sample complexity of regret minimization in this repeated Stackelberg game.
no code implementations • 26 Jan 2023 • Banghua Zhu, Jiantao Jiao, Michael I. Jordan
Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model.
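Under the BTL model, item $i$ beats item $j$ with probability $\sigma(r_i - r_j)$, so the MLE over pairwise outcomes is a logistic-regression-style fit. A toy sketch via gradient ascent on the log-likelihood (the learning rate, step count, and function name are illustrative choices, not from the paper):

```python
import math

def btl_mle(n_items, comparisons, lr=0.1, steps=2000):
    """Fit BTL scores by gradient ascent on the pairwise log-likelihood.

    comparisons: list of (winner, loser) index pairs.
    """
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            p_w = 1 / (1 + math.exp(r[l] - r[w]))  # P(w beats l) under BTL
            grad[w] += 1 - p_w
            grad[l] -= 1 - p_w
        for i in range(n_items):
            r[i] += lr * grad[i]
        mean = sum(r) / n_items   # scores are shift-invariant; center them
        r = [x - mean for x in r]
    return r

# Item 0 wins 3 of 4 comparisons, so the MLE gives it the higher score
# (the fitted gap satisfies sigma(r0 - r1) = 3/4, i.e., r0 - r1 = log 3).
scores = btl_mle(2, [(0, 1), (0, 1), (0, 1), (1, 0)])
```

The log-likelihood is concave in the score vector, which is part of why the MLE is well behaved in the linear-reward setting the paper analyzes.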
no code implementations • 10 Nov 2022 • Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, Michael I. Jordan
This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design.
2 code implementations • 24 May 2022 • Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, Michael I. Jordan
In contrast to prior work, our proposed protocols improve the dimension dependence and achieve a tight statistical rate in terms of all the parameters for strongly convex losses.
no code implementations • 5 Apr 2022 • Ikechukwu Uchendu, Ted Xiao, Yao Lu, Banghua Zhu, Mengyuan Yan, Joséphine Simon, Matthew Bennice, Chuyuan Fu, Cong Ma, Jiantao Jiao, Sergey Levine, Karol Hausman
In addition, we provide an upper bound on the sample complexity of JSRL and show that with the help of a guide-policy, one can improve the sample complexity for non-optimism exploration methods from exponential in horizon to polynomial.
no code implementations • 2 Feb 2022 • Banghua Zhu, Jiantao Jiao, Michael I. Jordan
Prior work focuses on robust mean and covariance estimation when the true distribution lies in the family of Gaussian or elliptical distributions, analyzing depth- or scoring-rule-based GAN losses for the problem.
no code implementations • NeurIPS 2021 • Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell
Based on the composition of the offline dataset, two main categories of methods are used: imitation learning, which is suitable for expert datasets, and vanilla offline RL, which often requires datasets with uniform coverage.
no code implementations • 19 Jan 2021 • Cong Ma, Banghua Zhu, Jiantao Jiao, Martin J. Wainwright
Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies.
no code implementations • 12 Jan 2021 • Matt Peng, Banghua Zhu, Jiantao Jiao
This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that extrapolates well to out-of-distribution tasks without reusing data from training, and adapts almost instantaneously using only a few samples at test time.
no code implementations • 28 May 2020 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt
We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients".
no code implementations • 21 Jan 2020 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt
We show that under TV corruptions, the breakdown point reduces to 1/4 for the same set of distributions.
no code implementations • 19 Sep 2019 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt
This generalizes a property called resilience previously employed in the special case of mean estimation with outliers.
no code implementations • 27 Jan 2019 • Banghua Zhu, Jiantao Jiao, David Tse
Generalization: given a population target of GANs, we develop a systematic principle, projection under admissible distance, for designing GANs that meet the population requirement using finite samples.
no code implementations • 9 Aug 2018 • Banghua Zhu, Jintao Wang, Longzhuang He, Jian Song
The simulation results show that the neural-network-based system outperforms traditional modulation and equalization methods in terms of time complexity and bit error rate (BER) under fading channels.