1 code implementation • 7 Mar 2024 • Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica
To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences.
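Platforms of this kind rank models from pairwise human votes using an Elo-style rating scheme. As a minimal illustrative sketch (the function name, K-factor, and model names here are our assumptions, not details from the paper), one battle updates two ratings like this:

```python
def elo_update(r_a, r_b, winner, k=32):
    """Update two Elo ratings after one pairwise battle.

    winner: 'a', 'b', or 'tie'.
    """
    # Expected score of A under the logistic (Elo) model.
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    # A gains what B loses; k controls the step size.
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

# Two models start at 1000; model_x wins one battle and gains 16 points.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
ratings["model_x"], ratings["model_y"] = elo_update(
    ratings["model_x"], ratings["model_y"], winner="a")
```

Repeating this update over many crowdsourced battles yields a leaderboard ordering of the models.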
no code implementations • 20 Feb 2024 • Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner
Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny.
no code implementations • 2 Feb 2024 • Hanlin Zhu, Banghua Zhu, Jiantao Jiao
In this paper, we aim to improve the inference efficiency of LLMs through prompt caching, i.e., if the current prompt can be answered with the response to a previous prompt, one can reuse that previous response directly without calling the LLM.
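A minimal sketch of the exact-match flavor of this idea (a real system would also match semantically similar prompts; the class and the `call_llm` backend below are hypothetical illustrations, not the paper's implementation):

```python
class PromptCache:
    """Reuse a previous response when an equivalent prompt recurs."""

    def __init__(self, llm):
        self.llm = llm    # callable: prompt -> response
        self.cache = {}   # normalized prompt -> cached response

    def answer(self, prompt):
        key = prompt.strip().lower()   # crude normalization for exact matching
        if key in self.cache:          # cache hit: skip the LLM call entirely
            return self.cache[key]
        response = self.llm(prompt)    # cache miss: call the model and store
        self.cache[key] = response
        return response

calls = []
def call_llm(prompt):                  # hypothetical LLM backend, records calls
    calls.append(prompt)
    return f"response to: {prompt}"

pc = PromptCache(call_llm)
pc.answer("What is RLHF?")
pc.answer("what is rlhf?")             # served from cache; no second LLM call
```

The saving comes from every cache hit eliminating one full model invocation; the hard part, which the paper addresses, is deciding when two prompts can safely share a response.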
no code implementations • 29 Jan 2024 • Banghua Zhu, Michael I. Jordan, Jiantao Jiao
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values.
1 code implementation • 31 Dec 2023 • Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica
High-demand LLM inference services (e.g., ChatGPT and Bard) support a wide range of requests, from short chat conversations to long document reading.
no code implementations • 13 Dec 2023 • Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao
Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.
1 code implementation • 13 Dec 2023 • Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks.
1 code implementation • 6 Nov 2023 • Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
no code implementations • 11 Oct 2023 • Qingyue Zhao, Banghua Zhu
The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to $|\mathcal{S}||\mathcal{A}|/n$.
no code implementations • 11 Oct 2023 • Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer
Large Language Models (LLMs) have demonstrated remarkable impact across a wide spectrum of natural language processing tasks.
no code implementations • 30 Sep 2023 • Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.
no code implementations • 18 Sep 2023 • Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka, Chen Tang, Wei Zhan
GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.
no code implementations • 7 Sep 2023 • Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang
We consider the problem of computing a function of $n$ variables using noisy queries, where each query is incorrect with some fixed and known probability $p \in (0, 1/2)$.
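The classical baseline in this noisy-query setting is repetition with majority vote: querying the same variable many times drives the per-variable error down exponentially in the number of repetitions. A small illustrative sketch (the noise model and helper names are our assumptions, not the paper's construction):

```python
import random

def noisy_query(bit, p, rng):
    """Return the true bit, flipped with probability p < 1/2."""
    return bit ^ (rng.random() < p)

def majority_estimate(bit, p, repeats, rng):
    """Estimate one bit by majority vote over repeated noisy queries."""
    ones = sum(noisy_query(bit, p, rng) for _ in range(repeats))
    return int(ones > repeats / 2)

rng = random.Random(0)
# With enough repetitions the estimate is correct with high probability.
est = majority_estimate(1, p=0.3, repeats=201, rng=rng)
```

The interesting question, which this line of work studies, is how few total queries suffice to compute a function of all $n$ variables with a target error $\delta$, which naive per-variable repetition does not answer optimally.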
no code implementations • 21 Jun 2023 • Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang
However, the upper and lower bounds do not match in terms of the dependence on $\delta$ and $p$.
1 code implementation • 4 Jun 2023 • Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao
Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences.
1 code implementation • 3 Jun 2023 • Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao
Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings.
1 code implementation • 1 Jun 2023 • Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao
Self-training is an important technique for solving semi-supervised learning problems.
no code implementations • 19 May 2023 • Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, Michael I. Jordan
In this paper, we initiate the study of online learning in the creator economy, modeling it as a three-party game among users, the platform, and content creators; the platform interacts with content creators under a principal-agent model, using contracts to encourage better content.
no code implementations • 27 Jan 2023 • Geng Zhao, Banghua Zhu, Jiantao Jiao, Michael I. Jordan
We analyze the sample complexity of regret minimization in this repeated Stackelberg game.
no code implementations • 26 Jan 2023 • Banghua Zhu, Jiantao Jiao, Michael I. Jordan
Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model.
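Under the BTL model, item $i$ beats item $j$ with probability $\sigma(r_i - r_j)$, so the MLE over pairwise outcomes is a logistic-regression-style fit. A toy sketch via gradient ascent on the log-likelihood (the learning rate, step count, and function name are illustrative choices, not from the paper):

```python
import math

def btl_mle(n_items, comparisons, lr=0.1, steps=2000):
    """Fit BTL scores by gradient ascent on the pairwise log-likelihood.

    comparisons: list of (winner, loser) index pairs.
    """
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            p_w = 1 / (1 + math.exp(r[l] - r[w]))  # P(w beats l) under BTL
            grad[w] += 1 - p_w
            grad[l] -= 1 - p_w
        for i in range(n_items):
            r[i] += lr * grad[i]
        mean = sum(r) / n_items   # scores are shift-invariant; center them
        r = [x - mean for x in r]
    return r

# Item 0 wins 3 of 4 comparisons, so the MLE gives it the higher score
# (the fitted gap satisfies sigma(r0 - r1) = 3/4, i.e., r0 - r1 = log 3).
scores = btl_mle(2, [(0, 1), (0, 1), (0, 1), (1, 0)])
```

The log-likelihood is concave in the score vector, which is part of why the MLE is well behaved in the linear-reward setting the paper analyzes.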
no code implementations • 10 Nov 2022 • Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, Michael I. Jordan
This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design.
2 code implementations • 24 May 2022 • Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, Michael I. Jordan
In contrast to prior work, our proposed protocols improve the dimension dependence and achieve a tight statistical rate in terms of all the parameters for strongly convex losses.
no code implementations • 5 Apr 2022 • Ikechukwu Uchendu, Ted Xiao, Yao Lu, Banghua Zhu, Mengyuan Yan, Joséphine Simon, Matthew Bennice, Chuyuan Fu, Cong Ma, Jiantao Jiao, Sergey Levine, Karol Hausman
In addition, we provide an upper bound on the sample complexity of JSRL and show that with the help of a guide-policy, one can improve the sample complexity for non-optimism exploration methods from exponential in horizon to polynomial.
no code implementations • 2 Feb 2022 • Banghua Zhu, Jiantao Jiao, Michael I. Jordan
Prior work focuses on robust mean and covariance estimation when the true distribution lies in the family of Gaussian or elliptical distributions, analyzing depth- or scoring-rule-based GAN losses for the problem.
no code implementations • NeurIPS 2021 • Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell
Based on the composition of the offline dataset, two main categories of methods are used: imitation learning, which is suitable for expert datasets, and vanilla offline RL, which often requires datasets with uniform coverage.
no code implementations • 19 Jan 2021 • Cong Ma, Banghua Zhu, Jiantao Jiao, Martin J. Wainwright
Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies.
no code implementations • 12 Jan 2021 • Matt Peng, Banghua Zhu, Jiantao Jiao
This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that extrapolates well to out-of-distribution tasks without reusing data from training, and adapts almost instantaneously using only a few samples at test time.
no code implementations • 28 May 2020 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt
We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients".
no code implementations • 21 Jan 2020 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt
We show that under TV corruptions, the breakdown point reduces to 1/4 for the same set of distributions.
no code implementations • 19 Sep 2019 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt
This generalizes a property called resilience previously employed in the special case of mean estimation with outliers.
no code implementations • 27 Jan 2019 • Banghua Zhu, Jiantao Jiao, David Tse
Generalization: given a population target of GANs, we develop a systematic principle, projection under admissible distance, for designing GANs that meet the population requirement using finite samples.
no code implementations • 9 Aug 2018 • Banghua Zhu, Jintao Wang, Longzhuang He, Jian Song
The simulation results show that the neural-network-based system outperforms traditional modulation and equalization methods in terms of time complexity and bit error rate (BER) under fading channels.