Search Results for author: Banghua Zhu

Found 31 papers, 7 papers with code

Generative AI Security: Challenges and Countermeasures

no code implementations20 Feb 2024 Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner

Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny.

Efficient Prompt Caching via Embedding Similarity

no code implementations2 Feb 2024 Hanlin Zhu, Banghua Zhu, Jiantao Jiao

In this paper, we aim to improve the inference efficiency of LLMs through prompt caching: if the current prompt can be answered with the same response as a previous prompt, that previous response can be reused without calling the LLM.

Question Answering
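
The caching idea above can be sketched as a similarity-threshold lookup. The toy bag-of-words embedding below is only a stand-in for the learned embedding model the paper actually uses, and the threshold value is illustrative.

```python
import math

def embed(text):
    # Toy bag-of-words embedding; a trained sentence encoder would be used in practice.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class PromptCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def lookup(self, prompt):
        # Return a cached response if some stored prompt is similar enough.
        emb = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best is not None and cosine(emb, best[0]) >= self.threshold:
            return best[1]  # cache hit: reuse the previous response
        return None         # cache miss: caller must query the LLM

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = PromptCache()
cache.store("what is the capital of France", "Paris")
hit = cache.lookup("what is the capital of France ?")
```

A real deployment would also tune the threshold, since a false cache hit returns a stale answer while a false miss merely wastes one LLM call.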

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

no code implementations29 Jan 2024 Banghua Zhu, Michael I. Jordan, Jiantao Jiao

Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values.

Fairness in Serving Large Language Models

1 code implementation31 Dec 2023 Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading.

Fairness Scheduling

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

1 code implementation13 Dec 2023 Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan

Furthermore, SQIRL explains why random exploration works well in practice, since we show many environments can be solved by estimating the random policy's Q-function and then applying zero or a few steps of value iteration.

Reinforcement Learning (RL)
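
The quoted finding can be illustrated on a toy tabular MDP: evaluate the uniform-random policy's Q-function, then act greedily with zero further steps of value iteration, which already recovers the optimal policy on this chain. The environment and constants below are invented for illustration, not taken from the paper.

```python
# Tabular sketch on a 4-state chain: moving right from the last state yields reward 1.
N, GAMMA = 4, 0.9

def step(s, a):  # a: 0 = left, 1 = right; movement is clamped to the chain
    s2 = min(max(s + (1 if a else -1), 0), N - 1)
    r = 1.0 if (s == N - 1 and a == 1) else 0.0
    return s2, r

def q_random(iters=500):
    # Iterative policy evaluation for the uniform-random policy.
    q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(iters):
        new = [[0.0, 0.0] for _ in range(N)]
        for s in range(N):
            for a in (0, 1):
                s2, r = step(s, a)
                new[s][a] = r + GAMMA * 0.5 * (q[s2][0] + q[s2][1])
        q = new
    return q

q = q_random()
# Greedy action w.r.t. the random policy's Q-function, with no extra value iteration.
greedy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N)]
```

On this chain the greedy policy is "always move right" in every state, i.e. the optimal policy, matching the paper's observation that such environments are solvable from the random policy's Q-function alone.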

Towards Optimal Statistical Watermarking

no code implementations13 Dec 2023 Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

1 code implementation6 Nov 2023 Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters.

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains

no code implementations11 Oct 2023 Qingyue Zhao, Banghua Zhu

The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to $|\mathcal{S}||\mathcal{A}|/n$.

Transfer Learning

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources

no code implementations11 Oct 2023 Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer

Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks.

Quantization

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

no code implementations30 Sep 2023 Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.

reinforcement-learning World Knowledge

Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

no code implementations18 Sep 2023 Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka, Chen Tang, Wei Zhan

GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.

Autonomous Driving Decision Making +3

Noisy Computing of the $\mathsf{OR}$ and $\mathsf{MAX}$ Functions

no code implementations7 Sep 2023 Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang

We consider the problem of computing a function of $n$ variables using noisy queries, where each query is incorrect with some fixed and known probability $p \in (0, 1/2)$.
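
A naive baseline for this setting can be sketched with repetition and majority voting per variable; the paper's contribution is characterizing tighter, more query-efficient schemes, which this sketch does not implement.

```python
import random

def noisy_query(bit, p, rng):
    # Each query returns the wrong answer with probability p in (0, 1/2).
    return bit if rng.random() >= p else 1 - bit

def noisy_or(bits, p, reps, rng):
    # Naive baseline: majority-vote each variable over `reps` noisy reads,
    # then OR the denoised bits.
    denoised = []
    for b in bits:
        ones = sum(noisy_query(b, p, rng) for _ in range(reps))
        denoised.append(1 if ones > reps / 2 else 0)
    return max(denoised)

rng = random.Random(0)
result = noisy_or([0, 0, 1, 0], p=0.1, reps=25, rng=rng)
```

With p = 0.1 and 25 repetitions, each per-bit majority is wrong with vanishing probability, so the OR is recovered reliably at a query cost of n x reps.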

On the Optimal Bounds for Noisy Computing

no code implementations21 Jun 2023 Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang

However, the upper and lower bounds do not match in terms of the dependence on $\delta$ and $p$.

Fine-Tuning Language Models with Advantage-Induced Policy Alignment

1 code implementation4 Jun 2023 Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao

Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences.

On Optimal Caching and Model Multiplexing for Large Model Inference

1 code implementation3 Jun 2023 Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings.
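
At the interface level, the two approaches combine as a cache lookup followed by a model-selection rule. The routing heuristic below (`easy`) is purely hypothetical; the paper derives the jointly optimal caching and multiplexing policy rather than a fixed heuristic.

```python
def serve(query, cache, small_model, large_model, easy):
    # Hypothetical multiplexer: check the cache first, then route "easy"
    # queries to the cheap model and the rest to the expensive one.
    if query in cache:
        return cache[query]
    model = small_model if easy(query) else large_model
    response = model(query)
    cache[query] = response  # store so repeated queries skip inference entirely
    return response

cache = {}
small = lambda q: f"small:{q}"
large = lambda q: f"large:{q}"
is_easy = lambda q: len(q) < 10  # stand-in for a learned difficulty predictor
out1 = serve("2+2", cache, small, large, easy=is_easy)
out2 = serve("2+2", cache, small, large, easy=is_easy)  # served from cache
```

The interesting part, which this sketch elides, is choosing what to evict from the cache and when to trust the small model, jointly, to minimize expected inference cost.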

Online Learning in a Creator Economy

no code implementations19 May 2023 Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, Michael I. Jordan

In this paper, we initiate the study of online learning in the creator economy by modeling the creator economy as a three-party game between the users, platform, and content creators, with the platform interacting with the content creator under a principal-agent model through contracts to encourage better content.

Recommendation Systems

Online Learning in Stackelberg Games with an Omniscient Follower

no code implementations27 Jan 2023 Geng Zhao, Banghua Zhu, Jiantao Jiao, Michael I. Jordan

We analyze the sample complexity of regret minimization in this repeated Stackelberg game.

Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons

no code implementations26 Jan 2023 Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model.

reinforcement-learning Reinforcement Learning (RL)
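
The estimator referenced here can be sketched for the pairwise (Bradley-Terry-Luce) case: fit scalar rewards by maximizing the comparison log-likelihood with gradient ascent. The tiny dataset, learning rate, and step count below are illustrative, not from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def btl_mle(n_items, comparisons, steps=2000, lr=0.05):
    # comparisons: list of (winner, loser) index pairs.
    # Under the BTL model, P(i beats j) = sigmoid(r_i - r_j); we maximize the
    # log-likelihood by gradient ascent. Item 0 is pinned to r_0 = 0 because
    # rewards are only identifiable up to an additive constant.
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            p = sigmoid(r[w] - r[l])
            grad[w] += 1 - p
            grad[l] -= 1 - p
        for i in range(1, n_items):  # skip item 0 (gauge fixing)
            r[i] += lr * grad[i]
    return r

# Item 1 wins 8 of 10 comparisons against item 0, so its MLE reward
# converges to log(0.8 / 0.2) = log 4 above item 0's.
data = [(1, 0)] * 8 + [(0, 1)] * 2
rewards = btl_mle(2, data)
```

The same likelihood extends to $K$-wise comparisons under the Plackett-Luce model by replacing the pairwise sigmoid with a softmax over each ranking.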

The Sample Complexity of Online Contract Design

no code implementations10 Nov 2022 Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, Michael I. Jordan

This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design.

Byzantine-Robust Federated Learning with Optimal Statistical Rates and Privacy Guarantees

2 code implementations24 May 2022 Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, Michael I. Jordan

In contrast to prior work, our proposed protocols improve the dimension dependence and achieve a tight statistical rate in terms of all the parameters for strongly convex losses.

Federated Learning

Jump-Start Reinforcement Learning

no code implementations5 Apr 2022 Ikechukwu Uchendu, Ted Xiao, Yao Lu, Banghua Zhu, Mengyuan Yan, Joséphine Simon, Matthew Bennice, Chuyuan Fu, Cong Ma, Jiantao Jiao, Sergey Levine, Karol Hausman

In addition, we provide an upper bound on the sample complexity of JSRL and show that with the help of a guide-policy, one can improve the sample complexity for non-optimism exploration methods from exponential in horizon to polynomial.

reinforcement-learning Reinforcement Learning (RL)

Robust Estimation for Nonparametric Families via Generative Adversarial Networks

no code implementations2 Feb 2022 Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Prior work focuses on the problem of robust mean and covariance estimation when the true distribution lies in the family of Gaussian or elliptical distributions, and analyzes depth- or scoring-rule-based GAN losses for the problem.

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

no code implementations NeurIPS 2021 Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

Based on the composition of the offline dataset, two main categories of methods are used: imitation learning, which is suitable for expert datasets, and vanilla offline RL, which often requires uniform-coverage datasets.

Imitation Learning Multi-Armed Bandits +3

Minimax Off-Policy Evaluation for Multi-Armed Bandits

no code implementations19 Jan 2021 Cong Ma, Banghua Zhu, Jiantao Jiao, Martin J. Wainwright

Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies.

Multi-Armed Bandits Off-policy evaluation

Linear Representation Meta-Reinforcement Learning for Instant Adaptation

no code implementations12 Jan 2021 Matt Peng, Banghua Zhu, Jiantao Jiao

This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that is able to extrapolate well to out-of-distribution tasks without the need to reuse data from training, and to adapt almost instantaneously, requiring only a few samples at test time.

Continuous Control Meta Reinforcement Learning +2

Robust estimation via generalized quasi-gradients

no code implementations28 May 2020 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients".

regression

When does the Tukey median work?

no code implementations21 Jan 2020 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We show that under TV corruptions, the breakdown point reduces to 1/4 for the same set of distributions.

Generalized Resilience and Robust Statistics

no code implementations19 Sep 2019 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

This generalizes a property called resilience previously employed in the special case of mean estimation with outliers.

Deconstructing Generative Adversarial Networks

no code implementations27 Jan 2019 Banghua Zhu, Jiantao Jiao, David Tse

Generalization: given a population target of GANs, we design a systematic principle, projection under admissible distance, to construct GANs that meet the population requirement using finite samples.

Joint Transceiver Optimization for Wireless Communication PHY with Convolutional Neural Network

no code implementations9 Aug 2018 Banghua Zhu, Jintao Wang, Longzhuang He, Jian Song

The simulation results show that the performance of the neural-network-based system is superior to traditional modulation and equalization methods in terms of time complexity and bit error rate (BER) under fading channels.
