Search Results for author: Jiantao Jiao

Found 54 papers, 12 papers with code

Toxicity Detection for Free

no code implementations 29 May 2024 Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner

However, LLMs can fail to refuse toxic prompts or be overcautious and refuse benign examples.

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

no code implementations 7 May 2024 Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell

Auto-regressive large language models (LLMs) show impressive capabilities on many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on ''A is B'', an LLM fails to directly conclude ''B is A'' during inference, a phenomenon known as the ''reversal curse'' (Berglund et al., 2023).

Logical Reasoning

Toward a Theory of Tokenization in LLMs

no code implementations 12 Apr 2024 Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran

In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data generating processes.

Language Modelling

Generative AI Security: Challenges and Countermeasures

no code implementations 20 Feb 2024 Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner

Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny.

Efficient Prompt Caching via Embedding Similarity

no code implementations 2 Feb 2024 Hanlin Zhu, Banghua Zhu, Jiantao Jiao

In this paper, we aim to improve the inference efficiency of LLMs by prompt caching, i.e., if the current prompt can be answered by the same response as a previous prompt, one can directly reuse that previous response without calling the LLM.

Question Answering
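The caching idea in this entry lends itself to a short sketch. Below is a minimal, hypothetical version of embedding-similarity prompt caching: the bag-of-words `embed` is a toy stand-in for a learned embedding model, and the 0.9 threshold is an illustrative choice, not a value from the paper.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned embedding model: a unit-norm
    # bag-of-words vector keyed by lowercase tokens.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def cosine(u, v):
    # Sparse dot product; both vectors are already unit-norm.
    return sum(x * v.get(w, 0.0) for w, x in u.items())

class PromptCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def lookup(self, prompt):
        # Return a cached response if some stored prompt is similar
        # enough; otherwise None, and the caller queries the LLM.
        e = embed(prompt)
        best = max(self.entries, key=lambda p: cosine(e, p[0]), default=None)
        if best is not None and cosine(e, best[0]) >= self.threshold:
            return best[1]
        return None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))
```

A near-duplicate prompt then hits the cache, while an unrelated prompt falls through to the model.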

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

no code implementations 29 Jan 2024 Banghua Zhu, Michael I. Jordan, Jiantao Jiao

Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values.

Towards Optimal Statistical Watermarking

no code implementations 13 Dec 2023 Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.

End-to-end Story Plot Generator

no code implementations 13 Oct 2023 Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian

Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words.


Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

no code implementations 30 Sep 2023 Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.

reinforcement-learning World Knowledge

Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

no code implementations 18 Sep 2023 Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka, Chen Tang, Wei Zhan

GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.

Autonomous Driving Decision Making +3

Noisy Computing of the $\mathsf{OR}$ and $\mathsf{MAX}$ Functions

no code implementations 7 Sep 2023 Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang

We consider the problem of computing a function of $n$ variables using noisy queries, where each query is incorrect with some fixed and known probability $p \in (0, 1/2)$.
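The query model in this abstract is easy to simulate. The sketch below is the naive repetition baseline (query each variable several times and take a majority vote), which these papers improve upon; the function names and the choice of repetition count `k` are illustrative, not from the papers.

```python
import random

def noisy_query(bit, p, rng):
    # Return the bit, flipped with probability p in (0, 1/2).
    return bit ^ (rng.random() < p)

def noisy_or(bits, p, k, rng=None):
    # Naive baseline for noisy OR: query each variable k times,
    # take a per-variable majority vote, then OR the estimates.
    rng = rng or random.Random(0)
    est = []
    for b in bits:
        votes = sum(noisy_query(b, p, rng) for _ in range(k))
        est.append(votes > k / 2)
    return any(est)
```

With k repetitions, each majority vote errs with probability exponentially small in k, so the baseline spends O(n log n) queries for high overall reliability; the papers study how far this can be tightened.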

On the Optimal Bounds for Noisy Computing

no code implementations 21 Jun 2023 Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang

However, the upper and lower bounds do not match in terms of the dependence on $\delta$ and $p$.

Fine-Tuning Language Models with Advantage-Induced Policy Alignment

1 code implementation 4 Jun 2023 Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao

Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences.

On Optimal Caching and Model Multiplexing for Large Model Inference

1 code implementation 3 Jun 2023 Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings.

Online Learning in a Creator Economy

no code implementations 19 May 2023 Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, Michael I. Jordan

In this paper, we initiate the study of online learning in the creator economy by modeling the creator economy as a three-party game between the users, platform, and content creators, with the platform interacting with the content creator under a principal-agent model through contracts to encourage better content.

Recommendation Systems

Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits

no code implementations 12 Feb 2023 Nived Rajaraman, Yanjun Han, Jiantao Jiao, Kannan Ramchandran

We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action.

Decision Making

Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

1 code implementation NeurIPS 2023 Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao

We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.

reinforcement-learning Reinforcement Learning (RL)

Online Learning in Stackelberg Games with an Omniscient Follower

no code implementations 27 Jan 2023 Geng Zhao, Banghua Zhu, Jiantao Jiao, Michael I. Jordan

We analyze the sample complexity of regret minimization in this repeated Stackelberg game.

Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons

no code implementations 26 Jan 2023 Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model.

reinforcement-learning Reinforcement Learning (RL)
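The MLE analyzed in this entry can be sketched concretely. Under the Bradley-Terry-Luce model, item $i$ beats item $j$ with probability $\sigma(r_i - r_j)$, and the log-likelihood of observed pairwise comparisons is concave in the rewards $r$. The minimal gradient-ascent fitter below is an illustrative implementation, not the paper's code; step size and iteration count are arbitrary choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def btl_mle(n_items, comparisons, steps=500, lr=0.1):
    # MLE of Bradley-Terry-Luce rewards from pairwise comparisons
    # [(winner, loser), ...] via gradient ascent on the concave
    # log-likelihood. Rewards are re-centered each step to fix the
    # model's shift invariance (only reward differences matter).
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            g = 1.0 - sigmoid(r[w] - r[l])  # d/dr_w of log sigma(r_w - r_l)
            grad[w] += g
            grad[l] -= g
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
        mean = sum(r) / n_items
        r = [ri - mean for ri in r]
    return r
```

On data where item 0 mostly beats item 1 and item 1 mostly beats item 2, the fitted rewards recover that ordering.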

The Sample Complexity of Online Contract Design

no code implementations 10 Nov 2022 Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, Michael I. Jordan

This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design.

Beyond the Best: Estimating Distribution Functionals in Infinite-Armed Bandits

no code implementations 1 Nov 2022 Yifei Wang, Tavor Baharav, Yanjun Han, Jiantao Jiao, David Tse

In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution, and each arm can be sampled further to obtain noisy estimates of the average reward of that arm.

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

no code implementations 1 Nov 2022 Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao

Offline reinforcement learning (RL), which refers to decision-making from a previously-collected dataset of interactions, has received significant attention over the past years.

Decision Making Offline RL +2

Minimax Optimal Online Imitation Learning via Replay Estimation

1 code implementation 30 May 2022 Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left(\min\left(H^{3/2}/N, H/\sqrt{N}\right)\right)$ dependency, under significantly weaker assumptions compared to prior work.

Continuous Control Imitation Learning

Byzantine-Robust Federated Learning with Optimal Statistical Rates and Privacy Guarantees

2 code implementations 24 May 2022 Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, Michael I. Jordan

In contrast to prior work, our proposed protocols improve the dimension dependence and achieve a tight statistical rate in terms of all the parameters for strongly convex losses.

Federated Learning

Jump-Start Reinforcement Learning

1 code implementation 5 Apr 2022 Ikechukwu Uchendu, Ted Xiao, Yao Lu, Banghua Zhu, Mengyuan Yan, Joséphine Simon, Matthew Bennice, Chuyuan Fu, Cong Ma, Jiantao Jiao, Sergey Levine, Karol Hausman

In addition, we provide an upper bound on the sample complexity of JSRL and show that with the help of a guide-policy, one can improve the sample complexity for non-optimism exploration methods from exponential in horizon to polynomial.

reinforcement-learning Reinforcement Learning (RL)

Robust Estimation for Nonparametric Families via Generative Adversarial Networks

no code implementations 2 Feb 2022 Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Prior work focuses on the problem of robust mean and covariance estimation when the true distribution lies in the family of Gaussian distributions or elliptical distributions, and analyzes depth- or scoring-rule-based GAN losses for the problem.

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee

no code implementations 21 Dec 2021 Tianhao Wu, Yunchang Yang, Han Zhong, LiWei Wang, Simon S. Du, Jiantao Jiao

Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms.

4k Reinforcement Learning (RL)

On the Value of Interaction and Function Approximation in Imitation Learning

no code implementations NeurIPS 2021 Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran

In contrast, when the MDP transition structure is known to the learner such as in the case of simulators, we demonstrate fundamental differences compared to the tabular setting in terms of the performance of an optimal algorithm, Mimic-MD (Rajaraman et al. (2020)) when extended to the function approximation setting.

Imitation Learning Multi-class Classification

Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

1 code implementation 8 Jul 2021 Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma

In both OWSP and OWMP settings, we demonstrate that adding {\em intermediate rewards} to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state.

Hierarchical Reinforcement Learning Q-Learning +1

MADE: Exploration via Maximizing Deviation from Explored Regions

1 code implementation NeurIPS 2021 Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell

As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies.

Efficient Exploration Reinforcement Learning (RL)

Securing Secure Aggregation: Mitigating Multi-Round Privacy Leakage in Federated Learning

no code implementations 7 Jun 2021 Jinhyun So, Ramy E. Ali, Basak Guler, Jiantao Jiao, Salman Avestimehr

In fact, we show that the conventional random user selection strategies in FL lead to leaking users' individual models within a number of rounds that is linear in the number of users.

Fairness Federated Learning

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

no code implementations NeurIPS 2021 Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

Based on the composition of the offline dataset, two main categories of methods are used: imitation learning which is suitable for expert datasets and vanilla offline RL which often requires uniform coverage datasets.

Imitation Learning Multi-Armed Bandits +3

Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally

no code implementations 25 Feb 2021 Nived Rajaraman, Yanjun Han, Lin F. Yang, Kannan Ramchandran, Jiantao Jiao

We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al. (2020), which we prove to be computationally efficient.

Imitation Learning

Minimax Off-Policy Evaluation for Multi-Armed Bandits

no code implementations 19 Jan 2021 Cong Ma, Banghua Zhu, Jiantao Jiao, Martin J. Wainwright

Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies.

Multi-Armed Bandits Off-policy evaluation

Linear Representation Meta-Reinforcement Learning for Instant Adaptation

no code implementations 12 Jan 2021 Matt Peng, Banghua Zhu, Jiantao Jiao

This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that extrapolates well to out-of-distribution tasks without the need to reuse data from training, and adapts almost instantaneously using only a few samples at test time.

Continuous Control Meta Reinforcement Learning +2

SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory

1 code implementation NeurIPS 2020 Paria Rashidinejad, Jiantao Jiao, Stuart Russell

Our theoretical and experimental results shed light on the conditions required for efficient probably approximately correct (PAC) learning of the Kalman filter from partially observed data.

PAC learning

Toward the Fundamental Limits of Imitation Learning

no code implementations NeurIPS 2020 Nived Rajaraman, Lin F. Yang, Jiantao Jiao, Kannan Ramchandran

Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy.

Imitation Learning
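The "mimic the expert whenever possible" policy analyzed in this entry is just tabular behavior cloning. The sketch below is an illustrative toy version (the function name and tie-breaking by most frequent action are our choices, not the paper's): at states seen in the expert demonstrations, replay the expert's action; at unseen states, fall back to an arbitrary default, which is where the $H^2$ compounding-error term originates.

```python
from collections import Counter, defaultdict

def behavior_clone(trajectories, fallback_action=0):
    # Tabular behavior cloning: count expert (state, action) pairs,
    # act with the expert's most frequent action at visited states,
    # and with an arbitrary fallback action at unvisited states.
    counts = defaultdict(Counter)
    for traj in trajectories:
        for state, action in traj:
            counts[state][action] += 1
    policy = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda s: policy.get(s, fallback_action)
```

The returned callable behaves like the expert on the support of the demonstrations and defaults elsewhere.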

Robust estimation via generalized quasi-gradients

no code implementations 28 May 2020 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients".


When does the Tukey median work?

no code implementations 21 Jan 2020 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We show that under TV corruptions, the breakdown point reduces to 1/4 for the same set of distributions.

Generalized Resilience and Robust Statistics

no code implementations 19 Sep 2019 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

This generalizes a property called resilience previously employed in the special case of mean estimation with outliers.

Deconstructing Generative Adversarial Networks

no code implementations 27 Jan 2019 Banghua Zhu, Jiantao Jiao, David Tse

Generalization: given a population target of GANs, we design a systematic principle, projection under admissible distance, to design GANs to meet the population requirement using finite samples.

Theoretically Principled Trade-off between Robustness and Accuracy

8 code implementations 24 Jan 2019 Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.

Adversarial Attack Adversarial Defense +2

Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance

no code implementations 23 Feb 2018 Yanjun Han, Jiantao Jiao, Tsachy Weissman

We present \emph{Local Moment Matching (LMM)}, a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance.

Entropy Rate Estimation for Markov Chains with Large State Space

no code implementations NeurIPS 2018 Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, Tiancheng Yu

For estimating the Shannon entropy of a distribution on $S$ elements with independent samples, [Paninski2004] showed that the sample complexity is sublinear in $S$, and [Valiant--Valiant2011] showed that consistent estimation of Shannon entropy is possible if and only if the sample size $n$ far exceeds $\frac{S}{\log S}$.

Language Modelling
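The baseline against which this sample-complexity result is stated is the plug-in (empirical MLE) entropy estimator: form the empirical distribution from counts and evaluate Shannon entropy directly. A minimal sketch (in nats; the function name is ours):

```python
import math

def plugin_entropy(counts):
    # Plug-in (empirical/MLE) Shannon entropy in nats: H of the
    # empirical distribution formed from the count vector. This
    # baseline needs on the order of S samples on an S-element
    # alphabet, versus S/log S for the optimal estimators cited above.
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)
```

On balanced counts it returns the exact entropy of the corresponding uniform distribution; its well-known downward bias on undersampled alphabets is what the improved estimators correct.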

Approximate Profile Maximum Likelihood

no code implementations 19 Dec 2017 Dmitri S. Pavlichin, Jiantao Jiao, Tsachy Weissman

We propose an efficient algorithm for approximate computation of the profile maximum likelihood (PML), a variant of maximum likelihood maximizing the probability of observing a sufficient statistic rather than the empirical sample.

The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal

no code implementations NeurIPS 2018 Jiantao Jiao, Weihao Gao, Yanjun Han

We analyze the Kozachenko--Leonenko (KL) nearest neighbor estimator for the differential entropy.
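In one dimension with $k=1$, the Kozachenko-Leonenko estimator has a particularly simple form: $\hat H = \frac{1}{n}\sum_i \ln \rho_i + \ln 2 + \ln(n-1) + \gamma$, where $\rho_i$ is the distance from sample $x_i$ to its nearest neighbor and $\gamma$ is the Euler-Mascheroni constant. The sketch below is an illustrative 1-D implementation under that formula (our naming; it assumes distinct sample values, since a zero nearest-neighbor distance would make the logarithm blow up).

```python
import math
import random

def kl_entropy_1d(samples):
    # Kozachenko--Leonenko 1-NN differential entropy estimate (nats)
    # for 1-D data: sort once, so each point's nearest neighbor is
    # one of its two sorted neighbors.
    xs = sorted(samples)
    n = len(xs)
    rho = [min(abs(xs[i] - xs[j]) for j in (i - 1, i + 1) if 0 <= j < n)
           for i in range(n)]
    gamma = 0.5772156649015329  # Euler--Mascheroni constant
    return (sum(math.log(r) for r in rho) / n
            + math.log(2) + math.log(n - 1) + gamma)
```

A quick sanity check that holds for any such estimator built from log nearest-neighbor distances: scaling the data by a factor $a$ shifts the estimate by exactly $\ln a$, matching the identity $h(aX) = h(X) + \ln a$ for differential entropy.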

On Estimation of $L_{r}$-Norms in Gaussian White Noise Models

no code implementations 11 Oct 2017 Yanjun Han, Jiantao Jiao, Rajarshi Mukherjee

We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r\ge 1$) of the mean in Gaussian white noise model over Nikolskii-Besov spaces.

Bias Correction with Jackknife, Bootstrap, and Taylor Series

no code implementations 18 Sep 2017 Jiantao Jiao, Yanjun Han

We analyze bias correction methods using jackknife, bootstrap, and Taylor series.

Estimating the Fundamental Limits is Easier than Achieving the Fundamental Limits

no code implementations 5 Jul 2017 Jiantao Jiao, Yanjun Han, Irena Fischer-Hwang, Tsachy Weissman

We show through case studies that it is easier to estimate the fundamental limits of data processing than to construct explicit algorithms to achieve those limits.

Binary Classification Data Compression +1

Demystifying ResNet

no code implementations 3 Nov 2016 Sihan Li, Jiantao Jiao, Yanjun Han, Tsachy Weissman

We show that with or without nonlinearities, by adding shortcuts that have depth two, the condition number of the Hessian of the loss function at the zero initial point is depth-invariant, which makes training very deep models no more difficult than shallow ones.

Beyond Maximum Likelihood: from Theory to Practice

no code implementations 26 Sep 2014 Jiantao Jiao, Kartik Venkat, Yanjun Han, Tsachy Weissman

In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with $n$ samples is comparable to that of the MLE with $n \ln n$ samples.

Universal Estimation of Directed Information

3 code implementations 11 Jan 2012 Jiantao Jiao, Haim H. Permuter, Lei Zhao, Young-Han Kim, Tsachy Weissman

Four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments.

Information Theory
