no code implementations • 29 May 2024 • Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner

However, LLMs can fail to refuse toxic prompts or be overcautious and refuse benign examples.

no code implementations • 7 May 2024 • Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell

Auto-regressive large language models (LLMs) show impressive capabilities in solving many complex reasoning tasks, yet struggle with some simple logical reasoning tasks such as inverse search: when trained on ''A is B'', an LLM fails to directly conclude ''B is A'' during inference, a phenomenon known as the ''reversal curse'' (Berglund et al., 2023).

no code implementations • 12 Apr 2024 • Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran

In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data generating processes.

no code implementations • 20 Feb 2024 • Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner

Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny.

no code implementations • 2 Feb 2024 • Hanlin Zhu, Banghua Zhu, Jiantao Jiao

In this paper, we aim to improve the inference efficiency of LLMs via prompt caching, i.e., if the current prompt can be answered with the same response as a previous prompt, one can directly reuse that previous response without calling the LLM.
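The caching idea above can be sketched in a few lines. This is a minimal illustration, not the paper's system: the `normalize` keying function and the `PromptCache` class are hypothetical names, and real prompt caching would need a semantic-similarity test rather than exact string matching.

```python
from typing import Callable, Optional

def normalize(prompt: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace so that
    # trivially different phrasings map to the same cache key. A real system
    # would use embedding similarity instead of exact matching.
    return " ".join(prompt.lower().split())

class PromptCache:
    """Reuse a stored response when an equivalent prompt was seen before."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm                    # fallback model call
        self.store: dict[str, str] = {}   # normalized prompt -> response
        self.hits = 0

    def query(self, prompt: str) -> str:
        key = normalize(prompt)
        cached: Optional[str] = self.store.get(key)
        if cached is not None:
            self.hits += 1                # answered without calling the LLM
            return cached
        response = self.llm(prompt)
        self.store[key] = response
        return response
```

A repeated (or re-phrased but equivalent) prompt is then served from the cache, skipping the expensive model call entirely.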

no code implementations • 29 Jan 2024 • Banghua Zhu, Michael I. Jordan, Jiantao Jiao

Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values.

no code implementations • 13 Dec 2023 • Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.

no code implementations • 13 Oct 2023 • Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian

Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words.

no code implementations • 30 Sep 2023 • Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.

no code implementations • 18 Sep 2023 • Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka, Chen Tang, Wei Zhan

GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.

no code implementations • 7 Sep 2023 • Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang

We consider the problem of computing a function of $n$ variables using noisy queries, where each query is incorrect with some fixed and known probability $p \in (0, 1/2)$.
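The standard primitive underlying such noisy-query analyses is repetition with majority voting: repeating a query $k$ times drives the per-query error down exponentially in $k$ (Chernoff bound) when $p < 1/2$. A small simulation sketch, not the paper's algorithm:

```python
import random
from collections import Counter

def noisy_query(truth: bool, p: float, rng: random.Random) -> bool:
    # Each query returns the wrong answer with fixed known probability p.
    return (not truth) if rng.random() < p else truth

def majority_query(truth: bool, p: float, k: int, rng: random.Random) -> bool:
    # Repeat the noisy query k times and take a majority vote; for p < 1/2
    # the error probability decays exponentially in k.
    votes = Counter(noisy_query(truth, p, rng) for _ in range(k))
    return votes[True] > votes[False]
```

With $p = 0.3$ and $k = 25$, for instance, the vote recovers the true bit in the vast majority of trials; the interesting question studied in this line of work is how to do better than naive repetition in total query count.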

no code implementations • 21 Jun 2023 • Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang

However, the upper and lower bounds do not match in terms of the dependence on $\delta$ and $p$.

1 code implementation • 4 Jun 2023 • Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao

Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences.

1 code implementation • 3 Jun 2023 • Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings.

1 code implementation • 1 Jun 2023 • Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

Self-training is an important technique for solving semi-supervised learning problems.

no code implementations • 19 May 2023 • Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, Michael I. Jordan

In this paper, we initiate the study of online learning in the creator economy by modeling the creator economy as a three-party game between the users, platform, and content creators, with the platform interacting with the content creator under a principal-agent model through contracts to encourage better content.

no code implementations • 12 Feb 2023 • Nived Rajaraman, Yanjun Han, Jiantao Jiao, Kannan Ramchandran

We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action.

1 code implementation • NeurIPS 2023 • Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao

We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.

no code implementations • 27 Jan 2023 • Geng Zhao, Banghua Zhu, Jiantao Jiao, Michael I. Jordan

We analyze the sample complexity of regret minimization in this repeated Stackelberg game.

no code implementations • 26 Jan 2023 • Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model.
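Under the BTL model, item $i$ beats item $j$ with probability $\sigma(\theta_i - \theta_j)$, and the MLE maximizes the log-likelihood of observed pairwise comparisons. A minimal gradient-ascent sketch (the learning rate, step count, and zero-mean pinning are illustrative choices, not the paper's procedure):

```python
import math

def btl_mle(n_items, comparisons, steps=2000, lr=0.1):
    """Gradient ascent for Bradley-Terry-Luce scores.

    comparisons: list of (winner, loser) index pairs.
    Log-likelihood per pair (w, l) is log sigma(theta_w - theta_l).
    """
    theta = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            # p_w = predicted probability the observed winner wins
            p_w = 1.0 / (1.0 + math.exp(theta[l] - theta[w]))
            grad[w] += 1.0 - p_w
            grad[l] -= 1.0 - p_w
        theta = [t + lr * g for t, g in zip(theta, grad)]
        # Scores are shift-invariant; pin them to zero mean.
        mean = sum(theta) / n_items
        theta = [t - mean for t in theta]
    return theta
```

On synthetic comparisons where item 0 usually beats item 1 and item 1 usually beats item 2, the recovered scores order the items accordingly.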

no code implementations • 10 Nov 2022 • Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, Michael I. Jordan

This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design.

no code implementations • 1 Nov 2022 • Yifei Wang, Tavor Baharav, Yanjun Han, Jiantao Jiao, David Tse

In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution, and each arm can be sampled further to obtain noisy estimates of the average reward of that arm.

no code implementations • 1 Nov 2022 • Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao

Offline reinforcement learning (RL), which refers to decision-making from a previously-collected dataset of interactions, has received significant attention over the past years.

1 code implementation • 30 May 2022 • Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left( \min\left( {H^{3/2}}/{N},\, {H}/{\sqrt{N}} \right) \right)$ dependency, under significantly weaker assumptions compared to prior work.

2 code implementations • 24 May 2022 • Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, Michael I. Jordan

In contrast to prior work, our proposed protocols improve the dimension dependence and achieve a tight statistical rate in terms of all the parameters for strongly convex losses.

1 code implementation • 5 Apr 2022 • Ikechukwu Uchendu, Ted Xiao, Yao Lu, Banghua Zhu, Mengyuan Yan, Joséphine Simon, Matthew Bennice, Chuyuan Fu, Cong Ma, Jiantao Jiao, Sergey Levine, Karol Hausman

In addition, we provide an upper bound on the sample complexity of JSRL and show that with the help of a guide-policy, one can improve the sample complexity for non-optimism exploration methods from exponential in horizon to polynomial.

no code implementations • 2 Feb 2022 • Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Prior work focuses on the problem of robust mean and covariance estimation when the true distribution lies in the family of Gaussian distributions or elliptical distributions, and analyzes depth- or scoring-rule-based GAN losses for the problem.

no code implementations • 21 Dec 2021 • Tianhao Wu, Yunchang Yang, Han Zhong, LiWei Wang, Simon S. Du, Jiantao Jiao

Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms.

no code implementations • NeurIPS 2021 • Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran

In contrast, when the MDP transition structure is known to the learner, such as in the case of simulators, we demonstrate fundamental differences compared to the tabular setting in terms of the performance of an optimal algorithm, Mimic-MD (Rajaraman et al., 2020), when extended to the function approximation setting.

1 code implementation • 8 Jul 2021 • Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma

In both OWSP and OWMP settings, we demonstrate that adding {\em intermediate rewards} to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state.

1 code implementation • NeurIPS 2021 • Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell

As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies.

no code implementations • 7 Jun 2021 • Jinhyun So, Ramy E. Ali, Basak Guler, Jiantao Jiao, Salman Avestimehr

In fact, we show that conventional random user selection strategies in FL leak users' individual models within a number of rounds that is linear in the number of users.

no code implementations • NeurIPS 2021 • Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

Based on the composition of the offline dataset, two main categories of methods are used: imitation learning which is suitable for expert datasets and vanilla offline RL which often requires uniform coverage datasets.

no code implementations • 25 Feb 2021 • Nived Rajaraman, Yanjun Han, Lin F. Yang, Kannan Ramchandran, Jiantao Jiao

We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al. (2020), which we prove to be computationally efficient.

no code implementations • 19 Jan 2021 • Cong Ma, Banghua Zhu, Jiantao Jiao, Martin J. Wainwright

Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies.

no code implementations • 12 Jan 2021 • Matt Peng, Banghua Zhu, Jiantao Jiao

This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that is able to extrapolate well to out-of-distribution tasks without needing to reuse data from training, and to adapt almost instantaneously using only a few samples at test time.

1 code implementation • NeurIPS 2020 • Paria Rashidinejad, Jiantao Jiao, Stuart Russell

Our theoretical and experimental results shed light on the conditions required for efficient probably approximately correct (PAC) learning of the Kalman filter from partially observed data.

no code implementations • NeurIPS 2020 • Nived Rajaraman, Lin F. Yang, Jiantao Jiao, Kannan Ramchandran

Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy.
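The "mimic the expert whenever possible" policy is tabular behavior cloning: replay the expert's action at any state the expert dataset covers, and act arbitrarily elsewhere. A sketch of one simple deterministic variant (the function name and majority-action tie-breaking are illustrative; for a stochastic expert one would replay the empirical action distribution instead):

```python
from collections import Counter, defaultdict

def mimic_expert_policy(expert_trajectories, default_action=0):
    """Tabular behavior cloning: at states the expert visited, replay its
    (majority) action; at unseen states, fall back to an arbitrary default."""
    counts = defaultdict(Counter)
    for traj in expert_trajectories:
        for state, action in traj:
            counts[state][action] += 1
    # Membership is checked before indexing so the defaultdict is not mutated.
    return lambda s: counts[s].most_common(1)[0][0] if s in counts else default_action
```

The suboptimality bound quoted above captures exactly the cost of the "unseen state" branch: with $N$ expert trajectories, the learner rarely encounters states where it must fall back.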

no code implementations • 28 May 2020 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients".

no code implementations • 21 Jan 2020 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We show that under TV corruptions, the breakdown point reduces to 1/4 for the same set of distributions.

no code implementations • 19 Sep 2019 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

This generalizes a property called resilience previously employed in the special case of mean estimation with outliers.

no code implementations • 27 Jan 2019 • Banghua Zhu, Jiantao Jiao, David Tse

Generalization: given a population target of GANs, we design a systematic principle, projection under admissible distance, to design GANs to meet the population requirement using finite samples.

8 code implementations • 24 Jan 2019 • Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.

Ranked #3 on Adversarial Attack on CIFAR-10

no code implementations • ICLR 2019 • Hongyang Zhang, Susu Xu, Jiantao Jiao, Pengtao Xie, Ruslan Salakhutdinov, Eric P. Xing

In this work, we give new results on the benefits of multi-generator architecture of GANs.

no code implementations • 23 Feb 2018 • Yanjun Han, Jiantao Jiao, Tsachy Weissman

We present \emph{Local Moment Matching (LMM)}, a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance.

no code implementations • NeurIPS 2018 • Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, Tiancheng Yu

For estimating the Shannon entropy of a distribution on $S$ elements with independent samples, [Paninski2004] showed that the sample complexity is sublinear in $S$, and [Valiant--Valiant2011] showed that consistent estimation of Shannon entropy is possible if and only if the sample size $n$ far exceeds $\frac{S}{\log S}$.
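The baseline these sample-complexity results improve upon is the plug-in (MLE) estimator, which simply evaluates entropy at the empirical distribution and needs $n \gg S$ samples, versus $n \gg S/\log S$ for minimax-optimal estimators. A sketch of the plug-in estimator:

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Plug-in (MLE) Shannon entropy estimate in nats: H(p_hat).

    This naive estimator is consistent only when n >> S; the estimators
    discussed above achieve consistency already at n >> S / log S.
    """
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

On a balanced two-symbol sample it returns $\log 2$ exactly; its well-known downward bias appears once the support size is comparable to the sample size.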

no code implementations • 19 Dec 2017 • Dmitri S. Pavlichin, Jiantao Jiao, Tsachy Weissman

We propose an efficient algorithm for approximate computation of the profile maximum likelihood (PML), a variant of maximum likelihood maximizing the probability of observing a sufficient statistic rather than the empirical sample.

no code implementations • NeurIPS 2018 • Jiantao Jiao, Weihao Gao, Yanjun Han

We analyze the Kozachenko--Leonenko (KL) nearest neighbor estimator for the differential entropy.

no code implementations • 11 Oct 2017 • Yanjun Han, Jiantao Jiao, Rajarshi Mukherjee

We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r\ge 1$) of the mean in Gaussian white noise model over Nikolskii-Besov spaces.

no code implementations • 18 Sep 2017 • Jiantao Jiao, Yanjun Han

We analyze bias correction methods using jackknife, bootstrap, and Taylor series.
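Of the three, the jackknife is the easiest to state: it cancels the $O(1/n)$ bias term by combining the full-sample estimate with leave-one-out estimates. A generic sketch (not the paper's analysis, just the classical construction):

```python
def jackknife_correct(estimator, data):
    """Jackknife bias-corrected estimate:
    n * T(full sample) - (n - 1) * mean of leave-one-out estimates.
    Removes the O(1/n) term of the estimator's bias expansion."""
    n = len(data)
    full = estimator(data)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * sum(loo) / n
```

A classical check: applied to the biased plug-in variance (divide by $n$), whose bias is exactly $-\sigma^2/n$, the jackknife recovers the unbiased sample variance (divide by $n-1$) exactly.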

no code implementations • 5 Jul 2017 • Jiantao Jiao, Yanjun Han, Irena Fischer-Hwang, Tsachy Weissman

We show through case studies that it is easier to estimate the fundamental limits of data processing than to construct explicit algorithms to achieve those limits.

no code implementations • 3 Nov 2016 • Sihan Li, Jiantao Jiao, Yanjun Han, Tsachy Weissman

We show that with or without nonlinearities, by adding shortcuts that have depth two, the condition number of the Hessian of the loss function at the zero initial point is depth-invariant, which makes training very deep models no more difficult than shallow ones.

no code implementations • 26 Sep 2014 • Jiantao Jiao, Kartik Venkat, Yanjun Han, Tsachy Weissman

In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with $n$ samples is comparable to that of the MLE with $n \ln n$ samples.

3 code implementations • 11 Jan 2012 • Jiantao Jiao, Haim H. Permuter, Lei Zhao, Young-Han Kim, Tsachy Weissman

Four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments.

Information Theory
