no code implementations • 12 Feb 2024 • Benjamin Plaut, Hanlin Zhu, Stuart Russell
Specifically, we assume that the payoff in each round represents the chance of avoiding catastrophe in that round, and we try to maximize the product of payoffs (the overall chance of avoiding catastrophe).
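In symbols, with p_t in [0,1] denoting the round-t chance of avoiding catastrophe (notation chosen here for illustration; the paper may define these quantities differently), the objective is multiplicative rather than additive:

```latex
% Multiplicative objective: the overall chance of avoiding catastrophe.
% Pointwise, \prod_t p_t = \exp(\sum_t \log p_t), so both forms agree.
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\prod_{t=1}^{T} p_t\right]
\;=\;
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\exp\!\Big(\textstyle\sum_{t=1}^{T} \log p_t\Big)\right]
```

The log form makes the contrast with standard additive-reward RL visible: a single near-zero p_t drives the whole product toward zero, no matter how large the other payoffs are.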
no code implementations • 2 Feb 2024 • Hanlin Zhu, Banghua Zhu, Jiantao Jiao
In this paper, we aim to improve the inference efficiency of LLMs by prompt caching, i.e., if the current prompt can be answered with the same response as a previous prompt, one can reuse that previous response without calling the LLM.
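A minimal sketch of such a cache, assuming a hypothetical `embed` function that maps prompts to unit vectors and an illustrative cosine-similarity threshold (the paper's actual matching criterion may differ):

```python
import numpy as np

class PromptCache:
    """Illustrative prompt cache: reuse a stored response when a new
    prompt is close enough to a previously seen one. The embed()
    function and the 0.95 threshold are placeholder assumptions."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # maps a prompt string to a unit vector
        self.threshold = threshold
        self.entries = []           # list of (embedding, response) pairs

    def lookup(self, prompt):
        q = self.embed(prompt)
        for e, response in self.entries:
            if float(np.dot(q, e)) >= self.threshold:
                return response     # cache hit: skip the LLM call
        return None

    def insert(self, prompt, response):
        self.entries.append((self.embed(prompt), response))
```

On a hit, the stored response is returned directly; only on a miss does the system call the LLM and insert the new prompt-response pair.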
no code implementations • 13 Dec 2023 • Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao
Key to our formulation is a coupling of the output tokens and the rejection region, realized in practice by pseudo-random generators, which allows non-trivial trade-offs between the Type I and Type II errors.
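One standard pseudo-random construction in this spirit (an illustrative sketch, not necessarily the paper's exact scheme) seeds a PRG with the recent context, samples the next token through the exponential-minimum trick so the output distribution is unchanged, and lets the detector recompute the same pseudo-randomness:

```python
import hashlib
import numpy as np

def seeded_uniforms(context_tokens, vocab_size):
    """Pseudo-random uniforms determined by the recent context,
    shared between generator and detector."""
    digest = hashlib.sha256(str(context_tokens).encode()).hexdigest()
    seed = int(digest, 16) % (2**32)
    return np.random.default_rng(seed).random(vocab_size)

def watermarked_sample(probs, context_tokens):
    """Exponential-minimum trick: argmax_i r_i^(1/p_i) is distributed
    exactly according to probs, so text quality is preserved while
    the choice is coupled to the PRG output. The caller passes the
    last few generated tokens as context."""
    r = seeded_uniforms(context_tokens, len(probs))
    return int(np.argmax(r ** (1.0 / np.maximum(probs, 1e-12))))

def detection_score(tokens, vocab_size, context_len=4):
    """Detector recomputes the uniforms; watermarked tokens tend to
    have r near 1, inflating the score (the rejection region is
    'score above a threshold')."""
    score = 0.0
    for i in range(context_len, len(tokens)):
        r = seeded_uniforms(tokens[i - context_len:i], vocab_size)
        score += -np.log(1.0 - r[tokens[i]] + 1e-12)
    return score
```

Under non-watermarked text each detection term is roughly Exp(1), while watermarked tokens bias r toward 1 and inflate the score; moving the score threshold trades the Type I error against the Type II error.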
no code implementations • 13 Oct 2023 • Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian
Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words.
no code implementations • 5 Oct 2023 • Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian
We further develop a personalized story evaluation model PERSE to infer reviewer preferences and provide a personalized evaluation.
no code implementations • 3 Oct 2023 • Hanlin Zhu, Baihe Huang, Stuart Russell
To the best of our knowledge, this work is the first to study the circuit complexity of RL, which also provides a rigorous framework for future research.
no code implementations • 22 Feb 2023 • Hanlin Zhu, Ruosong Wang, Jason D. Lee
Value function approximation is important in modern reinforcement learning (RL) problems especially when the state space is (infinitely) large.
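As a generic illustration of the idea (not the paper's construction), linear function approximation replaces a per-state value table, which is infeasible for large state spaces, with a weight vector fit over features; the feature map `phi` below is a placeholder:

```python
import numpy as np

def td0_linear(phi, transitions, dim, gamma=0.99, lr=0.05, epochs=10):
    """TD(0) with linear value-function approximation V(s) ~ w . phi(s),
    fit over a fixed batch of (state, reward, next_state) transitions.
    phi maps a state to a length-dim feature vector."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for s, r, s_next in transitions:
            # Temporal-difference error against the one-step bootstrap target.
            td_error = r + gamma * (phi(s_next) @ w) - (phi(s) @ w)
            w += lr * td_error * phi(s)
    return w
```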
1 code implementation • NeurIPS 2023 • Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.
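The quantity in the algorithm's name can be sketched as follows; this is only the central regularizer term under simplified assumptions (single-sample backups, given next actions), not the full A-Crab algorithm with its actor updates and pessimism:

```python
import torch

def average_bellman_error(q, target_q, batch, gamma=0.99):
    """|E[Bellman residual]|: residuals are averaged over the data
    distribution *before* taking the magnitude, in contrast to the
    usual squared error E[residual^2]. q and target_q map (state,
    action) tensors to Q-value tensors."""
    s, a, r, s_next, a_next = batch
    residual = q(s, a) - (r + gamma * target_q(s_next, a_next))
    return residual.mean().abs()
```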
no code implementations • 1 Nov 2022 • Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao
Offline reinforcement learning (RL), which refers to decision-making from a previously collected dataset of interactions, has received significant attention in recent years.
no code implementations • 3 Jul 2021 • Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu
Our motivation is to understand the statistical-computational trade-offs in streaming, sketching, and query-based models.
no code implementations • 1 Jan 2021 • Hanlin Zhu, Chengyang Ying, Song Zuo
Recent theoretical analysis suggests that ultra-wide neural networks always converge to global minima near the initialization under first-order methods.
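The standard lazy-training picture behind such results, written here for reference rather than as the paper's exact statement: for sufficiently wide networks the parameters barely move during training, so the model is well approximated by its linearization at initialization,

```latex
% First-order expansion around the initialization \theta_0.
% In the ultra-wide regime \theta stays close to \theta_0 throughout
% training, so the (nonlinear) network behaves like this linear model.
f(x;\theta) \;\approx\; f(x;\theta_0)
  \;+\; \nabla_{\theta} f(x;\theta_0)^{\top}(\theta - \theta_0)
```

and training a linear model with a fixed (neural tangent) kernel converges to a global minimum under gradient descent.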
no code implementations • 24 Jun 2020 • Cyrus Rashtchian, David P. Woodruff, Hanlin Zhu
We consider the general problem of learning about a matrix through vector-matrix-vector queries.
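A query here returns only the scalar u^T M v. A minimal sketch, with a toy example showing that n^2 standard-basis queries trivially recover M entrywise, so the interest lies in doing better for specific properties of M:

```python
import numpy as np

def vmv_query(M, u, v):
    """One vector-matrix-vector query: the learner observes only the
    scalar u^T M v, never M itself."""
    return u @ M @ v

# Toy example: entry M[i, j] is recoverable with standard-basis
# queries e_i^T M e_j, so n^2 queries always suffice.
n = 4
M = np.arange(n * n, dtype=float).reshape(n, n)
e = np.eye(n)
assert vmv_query(M, e[1], e[2]) == M[1, 2]
```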
no code implementations • 19 Mar 2020 • Hanlin Zhu, Xue Li, Liuyang Sun, Fei He, Zhengtuo Zhao, Lan Luan, Ngoc Mai Tran, Chong Xie
Across many areas, from neural tracking to database entity resolution, manual assessment of clusters by human experts presents a bottleneck to the rapid development of scalable and specialized clustering methods.
1 code implementation • IJCNLP 2019 • Ryuichi Takanobu, Hanlin Zhu, Minlie Huang
Many studies apply Reinforcement Learning to learn a dialog policy with a reward function that requires elaborate design and pre-specified user goals.