Search Results for author: Hanlin Zhu

Found 14 papers, 2 papers with code

Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog

1 code implementation • IJCNLP 2019 • Ryuichi Takanobu, Hanlin Zhu, Minlie Huang

Many studies apply reinforcement learning to learn a dialog policy with a reward function that requires elaborate design and pre-specified user goals.

reinforcement-learning • Reinforcement Learning (RL)

Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

no code implementations • 19 Mar 2020 • Hanlin Zhu, Xue Li, Liuyang Sun, Fei He, Zhengtuo Zhao, Lan Luan, Ngoc Mai Tran, Chong Xie

Across many areas, from neural tracking to database entity resolution, manual assessment of clusters by human experts presents a bottleneck in rapid development of scalable and specialized clustering methods.

Clustering • Entity Resolution • +2

Vector-Matrix-Vector Queries for Solving Linear Algebra, Statistics, and Graph Problems

no code implementations • 24 Jun 2020 • Cyrus Rashtchian, David P. Woodruff, Hanlin Zhu

We consider the general problem of learning about a matrix through vector-matrix-vector queries.
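To make the query model concrete, here is a minimal sketch (not from the paper; the oracle wrapper and the trace-estimation task are illustrative assumptions). The hidden matrix is accessible only through queries of the form u^T M v, and a Hutchinson-style estimator recovers its trace from random sign queries with u = v:

```python
import numpy as np

def make_vmv_oracle(M):
    """Wrap a hidden matrix as a black box answering queries u^T M v."""
    return lambda u, v: u @ M @ v

rng = np.random.default_rng(0)
n = 100
M = rng.standard_normal((n, n))   # hidden: only the oracle may touch it
query = make_vmv_oracle(M)

# Hutchinson's estimator: E[v^T M v] = tr(M) for random sign vectors v,
# so averaging u = v queries estimates the trace without reading M.
estimates = []
for _ in range(2000):
    v = rng.choice([-1.0, 1.0], size=n)
    estimates.append(query(v, v))
print(np.mean(estimates), np.trace(M))
```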

Analysis of Alignment Phenomenon in Simple Teacher-student Networks with Finite Width

no code implementations • 1 Jan 2021 • Hanlin Zhu, Chengyang Ying, Song Zuo

Recent theoretical analysis suggests that ultra-wide neural networks always converge to global minima near the initialization under first-order methods.

Average-Case Communication Complexity of Statistical Problems

no code implementations • 3 Jul 2021 • Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu

Our motivation is to understand the statistical-computational trade-offs in streaming, sketching, and query-based models.

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

no code implementations • 1 Nov 2022 • Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao

Offline reinforcement learning (RL), which refers to decision-making from a previously collected dataset of interactions, has received significant attention in recent years.

Decision Making • Offline RL • +2

Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

1 code implementation • NeurIPS 2023 • Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao

We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.

reinforcement-learning • Reinforcement Learning (RL)
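For intuition only, here is a heavily simplified sketch of that idea, not the paper's algorithm (A-Crab also involves importance weighting and a minimax actor-critic formulation; the PyTorch-style `actor`, `critic`, and batch dict below are assumed placeholders). The critic objective combines pessimism with an average, rather than squared, Bellman error penalty:

```python
import torch

def pessimistic_critic_loss(critic, actor, batch, gamma=0.99, beta=1.0):
    """Schematic critic objective: pessimism + average Bellman error penalty.

    `critic(s, a)` and `actor(s)` are hypothetical callables, not the paper's code.
    """
    s, a, r, s_next = batch["obs"], batch["act"], batch["rew"], batch["next_obs"]
    q = critic(s, a)
    with torch.no_grad():
        target = r + gamma * critic(s_next, actor(s_next))
    # Averaging the signed residual before taking its magnitude is what
    # distinguishes an *average* Bellman error from the usual squared TD loss.
    avg_bellman_error = (q - target).mean().abs()
    # Pessimism: push the critic's value down on the actor's own actions.
    return critic(s, actor(s)).mean() + beta * avg_bellman_error
```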

Provably Efficient Reinforcement Learning via Surprise Bound

no code implementations • 22 Feb 2023 • Hanlin Zhu, Ruosong Wang, Jason D. Lee

Value function approximation is important in modern reinforcement learning (RL) problems, especially when the state space is (infinitely) large.

reinforcement-learning • Reinforcement Learning (RL)

On Representation Complexity of Model-based and Model-free Reinforcement Learning

no code implementations • 3 Oct 2023 • Hanlin Zhu, Baihe Huang, Stuart Russell

To the best of our knowledge, this work is the first to study the circuit complexity of RL, which also provides a rigorous framework for future research.

reinforcement-learning • Reinforcement Learning (RL)

Learning Personalized Story Evaluation

no code implementations • 5 Oct 2023 • Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian

We further develop a personalized story evaluation model, PERSE, to infer reviewer preferences and provide a personalized evaluation.

Retrieval • Text Generation

End-to-end Story Plot Generator

no code implementations • 13 Oct 2023 • Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian

Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words.

Blocking

Towards Optimal Statistical Watermarking

no code implementations • 13 Dec 2023 • Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between Type I and Type II errors.
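The coupling idea can be illustrated with a toy sketch using standard ingredients from the watermarking literature, not the paper's construction (`KEY`, the hash-based PRG, and the strictly positive `probs` vector are illustrative assumptions). A keyed hash of the context seeds per-token uniforms, and a Gumbel-max-style rule picks the token, so sampling still follows the model's distribution while remaining reproducible by anyone who holds the key:

```python
import hashlib
import numpy as np

KEY = b"secret"  # shared between the generator and the detector

def prg_uniforms(key, context, vocab_size):
    """Derive deterministic per-token uniforms from a keyed hash of the context."""
    seed = int.from_bytes(hashlib.sha256(key + context.encode()).digest()[:8], "big")
    return np.random.default_rng(seed).random(vocab_size)

def sample_token(probs, context):
    """Coupled sampling: argmax_i u_i**(1/p_i) returns token i with
    probability p_i (assumes all entries of probs are > 0)."""
    u = prg_uniforms(KEY, context, len(probs))
    return int(np.argmax(u ** (1.0 / np.asarray(probs))))
```

A detector that knows `KEY` can recompute the uniforms for each context and flag text whose chosen tokens have systematically large u-values; moving that decision threshold trades Type I error against Type II error, as in the abstract above.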

Efficient Prompt Caching via Embedding Similarity

no code implementations • 2 Feb 2024 • Hanlin Zhu, Banghua Zhu, Jiantao Jiao

In this paper, we aim to improve the inference efficiency of LLMs via prompt caching, i.e., if the current prompt can be answered with the same response as a previous prompt, one can directly reuse that previous response without calling the LLM.

Question Answering
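Here is a minimal sketch of such a cache (illustrative only: `embed` and `call_llm` are hypothetical stand-ins for an embedding model and an LLM API, and the 0.95 cosine-similarity threshold is an arbitrary choice, not a value from the paper):

```python
import numpy as np

class PromptCache:
    """Reuse a cached LLM response when a new prompt embeds close to an old one."""

    def __init__(self, embed, call_llm, threshold=0.95):
        self.embed, self.call_llm, self.threshold = embed, call_llm, threshold
        self.keys, self.responses = [], []

    def query(self, prompt):
        e = self.embed(prompt)
        e = e / np.linalg.norm(e)                   # unit-normalize for cosine
        if self.keys:
            sims = np.stack(self.keys) @ e          # cosine similarities
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:        # cache hit: reuse response
                return self.responses[best]
        response = self.call_llm(prompt)            # cache miss: call the LLM
        self.keys.append(e)
        self.responses.append(response)
        return response
```

Lowering the threshold raises the hit rate but risks serving a cached response to a prompt that only superficially resembles the earlier one.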

Avoiding Catastrophe in Continuous Spaces by Asking for Help

no code implementations • 12 Feb 2024 • Benjamin Plaut, Hanlin Zhu, Stuart Russell

Specifically, we assume that the payoff each round represents the chance of avoiding catastrophe that round, and try to maximize the product of payoffs (the overall chance of avoiding catastrophe).
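A tiny worked example of that objective, with made-up numbers: if the per-round payoffs (chances of avoiding catastrophe) are p_1, ..., p_T, the overall chance of avoiding catastrophe is their product, and maximizing the product is the same as maximizing the sum of log-payoffs.

```python
import math

# Illustrative per-round chances of avoiding catastrophe (made-up values).
p = [0.99, 0.95, 0.999, 0.97]

overall = math.prod(p)              # overall chance of avoiding catastrophe
print(overall)                      # ~0.911

# Since log is monotone, maximizing the product is equivalent to
# maximizing the sum of log-payoffs.
assert abs(math.log(overall) - sum(math.log(x) for x in p)) < 1e-12
```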
