Search Results for author: Yihan Du

Found 16 papers, 0 papers with code

Combinatorial Pure Exploration for Dueling Bandit

no code implementations ICML 2020 Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao

For the Borda winner, we establish a reduction of the problem to the original CPE-MAB setting, and design PAC and exact algorithms that achieve both sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round.
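
For orientation, the Borda winner mentioned above is the item with the highest Borda score, i.e., its average probability of winning a pairwise duel. Under standard dueling-bandit notation (assumed here; not given in the snippet), with $p_{i,j}$ the probability that item $i$ beats item $j$ among $n$ items, the Borda score is $B(i) = \frac{1}{n-1} \sum_{j \neq i} p_{i,j}$. Dueling $i$ against a uniformly random opponent yields a Bernoulli observation with mean $B(i)$, which is what makes a reduction to a standard (non-dueling) CPE-MAB instance possible.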

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

no code implementations15 Feb 2024 Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

In PO-RLHF, the reward function is not assumed to be known; instead, the algorithm relies on trajectory-based comparison feedback to infer it.
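
To make trajectory-based comparison feedback concrete: a common way to infer a reward function from pairwise trajectory preferences is a Bradley-Terry-style logistic model over trajectory returns. The sketch below is a minimal illustration of that idea with a linear reward parameterization; the function name, the linear form, and the plain gradient ascent are assumptions for exposition, not details taken from the paper.

    import numpy as np

    def fit_reward_from_comparisons(feats_a, feats_b, prefs, lr=0.1, iters=500):
        """Fit a linear reward r(tau) = phi(tau) @ theta from comparisons.

        feats_a, feats_b: (m, d) arrays of trajectory features phi(tau).
        prefs: (m,) array with 1 if trajectory a was preferred, else 0.
        Models P(a preferred) = sigmoid(r(a) - r(b)) and maximizes the
        log-likelihood by gradient ascent.
        """
        theta = np.zeros(feats_a.shape[1])
        for _ in range(iters):
            diff = feats_a - feats_b                  # (m, d)
            p = 1.0 / (1.0 + np.exp(-diff @ theta))   # P(a preferred)
            theta += lr * diff.T @ (prefs - p) / len(prefs)
        return theta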

Cascading Reinforcement Learning

no code implementations17 Jan 2024 Yihan Du, R. Srikant, Wei Chen

In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items (called an item list) from a pool of items, each associated with an unknown attraction probability.
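
As a concrete picture of the feedback, the standard cascade assumption is that the user scans the recommended list top-down and clicks the first item that attracts them, ending the round. A minimal simulation of one timestep under that assumption (the names and the Bernoulli attraction model are illustrative, not taken from the paper):

    import random

    def cascade_round(item_list, attraction_prob):
        """Simulate one timestep of cascading feedback.

        item_list: ordered list of recommended item ids.
        attraction_prob: item id -> attraction probability (known to the
            simulator only; the learner must estimate it from clicks).
        Returns the position of the clicked item, or None if the user
        examined the whole list without clicking.
        """
        for pos, item in enumerate(item_list):
            if random.random() < attraction_prob[item]:
                return pos   # click observed; scanning stops here
        return None          # no click; every item was examined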

Recommendation Systems · reinforcement-learning

Multi-task Representation Learning for Pure Exploration in Linear Bandits

no code implementations9 Feb 2023 Yihan Du, Longbo Huang, Wen Sun

In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.
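
In symbols, the shared-representation assumption is commonly written $\theta_m = B w_m$ for each task $m$, where $B \in \mathbb{R}^{d \times k}$ is a feature matrix common to all tasks with $k \ll d$, and $w_m \in \mathbb{R}^k$ is task-specific (this notation is assumed for illustration). Pooling samples across tasks to estimate $B$ reduces each task's effective dimension from $d$ to $k$, which is where the acceleration comes from.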

Decision Making · Representation Learning

Dueling Bandits: From Two-dueling to Multi-dueling

no code implementations16 Nov 2022 Yihan Du, Siwei Wang, Longbo Huang

DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$.

Branching Reinforcement Learning

no code implementations16 Feb 2022 Yihan Du, Wei Chen

In this paper, we propose a novel Branching Reinforcement Learning (Branching RL) model, and investigate both Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model.

LEMMA · Recommendation Systems +2

Collaborative Pure Exploration in Kernel Bandit

no code implementations29 Oct 2021 Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang

In this paper, we formulate the Collaborative Pure Exploration in Kernel Bandit (CoPE-KB) problem, which provides a novel model for multi-agent multi-task decision making under limited communication and general reward functions, and is applicable to many online learning tasks, e.g., recommendation systems and network scheduling.

Decision Making · Recommendation Systems +1

Combinatorial Pure Exploration with Bottleneck Reward Function

no code implementations NeurIPS 2021 Yihan Du, Yuko Kuroki, Wei Chen

For the FC setting, we propose novel algorithms with optimal sample complexity for a broad family of instances and establish a matching lower bound to demonstrate the optimality (within a logarithmic factor).
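
For reference, a bottleneck reward values a super arm by its weakest base arm: with base-arm weights $w(e)$, a super arm $S$ earns $\min_{e \in S} w(e)$ (notation assumed here). For example, the throughput of a network path is determined by the capacity of its most congested link.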

Continuous Mean-Covariance Bandits

no code implementations NeurIPS 2021 Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact the learning performance.
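
In a mean-covariance formulation, risk is typically traded off against expected reward through an objective of the form $w^{\top} \theta - \rho\, w^{\top} \Sigma w$, where $w$ is a weight vector over options, $\theta$ the mean vector, $\Sigma$ the covariance matrix, and $\rho$ a risk-aversion parameter (this particular mean-variance form is an illustrative assumption). The $w^{\top} \Sigma w$ term is exactly where option correlation enters the learning problem.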

Decision Making

A One-Size-Fits-All Solution to Conservative Bandit Problems

no code implementations14 Dec 2020 Yihan Du, Siwei Wang, Longbo Huang

In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as that of a given baseline at any time.
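
Formally, a sample-path constraint of this kind is usually written $\sum_{s=1}^{t} r_s \ge (1-\alpha) \sum_{s=1}^{t} r_s^{b}$ for all $t$, where $r_s^{b}$ is the baseline's reward at time $s$ and $\alpha \in (0, 1)$ is the tolerated shortfall (notation assumed here): the learner may only explore as far as its accumulated surplus over the $(1-\alpha)$-scaled baseline allows.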

Multi-Armed Bandits

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

no code implementations14 Jun 2020 Yihan Du, Yuko Kuroki, Wei Chen

In this paper, we first study the problem of combinatorial pure exploration with full-bandit feedback (CPE-BL), where a learner is given a combinatorial action space $\mathcal{X} \subseteq \{0, 1\}^d$, and in each round pulls an action $x \in \mathcal{X}$ and receives a random reward with expectation $x^{\top} \theta$, where $\theta \in \mathbb{R}^d$ is a latent, unknown environment vector.
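
The full-bandit feedback described here is easy to simulate: per pull, the learner observes a single noisy scalar with mean $x^{\top} \theta$, never per-coordinate rewards. A minimal sketch of one round (the Gaussian noise and all names are assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    d = 5
    theta = rng.normal(size=d)        # latent environment vector (unknown)

    def pull(x, noise_std=1.0):
        """Full-bandit feedback: one noisy scalar with mean x @ theta."""
        return x @ theta + rng.normal(scale=noise_std)

    # Pull the action that selects coordinates {0, 2}; only the
    # aggregate reward is observed, not each coordinate's contribution.
    x = np.array([1, 0, 1, 0, 0], dtype=float)
    reward = pull(x)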

Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation

no code implementations7 Feb 2020 Yihan Du, Yan Yan, Si Chen, Yang Hua

This strategy efficiently filters out irrelevant proposals and avoids redundant feature-extraction computation, which enables our method to operate faster than conventional classification-based tracking methods.

Computational Efficiency · Data Augmentation +3

Direct Object Recognition Without Line-of-Sight Using Optical Coherence

no code implementations CVPR 2019 Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu

Visual object recognition when the direct line of sight is blocked, such as when the object is occluded around a corner, is of practical importance in a wide range of applications.

Object · Object Recognition
