Search Results for author: Zheng Wen

Found 66 papers, 9 papers with code

A Tutorial on Thompson Sampling

2 code implementations 7 Jul 2017 Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Active Learning Product Recommendation
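
To make the exploit/explore balance described in this snippet concrete, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with uniform Beta(1, 1) priors. The function name `thompson_bernoulli` and the simulated arm means are hypothetical illustrations; the tutorial itself covers far more general settings.

```python
import random

def thompson_bernoulli(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (sketch).

    true_means are the hidden success probabilities of the arms; they are
    used only to simulate rewards, never by the decision rule.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1.0] * n_arms  # 1 + number of observed successes per arm
    beta = [1.0] * n_arms   # 1 + number of observed failures per arm
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible mean per arm from its Beta posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # ...and act greedily with respect to that draw.
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

Sampling from the posterior, rather than acting on the posterior mean, is what keeps uncertain arms in play: an arm is tried in proportion to the probability that it is the best one.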

Epistemic Neural Networks

1 code implementation NeurIPS 2023 Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

We introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty.

Approximate Thompson Sampling via Epistemic Neural Networks

1 code implementation 18 Feb 2023 Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

Further, we demonstrate that the epinet, a small additive network that estimates uncertainty, matches the performance of large ensembles at orders of magnitude lower computational cost.

Thompson Sampling

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

1 code implementation NeurIPS 2017 Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it.

Cascading Bandits for Large-Scale Recommendation Problems

1 code implementation 17 Mar 2016 Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, Branislav Kveton

In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend $K$ most attractive items from a large set of $L$ candidate items.

Multi-Armed Bandits Recommendation Systems
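
The cascade feedback structure in this entry can be sketched with a CascadeUCB1-style learner: rank the K items with the highest optimistic estimates, let a simulated user click the first attractive item, and update only the items the user actually examined. The confidence radius `sqrt(1.5 log t / T)` is an assumption modeled on UCB1-type indices, and the function name and attraction probabilities are hypothetical; this is not the paper's reference code.

```python
import math
import random

def cascade_ucb1(attraction, K, n_rounds, seed=0):
    """CascadeUCB1-style learner on a simulated cascade model (sketch).

    attraction[e] is the hidden probability that item e attracts the user;
    it is used only to simulate clicks. Assumes 1 <= K <= len(attraction).
    """
    rng = random.Random(seed)
    L = len(attraction)
    # Initialisation: observe one attraction sample for every item.
    counts = [1] * L
    means = [1.0 if rng.random() < attraction[e] else 0.0 for e in range(L)]
    clicks = 0
    for t in range(1, n_rounds + 1):
        # Optimistic index: empirical mean plus a confidence radius.
        ucb = [means[e] + math.sqrt(1.5 * math.log(t) / counts[e]) for e in range(L)]
        ranked = sorted(range(L), key=lambda e: ucb[e], reverse=True)[:K]
        # Cascade model: the user scans the list top-down, clicks the first
        # attractive item, and stops.
        click_pos = None
        for pos, e in enumerate(ranked):
            if rng.random() < attraction[e]:
                click_pos = pos
                break
        # Items above the click are observed as unattractive, the clicked item
        # as attractive; items below the click are never examined.
        last = K - 1 if click_pos is None else click_pos
        for pos in range(last + 1):
            e = ranked[pos]
            w = 1.0 if pos == click_pos else 0.0
            counts[e] += 1
            means[e] += (w - means[e]) / counts[e]
        if click_pos is not None:
            clicks += 1
    return means, counts, clicks
```

For example, `cascade_ucb1([0.6, 0.5] + [0.05] * 8, K=2, n_rounds=3000)` should concentrate its examinations on the two attractive items. The partial feedback is the key design point: a click at position k reveals nothing about the items below it.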

Generalization and Exploration via Randomized Value Functions

1 code implementation 4 Feb 2014 Ian Osband, Benjamin Van Roy, Zheng Wen

We propose randomized least-squares value iteration (RLSVI), a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions.

Efficient Exploration reinforcement-learning

Offline Evaluation of Ranking Policies with Click Models

no code implementations 27 Apr 2018 Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.

Recommendation Systems

Deep Exploration via Randomized Value Functions

no code implementations 22 Mar 2017 Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration reinforcement-learning

Conservative Exploration using Interleaving

no code implementations 3 Jun 2018 Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru

In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, that is, an action significantly worse than the default production action.

Model-Independent Online Learning for Influence Maximization

no code implementations ICML 2017 Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt

We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.
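
To make the influence maximization objective concrete, the sketch below simulates the independent cascade (IC) model named in the NeurIPS 2017 entry above and picks seeds with the classic greedy heuristic over Monte Carlo spread estimates. It assumes the edge activation probabilities are known, which is precisely what the online papers in this list do not assume; the adjacency format and the function names `ic_spread` and `greedy_seeds` are hypothetical.

```python
import random

def ic_spread(graph, seeds, rng):
    """One Monte Carlo run of the independent cascade (IC) model.

    graph maps node -> list of (neighbor, activation_probability).
    Each newly activated node gets a single chance to activate each
    still-inactive out-neighbor. Returns the number of activated nodes.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)

def greedy_seeds(graph, nodes, k, n_sims=200, seed=0):
    """Greedily add the node with the largest estimated spread, k times."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_val = None, -1.0
        for v in nodes:
            if v in chosen:
                continue
            # Average spread of the current seed set plus candidate v.
            val = sum(ic_spread(graph, chosen + [v], rng) for _ in range(n_sims)) / n_sims
            if val > best_val:
                best, best_val = v, val
        chosen.append(best)
    return chosen
```

With all edge probabilities set to 1.0 the spread becomes the deterministic reachable set, which is a convenient sanity check. The online variants replace the known probabilities with estimates learned from observed cascades.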

Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

no code implementations 11 Feb 2018 Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

The multi-armed bandit (MAB) is a class of online learning problems in which a learning agent aims to maximize its expected cumulative reward while repeatedly choosing which arm to pull, each arm having an unknown reward distribution.

Change Detection

Posterior Sampling for Large Scale Reinforcement Learning

no code implementations 21 Nov 2017 Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis

Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity.

reinforcement-learning Reinforcement Learning (RL)

Does Weather Matter? Causal Analysis of TV Logs

no code implementations 25 Jan 2017 Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen

To the best of our knowledge, this is the first large-scale causal study of the impact of weather on TV watching patterns.

BIG-bench Machine Learning

Bernoulli Rank-1 Bandits for Click Feedback

no code implementations 19 Mar 2017 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The probability that a user will click a search result depends both on its relevance and its position on the results page.

Position
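
The relevance-times-position structure behind this entry and the next can be sketched as a rank-1 matrix, assuming the product form P(click on item i at position j) = u[i] * v[j], where u is a hypothetical item attraction vector and v a hypothetical position examination vector. The learner in a rank-1 bandit sees only Bernoulli clicks drawn from this matrix, never u or v themselves; the helper names below are illustrative only.

```python
def click_matrix(u, v):
    """Rank-1 click model: entry (i, j) is u[i] * v[j].

    u[i]: probability that item i is attractive (hypothetical values).
    v[j]: probability that position j is examined (hypothetical values).
    """
    return [[ui * vj for vj in v] for ui in u]

def best_pair(u, v):
    # With nonnegative factors, the largest entry of u v^T lies at the
    # best row and the best column, so the problem decouples.
    i = max(range(len(u)), key=lambda a: u[a])
    j = max(range(len(v)), key=lambda b: v[b])
    return i, j
```

The decoupling is what makes the problem tractable: although there are len(u) * len(v) row-column pairs, only len(u) + len(v) unknowns determine every click probability.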

Stochastic Rank-1 Bandits

no code implementations 10 Aug 2016 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The main challenge of the problem is that the individual values of the row and column are unobserved.

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

no code implementations 28 Jun 2014 Zheng Wen, Branislav Kveton, Azin Ashkan

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff.

Thompson Sampling

Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

no code implementations 18 Jul 2013 Zheng Wen, Benjamin Van Roy

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.

Efficient Exploration reinforcement-learning

DCM Bandits: Learning to Rank with Multiple Clicks

1 code implementation 9 Feb 2016 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen

This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.

Learning-To-Rank

Combinatorial Cascading Bandits

no code implementations NeurIPS 2015 Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvári

The agent observes the index of the first chosen item whose weight is zero.
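
The feedback rule in this snippet can be sketched for a conjunctive objective, assuming binary item weights and a payoff equal to their product (for instance, a hypothetical routing path that fails at its first broken link). The function name is illustrative; only the rule "reveal the index of the first zero-weight item" comes from the snippet.

```python
def conjunctive_cascade_feedback(chosen, weights):
    """Feedback for a conjunctive cascading bandit (sketch).

    chosen: ordered list of item ids; weights: dict item -> realized 0/1 weight.
    Returns (payoff, index): payoff is the product (logical AND) of the chosen
    weights, and index is the position of the first zero-weight item, or None
    if every chosen item has weight 1. Weights after that index stay hidden.
    """
    for k, item in enumerate(chosen):
        if weights[item] == 0:
            return 0, k
    return 1, None
```

Because the scan stops at the first failure, the learner observes weight 1 for every item before the returned index, weight 0 at the index, and nothing afterwards.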

Cascading Bandits: Learning to Rank in the Cascade Model

no code implementations 10 Feb 2015 Branislav Kveton, Csaba Szepesvári, Zheng Wen, Azin Ashkan

We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits.

Learning-To-Rank

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

no code implementations 3 Oct 2014 Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvári

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff.
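
A minimal sketch of the semi-bandit protocol in this snippet, assuming the simplest combinatorial constraint (choose exactly K of the L ground items) so that the optimization oracle reduces to "take the K items with the highest optimistic weights". The UCB-style index and the function name `comb_ucb_topk` are assumptions for illustration, not the paper's exact algorithm.

```python
import math
import random

def comb_ucb_topk(mean_weights, K, n_rounds, seed=0):
    """UCB-style learner for a stochastic combinatorial semi-bandit (sketch).

    Feasible actions are all subsets of exactly K ground items, so the
    oracle simply takes the top K items by optimistic weight estimate.
    mean_weights are hidden Bernoulli means used only to simulate weights.
    """
    rng = random.Random(seed)
    L = len(mean_weights)
    counts = [0] * L
    means = [0.0] * L
    total_payoff = 0.0
    for t in range(1, n_rounds + 1):
        # Optimistic weight estimate per item; untried items get +inf.
        ucb = [
            float("inf") if counts[e] == 0
            else means[e] + math.sqrt(1.5 * math.log(t) / counts[e])
            for e in range(L)
        ]
        chosen = sorted(range(L), key=lambda e: ucb[e], reverse=True)[:K]
        # Semi-bandit feedback: the weight of each chosen item is observed,
        # and the payoff is the sum of those weights.
        for e in chosen:
            w = 1.0 if rng.random() < mean_weights[e] else 0.0
            counts[e] += 1
            means[e] += (w - means[e]) / counts[e]
            total_payoff += w
    return counts, total_payoff
```

Observing every chosen item's weight, rather than only the summed payoff, is what lets each item's estimate improve whenever it is part of the action.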

Learning to Act Greedily: Polymatroid Semi-Bandits

no code implementations 30 May 2014 Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko

Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method.
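
As a concrete instance of the greedy method mentioned here, the sketch below is Kruskal's algorithm for a minimum spanning tree: spanning trees are the bases of a graphic matroid, and on matroids the greedy rule is optimal. The edge-list format and function name are illustrative choices, not taken from the paper.

```python
def kruskal_mst(n, edges):
    """Greedy minimum spanning tree (Kruskal's algorithm).

    n: number of vertices labelled 0..n-1; edges: list of (weight, u, v).
    Edges are scanned from lightest to heaviest, and an edge is kept
    whenever it does not close a cycle in the tree built so far.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v are in different components: no cycle.
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

For example, with `edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]` on four vertices, the greedy scan keeps the edges of weight 1, 2, and 3, for a total weight of 6.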

DUM: Diversity-Weighted Utility Maximization for Recommendations

no code implementations 13 Nov 2014 Azin Ashkan, Branislav Kveton, Shlomo Berkovsky, Zheng Wen

The need for diversification of recommendation lists manifests in a number of recommender systems use cases.

Recommendation Systems

Optimal Demand Response Using Device Based Reinforcement Learning

no code implementations 8 Jan 2014 Zheng Wen, Daniel O'Neill, Hamid Reza Maei

Demand response (DR) for residential and small commercial buildings is estimated to account for as much as 65% of the total energy savings potential of DR, and previous work shows that a fully automated Energy Management System (EMS) is a necessary prerequisite to DR in these areas.

energy management Management

Profit Maximization for Online Advertising Demand-Side Platforms

no code implementations 6 Jun 2017 Paul Grigas, Alfonso Lobos, Zheng Wen, Kuang-Chih Lee

We develop an optimization model and corresponding algorithm for the management of a demand-side platform (DSP), whereby the DSP aims to maximize its own profit while acquiring valuable impressions for its advertiser clients.

Optimization and Control Computer Science and Game Theory

Online Diverse Learning to Rank from Partial-Click Feedback

no code implementations 1 Nov 2018 Prakhar Gupta, Gaurush Hiranandani, Harvineet Singh, Branislav Kveton, Zheng Wen, Iftikhar Ahamath Burhanuddin

We assume that the user examines the list of recommended items until the user is attracted by an item, which is clicked, and does not examine the rest of the items.

Learning-To-Rank Recommendation Systems

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

no code implementations 13 Nov 2018 Branislav Kveton, Csaba Szepesvári, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.

Multi-Armed Bandits
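
The bootstrapping rule in this snippet can be sketched for Bernoulli rewards as follows, assuming that each observed reward is added to the arm's history together with `a` pseudo rewards of 0 and `a` pseudo rewards of 1 (a hypothetical default of `a=1` here). The function name and simulated means are illustrative; this is a sketch of the idea, not the authors' implementation.

```python
import random

def bootstrap_pseudo_rewards(true_means, n_rounds, a=1, seed=0):
    """Bootstrapped exploration with pseudo rewards (sketch).

    Each round, every arm's history is resampled with replacement and the
    arm with the highest bootstrap mean is pulled. Pseudo rewards of 0 and 1
    keep the bootstrap distribution wide enough to sustain exploration.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    history = [[] for _ in range(n_arms)]
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        scores = []
        for h in history:
            if not h:
                scores.append(float("inf"))  # pull every arm once first
            else:
                boot = rng.choices(h, k=len(h))  # non-parametric bootstrap
                scores.append(sum(boot) / len(boot))
        arm = max(range(n_arms), key=lambda i: scores[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Real reward plus `a` negative and `a` positive pseudo rewards.
        history[arm].extend([reward] + [0] * a + [1] * a)
        pulls[arm] += 1
    return pulls
```

Without the pseudo rewards, an arm whose first few rewards were all 0 would always bootstrap to a mean of 0 and could be abandoned forever; the pseudo rewards prevent that collapse.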

Scalar Posterior Sampling with Applications

no code implementations NeurIPS 2018 Georgios Theocharous, Zheng Wen, Yasin Abbasi, Nikos Vlassis

Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity.

Adaptive Submodular Maximization in Bandit Setting

no code implementations NeurIPS 2013 Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan

Maximization of submodular functions has wide applications in machine learning and artificial intelligence.

Efficient Exploration and Value Function Generalization in Deterministic Systems

no code implementations NeurIPS 2013 Zheng Wen, Benjamin Van Roy

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.

Efficient Exploration reinforcement-learning

Scalable Thompson Sampling via Optimal Transport

no code implementations 19 Feb 2019 Ruiyi Zhang, Zheng Wen, Changyou Chen, Lawrence Carin

Thompson sampling (TS) is a class of algorithms for sequential decision-making, which requires maintaining a posterior distribution over a model.

Decision Making Thompson Sampling

Stochastic Online Learning with Probabilistic Graph Feedback

no code implementations 4 Mar 2019 Shuai Li, Wei Chen, Zheng Wen, Kwong-Sak Leung

We consider a problem of stochastic online learning with general probabilistic graph feedback, where each directed edge in the feedback graph has probability $p_{ij}$.

Waterfall Bandits: Learning to Sell Ads Online

no code implementations 20 Apr 2019 Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian

We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret.

Bootstrapping Upper Confidence Bound

no code implementations NeurIPS 2019 Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

The Upper Confidence Bound (UCB) method is arguably the most celebrated method for online decision making with partial information feedback.

Decision Making Multi-Armed Bandits

Structured Policy Iteration for Linear Quadratic Regulator

no code implementations ICML 2020 Youngsuk Park, Ryan A. Rossi, Zheng Wen, Gang Wu, Handong Zhao

In this paper, we introduce the Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy.

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

no code implementations 17 Aug 2020 Wenlong Mou, Zheng Wen, Xi Chen

To avoid such undesirable dependence on the state and action space sizes, this paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP).

reinforcement-learning Reinforcement Learning (RL)

Influence Diagram Bandits

no code implementations ICML 2020 Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

We experiment with three structured bandit problems: cascading bandits, online learning to rank in the position-based model, and rank-1 bandits.

Learning-To-Rank Position

Budgeted Online Influence Maximization

no code implementations ICML 2020 Pierre Perrault, Zheng Wen, Michal Valko, Jennifer Healey

We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set.

valid

A Benchmark and Baseline for Language-Driven Image Editing

no code implementations 5 Oct 2020 Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu

To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations.

On Efficiency in Hierarchical Reinforcement Learning

no code implementations NeurIPS 2020 Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, in terms of both statistical and computational efficiency.

Computational Efficiency Decision Making

Neural Contextual Bandits with Deep Representation and Shallow Exploration

no code implementations NeurIPS 2021 Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu

We study a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the reward generating function is unknown.

Multi-Armed Bandits Representation Learning

On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions

no code implementations 5 Jan 2021 Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko

We demonstrate that from an algorithm guaranteeing an approximation factor for the ratio of submodular (RS) optimization problem, we can build another algorithm having a different kind of approximation guarantee (weaker than the classical one) for the difference of submodular (DS) optimization problem, and vice versa.

Data Structures and Algorithms

Reinforcement Learning, Bit by Bit

no code implementations 6 Mar 2021 Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

To illustrate concepts, we design simple agents that build on them and present computational results that highlight data efficiency.

reinforcement-learning Reinforcement Learning (RL)

Joint Online Learning and Decision-making via Dual Mirror Descent

no code implementations 20 Apr 2021 Alfonso Lobos, Paul Grigas, Zheng Wen

We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost.

Decision Making

An Analysis of Ensemble Sampling

no code implementations 2 Mar 2022 Chao Qin, Zheng Wen, Xiuyuan Lu, Benjamin Van Roy

Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable.

Thompson Sampling
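
The approximation described in this snippet can be sketched for a Gaussian bandit with independent arms. Each ensemble member keeps a regularized least-squares estimate per arm, anchored at its own prior sample and fit to its own noise-perturbed copy of the rewards, and each round the agent acts greedily with respect to a uniformly drawn member. Gaussian priors and noise, the ensemble size, and the function name are assumptions of this sketch.

```python
import random

def ensemble_sampling(true_means, n_rounds, n_models=20, prior_sd=1.0, noise_sd=1.0, seed=0):
    """Ensemble sampling on a simulated Gaussian multi-armed bandit (sketch).

    Member m's estimate of arm i minimizes
    noise_prec * sum (y_tilde - theta)^2 + prior_prec * (theta - theta0)^2,
    where theta0 is the member's prior sample and y_tilde are its perturbed rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    prior_prec = 1.0 / prior_sd ** 2
    noise_prec = 1.0 / noise_sd ** 2
    # numer[m][i] = prior_prec * theta0 + noise_prec * (sum of perturbed rewards).
    numer = [[prior_prec * rng.gauss(0.0, prior_sd) for _ in range(n_arms)]
             for _ in range(n_models)]
    counts = [0] * n_arms
    for _ in range(n_rounds):
        m = rng.randrange(n_models)  # draw one ensemble member uniformly
        arm = max(range(n_arms),
                  key=lambda i: numer[m][i] / (prior_prec + counts[i] * noise_prec))
        reward = rng.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
        # Every member sees the same reward with its own independent perturbation.
        for member in numer:
            member[arm] += noise_prec * (reward + rng.gauss(0.0, noise_sd))
    return counts
```

Here the members' spread plays the role of posterior samples: a uniformly drawn member stands in for a draw from the posterior that exact Thompson sampling would need.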

Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping

no code implementations 8 Jun 2022 Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy

In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions.

Robustness of Epinets against Distributional Shifts

no code implementations 1 Jul 2022 Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, Benjamin Van Roy

However, these improvements are relatively small compared to the outstanding issues in distributionally-robust deep learning.

Leveraging Demonstrations to Improve Online Learning: Quality Matters

no code implementations 7 Feb 2023 Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level.

Thompson Sampling

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

no code implementations 20 Mar 2023 Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen

We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses the offline dataset and information about the expert's behavioral policy that generated it.

Imitation Learning reinforcement-learning

Active learning for effective Hamiltonian of super-large-scale atomic structures

no code implementations 18 Jul 2023 Xingyue Ma, Hongying Chen, Ri He, Zhanbo Yu, Sergei Prokhorenko, Zheng Wen, Zhicheng Zhong, Jorge Iñiguez, L. Bellaiche, Di Wu, Yurong Yang

However, the parametrization method of the effective Hamiltonian is complicated and can hardly handle systems with complex interactions and/or complex components.

Active Learning

Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach

no code implementations 17 Oct 2023 Dengwang Tang, Rahul Jain, Botao Hao, Zheng Wen

In this paper, we study the problem of efficient online reinforcement learning in the infinite horizon setting when there is an offline dataset to start with.

Imitation Learning

RLHF and IIA: Perverse Incentives

no code implementations 2 Dec 2023 Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy

Existing algorithms for reinforcement learning from human feedback (RLHF) can incentivize responses at odds with preferences because they are based on models that assume independence of irrelevant alternatives (IIA).

reinforcement-learning
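
The IIA assumption mentioned in this snippet can be seen in a Luce-style choice model, the multi-option generalization of the Bradley-Terry model, where each option's choice probability is its positive score divided by the total. In the toy example below (scores are hypothetical), adding a near-duplicate "B2" of option "B" leaves the ratio P(A)/P(B) unchanged, so A loses share to a clone of B. That is the kind of behavior the snippet flags as potentially at odds with real preferences.

```python
def luce_choice_probs(scores):
    """Choice probabilities under a Luce-style model.

    scores maps option -> positive score; P(option) = score / total score.
    The ratio of any two options' probabilities depends only on their own
    scores, which is the independence of irrelevant alternatives (IIA).
    """
    total = sum(scores.values())
    return {option: s / total for option, s in scores.items()}

two = luce_choice_probs({"A": 2.0, "B": 1.0})
# A: 2/3, B: 1/3
three = luce_choice_probs({"A": 2.0, "B": 1.0, "B2": 1.0})  # B2 duplicates B
# A: 0.5, B: 0.25, B2: 0.25, so P(A)/P(B) is still 2
```

Intuitively, a clone of B should split only B's share. Under IIA, A's probability drops from 2/3 to 1/2 as well.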
