Search Results for author: Zheng Wen

Found 55 papers, 7 papers with code

Budgeted Online Influence Maximization

no code implementations ICML 2020 Pierre Perrault, Zheng Wen, Michal Valko, Jennifer Healey

We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set.

Influence Diagram Bandits

no code implementations ICML 2020 Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

We experiment with three structured bandit problems: cascading bandits, online learning to rank in the position-based model, and rank-1 bandits.

Learning-To-Rank

Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?

1 code implementation 9 Oct 2021 Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

This paper introduces The Neural Testbed, which provides tools for the systematic evaluation of agents that generate such predictions.

Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions

no code implementations 20 Jul 2021 Xiuyuan Lu, Ian Osband, Benjamin Van Roy, Zheng Wen

A fundamental challenge for any intelligent system is prediction: given some inputs $X_1, \ldots, X_\tau$, can you predict outcomes $Y_1, \ldots, Y_\tau$?

Epistemic Neural Networks

1 code implementation 19 Jul 2021 Ian Osband, Zheng Wen, Mohammad Asghari, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

All existing approaches to uncertainty modeling can be expressed as ENNs, and any ENN can be identified with a Bayesian neural network.

Joint Online Learning and Decision-making via Dual Mirror Descent

no code implementations 20 Apr 2021 Alfonso Lobos, Paul Grigas, Zheng Wen

We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost.

Decision Making

Reinforcement Learning, Bit by Bit

no code implementations 6 Mar 2021 Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

Reinforcement learning agents have demonstrated remarkable achievements in simulated environments.

On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions

no code implementations 5 Jan 2021 Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko

We demonstrate that from an algorithm guaranteeing an approximation factor for the ratio of submodular (RS) optimization problem, we can build another algorithm having a different kind of approximation guarantee -- weaker than the classical one -- for the difference of submodular (DS) optimization problem, and vice versa.

Data Structures and Algorithms

Neural Contextual Bandits with Deep Representation and Shallow Exploration

no code implementations 3 Dec 2020 Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu

We study a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the reward generating function is unknown.

Multi-Armed Bandits, Representation Learning

On Efficiency in Hierarchical Reinforcement Learning

no code implementations NeurIPS 2020 Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, both in terms of statistical as well as computational efficiency.

Decision Making, Hierarchical Reinforcement Learning

A Benchmark and Baseline for Language-Driven Image Editing

no code implementations 5 Oct 2020 Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu

To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations.

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

no code implementations 17 Aug 2020 Wenlong Mou, Zheng Wen, Xi Chen

To avoid such undesirable dependence on the state and action space sizes, this paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP).

Low-rank Tensor Bandits

no code implementations 31 Jul 2020 Botao Hao, Jie Zhou, Zheng Wen, Will Wei Sun

In recent years, multi-dimensional online decision making has been playing a crucial role in many practical applications such as online recommendation and digital marketing.

Decision Making

Structured Policy Iteration for Linear Quadratic Regulator

no code implementations ICML 2020 Youngsuk Park, Ryan A. Rossi, Zheng Wen, Gang Wu, Handong Zhao

In this paper, we introduce the Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy.

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

no code implementations 9 Jul 2020 Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

We propose a novel framework for structured bandits, which we call an influence diagram bandit.

Bootstrapping Upper Confidence Bound

no code implementations NeurIPS 2019 Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

The Upper Confidence Bound (UCB) method is arguably the most celebrated method in online decision making with partial information feedback.

Decision Making, Multi-Armed Bandits
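
For readers unfamiliar with the UCB principle, here is a minimal sketch of the classic UCB1 index on simulated Bernoulli arms. It illustrates plain UCB only, not the bootstrapped variant proposed in this paper; the function name `ucb1`, the arm means, and the horizon are hypothetical.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run classic UCB1 on simulated Bernoulli arms (hypothetical means) and return pull counts."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every empirical mean is defined
        else:
            # optimistic index: empirical mean plus a confidence radius that shrinks with pulls
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

The confidence radius makes rarely pulled arms look optimistic, so exploration fades as estimates sharpen and most pulls go to the best arm.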

Waterfall Bandits: Learning to Sell Ads Online

no code implementations 20 Apr 2019 Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian

We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret.

Stochastic Online Learning with Probabilistic Graph Feedback

no code implementations 4 Mar 2019 Shuai Li, Wei Chen, Zheng Wen, Kwong-Sak Leung

We consider a problem of stochastic online learning with general probabilistic graph feedback, where each directed edge in the feedback graph has probability $p_{ij}$.

Scalable Thompson Sampling via Optimal Transport

no code implementations 19 Feb 2019 Ruiyi Zhang, Zheng Wen, Changyou Chen, Lawrence Carin

Thompson sampling (TS) is a class of algorithms for sequential decision-making, which requires maintaining a posterior distribution over a model.

Decision Making

Scalar Posterior Sampling with Applications

no code implementations NeurIPS 2018 Georgios Theocharous, Zheng Wen, Yasin Abbasi, Nikos Vlassis

Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity.

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

no code implementations 13 Nov 2018 Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.

Multi-Armed Bandits
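
The bootstrapping rule in the snippet fits in a few lines of code. This is a minimal sketch assuming Bernoulli rewards and assuming that each observed reward is stored alongside `a` pseudo rewards of 0 and `a` pseudo rewards of 1. The paper's exact pseudo-reward scheme should be checked against the paper. The names `giro_choose`, `giro_update`, and `run_giro` are hypothetical.

```python
import random

def giro_choose(histories, rng):
    """Pick the arm whose bootstrap-resampled history has the highest mean."""
    best_arm, best_mean = None, float("-inf")
    for arm, hist in enumerate(histories):
        if not hist:
            return arm  # an arm with no history yet is pulled first
        # non-parametric bootstrap: resample the history with replacement
        sample = [rng.choice(hist) for _ in hist]
        mean = sum(sample) / len(sample)
        if mean > best_mean:
            best_arm, best_mean = arm, mean
    return best_arm

def giro_update(histories, arm, reward, a=1):
    """Append the observed reward plus pseudo rewards (assumption: a zeros and a ones)."""
    histories[arm].append(reward)
    histories[arm].extend([0.0] * a + [1.0] * a)

def run_giro(means, horizon, a=1, seed=0):
    """Simulate the bootstrap-exploration loop on Bernoulli arms with hypothetical means."""
    rng = random.Random(seed)
    histories = [[] for _ in means]
    pulls = [0] * len(means)
    for _ in range(horizon):
        arm = giro_choose(histories, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        giro_update(histories, arm, reward, a)
        pulls[arm] += 1
    return pulls

pulls = run_giro([0.1, 0.9], horizon=2000)
```

The pseudo rewards keep the resampled means from collapsing early, so a poorly estimated arm can still be resampled optimistically and explored.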

Online Diverse Learning to Rank from Partial-Click Feedback

no code implementations 1 Nov 2018 Prakhar Gupta, Gaurush Hiranandani, Harvineet Singh, Branislav Kveton, Zheng Wen, Iftikhar Ahamath Burhanuddin

We assume that the user examines the list of recommended items until the user is attracted by an item, which is clicked, and does not examine the rest of the items.

Learning-To-Rank, Recommendation Systems
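
The user-behavior assumption in the snippet (scan down the list, click the first attractive item, stop) can be simulated directly. This minimal sketch assumes independent per-item attraction probabilities. The item names and the functions `cascade_click` and `cascade_feedback` are hypothetical. It also shows what a learner observes: items above the click were examined and skipped, the clicked item was attractive, and items below it are unobserved.

```python
import random

def cascade_click(ranked_items, attraction, rng):
    """Scan the list top-down; return the position of the first attractive item, or None."""
    for pos, item in enumerate(ranked_items):
        if rng.random() < attraction[item]:
            return pos  # the user clicks and does not examine the remaining items
    return None

def cascade_feedback(ranked_items, click_pos):
    """Map each examined item to 1 (clicked) or 0 (examined, not clicked); unexamined items are omitted."""
    last = len(ranked_items) - 1 if click_pos is None else click_pos
    return {item: int(pos == click_pos) for pos, item in enumerate(ranked_items[: last + 1])}

rng = random.Random(0)
attraction = {"a": 0.0, "b": 1.0, "c": 1.0}  # hypothetical, deterministic for illustration
pos = cascade_click(["a", "b", "c"], attraction, rng)
feedback = cascade_feedback(["a", "b", "c"], pos)
```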

Conservative Exploration using Interleaving

no code implementations 3 Jun 2018 Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru

In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, which is significantly worse than the default production action.

Offline Evaluation of Ranking Policies with Click Models

no code implementations 27 Apr 2018 Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.

Recommendation Systems

Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

no code implementations 11 Feb 2018 Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions.

Posterior Sampling for Large Scale Reinforcement Learning

no code implementations 21 Nov 2017 Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis

Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity.

A Tutorial on Thompson Sampling

3 code implementations 7 Jul 2017 Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Active Learning, Product Recommendation
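
As a concrete instance of the exploit/explore balance described above, here is a minimal Beta-Bernoulli Thompson sampling sketch. It assumes independent Bernoulli arms with uniform Beta(1, 1) priors. The function name and the arm means are hypothetical.

```python
import random

def thompson_bernoulli(means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms with hypothetical means."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k  # Beta(1, 1) prior: 1 + number of observed successes
    beta = [1] * k   # 1 + number of observed failures
    pulls = [0] * k
    for _ in range(horizon):
        # sample a plausible mean for each arm from its posterior, act greedily on the sample
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < means[arm] else 0
        # conjugate update of the chosen arm's posterior
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta

pulls, alpha, beta = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
```

Posterior sampling explores automatically: an arm with a wide posterior occasionally draws a high sample and gets tried, while a well-estimated poor arm rarely does.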

Profit Maximization for Online Advertising Demand-Side Platforms

no code implementations 6 Jun 2017 Paul Grigas, Alfonso Lobos, Zheng Wen, Kuang-Chih Lee

We develop an optimization model and corresponding algorithm for the management of a demand-side platform (DSP), whereby the DSP aims to maximize its own profit while acquiring valuable impressions for its advertiser clients.

Optimization and Control, Computer Science and Game Theory

Deep Exploration via Randomized Value Functions

no code implementations 22 Mar 2017 Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration

Bernoulli Rank-$1$ Bandits for Click Feedback

no code implementations 19 Mar 2017 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The probability that a user will click a search result depends both on its relevance and its position on the results page.

Model-Independent Online Learning for Influence Maximization

no code implementations ICML 2017 Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt

We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.
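
To make the problem statement concrete, here is a minimal sketch that estimates influence spread under the independent cascade model by Monte Carlo simulation and picks seeds greedily. This is the standard offline greedy heuristic on a known graph, not the paper's model-independent online method. The graph, edge probabilities, and function names are hypothetical.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One independent-cascade run; each newly activated node tries each out-edge once."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)  # number of users who became aware, seeds included

def estimate_spread(graph, seeds, runs, rng):
    """Average spread over independent simulation runs."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

def greedy_seeds(graph, nodes, k, runs=200, seed=0):
    """Add, one at a time, the node whose inclusion gives the largest estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best = max(
            (v for v in nodes if v not in chosen),
            key=lambda v: estimate_spread(graph, chosen + [v], runs, rng),
        )
        chosen.append(best)
    return chosen

# hypothetical graph: adjacency lists of (neighbor, activation probability)
graph = {0: [(1, 1.0)], 1: [(2, 1.0)], 3: [(4, 1.0)]}
seeds = greedy_seeds(graph, nodes=[0, 1, 2, 3, 4], k=2)
```

With deterministic edges, node 0 reaches three users and node 3 then adds two more, so greedy picks one seed per connected chain.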

Does Weather Matter? Causal Analysis of TV Logs

no code implementations 25 Jan 2017 Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen

To the best of our knowledge, this is the first large-scale causal study of the impact of weather on TV watching patterns.

Stochastic Rank-1 Bandits

no code implementations 10 Aug 2016 Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, Zheng Wen

The main challenge of the problem is that the individual values of the row and column are unobserved.

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

1 code implementation NeurIPS 2017 Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it.

Cascading Bandits for Large-Scale Recommendation Problems

1 code implementation 17 Mar 2016 Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, Branislav Kveton

In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend the $K$ most attractive items from a large set of $L$ candidate items.

Recommendation Systems
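
A minimal sketch of a UCB-style learner for the basic cascading bandit. It assumes Bernoulli attraction probabilities and that items up to and including the clicked one are observed, with all $K$ observed when there is no click. The confidence radius $\sqrt{1.5 \log t / n}$ is our assumption in the spirit of CascadeUCB1. This is not the paper's large-scale method, which generalizes across items. The function name and the simulated probabilities are hypothetical.

```python
import math
import random

def cascade_ucb(attraction, k, horizon, seed=0):
    """Recommend k of L items per step and learn attraction probabilities from cascade feedback."""
    rng = random.Random(seed)
    n_items = len(attraction)
    # initialization: one simulated observation per item so every estimate is defined
    counts = [1] * n_items
    means = [1.0 if rng.random() < attraction[e] else 0.0 for e in range(n_items)]
    clicks = 0
    for t in range(1, horizon + 1):
        # assumed confidence radius sqrt(1.5 log t / n)
        ucb = [means[e] + math.sqrt(1.5 * math.log(t) / counts[e]) for e in range(n_items)]
        ranked = sorted(range(n_items), key=lambda e: ucb[e], reverse=True)[:k]
        # simulate the user: click the first attractive item, then stop
        click_pos = None
        for pos, e in enumerate(ranked):
            if rng.random() < attraction[e]:
                click_pos = pos
                break
        # only items up to and including the click are observed
        observed = ranked if click_pos is None else ranked[: click_pos + 1]
        for pos, e in enumerate(observed):
            w = 1.0 if pos == click_pos else 0.0
            counts[e] += 1
            means[e] += (w - means[e]) / counts[e]  # running average
        clicks += click_pos is not None
    return counts, clicks

counts, clicks = cascade_ucb([0.05] * 6 + [0.6, 0.5], k=2, horizon=3000)
```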

DCM Bandits: Learning to Rank with Multiple Clicks

1 code implementation 9 Feb 2016 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen

This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.

Learning-To-Rank

Combinatorial Cascading Bandits

no code implementations NeurIPS 2015 Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

The agent observes the index of the first chosen item whose weight is zero.
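
The feedback rule in the snippet can be written as a small function. This is a minimal sketch assuming binary item weights and a conjunctive payoff, where the reward is 1 only when every chosen item has weight 1. The function names and example weights are hypothetical.

```python
def conjunctive_cascade_step(chosen, weights):
    """Return (reward, observed_index) for an ordered tuple of chosen items with binary weights."""
    for pos, item in enumerate(chosen):
        if weights[item] == 0:
            return 0, pos  # payoff is lost; the learner sees where the chain broke
    return 1, None  # all weights are one; no zero index to report

def revealed_weights(chosen, weights, observed_index):
    """Weights the learner can infer: all items up to and including the first zero."""
    stop = len(chosen) if observed_index is None else observed_index + 1
    return {item: weights[item] for item in chosen[:stop]}

weights = {"x": 1, "y": 0, "z": 1}  # hypothetical realized weights
reward, idx = conjunctive_cascade_step(["x", "y", "z"], weights)
seen = revealed_weights(["x", "y", "z"], weights, idx)
```

Items after the first zero stay unobserved, which is what makes this a partial-monitoring problem rather than a semi-bandit.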

Cascading Bandits: Learning to Rank in the Cascade Model

no code implementations 10 Feb 2015 Branislav Kveton, Csaba Szepesvari, Zheng Wen, Azin Ashkan

We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits.

Learning-To-Rank

DUM: Diversity-Weighted Utility Maximization for Recommendations

no code implementations 13 Nov 2014 Azin Ashkan, Branislav Kveton, Shlomo Berkovsky, Zheng Wen

The need for diversification of recommendation lists manifests in a number of recommender systems use cases.

Recommendation Systems

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

no code implementations 3 Oct 2014 Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff.
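
A minimal sketch of a UCB-based learner for the simplest instance of this setting. It assumes the only constraint is choosing exactly K items (a cardinality constraint) and that item weights are Bernoulli. The index $\sqrt{1.5 \log t / n}$ is our assumption in the spirit of CombUCB1. The function name and example means are hypothetical.

```python
import math
import random

def comb_ucb_topk(means, k, horizon, seed=0):
    """Choose k items per step; payoff is the sum of the chosen items' Bernoulli weights."""
    rng = random.Random(seed)
    n = len(means)
    # one initial observation per item so every estimate is defined
    counts = [1] * n
    est = [1.0 if rng.random() < m else 0.0 for m in means]
    total_payoff = 0.0
    for t in range(1, horizon + 1):
        ucb = [est[e] + math.sqrt(1.5 * math.log(t) / counts[e]) for e in range(n)]
        # maximizing a sum of UCBs under a cardinality constraint: take the k largest
        chosen = sorted(range(n), key=lambda e: ucb[e], reverse=True)[:k]
        for e in chosen:
            w = 1.0 if rng.random() < means[e] else 0.0  # semi-bandit: each chosen weight is seen
            counts[e] += 1
            est[e] += (w - est[e]) / counts[e]
            total_payoff += w
    return counts, total_payoff

counts, payoff = comb_ucb_topk([0.1, 0.2, 0.8, 0.9], k=2, horizon=3000)
```

For richer constraints (e.g. spanning trees or paths), the `sorted(...)[:k]` step would be replaced by a combinatorial optimization oracle over the UCBs.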

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

no code implementations 28 Jun 2014 Zheng Wen, Branislav Kveton, Azin Ashkan

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff.

Learning to Act Greedily: Polymatroid Semi-Bandits

no code implementations 30 May 2014 Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko

Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method.
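
For the minimum spanning tree, the greedy method mentioned above is Kruskal's algorithm. It scans edges in order of increasing weight and keeps each edge that does not close a cycle. The sketch below uses a union-find structure. It covers only the offline greedy step with known weights, not the paper's bandit algorithm, which learns unknown weights. The example graph is hypothetical.

```python
def kruskal_mst(n, edges):
    """Return (total_weight, tree_edges) of a minimum spanning tree on nodes 0..n-1.

    edges is a list of (weight, u, v) tuples.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    total, tree = 0, []
    for w, u, v in sorted(edges):  # greedy: cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v are in different components, so no cycle is created
            parent[ru] = rv
            total += w
            tree.append((u, v, w))
    return total, tree

edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3)]  # hypothetical weighted graph
total, tree = kruskal_mst(4, edges)
```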

Generalization and Exploration via Randomized Value Functions

1 code implementation 4 Feb 2014 Ian Osband, Benjamin Van Roy, Zheng Wen

We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions.

Efficient Exploration

Optimal Demand Response Using Device Based Reinforcement Learning

no code implementations 8 Jan 2014 Zheng Wen, Daniel O'Neill, Hamid Reza Maei

Demand response (DR) for residential and small commercial buildings is estimated to account for as much as 65% of the total energy savings potential of DR, and previous work shows that a fully automated Energy Management System (EMS) is a necessary prerequisite to DR in these areas.

Q-Learning

Adaptive Submodular Maximization in Bandit Setting

no code implementations NeurIPS 2013 Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan

Maximization of submodular functions has wide applications in machine learning and artificial intelligence.

Efficient Exploration and Value Function Generalization in Deterministic Systems

no code implementations NeurIPS 2013 Zheng Wen, Benjamin Van Roy

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.

Efficient Exploration

Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

no code implementations 18 Jul 2013 Zheng Wen, Benjamin Van Roy

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.

Efficient Exploration
