The multi-armed bandit problem is a task in which a fixed, limited amount of resources must be allocated among competing choices ("arms") so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
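A minimal illustration of the exploration/exploitation trade-off is the classic ε-greedy strategy on Bernoulli arms (the arm means below are illustrative, not from any paper on this page):

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, steps=10000, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms.

    With probability epsilon we explore a uniformly random arm;
    otherwise we exploit the arm with the highest empirical mean.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    values = [0.0] * len(arm_means)  # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))                       # explore
        else:
            arm = max(range(len(arm_means)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return values, total_reward

estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough steps the agent concentrates its pulls on the best arm while the ε fraction of exploratory pulls keeps the estimates of the other arms from going stale.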

(Image credit: Microsoft Research)


# Contextual Bandits with Sparse Data in Web setting

6 May 2021

Five categories of methods are described, making it easier to choose an approach to sparse data in contextual bandits, with methods that can be adapted to the specific setting of concern.

# Optimal Algorithms for Range Searching over Multi-Armed Bandits

4 May 2021

The sample complexities of our algorithms depend, in particular, on the size of the optimal hitting set of the given intervals.

# Online certification of preference-based fairness for personalized recommender systems

29 Apr 2021

We propose to assess the fairness of personalized recommender systems in the sense of envy-freeness: every (group of) user(s) should prefer their recommendations to the recommendations of other (groups of) users.
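The envy-freeness criterion stated above can be illustrated with a toy check. The utility matrix here is hypothetical: `utility[u][v]` stands for the expected utility user `u` would derive from the recommendations served to user `v`.

```python
def is_envy_free(utility, tol=0.0):
    """Check envy-freeness: every user (weakly) prefers their own
    recommendations to the recommendations served to any other user.

    utility[u][v]: expected utility user u derives from user v's
    recommendation policy (illustrative values, not from the paper).
    """
    n = len(utility)
    return all(utility[u][u] >= utility[u][v] - tol
               for u in range(n) for v in range(n))

# User 0 envies user 1's recommendations (0.4 < 0.7): not envy-free.
envious = [[0.4, 0.7],
           [0.3, 0.9]]
# Each user prefers their own recommendations: envy-free.
fair = [[0.8, 0.5],
        [0.2, 0.9]]
```

The paper's contribution is certifying this property online from bandit feedback; the sketch only shows the criterion itself, given full knowledge of the utilities.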

# Statistical Inference with M-Estimators on Bandit Data

29 Apr 2021

However, there is a lack of general methods for conducting statistical inference using more complex models.

# Off-Policy Risk Assessment in Contextual Bandits

18 Apr 2021

Given a collection of Lipschitz risk functionals, OPRA provides estimates for each with corresponding error bounds that hold simultaneously.

# An Efficient Algorithm for Deep Stochastic Contextual Bandits

12 Apr 2021

In this work, we formulate the SCB that uses a DNN reward function as a non-convex stochastic optimization problem, and design a stage-wise stochastic gradient descent algorithm to optimize the problem and determine the action policy.

# Censored Semi-Bandits for Resource Allocation

12 Apr 2021

The loss depends on two hidden parameters: one specific to the arm but independent of the resource allocation, and the other dependent on the allocated resource.

# Leveraging Good Representations in Linear Contextual Bandits

8 Apr 2021

We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).
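The baseline referenced here is LinUCB (Li et al., 2010), which fits one ridge-regression model per arm and adds an exploration bonus to the predicted reward. A minimal sketch of the disjoint-model variant (the toy simulation at the bottom is illustrative, not from the paper):

```python
import numpy as np

class LinUCB:
    """Minimal disjoint-model LinUCB: one ridge regression per arm,
    action chosen by an upper confidence bound on predicted reward."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # ridge Gram matrix per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted context sums

    def select(self, x):
        """Pick the arm with the highest UCB for context vector x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                        # ridge estimate of arm parameters
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # exploration bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy simulation: two arms whose rewards are linear in a 2-d context.
true_theta = np.array([[1.0, 0.0], [0.0, 1.0]])
bandit = LinUCB(n_arms=2, dim=2, alpha=1.0)
rng = np.random.default_rng(0)
for _ in range(500):
    x = rng.normal(size=2)
    arm = bandit.select(x)
    reward = true_theta[arm] @ x + 0.1 * rng.normal()
    bandit.update(arm, x, reward)
```

After training, the learned models recover the arm parameters, so the context `[1, 0]` routes to arm 0 and `[0, 1]` to arm 1. The paper's point is about choosing among $M$ candidate representations, with regret no worse than LinUCB run on the best one.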

# Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

25 Mar 2021

We propose upper confidence bound based algorithms for this MNL contextual bandit.
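Upper-confidence-bound methods build on the classic UCB1 index (Auer et al., 2002): pull the arm maximizing empirical mean plus $\sqrt{2\ln t / n_a}$. The paper's MNL contextual algorithm is more involved; this sketch only shows the underlying index, on illustrative Bernoulli arms:

```python
import math
import random

def ucb1(arm_means, horizon=5000, seed=1):
    """Run UCB1 on Bernoulli arms and return the pull counts.

    Index: empirical mean + sqrt(2 ln t / n_pulls). Arms with few
    pulls get a large bonus, forcing continued exploration.
    """
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # pull each arm once to initialize
        else:
            arm = max(range(n),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts

counts = ucb1([0.3, 0.6, 0.9])
```

The bonus shrinks as an arm accumulates pulls, so suboptimal arms are sampled only logarithmically often and the best arm dominates the pull counts.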

# Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

24 Mar 2021

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information.