The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
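To make the exploration/exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy strategy on a simulated Bernoulli bandit. Epsilon-greedy is one common baseline, not a method taken from any paper listed here, and the function name `epsilon_greedy`, its parameters, and the arm means are all illustrative assumptions.

```python
import random


def epsilon_greedy(true_means, horizon=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    true_means: hypothetical success probability of each arm.
    Returns per-arm pull counts, estimated means, and total reward.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(k), key=lambda a: estimates[a])

        # Draw a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incrementally update the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


# Assumed arm means, chosen only for illustration.
counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 3) for e in estimates], total_reward)
```

A larger `epsilon` spends more pulls gathering information about all arms, while a smaller one commits sooner to the arm that currently looks best.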

# Policy Learning with Adaptively Collected Data

We complement this regret upper bound with a lower bound that characterizes the fundamental difficulty of policy learning with adaptive data.

# Combinatorial Bandits under Strategic Manipulations

We study the problem of combinatorial multi-armed bandits (CMAB) under strategic manipulations of rewards, where each arm can modify the emitted reward signals for its own interest.

# Federated Multi-armed Bandits with Personalization

A general framework of personalized federated multi-armed bandits (PF-MAB) is proposed, which is a new bandit paradigm analogous to the federated learning (FL) framework in supervised learning and enjoys the features of FL with personalization.

# Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs

We present a new type of acquisition functions for online decision making in multi-armed and contextual bandit problems with extreme payoffs.

# Federated Multi-Armed Bandits

We first study the approximate model where the heterogeneous local models are random realizations of the global model from an unknown distribution.

# An empirical evaluation of active inference in multi-armed bandits

This comparison is done on two types of bandit problems: a stationary and a dynamic switching bandit.

# Relational Boosted Bandits

Contextual bandit algorithms have become essential in real-world user interaction problems in recent years.

# Active Feature Selection for the Mutual Information Criterion

We study active feature selection, a novel feature selection setting in which unlabeled data is available, but the budget for labels is limited, and the examples to label can be actively selected by the algorithm.

# BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits

In these experiments, we observe that BanditPAM returns the same results as state-of-the-art PAM-like algorithms up to 4x faster while performing up to 200x fewer distance computations.

# Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

We study the structure of regret-minimizing policies in the *many-armed* Bayesian multi-armed bandit problem: in particular, with $k$ the number of arms and $T$ the time horizon, we consider the case where $k \geq \sqrt{T}$.

