The design of personalized incentives or recommendations to improve user
engagement is gaining prominence as digital platform providers continually
emerge. We propose a multi-armed bandit framework for matching incentives to
users, whose preferences are unknown a priori and evolve dynamically over time,
in a resource-constrained environment...
We design an algorithm that combines
ideas from three distinct domains: (i) a greedy matching paradigm, (ii) the
upper confidence bound algorithm (UCB) for bandits, and (iii) mixing times from
the theory of Markov chains. For this algorithm, we provide theoretical bounds
on the regret and demonstrate its performance on both synthetic and realistic
examples, the latter matching supply and demand in a bike-sharing platform.
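The following minimal Python sketch illustrates how ingredients (i) and (ii) can be combined. It is not the authors' algorithm. It assumes stationary Bernoulli rewards for each (user, incentive) pair, a per-round capacity for each incentive, and a UCB1-style exploration bonus. It leaves out the Markov-chain mixing-time component (iii) entirely. In each round, every pair receives an optimistic index, and pairs are then matched greedily in descending order of that index. The names `ucb_index`, `greedy_ucb_matching`, `simulate`, `capacity`, and `c` are hypothetical.

```python
import math
import random

# Hypothetical sketch: greedy matching driven by UCB indices.
# Assumes stationary Bernoulli rewards; preferences do NOT evolve as Markov chains here.


def ucb_index(mean, count, t, c=2.0):
    """UCB1-style optimistic index; untried pairs get +inf so each is explored once."""
    if count == 0:
        return math.inf
    return mean + math.sqrt(c * math.log(t) / count)


def greedy_ucb_matching(means, counts, t, capacity):
    """Assign at most one incentive per user, respecting per-incentive capacity.

    means[u][k], counts[u][k]: empirical mean reward and pull count of incentive k for user u.
    capacity[k]: assumed per-round budget (number of users) for incentive k.
    Returns a dict mapping user -> incentive.
    """
    n_users, n_inc = len(means), len(means[0])
    pairs = [
        (ucb_index(means[u][k], counts[u][k], t), u, k)
        for u in range(n_users)
        for k in range(n_inc)
    ]
    # Highest optimistic value first; the sort is stable, so ties keep (u, k) order.
    pairs.sort(key=lambda p: p[0], reverse=True)
    remaining = list(capacity)
    assignment = {}
    for _, u, k in pairs:
        if u not in assignment and remaining[k] > 0:
            assignment[u] = k
            remaining[k] -= 1
    return assignment


def simulate(true_p, capacity, horizon, seed=0):
    """Run greedy UCB matching against Bernoulli rewards with fixed success probabilities."""
    rng = random.Random(seed)
    n_users, n_inc = len(true_p), len(true_p[0])
    means = [[0.0] * n_inc for _ in range(n_users)]
    counts = [[0] * n_inc for _ in range(n_users)]
    total = 0
    for t in range(1, horizon + 1):
        for u, k in greedy_ucb_matching(means, counts, t, capacity).items():
            r = 1 if rng.random() < true_p[u][k] else 0
            counts[u][k] += 1
            means[u][k] += (r - means[u][k]) / counts[u][k]  # incremental mean update
            total += r
    return total, counts
```

For example, `simulate([[0.9, 0.1], [0.2, 0.8], [0.5, 0.4]], capacity=[2, 1], horizon=2000)` returns the total reward collected and the pull count for each (user, incentive) pair. Capturing preferences that evolve over time would require extra machinery, such as the mixing-time ideas the abstract mentions, which this sketch does not attempt.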