To achieve this, the recommender system conducts conversations with users, asking about their preferences for different items or item categories.
Under this new condition, we propose the BCUCB-T algorithm with variance-aware confidence intervals; our regret analysis reduces the $O(K)$ factor in the regret bound to $O(\log K)$ or $O(\log^2 K)$, significantly improving the bounds for the above applications.
For the online learning setting, neither the network structure nor the node weights are known initially.
In this paper, we introduce a new Online Competitive Influence Maximization (OCIM) problem, where two competing items (e.g., products, news stories) propagate in the same network and influence probabilities on edges are unknown.
We consider the stochastic multi-armed bandit (MAB) problem in a setting where a player can pay to pre-observe arm rewards before playing an arm in each round.