Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

17 Sep 2020  ·  Baihan Lin ·

We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different episodes and the reward feedbacks are not always available to the decision making agents. For this online semi-supervised learning setting, we introduced Background Episodic Reward LinUCB (BerlinUCB), a solution that easily incorporates clustering as a self-supervision module to provide useful side information when rewards are not observed. Our experiments on a variety of datasets, both in stationary and nonstationary environments of six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit. Lastly, we introduced a relevant real-life example where this problem setting is especially useful.

PDF Abstract


  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here