no code implementations • 10 Dec 2020 • Lingda Wang, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney, Zhizhen Zhao
The second algorithm, \texttt{EXP3-LGC-IX}, is developed for a special class of problems, for which the regret is reduced to $\mathcal{O}(\sqrt{\alpha(G)dT\log{K}\log(KT)})$ for both directed as well as undirected feedback graphs.
no code implementations • 8 Oct 2020 • Huozhi Zhou, Jinglin Chen, Lav R. Varshney, Ashish Jagmohan
We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation under drifting environment.
no code implementations • 12 Sep 2019 • Lingda Wang, Huozhi Zhou, Bingcong Li, Lav R. Varshney, Zhizhen Zhao
Cascading bandit (CB) is a popular model for web search and online advertising, where an agent aims to learn the $K$ most attractive items out of a ground set of size $L$ during the interaction with a user.
no code implementations • 27 Aug 2019 • Huozhi Zhou, Lingda Wang, Lav R. Varshney, Ee-Peng Lim
Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps.
no code implementations • 25 Nov 2018 • I Chien, Huozhi Zhou, Pan Li
We propose a hypergraph-based active learning scheme which we term $HS^2$, $HS^2$ generalizes the previously reported algorithm $S^2$ originally proposed for graph-based active learning with pointwise queries [Dasarathy et al., COLT 2015].