1 Jul 2019 • Eralp Turgay, Cem Bulucu, Cem Tekin
As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way.
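To make the structural assumption concrete, here is a minimal sketch of a reward model in which the expected reward depends on only two coordinates of the joint context-arm vector, through a non-linear function. The dimensions, the relevant indices, the particular non-linearity, and the names `expected_reward` and `sample_reward` are all illustrative assumptions; none of them is taken from the paper.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration.
D_CONTEXT = 5  # number of context dimensions
D_ARM = 5      # number of arm dimensions

# Assumed relevant coordinates of the joint (context, arm) vector:
# index 1 is a context dimension, index 7 is the third arm dimension.
RELEVANT = (1, 7)


def expected_reward(context, arm):
    """Expected reward that depends only on the RELEVANT joint coordinates."""
    joint = list(context) + list(arm)
    z = [joint[i] for i in RELEVANT]
    # An arbitrary non-linear function of the relevant coordinates,
    # scaled so that the result lies in [0, 1].
    return 0.5 * (1.0 + math.sin(math.pi * z[0]) * math.cos(math.pi * z[1]))


def sample_reward(context, arm, rng):
    """Draw a Bernoulli reward whose mean is expected_reward(context, arm)."""
    return 1.0 if rng.random() < expected_reward(context, arm) else 0.0
```

In this toy model, changing any of the eight irrelevant coordinates leaves the expected reward unchanged, which is the kind of structure a learner could exploit when the joint context-arm set is high-dimensional.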