no code implementations • 1 Jul 2019 • Eralp Turgay, Cem Bulucu, Cem Tekin
As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way.
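As a rough illustration of the structure described above, the sketch below defines a toy expected-reward function over a joint context-arm vector in which only a few coordinates matter. The names `RELEVANT_DIMS` and `expected_reward`, the choice of relevant indices, and the Gaussian-shaped reward are all assumptions made for illustration, not taken from the paper.

```python
import math

# Hypothetical indices of the relevant dimensions in the joint
# (context, arm) vector; all other coordinates are ignored.
RELEVANT_DIMS = (0, 3)


def expected_reward(context, arm):
    """Toy non-linear expected reward that reads only RELEVANT_DIMS."""
    joint = list(context) + list(arm)  # joint context-arm vector
    x, y = (joint[i] for i in RELEVANT_DIMS)
    # Illustrative non-linear dependence, peaked at (0.5, 0.5).
    return math.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2))


# Changing only irrelevant coordinates leaves the reward unchanged.
r1 = expected_reward([0.2, 0.9], [0.1, 0.7])
r2 = expected_reward([0.2, 0.0], [1.0, 0.7])
```

Here `r1` and `r2` coincide because the two inputs differ only in coordinates outside `RELEVANT_DIMS`. A learner in this setting would not know which dimensions are relevant in advance.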
no code implementations • 18 Aug 2017 • Cem Tekin, Eralp Turgay
In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective.
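The lexicographic selection rule stated above can be sketched as follows, assuming the expected rewards of every arm in both objectives are already known for the given context. In a bandit problem they would have to be estimated. The function name `lexicographic_optimal_arm`, its parameters, and the tolerance `tol` are hypothetical.

```python
def lexicographic_optimal_arm(mu_dominant, mu_non_dominant, tol=1e-12):
    """Return the index of the lexicographically optimal arm.

    mu_dominant[a] and mu_non_dominant[a] are the (assumed known)
    expected rewards of arm a in the dominant and non-dominant
    objectives for a fixed context.
    """
    best_dominant = max(mu_dominant)
    # Arms that maximize the dominant objective (up to a small tolerance).
    candidates = [
        a for a, mu in enumerate(mu_dominant) if best_dominant - mu <= tol
    ]
    # Among those, pick the arm that maximizes the non-dominant objective.
    return max(candidates, key=lambda a: mu_non_dominant[a])


# Arms 0 and 1 tie in the dominant objective; arm 1 wins the tie-break.
best = lexicographic_optimal_arm([0.9, 0.9, 0.4], [0.2, 0.7, 1.0])
```

Arm 2 has the highest non-dominant reward but is never considered, because it does not maximize the dominant objective.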