no code implementations • 2 Oct 2023 • Gergely Neu, Julia Olkhovskaya, Sattar Vakili
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for more flexible modeling of complex decision-making scenarios.
no code implementations • NeurIPS 2023 • Sattar Vakili, Julia Olkhovskaya
In particular, with highly non-smooth kernels (such as the neural tangent kernel or some Matérn kernels) the existing results lead to trivial (superlinear in the number of episodes) regret bounds.
no code implementations • 27 May 2022 • Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz
We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts.
no code implementations • 24 Sep 2021 • Gábor Lugosi, Gergely Neu, Julia Olkhovskaya
The goal of the decision maker is to select the sequence of agents in a way that maximizes the total number of influenced nodes in the network.
no code implementations • NeurIPS 2021 • Gergely Neu, Julia Olkhovskaya
We consider the problem of online learning in an episodic Markov decision process, where the reward function is allowed to change between episodes in an adversarial manner and the learner only observes the rewards associated with its actions.
no code implementations • NeurIPS 2021 • Gergely Neu, Julia Olkhovskaya
We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets to observe the rewards associated with its actions.
no code implementations • 1 Feb 2020 • Gergely Neu, Julia Olkhovskaya
We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm is allowed to change without restriction over time.
no code implementations • 28 May 2018 • Julia Olkhovskaya, Gergely Neu, Gábor Lugosi
We consider an online influence maximization problem in which a decision maker selects a node among a large number of possibilities and places a piece of information at the node.