no code implementations • 23 Feb 2024 • Gergely Neu, Matteo Papini, Ludovic Schwartz

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class.

no code implementations • 21 Feb 2024 • Gergely Neu, Nneka Okolo

We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions.

no code implementations • 2 Oct 2023 • Gergely Neu, Julia Olkhovskaya, Sattar Vakili

We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios.

no code implementations • 27 Sep 2023 • Germano Gabbianelli, Gergely Neu, Matteo Papini

These improvements are made possible by the observation that the upper and lower tails of importance-weighted estimators behave very differently from each other, and that controlling them separately can massively improve on previous results, which were all based on symmetric two-sided concentration inequalities.

no code implementations • 31 May 2023 • Gábor Lugosi, Gergely Neu

We establish a connection between the online and statistical learning setting by showing that the existence of an online learning algorithm with bounded regret in this game implies a bound on the generalization error of the statistical learning algorithm, up to a martingale concentration term that is independent of the complexity of the statistical learning method.

no code implementations • 22 May 2023 • Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini

Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy.

no code implementations • 27 Feb 2023 • Antoine Moulin, Gergely Neu

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure.

no code implementations • 21 Oct 2022 • Gergely Neu, Nneka Okolo

We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation.

no code implementations • 17 Oct 2022 • Fan Lu, Prashant Mehta, Sean Meyn, Gergely Neu

The main contributions are as follows: (i) the dual of convex Q-learning is not precisely Manne's LP or a version of logistic Q-learning, but it has a similar structure that reveals the need for regularization to avoid over-fitting.

2 code implementations • 22 Sep 2022 • Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, Volkan Cevher

Thanks to PPM, we avoid nested policy evaluation and cost updates for online IL appearing in the prior literature.

no code implementations • 18 Jul 2022 • Germano Gabbianelli, Matteo Papini, Gergely Neu

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.

no code implementations • 27 May 2022 • Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz

We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts.
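For background, Thompson Sampling with binary losses is usually implemented with Beta posteriors over each arm's mean reward. The sketch below is the generic K-armed textbook version, not the contextual algorithm analyzed in the paper; `get_loss` is a hypothetical environment callback.

```python
import random

def thompson_sampling(get_loss, n_arms, n_rounds, seed=0):
    """Minimal Thompson Sampling for K-armed bandits with binary losses.

    get_loss(arm, t) must return a loss in {0, 1}; a Beta(1, 1) prior is
    kept over each arm's mean reward (reward = 1 - loss).
    """
    rng = random.Random(seed)
    wins = [1] * n_arms   # pseudo-counts of reward 1 (loss 0)
    fails = [1] * n_arms  # pseudo-counts of reward 0 (loss 1)
    total_loss = 0
    for t in range(n_rounds):
        # Sample a plausible mean reward for each arm, play the argmax.
        theta = [rng.betavariate(wins[a], fails[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=theta.__getitem__)
        loss = get_loss(arm, t)
        total_loss += loss
        if loss == 0:
            wins[arm] += 1
        else:
            fails[arm] += 1
    return total_loss
```

The randomness of the posterior samples is what drives exploration: arms with few observations produce high-variance samples and are occasionally played.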

no code implementations • 10 Feb 2022 • Gábor Lugosi, Gergely Neu

Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and the output, given that the loss of any fixed hypothesis has a subgaussian tail.

no code implementations • 28 Dec 2021 • Mastane Achab, Gergely Neu

In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP).
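For reference, the classical DP solution alluded to here — computing an optimal policy for a fully known finite MDP — can be sketched with standard value iteration (a generic textbook routine, not the paper's contribution):

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a known finite MDP.

    P[s][a] is a list of (probability, next_state) pairs, R[s][a] is the
    immediate reward; returns the optimal values and a greedy policy.
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: best one-step lookahead value.
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a]
                  + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
              for s in range(n_states)]
    return V, policy
```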

no code implementations • 24 Sep 2021 • Gábor Lugosi, Gergely Neu, Julia Olkhovskaya

The goal of the decision maker is to select the sequence of agents in a way that maximizes the total number of influenced nodes in the network.

no code implementations • NeurIPS 2021 • Gergely Neu, Julia Olkhovskaya

We consider the problem of online learning in an episodic Markov decision process, where the reward function is allowed to change between episodes in an adversarial manner and the learner only observes the rewards associated with its actions.

no code implementations • 1 Feb 2021 • Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output.

no code implementations • 21 Oct 2020 • Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.

no code implementations • NeurIPS 2021 • Gergely Neu, Julia Olkhovskaya

We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets to observe the rewards associated with its actions.

no code implementations • NeurIPS 2020 • Gergely Neu, Ciara Pike-Burke

The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms.

no code implementations • 1 Feb 2020 • Gergely Neu, Julia Olkhovskaya

We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm are allowed to change without restriction over time.

no code implementations • 28 Jan 2020 • Gergely Neu, Nikita Zhivotovskiy

In the setting of sequential prediction of individual $\{0, 1\}$-sequences with expert advice, we show that by allowing the learner to abstain from the prediction by paying a cost marginally smaller than $\frac 12$ (say, $0.49$), it is possible to achieve expected regret bounds that are independent of the time horizon $T$.

no code implementations • L4DC 2020 • Joan Bas-Serrano, Gergely Neu

We consider the problem of computing optimal policies in average-reward Markov decision processes.

no code implementations • NeurIPS 2019 • Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation.
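As a reminder of the TD side of this comparison, TD(0) bootstraps each value estimate toward the one-step target built from the next state's current estimate. A minimal batch sketch (a generic tabular version with assumed transition-tuple format, not the paper's setup):

```python
def td0_evaluate(episodes, n_states, alpha=0.1, gamma=0.99):
    """TD(0) value estimation from a batch of trajectories.

    episodes is a list of trajectories, each a list of (state, reward,
    next_state, done) tuples; returns the estimated value per state.
    """
    V = [0.0] * n_states
    for episode in episodes:
        for s, r, s2, done in episode:
            # Bootstrap: move V[s] toward the one-step TD target.
            target = r + (0.0 if done else gamma * V[s2])
            V[s] += alpha * (target - V[s])
    return V
```

Monte Carlo evaluation would instead regress each state's value on the full observed return, trading the bias of bootstrapping for higher variance.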

no code implementations • NeurIPS 2019 • Nicole Mücke, Gergely Neu, Lorenzo Rosasco

While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood.

no code implementations • 8 Feb 2019 • Wojciech Kotłowski, Gergely Neu

We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices.

no code implementations • 28 May 2018 • Julia Olkhovskaya, Gergely Neu, Gábor Lugosi

We consider an online influence maximization problem in which a decision maker selects a node among a large number of possibilities and places a piece of information at the node.

no code implementations • 22 Feb 2018 • Gergely Neu, Lorenzo Rosasco

We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods.
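For context, the classic Polyak-Ruppert scheme runs plain SGD but outputs the average of the iterates rather than the last one. A minimal sketch follows, with a made-up noisy quadratic objective for illustration; it shows the baseline scheme, not the variant proposed in the paper.

```python
import random

def averaged_sgd(grad, x0, step, n_steps, seed=0):
    """SGD with Polyak-Ruppert averaging: return the mean iterate.

    grad(x, rng) returns a stochastic gradient at x.
    """
    rng = random.Random(seed)
    x = x0
    avg = 0.0
    for t in range(1, n_steps + 1):
        x = x - step * grad(x, rng)
        avg += (x - avg) / t  # running average of the iterates
    return avg

# Illustrative objective: E[(x - z)^2]/2 with z ~ Uniform(-1, 3),
# whose minimizer is x* = E[z] = 1.
noisy_grad = lambda x, rng: x - rng.uniform(-1.0, 3.0)
```

Averaging suppresses the oscillation of the constant-step-size iterates around the optimum, which is what yields the improved statistical rates.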

no code implementations • 16 Oct 2017 • Gábor Lugosi, Mihalis G. Markakis, Gergely Neu

Furthermore, we modify the proposed policy so that it also performs well in terms of the tracking regret, that is, when the benchmark is the best sequence of inventory decisions that switches a limited number of times.

no code implementations • NeurIPS 2017 • Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu

Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL).
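The Boltzmann (softmax) strategy named here picks each action with probability proportional to the exponentiated value estimate. A minimal, numerically stable sketch of that standard rule (the paper's analysis of its learning-rate schedules is not reproduced):

```python
import math
import random

def boltzmann_action(q_values, temperature, rng=random):
    """Sample an action with probability proportional to exp(Q(a) / temperature).

    Higher temperature explores more uniformly; lower temperature
    approaches greedy action selection.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(q_values)), weights=probs)[0]
```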

no code implementations • 22 May 2017 • Gergely Neu, Anders Jonsson, Vicenç Gómez

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs).

no code implementations • ICML 2017 • Tongliang Liu, Gábor Lugosi, Gergely Neu, DaCheng Tao

The bounds are based on martingale inequalities in the Banach space to which the hypotheses belong.

no code implementations • 21 Feb 2017 • Gergely Neu, Vicenç Gómez

We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs.

no code implementations • NeurIPS 2015 • Gergely Neu

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability.
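One device commonly used for high-probability guarantees in this setting is an EXP3-style algorithm with "implicit exploration": the importance-weighted loss estimate divides by the playing probability plus a small offset, which biases the estimate downward but keeps its variance under control. The sketch below illustrates that generic device under assumed parameter choices; it is not taken from the paper.

```python
import math
import random

def exp3_ix(get_loss, n_arms, n_rounds, seed=0):
    """EXP3 with implicit-exploration (IX) loss estimates.

    get_loss(arm, t) returns a loss in [0, 1].  Dividing the observed
    loss by (p + gamma) instead of p tames the variance of the estimates.
    """
    rng = random.Random(seed)
    eta = math.sqrt(2 * math.log(n_arms) / (n_arms * n_rounds))
    gamma = eta / 2
    cum_est = [0.0] * n_arms  # cumulative loss estimates per arm
    total_loss = 0.0
    for t in range(n_rounds):
        # Exponential weights over cumulative estimated losses.
        m = min(cum_est)
        weights = [math.exp(-eta * (c - m)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = get_loss(arm, t)
        total_loss += loss
        cum_est[arm] += loss / (probs[arm] + gamma)
    return total_loss
```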

no code implementations • 17 Mar 2015 • Gergely Neu, Gábor Bartók

We propose a sample-efficient alternative for importance weighting for situations where one only has sample access to the probability distribution that generates the observations.
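One natural way to build such an estimator with sample access only is to count how many independent draws it takes for the played action to reappear: that count is geometric with mean $1/p$, so it can stand in for the importance weight without ever computing $p$. The following is a sketch of that idea with an assumed capping parameter; whether it matches the paper's exact construction is not claimed.

```python
import random

def geometric_importance_weight(played_arm, sample_arm, cap, rng=random):
    """Estimate the importance weight 1/p(played_arm) via resampling.

    sample_arm() draws one action from the (otherwise unknown) sampling
    distribution.  The number of draws until played_arm reappears is
    geometric with mean 1/p; capping it at `cap` bounds the variance at
    the price of a small downward bias.
    """
    for k in range(1, cap + 1):
        if sample_arm() == played_arm:
            return k
    return cap
```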

no code implementations • 23 Feb 2015 • Gergely Neu

We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions.

no code implementations • NeurIPS 2014 • Tomáš Kocák, Gergely Neu, Michal Valko, Remi Munos

As the predictions of our first algorithm cannot be always computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism.

no code implementations • NeurIPS 2014 • Amir Sani, Gergely Neu, Alessandro Lazaric

We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers some loss associated with the decision and the state of the environment.

no code implementations • NeurIPS 2014 • Gergely Neu, Michal Valko

Most work on sequential learning assumes a fixed set of actions that are available all the time.

no code implementations • 26 Jun 2014 • Yasin Abbasi-Yadkori, Gergely Neu

We study online learning of finite Markov decision process (MDP) problems when a side information vector is available.

no code implementations • NeurIPS 2013 • Alexander Zimin, Gergely Neu

We study the problem of online learning in finite episodic Markov decision processes where the loss function is allowed to change between episodes.

no code implementations • 13 May 2013 • Gergely Neu, Gábor Bartók

We consider the problem of online combinatorial optimization under semi-bandit feedback.

no code implementations • 20 Jun 2012 • Gergely Neu, Csaba Szepesvari

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem.

no code implementations • NeurIPS 2010 • Gergely Neu, Andras Antos, András György, Csaba Szepesvári

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.
