Search Results for author: Gergely Neu

Found 44 papers, 1 paper with code

Optimistic Information Directed Sampling

no code implementations 23 Feb 2024 Gergely Neu, Matteo Papini, Ludovic Schwartz

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class.

Multi-Armed Bandits

Dealing with unbounded gradients in stochastic saddle-point optimization

no code implementations 21 Feb 2024 Gergely Neu, Nneka Okolo

We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions.

Adversarial Contextual Bandits Go Kernelized

no code implementations 2 Oct 2023 Gergely Neu, Julia Olkhovskaya, Sattar Vakili

We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios.

Decision Making Multi-Armed Bandits

Importance-Weighted Offline Learning Done Right

no code implementations 27 Sep 2023 Germano Gabbianelli, Gergely Neu, Matteo Papini

These improvements are made possible by the observation that the upper and lower tails of importance-weighted estimators behave very differently from each other, and that their careful, separate control can massively improve on previous results, all of which were based on symmetric two-sided concentration inequalities.

Online-to-PAC Conversions: Generalization Bounds via Regret Analysis

no code implementations 31 May 2023 Gábor Lugosi, Gergely Neu

We establish a connection between the online and statistical learning setting by showing that the existence of an online learning algorithm with bounded regret in this game implies a bound on the generalization error of the statistical learning algorithm, up to a martingale concentration term that is independent of the complexity of the statistical learning method.

Generalization Bounds

Offline Primal-Dual Reinforcement Learning for Linear MDPs

no code implementations 22 May 2023 Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini

Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy.

Offline RL reinforcement-learning +2

Optimistic Planning by Regularized Dynamic Programming

no code implementations 27 Feb 2023 Antoine Moulin, Gergely Neu

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure.
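The regularized update at the heart of this approach can be sketched by replacing the hard max of value iteration with a log-sum-exp (soft-max). The sketch below is a minimal illustration on an assumed toy MDP, not the paper's full optimistic planner (which also adds optimism to the updates); the function and variable names are mine:

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, eta=10.0, iters=500):
    """Value iteration where the hard max over actions is replaced by a
    log-sum-exp at temperature 1/eta (entropy regularization).
    P: (A, S, S) transition kernel, r: (S, A) reward table."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r + gamma * np.einsum("ast,t->sa", P, V)   # Bellman backup, (S, A)
        V = np.log(np.exp(eta * Q).sum(axis=1)) / eta  # soft max over actions
    return V, Q
```

Since log-sum-exp exceeds the max by at most log|A|/eta, the regularization perturbs the standard value-iteration fixed point only slightly while making each update smooth.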

Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization

no code implementations 21 Oct 2022 Gergely Neu, Nneka Okolo

We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation.

Sufficient Exploration for Convex Q-learning

no code implementations 17 Oct 2022 Fan Lu, Prashant Mehta, Sean Meyn, Gergely Neu

The main contributions are as follows: (i) the dual of convex Q-learning is not precisely Manne's LP or a version of logistic Q-learning, but it has a similar structure that reveals the need for regularization to avoid over-fitting.

OpenAI Gym Q-Learning

Proximal Point Imitation Learning

2 code implementations 22 Sep 2022 Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, Volkan Cevher

Thanks to PPM, we avoid nested policy evaluation and cost updates for online IL appearing in the prior literature.

Imitation Learning

Online Learning with Off-Policy Feedback

no code implementations 18 Jul 2022 Germano Gabbianelli, Matteo Papini, Gergely Neu

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.

Decision Making Multi-Armed Bandits

Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

no code implementations 27 May 2022 Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz

We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts.

Multi-Armed Bandits Thompson Sampling

Generalization Bounds via Convex Analysis

no code implementations 10 Feb 2022 Gábor Lugosi, Gergely Neu

Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and the output, given that the loss of any fixed hypothesis has a subgaussian tail.
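For reference, the mutual-information bound alluded to here, in the form given by Xu and Raginsky (2017), reads: if the loss of every fixed hypothesis is $\sigma$-subgaussian under the data distribution, then

```latex
\mathbb{E}\bigl[L_\mu(W) - L_S(W)\bigr]
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
```

where $S$ is the $n$-point training sample, $W$ is the (possibly randomized) output of the learning algorithm, $L_S$ and $L_\mu$ are the empirical and population risks, and $I(W;S)$ is the mutual information between input and output.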

Generalization Bounds

Robustness and risk management via distributional dynamic programming

no code implementations 28 Dec 2021 Mastane Achab, Gergely Neu

In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP).

Distributional Reinforcement Learning Management +2

Learning to maximize global influence from local observations

no code implementations 24 Sep 2021 Gábor Lugosi, Gergely Neu, Julia Olkhovskaya

The goal of the decision maker is to select the sequence of agents in a way that maximizes the total number of influenced nodes in the network.

Online learning in MDPs with linear function approximation and bandit feedback

no code implementations NeurIPS 2021 Gergely Neu, Julia Olkhovskaya

We consider the problem of online learning in an episodic Markov decision process, where the reward function is allowed to change between episodes in an adversarial manner and the learner only observes the rewards associated with its actions.

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

no code implementations 1 Feb 2021 Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output.

Generalization Bounds Stochastic Optimization

Logistic Q-Learning

no code implementations 21 Oct 2020 Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.

Q-Learning Reinforcement Learning (RL)

A Unifying View of Optimism in Episodic Reinforcement Learning

no code implementations NeurIPS 2020 Gergely Neu, Ciara Pike-Burke

The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms.

reinforcement-learning Reinforcement Learning (RL)

Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

no code implementations 1 Feb 2020 Gergely Neu, Julia Olkhovskaya

We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm are allowed to change without restriction over time.

Multi-Armed Bandits

Fast Rates for Online Prediction with Abstention

no code implementations 28 Jan 2020 Gergely Neu, Nikita Zhivotovskiy

In the setting of sequential prediction of individual $\{0, 1\}$-sequences with expert advice, we show that by allowing the learner to abstain from the prediction by paying a cost marginally smaller than $\frac 12$ (say, $0.49$), it is possible to achieve expected regret bounds that are independent of the time horizon $T$.
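As a toy illustration of the abstention setting (not the algorithm analyzed in the paper; the function names, margin rule, and learning rate are all illustrative assumptions), one can run multiplicative weights over binary experts and abstain, at cost $0.49$, whenever the weighted vote is too close to $1/2$:

```python
import numpy as np

def predict_with_abstention(expert_preds, outcomes, cost=0.49, eta=2.0, margin=0.05):
    """Multiplicative-weights prediction over binary experts that abstains
    (paying `cost`) when the weighted vote falls within `margin` of 1/2.
    expert_preds: (T, n_experts) array of {0, 1} predictions."""
    w = np.ones(expert_preds.shape[1])
    total_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        p = w @ preds / w.sum()                 # weighted vote in [0, 1]
        if abs(p - 0.5) < margin:
            total_loss += cost                  # abstain at cost < 1/2
        else:
            total_loss += float((p > 0.5) != y)
        w *= np.exp(-eta * np.abs(preds - y))   # penalize wrong experts
    return total_loss
```

The option to abstain near the decision boundary is what removes the worst-case rounds that otherwise force horizon-dependent regret.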

Faster saddle-point optimization for solving large-scale Markov decision processes

no code implementations L4DC 2020 Joan Bas-Serrano, Gergely Neu

We consider the problem of computing optimal policies in average-reward Markov decision processes.

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

no code implementations NeurIPS 2019 Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation.

Beating SGD Saturation with Tail-Averaging and Minibatching

no code implementations NeurIPS 2019 Nicole Mücke, Gergely Neu, Lorenzo Rosasco

While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood.

Bandit Principal Component Analysis

no code implementations 8 Feb 2019 Wojciech Kotłowski, Gergely Neu

We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices.

Decision Making

Online Influence Maximization with Local Observations

no code implementations 28 May 2018 Julia Olkhovskaya, Gergely Neu, Gábor Lugosi

We consider an online influence maximization problem in which a decision maker selects a node among a large number of possibilities and places a piece of information at the node.

Iterate averaging as regularization for stochastic gradient descent

no code implementations 22 Feb 2018 Gergely Neu, Lorenzo Rosasco

We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods.
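A minimal sketch of the averaging scheme in question, on a least-squares objective (the step size, toy problem, and names are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def sgd_with_averaging(A, b, lr=0.01, epochs=100, seed=0):
    """SGD on 0.5 * ||Ax - b||^2, returning the Polyak-Ruppert estimate:
    the running average of all iterates rather than the last iterate."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_avg = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = (A[i] @ x - b[i]) * A[i]    # stochastic gradient from one row
            x -= lr * g
            t += 1
            x_avg += (x - x_avg) / t        # running average of iterates
    return x_avg
```

The paper's perspective is that this averaging acts as a form of implicit (Tikhonov-style) regularization, so the amount of averaging plays a role analogous to a regularization parameter.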


On the Hardness of Inventory Management with Censored Demand Data

no code implementations 16 Oct 2017 Gábor Lugosi, Mihalis G. Markakis, Gergely Neu

Furthermore, we modify the proposed policy so that it performs well in terms of the tracking regret, that is, using as a benchmark the best sequence of inventory decisions that switches a limited number of times.


Boltzmann Exploration Done Right

no code implementations NeurIPS 2017 Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu

Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL).
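The rule under study is the classic softmax (Boltzmann) action-selection scheme, sketched below. The paper's contribution is showing that common temperature schedules for this rule can fail badly and proposing a variant with per-arm learning rates; the helper here is just the classic rule, with assumed names:

```python
import numpy as np

def boltzmann_action(q_values, temperature, rng):
    """Sample an arm from the softmax distribution over current reward
    estimates; lower temperature means greedier play."""
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()            # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs
```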

Decision Making Decision Making Under Uncertainty +2

A unified view of entropy-regularized Markov decision processes

no code implementations 22 May 2017 Gergely Neu, Anders Jonsson, Vicenç Gómez

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs).

Policy Gradient Methods reinforcement-learning +1

Algorithmic stability and hypothesis complexity

no code implementations ICML 2017 Tongliang Liu, Gábor Lugosi, Gergely Neu, DaCheng Tao

The bounds are based on martingale inequalities in the Banach space to which the hypotheses belong.

Fast rates for online learning in Linearly Solvable Markov Decision Processes

no code implementations 21 Feb 2017 Gergely Neu, Vicenç Gómez

We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs.

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

no code implementations NeurIPS 2015 Gergely Neu

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability.

Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

no code implementations 17 Mar 2015 Gergely Neu, Gábor Bartók

We propose a sample-efficient alternative for importance weighting for situations where one only has sample access to the probability distribution that generates the observations.
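The estimator in question (called geometric resampling in the paper) can be sketched as follows: to estimate $1/p$ for the probability $p$ of the action actually taken, draw fresh actions from the same distribution and count the draws until that action reappears, capped at some $M$. The capped count $K$ satisfies $\mathbb{E}[K] = (1-(1-p)^M)/p$, a bounded and only slightly biased estimate of $1/p$. The function below and its toy usage are illustrative:

```python
def geometric_resampling(sample_action, taken_action, cap, rng):
    """Count fresh draws from the action distribution until `taken_action`
    reappears, capped at `cap`; the count estimates 1/P(taken_action)
    without ever computing that probability explicitly."""
    for k in range(1, cap + 1):
        if sample_action(rng) == taken_action:
            return k
    return cap
```

The cap keeps the estimator bounded (hence well-concentrated), at the price of the small bias $(1-p)^M/p$.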

Combinatorial Optimization

First-order regret bounds for combinatorial semi-bandits

no code implementations 23 Feb 2015 Gergely Neu

We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions.

Combinatorial Optimization

Efficient learning by implicit exploration in bandit problems with side observations

no code implementations NeurIPS 2014 Tomáš Kocák, Gergely Neu, Michal Valko, Rémi Munos

As the predictions of our first algorithm cannot be always computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism.

Combinatorial Optimization

Exploiting easy data in online optimization

no code implementations NeurIPS 2014 Amir Sani, Gergely Neu, Alessandro Lazaric

We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers some loss associated with the decision and the state of the environment.

Online learning in MDPs with side information

no code implementations 26 Jun 2014 Yasin Abbasi-Yadkori, Gergely Neu

We study online learning of finite Markov decision process (MDP) problems when a side information vector is available.

Recommendation Systems

Online learning in episodic Markovian decision processes by relative entropy policy search

no code implementations NeurIPS 2013 Alexander Zimin, Gergely Neu

We study the problem of online learning in finite episodic Markov decision processes where the loss function is allowed to change between episodes.

An efficient algorithm for learning with semi-bandit feedback

no code implementations 13 May 2013 Gergely Neu, Gábor Bartók

We consider the problem of online combinatorial optimization under semi-bandit feedback.

Combinatorial Optimization

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

no code implementations 20 Jun 2012 Gergely Neu, Csaba Szepesvári

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem.

reinforcement-learning Reinforcement Learning (RL)

Online Markov Decision Processes under Bandit Feedback

no code implementations NeurIPS 2010 Gergely Neu, Andras Antos, András György, Csaba Szepesvári

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.
