Search Results for author: Csaba Szepesvári

Found 79 papers, 7 papers with code

Regret Minimization via Saddle Point Optimization

no code implementations • NeurIPS 2023 • Johannes Kirschner, Seyed Alireza Bakhtiari, Kushagra Chandak, Volodymyr Tkachuk, Csaba Szepesvári

A long line of works characterizes the sample complexity of regret minimization in sequential decision-making by min-max programs.

Decision Making

Switching the Loss Reduces the Cost in Batch Reinforcement Learning

no code implementations • 8 Mar 2024 • Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári

We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL).

reinforcement-learning Reinforcement Learning (RL)

Ensemble sampling for linear bandits: small ensembles suffice

no code implementations • 14 Nov 2023 • David Janz, Alexander E. Litvak, Csaba Szepesvári

We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting.

Exploration via linearly perturbed loss minimisation

no code implementations • 13 Nov 2023 • David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári

We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards.

Thompson Sampling
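
The perturbed-history idea in the abstract above can be illustrated with a minimal sketch. It assumes a plain linear bandit (not the generalised linear case), Gaussian reward perturbations, and ridge regression as the training step; the function name `phe_linear_bandit` and all parameter values are hypothetical and not taken from the paper.

```python
import numpy as np

def phe_linear_bandit(arms, theta_star, horizon=500, noise_sd=0.1,
                      perturb_sd=0.5, reg=1.0, seed=0):
    """Illustrative perturbed-history exploration; returns cumulative regret."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    means = arms @ theta_star          # true mean reward of each arm
    feats, rewards = [], []            # history of played features and rewards
    regret = 0.0
    for _ in range(horizon):
        if feats:
            X = np.asarray(feats)
            y = np.asarray(rewards)
            # Exploration step: add fresh noise to every past reward ...
            y_pert = y + perturb_sd * rng.standard_normal(len(y))
            # ... then fit ridge regression on the perturbed history.
            theta_hat = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y_pert)
        else:
            theta_hat = rng.standard_normal(d)   # no data yet: random guess
        a = int(np.argmax(arms @ theta_hat))     # act greedily on the fit
        r = means[a] + noise_sd * rng.standard_normal()
        feats.append(arms[a])
        rewards.append(r)
        regret += means.max() - means[a]
    return regret
```

Calling `phe_linear_bandit(np.eye(3), np.array([0.2, 0.5, 0.9]))` runs the sketch on a three-armed problem. The only source of exploration is the randomness injected into the training targets.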

Stochastic Gradient Descent for Gaussian Processes Done Right

1 code implementation • 31 Oct 2023 • Jihao Andreas Lin, Shreyas Padhy, Javier Antorán, Austin Tripp, Alexander Terenin, Csaba Szepesvári, José Miguel Hernández-Lobato, David Janz

We study the optimisation problem associated with Gaussian process regression using squared loss.

Bayesian Optimisation Gaussian Processes +1

Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore

no code implementations • 11 Oct 2023 • Gellért Weisz, András György, Csaba Szepesvári

We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features.

Reinforcement Learning (RL)

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

no code implementations • 25 Jul 2023 • Philip Amortila, Nan Jiang, Csaba Szepesvári

Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation.

Off-policy evaluation

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.

regression Reinforcement Learning (RL)

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

no code implementations • NeurIPS 2023 • Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári

While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing results are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary.

Reinforcement Learning (RL)

Exponential Hardness of Reinforcement Learning with Linear Function Approximation

no code implementations • 25 Feb 2023 • Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan, Csaba Szepesvári, Gellért Weisz

The rewards in this game are chosen such that if the learner achieves large reward, then the learner's actions can be used to simulate solving a variant of 3-SAT, where (a) each variable shows up in a bounded number of clauses, and (b) if an instance has no solutions then it also has no solutions that satisfy more than a $(1-\epsilon)$-fraction of clauses.

Learning Theory reinforcement-learning +1

Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning

no code implementations • 8 Feb 2023 • Volodymyr Tkachuk, Seyed Alireza Bakhtiari, Johannes Kirschner, Matej Jusup, Ilija Bogunovic, Csaba Szepesvári

A practical challenge in reinforcement learning is posed by combinatorial action spaces, which make planning computationally demanding.

Multi-agent Reinforcement Learning reinforcement-learning +1

Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks

no code implementations • 28 Dec 2022 • Ilja Kuzborskij, Csaba Szepesvári

We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD).

Revisiting Simple Regret: Fast Rates for Returning a Good Arm

no code implementations • 30 Oct 2022 • Yao Zhao, Connor James Stephens, Csaba Szepesvári, Kwang-Sung Jun

Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to a lack of easy ways to characterize it.

Multi-Armed Bandits

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

no code implementations • 27 Oct 2022 • Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.

Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

no code implementations • 29 Sep 2022 • Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin

We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision making problems in a polynomial number of samples.

Decision Making Model-based Reinforcement Learning +1

Near-Optimal Sample Complexity Bounds for Constrained MDPs

no code implementations • 13 Jun 2022 • Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint.

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.

BIG-bench Machine Learning Evolutionary Algorithms +2

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

no code implementations • 2 Jun 2022 • Qinghua Liu, Csaba Szepesvári, Chi Jin

This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent only sees her own individual observations and actions, which reveal incomplete information about the underlying state of the system.

Multi-agent Reinforcement Learning reinforcement-learning +1

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári

In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.

reinforcement-learning Reinforcement Learning (RL)

When Is Partially Observable Reinforcement Learning Not Scary?

no code implementations • 19 Apr 2022 • Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin

Applications of Reinforcement Learning (RL), in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system, that is, they act under partial observability of the states, are ubiquitous.

Partially Observable Reinforcement Learning reinforcement-learning +1

A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning

no code implementations • 22 Nov 2021 • Tongzheng Ren, Tianjun Zhang, Csaba Szepesvári, Bo Dai

Representation learning lies at the heart of the empirical success of deep learning for dealing with the curse of dimensionality.

Reinforcement Learning (RL) Representation Learning

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs

no code implementations • 18 Oct 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári

We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs).

Reinforcement Learning (RL)

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

no code implementations • 5 Oct 2021 • Gellért Weisz, Csaba Szepesvári, András György

Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).

Efficient Local Planning with Linear Function Approximation

no code implementations • 12 Aug 2021 • Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári

Under the assumption that the Q-functions of all policies are linear in known features of the state-action pairs, we show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs are independent of the size of the state space.

On the Role of Optimization in Double Descent: A Least Squares Study

no code implementations • NeurIPS 2021 • Ilja Kuzborskij, Csaba Szepesvári, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu

Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization.

No Regrets for Learning the Prior in Bandits

no code implementations • NeurIPS 2021 • Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with.

Thompson Sampling

Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping

no code implementations • 12 Jul 2021 • Ilja Kuzborskij, Csaba Szepesvári

We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD).

regression

Optimization Issues in KL-Constrained Approximate Policy Iteration

no code implementations • 11 Feb 2021 • Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

no code implementations • 6 Feb 2021 • Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less well understood.

Off-policy evaluation

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

no code implementations • 3 Feb 2021 • Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári

We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.

Open-Ended Question Answering

Asymptotically Optimal Information-Directed Sampling

no code implementations • 11 Nov 2020 • Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

feature selection Model Selection +2

Online Sparse Reinforcement Learning

no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

reinforcement-learning Reinforcement Learning (RL)

A Distribution-Dependent Analysis of Meta-Learning

no code implementations • 31 Oct 2020 • Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvári

A key problem in the theory of meta-learning is to understand how the task distributions influence transfer risk, the expected error of a meta-learner on a new task drawn from the unknown task distribution.

Meta-Learning regression +1

Online Algorithm for Unsupervised Sequential Selection with Contextual Information

no code implementations • NeurIPS 2020 • Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback.

Multi-Armed Bandits

CoinDICE: Off-Policy Confidence Interval Estimation

no code implementations • NeurIPS 2020 • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.

Off-policy evaluation valid

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

no code implementations • 3 Oct 2020 • Gellért Weisz, Philip Amortila, Csaba Szepesvári

We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner.

Tighter risk certificates for neural networks

1 code implementation • 25 Jul 2020 • María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári

In the context of probabilistic neural networks, the output of training is a probability distribution over network weights.

Model Selection

Efficient Planning in Large MDPs with Weak Linear Function Approximation

no code implementations • NeurIPS 2020 • Roshan Shariff, Csaba Szepesvári

Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP.

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

1 code implementation • 18 Jun 2020 • Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári

We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.

Multi-Armed Bandits Off-policy evaluation

Efron-Stein PAC-Bayesian Inequalities

no code implementations • 4 Sep 2019 • Ilja Kuzborskij, Csaba Szepesvári

We prove semi-empirical concentration inequalities for random variables which are given as possibly nonlinear functions of independent random variables.

Generalization Bounds Off-policy evaluation

Detecting Overfitting via Adversarial Examples

no code implementations • NeurIPS 2019 • Roman Werpachowski, András György, Csaba Szepesvári

It utilizes a new unbiased error estimate that is based on adversarial examples generated from the test data and importance weighting.

General Classification Image Classification

Distribution-Dependent Analysis of Gibbs-ERM Principle

no code implementations • 5 Feb 2019 • Ilja Kuzborskij, Nicolò Cesa-Bianchi, Csaba Szepesvári

This is a well-established notion of effective dimension appearing in several previous works, including the analyses of SGD and ridge regression, but ours is the first work that brings this dimension to the analysis of learning using Gibbs densities.

Stochastic Optimization

Online Algorithm for Unsupervised Sensor Selection

no code implementations • 15 Jan 2019 • Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama

We set up the USS problem as a stochastic partial monitoring problem and develop an algorithm with sub-linear regret under the WD property.

Online Learning to Rank with Features

no code implementations • 5 Oct 2018 • Shuai Li, Tor Lattimore, Csaba Szepesvári

We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter.

Learning-To-Rank

LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

1 code implementation • ICML 2018 • Gellért Weisz, András György, Csaba Szepesvári

We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution.

Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging

no code implementations • 12 Sep 2017 • Chandrashekar Lakshminarayanan, Csaba Szepesvári

For a given LSA with PR averaging, and data distribution $P$ satisfying the said assumptions, we show that there exists a range of constant step-sizes such that its MSE decays as $O(\frac{1}{t})$.

Reinforcement Learning (RL)
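
As a rough illustration of the setting in the abstract above, the sketch below runs linear stochastic approximation (LSA) with a constant step-size and Polyak-Ruppert (PR) averaging on a synthetic problem. The i.i.d. Gaussian noise model, the step-size, and the function name `lsa_pr_average` are assumptions made for the example, not details from the paper.

```python
import numpy as np

def lsa_pr_average(A, b, alpha=0.05, steps=20000, noise_sd=0.5, seed=0):
    """Constant step-size LSA for A @ theta = b, with PR iterate averaging."""
    rng = np.random.default_rng(seed)
    d = len(b)
    theta = np.zeros(d)       # last iterate
    theta_bar = np.zeros(d)   # running (Polyak-Ruppert) average of iterates
    for t in range(1, steps + 1):
        # Noisy, unbiased observations of A and b (assumed noise model).
        A_t = A + noise_sd * rng.standard_normal((d, d))
        b_t = b + noise_sd * rng.standard_normal(d)
        theta = theta + alpha * (b_t - A_t @ theta)
        theta_bar += (theta - theta_bar) / t   # incremental mean update
    return theta, theta_bar

A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
theta_last, theta_avg = lsa_pr_average(A, b)
theta_star = np.linalg.solve(A, b)
print(np.linalg.norm(theta_last - theta_star), np.linalg.norm(theta_avg - theta_star))
```

With a constant step-size the last iterate keeps fluctuating around the solution, whereas the averaged iterate smooths those fluctuations out, which is the behaviour the $O(\frac{1}{t})$ MSE statement refers to.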

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

no code implementations • 8 Sep 2017 • Pooria Joulani, András György, Csaba Szepesvári

Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms.

Stochastic Optimization

Mixing time estimation in reversible Markov chains from a single sample path

no code implementations • NeurIPS 2015 • Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári

The interval is constructed around the relaxation time $t_{\text{relax}} = 1/\gamma$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
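
For orientation, the relaxation time can be computed directly when the transition matrix is known (the paper's setting is harder: it must be estimated from a single sample path). The sketch below assumes that $\gamma$ denotes the absolute spectral gap of a reversible, ergodic chain; the function name `relaxation_time` is hypothetical.

```python
import numpy as np

def relaxation_time(P):
    """Return (gamma, t_relax) for a reversible, ergodic transition matrix P."""
    eigvals = np.linalg.eigvals(P)
    # Reversible chains have real eigenvalues; drop numerical imaginary parts.
    mags = np.sort(np.abs(eigvals.real))[::-1]
    # mags[0] is the trivial eigenvalue 1; the gap uses the next largest modulus.
    gamma = 1.0 - mags[1]
    return gamma, 1.0 / gamma

# Two-state chain with eigenvalues 1 and 1 - 0.1 - 0.2 = 0.7.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
gamma, t_relax = relaxation_time(P)
```

For this two-state chain the gap is 0.3, so the relaxation time is 1/0.3, roughly 3.33 steps.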

Structured Best Arm Identification with Fixed Confidence

no code implementations • 16 Jun 2017 • Ruitong Huang, Mohammad M. Ajallooeian, Csaba Szepesvári, Martin Müller

We study the problem of identifying the best action among a set of possible options when the value of each action is given by a mapping from a number of noisy micro-observables in the so-called fixed confidence setting.

Bernoulli Rank-$1$ Bandits for Click Feedback

no code implementations • 19 Mar 2017 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The probability that a user will click a search result depends both on its relevance and its position on the results page.

Position

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

no code implementations • NeurIPS 2016 • Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.
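
The curved-loss case mentioned in the abstract above can be made concrete with a small sketch. It assumes one-dimensional quadratic losses $\ell_t(x) = \frac{1}{2}(x - z_t)^2$, for which the leader is simply the mean of the past targets; the function names are illustrative, and the paper itself studies linear prediction rather than this toy case.

```python
import numpy as np

def ftl_quadratic(zs):
    """Follow the leader on losses l_t(x) = 0.5 * (x - z_t)**2."""
    plays = []
    running_sum = 0.0
    for t, z in enumerate(zs):
        # The leader minimises the cumulative past loss, i.e. it is the mean
        # of z_0..z_{t-1}; with no history we play 0 by convention.
        x = running_sum / t if t > 0 else 0.0
        plays.append(x)
        running_sum += z
    return np.array(plays)

def regret(zs, plays):
    """Regret against the best fixed point in hindsight (the mean of zs)."""
    zs = np.asarray(zs, dtype=float)
    best_fixed = zs.mean()
    return 0.5 * np.sum((plays - zs) ** 2) - 0.5 * np.sum((best_fixed - zs) ** 2)

zs = [1.0, 2.0, 3.0]
plays = ftl_quadratic(zs)
print(plays, regret(zs, plays))
```

Because each play depends only on past targets, the same loop works unchanged for data that arrive one at a time.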

SDP Relaxation with Randomized Rounding for Energy Disaggregation

2 code implementations • NeurIPS 2016 • Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu

We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring.

Total Energy

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

no code implementations • 22 Sep 2016 • Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.

Multiclass Classification Calibration Functions

no code implementations • 20 Sep 2016 • Bernardo Ávila Pires, Csaba Szepesvári

We devise a streamlined analysis that simplifies the process of deriving calibration functions for a large number of surrogate losses that have been proposed in the literature.

Classification General Classification

Chaining Bounds for Empirical Risk Minimization

no code implementations • 7 Sep 2016 • Gábor Balázs, András György, Csaba Szepesvári

This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization with random design settings even if the magnitude of the noise and the estimates is unbounded.

regression

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models

no code implementations • 19 Feb 2016 • Bernardo Ávila Pires, Csaba Szepesvári

In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes.

Model-based Reinforcement Learning reinforcement-learning +1

Conservative Bandits

no code implementations • 13 Feb 2016 • Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price for maintaining the constraints.

DCM Bandits: Learning to Rank with Multiple Clicks

1 code implementation • 9 Feb 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen

This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.

Learning-To-Rank

Online Learning with Gaussian Payoffs and Side Observations

no code implementations • NeurIPS 2015 • Yifan Wu, András György, Csaba Szepesvári

For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature.

Fast Cross-Validation for Incremental Learning

no code implementations • 30 Jun 2015 • Pooria Joulani, András György, Csaba Szepesvári

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning.

Incremental Learning

Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path

no code implementations • NeurIPS 2015 • Daniel Hsu, Aryeh Kontorovich, Csaba Szepesvári

The interval is constructed around the relaxation time $t_{\text{relax}}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

no code implementations • 8 Jun 2015 • Prashanth L. A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvári

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim.

reinforcement-learning Reinforcement Learning (RL)

Optimal Resource Allocation with Semi-Bandit Feedback

no code implementations • 15 Jun 2014 • Tor Lattimore, Koby Crammer, Csaba Szepesvári

We study a sequential resource allocation problem involving a fixed number of recurring jobs.

Adaptive Monte Carlo via Bandit Allocation

no code implementations • 13 May 2014 • James Neufeld, András György, Dale Schuurmans, Csaba Szepesvári

We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.

Online Learning under Delayed Feedback

no code implementations • 4 Jun 2013 • Pooria Joulani, András György, Csaba Szepesvári

Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems.

Deep Representations and Codes for Image Auto-Annotation

no code implementations • NeurIPS 2012 • Ryan Kiros, Csaba Szepesvári

The task of assigning a set of relevant tags to an image is challenging due to the size and variability of tag vocabularies.

feature selection TAG

Improved Algorithms for Linear Stochastic Bandits

no code implementations • NeurIPS 2011 • Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári

We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem.

Parametric Bandits: The Generalized Linear Case

no code implementations • NeurIPS 2010 • Sarah Filippi, Olivier Cappe, Aurélien Garivier, Csaba Szepesvári

We consider structured multi-armed bandit tasks in which the agent is guided by prior structural knowledge that can be exploited to efficiently select the optimal arm(s) in situations where the number of arms is large, or even infinite.

Online Markov Decision Processes under Bandit Feedback

no code implementations • NeurIPS 2010 • Gergely Neu, Andras Antos, András György, Csaba Szepesvári

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.

Error Propagation for Approximate Policy and Value Iteration

no code implementations • NeurIPS 2010 • Amir-Massoud Farahmand, Csaba Szepesvári, Rémi Munos

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy.

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

no code implementations • NeurIPS 2010 • Dávid Pál, Barnabás Póczos, Csaba Szepesvári

We present simple and computationally efficient nonparametric estimators of Rényi entropy and mutual information based on an i.i.d. sample.

A General Projection Property for Distribution Families

no code implementations • NeurIPS 2009 • Yao-Liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári

We prove that linear projections between distribution families with fixed first and second moments are surjective, regardless of dimension.

Multi-Step Dyna Planning for Policy Evaluation and Control

no code implementations • NeurIPS 2009 • Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári

We extend the Dyna planning architecture for policy evaluation and control in two significant respects.

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

no code implementations • NeurIPS 2009 • Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks.

Q-Learning

A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation

no code implementations • NeurIPS 2008 • Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári

We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, target policy, and exciting behavior policy, and whose complexity scales linearly in the number of parameters.

Regularized Policy Iteration

no code implementations • NeurIPS 2008 • Amir M. Farahmand, Mohammad Ghavamzadeh, Shie Mannor, Csaba Szepesvári

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms.

L2 Regularization reinforcement-learning +1

Online Optimization in X-Armed Bandits

no code implementations • NeurIPS 2008 • Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.

Fitted Q-iteration in continuous action-space MDPs

no code implementations • NeurIPS 2007 • András Antos, Csaba Szepesvári, Rémi Munos

We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy.

reinforcement-learning Reinforcement Learning (RL)
