Search Results for author: Csaba Szepesvári

Found 77 papers, 7 papers with code

Ensemble sampling for linear bandits: small ensembles suffice

no code implementations 14 Nov 2023 David Janz, Alexander E. Litvak, Csaba Szepesvári

We provide the first useful, rigorous analysis of ensemble sampling for the stochastic linear bandit setting.

Exploration via linearly perturbed loss minimisation

1 code implementation 13 Nov 2023 David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári

We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards.

Thompson Sampling

Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore

no code implementations 11 Oct 2023 Gellért Weisz, András György, Csaba Szepesvári

We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features.

Reinforcement Learning (RL)

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

no code implementations 25 Jul 2023 Philip Amortila, Nan Jiang, Csaba Szepesvári

Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation.

Off-policy evaluation

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

no code implementations NeurIPS 2023 Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári

While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing results are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary.

Reinforcement Learning (RL)

Exponential Hardness of Reinforcement Learning with Linear Function Approximation

no code implementations 25 Feb 2023 Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan, Csaba Szepesvári, Gellért Weisz

The rewards in this game are chosen such that if the learner achieves large reward, then the learner's actions can be used to simulate solving a variant of 3-SAT, where (a) each variable shows up in a bounded number of clauses and (b) if an instance has no solutions, then it also has no solutions that satisfy more than a $(1-\epsilon)$-fraction of clauses.

Learning Theory reinforcement-learning +1

Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks

no code implementations 28 Dec 2022 Ilja Kuzborskij, Csaba Szepesvári

We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD).

Revisiting Simple Regret: Fast Rates for Returning a Good Arm

no code implementations 30 Oct 2022 Yao Zhao, Connor James Stephens, Csaba Szepesvári, Kwang-Sung Jun

Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to a lack of easy ways to characterize it.

Multi-Armed Bandits

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

no code implementations 27 Oct 2022 Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.

Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

no code implementations 29 Sep 2022 Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin

We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision making problems in a polynomial number of samples.

Decision Making Model-based Reinforcement Learning +1

Near-Optimal Sample Complexity Bounds for Constrained MDPs

no code implementations 13 Jun 2022 Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint.

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

no code implementations 5 Jun 2022 Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.

BIG-bench Machine Learning Evolutionary Algorithms +2

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

no code implementations 2 Jun 2022 Qinghua Liu, Csaba Szepesvári, Chi Jin

This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent only sees her own individual observations and actions that reveal incomplete information about the underlying state of the system.

Multi-agent Reinforcement Learning reinforcement-learning +1

When Is Partially Observable Reinforcement Learning Not Scary?

no code implementations 19 Apr 2022 Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin

Applications of Reinforcement Learning (RL), in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system, that is, they act under partial observability of the states, are ubiquitous.

Partially Observable Reinforcement Learning reinforcement-learning +1

A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning

no code implementations 22 Nov 2021 Tongzheng Ren, Tianjun Zhang, Csaba Szepesvári, Bo Dai

Representation learning lies at the heart of the empirical success of deep learning for dealing with the curse of dimensionality.

Reinforcement Learning (RL) Representation Learning

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs

no code implementations 18 Oct 2021 Han Zhong, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári

We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs).

Reinforcement Learning (RL)

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

no code implementations 5 Oct 2021 Gellért Weisz, Csaba Szepesvári, András György

Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).

Efficient Local Planning with Linear Function Approximation

no code implementations 12 Aug 2021 Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári

Under the assumption that the Q-functions of all policies are linear in known features of the state-action pairs, we show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs are independent of the size of the state space.

On the Role of Optimization in Double Descent: A Least Squares Study

no code implementations NeurIPS 2021 Ilja Kuzborskij, Csaba Szepesvári, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu

Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization.

No Regrets for Learning the Prior in Bandits

no code implementations NeurIPS 2021 Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with.

Thompson Sampling

Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping

no code implementations 12 Jul 2021 Ilja Kuzborskij, Csaba Szepesvári

We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD).


Optimization Issues in KL-Constrained Approximate Policy Iteration

no code implementations 11 Feb 2021 Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

no code implementations 6 Feb 2021 Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less well understood.

Off-policy evaluation

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

no code implementations 3 Feb 2021 Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári

We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.

Open-Ended Question Answering

Asymptotically Optimal Information-Directed Sampling

no code implementations 11 Nov 2020 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.

Online Sparse Reinforcement Learning

no code implementations 8 Nov 2020 Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

reinforcement-learning Reinforcement Learning (RL)

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations 8 Nov 2020 Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

feature selection Model Selection +2

A Distribution-Dependent Analysis of Meta-Learning

no code implementations 31 Oct 2020 Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvári

A key problem in the theory of meta-learning is to understand how the task distributions influence transfer risk, the expected error of a meta-learner on a new task drawn from the unknown task distribution.

Meta-Learning regression +1

Online Algorithm for Unsupervised Sequential Selection with Contextual Information

no code implementations NeurIPS 2020 Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback.

Multi-Armed Bandits

CoinDICE: Off-Policy Confidence Interval Estimation

no code implementations NeurIPS 2020 Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.

Off-policy evaluation valid

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

no code implementations 3 Oct 2020 Gellért Weisz, Philip Amortila, Csaba Szepesvári

We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner.

Tighter risk certificates for neural networks

1 code implementation 25 Jul 2020 María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári

In the context of probabilistic neural networks, the output of training is a probability distribution over network weights.

Model Selection

Efficient Planning in Large MDPs with Weak Linear Function Approximation

no code implementations NeurIPS 2020 Roshan Shariff, Csaba Szepesvári

Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP.

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

1 code implementation 18 Jun 2020 Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári

We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.

Multi-Armed Bandits Off-policy evaluation

Efron-Stein PAC-Bayesian Inequalities

no code implementations 4 Sep 2019 Ilja Kuzborskij, Csaba Szepesvári

We prove semi-empirical concentration inequalities for random variables which are given as possibly nonlinear functions of independent random variables.

Generalization Bounds Off-policy evaluation

Detecting Overfitting via Adversarial Examples

no code implementations NeurIPS 2019 Roman Werpachowski, András György, Csaba Szepesvári

It utilizes a new unbiased error estimate that is based on adversarial examples generated from the test data and importance weighting.

General Classification Image Classification +1

Distribution-Dependent Analysis of Gibbs-ERM Principle

no code implementations 5 Feb 2019 Ilja Kuzborskij, Nicolò Cesa-Bianchi, Csaba Szepesvári

This is a well-established notion of effective dimension appearing in several previous works, including the analyses of SGD and ridge regression, but ours is the first work that brings this dimension to the analysis of learning using Gibbs densities.

Stochastic Optimization

Online Algorithm for Unsupervised Sensor Selection

no code implementations 15 Jan 2019 Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama

We set up the USS problem as a stochastic partial monitoring problem and develop an algorithm with sub-linear regret under the weak dominance (WD) property.


Online Learning to Rank with Features

no code implementations 5 Oct 2018 Shuai Li, Tor Lattimore, Csaba Szepesvári

We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter.


LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

1 code implementation ICML 2018 Gellért Weisz, András György, Csaba Szepesvári

We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution.

Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging

no code implementations 12 Sep 2017 Chandrashekar Lakshminarayanan, Csaba Szepesvári

For a given LSA with PR averaging, and data distribution $P$ satisfying the said assumptions, we show that there exists a range of constant step-sizes such that its MSE decays as $O(\frac{1}{t})$.

Reinforcement Learning (RL)

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

no code implementations 8 Sep 2017 Pooria Joulani, András György, Csaba Szepesvári

Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms.

Stochastic Optimization

Mixing time estimation in reversible Markov chains from a single sample path

no code implementations NeurIPS 2015 Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári

The interval is constructed around the relaxation time $t_{\text{relax}} = 1/\gamma$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.

Structured Best Arm Identification with Fixed Confidence

no code implementations 16 Jun 2017 Ruitong Huang, Mohammad M. Ajallooeian, Csaba Szepesvári, Martin Müller

We study the problem of identifying the best action among a set of possible options when the value of each action is given by a mapping from a number of noisy micro-observables in the so-called fixed confidence setting.

Bernoulli Rank-$1$ Bandits for Click Feedback

no code implementations 19 Mar 2017 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The probability that a user will click a search result depends both on its relevance and its position on the results page.


Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

no code implementations NeurIPS 2016 Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.

SDP Relaxation with Randomized Rounding for Energy Disaggregation

2 code implementations NeurIPS 2016 Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu

We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring.

Total Energy

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

no code implementations 22 Sep 2016 Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.

Multiclass Classification Calibration Functions

no code implementations 20 Sep 2016 Bernardo Ávila Pires, Csaba Szepesvári

We devise a streamlined analysis that simplifies the process of deriving calibration functions for a large number of surrogate losses that have been proposed in the literature.

Classification General Classification

Chaining Bounds for Empirical Risk Minimization

no code implementations 7 Sep 2016 Gábor Balázs, András György, Csaba Szepesvári

This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization with random design settings even if the magnitude of the noise and the estimates is unbounded.


Conservative Bandits

no code implementations 13 Feb 2016 Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price for maintaining the constraints.

DCM Bandits: Learning to Rank with Multiple Clicks

1 code implementation 9 Feb 2016 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen

This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.


Online Learning with Gaussian Payoffs and Side Observations

no code implementations NeurIPS 2015 Yifan Wu, András György, Csaba Szepesvári

For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature.

Fast Cross-Validation for Incremental Learning

no code implementations 30 Jun 2015 Pooria Joulani, András György, Csaba Szepesvári

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning.

Incremental Learning

Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path

no code implementations NeurIPS 2015 Daniel Hsu, Aryeh Kontorovich, Csaba Szepesvári

The interval is constructed around the relaxation time $t_{\text{relax}}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

no code implementations 8 Jun 2015 Prashanth L. A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvári

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim.

reinforcement-learning Reinforcement Learning (RL)

Optimal Resource Allocation with Semi-Bandit Feedback

no code implementations 15 Jun 2014 Tor Lattimore, Koby Crammer, Csaba Szepesvári

We study a sequential resource allocation problem involving a fixed number of recurring jobs.

Adaptive Monte Carlo via Bandit Allocation

no code implementations 13 May 2014 James Neufeld, András György, Dale Schuurmans, Csaba Szepesvári

We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.

Online Learning under Delayed Feedback

no code implementations 4 Jun 2013 Pooria Joulani, András György, Csaba Szepesvári

Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems.

Deep Representations and Codes for Image Auto-Annotation

no code implementations NeurIPS 2012 Ryan Kiros, Csaba Szepesvári

The task of assigning a set of relevant tags to an image is challenging due to the size and variability of tag vocabularies.

feature selection TAG

Improved Algorithms for Linear Stochastic Bandits

no code implementations NeurIPS 2011 Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári

We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem.

Parametric Bandits: The Generalized Linear Case

no code implementations NeurIPS 2010 Sarah Filippi, Olivier Cappé, Aurélien Garivier, Csaba Szepesvári

We consider structured multi-armed bandit tasks in which the agent is guided by prior structural knowledge that can be exploited to efficiently select the optimal arm(s) in situations where the number of arms is large, or even infinite.

Online Markov Decision Processes under Bandit Feedback

no code implementations NeurIPS 2010 Gergely Neu, András Antos, András György, Csaba Szepesvári

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

no code implementations NeurIPS 2010 Dávid Pál, Barnabás Póczos, Csaba Szepesvári

We present simple and computationally efficient nonparametric estimators of Rényi entropy and mutual information based on an i.i.d. sample.

Error Propagation for Approximate Policy and Value Iteration

no code implementations NeurIPS 2010 Amir-Massoud Farahmand, Csaba Szepesvári, Rémi Munos

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy.

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

no code implementations NeurIPS 2009 Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks.


A General Projection Property for Distribution Families

no code implementations NeurIPS 2009 Yao-Liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári

We prove that linear projections between distribution families with fixed first and second moments are surjective, regardless of dimension.

A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation

no code implementations NeurIPS 2008 Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári

We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, target policy, and exciting behavior policy, and whose complexity scales linearly in the number of parameters.

Online Optimization in X-Armed Bandits

no code implementations NeurIPS 2008 Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.

Fitted Q-iteration in continuous action-space MDPs

no code implementations NeurIPS 2007 András Antos, Csaba Szepesvári, Rémi Munos

We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy.

reinforcement-learning Reinforcement Learning (RL)
