no code implementations • NeurIPS 2023 • Johannes Kirschner, Seyed Alireza Bakhtiari, Kushagra Chandak, Volodymyr Tkachuk, Csaba Szepesvári
A long line of works characterizes the sample complexity of regret minimization in sequential decision-making by min-max programs.
no code implementations • 8 Mar 2024 • Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári
We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL).
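To make the contrast with the usual squared-loss regression step concrete, here is a minimal sketch of the two per-sample losses. It assumes, purely for illustration, that regression targets and predictions are scaled to lie in [0, 1]; the function names and the clipping constant `eps` are hypothetical and not taken from the paper.

```python
import math


def log_loss(prediction, target, eps=1e-12):
    """Binary cross-entropy between a prediction and a target, both in [0, 1]."""
    # Clip the prediction away from 0 and 1 so the logarithms stay finite.
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def squared_loss(prediction, target):
    """Squared error, the loss used in the standard fitted Q-iteration regression."""
    return (prediction - target) ** 2
```

Both losses are minimized when the prediction equals the target; they differ in how heavily they penalize errors near the ends of the interval.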
no code implementations • 14 Nov 2023 • David Janz, Alexander E. Litvak, Csaba Szepesvári
We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting.
no code implementations • 13 Nov 2023 • David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári
We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards.
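The idea of exploring by training on randomly perturbed rewards can be sketched for the simplest case, a finite-armed bandit. This is an illustrative sketch only, not the paper's algorithm: the Gaussian perturbation, the `noise_scale` and `ridge` parameters, and all function names are assumptions made for the example.

```python
import random


def phe_choose_arm(history, n_arms, noise_scale=1.0, ridge=1.0, rng=random):
    """Pick an arm by fitting per-arm means to randomly perturbed rewards.

    history: list of (arm, reward) pairs observed so far.
    """
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in history:
        # Fresh noise is drawn for every past reward on every call.
        sums[arm] += reward + rng.gauss(0.0, noise_scale)
        counts[arm] += 1
    # Play each arm once before relying on the perturbed estimates.
    for arm in range(n_arms):
        if counts[arm] == 0:
            return arm
    # Regularized mean of the perturbed rewards, then act greedily on it.
    estimates = [sums[a] / (counts[a] + ridge) for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: estimates[a])


def run_phe(means, horizon, seed=0):
    """Simulate the rule above on a Bernoulli bandit with the given arm means."""
    rng = random.Random(seed)
    history = []
    for _ in range(horizon):
        arm = phe_choose_arm(history, len(means), rng=rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        history.append((arm, reward))
    return history
```

With `noise_scale=0.0` the rule reduces to greedy selection on regularized empirical means, so all exploration comes from the perturbation.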
1 code implementation • 31 Oct 2023 • Jihao Andreas Lin, Shreyas Padhy, Javier Antorán, Austin Tripp, Alexander Terenin, Csaba Szepesvári, José Miguel Hernández-Lobato, David Janz
We study the optimisation problem associated with Gaussian process regression using squared loss.
no code implementations • 11 Oct 2023 • Gellért Weisz, András György, Csaba Szepesvári
We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features.
no code implementations • 25 Jul 2023 • Philip Amortila, Nan Jiang, Csaba Szepesvári
Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation.
1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.
no code implementations • NeurIPS 2023 • Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári
While policy optimization algorithms have played an important role in recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited -- they are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary.
no code implementations • 25 Feb 2023 • Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan, Csaba Szepesvári, Gellért Weisz
The rewards in this game are chosen such that if the learner achieves large reward, then the learner's actions can be used to simulate solving a variant of 3-SAT, where (a) each variable shows up in a bounded number of clauses and (b) if an instance has no solutions then it also has no solutions that satisfy more than a $(1-\epsilon)$-fraction of clauses.
no code implementations • 8 Feb 2023 • Volodymyr Tkachuk, Seyed Alireza Bakhtiari, Johannes Kirschner, Matej Jusup, Ilija Bogunovic, Csaba Szepesvári
A practical challenge in reinforcement learning is posed by combinatorial action spaces, which make planning computationally demanding.
no code implementations • 28 Dec 2022 • Ilja Kuzborskij, Csaba Szepesvári
We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD).
no code implementations • 30 Oct 2022 • Yao Zhao, Connor James Stephens, Csaba Szepesvári, Kwang-Sung Jun
Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits yet is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to lack of easy ways to characterize it.
no code implementations • 27 Oct 2022 • Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári
Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.
no code implementations • 29 Sep 2022 • Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin
We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision making problems in a polynomial number of samples.
no code implementations • 13 Jun 2022 • Sharan Vaswani, Lin F. Yang, Csaba Szepesvári
In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint.
no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang
We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.
no code implementations • 2 Jun 2022 • Qinghua Liu, Csaba Szepesvári, Chi Jin
This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent only sees her own individual observations and actions that reveal incomplete information about the underlying state of the system.
no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
no code implementations • 19 Apr 2022 • Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin
Applications of Reinforcement Learning (RL), in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system, that is, they act under partial observability of the states, are ubiquitous.
no code implementations • 22 Nov 2021 • Tongzheng Ren, Tianjun Zhang, Csaba Szepesvári, Bo Dai
Representation learning lies at the heart of the empirical success of deep learning for dealing with the curse of dimensionality.
no code implementations • 18 Oct 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs).
no code implementations • 5 Oct 2021 • Gellért Weisz, Csaba Szepesvári, András György
Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).
no code implementations • 12 Aug 2021 • Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári
Under the assumption that the Q-functions of all policies are linear in known features of the state-action pairs, we show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs are independent of the size of the state space.
no code implementations • NeurIPS 2021 • Ilja Kuzborskij, Csaba Szepesvári, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu
Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization.
no code implementations • NeurIPS 2021 • Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári
We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with.
no code implementations • 12 Jul 2021 • Ilja Kuzborskij, Csaba Szepesvári
We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD).
no code implementations • 11 Feb 2021 • Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári
We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.
no code implementations • 6 Feb 2021 • Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang
Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less well understood.
no code implementations • 3 Feb 2021 • Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári
We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.
no code implementations • 11 Nov 2020 • Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári
We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.
no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.
no code implementations • 31 Oct 2020 • Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvári
A key problem in the theory of meta-learning is to understand how the task distributions influence transfer risk, the expected error of a meta-learner on a new task drawn from the unknown task distribution.
no code implementations • NeurIPS 2020 • Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama
In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback.
no code implementations • NeurIPS 2020 • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.
no code implementations • 3 Oct 2020 • Gellért Weisz, Philip Amortila, Csaba Szepesvári
We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner.
1 code implementation • 25 Jul 2020 • María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári
In the context of probabilistic neural networks, the output of training is a probability distribution over network weights.
no code implementations • NeurIPS 2020 • Roshan Shariff, Csaba Szepesvári
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP.
1 code implementation • 18 Jun 2020 • Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári
We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.
no code implementations • 4 Sep 2019 • Ilja Kuzborskij, Csaba Szepesvári
We prove semi-empirical concentration inequalities for random variables which are given as possibly nonlinear functions of independent random variables.
no code implementations • NeurIPS 2019 • Roman Werpachowski, András György, Csaba Szepesvári
It utilizes a new unbiased error estimate that is based on adversarial examples generated from the test data and importance weighting.
no code implementations • 5 Feb 2019 • Ilja Kuzborskij, Nicolò Cesa-Bianchi, Csaba Szepesvári
This is a well-established notion of effective dimension appearing in several previous works, including the analyses of SGD and ridge regression, but ours is the first work that brings this dimension to the analysis of learning using Gibbs densities.
no code implementations • 15 Jan 2019 • Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama
We set up the USS problem as a stochastic partial monitoring problem and develop an algorithm with sub-linear regret under the WD property.
no code implementations • 5 Oct 2018 • Shuai Li, Tor Lattimore, Csaba Szepesvári
We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter.
1 code implementation • ICML 2018 • Gellért Weisz, András György, Csaba Szepesvári
We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution.
no code implementations • 12 Sep 2017 • Chandrashekar Lakshminarayanan, Csaba Szepesvári
For a given LSA with PR averaging, and data distribution $P$ satisfying the said assumptions, we show that there exists a range of constant step-sizes such that its MSE decays as $O(\frac{1}{t})$.
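As a rough illustration of linear stochastic approximation (LSA) with a constant step size and Polyak-Ruppert (PR) averaging, consider the scalar case. The sketch assumes the update $\theta_{t+1} = \theta_t + \alpha (b_t - a_t \theta_t)$ with noisy samples $(a_t, b_t)$; the function name, the `sample` callback, and the scalar setting are simplifications chosen for the example, not details from the paper.

```python
import random


def lsa_polyak_ruppert(sample, step_size, n_steps, theta0=0.0):
    """Run scalar LSA with a constant step size and return the averaged iterate.

    sample: zero-argument callable returning a noisy pair (a_t, b_t).
    """
    theta = theta0
    running_sum = 0.0
    for _ in range(n_steps):
        a_t, b_t = sample()
        # Constant step-size update towards the solution of E[a] * theta = E[b].
        theta += step_size * (b_t - a_t * theta)
        running_sum += theta
    # Polyak-Ruppert averaging: report the mean of the iterates, not the last one.
    return running_sum / n_steps
```

For instance, with `rng = random.Random(0)` and `sample = lambda: (1.0 + rng.uniform(-0.5, 0.5), 2.0 + rng.gauss(0.0, 1.0))`, the averaged iterate should settle near 2, the solution of $E[a]\,\theta = E[b]$.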
no code implementations • 8 Sep 2017 • Pooria Joulani, András György, Csaba Szepesvári
Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms.
no code implementations • NeurIPS 2015 • Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári
The interval is constructed around the relaxation time $t_{\text{relax}} = 1/\gamma$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
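A worked example of the quantity $t_{\text{relax}} = 1/\gamma$ for the simplest possible chain may help. The sketch assumes that $\gamma$ denotes the absolute spectral gap (one minus the largest absolute value among the non-unit eigenvalues of the transition matrix), which the snippet above does not spell out, and it uses a two-state chain whose second eigenvalue is known in closed form. The function name is hypothetical.

```python
def relaxation_time_two_state(p, q):
    """Relaxation time of a two-state Markov chain.

    p = P(0 -> 1) and q = P(1 -> 0), with 0 < p + q < 2.
    """
    if not 0.0 < p + q < 2.0:
        raise ValueError("need 0 < p + q < 2 for a positive spectral gap")
    # The transition matrix [[1-p, p], [q, 1-q]] has eigenvalues 1 and 1 - p - q.
    second_eigenvalue = 1.0 - p - q
    # Assumed definition: absolute spectral gap.
    gap = 1.0 - abs(second_eigenvalue)
    return 1.0 / gap
```

With `p = q = 0.1` the second eigenvalue is 0.8, the gap is 0.2, and the relaxation time is about 5 steps; a chain that switches state almost every step (`p + q` near 2) also has a small gap and hence a long relaxation time.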
no code implementations • 16 Jun 2017 • Ruitong Huang, Mohammad M. Ajallooeian, Csaba Szepesvári, Martin Müller
We study the problem of identifying the best action among a set of possible options when the value of each action is given by a mapping from a number of noisy micro-observables in the so-called fixed confidence setting.
no code implementations • 19 Mar 2017 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen
The probability that a user will click a search result depends both on its relevance and its position on the results page.
no code implementations • NeurIPS 2016 • Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári
The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.
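For readers unfamiliar with FTL, the following sketch shows it on one-dimensional squared losses $f_t(x) = (x - z_t)^2$, a standard convex and positively curved example. The choice of loss, the `first_play` default, and the function names are assumptions made for illustration.

```python
def follow_the_leader(points, first_play=0.0):
    """FTL for squared losses f_t(x) = (x - z_t)^2 on the real line.

    Returns the point played in each round, chosen before z_t is revealed.
    """
    plays = []
    total = 0.0
    for t, z in enumerate(points):
        # The minimizer of the sum of past squared losses is the mean of past points.
        plays.append(first_play if t == 0 else total / t)
        total += z
    return plays


def regret(points, plays):
    """Cumulative loss of the plays minus that of the best fixed point in hindsight."""
    best = sum(points) / len(points)
    incurred = sum((x - z) ** 2 for x, z in zip(plays, points))
    optimal = sum((best - z) ** 2 for z in points)
    return incurred - optimal
```

For the sequence `[1.0, 3.0, 5.0]` the plays are `[0.0, 1.0, 2.0]`: each play is the minimizer of the losses seen so far, with the first play fixed arbitrarily.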
2 code implementations • NeurIPS 2016 • Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu
We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring.
no code implementations • 22 Sep 2016 • Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári
Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
no code implementations • 20 Sep 2016 • Bernardo Ávila Pires, Csaba Szepesvári
We devise a streamlined analysis that simplifies the process of deriving calibration functions for a large number of surrogate losses that have been proposed in the literature.
no code implementations • 7 Sep 2016 • Gábor Balázs, András György, Csaba Szepesvári
This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization with random design settings even if the magnitude of the noise and the estimates is unbounded.
no code implementations • 19 Feb 2016 • Bernardo Ávila Pires, Csaba Szepesvári
In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes.
no code implementations • 13 Feb 2016 • Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári
We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price for maintaining the constraints.
1 code implementation • 9 Feb 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen
This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.
no code implementations • NeurIPS 2015 • Yifan Wu, András György, Csaba Szepesvári
For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature.
no code implementations • 30 Jun 2015 • Pooria Joulani, András György, Csaba Szepesvári
Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning.
no code implementations • NeurIPS 2015 • Daniel Hsu, Aryeh Kontorovich, Csaba Szepesvári
The interval is constructed around the relaxation time $t_{\text{relax}}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
no code implementations • 8 Jun 2015 • Prashanth L. A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvári
Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim.
no code implementations • 15 Jun 2014 • Tor Lattimore, Koby Crammer, Csaba Szepesvári
We study a sequential resource allocation problem involving a fixed number of recurring jobs.
no code implementations • 13 May 2014 • James Neufeld, András György, Dale Schuurmans, Csaba Szepesvári
We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.
no code implementations • 4 Jun 2013 • Pooria Joulani, András György, Csaba Szepesvári
Online learning with delayed feedback has received increasing attention recently due to its many applications in distributed, web-based learning problems.
no code implementations • NeurIPS 2012 • Ryan Kiros, Csaba Szepesvári
The task of assigning a set of relevant tags to an image is challenging due to the size and variability of tag vocabularies.
no code implementations • NeurIPS 2011 • Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem.
no code implementations • NeurIPS 2010 • Sarah Filippi, Olivier Cappe, Aurélien Garivier, Csaba Szepesvári
We consider structured multi-armed bandit tasks in which the agent is guided by prior structural knowledge that can be exploited to efficiently select the optimal arm(s) in situations where the number of arms is large, or even infinite.
no code implementations • NeurIPS 2010 • Gergely Neu, Andras Antos, András György, Csaba Szepesvári
We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.
no code implementations • NeurIPS 2010 • Amir-Massoud Farahmand, Csaba Szepesvári, Rémi Munos
We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy.
no code implementations • NeurIPS 2010 • Dávid Pál, Barnabás Póczos, Csaba Szepesvári
We present simple and computationally efficient nonparametric estimators of Rényi entropy and mutual information based on an i.i.d.
no code implementations • NeurIPS 2009 • Yao-Liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári
We prove that linear projections between distribution families with fixed first and second moments are surjective, regardless of dimension.
no code implementations • NeurIPS 2009 • Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári
We extend the Dyna planning architecture for policy evaluation and control in two significant aspects.
no code implementations • NeurIPS 2009 • Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári
We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks.
no code implementations • NeurIPS 2008 • Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári
We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, target policy, and exciting behavior policy, and whose complexity scales linearly in the number of parameters.
no code implementations • NeurIPS 2008 • Amir M. Farahmand, Mohammad Ghavamzadeh, Shie Mannor, Csaba Szepesvári
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms.
no code implementations • NeurIPS 2008 • Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos
We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.
no code implementations • NeurIPS 2007 • András Antos, Csaba Szepesvári, Rémi Munos
We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy.