no code implementations • ICML 2020 • Claire Vernade, András György, Timothy Mann
In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.
no code implementations • ICML 2020 • Pooria Joulani, Anant Raj, András György, Csaba Szepesvari
In this paper, we show that there is a simpler approach to obtaining accelerated rates: applying generic, well-known optimistic online learning algorithms and using the online average of their predictions to query the (deterministic or stochastic) first-order optimization oracle at each time step.
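Below is a minimal sketch of the "query the oracle at the online average" idea on a toy quadratic, using plain online gradient descent as the inner learner; the paper obtains accelerated rates by plugging in optimistic online learners, and the function, step size, and horizon here are purely illustrative assumptions.

```python
import numpy as np

A = np.diag([1.0, 4.0, 9.0])           # toy positive-definite quadratic
b = np.array([1.0, -2.0, 0.5])
grad = lambda x: A @ x - b             # first-order oracle for f(x) = 0.5 x'Ax - b'x

x = np.zeros(3)                        # the online learner's current prediction
x_bar = np.zeros(3)                    # running online average of the predictions
for t in range(1, 2001):
    x_bar += (x - x_bar) / t           # average of predictions x_1, ..., x_t
    g = grad(x_bar)                    # query the oracle at the average, not at x
    x -= 0.5 / np.sqrt(t) * g          # plain online gradient step on the linear loss <g, .>

print(np.linalg.norm(x_bar - np.linalg.solve(A, b)))   # distance of the average to the minimiser
```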
no code implementations • 6 Nov 2024 • Alexandre Galashov, Michalis K. Titsias, András György, Clare Lyle, Razvan Pascanu, Yee Whye Teh, Maneesh Sahani
Neural networks are traditionally trained under the assumption that data come from a stationary distribution.
no code implementations • 30 Oct 2024 • Bryan Chan, Xinyi Chen, András György, Dale Schuurmans
These theoretical findings are then corroborated experimentally by comparing the behaviour of a full transformer on the simplified distributions to that of the stylized model, demonstrating that the two align.
no code implementations • 10 Jun 2024 • Alex Lewandowski, Michał Bortkiewicz, Saurabh Kumar, András György, Dale Schuurmans, Mateusz Ostaszewski, Marlos C. Machado
From this perspective, we derive a new spectral regularizer for continual learning that better sustains these beneficial initialization properties throughout training.
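As a rough illustration only: one natural form of a spectral penalty keeps each weight matrix's largest singular value close to its value at initialization, so that properties of the initial spectrum persist during training. The paper's exact regularizer and its target values may differ from this sketch.

```python
import numpy as np

def spectral_penalty(weights, init_weights):
    """Sum over layers of (sigma_max(W) - sigma_max(W_init))^2 for 2-D weight matrices."""
    penalty = 0.0
    for W, W0 in zip(weights, init_weights):
        sigma = np.linalg.norm(W, 2)       # largest singular value of the current weights
        sigma0 = np.linalg.norm(W0, 2)     # largest singular value at initialization
        penalty += (sigma - sigma0) ** 2
    return penalty

# The training objective would then be task_loss + lam * spectral_penalty(...),
# with lam a tuning parameter (an assumption of this sketch).
```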
no code implementations • 4 Jun 2024 • Yasin Abbasi Yadkori, Ilja Kuzborskij, András György, Csaba Szepesvári
Such quantification, for instance, makes it possible to detect hallucinations (cases where epistemic uncertainty is high) in both single- and multi-answer responses.
no code implementations • 4 Apr 2024 • Yasin Abbasi Yadkori, Ilja Kuzborskij, David Stutz, András György, Adam Fisch, Arnaud Doucet, Iuliya Beloshapka, Wei-Hung Weng, Yao-Yuan Yang, Csaba Szepesvári, Ali Taylan Cemgil, Nenad Tomasev
We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a nonsensical or incorrect answer.
no code implementations • 8 Feb 2024 • Nicolas Nguyen, Imad Aouali, András György, Claire Vernade
We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits.
no code implementations • 11 Oct 2023 • Gellért Weisz, András György, Csaba Szepesvári
We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features.
no code implementations • NeurIPS 2023 • Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári
While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing results are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary.
no code implementations • 10 Feb 2023 • Tor Lattimore, András György
We introduce a simple and efficient algorithm for unconstrained zeroth-order stochastic convex bandits and prove its regret is at most $(1 + r/d)[d^{1.5} \sqrt{n} + d^3]\,\mathrm{polylog}(n, d, r)$ where $n$ is the horizon, $d$ the dimension and $r$ is the radius of a known ball containing the minimiser of the loss.
no code implementations • 23 Dec 2022 • Tomer Galanti, András György, Marcus Hutter
We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes.
no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.
no code implementations • 27 Oct 2022 • Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári
Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.
no code implementations • 26 May 2022 • Sanae Amani, Tor Lattimore, András György, Lin F. Yang
In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds.
1 code implementation • 25 Feb 2022 • MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh
We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks.
no code implementations • ICLR 2022 • Tomer Galanti, András György, Marcus Hutter
We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes.
no code implementations • 5 Oct 2021 • Gellért Weisz, Csaba Szepesvári, András György
Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).
no code implementations • 1 Dec 2020 • Gábor Melis, András György, Phil Blunsom
A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless.
no code implementations • NeurIPS 2020 • Gellert Weisz, András György, Wei-I Lin, Devon Graham, Kevin Leyton-Brown, Csaba Szepesvari, Brendan Lucier
Algorithm configuration procedures optimize parameters of a given algorithm to perform well over a distribution of inputs.
no code implementations • 12 Oct 2020 • András György, Pooria Joulani
We consider the adversarial multi-armed bandit problem under delayed feedback.
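The toy sketch below shows Exp3 where importance-weighted loss estimates are applied only when they arrive, a fixed number of rounds after the corresponding action. This simple "update on arrival" handling of delays is an illustrative variant under assumed names and constants, not necessarily the algorithm analysed in the paper.

```python
import numpy as np

def exp3_delayed(losses, delay=5, eta=0.05, seed=0):
    """losses: array of shape (T, K) with adversarial losses in [0, 1]."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    L_hat = np.zeros(K)                          # cumulative importance-weighted loss estimates
    pending = []                                 # (arrival_round, arm, loss_estimate)
    total_loss = 0.0
    for t in range(T):
        p = np.exp(-eta * (L_hat - L_hat.min()))
        p /= p.sum()                             # exponential-weights distribution over arms
        a = rng.choice(K, p=p)
        total_loss += losses[t, a]
        pending.append((t + delay, a, losses[t, a] / p[a]))   # feedback arrives d rounds later
        arrived = [item for item in pending if item[0] <= t]  # apply whatever has arrived
        pending = [item for item in pending if item[0] > t]
        for _, arm, est in arrived:
            L_hat[arm] += est
        # a real implementation would also tune eta to the delay; omitted here
    return total_loss
```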
no code implementations • 25 Sep 2020 • Tor Lattimore, András György
We establish a connection between the stability of mirror descent and the information ratio of Russo and Van Roy [2014].
2 code implementations • 18 Jun 2020 • Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári
We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.
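A minimal sketch of the basic ingredient, a self-normalized importance-weighted value estimate of a target policy from data logged by a behaviour policy, is given below. The paper builds a confidence-bound-based selection rule on top of such estimates, which this snippet does not reproduce; the interface is an illustrative assumption.

```python
import numpy as np

def snips_value(rewards, behavior_probs, target_probs):
    """Self-normalized importance-sampling estimate of the target policy's value.

    rewards: logged rewards; *_probs: probability each policy assigned to the
    logged action in its context.
    """
    w = np.asarray(target_probs) / np.asarray(behavior_probs)   # importance weights
    return float(np.sum(w * np.asarray(rewards)) / np.sum(w))   # self-normalized estimate
```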
no code implementations • NeurIPS 2019 • Pooria Joulani, András György, Csaba Szepesvari
ASYNCADA is, to our knowledge, the first asynchronous stochastic optimization algorithm with finite-time data-dependent convergence guarantees for generic convex constraints.
no code implementations • 8 May 2019 • Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.
no code implementations • NeurIPS 2019 • Roman Werpachowski, András György, Csaba Szepesvári
It utilizes a new unbiased error estimate that is based on adversarial examples generated from the test data and importance weighting.
no code implementations • 27 Feb 2019 • Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, Pushmeet Kohli
Machine learning is used extensively in recommender systems deployed in products.
no code implementations • 24 Jul 2018 • Timothy A. Mann, Sven Gowal, András György, Ray Jiang, Huiyi Hu, Balaji Lakshminarayanan, Prav Srinivasan
Predicting delayed outcomes is an important problem in recommender systems (e.g., whether customers will finish reading an ebook).
1 code implementation • ICML 2018 • Gellért Weisz, András György, Csaba Szepesvári
We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution.
no code implementations • 11 Jun 2018 • Kiarash Shaloudegi, András György
Here we take a different approach: similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only parts of the state space (e.g., only a few modes).
no code implementations • 1 Jun 2018 • Elif Tuğçe Ceran, Deniz Gündüz, András György
Scheduling the transmission of time-sensitive data to multiple users over error-prone communication channels is studied with the goal of minimizing the long-term average age of information (AoI) at the users under a constraint on the average number of transmissions at the source node.
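The toy simulation below illustrates the age-of-information dynamics: each user's age grows by one every slot and resets after a successful transmission over an error-prone channel. The greedy "largest age times success probability" rule is only an illustrative baseline, ignores the transmission-rate constraint, and is not the paper's policy.

```python
import numpy as np

def simulate_aoi(success_probs, horizon=10_000, seed=0):
    rng = np.random.default_rng(seed)
    p = np.asarray(success_probs)
    age = np.ones(len(p))
    total_age = 0.0
    for _ in range(horizon):
        user = int(np.argmax(age * p))        # serve the seemingly most urgent user
        if rng.random() < p[user]:            # error-prone channel: success w.p. p[user]
            age[user] = 0.0                   # successful delivery resets that user's age
        age += 1.0                            # all ages grow by one slot
        total_age += age.mean()
    return total_age / horizon                # long-term average AoI across users

print(simulate_aoi([0.9, 0.5, 0.3]))
```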
no code implementations • 8 Sep 2017 • Pooria Joulani, András György, Csaba Szepesvári
Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms.
no code implementations • NeurIPS 2016 • Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári
The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.
2 code implementations • NeurIPS 2016 • Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu
We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring.
no code implementations • 22 Sep 2016 • Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári
Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
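The classical one-point spherical estimator is one such construction: a single function value $f(x + \delta u)$, with $u$ uniform on the unit sphere, yields an estimate of the gradient of a smoothed version of $f$. The function name and constants below are illustrative; the paper analyses the bias such oracles introduce.

```python
import numpy as np

def one_point_gradient(f, x, delta, rng):
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                     # uniform direction on the unit sphere
    return (d / delta) * f(x + delta * u) * u  # noisy gradient estimate of the smoothed f

g = one_point_gradient(lambda z: np.sum(z**2), np.ones(4), 0.1, np.random.default_rng(0))
```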
no code implementations • 7 Sep 2016 • Gábor Balázs, András György, Csaba Szepesvári
This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization with random design settings even if the magnitude of the noise and the estimates is unbounded.
no code implementations • NeurIPS 2015 • Yifan Wu, András György, Csaba Szepesvári
For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent and finite-time minimax lower bounds.
no code implementations • 30 Jun 2015 • Pooria Joulani, András György, Csaba Szepesvári
Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning.
no code implementations • 13 May 2014 • James Neufeld, András György, Dale Schuurmans, Csaba Szepesvári
We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.
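The sketch below conveys the underlying idea: among several unbiased estimators of the same quantity, repeatedly sample from the one whose empirical variance looks smallest, with a little forced exploration. The paper frames this as a bandit allocation problem with regret guarantees; the epsilon-greedy rule here is only an illustrative stand-in.

```python
import numpy as np

def adaptive_mc(samplers, budget=2000, eps=0.05, seed=0):
    """samplers: list of callables taking an rng and returning one unbiased sample."""
    rng = np.random.default_rng(seed)
    samples = [[] for _ in samplers]
    for _ in range(budget):
        if min(len(s) for s in samples) < 2 or rng.random() < eps:
            k = int(rng.integers(len(samplers)))                          # explore
        else:
            k = int(np.argmin([np.var(s, ddof=1) for s in samples]))      # exploit lowest variance
        samples[k].append(samplers[k](rng))
    best = max(range(len(samples)), key=lambda k: len(samples[k]))
    return float(np.mean(samples[best]))      # mean of the most-sampled (unbiased) estimator

estimators = [lambda rng: rng.normal(1.0, 2.0),   # high-variance unbiased estimator of 1.0
              lambda rng: rng.normal(1.0, 0.5)]   # low-variance unbiased estimator of 1.0
print(adaptive_mc(estimators))
```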
no code implementations • 16 Jan 2014 • András György, Levente Kocsis
In particular, we prove that at most a quadratic increase in the number of times the target function is evaluated is needed to achieve the performance of a local search algorithm started from the attraction region of the optimum.
no code implementations • 4 Jun 2013 • Pooria Joulani, András György, Csaba Szepesvári
Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems.
no code implementations • NeurIPS 2010 • Gergely Neu, Andras Antos, András György, Csaba Szepesvári
We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.