Search Results for author: András György

Found 37 papers, 4 papers with code

Online Markov Decision Processes under Bandit Feedback

no code implementations NeurIPS 2010 Gergely Neu, András Antos, András György, Csaba Szepesvári

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.

Online Learning under Delayed Feedback

no code implementations 4 Jun 2013 Pooria Joulani, András György, Csaba Szepesvári

Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems.

Efficient Multi-Start Strategies for Local Search Algorithms

no code implementations 16 Jan 2014 András György, Levente Kocsis

In particular, we prove that at most a quadratic increase in the number of times the target function is evaluated is needed to achieve the performance of a local search algorithm started from the attraction region of the optimum.
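
As a reference point, the simplest multi-start strategy just restarts a local search from independently sampled initial points and keeps the best local optimum found. The sketch below shows that baseline; `objective`, `local_search`, and `sample_point` are hypothetical callables, and the paper's strategies improve on this by deciding adaptively how to spend the evaluation budget.

```python
def multi_start_search(objective, local_search, sample_point, n_starts):
    """Restart a local search from random points; keep the best optimum."""
    best_x, best_val = None, float("inf")
    for _ in range(n_starts):
        x = local_search(objective, sample_point())  # run to a local optimum
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```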

Adaptive Monte Carlo via Bandit Allocation

no code implementations 13 May 2014 James Neufeld, András György, Dale Schuurmans, Csaba Szepesvári

We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.
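
To make the setting concrete, here is a minimal sketch of one plausible allocation rule: sample next from the estimator whose empirical variance, adjusted optimistically downward, looks smallest. This UCB-flavored rule is an illustration only; the paper's allocation schemes and their MSE analysis are more careful.

```python
import math

def adaptive_mc(estimators, budget, warmup=5):
    """Sequentially sample unbiased MC estimators, favoring low variance."""
    samples = [[est() for _ in range(warmup)] for est in estimators]
    t = len(estimators) * warmup
    while t < budget:
        def score(vals):
            n = len(vals)
            mean = sum(vals) / n
            var = sum((v - mean) ** 2 for v in vals) / (n - 1)
            return var - math.sqrt(2.0 * math.log(t) / n)  # optimistic bonus
        k = min(range(len(estimators)), key=lambda i: score(samples[i]))
        samples[k].append(estimators[k]())
        t += 1
    best = max(samples, key=len)  # the estimator the rule trusted most
    return sum(best) / len(best)
```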

Fast Cross-Validation for Incremental Learning

no code implementations 30 Jun 2015 Pooria Joulani, András György, Csaba Szepesvári

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning.

Incremental Learning

Online Learning with Gaussian Payoffs and Side Observations

no code implementations NeurIPS 2015 Yifan Wu, András György, Csaba Szepesvári

For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm; these bounds recover both the existing asymptotic problem-dependent lower bounds and the known finite-time minimax lower bounds.

Chaining Bounds for Empirical Risk Minimization

no code implementations 7 Sep 2016 Gábor Balázs, András György, Csaba Szepesvári

This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization with random design settings even if the magnitude of the noise and the estimates is unbounded.

Regression

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

no code implementations 22 Sep 2016 Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
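
The canonical example of such an oracle is the two-point estimator, which queries the function at two symmetric perturbations of the current point; its bias is controlled by the smoothing radius, matching the biased-oracle model studied in the paper. A minimal sketch:

```python
import numpy as np

def two_point_gradient(f, x, delta, rng):
    """Two-point gradient estimate: biased (via smoothing) and noisy."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)  # random unit direction
    return x.size * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

def bandit_gradient_descent(f, x0, steps, eta=0.1, delta=0.01, seed=0):
    """Plug the estimate into plain gradient descent."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x -= eta * two_point_gradient(f, x, delta, rng)
    return x
```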

SDP Relaxation with Randomized Rounding for Energy Disaggregation

2 code implementations NeurIPS 2016 Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu

We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring.

Total Energy

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

no code implementations NeurIPS 2016 Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

no code implementations 8 Sep 2017 Pooria Joulani, András György, Csaba Szepesvári

Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms.

Stochastic Optimization

A Reinforcement Learning Approach to Age of Information in Multi-User Networks

no code implementations 1 Jun 2018 Elif Tuğçe Ceran, Deniz Gündüz, András György

Scheduling the transmission of time-sensitive data to multiple users over error-prone communication channels is studied with the goal of minimizing the long-term average age of information (AoI) at the users under a constraint on the average number of transmissions at the source node.

Reinforcement Learning (RL) +1

Adaptive MCMC via Combining Local Samplers

no code implementations 11 Jun 2018 Kiarash Shaloudegi, András György

Here we take a different approach and, similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only parts of the state space (e.g., a few modes only).
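
Once per-chain weights are available, the combination step itself is a simple mixture, as in the sketch below; estimating those weights correctly is the hard part addressed in the paper, so here they are assumed given.

```python
import random

def combine_chains(chain_samples, weights):
    """Draw from a mixture of K local chains: pick a chain with probability
    proportional to its (estimated) probability mass, then a sample from it."""
    def draw():
        chain = random.choices(chain_samples, weights=weights, k=1)[0]
        return random.choice(chain)
    return draw
```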

LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

1 code implementation ICML 2018 Gellért Weisz, András György, Csaba Szepesvári

We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution.

Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems

no code implementations 24 Jul 2018 Timothy A. Mann, Sven Gowal, András György, Ray Jiang, Huiyi Hu, Balaji Lakshminarayanan, Prav Srinivasan

Predicting delayed outcomes is an important problem in recommender systems (e.g., if customers will finish reading an ebook).

Recommendation Systems

Detecting Overfitting via Adversarial Examples

no code implementations NeurIPS 2019 Roman Werpachowski, András György, Csaba Szepesvári

It utilizes a new unbiased error estimate that is based on adversarial examples generated from the test data and importance weighting.

General Classification, Image Classification
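
As a rough illustration of the two ingredients, the sketch below perturbs test points with FGSM and computes a weight-normalized 0-1 error; the paper's actual construction chooses the perturbation distribution and the weights so that the estimate is provably unbiased, which this simplified version does not guarantee.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast-gradient-sign perturbation of test points."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

def weighted_error(model, x_adv, y, weights):
    """Importance-weighted 0-1 error on the perturbed test set."""
    pred = model(x_adv).argmax(dim=1)
    return (weights * (pred != y).float()).sum() / weights.sum()
```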

Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging

no code implementations NeurIPS 2019 Pooria Joulani, András György, Csaba Szepesvári

ASYNCADA is, to our knowledge, the first asynchronous stochastic optimization algorithm with finite-time data-dependent convergence guarantees for generic convex constraints.

Stochastic Optimization

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

1 code implementation 18 Jun 2020 Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári

We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.

Multi-Armed Bandits, Off-Policy Evaluation
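
The estimator in the title is simple to state: reweight logged rewards by the target-to-behavior propensity ratio and normalize by the total weight. A minimal sketch (the confidence intervals, the paper's main contribution, are omitted):

```python
import numpy as np

def snips(target_probs, behavior_probs, rewards):
    """Self-normalized importance-sampling estimate of a policy's value."""
    w = np.asarray(target_probs) / np.asarray(behavior_probs)
    return np.sum(w * np.asarray(rewards)) / np.sum(w)
```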

Mirror Descent and the Information Ratio

no code implementations 25 Sep 2020 Tor Lattimore, András György

We establish a connection between the stability of mirror descent and the information ratio of Russo and Van Roy [2014].

Adapting to Delays and Data in Adversarial Multi-Armed Bandits

no code implementations 12 Oct 2020 András György, Pooria Joulani

We consider the adversarial multi-armed bandit problem under delayed feedback.

Multi-Armed Bandits
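
A natural baseline in this setting is Exp3 with updates deferred until the feedback arrives, as in the sketch below; `env(t, arm)` is a hypothetical environment returning a reward and its delay, and tuning the learning rate to the observed delays and data is what the paper is about.

```python
import math, random

def delayed_exp3(K, eta, rounds, env):
    """Exp3 whose importance-weighted updates apply when rewards arrive."""
    w = [1.0] * K
    pending = {}  # arrival time -> [(arm, prob at play time, reward)]
    for t in range(rounds):
        for (a, pa, r) in pending.pop(t, []):  # feedback arriving now
            w[a] *= math.exp(eta * r / pa)     # importance-weighted update
        total = sum(w)
        p = [wi / total for wi in w]
        arm = random.choices(range(K), weights=p, k=1)[0]
        reward, delay = env(t, arm)
        pending.setdefault(t + 1 + delay, []).append((arm, p[arm], reward))
    return w
```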

Mutual Information Constraints for Monte-Carlo Objectives

no code implementations 1 Dec 2020 Gábor Melis, András György, Phil Blunsom

A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless.

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

no code implementations 5 Oct 2021 Gellért Weisz, Csaba Szepesvári, András György

Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).

On the Role of Neural Collapse in Transfer Learning

no code implementations ICLR 2022 Tomer Galanti, András György, Marcus Hutter

We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes.

Clustering, Few-Shot Learning +1

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

no code implementations 26 May 2022 Sanae Amani, Tor Lattimore, András György, Lin F. Yang

In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is $\tilde{\mathcal{O}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds.

Confident Approximate Policy Iteration for Efficient Local Planning in $q^π$-realizable MDPs

no code implementations 27 Oct 2022 Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.

Generalization Bounds for Few-Shot Transfer Learning with Pretrained Classifiers

no code implementations 23 Dec 2022 Tomer Galanti, András György, Marcus Hutter

We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes.

Few-Shot Learning, Generalization Bounds +1

A Second-Order Method for Stochastic Bandit Convex Optimisation

no code implementations 10 Feb 2023 Tor Lattimore, András György

We introduce a simple and efficient algorithm for unconstrained zeroth-order stochastic convex bandits and prove its regret is at most $(1 + r/d)[d^{1.5} \sqrt{n} + d^3]\,\mathrm{polylog}(n, d, r)$, where $n$ is the horizon, $d$ the dimension, and $r$ the radius of a known ball containing the minimiser of the loss.

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

no code implementations NeurIPS 2023 Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári

While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing results are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary.

Reinforcement Learning (RL)

Online RL in Linearly $q^π$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore

no code implementations 11 Oct 2023 Gellért Weisz, András György, Csaba Szepesvári

We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features.

Reinforcement Learning (RL)

A simpler approach to accelerated optimization: iterative averaging meets optimism

no code implementations ICML 2020 Pooria Joulani, Anant Raj, András György, Csaba Szepesvári

In this paper, we show that there is a simpler approach to obtaining accelerated rates: applying generic, well-known optimistic online learning algorithms and using the online average of their predictions to query the (deterministic or stochastic) first-order optimization oracle at each time step.
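
A minimal sketch of the recipe, under the assumption that last round's gradient serves as the optimistic hint: run an optimistic gradient method and query the oracle at the running average of its predictions. Getting accelerated rates requires the step sizes and updates chosen in the paper, not these placeholders.

```python
import numpy as np

def accelerated_via_optimism(grad, x0, steps, eta):
    """Optimistic online gradient steps; oracle queried at the average."""
    x = np.asarray(x0, dtype=float)
    hint = np.zeros_like(x)
    avg = x.copy()
    for t in range(1, steps + 1):
        pred = x - eta * hint      # optimistic prediction for this round
        avg += (pred - avg) / t    # running average of predictions
        g = grad(avg)              # query first-order oracle at the average
        x -= eta * g               # standard online gradient update
        hint = g                   # hint for the next round
    return avg
```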

Non-Stationary Bandits with Intermediate Observations

no code implementations ICML 2020 Claire Vernade, András György, Timothy Mann

In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.

Recommendation Systems
