In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.

In this paper, we show that there is a simpler approach to obtaining accelerated rates: applying generic, well-known optimistic online learning algorithms and using the online average of their predictions to query the (deterministic or stochastic) first-order optimization oracle at each time step.

Algorithm configuration procedures optimize parameters of a given algorithm to perform well over a distribution of inputs.

A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless.

We consider the adversarial multi-armed bandit problem under delayed feedback.

We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014].

We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.

ASYNCADA is, to our knowledge, the first asynchronous stochastic optimization algorithm with finite-time data-dependent convergence guarantees for generic convex constraints.

In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.

It utilizes a new unbiased error estimate that is based on adversarial examples generated from the test data and importance weighting.

Machine learning is used extensively in recommender systems deployed in products.

Predicting delayed outcomes is an important problem in recommender systems (e.g., whether customers will finish reading an ebook).

We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution.

Here we take a different approach and, similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only parts of the state space (e.g., a few modes only).

Scheduling the transmission of time-sensitive data to multiple users over error-prone communication channels is studied with the goal of minimizing the long-term average age of information (AoI) at the users under a constraint on the average number of transmissions at the source node.

Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms.

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.
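
As a rough illustration of the FTL idea (not code from the paper), the sketch below plays FTL against quadratic losses f_t(x) = (x - y_t)^2 / 2, a convex and positively curved case. For these losses the leader, i.e. the minimizer of the sum of past losses, is simply the mean of the past targets. The function names, the choice of loss, and the initial prediction of 0.0 are assumptions made for this example.

```python
def follow_the_leader(targets):
    """Play FTL against the quadratic losses f_t(x) = (x - y_t)^2 / 2.

    Assumption for this sketch: the first prediction (made before any
    loss has been observed) is 0.0.
    """
    predictions = []
    running_sum = 0.0
    for t, y in enumerate(targets):
        # The leader minimizes the sum of the losses seen so far; for
        # quadratic losses that minimizer is the mean of past targets.
        x = running_sum / t if t > 0 else 0.0
        predictions.append(x)
        running_sum += y  # the loss f_t is revealed after predicting
    return predictions


def regret(targets, predictions):
    """Cumulative loss of the predictions minus that of the best fixed point."""
    best = sum(targets) / len(targets)
    alg_loss = sum(0.5 * (x - y) ** 2 for x, y in zip(predictions, targets))
    opt_loss = sum(0.5 * (best - y) ** 2 for y in targets)
    return alg_loss - opt_loss


preds = follow_the_leader([1.0, 3.0, 2.0])
print(preds, regret([1.0, 3.0, 2.0], preds))
```

On the targets 1, 3, 2 the predictions are 0, 1, 2: each prediction is the mean of the targets observed before it.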

We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring.

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
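
To make the idea of replacing actual gradients with noisy estimates concrete, here is a minimal sketch using a two-point estimator along a uniformly random direction, which is one common construction and not necessarily the one analyzed in the paper. The function names, the test function, and the values of `step_size` and `delta` are assumptions chosen for illustration.

```python
import math
import random


def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate the gradient of f at x from two function evaluations.

    Draws a direction u uniformly from the unit sphere and returns
    d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u.
    """
    d = len(x)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    u = [v / norm for v in g]  # normalized Gaussian is uniform on the sphere
    plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (plus - minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def zeroth_order_gradient_descent(f, x0, steps, step_size, delta, seed=0):
    """Plain gradient descent with the estimate used in place of the gradient."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = two_point_gradient_estimate(f, x, delta, rng)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def shifted_quadratic(x):
    # Hypothetical test function with its minimum at (1, ..., 1).
    return sum((xi - 1.0) ** 2 for xi in x)


x_final = zeroth_order_gradient_descent(
    shifted_quadratic, [0.0, 0.0], steps=500, step_size=0.1, delta=0.01
)
print(x_final)
```

The first-order update is unchanged; only the gradient oracle is swapped for an estimate built from function values, so the step size has to account for the extra variance of the estimate.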

This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization with random design settings even if the magnitude of the noise and the estimates is unbounded.

For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature.

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning.

We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.

In particular, we prove that at most a quadratic increase in the number of times the target function is evaluated is needed to achieve the performance of a local search algorithm started from the attraction region of the optimum.

Online learning with delayed feedback has received increasing attention recently due to its many applications in distributed, web-based learning problems.

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.
