We address offline reinforcement learning with privacy guarantees, where the goal is to train a policy that is differentially private with respect to individual trajectories in the dataset.
We introduce the safe best-arm identification framework with linear feedback, where the agent must satisfy a stage-wise safety constraint that depends linearly on an unknown parameter vector.
We propose the first regret-based approach to the Graphical Bilinear Bandits problem, where $n$ agents in a graph play a stochastic bilinear bandit game with each of their neighbors.
Experimental design is an approach to selecting samples from a given set so as to obtain the best estimator under a chosen criterion.
We study the best arm identification problem in which the learner wants to find the graph allocation maximizing the sum of the bilinear rewards.
This novel smoothing method is then used to improve first-order non-smooth optimization (both convex and non-convex) by allowing for a local exploration of the search space.
For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal.
As cellular networks become denser, a scalable and dynamic tuning of wireless base station parameters can only be achieved through automated optimization.
Privacy-preserving networks can be modelled as decentralized networks (e.g., sensors, connected objects, smartphones), where communication between nodes of the network is not controlled by an all-knowing, central node.
In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by $U$-statistics of degree $d\geq 1$, i.e., functionals of the training data with low variance that take the form of averages over $d$-tuples.
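As an illustrative sketch (not taken from any of the works above), a $U$-statistic of degree $d$ can be computed as the average of a symmetric kernel over all $d$-tuples of observations; the function and kernel names below are hypothetical:

```python
from itertools import combinations


def u_statistic(data, kernel, degree=2):
    """Average a symmetric kernel over all tuples of the given degree."""
    tuples = list(combinations(data, degree))
    return sum(kernel(*t) for t in tuples) / len(tuples)


# Classic example: the unbiased sample variance is a degree-2
# U-statistic with kernel h(x, y) = (x - y)^2 / 2.
data = [1.0, 2.0, 3.0, 4.0]
sample_var = u_statistic(data, lambda x, y: (x - y) ** 2 / 2)
```

For $n$ observations this naive enumeration costs $\binom{n}{d}$ kernel evaluations, which is precisely why incomplete or subsampled variants of $U$-statistics are of practical interest.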