Search Results for author: Yann Ollivier

Found 31 papers, 9 papers with code

Learning with Random Learning Rates

1 code implementation 2 Oct 2018 Léonard Blier, Pierre Wolinski, Yann Ollivier

Hyperparameter tuning is a bothersome step in the training of deep learning models.

Learning One Representation to Optimize All Rewards

2 code implementations NeurIPS 2021 Ahmed Touati, Yann Ollivier

In the test phase, a reward representation is estimated either from observations or an explicit reward description (e.g., a target state).
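
A minimal sketch of this test-phase step under the forward-backward (F, B) parametrization used in this line of work; the array shapes, the plain NumPy stand-ins for the learned networks, and the omission of the normalization used in the actual method are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained embeddings on a small discrete problem:
# F[s, a] is a d-dimensional forward embedding, B[s] a d-dimensional backward embedding.
n_states, n_actions, d = 20, 4, 8
F = rng.normal(size=(n_states, n_actions, d))   # stand-in for a learned F network
B = rng.normal(size=(n_states, d))              # stand-in for a learned B network

# Option 1: estimate the reward representation z from reward-labelled observations.
obs_states = rng.integers(0, n_states, size=256)
obs_rewards = rng.normal(size=256)              # placeholder reward samples
z_from_obs = (B[obs_states] * obs_rewards[:, None]).mean(axis=0)

# Option 2: an explicit reward description, e.g. "reach target state 7",
# maps directly to the backward embedding of that state.
z_from_goal = B[7]

def greedy_action(state, z):
    """Act on the zero-shot Q-values F(s, a)^T z, with no further learning."""
    return int(np.argmax(F[state] @ z))

print(greedy_action(3, z_from_obs), greedy_action(3, z_from_goal))
```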

Does Zero-Shot Reinforcement Learning Exist?

1 code implementation 29 Sep 2022 Ahmed Touati, Jérémy Rapin, Yann Ollivier

A zero-shot RL agent is an agent that can solve any RL task in a given environment, instantly with no additional planning or learning, after an initial reward-free learning phase.

Contrastive Learning, Reinforcement Learning (RL) +2
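
The definition above implies a two-phase protocol: reward-free pretraining, then instant task inference. The sketch below illustrates the inference phase with successor features, one of the representation families studied in this paper; the feature map, the pretrained successor features, and the least-squares task estimate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for quantities produced by the reward-free phase:
# phi[s] is a feature map, psi[s, a] approximates E[sum_t gamma^t phi(s_t) | s, a].
n_states, n_actions, d = 30, 5, 6
phi = rng.normal(size=(n_states, d))
psi = rng.normal(size=(n_states, n_actions, d))

# Instant task inference: regress a handful of observed rewards on the features
# to get a task vector w, then act greedily on psi(s, a) . w -- no further learning.
states = rng.integers(0, n_states, size=64)
rewards = rng.normal(size=64)                   # placeholder reward samples
w, *_ = np.linalg.lstsq(phi[states], rewards, rcond=None)

def act(state):
    return int(np.argmax(psi[state] @ w))

print(act(0))
```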

Making Deep Q-learning methods robust to time discretization

1 code implementation 28 Jan 2019 Corentin Tallec, Léonard Blier, Yann Ollivier

Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018).

Q-Learning

Separating value functions across time-scales

1 code implementation 5 Feb 2019 Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

In settings where this bias is unacceptable - where the system must optimize for longer horizons at higher discounts - the target of the value function approximator may increase in variance, leading to difficulties in learning.

Reinforcement Learning (RL)
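
The variance issue above is what motivates splitting the value function across time-scales. A hedged sketch of the decomposition, with the exact notation being an assumption: given discounts γ_0 < ... < γ_K, the high-discount value function is a sum of delta components, each obeying a shorter-horizon Bellman equation.

```latex
% Value decomposition across time-scales (notation assumed for illustration).
\[
\begin{aligned}
V_{\gamma_K}(s) &= \sum_{i=0}^{K} W_i(s),
  \qquad W_0 := V_{\gamma_0}, \quad W_i := V_{\gamma_i} - V_{\gamma_{i-1}}, \\
W_0(s) &= \mathbb{E}\bigl[\, r_t + \gamma_0\, W_0(s_{t+1}) \bigm| s_t = s \,\bigr], \\
W_i(s) &= \mathbb{E}\bigl[\, (\gamma_i - \gamma_{i-1})\, V_{\gamma_{i-1}}(s_{t+1})
  + \gamma_i\, W_i(s_{t+1}) \bigm| s_t = s \,\bigr], \qquad i \ge 1.
\end{aligned}
\]
```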

First-order Adversarial Vulnerability of Neural Networks and Input Dimension

1 code implementation ICLR 2019 Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz

Over the past few years, neural networks were proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions.
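
The vulnerability studied here is first-order: for a small l_inf perturbation of size eps, the worst-case change in loss is approximately eps times the l_1 norm of the input gradient, a quantity that tends to grow with input dimension. A minimal PyTorch sketch of that first-order estimate; the model and data are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder classifier and batch; any differentiable model would do.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
images = torch.rand(16, 1, 28, 28, requires_grad=True)
labels = torch.randint(0, 10, (16,))

# Per-sample losses, then per-sample input gradients.
losses = nn.functional.cross_entropy(model(images), labels, reduction="none")
grad, = torch.autograd.grad(losses.sum(), images)

# First-order adversarial vulnerability to an l_inf perturbation of size eps:
# the worst-case loss increase is approximately eps * ||grad_x loss||_1.
eps = 8 / 255
first_order_damage = eps * grad.abs().flatten(1).sum(dim=1)   # one value per image
print(first_order_damage.mean().item())
```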

Natural Langevin Dynamics for Neural Networks

1 code implementation 4 Dec 2017 Gaétan Marceau-Caron, Yann Ollivier

The resulting natural Langevin dynamics combines the advantages of Amari's natural gradient descent and Fisher-preconditioned Langevin dynamics for large neural networks.
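
A hedged sketch of the kind of update being combined, written informally: a Langevin step preconditioned by (an approximation of) the Fisher matrix F, so that the drift is Amari's natural gradient and the injected noise is shaped by the same metric. Minibatch scaling and the preconditioner-drift correction a full treatment would include are omitted.

```latex
% Fisher-preconditioned Langevin step (informal sketch).
\[
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t)
             + \sqrt{2\eta}\; F(\theta_t)^{-1/2} \xi_t,
\qquad \xi_t \sim \mathcal{N}(0, I).
\]
```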

Unbiased Online Recurrent Optimization

1 code implementation ICLR 2018 Corentin Tallec, Yann Ollivier

The novel Unbiased Online Recurrent Optimization (UORO) algorithm allows for online learning of general recurrent computational graphs such as recurrent network models.

The Description Length of Deep Learning Models

no code implementations NeurIPS 2018 Léonard Blier, Yann Ollivier

This might explain the relatively poor practical performance of variational methods in deep learning.

Approximate Temporal Difference Learning is a Gradient Descent for Reversible Policies

no code implementations 2 May 2018 Yann Ollivier

In this case, approximate TD is exactly a gradient descent of the Dirichlet norm, the norm of the difference of gradients between the true and approximate value functions.
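
For readers unfamiliar with the term, a hedged statement of the Dirichlet norm in question, with constants and notation assumed: the "gradient" of a function along a transition s -> s' is f(s') - f(s), and the norm averages its square over the chain with stationary distribution rho and kernel P.

```latex
% Dirichlet (semi-)norm over the Markov chain (rho, P); constants assumed for illustration.
\[
\lVert f \rVert_{\mathrm{Dir}}^{2}
  = \tfrac{1}{2} \sum_{s, s'} \rho(s)\, P(s, s')\, \bigl( f(s') - f(s) \bigr)^{2}.
\]
```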

Can recurrent neural networks warp time?

1 code implementation ICLR 2018 Corentin Tallec, Yann Ollivier

Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms.

True Asymptotic Natural Gradient Optimization

no code implementations 22 Dec 2017 Yann Ollivier

We introduce a simple algorithm, True Asymptotic Natural Gradient Optimization (TANGO), that converges to a true natural gradient descent in the limit of small learning rates, without explicit Fisher matrix estimation.

Unbiasing Truncated Backpropagation Through Time

no code implementations ICLR 2018 Corentin Tallec, Yann Ollivier

Truncated BPTT keeps the computational benefits of Backpropagation Through Time (BPTT) while relieving the need for a complete backtrack through the whole data sequence at every step.

Language Modelling

Online Natural Gradient as a Kalman Filter

no code implementations 1 Mar 2017 Yann Ollivier

In the recurrent (state space) case, we prove that the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL), a classical algorithm to train recurrent models.

Practical Riemannian Neural Networks

no code implementations 25 Feb 2016 Gaétan Marceau-Caron, Yann Ollivier

We provide the first experimental results on non-synthetic datasets for the quasi-diagonal Riemannian gradient descents for neural networks introduced in [Ollivier, 2015].

Training recurrent networks online without backtracking

no code implementations 28 Jul 2015 Yann Ollivier, Corentin Tallec, Guillaume Charpiat

The evolution of this search direction is partly stochastic and is constructed in such a way to provide, at every time, an unbiased random estimate of the gradient of the loss function with respect to the parameters.

Speed learning on the fly

no code implementations 8 Nov 2015 Pierre-Yves Massé, Yann Ollivier

The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications.

Riemannian metrics for neural networks II: recurrent networks and learning symbolic data sequences

no code implementations 3 Jun 2013 Yann Ollivier

Recurrent neural networks are powerful models for sequential data, able to represent complex dependencies in the sequence that simpler models such as hidden Markov models cannot handle.

Riemannian metrics for neural networks I: feedforward networks

no code implementations 4 Mar 2013 Yann Ollivier

We describe four algorithms for neural network training, each adapted to different scalability constraints.

Auto-encoders: reconstruction versus compression

no code implementations 30 Mar 2014 Yann Ollivier

We discuss the similarities and differences between training an auto-encoder to minimize the reconstruction error, and training the same auto-encoder to compress the data via a generative model.

Denoising

Mixed batches and symmetric discriminators for GAN training

no code implementations ICML 2018 Thomas Lucas, Corentin Tallec, Jakob Verbeek, Yann Ollivier

We propose to feed the discriminator with mixed batches of true and fake samples, and train it to predict the ratio of true samples in the batch.
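
A minimal sketch of the batch construction and training target described in this sentence; the mean-pooled permutation-invariant discriminator and the squared-error loss are simplified assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def mixed_batch(real, fake):
    """Shuffle real and fake samples into one batch; the target is the ratio of real samples."""
    batch = torch.cat([real, fake], dim=0)
    perm = torch.randperm(batch.size(0))
    ratio = real.size(0) / batch.size(0)
    return batch[perm], torch.tensor(ratio)

# Simplified permutation-invariant discriminator: per-sample features, mean-pooled
# over the batch, then a single predicted ratio in [0, 1].
feature_net = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
head = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())

real = torch.randn(12, 64)          # stand-in for data samples
fake = torch.randn(20, 64)          # stand-in for generator samples
batch, target = mixed_batch(real, fake)

pred_ratio = head(feature_net(batch).mean(dim=0, keepdim=True)).squeeze()
loss = nn.functional.mse_loss(pred_ratio, target)
loss.backward()
print(float(pred_ratio), float(target))
```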

Adversarial Vulnerability of Neural Networks Increases with Input Dimension

no code implementations ICLR 2019 Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz

Over the past four years, neural networks have been proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions.

The Extended Kalman Filter is a Natural Gradient Descent in Trajectory Space

no code implementations 3 Jan 2019 Yann Ollivier

In principle this makes it possible to treat the underlying trajectory as the parameter of a statistical model of the observations.

White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

no code implementations 29 Aug 2019 Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, Hervé Jégou

Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set.
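
The paper argues that Bayes-optimal membership inference depends on the trained model only through the loss of the candidate sample, which motivates simple loss-threshold attacks like the sketch below; the model, data, and threshold value are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder trained model and candidate samples.
model = nn.Sequential(nn.Linear(20, 2))
samples = torch.randn(8, 20)
labels = torch.randint(0, 2, (8,))

def membership_score(model, x, y):
    """Loss-threshold attack: a lower loss on (x, y) is evidence that it was a training point."""
    with torch.no_grad():
        loss = nn.functional.cross_entropy(model(x), y, reduction="none")
    return -loss  # higher score = more likely a member

threshold = 0.7  # would be calibrated on held-out data in practice
is_member = membership_score(model, samples, labels) > -threshold
print(is_member)
```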

An Equivalence between Bayesian Priors and Penalties in Variational Inference

no code implementations 1 Feb 2020 Pierre Wolinski, Guillaume Charpiat, Yann Ollivier

We fully characterize the regularizers that can arise according to this procedure, and provide a systematic way to compute the prior corresponding to a given penalty.

Variational Inference
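
The correspondence can be read off the variational objective; a hedged sketch with notation assumed: replacing the KL term of the ELBO by a generic penalty R recovers a Bayesian prior exactly when R is, up to an additive constant, a KL divergence to that prior.

```latex
% Penalized variational objective vs. the ELBO (notation assumed for illustration).
\[
\mathcal{L}(q) = \mathbb{E}_{\theta \sim q}\bigl[ \log p(\mathcal{D} \mid \theta) \bigr] - R(q),
\qquad
R(q) = \mathrm{KL}\bigl( q \,\Vert\, p_{\mathrm{prior}} \bigr) + \mathrm{const}
\;\Longleftrightarrow\;
\mathcal{L} \text{ is the ELBO for the prior } p_{\mathrm{prior}}.
\]
```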

Convergence of Online Adaptive and Recurrent Optimization Algorithms

no code implementations 12 May 2020 Pierre-Yves Massé, Yann Ollivier

This is more data-agnostic and creates differences with respect to standard SGD theory, especially for the range of possible learning rates.

Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint

no code implementations 18 Jan 2021 Léonard Blier, Corentin Tallec, Yann Ollivier

In reinforcement learning, temporal difference-based algorithms can be sample-inefficient: for instance, with sparse rewards, no learning occurs until a reward is observed.
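
The successor-state objects studied here can be learned before any reward is observed, which is how that sample-inefficiency is addressed. A hedged sketch of the definitions in the finite case, notation assumed: the successor state matrix satisfies a reward-free Bellman equation, and every value function is then a linear functional of it.

```latex
% Successor states, finite case: M^pi(s, s') is the discounted expected number of
% visits to s' starting from s under policy pi (notation assumed for illustration).
\[
M^{\pi}(s, s') = \sum_{t \ge 0} \gamma^{t} \Pr\bigl( s_t = s' \mid s_0 = s, \pi \bigr),
\qquad
M^{\pi} = I + \gamma P_{\pi} M^{\pi},
\qquad
V^{\pi}(s) = \sum_{s'} M^{\pi}(s, s')\, r(s').
\]
```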

Unbiased Methods for Multi-Goal Reinforcement Learning

no code implementations 16 Jun 2021 Léonard Blier, Yann Ollivier

We introduce unbiased deep Q-learning and actor-critic algorithms that can handle such infinitely sparse rewards, and test them in toy environments.

Multi-Goal Reinforcement Learning, Q-Learning +2

Learning with Random Learning Rates.

no code implementations 27 Sep 2018 Léonard Blier, Pierre Wolinski, Yann Ollivier

Hyperparameter tuning is a bothersome step in the training of deep learning models.

Agnostic Physics-Driven Deep Learning

no code implementations 30 May 2022 Benjamin Scellier, Siddhartha Mishra, Yoshua Bengio, Yann Ollivier

This work establishes that a physical system can perform statistical learning without gradient computations, via an Agnostic Equilibrium Propagation (Aeqprop) procedure that combines energy minimization, homeostatic control, and nudging towards the correct response.

Simple Ingredients for Offline Reinforcement Learning

no code implementations 19 Mar 2024 Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.

D4RL, Reinforcement Learning (RL)
