Search Results for author: Giancarlo Kerg

Found 7 papers, 2 papers with code

Continuous-Time Meta-Learning with Forward Mode Differentiation

no code implementations ICLR 2022 Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.

Few-Shot Image Classification Meta-Learning

Advantages of biologically-inspired adaptive neural activation in RNNs during learning

no code implementations22 Jun 2020 Victor Geadah, Giancarlo Kerg, Stefan Horoi, Guy Wolf, Guillaume Lajoie

Dynamic adaptation in single-neuron response plays a fundamental role in neural coding in biological neural networks.

Transfer Learning

Untangling tradeoffs between recurrence and self-attention in neural networks

no code implementations16 Jun 2020 Giancarlo Kerg, Bhargav Kanuparthi, Anirudh Goyal, Kyle Goyette, Yoshua Bengio, Guillaume Lajoie

Attention and self-attention mechanisms, are now central to state-of-the-art deep learning on sequential tasks.

Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

1 code implementation NeurIPS 2019 Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie

A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary.

h-detach: Modifying the LSTM Gradient Towards Better Optimization

1 code implementation ICLR 2019 Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio

This problem becomes more evident in tasks where the information needed to correctly solve them exist over long time scales, because EVGP prevents important gradient components from being back-propagated adequately over a large number of steps.

Cannot find the paper you are looking for? You can Submit a new open access paper.